# The Role of Perception in Defining Tonal Targets and their Alignment

Abstract: Tonal targets can be defined along two dimensions, "alignment" and "scaling", where alignment specifies the exact temporal implementation of tonal highs (H) and lows (L) relative to structural elements (such as syllables and morae) and their segments. Alignment patterns may be constrained by both phonological and phonetic factors. Among the phonological factors, the grammar of stress-accent languages specifies that the tones of a pitch accent must be aligned with those syllables that are marked as stressed in the lexicon. Moreover, syllable structure can constrain tune-text alignment. For instance, in Neapolitan Italian, the peak of an LH rising accent occurs closer to the offset of the stressed vowel when the vowel is in a closed syllable and is therefore short. Among the phonetic constraints, one finds facts about the perception of pitch and time, both for speech and for non-speech stimuli. This work investigates the role of alignment in determining tonal target perception for yes/no question and (narrow focus) statement contours in Neapolitan Italian. These contours are characterized by a melodic rise-fall, analyzed here as a sequence of an LH pitch accent plus an HL phrase tone. The separation of the rise and the fall is clear in the case of long focus constituents containing at least two words with independently stressed syllables. In more typical cases, however, this configuration is acoustically realized as a sequence of three tonal targets, LHL, due to "merging" of the H tone sequence in nuclear position. This study shows that the precise alignment of each of those tonal events influences the perception of the question/statement contrast. A read speech corpus, produced by two speakers of Neapolitan Italian, was first analyzed to acoustically characterize tonal targets in both yes/no questions and narrow focus statements, with target words differing in syllable structure and segmental environment.
A set of resynthesized stimuli was then created, which constituted the basis for the perception experiments. Results show that, when the tonal targets for the entire rise-fall are displaced later in time, more stimuli are identified as questions. The results also suggest that F0 height plays a minor role in signaling pitch accent differences, while rise and fall slope have no impact. Additionally, when the shape of the peak in the rise-fall is modified so that a high plateau is created, more questions are perceived. This phenomenon cannot be accounted for in terms of a parsing difference between the question and the statement phonological tone structures, since those structures are the same. Moreover, the effect was also found for non-native listeners. Specifically, when identifying Neapolitan questions vs. statements, American English listeners showed an effect of peak shape and made a similar use of the alignment contrast in response to alignment modifications. This result suggests a universal use of alignment and a psychoacoustic effect of perceived target displacement due to peak shape. Hence, despite acoustic and pragmatic differences between their rise-fall contrasts, American and Neapolitan listeners appear to employ similar perceptual strategies. The Neapolitan results for the syllable structure manipulation are difficult to interpret. On the one hand, the manipulation did not shift the crossover boundary between questions and statements; on the other hand, the response curves for the open and closed syllable continua in the statement modality were significantly different. The results suggest that no look-ahead mechanism is employed when computing perceived target location. That is, question and statement tonal targets are computed relative to the left edge of the stressed syllable, so that stressed vowel duration (which is shorter in closed syllables) has no effect.
A clear category boundary shift was found depending on whether stimuli were resynthesized from a question base or a declarative base utterance. This suggests that cues other than target alignment are employed when computing the perceived pitch accent contrast. In sum, this work proposes that temporal alignment, as both a production and a perception mechanism, must shape phonological systems of intonational contrast, both within and across languages.
Invited Talk, 2001, Department of Linguistics, University of Saarbruecken, Germany
