Speech Perception Laboratory

Abstracts of Representative Publications

Theodore, R.M., Miller, J.L., and DeSteno, D. (2009). Individual talker differences in voice-onset-time: Contextual influences. Journal of the Acoustical Society of America, 125, 3974-3982.

Previous research indicates that talkers differ in phonetically relevant properties of speech, including voice-onset-time (VOT) in word-initial stop consonants; some talkers have characteristically shorter VOTs than others. Previous research also indicates that VOT is robustly affected by contextual influences, including speaking rate and place of articulation. This paper examines whether these contextual influences on VOT are themselves talker-specific. Many tokens of alveolar /ti/ (Experiment 1) or labial /pi/ and velar /ki/ (Experiment 2) were elicited from talkers across a range of rates. VOT and vowel duration (a metric of rate) were measured for each token. Hierarchical linear modeling analyses showed that: (1) VOT increased as rate slowed for all talkers, but the magnitude of the increase varied significantly across talkers; thus the effect of rate on VOT was talker-specific; (2) the talker-specific effect of rate was stable across a change in place of articulation; and (3) for all talkers VOTs were shorter for labial than velar stops, and there was no significant variability in the magnitude of this displacement across talkers; thus the effect of place on VOT was not talker-specific. The implications of these findings for how listeners might accommodate talker differences in VOT during speech perception are discussed.


Schwab, S., Miller, J.L., Grosjean, F., and Mondini, M. (2008). Effect of speaking rate on the identification of word boundaries. Phonetica, 65, 173-186.

Two experiments were conducted to determine whether listeners' ability to use allophonic variation to identify word boundaries is influenced by speaking rate. Listeners in both experiments were presented with two-word sequences (such as great eyes) spoken by naturally fast and naturally slow talkers; in one experiment the sequences were presented in quiet and in the other they were presented in noise. The listeners' task was to identify the intended sequence from among four choices with alternative segmentations (e.g., great eyes, gray ties, great ties, gray eyes). In both experiments performance was worse for the sequences produced by naturally fast talkers than for those produced by naturally slow talkers. This finding suggests that the extent to which allophonic variation contributes to the identification of word boundaries may depend on the rate at which the speech was produced.


Brancazio, L. and Miller, J.L. (2005). Use of visual information in speech perception: Evidence for a visual rate effect both with and without a McGurk effect. Perception & Psychophysics, 67, 759-769.

The McGurk effect, where an incongruent visual syllable influences identification of an auditory syllable, does not always occur, suggesting that perceivers sometimes fail to use relevant visual phonetic information. We tested whether another visual phonetic effect, which involves the influence of visual speaking rate on perceived voicing (Green & Miller, 1985), would occur in instances when the McGurk effect does not. In Experiment 1, we established this visual rate effect using auditory and visual stimuli matching in place of articulation, finding a shift in the voicing boundary along an auditory voice-onset-time continuum with fast versus slow visual speech tokens. In Experiment 2, we used auditory and visual stimuli differing in place of articulation, and found a shift in the voicing boundary due to visual rate when the McGurk effect occurred and, more critically, when it did not. The latter finding indicates that phonetically relevant visual information is used in speech perception even when the McGurk effect does not occur, suggesting that the incidence of the McGurk effect underestimates the extent of audio-visual integration.


Allen, J.S. and Miller, J.L. (2004). Listener sensitivity to individual talker differences in voice-onset-time. Journal of the Acoustical Society of America, 115, 3171-3183.

Recent findings in the domains of word and talker recognition reveal that listeners use previous experience with an individual talker's voice to facilitate subsequent perceptual processing of that talker's speech. These findings raise the possibility that listeners are sensitive to talker-specific acoustic-phonetic properties. The present study tested this possibility directly by examining listeners' sensitivity to talker differences in the voice-onset-time (VOT) associated with a word-initial voiceless stop consonant. Listeners were trained on the speech of two talkers. Speech synthesis was used to manipulate the VOTs of these talkers so that one had short VOTs and the other had long VOTs (counterbalanced across listeners). The results of two experiments using a paired-comparison task revealed that, when presented with a short- versus long-VOT variant of a given talker's speech, listeners could select the variant consistent with their experience of that talker's speech during training. This was true when listeners were tested on the same word heard during training and when they were tested on a different word spoken by the same talker, indicating that listeners generalized talker-specific VOT information to a novel word. Such sensitivity to talker-specific acoustic-phonetic properties may subserve, at least in part, listeners' capacity to benefit from talker-specific experience.


Brancazio, L., Miller, J.L., and Paré, M.A. (2003). Visual influences on the internal structure of phonetic categories. Perception & Psychophysics, 65, 591-601.

Previous work has demonstrated that the graded internal structure of phonetic categories is sensitive to a variety of contextual factors. One such factor is place of articulation: The best exemplars of voiceless stop consonants along auditory bilabial and velar voice onset time (VOT) continua occur over different ranges of VOTs (Volaitis & Miller, 1992). In the present study, we exploited the McGurk effect to examine whether visual information for place of articulation also shifts the best-exemplar range for voiceless consonants, following Green and Kuhl's (1989) demonstration of effects of visual place of articulation on the location of voicing boundaries. In Experiment 1, we established that /p/ and /t/ have different best-exemplar ranges along auditory bilabial and alveolar VOT continua. We then found, in Experiment 2, a similar shift in the best-exemplar range for /t/ relative to that for /p/ when there was a change in visual place of articulation, with auditory place of articulation held constant. These findings indicate that the perceptual mechanisms that determine internal phonetic category structure are sensitive to visual, as well as auditory, information.


Allen, J.S., Miller, J.L., and DeSteno, D. (2003). Individual talker differences in voice-onset-time. Journal of the Acoustical Society of America, 113, 544-552.

Individual talkers differ in the acoustic properties of their speech, and at least some of these differences are in acoustic properties relevant for phonetic perception. Recent findings from studies of speech perception have shown that listeners can exploit such differences to facilitate both the recognition of talkers' voices and the recognition of words spoken by familiar talkers. These findings motivate the current study, whose aim is to examine individual talker variation in a particular phonetically-relevant acoustic property, voice-onset-time (VOT). VOT is a temporal property that robustly specifies voicing in stop consonants. From the broad literature involving VOT, it appears that individual talkers differ from one another in their VOT productions. The current study confirmed this finding for eight talkers producing monosyllabic words beginning with voiceless stop consonants. Moreover, when differences in VOT due to variability in speaking rate across the talkers were factored out using hierarchical linear modeling, individual talkers still differed from one another in VOT, though these differences were attenuated. These findings provide evidence that VOT varies systematically from talker to talker and may therefore be one phonetically-relevant acoustic property underlying listeners' capacity to benefit from talker-specific experience.


Bürki-Cohen, J., Miller, J.L., and Eimas, P.D. (2001). Perceiving nonnative speech. Language and Speech, 44, 149-169.

In a series of experiments using monosyllabic words produced by a native and a nonnative speaker of English, native English speakers monitored the word-initial consonants of the words to decide which of two consonants was present on each trial. In some of the experiments, a secondary task of a linguistic nature, deciding whether the target-bearing word was a noun or verb, was also required. When the words were presented in silence, the native and nonnative stimuli were processed in a like manner. Specifically, when the secondary task was not required, phonemic decisions tended to be made on the basis of prelexical information, whereas when the secondary task was required, they tended to be made on the basis of postlexical information (see Eimas, Marcovitz, Honstein, & Payton, 1990). However, when the listening conditions were degraded by presenting the words at a lower level and in noise, the two types of stimuli yielded different patterns. Native speech was processed as before, whereas for nonnative speech phonemic decisions now tended to be made on the basis of postlexical information both when a secondary task was required and when it was not. The contrasting results for native and nonnative speech are discussed in terms of models of phoneme processing.


Allen, J.S., and Miller, J.L. (2001). Contextual influences on the internal structure of phonetic categories: A distinction between lexical status and speaking rate. Perception & Psychophysics, 63, 798-810.

Previous research has shown that phonetic categories have a graded internal structure that is highly dependent on acoustic-phonetic contextual factors, such as speaking rate; these factors alter not only the location of phonetic category boundaries, but also the location of a category's best exemplars. The purpose of the present investigation, which focused on the voiceless category as specified by voice onset time (VOT), was to determine whether a higher-order linguistic contextual factor, lexical status, which is known to alter the location of the voiced-voiceless phonetic category boundary, also alters the location of the best exemplars of the voiceless category. The results indicated that lexical status has a more limited and qualitatively different effect on the category's best exemplars than does the acoustic-phonetic factor of speaking rate. This dissociation is discussed in terms of a production-based account in which perceived best exemplars of a category track contextual variation in speech production.


Miller, J.L. (2001). Mapping from acoustic signal to phonetic category: Internal category structure, context effects and speeded categorisation. Language and Cognitive Processes, 16, 683-690.

The early stages of speech perception are often characterised in terms of a perceptual mapping between acoustic signal and prelexical phonetic representations. Although early research focused on the abstract categorical nature of these representations, more recent findings have shown that phonetic categories are internally structured in a graded fashion, with some members of the category perceived as better exemplars than others (e.g., Kuhl, 1991; Miller, 1994; Oden & Massaro, 1978; Samuel, 1982). In this paper I present highlights from two recent investigations in our laboratory that examine this structure and its role in processing. The first focuses on how different types of contextual factors influence internal category structure and the second focuses on the role of internal category structure in speeded phonetic categorisation.


Allen, J.S., and Miller, J.L. (1999). Effects of syllable-initial voicing and speaking rate on the temporal characteristics of monosyllabic words. Journal of the Acoustical Society of America, 106, 2031-2039.

Two speech production experiments tested the validity of the traditional method of creating voice-onset-time (VOT) continua for perceptual studies in which the systematic increase in VOT across the continuum is accompanied by a concomitant decrease in the duration of the following vowel. In Experiment 1, segmental durations were measured for matched monosyllabic words beginning with either a voiced stop (e.g., big, duck, gap) or a voiceless stop (e.g., pig, tuck, cap). Results from four talkers showed that the change from voiced to voiceless stop produced not only an increase in VOT, but also a decrease in vowel duration. However, the decrease in vowel duration was consistently less than the increase in VOT. In Experiment 2, results from four new talkers replicated these findings at two rates of speech and also highlighted the contrasting temporal effects on vowel duration of an increase in VOT due to a change in syllable-initial voicing versus a change in speaking rate. It was concluded that the traditional method of creating VOT continua for perceptual experiments, although not perfect, approximates natural speech by capturing the basic trade-off between VOT and vowel duration in syllable-initial voiced versus voiceless stop consonants.


Miller, J.L., and Grosjean, F. (1997). Dialect effects in vowel perception: The role of temporal information in French. Language and Speech, 40, 277-288.

The importance of vowel duration for specifying vowel contrasts differs across languages. In English, for example, a number of vowel pairs are acoustically differentiated by both temporal and spectral information, whereas in standard French temporal information plays a much more minor role. Gottfried and Beddor (1988) reported that the effectiveness of vowel duration in perception varies accordingly: For native speakers of English, but not native speakers of standard French, a change in vowel duration affected the perceptual identity of a vowel contrast. We tested the hypothesis that the relative prominence of vowel duration within different dialects of a given language also has perceptual consequences. Vowel duration plays a much more important role in the phonological system of Swiss French than standard French. Given this, we predicted that native speakers of Swiss French, unlike native speakers of standard French, would use temporal information when identifying vowels. Our prediction was confirmed. These findings indicate that just as there are cross-language differences in fundamental aspects of speech perception, so too are there cross-dialect differences, and they support the view that the perceptual mapping between acoustic signal and vowel category is sensitive to global aspects of the listener's phonological system.


Hodgson, P., and Miller, J.L. (1996). Internal structure of phonetic categories: Evidence for within-category trading relations. Journal of the Acoustical Society of America, 100, 565-576.

Phonetically relevant acoustic properties perceptually trade against each other at phonetic category boundaries. The present investigation used a category goodness judgment task to examine whether such properties also trade against each other in specifying the best exemplars of phonetic categories. The experiments focused on the say-stay distinction, specified by F1 onset frequency and silence duration preceding F1 onset. The main experiment demonstrated a within-category trading relation, such that as F1 onset frequency increased from 230 to 430 Hz a longer silence was required for stimuli to be judged the best exemplars of stay. Two follow-up experiments explored the robustness of this effect. Taken together, the findings underscore the importance of multiple acoustic properties in specifying the internal structure of phonetic categories.


Miller, J.L., and Eimas, P.D. (1996). Internal structure of voicing categories in early infancy. Perception & Psychophysics, 58, 1157-1167.

It is well established that young infants process speech in terms of perceptual categories that closely correspond to the phonetic categories of adult language users. Recently, Kuhl (1991) has provided evidence that this correspondence is not limited to the region of category boundaries: At least by 6-7 months of age, vowel categories of infants, like those of adults, have an internal perceptual structure. In the current experiments, which focused on a consonantal contrast, we found evidence of internally structured categories in even younger infants, 3-4 months of age. The implications of these findings for the nature of the infants' earliest language-universal categories are discussed, as is the role of exposure to the native language in shaping these categories over the course of development.


Miller, J.L. (1994). On the internal structure of phonetic categories: a progress report. Cognition, 50, 271-285.

There is growing evidence that phonetic categories have a rich internal structure, with category members varying systematically in category goodness. Our recent findings on this issue, which are summarized in this paper, underscore the existence and robustness of this structure and indicate further that the mapping between acoustic signal and internal category structure is complex: just as in the case of category boundaries, the best exemplars of a given category are highly dependent on acoustic-phonetic context and are specified by multiple properties of the speech signal. These findings suggest that the listener's representation of phonetic form preserves not only categorical information, but also fine-grained information about the detailed acoustic-phonetic characteristics of the language.


Volaitis, L.E., and Miller, J.L. (1992). Phonetic prototypes: Influence of place of articulation and speaking rate on the internal structure of voicing categories. Journal of the Acoustical Society of America, 92, 723-735.

In this investigation, the effects of context on the perception of voicing contrasts specified by voice-onset-time (VOT) in syllable-initial stop consonants were examined. In an earlier paper [J.L. Miller & L.E. Volaitis, Percept. Psychophys. 46, 505-512 (1989)], it was reported that the listener's adjustment for one contextual variable, speaking rate, was not confined to the region of the phonetic category boundary, but extended throughout the phonetic category. The current investigation examines whether this type of perceptual remapping also occurs for another contextual variable, the place of articulation of the syllable-initial consonant. In a preliminary experiment that involved acoustic measurement of natural speech, it was confirmed that as place of articulation moves from labial to velar, VOT increases, and it was established that this occurs across a range of speaking rates (syllable durations). In the main experiments, which focused on the voiceless category, it was found that this acoustic change was reflected in perception not only as a shift in the location of the voiced-voiceless category boundary, but also a change in both the specific range of stimuli identified as members of the voiceless category and the set of stimuli judged to be the best exemplars, or prototypes, of the category. These findings extend earlier research by showing that a change in place of articulation, like a change in speaking rate, systematically alters the internal perceptual structure of voicing categories.


Miller, J.L., and Volaitis, L.E. (1989). Effect of speaking rate on the perceptual structure of a phonetic category. Perception & Psychophysics, 46, 505-512.

When listeners process temporal properties of speech that convey information about the phonetic segments of the language, they do so in a rate-dependent manner. This is seen as a shift in the location of the phonetic category boundary along a temporal continuum toward longer values of the acoustic property in question, as speech is slowed. In a series of experiments, we found that the adjustment for rate is not confined to the region of the category boundary, but extends throughout the phonetic category. Specifically, a change in rate modified the range of stimuli identified as members of a phonetic category, as well as which stimuli were overtly judged to be good exemplars of the category. These findings suggest that the listener's adjustment for speaking rate entails a comprehensive perceptual remapping between acoustic signal and phonetic structure.