Lang Speech. Author manuscript; available in PMC 2009 Nov 5.

Published in final edited form as:

PMCID: PMC2773797

NIHMSID: NIHMS154563

Abstract

Numerous findings suggest that non-native speech perception undergoes dramatic changes before the infant’s first birthday. Yet the nature and cause of these changes remain uncertain. We evaluated the predictions of several theoretical accounts of developmental change in infants’ perception of non-native consonant contrasts. Experiment 1 assessed English-learning infants’ discrimination of three isiZulu distinctions that American adults had categorized and discriminated quite differently, consistent with the Perceptual Assimilation Model [PAM: Best, 1995; Best et al., 1988]. All involved a distinction employing a single articulatory organ, in this case the larynx. Consistent with all theoretical accounts, 6–8 month olds discriminated all contrasts. However, 10–12 month olds performed more poorly on each, consistent with the Articulatory-Organ-matching hypothesis [AO] derived from PAM and Articulatory Phonology [Studdert-Kennedy & Goldstein, 2003], specifically that older infants should show a decline for non-native distinctions involving a single articulatory organ. However, the results may also be open to other interpretations. The converse AO hypothesis, that non-native between-organ distinctions will remain highly discriminable to older infants, was tested in Experiment 2, using a non-native Tigrinya distinction involving lips versus tongue tip. Both ages discriminated this between-organ contrast well, further supporting the AO hypothesis. Implications for theoretical accounts of infant speech perception are discussed.

Keywords: articulatory phonology, cross-language, infant speech perception, non-native consonants, perceptual assimilation

Introduction

Adults have difficulty discriminating many consonant distinctions that are not contrastive in their own languages. Yet young infants show no such language-specific biases during the first half-year [e.g., Aslin & Pisoni, 1980b; Lasky, Syrdal-Lasky, & Klein, 1975; Trehub, 1976; cf. Eilers, Gavin, & Wilson, 1979; Streeter, 1976]. Infants under 6–8 months of age can discriminate both native and non-native consonant contrasts, while infants over 10 months apparently have difficulty discriminating non-native consonants that adult speakers in their language environment have difficulty with [see reviews by Best, 1994b; Werker, 1989]. This developmental pattern has been found for English-learning infants tested with the Hindi unaspirated dental versus retroflex stop contrast [t̪]–[ʈ] and voiceless aspirated versus breathy voiced dental stop contrast [t̪h]–[d̪h], and with the Nthlakampx [southwest Native Canadian language] velar versus uvular ejective contrast [k′]–[q′], none of which are phonologically distinctive in English [Best, McRoberts, LaFleur, & Silver-Isenstadt, 1995; Werker, Gilbert, Humphrey, & Tees, 1981; Werker & Lalonde, 1988; Werker & Tees, 1984a]. Japanese-learning infants show a similar developmental trend for English [ɹ]–[l], which is noncontrastive in Japanese [Tsushima, Takizawa, Sasaki, Shiraki, Nishi, Kohno, Menyuk, & Best, 1994]. English-learning infants likewise show a decline in discrimination of a non-native Mandarin fricative-affricate contrast, which older Mandarin-learning infants continue to discriminate [Tsao, Liu, Kuhl, & Tseng, 2000]. Similarly, English-learning infants have greater difficulty than Spanish-learning infants with the Spanish alveolar tap versus trill distinction [ɾ]–[r] [Eilers, Gavin, & Oller, 1982].

The explanation for this early developmental shift remains uncertain, however. Whether, and how, the onset of word comprehension may relate to changes in discrimination of non-native speech contrasts is not yet understood. The average 10-month-old does not yet produce any words, and has just begun to show word comprehension [Benedict, 1979]. However it is that early word comprehension may contribute to speech perception, by 10 months infants evidently have discovered some important properties of native consonants, and this has begun to constrain their perception of unfamiliar non-native ones. But just what have they learned, and why does it limit non-native speech perception? Has something changed regarding the type of information they perceive in speech? Is there, for example, a developmental shift from detecting surface acoustic or articulatory details, to perceiving phonetic or phonological information, that is, linguistically relevant information?

Insights may be gained by examining several findings that challenge the general claim of decline in discrimination of non-native contrasts by 10–12 months. Older infants apparently do not perceive all non-native contrasts in the same way, that is, discrimination does not always decline developmentally. The earliest report of an alternative developmental pattern found that isiZulu1 dental versus lateral click consonants [|]–[|||] are discriminated quite well not only by English-speaking adults, but also by infants at least through 14 months, the oldest age tested [Best, McRoberts, & Sithole, 1988]. That is, there does not appear to be a developmental decline for this non-native contrast. A follow-up study [Best et al., 1995] reaffirmed that 10–12 month olds discriminate the clicks, even though the same infants failed on the Nthlakampx ejectives for which Werker and colleagues [1984a] had found a performance decline by 10 months. And another lab recently found that both English- and French-learning infants display fairly poor discrimination of the interdental fricative versus alveolar stop distinction [ð]–[d], which is phonologically contrastive in English but not in French [Polka, Colantonio, & Sundara, 2001]. Performance by 6–8 and 10–12 month olds from each language environment was comparable to French-speaking adults’ relatively poor discrimination, rather than approaching the ceiling-level performance of English-speaking adults as expected. Thus, there appears to be developmental improvement in perception of these consonants sometime after 12 months of age, but only if the language environment employs them contrastively.

So why do the developmental patterns differ across non-native consonant contrasts? Few accounts of infant speech perception have addressed this variability directly. Most assume a universal pattern of developmental change [see, e.g., Jusczyk, 1986; Jusczyk, 1993; Jusczyk & Bertoncini, 1988; Kuhl, 1993; Werker, 1989; Werker & Pegg, 1992]. However, two broader models do predict perceptual differences among non-native contrasts. One posits that non-native contrasts vary along a “fragile-robust” perceptual dimension [Burnham, 1986]. Fragile contrasts are defined as distinctions that are low in acoustic salience and rare across the world’s languages. Discrimination of these is predicted to decline in the first year if the language environment does not contrast them. Robust contrasts, on the other hand, involve highly salient acoustic distinctions, are common across languages, and show good discrimination until the early school years even without specific experience.

A difficulty with making fragile-robust predictions, however, is that it is not always clear how to determine the acoustic salience level of a given contrast while avoiding circularity [see Best et al., 1988; Polka et al., 2001]. Moreover, defining fragility-robustness in terms of both rarity and psychoacoustic salience can be problematic, and certain findings are inconsistent with the model’s predictions. For example, click consonant contrasts are quite rare across languages. While the psychoacoustic properties of some clicks are presumably quite salient [palatal, alveolar, and lateral clicks], the properties of others are less salient [dental clicks] or even fairly weak [bilabial clicks]. More critically, place of articulation contrasts appear to be low in perceptual salience, compared to voicing or manner contrasts [Miller & Nicely, 1955]. Thus, click place of articulation contrasts would be fragile on both counts. Yet English-learning infants discriminate the isiZulu dental versus lateral clicks, a place distinction, quite well even beyond the first year, as do English-speaking adults; there appears to be no developmental decline for this contrast [Best et al., 1988]. Conversely, the English interdental fricative versus alveolar stop contrast [ð]–[d] is also quite rare. But whereas [ð] is itself low in acoustic salience [Maddieson, 1984], [d] is robust, and the manner contrast of stop versus fricative is fairly salient perceptually [Miller & Nicely, 1955]. Moreover, this contrast also involves a place difference [interdental vs. alveolar], which should somewhat improve perceptual salience. Thus [ð]–[d] seems relatively robust on psychoacoustic grounds, but fragile in terms of its rarity. Very young infants show poor discrimination of this contrast, even when it is native. Discrimination improves with age if [ð]–[d] is contrastive in the native language, but not until after 12 months [Polka et al., 2001]. Thus, the relatively more perceptually robust of the two contrasts just discussed [[ð]–[d]] is more difficult for infants to discriminate. And neither shows a language-experience decline in discrimination in year one, as Burnham’s model predicts.

A second model that predicts variations in perception is the Perceptual Assimilation Model [PAM] [Best, 1994a, 1994b, 1995; Best et al., 1988], which is of primary concern to the present research. Sparked by their findings with the isiZulu dental versus lateral clicks, Best and colleagues [1988] developed the PAM to account for a wide range of performance on diverse non-native contrasts. Its central premise is that mature listeners have a strong tendency to perceptually assimilate non-native phones to the native phonemes they perceive as most similar. If there is no clear-cut similarity to a single native consonant or vowel, the non-native phone may be perceived as falling in between native phonemic categories [i.e., only weak similarity to two or more], as an uncategorizable speech segment. Rarely, the non-native phone may be so dissimilar from anything in the native system that it is not heard as a phonological element at all, instead being perceived as a nonspeech sound. PAM defines “perceptual similarity” within the frameworks of Articulatory Phonology [e.g., Browman & Goldstein, 1989, 1990] and the ecological approach to speech perception [Best, 1984, 1994b; Fowler, 1986]. Thus, similarity in the PAM focuses on dynamic articulatory information, that is, on the ways in which articulatory gestures [specific active articulators, constriction locations, and degrees of constriction] shape the speech signal. This view differs from the alternative notion that perceptual similarity, indeed speech perception generally, is derived from source-neutral acoustic features, for example, properties such as psychoacoustic salience.

PAM proposes that discriminability of a non-native distinction will depend on how the listener assimilates the contrasting phones. Non-native consonants that are both assimilated as equally-good tokens of a single native consonant should be discriminated poorly [Single-Category [SC] assimilation], whereas those that are assimilated to two different native consonants should show near-ceiling discrimination [Two-Category [TC] assimilation]. Contrasting non-native consonants that are assimilated to the same native consonant, but that differ in their perceived degree of similarity to it [Category-Goodness difference in assimilation [CG]], will display good discrimination, intermediate between the SC and TC cases. Thus, discrimination should follow the pattern: TC > CG > SC. This prediction was supported in a recent report on American English-speaking adults’ perception of three isiZulu consonant contrasts. The isiZulu voiced versus voiceless lateral fricatives, voiceless aspirated versus ejective velar stops, and plosive versus implosive bilabial stops, respectively, showed TC, CG and SC assimilation and excellent versus good versus poor discrimination [Best, McRoberts, & Goodell, 2001].
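The ordinal prediction TC > CG > SC can be sketched in a few lines of Python. This is only an illustrative encoding, not part of the original study: the names `Assimilation`, `ADULT_ASSIMILATION`, and `rank_by_predicted_discrimination` are hypothetical, and the numeric values carry rank order only, not measured discrimination scores.

```python
from enum import IntEnum


class Assimilation(IntEnum):
    """PAM assimilation types, ordered by predicted discriminability."""
    SC = 1  # Single-Category: poor discrimination predicted
    CG = 2  # Category-Goodness difference: intermediate
    TC = 3  # Two-Category: near-ceiling discrimination predicted


# Adult assimilation patterns for the three isiZulu contrasts, as
# summarized in the text (Best, McRoberts, & Goodell, 2001).
ADULT_ASSIMILATION = {
    "lateral fricatives (voiced vs. voiceless)": Assimilation.TC,
    "velar stops (aspirated vs. ejective)": Assimilation.CG,
    "bilabial stops (plosive vs. implosive)": Assimilation.SC,
}


def rank_by_predicted_discrimination(contrasts):
    """Return contrast names from best to worst predicted discrimination."""
    return sorted(contrasts, key=lambda name: contrasts[name], reverse=True)


for name in rank_by_predicted_discrimination(ADULT_ASSIMILATION):
    print(ADULT_ASSIMILATION[name].name, name)
```

Under this encoding the ranking runs lateral fricatives, velar stops, bilabial stops, which matches the excellent versus good versus poor discrimination reported for adults.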

Other findings are also consistent with some PAM predictions [e.g., Best & Avery, 1999; Harnsberger, 2000; Polka, 1992; Polka et al., 2001]. In addition, English-speaking adults’ performance on Werker’s original stimulus contrasts appears to be compatible with the PAM prediction that discrimination is better for TC or CG assimilation than for SC assimilation. English-speaking adults show low discrimination levels and little benefit of perceptual training for Hindi unaspirated dental versus retroflex stops [t̪]–[ʈ] and Nthlakampx velar-uvular ejectives [k′]–[q′], but substantially better discrimination and training benefits for the Hindi voiceless aspirated versus breathy voiced dental stops [t̪h]–[d̪h] [Tees & Werker, 1984; Werker et al., 1981; Werker & Tees, 1984b]. Both members of the first contrast are likely assimilated as relatively good exemplars of English /d/, and both members of the second as non-prototypical exemplars of English /k/, both satisfying the criteria for SC assimilation. The third contrast, however, likely shows TC assimilation to English /t/–/d/ or CG assimilation as good versus poor /t/.

Tees and Werker themselves, however, offered a different explanation, which provides yet a third theoretical account of variation in perception of non-native phonetic contrasts. Specifically, they posited an allophonic account, in which variations in perception of non-native contrasts are the result of differential native allophonic exposure. They reasoned that English /t/ [[t̪h] in initial position] and /d/ provide a comparable voice onset time [VOT] distinction to Hindi [t̪h]–[d̪h], thus offering relevant allophonic experience, but that English provides little to no allophonic exposure to retroflex stops or to velar and uvular ejectives.

The allophonic account is weakened by several findings, however. For one, English listeners discriminate isiZulu clicks quite well, without training, despite the fact that they do not occur allophonically in English [Best et al., 1988]. Conversely, English listeners have difficulty discriminating Spanish prevoiced versus short-lag unaspirated stops [e.g., [b] versus [p]], despite the fact that English presents both as allophones of voiced stops [e.g., /b/] [MacKain, 1982]. And Werker herself [Pegg & Werker, 1997] reported more recently that while English-learning 6–8 month olds discriminate between English [d] [an initial /d/] and unaspirated [t] [a /t/ following an /s/], both of which occur as allophones of English /d/, 10–12 month olds fail to discriminate.

Still, accounts of variations in adults’ perception of diverse non-native contrasts may not apply directly to infants, who have not yet established usage of the native language or its phonology. Children take several years to learn the basic structures of their native language. Full use of the native phonological system––its contrastive functions, phonotactic patterns, and contextual allophonic variations––is not achieved until the early school years [see, e.g., Ferguson, Menn, & Stoel-Gammon, 1992]. Shifts in infants’ perception of non-native contrasts presumably reflect the state of their emerging knowledge about native speech. Prior to six months, infants discriminate both native and non-native phonetic contrasts, suggesting that they do not yet recognize native phonemes or contrasts as such.

PAM posits that young infants are simply detecting universal [language-neutral] articulatory patterns in both native and non-native speech. When infants begin to show language-specific effects in discrimination of some non-native contrasts during the second half-year, PAM posits that they have begun to recognize familiar articulatory patterns in native speech, due to perceptual learning or attunement [consistent with the principles of perceptual learning discussed in Gibson & Gibson, 1955]. However, this shift does not yet involve recognition of truly phonological information [i.e., segmental elements of an organized system of minimal contrasts]. Such phonological ability is likely to be associated with some critical level of lexical and/or morpho-phonemic development, and may not be complete until at least 5–6 years of age [Best, 1993, 1994b]. Compatible with the notion that the emergence of contrastive phonology cannot account for the 10–12 month decline in non-native speech discrimination, recognition of familiar words at this age appears to be phonologically underspecified [Hallé & de Boysson-Bardies, 1994; 1996]. That is, infants seem to recognize familiar words even when they are “mispronounced” with a phonetic feature change on the initial consonant [i.e., minimal contrast]. By 14 months, infants do show finer-grained phonetic representation of familiar words, in that they respond differently to familiar words that are correctly pronounced versus mispronounced [Fennell & Werker, this volume; Swingley & Aslin, 2002; see also Swingley, this volume]. But toddlers do not show sensitivity to native minimal contrasts in learning new words [artificial-language] prior to about 17–18 months [Stager & Werker, 1997], when the average child begins to produce simple morphology and syntax, display a spurt in vocabulary growth, and show systematic phonologically-motivated patterns in word production.

These observations raise the question of whether infant perceptual development may vary for non-native contrasts that adults assimilate differently. Although there are developmental variations among non-native and even native contrasts, it is not clear whether and how those differences relate to adult patterns of assimilation and discrimination. Nor is it clear what they indicate about the state of the infant’s perceptual learning of native speech properties. Whereas the fragile-robust hypothesis and the allophonic-experience hypothesis predict developmental differences for varying types of non-native contrasts, several findings have cast doubt on those predictions, as reviewed above. PAM, on the other hand, predicted the 10–12 month decline in discrimination for non-native contrasts that adults assimilate to a Single Category [SC], as well as the lack of developmental decline for perception of nonassimilable [NA] click consonants. If PAM predictions for adult non-native speech perception extend to other types of non-native contrasts, we should expect a similar lack of decline for Two-Category [TC] assimilations. However, the latter assumption may not be reasonable for infants, and PAM’s predictions regarding developmental changes in infants’ discrimination of Category Goodness differences [CG], or Uncategorized non-native phones, may also differ from adults’ performance patterns. Thus, it would be important to evaluate developmental changes in 6–12 month-old infants’ perception for a set of non-native contrasts that adults assimilate and discriminate differently.

If infants do not perceive native consonant distinctions as minimally contrastive elements within a phonological system, as suggested by the research on lexical development and learning summarized above, until about 17–18 months [or possibly as early as 14 months, for familiar words] [see Fennell & Werker, this volume; Hallé & de Boysson-Bardies, 1994; Hallé & de Boysson-Bardies, 1996; Stager & Werker, 1997; Swingley, this volume; Swingley & Aslin, 2002],2 then their perception of various types of non-native contrasts at 10–12 months must certainly deviate in some ways from adults’ assimilation patterns. Consistent with this notion, even though adult English speakers discriminate Hindi [t̪h]–[d̪h] better than Hindi [t̪]–[ʈ] and Nthlakampx [k′]–[q′], English-learning infants showed an equivalent and simultaneous decline in discrimination of all three non-native contrasts by 10–12 months [Tees & Werker, 1984; Werker et al., 1981; Werker & Tees, 1984a, 1984b]. In addition, Polka and colleagues [2001] found poor discrimination in English-learning infants throughout the first year for a native contrast that adults discriminated at ceiling. A full understanding of the basis for the perceptual change around 10 months, however, requires systematic developmental comparisons of infants’ discrimination for several non-native contrasts on which adults show a wide range of performance. Developmental changes for non-native contrasts on which adults have shown excellent versus good versus poor discrimination would be especially informative. The TC, CG, and SC assimilation patterns in American adults’ perception of the three isiZulu consonant contrasts described earlier [Best et al., 2001] satisfy this requirement. Therefore, we tested 6–8 and 10–12 month-old American infants on those three isiZulu contrasts in Experiment 1.

To provide a foundation for predictions about younger versus older infants’ perception of these contrasts, we will summarize the adult perceptual findings for each stimulus contrast, the articulatory and acoustic properties of each consonant, and allophonic and nonspeech listening experience that might be relevant. With respect to perceptual findings, the lateral fricative voicing contrast /ɬ/–/ɮ/ [phonetically realized as [ɬ]–[ɮ]] elicited TC assimilation to an English phonological contrast by adults, who showed near ceiling discrimination [Best et al., 2001]. The articulatory difference involved in the contrast is a laryngeal [glottal] gesture distinction involving vocal fold abduction for the voiceless fricative but not the voiced one, a distinction that is also found in English fricative voicing distinctions such as /s/–/z/. Lingual articulation for the lateral fricatives is similar to English /l/ [i.e., constriction of tongue tip and tongue dorsum, such that air flows laterally over the sides of the tongue]. However, lateral fricatives involve more constriction along the sides of the tongue than in /l/, resulting in noisy turbulence [i.e., frication]. Thus, in the lateral fricative contrast, it is the supralaryngeal articulatory organization that is non-native to English. As for acoustic properties, 24 measurements were made on the multiple tokens of each consonant. Systematic acoustic differences between the lateral fricatives were found on three other measures besides the obvious voicing difference: frication duration was longer and F0 and F1 frequencies at vocalic onset were higher in the voiceless lateral fricatives, consistent with acoustic differences between English voiced and voiceless fricatives [e.g., Pirello, Blumstein, & Kurowski, 1997; Slis & Cohen, 1969]. All other acoustic measures showed either partially or completely overlapping ranges of values [for further stimulus details, see Best et al., 2001]. We note, however, that voicing distinctions for fricatives as well as stops have been shown to be perceptually robust [Miller & Nicely, 1955]. As for English speakers’ experience with lateral fricatives, they do not occur as allophones in standard English. While /θ/ and /ð/ preceding /l/ [voiceless /θ/: ATHLETE; voiced /ð/: BLITHELY] may seem similar to isiZulu lateral fricatives, their tongue tip contact is dental rather than alveolar and they lack lateral frication as in [ɮ]–[ɬ]. In any case, interdental fricative + /l/ sequences are quite rare in English. No nonspeech listening experience appears to be relevant to perception of the lateral fricatives.3

The perceptual findings on the isiZulu voiceless aspirated versus ejective velar stop contrast /k/–/k′/ [realized as [kh]–[k′]] are that it was assimilated by adults as a CG difference in goodness of fit to English /k/. It was discriminated quite well, but significantly less well than the lateral fricatives. In terms of articulatory properties, supralaryngeal articulation of both velars is virtually identical to English /k/ when it is aspirated [[kh]]; it is the laryngeal distinction that is non-English. isiZulu and English [kh] involve the same glottal opening gesture, but the glottal closure for ejective [k′] is not employed contrastively in English. Acoustic measures showed that the release bursts differed systematically between the isiZulu velars: amplitude was higher, duration was longer, and mean weighted frequency was higher at early, mid and late portions of the burst for the ejective than for the voiceless aspirated items. As for allophonic experience with these phones, English provides much exposure to [kh], but none to ejective stops. With respect to perception of nonspeech properties of the velar stops, Americans typically hear the ejective glottal gesture in isiZulu [k′] as a nonspeech vocal tract event superimposed on a /k/, for example, choking, gagging, throat-clearing, clicking, clacking, clucking, gurgling [Best et al., 2001].

The perceptual findings for the third contrast, the isiZulu plosive versus implosive bilabial stop contrast /b/–/ɓ/ [phonetically realized as [p]–[b]], are that the majority of adults showed SC assimilation of both as equally-good /b/’s, and discriminated them rather poorly, though above chance. Articulatorily, the supralaryngeal gesture for the bilabials is identical to English /b/; it is the laryngeal [glottal] distinction that is noncontrastive in English. The glottal setting is virtually identical for isiZulu plosive /b/ and one of the two primary allophones of English /b/: onset of voicing is simultaneous with bilabial release [i.e., voiceless unaspirated [p]] for isiZulu /b/ [Doke, 1926], as well as for voiceless unaspirated allophones [[p]] of English /b/. As for implosive /ɓ/, although older sources state that it is produced with rapid larynx-lowering resulting in negative oral airflow at closure release [Canonici, 1989; Doke, 1926; Maddieson, 1984; Poulos & Bosch, 1997; Van Wyck, 1979; Ziervogel, Louw, & Taljaard, 1985], more recent data indicate that it is no longer realized as an implosive but rather as a prevoiced plosive stop [Giannini, Pettorino, & Toscano, 1988; Traill, Khumalo, & Fridjhon, 1987]. The plosive characterization seems appropriate for our isiZulu /ɓ/ stimuli, which are prevoiced and have prominent noise bursts at release [see Best et al., 2001].4 Thus, isiZulu /ɓ/ is apparently realized phonetically as prevoiced, plosive [b], which is the other of the two primary allophonic variants of English /b/. In terms of acoustic measures, there were four systematic differences between the isiZulu bilabials: release burst amplitude was higher and F0 and F1 onset frequencies were higher, and voice onset time [VOT] was negative [prevoiced] for /ɓ/ [[b]], but short-lag unaspirated for /b/ [[p]]. Based on these observations, English allophonic experience would be expected to be ample for both isiZulu /b/ [[p]] and /ɓ/ [[b]]. As for nonspeech perception of this contrast, only /ɓ/ evoked nonspeech percepts, which were much less frequent and subtler than for the ejective velars, for example, pursed lips, “harder” pronunciation, or tenser speech muscles [Best et al., 2001].

To summarize, in each isiZulu contrast, both members involve identical gestures of the same supralaryngeal articulator[s] and differ by a minimal distinction in articulatory gestures made by a single articulatory organ, the larynx. In other words, none of the contrasts is based on a distinction between gestures of different vocal tract articulators, as would be the case with, for example, /p/–/t/ [closure/release gesture of lips vs. tongue tip]. We now review the key properties of the laryngeal contrasts, for purposes of making predictions from various theoretical models. Only one laryngeal distinction is phonologically contrastive in English: that for voiced versus voiceless fricatives. As for the other two non-native laryngeal contrasts, the ejective gesture of the velar contrast is obviously non-English and has a marked effect on the aerodynamics/acoustics of the velar stop release, whereas both bilabials involve laryngeal settings that occur in English but as noncontrastive allophonic variants of a single English phoneme [/b/]. To the extent that allophonic experience may affect perception of non-native contrasts, English allophonic experience is extensive for one member of the isiZulu velar contrast, but is lacking for its cognate. It is weak to nonexistent for both lateral fricatives. And it is substantial for both bilabial stops. Nonspeech qualities appear to contribute substantially to adults’ perception of the ejective velar stop, much less to perception of the implosive bilabial /ɓ/ [prevoiced [b]], and not at all to perception of the lateral fricatives.
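For readers who prefer a tabular view, the properties just summarized can be collected into a small Python record. This is a minimal sketch for bookkeeping only: the class and field names are hypothetical, and the field values are qualitative paraphrases of the text above, not measured quantities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaryngealContrast:
    name: str
    adult_assimilation: str       # TC, CG, or SC, as reported for American adults
    contrastive_in_english: bool  # is the laryngeal distinction phonemic in English?
    allophonic_experience: str    # qualitative summary from the text
    nonspeech_qualities: str      # qualitative summary from the text


ISIZULU_CONTRASTS = [
    LaryngealContrast(
        "lateral fricatives (voiceless vs. voiced)", "TC", True,
        "weak to nonexistent for both members", "none reported",
    ),
    LaryngealContrast(
        "velar stops (aspirated vs. ejective)", "CG", False,
        "extensive for one member, lacking for the other",
        "substantial for the ejective",
    ),
    LaryngealContrast(
        "bilabial stops (plosive vs. implosive)", "SC", False,
        "substantial for both members",
        "much weaker, for the implosive only",
    ),
]


def contrastive_in_english(contrasts):
    """Names of contrasts whose laryngeal distinction is phonemic in English."""
    return [c.name for c in contrasts if c.contrastive_in_english]


print(contrastive_in_english(ISIZULU_CONTRASTS))
```

Only the lateral fricative voicing distinction is returned, in line with the statement that voiced versus voiceless fricatives is the one laryngeal distinction here that English uses contrastively.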

What predictions may be made about developmental changes in infants’ perception of the three isiZulu contrasts, particularly between 6–8 months and 10–12 months, when discrimination has declined for a number of non-native consonant contrasts? A variety of theoretical views offer a range of scenarios. One set of approaches to infant speech perception may be broadly grouped by their common assumption that general information-handling mechanisms, rather than specialized linguistic ones, are responsible for infants’ perception of speech. These mechanisms are modified or adjusted by auditory-acoustic experience, as opposed to specifically linguistic forces.

Burnham’s [1986] fragile-robust hypothesis falls within this view. Though that proposal has been undercut by several findings, it may still be useful to attempt to locate the three isiZulu contrasts along the fragile-robust dimension, and to make predictions for the present study. Although the lateral fricative voicing contrast is quite infrequent in the world’s languages [Maddieson, 1984], voicing contrasts are generally quite perceptually salient [Miller & Nicely, 1955], as summarized earlier. Thus, this isiZulu contrast is probably robust on psychoacoustic grounds alone. By comparison, ejective stops are more frequent across languages, and their most common place of articulation is velar. Additionally, the languages that use ejective velar stops frequently contrast them with the homorganic voiceless plosive, thus /k′/–/k/ [[kh]] occurs more frequently than /ɮ/–/ɬ/ [Maddieson, 1984]. Given the aerodynamic/acoustic effects of ejective release, the velar stop contrast is also likely to be relatively salient in a psychoacoustic sense. The isiZulu bilabial contrast /b/–/ɓ/ appears to be realized, as summarized above, as voiceless unaspirated [p] versus voiced [b], a highly frequent contrast in the world’s languages [substantially more frequent than the English voiced/unaspirated vs. voiceless aspirated contrast] [Maddieson, 1984]. Regardless of whether the bilabial contrast is truly plosive versus implosive, or voiceless unaspirated versus prevoiced, both types of distinction are psychoacoustically salient according to Burnham [1986]. Thus, all three contrasts appear to be robust. According to the fragile-robust proposal, then, all three isiZulu laryngeal contrasts should still be discriminated well past 10–12 months, declining only later in early childhood.

A number of other general-mechanism accounts have been proposed, including auditory experience-based tuning of sensorineural, psychoacoustic, or attentional mechanisms [e.g., Aslin & Pisoni, 1980b; Kuhl, 1993; Tees & Werker, 1984; Werker & Tees, 1984a, 1984b; Werker et al., 1981; see also Harnsberger, 2000; Polka, 1992; Polka et al., 2001]. One such account is Kuhl’s Native Language Magnet [NLM] model [1993], which posits that exposure to the acoustic properties of native phonemes results in the formation of phonetic category prototypes that “warp” the surrounding perceptual space. The prototypes act like perceptual magnets for acoustically similar tokens of the same phonetic category, making the latter difficult to discriminate from them. Nonprototypical members of the category [i.e., poor exemplars] fail to act like magnets, as do nonexperienced non-native phones. Thus, discrimination of acoustically similar tokens is significantly better around nonprototypes and non-native phones [i.e., perceptual generalization is poorer] than around prototypes. This suggests that older infants [i.e., 10–12 months] should still be able to discriminate the lateral fricatives well as two nonexperienced nonprototypes, but they should show a modest age-related decline in discriminating the velar stops, which include a native-English prototype [[kh]] and a clear nonprototype. They should show a sharp decline in discriminating the bilabials, both of which are prototypical of English /b/.

Cognitive accounts also fall under the general-mechanism rubric. One proposal is that the emergence of certain basic cognitive abilities, which underlie increases in categorization and object search and detour navigation skills at around 10 months, may account for changes in non-native speech category recognition at that age [Diamond, Werker, & Lalonde, 1994; Lalonde & Werker, 1995]. Another view is that the developmental change in non-native speech perception is linked specifically to emerging abilities to use correlations among multiple features of experienced stimuli, in order to recognize category identity [e.g., Cohen, 1998; Younger & Cohen, 1983]. In the case of speech, infants are assumed to use the multiple features of native phones in order to recognize phonetic category identity. If either cognitive view is correct, discrimination should decline by 10–12 months for the lateral fricatives, which both deviate from familiar phonetic categories. Discrimination should also decline for the bilabial contrast, but in this case because both isiZulu phonemes occur as exemplars of a single native phoneme, such that both should lead to recognition of the same native phonemic category. However, discrimination should remain high for the velar contrast because it involves a comparison between a familiar phonetic category and an unfamiliar one.

In addition to the general-mechanism approaches, however, specialized linguistic accounts also offer developmental predictions for the three isiZulu contrasts. The phonological view posits that truly linguistic, phonemic segments and contrasts emerge in the second half-year. By one such account, young infants display nonlinguistic, psychoacoustic-based perception of speech, but this gives way to perception of linguistic units [i.e., phonemes] when comprehension of word meaning begins to appear around 10 months [WRAPSA model: Jusczyk, 1993; 1994; 1997]. Pegg and Werker [1997] offer another phonological account. Finding that English-learning 6–8 month olds, but not 10–12 month olds, discriminate English voiced [d] from unaspirated [t] [both of which occur as allophones of the phoneme /d/], they concluded that the native phonological status of a phonetic distinction governs older infants’ perception of noncontrastive native allophones, as well as of non-native contrasts. That is, native contrastive status fosters a phonological reorganization of speech perception at around 10–12 months. The phonological approaches predict that American English-learning infants of 10–12 months should perceive the non-native isiZulu contrasts as do American adults. That is, consistent with PAM findings on non-native speech perception in adults [e.g., Best, 1995; Best et al., 2001], older infants should perceive the lateral fricatives as corresponding to some native phonological distinction and discriminate them better than the velar stops, which they should perceive as a good versus less-good English /k/. They should show great difficulty with the bilabial stops, which they should hear as two good exemplars of English /b/. Note that this view predicts essentially the same pattern of developmental changes across the isiZulu contrasts as does NLM, although the rationale for the predictions is starkly different between the two views.

Phonetic accounts instead posit that although 10–12 month olds have become attuned to phonetic details of the language environment, they do not yet recognize abstract phonological contrasts per se. One such phonetic view is the allophonic experience hypothesis described earlier [e.g., Tees & Werker, 1984; Werker & Tees, 1984b; see also Maye, Werker, & Gerken, 2002], which would predict a failure in discrimination of the lateral fricatives at 10–12 months due to a dearth of allophonic experience. However, the velar stop contrast should be discriminated, though somewhat less well than at 6–8 months, because it pits an experienced English allophone against a nonexperienced phone. The bilabial contrast should be discriminated quite well, with no decline at 10–12 months, because the phonetic realizations of both members of this contrast are frequent allophones of English /b/.

However, the simple allophonic experience hypothesis has been called into question, as noted previously. Werker’s current phonetic view instead focuses on the task of word-learning. Recently, she and Stager reported that infants discriminate native minimal contrasts in a pure speech perception task at 14 months, but cannot discriminate the same contrast in a word-learning task until 18–20 months [Stager & Werker, 1997; see also Fennell & Werker, this volume; Swingley, this volume]. They concluded that infants under 18 months recognize native phonetic categories but not yet phonological contrasts. It is not entirely clear what this view would predict for our contrasts, though by extrapolation 10–12 month olds should show poor discrimination of the lateral fricatives, neither of which corresponds to a native phonetic category. The bilabials should also be quite difficult for the older infants to discriminate, but because both correspond to the same native phonetic category. There should be only a modest developmental decline in discriminating the velars, which compare a native phonetic category against a non-native one.

PAM presents another phonetic view of early perceptual development, positing that speech perception becomes attuned to native articulatory-phonetic patterns by 10–12 months, specifically to native-language “constellations” of gestures at the segmental or syllabic level [a concept from Articulatory Phonology [AP]: Browman & Goldstein, in preparation; Studdert-Kennedy & Goldstein, 2003]. Prior to that attunement, younger infants are assumed to be more universally sensitive to detecting simple differences between single gestures [e.g., tongue tip closure vs. lip closure], rather than noting how gestures are combined into native constellations [e.g., tongue tip closure plus correctly-phased glottal opening for [tʰ] versus [t]]. Truly phonological attunement, that is, to the native system of minimal contrasts and phonological alternations and morphophonemic patterning, is posited not to be evident until later in development [Best, 1994a, 1994b, 1995]. It was the developmental PAM viewpoint that was of primary interest to us here. Because of the articulatory assumptions of PAM, we believed it important to extend the model to make specific predictions based on AP principles regarding articulatory gestures. We did so by including Goldstein’s articulatory organ [AO] hypothesis [Browman & Goldstein, in preparation; Studdert-Kennedy & Goldstein, 2003], which he generated to extend the theoretical framework of AP to early development, specifically in order to address how infants learn speech by imitating articulatory gestures [see also Meltzoff & Moore, 1997; Studdert-Kennedy, 2002]. The organ hypothesis posits that what infants detect in a speech segment [or syllable/word] is the primary articulatory organ[s] [e.g., lips, larynx, velum] that produced it; infants are posited to be much less likely to recognize the parametric details of the gesture [speed, precise location]. Thus, they will have greater difficulty discriminating a minimal phonetic contrast distinguished by two different gestures made by the same primary articulator [i.e., within-organ contrasts], than discriminating a minimal contrast distinguished by a given gesture produced by different articulators [i.e., between-organ contrasts].

We combined the articulatory organ hypothesis with PAM’s assumption that infants become attuned to native gestural constellations by the end of the first year. This led to the prediction that discrimination of non-native within-organ contrasts will decline earlier and more dramatically than discrimination of non-native between-organ contrasts. The three isiZulu contrasts are each within-organ laryngeal distinctions, either involving a non-native laryngeal gesture [velar ejective], a native laryngeal distinction in the context of a non-native supralaryngeal gesture pattern [lateral fricatives], or a laryngeal distinction that occurs but is noncontrastive in the native language [bilabial stops]. Thus, according to the PAM/articulatory organ [PAM/AO] hypothesis, 10–12 month olds should show a decline in discrimination of minimal within-organ contrasts between non-native phones that they hear as members of a native phonetic category, which should be the case for both the isiZulu bilabial stops and velar stops. In the case of the lateral fricatives, neither member of this within-organ contrast corresponds to any native phonetic category, and they differ from one another only by different laryngeal gestures. Thus, 10–12 month olds should also show a decline in discriminating this non-native distinction, even though adults assimilated it as a TC phonological contrast and discriminated it quite well. In sum, by our PAM/AO reasoning, 10–12 month olds should show a decline in discrimination, relative to 6–8 month olds, for all three isiZulu contrasts.

The goal of this report, then, was to evaluate the PAM/AO hypothesis against the other theoretical possibilities described earlier [see Table 1 for summary of predictions]. The findings should help to better determine the nature of the changes that occur in native phonological development during the first year. Experiment 1 focused on 6–8 versus 10–12 month olds’ discrimination of non-native single-organ [laryngeal] contrasts that American English-speaking adults had assimilated respectively as TC, CG, and SC contrasts. A non-native between-organ contrast, which American adults had also assimilated as a TC contrast, was tested for comparison in Experiment 2.

TABLE 1

Predictions from various theoretical perspectives regarding discrimination of the three isiZulu within-organ consonant contrasts at 10–12 months [Experiment 1], by comparison to 6–8 months

Hypotheses | Discrimination at 10–12 months: Bilabial stops | Velar stops | Lateral fricatives
General mechanism accounts: | | |
Fragile-Robust [Burnham] | very good | very good | very good
Auditory Tuning [Kuhl] | failure | some decline | very good
General Cognitive [Werker et al.; Cohen et al.] | failure | very good | failure
Linguistic accounts: | | |
Phonological/PAM adult [Jusczyk; Pegg/Werker; Best] | failure | some decline | very good
Allophonic Experience [Tees/Werker] | very good | some decline | failure
Native Word Phonetics [Stager/Werker] | failure | some decline | failure
PAM/Articulatory Organ [Best/Goldstein] | decline | decline | decline

Experiment 1

2.1 Method

Participants

The final data set included 11 infants at 6–8 months [Mage = 7 months 10 days; range = 6 months 8 days to 8 months 10 days] and 11 infants at 10–12 months [Mage = 11 months 15 days, range = 10 months 21 days to 12 months 27 days]. All were normal, full-term infants without gestational or labor/delivery complications, and were free of ear infections or colds on the day of testing. These infants had all successfully completed three tests within the study session, one for each of the isiZulu stimulus contrasts. An additional 24 infants at 6–8 months and 13 infants at 10–12 months were tested but later excluded from the study for crying, equipment failure, experimenter error, parental interference, or inattentiveness [i.e., 10 or more consecutive trials without visual fixation responses],5 or because of ear infection/cold on the test day, pregnancy/delivery complications, or familial speech/language disorders.

Stimulus materials

The stimuli were from Best et al. [2001]: [ɮɛ]–[ɬɛ] [voiced vs. voiceless lateral fricatives], [kʰa]–[k′a] [voiceless aspirated vs. ejective velar stops], and [pu]–[ɓu] [unaspirated plosive vs. implosive bilabial stops]. Different vowels were used for each contrast in order to maintain infants’ attention across the three required tests, as this was a within-subjects design.

The syllables all had high tone on the vowel6 and were spoken by an adult female native speaker of isiZulu from Durban, South Africa. Six tokens of each syllable had been selected from the recordings; the contrasting sets of tokens had been chosen to match as closely as possible on all acoustic dimensions other than those critical to the phonetic distinction. [For full details on stimulus development and acoustic measurements, see Best et al., 2001.]

Procedure

We employed the same infant-controlled visual fixation habituation procedure used in our previous studies [Best et al., 1988; Best et al., 1995; see also Miller, 1983]. Random tokens of one stimulus category were played to the infant over a hidden loudspeaker at a conversational listening level [65–70 dB SPL] whenever the infant fixated on a colored checkerboard directly facing him/her, which was rear-projected onto a sound-attenuating window that separated the test room and the adjacent observation room. Thus, the infant was conditioned to fixate the checkerboard, which was reinforced with speech presentations, analogous to the experimental contingencies in the well-known high-amplitude sucking habituation procedure [e.g., Eimas, Siqueland, Jusczyk, & Vigorito, 1971]. A video camera hidden below the checkerboard display was connected to a video monitor in the observation room, allowing the experimenter to monitor the infant’s fixations of the checkerboard via corneal reflections and other visible indices of directed gaze [e.g., head/eye orientation]. Fixations and bouts of crying or sleeping were recorded via key press from an observer response keyboard connected to a computer that controlled the presentation of audio stimuli from an Otari 5050MXB reel-to-reel tape deck, dependent on the infant’s fixation pattern.

Trial duration was under infant control. For as long as the infant fixated the projected checkerboard, audio tokens from one stimulus category were presented [ISI = 750ms]. Audio presentations ceased [after completing the ongoing stimulus token] whenever the infant looked away. A given trial continued for as long as the infant maintained fixation, or if the infant returned to fixating after a brief look-away of less than 2 s. However, if the infant looked away for 2 s consecutively, the trial ended and the checkerboard disappeared during the 1 s intertrial interval, after which the checkerboard automatically reappeared, signaling the beginning of a new trial. Habituation was defined as two consecutive trials with fixation durations below 50% of the mean of the two highest preceding trials [Miller, 1983]. The habituation criterion was calculated and updated on a trial-to-trial basis by the experimental computer program. Once habituation was met during the first phase [familiarization], audio presentations shifted to the contrasting stimulus category for the test phase, which continued until the infant again met the habituation criterion.
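The habituation rule just described can be illustrated with a short sketch. This is not the experimental program used in the study; it is a minimal illustration under one stated assumption, namely that the criterion for a candidate pair of trials is computed from the two longest fixations among the trials that precede that pair. The function name `habituation_trial` and its parameters are hypothetical.

```python
from statistics import mean


def habituation_trial(looks, n_top=2, n_consec=2, ratio=0.5):
    """Return the 1-based trial number on which habituation is met, else None.

    looks    : per-trial fixation durations in seconds, in presentation order
    n_top    : number of highest preceding trials that define the criterion
    n_consec : number of consecutive below-criterion trials required
    ratio    : proportion of the mean of the highest trials (0.5 = 50%)

    Assumption (not stated in the source): the criterion is recomputed at each
    trial from the trials that precede the candidate run of n_consec trials.
    """
    for end in range(n_top + n_consec, len(looks) + 1):
        candidate = looks[end - n_consec:end]      # most recent n_consec trials
        preceding = looks[:end - n_consec]         # everything before them
        highest = sorted(preceding, reverse=True)[:n_top]
        criterion = ratio * mean(highest)
        if all(t < criterion for t in candidate):
            return end
    return None


if __name__ == "__main__":
    # Hypothetical looking times (s), not data from the study.
    print(habituation_trial([10.0, 8.0, 6.0, 4.0, 3.0]))
```

With the hypothetical series above, the two highest preceding trials are 10 s and 8 s, so the criterion is 4.5 s, and the fourth and fifth trials are the first consecutive pair to fall below it.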

During testing the infant sat in an infant seat or on the parent’s lap in a dimly lit 2 m × 1 m × 1 m testing booth, at a distance of approximately 0.5 m from the rear-projection window. The booth was open at the back and its sides were covered with black fabric. The wall at the front of the booth was also covered with black fabric, except for the 0.6 m × 0.6 m area directly in front of the infant where the checkerboard was projected; a small opening for the video camera lens was below the display. A loudspeaker [Jamo mini-speaker], attached to the wall 1 m above the projection window and hidden behind the black cloth covering, was used for stimulus presentations. Both the parent and the experimenter observing the infants’ fixations listened to music over circum-aural headphones [Sennheiser HD440] during tests to prevent them from hearing the stimuli and inadvertently influencing the infant or the fixation observations.

Each infant completed a discrimination test on each of the three stimulus contrasts within a single session [see also Best et al., 1988, 1995]. Test order was randomized across infants within each age group. Short breaks of 5–10 min were taken between tests if necessary to maintain infants’ attention and/or to soothe them if they had become irritable. Otherwise, the session proceeded from one test to the next with just a 1–2 min break to reposition the audio tape and restart the computer program. Infants were eliminated from the final data set if they cried for more than a cumulative 30 s during any test, or if they cried during any trials just before or after the test shift.

2.2 Results

Interobserver reliability

The data for a random selection of 16 infants [i.e., 48 individual tests] were rescored by second observers, who reran the testing program while viewing the infants’ test session videotapes [i.e., off-line]. Interobserver reliability was evaluated statistically via rank-order correlations of the per-trial looking times registered by the original and second observers. Reliability was quite good [Mr = 0.97; range = 0.77 to 1.00].7
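As an illustration of this kind of reliability check, the sketch below computes a rank-order correlation between two observers' per-trial looking times. The source does not name the coefficient, so Spearman's rho with average ranks for ties is an assumption here, and the function names and example values are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1                   # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank-order correlation: Pearson r computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 trials")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-trial looking times (s) from two observers.
    original = [9.2, 7.5, 4.1, 3.0, 6.8]
    second = [9.0, 7.9, 4.4, 2.7, 6.5]
    print(round(spearman_rho(original, second), 2))
```

One coefficient would be computed per test in this scheme, and the coefficients could then be averaged across tests to give a summary value of the kind reported above.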

Discrimination results

Discrimination was assessed by comparing mean fixation duration during the last two trials of the familiarization phase [preshift block] to mean fixation during the first two trials of the test phase [postshift block]. The postshift block was defined as beginning with the first trial after the stimulus shift in which the infant fixated on the slide and thus had an opportunity to begin hearing the test stimuli [Best et al., 1988; Best et al., 1995]. A significant increase in fixation during the postshift block relative to the preshift block is taken as evidence that infants detected the stimulus change. The data were entered into an Age [6–8 months, 10–12 months] × Stimulus Contrast [lateral fricatives, velars, bilabials] × Trial Block [preshift, postshift] analysis of variance [ANOVA]. Test order was not included as a factor because preliminary analyses indicated it had no systematic effect on discrimination. Trial Block was the only significant overall effect: infants looked longer in postshift trials than preshift trials [M = 6.22 vs. 3.49s, respectively], indicating overall recovery of fixation during test trials [i.e., reliable discrimination], F[1, 40] = 12.01, p< .003.
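The preshift and postshift blocks that enter the ANOVA can be sketched as follows. This is a minimal illustration of the block definitions only, not the analysis itself; it assumes that a test trial with zero recorded fixation counts as a trial in which the infant did not fixate, and the function name `shift_blocks` is hypothetical.

```python
from statistics import mean


def shift_blocks(familiarization, test, block=2):
    """Return (preshift_mean, postshift_mean) fixation durations in seconds.

    familiarization : per-trial fixation durations before the stimulus shift
    test            : per-trial fixation durations after the stimulus shift
    block           : number of trials in each block (two in the source)

    Assumption: the postshift block starts at the first test trial with a
    nonzero fixation, i.e., the first trial on which the infant could have
    heard the new stimuli.
    """
    if len(familiarization) < block:
        raise ValueError("not enough familiarization trials")
    first = next((i for i, t in enumerate(test) if t > 0), None)
    if first is None or len(test) - first < block:
        raise ValueError("not enough postshift trials with fixation")
    preshift = familiarization[-block:]
    postshift = test[first:first + block]
    return mean(preshift), mean(postshift)


if __name__ == "__main__":
    # Hypothetical looking times (s), not data from the study.
    pre, post = shift_blocks([9.0, 7.0, 3.0, 2.0], [0.0, 6.0, 4.0, 1.0])
    print(pre, post)
```

A postshift mean that exceeds the preshift mean, as in the hypothetical series above, corresponds to the recovery of fixation that is taken as evidence of discrimination.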

Age groups

To more directly evaluate a priori hypotheses about non-native speech discrimination performance at each age [Table 1], separate Trial Block × Stimulus Contrast ANOVAs were conducted for each age group. For the 6–8 month olds, the Trial Block effect was significant, F[1, 10] = 8.413, p< .016, indicating reliable discrimination across all contrasts. However, for the 10–12 month olds the Trial Block effect was nonsignificant, F[1, 10] = 3.73, p > .082. No other effects approached significance at either age.

Stimulus contrasts

To further examine the a priori predictions [Table 1] about differences between 6–8 and 10–12 month olds’ responses to the individual stimulus contrasts, separate Age × Trial Block ANOVAs were conducted for each contrast, followed by simple effects tests on the interaction. This allowed a direct test of age differences in discrimination of each isiZulu contrast.

For the bilabial stop contrast, on which adults had shown SC assimilation and poor discrimination, the infants’ Trial Block effect was significant, F[1, 10] = 9.003, p < .008. Simple effects tests on the Age × Trial Block interaction revealed that there was significant postshift recovery of fixation [i.e., reliable discrimination] by the 6–8 month olds, F[1, 20] = 6.099, p< .025. But there was nonsignificant recovery by the 10–12 month olds, F[1, 20] = 3.146, p >.09, indicating that discrimination was unreliable in the older group.

Adults had shown CG assimilation of the velar stop contrast, with good discrimination. Across infant ages, the Trial Block effect for this contrast was significant, F[1, 20] = 6.637, p < .02. Simple effects tests on the Age × Trial Block interaction indicated that postshift recovery of fixation [i.e., discrimination] was nearly significant for the 6–8 month olds, F[1, 20] = 4.051, p = .058, but not for the 10–12 month olds, F[1, 20] = 2.659, p = .12. No other simple effects approached significance for this contrast.

For the lateral fricatives, which had yielded TC assimilation with excellent discrimination in adults, the infants’ Trial Block effect was not significant. Simple effects tests on the Age × Trial Block interaction indicated that the younger age showed significant recovery of postshift fixation, F[1, 20] = 6.46, p .85]. Moreover, while there was no reliable age difference in preshift fixation levels, the younger group displayed significantly greater postshift fixation than did the older group, F[1, 39] = 4.368, p < .045. Thus, the younger group discriminated the lateral fricative contrast, whereas the older group did not.

Figure 1 displays performance on each contrast by the two age groups. In sum, there was significant evidence of discrimination by the younger group for the bilabial and lateral fricative contrasts, with nearly significant discrimination for the velar contrast. By comparison, the older group failed to show reliable discrimination of any of the contrasts.

Figure 1. Discrimination of the three isiZulu consonant contrasts by 6–8 and 10–12 month old American infants, Experiment 1

2.3 Discussion

The results of Experiment 1 are consistent with the general expectation of most models reviewed, that 6–8 month olds would discriminate all three isiZulu contrasts, and that 10–12 month olds would show a decline in discrimination for one or more contrasts [see Table 1]. In actuality, the 10–12 month olds failed to discriminate not only the bilabial stops, on which adults had shown poor discrimination, but also the velar stops and the lateral fricatives, on which adults had respectively shown good and excellent discrimination [Best et al., 2001]. Considering the various hypotheses described earlier, these results are most consistent with PAM/AO, which hypothesized that older infants have become attuned to native phonetic constellations but not to phonological principles, and predicted that 10–12 month olds would show a decline relative to the 6–8 month olds in discriminating non-native within-organ distinctions. PAM/AO is the only hypothesis that predicted 10–12 month olds’ difficulties with all three isiZulu contrasts [see Table 1]. We note, in addition, that simple psychoacoustic principles do not seem to easily explain the decline in the older infants’ discrimination of all three minimal laryngeal contrasts, given that voicing contrasts have been found to be perceptually robust [Miller & Nicely, 1955], and that the ejective versus voiceless aspirated stop contrast and the plosive versus implosive stop contrast [phonetically, a prevoiced vs. voiceless unaspirated] were posited to be psychoacoustically robust [see Burnham, 1986]. Certainly, at the very least, the perceptual salience of the fricative voicing contrast should otherwise have made it quite discriminable to the older infants on a psychoacoustic basis.

While the results appear most compatible with the PAM/AO view, however, converging support is needed, especially given that no differences among the three contrasts were predicted for the older infants [or the younger infants, for that matter]. Moreover, additional possible explanations of the older infants’ difficulties must be considered. Toward those ends, Experiment 2 involved PAM/AO predictions of perceptual differences among non-native contrasts for 10–12 month olds. It focused specifically on teasing apart the factors that may contribute to developmental changes in discriminating the type of contrast represented by the lateral fricatives: those that adults assimilate as TC contrasts [i.e., as phonological distinctions] and discriminate quite well. In Experiment 1, it was the lateral fricative TC contrast that had yielded the most striking difference between older infants’ poor performance and prior findings of excellent performance in adults.

Experiment 2

To evaluate whether the older infants’ difficulty with the lateral fricatives might be attributable to the fact that these represent within-organ gestural distinctions, Experiment 2 incorporated a between-organ distinction that had also shown TC assimilation and excellent discrimination in American English-speaking adults: the bilabial versus alveolar ejective stop contrast /p′/–/t′/ of Tigrinya, a language spoken in Eritrea [Ethiopia] [Best et al., 2001]. Both [p′] and [t′] involve a non-native ejective laryngeal gesture like that in isiZulu [k′]. However, they differ in their supralaryngeal gestures, which involve two different articulatory organs: lips versus tongue tip. Because [p′]–[t′] is a between-organ contrast, the PAM/AO approach predicts good discrimination by older infants, in contrast to their decline in discrimination of the lateral fricatives relative to younger infants. For comparison, we also included a replication with the isiZulu lateral fricatives. This was a particularly important comparison, given that [1] we wished to utilize a pair of contrasts for which PAM/AO predicts different developmental patterns, and [2] we decided it was important to modify the habituation criterion [see below].

In addition, several factors other than the articulatory organ difference could have contributed to the older infants’ difficulty with the lateral fricatives, and needed to be explored. For example, their emerging sensitivity to English phonotactic constraints [see Jusczyk, 1994] may have interfered with their discrimination of the lateral fricative syllables, which violated the English phonotactic rule against open syllables ending in lax vowels such as [ε]. To evaluate this phonotactic hypothesis, we also used [ε] in an open-syllable context for the new between-organ non-native contrast. If the violation of native phonotactic constraints was responsible for the 10–12 month olds’ difficulty, then they should show an equivalent level of difficulty with the Tigrinya ejectives in this context, contrary to the PAM/AO prediction of good discrimination.

Alternatively, a few researchers have claimed that both older and younger infants have difficulty discriminating fricative voicing distinctions, even including native fricative voicing contrasts [e.g., Eilers, 1977; see discussion by Burnham, 1986]. The fact that the younger infants in Experiment 1 discriminated the lateral fricative voicing contrast casts some doubt on this hypothesis. Nonetheless, we evaluated it for thoroughness, and to compare it against the PAM/AO prediction that older infants should show a decline in discrimination of within-organ [laryngeal] minimal contrasts, relative to younger infants. Therefore, we also included an English fricative voicing contrast, using the same vowel and syllable context as the lateral fricatives. This contrast employs the same within-organ contrast as the lateral fricatives, yet it occurs in English. Most of the viewpoints listed in Table 1 would predict that because this contrast occurs in the infants’ language environment, older infants should continue to discriminate it very well. On the other hand, if there is anything to the claim that infants have difficulty with fricative voicing distinctions, regardless of language experience, then infants of both ages should show a decline in discrimination of this native contrast as well as the non-native lateral fricative voicing contrast. With respect to the phonotactic hypothesis summarized in the preceding paragraph, if older infants’ growing sensitivity to native phonotactic violations affects discrimination even for native consonants, they may show a decline in discrimination of all three Experiment 2 contrasts.

See Table 2 for a summary of the properties and various theoretical predictions for the non-native and native contrasts tested in Experiment 2.

TABLE 2

Critical properties of the consonant contrasts used in Experiment 2, and predictions of various models

 | Tigrinya ejectives: place of articulation | English fricatives: voicing | isiZulu fricatives: voicing
Properties of contrasts: | | |
Language experience: | non-native | native | non-native
Articulatory organ distinction: | between-organ [lips vs. tongue tip] | within-organ [laryngeal] | within-organ [laryngeal]
Predictions: | | |
phonotactic learning [Jusczyk et al.] | decline at 10–12 months | decline at 10–12 months | decline at 10–12 months
fricative voicing [Eilers] | [irrelevant to hypoth.] | failure at both ages | failure at both ages
PAM/AO [Best/Goldstein] | very good at both ages | decline at 10–12 months | decline at 10–12 months

3.1 Method

Subjects

The final data set included 15 infants at 6–8 months [Mage = 7 months 5 days, range = 5 months 28 days to 7 months 25 days] and 14 infants at 10–12 months [Mage = 11 months 4 days, range = 10 months 2 days to 12 months 28 days]. Criteria for inclusion were as described in Experiment 1. An additional 48 infants were excluded from the final set [22 at 6–8 months and 18 at 10–12 months].

Stimulus materials

The non-native contrasts were the isiZulu lateral fricative contrast from Experiment 1, and the Tigrinya bilabial versus alveolar ejective stops [p′]–[t′] produced by a male native speaker from Eritrea [Ethiopia]. The native contrast was English /s/–/z/ produced by a female native speaker [author CTB] [Best et al., 2001]. The native contrast was chosen to be similar, acoustically and articulatorily, to the isiZulu lateral fricatives: a fricative voicing contrast involving a tongue constriction gesture. Both the Tigrinya and the English syllables employed the same vowel [[ε]] as the isiZulu syllables. Thus, any differences in discrimination among the three contrasts would be due to the consonants and not to vowel or phonotactic effects. As before, there were six tokens per category, matched as closely as possible between the contrasting syllables of each pair for overall duration, fundamental frequency and contour, and vowel formant frequencies [see Best et al., 2001].

The primary acoustic differences between the Tigrinya ejectives were in the spectrum, duration, and amplitude of the release bursts. The English fricatives differed primarily in voicing and amplitude of the fricatives and F0 and F1 onset frequencies, comparable to the difference between the isiZulu lateral fricatives.

Procedure

We employed the same procedure as in Experiment 1, except that the habituation criterion was made more stringent in order to assure full habituation, which in turn optimizes the likelihood of response recovery during the test phase. Optimizing the chance of response recovery during the test phase was crucial, given predictions that the older infants should show a decline, relative to younger infants, in discriminating one or more of the contrasts. The three highest-looking trials of the familiarization phase were used to calculate the habituation criterion, rather than only two trials as in Experiment 1. Also, three consecutive trials with looking durations below the habituation criterion [rather than only 2 such trials] were required for the shift to the test phase. We note that this more stringent habituation criterion may actually increase the possibility of spontaneous response recovery, that is, of spurious evidence for discrimination. This observation is particularly relevant to older infants’ performance on the two fricative voicing contrasts.8

3.2 Results

Interobserver reliability

The data for a random selection of 13 infant subjects [i.e., 39 individual tests] were second-observed from the session videotapes, as in Experiment 1. Rank-order correlations of the per-trial looking times registered by the first and second observers indicated that interobserver reliabilities were excellent [Mr = 0.98, range = 0.91 to 0.99].

Discrimination results

Discrimination was assessed as in Experiment 1, using an Age [6–8 months, 10–12 months] × Stimulus Contrast [Zulu, Tigrinya, English] × Trial Block [preshift, postshift] ANOVA. Test order was not included as a factor because preliminary analyses showed that it did not have any systematic effect on discrimination. Trial Block [Preshift vs. Postshift] was the only significant effect, F[1, 28] = 50.24, p
