Which of the following auditory cues can be used to perceive the distance of a sound source by the listener?

Introduction

Accurately perceiving the location of a sound source is an essential capability of the human hearing system, enhanced through selective pressure due to its survival value [when the source is out of view or occluded, the auditory modality often plays a crucial role in assessing the location of the source]. In addition to the perceived source direction, human hearing is sensitive to the source distance.

Our everyday experience shows us that stimulus intensity and quality vary widely with the distance to an acoustic source, and these variations are potential cues for distance estimation. Among those, sound intensity is a primary cue, because in the free field it follows the inverse-square law with distance [Coleman, 1963]. In reverberant environments, there is also a systematic relation between the distance to the source and the reverberation amount relative to the level of the direct sound [energy that is transmitted directly from the source to the listener without interacting with any surfaces of the environment; Mershon and Bowers, 1979], leading to another relevant distance cue: the direct-to-reverberant energy ratio [DRR]. In addition, accumulated evidence shows that, for near-field sources located outside the median plane, auditory distance perception [ADP] relies on low-frequency interaural level differences, an acoustical cue that rapidly increases its relative importance when the source approaches within 1 m of the listener's head [Brungart, 1999; Brungart and Rabinowitz, 1999; Brungart et al., 1999; Kopčo and Shinn-Cunningham, 2011]. Finally, several studies have shown that spectral cues have a relevant influence on ADP both in the near field [Levy and Butler, 1978; Brungart, 1999; Brungart et al., 1999; Kopčo and Shinn-Cunningham, 2011] and in the far field [Coleman, 1968; Lounsbury and Butler, 1979; Butler et al., 1980; Petersen, 1990; Little et al., 1992]. Both near- and far-field spectral cues are described in detail in the following section.

Spectral Cues

Spectral content provides a physically measurable cue for ADP only for very short [below 1 m] or very long [above 15 m] distances to the source. In the first case, the diffraction of sound around the head causes a relative low-to-high frequency gain as the source approaches the listener [Brungart and Rabinowitz, 1999], providing a reliable cue to perceive distance in the near field. For long distances, as a sound wave propagates through the atmosphere, high-frequency components become more attenuated than low-frequency ones due to heat conduction, shear viscosity, and relaxation losses [Bass et al., 1995], which low-pass filters sound coming from distant sources. However, this effect is moderate, and distances greater than 15 m are required for a listener to detect changes in the sound spectrum [Ingard, 1953; Blauert, 1997]. No measurable ADP cues were studied for sounds located in the range 1–15 m [Kolarik et al., 2015], since low-frequency head-diffraction-induced changes are too small to be detected for distances over 1 m and the sound has not traveled far enough for the high-frequency energy loss to be detected for distances below 15 m.

[...] more than 1.2 ms after the first peak, was considered as reverberant field. This time was chosen such that it separates the direct sound from all reflections in the BRIR [including the floor reflection] for all source distances.

Once the direct and reverberant fields were separated, we calculated the DRRs by convolving each portion of the BRIR with the filtered noise bands, and then computing the ratio between the total energy contained in the two portions. Finally, we obtained a single value of the DRR for each position and type of stimulus by averaging the ratios obtained for each ear.
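
The procedure above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names are hypothetical, the "first peak" is approximated by the largest absolute sample of the BRIR, and the two ears are averaged in dB, which the text does not specify.

```python
import numpy as np

def drr_db(brir, stimulus, fs, split_ms=1.2):
    """DRR in dB for one ear (hypothetical helper, not the authors' code)."""
    brir = np.asarray(brir, dtype=float)
    # Assumption: the direct-sound peak is the largest absolute sample.
    peak = int(np.argmax(np.abs(brir)))
    split = peak + int(round(split_ms * 1e-3 * fs))
    # Direct portion: everything up to 1.2 ms after the peak.
    direct = np.zeros_like(brir)
    direct[:split] = brir[:split]
    # Reverberant portion: everything after that instant.
    reverb = np.zeros_like(brir)
    reverb[split:] = brir[split:]
    # Convolve each portion with the stimulus and compare total energies.
    e_direct = np.sum(np.convolve(stimulus, direct) ** 2)
    e_reverb = np.sum(np.convolve(stimulus, reverb) ** 2)
    return 10.0 * np.log10(e_direct / e_reverb)

def binaural_drr_db(brir_left, brir_right, stimulus, fs):
    """Average of the two single-ear DRRs (assumption: averaged in dB)."""
    return 0.5 * (drr_db(brir_left, stimulus, fs) + drr_db(brir_right, stimulus, fs))
```

With a real measurement, `stimulus` would be one of the filtered noise bands and `brir_left`/`brir_right` the impulse responses recorded at each ear for a given source position.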

One of the main concerns when calculating the DRR in a noisy environment is to avoid treating the background noise as part of the reverberant field. To assess whether this was the case, we calculated the cumulative energy decay functions [Schroeder, 1979] for pink noise [Figures 2B,C for the left and right ear, respectively]. In these curves, even for the worst case [farthest source distance], the energy decays almost 40 dB before flattening. As a secondary check, we recalculated all the DRR values cutting the IRs at a time in which all the curves are still decaying [150 ms] and found that the differences were almost negligible [mean = 4.4 × 10⁻³ dB, SD = 5.0 × 10⁻³ dB, range = 0–2.1 × 10⁻² dB].
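
A common way to obtain an energy decay curve of this kind is backward integration of the squared impulse response. The sketch below assumes that variant (the text does not state the exact computation used), and the function name is hypothetical.

```python
import numpy as np

def energy_decay_db(ir):
    """Energy decay curve in dB, normalized to 0 dB at t = 0 (illustrative sketch)."""
    energy = np.asarray(ir, dtype=float) ** 2
    # Energy remaining from each sample to the end of the response.
    remaining = np.cumsum(energy[::-1])[::-1]
    # Guard against log10(0) when the tail is exactly zero.
    remaining = np.maximum(remaining, np.finfo(float).tiny)
    return 10.0 * np.log10(remaining / remaining[0])
```

In a curve like this, a background-noise floor shows up as a flattening of the decay, which is the feature the check described above looks for.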

Results

Psychophysical Experiments

Experiment 1: Apparent Source Distance for Pure Tones

Eight subjects participated in the experiment [7 men, Mage = 27.0 y.o., SDage = 8.3 y.o.]. The across-subject means of the logarithm of subjective distance judgments in response to pure tones of 0.5, 1, 2, and 4 kHz are shown in Figure 3B as a function of the physical distance to the source. For 0.5- and 1-kHz tones there is a slight increase of the response with distance; however, it is not clear that a monotonic relation exists between the physical and perceived distance. For example, for a pure tone of 0.5 kHz subjects did not differentiate the distance to sources located at 1 or 4 m [responses were 1.95 ± 0.61 and 2.16 ± 0.64 m, respectively]. The same occurs for 1-kHz tones located at 2, 4, and 6 m [3.74 ± 1.29, 3.81 ± 0.63, and 3.58 ± 1.05 m, respectively]. In contrast, as we increase the stimulus frequency [2- and 4-kHz tones] the results show a monotonically increasing relation between the physical and perceived distance. Moreover, for stimuli of 2 and 4 kHz subjects tend to underestimate, on average, the distance to the source for D > 2 m and D > 1 m, respectively.

Figure 3. Results of Experiment 1. [A] Pearson linear correlation coefficient [r] between source distance and response distance [in log-scale] as a function of the stimulus type. Black symbols represent across-subject averages, shaded symbols show the individual subject data and bars denote standard errors. [B] Across-subject mean of the log-scale subjective distance judgments as a function of the physical distance to the source. Perfect performance is indicated by a black dashed line. [C] Across-subject average of the slopes obtained by means of least-squares linear fits between the source and the individual response distances in logarithmic scale. [D] Across-subject average of the responses' standard deviation as a function of the physical distance to the source. [E] Standard deviations collapsed over distances averaged across subjects. In [C,E] bars denote SEM.

To quantify the performance of the subjects for each frequency, we calculated the Pearson linear correlation coefficient [r] between individual responses and target distances in logarithmic scale [see Brungart, 1999; Brungart et al., 1999; Kopčo and Shinn-Cunningham, 2011]; the values are displayed in Figure 3A. The results show that the correlation coefficient increases with stimulus frequency. A within-subjects ANOVA with “frequency” as fixed factor revealed an effect of the stimulus frequency on the correlation coefficients [F[3, 21] = 3.8, p = 0.026 and ηp² = 0.35 with 90% CIs [0.029; 0.50]] along with a fairly large effect size, indicating that nearly 35% of the total variance observed in r is due to changes in the frequency.
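
As a quick consistency check, partial eta squared can be recovered from an F statistic and its degrees of freedom with the standard identity ηp² = F·df_effect / (F·df_effect + df_error). The helper name below is hypothetical; the numbers are the ones reported above.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Reported test: F[3, 21] = 3.8
print(round(partial_eta_squared(3.8, 3, 21), 2))  # prints 0.35
```

The result agrees with the reported effect size of 0.35.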

The correlation coefficient depends on the magnitude of the change of the response with the distance, the variability of the response, or a variation of both [Brungart et al., 1999; Kopčo and Shinn-Cunningham, 2011]. To examine the cause [or causes] of the effect of the frequency of the pure tones on the correlation coefficients, we analyzed the values of these variables for each kind of stimulus.

To quantify the magnitude of the change we obtained the slopes corresponding to least-squares linear fits between the source and the individual response distances in logarithmic scale. In Figure 3C we show the between-subjects average of the slopes [bars denote SEM]. The lowest slope value was obtained in response to the 0.5-kHz pure tone [0.33 ± 0.07 m]. For the remaining stimuli, the slopes were fairly similar [0.50 ± 0.08, 0.52 ± 0.09, and 0.52 ± 0.08 m for 1, 2, and 4 kHz, respectively]. A within-subjects ANOVA with factor “frequency” was performed on the slopes resulting in no significant effect [F[3, 21] = 2.2; p = 0.12].
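
The two performance measures used so far, the correlation coefficient and the slope of the fit in logarithmic scale, can be sketched as follows. The function name is hypothetical and base-10 logarithms are an assumption; both r and the slope are unaffected by the choice of base as long as the same base is used on both axes.

```python
import numpy as np

def log_performance(source_m, response_m):
    """Pearson r and least-squares slope between log distance and log response."""
    x = np.log10(np.asarray(source_m, dtype=float))
    y = np.log10(np.asarray(response_m, dtype=float))
    r = np.corrcoef(x, y)[0, 1]
    # First-degree polynomial fit: y = slope * x + intercept.
    slope, _intercept = np.polyfit(x, y, 1)
    return r, slope
```

A slope below 1 in this representation corresponds to a compressed response: perceived distance grows more slowly than physical distance.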

In Figure 3D, averages of the responses' standard deviations for each subject are shown as a function of the distance to the source. If we collapse the variability across distances [i.e., if we average the standard deviation of the response over all distances], we obtain a single measure of the variability for each stimulus. Collapsed variability is shown in Figure 3E. A within-subjects ANOVA with factor “frequency” resulted in no significant effect of this factor [F[3, 21] = 0.169; p = 0.92] on the collapsed variability.

Experiment 2: Apparent Source Distance for Noise-Bands

The aim of this experiment was to study whether the center frequency and bandwidth of auditory stimuli affect ADP. Details of the spectral content of the stimuli are shown in Table 2. Fifteen subjects participated in the experiment [8 men, Mage = 25.5 y.o., SDage = 5.6 y.o.], none of whom participated in Experiment 1.

Figures 4A–D show the across-subject mean subjective distance judgments in response to filtered pink noise bands centered at 0.5, 1.5, and 4 kHz [Figures 4A–C, respectively] with bandwidths 1/12, 1/3, and 1.5 octave; and to pink noise [Figure 4D], as a function of the physical distance to the source [both in log scale].

Figure 4. Results of Experiment 2 [average distance and variability]. [A–D] Across-subject mean of the logarithm of the subjective distance judgments as a function of the physical distance to the source in response to bands centered at 0.5, 1.5, and 4 kHz [A–C, respectively] and pink noise [D]. Perfect performance is indicated by a black dashed line. [E–H] Averages of the responses' standard deviations obtained in response to bands centered at 0.5, 1.5, and 4 kHz [E–G, respectively] and pink noise [H]. The symbols for the different bandwidths are red diamonds, blue squares, and green circles for 1/12, 1/3, and 1.5 octave, respectively.

The response obtained with PN was accurate for the first three distances while the distance to the source was underestimated for distances D = 4, 5, and 6 m. The responses for the remaining conditions were less accurate and showed a common pattern: distance was underestimated for sources farther than 1 m except for bands centered at 4 kHz, for which the responses show underestimation at all tested distances. Stimuli centered at 0.5 kHz showed a slight increase of the response with the distance and, unlike the bands centered at 1.5 and 4 kHz, they did not show a monotonic increase with increasing distance from the sound source. Responses for bands centered at 0.5 kHz with bandwidths 1/12 and 1/3 oct. show a non-homogeneous increase, with jumps and even decreases in the perceived distance when the distance from the source also increases. For example, the reported distance for 1/3-oct. bands was 2.11 ± 0.61 and 1.65 ± 0.47 m for source distances of 2 and 3 m, respectively.

Performance, measured as the Pearson linear correlation coefficient [r] between individual responses and source distances in log scale for each stimulus, is shown in Figure 5A. The results of the analysis show a consistent effect, as observed in Experiment 1: r values are lower for bands centered at 0.5 kHz [mean across bandwidth = 0.759, 95% CIs [0.754, 0.763]] than for bands centered at 4 kHz [mean across bandwidth = 0.861, 95% CIs [0.857, 0.865]]. The greatest value of r was obtained in response to PN [mean = 0.916, 95% CIs [0.894, 0.934]]. Interestingly, the values of r obtained in response to bands centered at 0.5 kHz [r seems to decrease with the bandwidth] and 1.5 kHz [r seems to increase with the bandwidth] suggest an opposite effect of bandwidth on performance.

Figure 5. Results of Experiment 2 [statistical analysis]. [A] Pearson linear correlation coefficient [r] between source distance and response distance [in log scale] as a function of the stimulus type. Across-subject averages in response to noise bands centered at 0.5, 1.5, and 4 kHz and pink noise are indicated with red diamonds, blue squares, green circles, and a black triangle, respectively. Shaded symbols show the individual r's. [B] Across-subject average of the slopes obtained by means of least-squares linear fits between the source and the individual response distances in logarithmic scale. [C] Across-subject standard deviations collapsed over distances. In [B,C] symbols correspond to the same conditions described in [A]. In all panels bars denote SEM.

A two-way, repeated-measures ANOVA with within-subjects factors “center frequency” and “bandwidth” was performed on the correlation coefficients. The test yielded a significant main effect of the center frequency [F[2, 28] = 16, p = 2.2 × 10⁻⁵, ηp² = 0.54, 90% CIs [0.28, 0.65]] but not of the bandwidth [F[2, 28] = 1.1, p = 0.33] nor the interaction [F[4, 56] = 2.4, p = 0.057]. Since there is a statistical tendency in the interaction, we performed a linear regression analysis to characterize the influence of the bandwidth on the correlation coefficient for each center frequency. For the 1.5-kHz bands we obtained a positive [and significantly non-zero] slope [mean slope = 0.229, SEM = 0.0638, p-value = 0.0030] but for the 0.5- and 4-kHz bands the slope was not significantly different from zero [0.5 kHz: mean slope = −0.0797, SEM = 0.0515, p-value = 0.14; 4 kHz: mean slope = 0.0340, SEM = 0.0689, p-value = 0.63]. These results indicate that, while for 0.5- and 4-kHz bands there is no observable effect of the bandwidth, for 1.5-kHz bands an increase in bandwidth entails an increase in the correlation coefficient.

Between-subjects averages of the slopes are shown in Figure 5B. As for the r values, the responses for bands centered at 0.5 and 4 kHz presented the lowest and the highest slope values [with the exception of PN], respectively, and the slopes in response to bands centered at 0.5 kHz seem to decrease with the bandwidth while they seem to increase for bands centered at 1.5 kHz. Finally, the slope obtained in response to bands centered at 4 kHz does not seem to depend on the bandwidth of the auditory stimulus.

A two-way, repeated-measures ANOVA with within-subjects factors “center frequency” and “bandwidth” was performed on the slopes. As observed for the r values, we found an effect of the center frequency [F[2, 28] = 10, p = 4.2 × 10⁻⁴, ηp² = 0.43, 90% CIs [0.16, 0.57]] but not of the bandwidth [F[2, 28] = 0.69, p = 0.51]. However, in this case we found a significant effect of the interaction between the factors [F[4, 56] = 5.4, p = 9.8 × 10⁻⁴, ηp² = 0.28, 90% CIs [0.043, 0.44]]. The simple-effect analysis [Holm–Bonferroni corrected; Holm, 1979] showed a significant effect of the frequency for bands with 1.5 oct. bandwidth [F[2, 28] = 16, p = 1.8 × 10⁻⁵, ηp² = 0.54, 90% CIs [0.29, 0.66]]; and of the bandwidth for bands centered at 0.5 kHz [F[2, 28] = 7.43, p = 2.6 × 10⁻³, ηp² = 0.35, 90% CIs [0.092, 0.50]].
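
For readers unfamiliar with the Holm–Bonferroni step-down correction mentioned above, a minimal sketch is given here. The function name and the example p-values are hypothetical and are not taken from the study.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected."""
    m = len(p_values)
    # Visit the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is compared with alpha / (m - k + 1).
        if p_values[idx] > alpha / (m - rank):
            break  # stop at the first non-significant test
        reject[idx] = True
    return reject

# Hypothetical p-values, for illustration only.
print(holm_bonferroni([0.01, 0.04, 0.03, 0.005]))  # [True, False, False, True]
```

Once one test fails its threshold, all tests with larger p-values are also retained, which is what makes the procedure less conservative than a plain Bonferroni correction while still controlling the family-wise error rate.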

In Figures 4E–H averages of standard deviations of the response of each subject are shown. We collapsed the variability between distances to obtain a single measure of the variability for each stimulus [Figure 5C]. Stimuli centered at 1.5 kHz showed the lowest values of SD, while stimuli centered at 4 kHz showed the highest. The SD obtained with stimuli centered at 0.5 kHz showed intermediate values.

A two-way, repeated-measures ANOVA with within-subjects factors “center frequency” and “bandwidth” was performed on the collapsed SD, which revealed a significant effect of the center frequency [F[1.35, 18.9] = 4.13, p = 0.046, ηp² = 0.23, 90% CIs [0.002, 0.434]] but not of the bandwidth [F[2, 28] = 1.21, p = 0.31]. The analysis also showed a non-significant effect of the interaction between the factors [F[4, 56] = 0.634, p = 0.64].

Results obtained in Experiment 2 show an effect of the frequency on the three obtained measures of performance: correlation coefficient, slope, and intra-subject standard deviation. Although both the slope and the standard deviation were significantly affected by the frequency of the stimuli, the fact that the frequency induces a similar trend in the correlation coefficients and the slopes suggests that the observed effect of the frequency on the correlation could be explained mainly by changes in the compression of the response. Regarding the effect of the bandwidth on the responses, the results were ambiguous. First, the analysis of variance performed on the correlation coefficients showed a non-significant main effect of bandwidth. However, the statistical tendency in the interaction observed in the correlation coefficient, the positive slope in the linear regression analysis for 1.5-kHz bands, and the significant effect of bandwidth on the slope for 0.5-kHz bands indicate that further analysis is needed to better understand the influence of bandwidth on the subjects' responses.

Comparisons with Pink Noise as a Control

The results of Experiment 2 showed that the most accurate response was obtained for PN, showing a greater correlation and slope and a lower standard deviation across subjects. In this section, we compare the responses obtained in Experiment 2 for PN [control stimulus] with the response obtained for the 9 filtered noise bands. We performed two-tailed paired t-tests on the correlation coefficients using Dunnett's test for controlling the family-wise type I error [Dunnett, 1964], which is an appropriate procedure for comparing several treatments against a control. Results are displayed in Table 3. All except the 1.5-kHz, 1.5-oct. band were significantly different from the control, which is consistent with a non-additive effect of frequency and bandwidth on the correlation.

Table 3. Comparisons with pink noise as a control.

ADP Acoustical Cues

Binaural Intensity

The binaural intensity [BI] for PN and the filtered bands is plotted as a function of distance in Figure 6. Since sound intensity can be considered a relative ADP cue [Mershon and Bowers, 1979], the global intensity for all bands was set to 0 dB at 1 m, allowing us to focus on the relative decay instead of on the absolute values.
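
The normalization described above amounts to subtracting the level at the nearest source position. The sketch below also includes the free-field inverse-square decay as a reference curve, which is an addition for comparison only and not part of the reported analysis; both function names are hypothetical.

```python
import numpy as np

def normalize_to_nearest(levels_db):
    """Shift a level-vs-distance curve so that the first (nearest) point is 0 dB."""
    levels_db = np.asarray(levels_db, dtype=float)
    return levels_db - levels_db[0]

def free_field_decay_db(distances_m, ref_m=1.0):
    """Inverse-square reference: about 6 dB loss per doubling of distance."""
    distances_m = np.asarray(distances_m, dtype=float)
    return -20.0 * np.log10(distances_m / ref_m)
```

In a reverberant room the measured decay is expected to be shallower than this free-field reference, because the reverberant energy does not fall off with distance in the same way as the direct sound.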

Figure 6. Acoustical magnitudes. [A–D] Binaural intensity and [E–H] direct-to-reverberant energy ratio [DRR] as a function of the physical distance to the sound source. The rightmost panels [D,H] show these magnitudes for a pink noise stimulus. The rest of the panels [A–C,E–G] present the results for the filtered pink noise bands used as stimuli in Experiment 2, ordered by increasing center frequency. The symbols for the different bandwidths are red diamonds, blue squares, and green circles for 1/12, 1/3, and 1.5 octave, respectively. In panels [A–D] the curves were normalized to BI = 0 dB for the nearest source.

For the 1/12- and 1/3-oct. bands with center frequency of 0.5 kHz [Figure 6A], the BI shows a non-monotonic decay with abrupt jumps and even increases of intensity with distance. For example, for the 1/3-oct. band, when the source moves from 2 to 3 m, the BI increases by 0.45 dB instead of decreasing. This behavior, which seems counterintuitive at first sight, is related to the existence of prominent and sparse modal resonances in the lower part of the frequency response of the room. As the frequency of the sound source is lowered, two effects that are general to all rooms make the modal resonances more noticeable. On the one hand, the sound absorption at the walls, floor, and ceiling is reduced for lower frequencies, so the resonance peaks in this part of the spectrum become narrower in bandwidth and higher in amplitude. On the other hand, the number of modal resonances per octave of the room decreases as the frequency is lowered [Kuttruff, 2016]. Hence the frequency response of the room is not uniform in the lower frequency region. Moreover, for sounds with frequencies at, or close to, a modal-resonance frequency, where a standing wave is excited, the spatial distribution of the energy is also non-uniform. The standing wave creates spatial regions with peaks [antinodes] and dips [nodes] in the sound intensity. As a consequence, when the source emits low-frequency narrowband noise, the room response will be non-uniform both in frequency and space, since only a few modal resonances will be excited. In that situation, if the listener is seated close to a node or antinode of the resulting standing waves, the BI will be significantly lowered or increased, respectively, compared to the neighboring region.

For the 1/12-oct. bands, as the center frequency increases, the decay becomes more homogeneous because: [1] the reverberant field contributes little to the global intensity, due to the reduction of the reverberation for frequencies above 0.5 kHz; and [2] the resonances of the room become denser and wider, hence the frequency response of the room becomes more homogeneous both spatially and spectrally. A transition is present for the 1.5-kHz center frequency [Figure 6B], where we can see an almost homogeneous BI decay for all distances. For stimuli with center frequency of 4 kHz [Figure 6C] the BI decay with distance is almost independent of the bandwidth.

Direct-to-Reverberant Energy Ratio

The DRR for PN and each noise band is plotted in Figures 6E–H as a function of the physical distance to the sound source. As explained in Section Experiment 2: Apparent Source Distance for Broadband Noises, the room presents stronger reverberation for frequencies below 0.5 kHz, and this lowers the values of the DRR [the energy in the reverberant field becomes higher in proportion] for stimuli containing energy below this frequency. This is the main reason why, for a fixed bandwidth, the DRR values increase as the center frequency becomes higher. Also, when the bandwidth increases the reverberant field becomes more constant across distances, leading to a more homogeneous decay of the DRR.

Binaural Room Frequency Response

In order to corroborate that the prominence of the resonances of the room for the lower frequencies is the main cause of the non-homogeneous decay of the BI, we calculated the frequency response of the room at the listener's ears: the binaural room frequency response [BRFR]. In Figure 7A we display the BRFR of the room at the listener position for the six positions of the source, along the frequency range corresponding to the stimuli. The BRFR was obtained for each source position after Fourier transformation of the binaural room impulse response [BRIR] measured with the dummy head, as follows:

$$\mathrm{FR}_{l,r}(f) = \frac{1}{L}\int_{0}^{L} h_{l,r}(t)\, e^{-i 2\pi f t}\, dt$$

$$\mathrm{BRFR}(f) = 10 \log_{10}\left[\mathrm{FR}_{l}(f)\,\mathrm{FR}_{r}(f)\right]$$

where h_{l,r}(t) corresponds to the BRIR at the left and right ear. This magnitude is computed in dB using an arbitrary reference and corresponds [up to a fixed constant in dB] to the BI elicited at the listener position for each frequency component in the room excited by the source for a given location. If a modal resonance is near that frequency component, it is expected that the BRFR will display large variations in its magnitude depending on the position, showing peaks if the source and listener are close to antinodes of the resulting standing wave, or valleys if they are close to nodal positions of the standing wave. This was the case for the low-frequency range, as can be appreciated in the two top panels of Figure 7A, where the BRFR curves display strong variations in magnitude, with differences between peaks and valleys as high as 30 dB. As the density of the modal resonances gets higher and the resonance peaks become shallower and overlap in the BRFR, the curve becomes smoother [see lower panel in Figure 7A].
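
A discrete counterpart of this computation can be sketched with an FFT. The function name is hypothetical, and taking the magnitude of each complex spectrum before the logarithm is an assumption made here so that the result is real-valued. Dividing the FFT by the number of samples plays the role of the 1/L factor for a uniformly sampled response.

```python
import numpy as np

def brfr_db(brir_left, brir_right, fs):
    """Binaural room frequency response in dB (illustrative sketch)."""
    n = len(brir_left)
    # Discrete version of (1/L) * integral of h(t) exp(-i 2 pi f t) dt.
    fr_left = np.fft.rfft(brir_left) / n
    fr_right = np.fft.rfft(brir_right) / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Assumption: spectral magnitudes are combined before taking the log.
    return freqs, 10.0 * np.log10(np.abs(fr_left) * np.abs(fr_right))
```

Because the reference is arbitrary, only differences between curves (for example across source positions) are meaningful, not the absolute dB values.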

Figure 7. Analysis of the binaural room frequency response. [A] Binaural Room Frequency Response [BRFR] in dB, measured at the listener position for the six distances of the source [1–6 m], centered in the three stimulus frequencies [0.5, 1.5, and 4 kHz] in log scale. The vertical lines indicate the limits of the three stimulus bandwidths [1/12, 1/3, and 3/2 octave]. The BRFR was obtained after Fourier transformation of the binaural impulse responses. [B–D] Integrating the BRFR in linear scale along the three different bands an independent measurement of the BI can be obtained. These BI values can be compared to the corresponding BIs obtained from the stimuli [Figures 6A–C]. Approximated limits of the integration are shown as arrows at the bottom of [A]. The code symbols of different bandwidths are red diamonds, blue squares and green circles for 1/12, 1/3, and 1.5 octave, respectively.

From the curves displayed in Figure 7A it can also be seen that the narrower bandwidths were much more sensitive to the BRFR fluctuations. For the 1/12-oct. bandwidth and the two lower center frequencies, for example, a single normal mode of the room can alter the “normal” arrangement of the curves [from lower to higher distances]. In this way, for such bands the dependence of the BI on distance can become non-monotonic. This can be tested by integrating the BRFR along each band. The resulting magnitude corresponds [up to a fixed constant in dB] to the BI for that band. These results are displayed in Figures 7B–D and can be compared to the corresponding BIs obtained from the recorded stimuli [Figures 6A–C]. From this comparison, the non-monotonic BI curves [1/12 octave bandwidth for 0.5 and 1.5 kHz] obtained from the stimuli can be explained by the non-monotonic behavior of the integrated BRFR for the corresponding frequency bands.
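
The band integration can be sketched as follows. The function name is hypothetical, and the band edges are assumed to be placed symmetrically around the center frequency on a logarithmic axis; the exact integration limits used in the study are only indicated approximately in Figure 7A.

```python
import numpy as np

def band_intensity_db(freqs, brfr_db_values, center_hz, bandwidth_oct):
    """Integrate a BRFR (given in dB) over one band, in linear scale."""
    freqs = np.asarray(freqs, dtype=float)
    values = np.asarray(brfr_db_values, dtype=float)
    # Assumed band edges: half the bandwidth (in octaves) on each side.
    low = center_hz * 2.0 ** (-bandwidth_oct / 2.0)
    high = center_hz * 2.0 ** (bandwidth_oct / 2.0)
    mask = (freqs >= low) & (freqs <= high)
    f_band = freqs[mask]
    linear = 10.0 ** (values[mask] / 10.0)
    # Trapezoidal integration written out explicitly.
    integral = np.sum(0.5 * (linear[1:] + linear[:-1]) * np.diff(f_band))
    return 10.0 * np.log10(integral)
```

Evaluating this for each source distance gives one curve per band, which can then be normalized to the nearest source position and compared with the BI measured from the stimuli.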

Correlations between Acoustical Cues and Subjects' Responses

To evaluate the relation between the previously obtained distance cues [BI and DRR] and the subjects' responses we calculated the partial correlation coefficients between the distance-dependent values of the cues and the mean distance responses [in logarithmic scale] of each subject. Partial correlation is the correlation between a given predictor variable and the dependent variable while holding contributions of all other predictor variables constant, and is required in this application because the predictor variables [BI and DRR] present a high degree of correlation with values ranging from 0.657 for the 0.5-kHz, 1/12-oct. band, to 0.997 for pink noise [multicollinearity].
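
For a single control variable, the partial correlation can be computed from the three pairwise Pearson coefficients with the standard first-order formula. The sketch below uses a hypothetical function name; `x` would be the log-responses, `y` one cue (for example BI) and `z` the cue being controlled for (for example DRR).

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation between x and y with the linear influence of z removed."""
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return (r_xy - r_xz * r_yz) / np.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
```

When the two cues are almost perfectly correlated, as for pink noise, the denominator approaches zero and the estimate becomes unstable, which is one reason why highly collinear conditions are hard to interpret.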

In Figure 8A we show the across-subject average of the individual partial correlation coefficients between the subjects' log-responses and the binaural intensity controlling for the DRR; and between the log-responses and the DRR controlling for the binaural intensity, for each band and for PN. Two observations apply to all results. First, the BI showed a majority of negative correlation coefficients, which indicates that less intense noises are consistently associated with farther distances; and second, the DRR showed a more inhomogeneous pattern among noise bands, being positive in some cases, therefore not indicating a clear relation between the magnitudes.

Figure 8. Partial correlation analysis. [A] Partial correlation coefficients between the log-scaled response and the binaural intensity [open symbols] and the DRR [filled symbols], for noise bands centered at 0.5, 1.5, and 4 kHz [red diamonds, blue squares, and green circles, respectively] and pink noise [black triangles]. Error bars indicate confidence intervals at the 5% level. Asterisks indicate statistically significant [non-zero] correlations, analyzed by means of two-tailed one-sample t-tests. The overall level of significance was 5%, adjusted by means of the Holm–Bonferroni correction. [B] Partial correlation coefficients shown in [A] represented as quasi-ellipses along the x-axis [BI partial-r] and y-axis [DRR partial-r]. The quasi-ellipses are centered on the mean partial correlation coefficients and the semi-axes indicate the confidence intervals at the 5% level. In [B] the center frequency is color-coded as in [A], and the bandwidth is indicated as follows: diamond = 1/12 oct., square = 1/3 oct., triangle = 1.5 oct., and circle = PN.

In order to test whether the partial correlation coefficients were different from zero, we performed a set of two-tailed one-sample t-tests for each noise band [including pink noise] on the individual data. We found that the partial correlation is consistently lower than zero for BI [the only exception being PN] but not for DRR [the only exception being the 1.5-kHz, 1/3-oct. band], suggesting that, under the conditions of Experiment 2, the BI had a stronger and more reliable relation with the logarithm of the responses than the DRR.

A particularly striking case of this difference in the partial correlation coefficients occurs for the 0.5-kHz, 1/12-oct. band, where the correlation between BI and DRR is the lowest [r = 0.66, low collinearity]. For this noise band the perceived auditory distance shows a non-monotonic increase with distance. A similar non-monotonic behavior is observed for the binaural intensity as a function of distance, but not for the DRR curve. Moreover, the partial correlation between the log of the responses and the BI is high [rBI = −0.91] while it is low for the DRR [rDRR = −0.19], suggesting that the non-monotonic behavior of the responses can be explained by the non-monotonicity of the BI. For example, listeners could not perceive differences in distance for sources located at 3 and 2 m [mean reported distance 1.77 m, 95% CIs [1.24, 2.30]; and 1.94 m, 95% CIs [1.41, 2.47], respectively, p = 0.63] despite the above-threshold [Larsen et al., 2008] change in DRR [DRR = 0.71 and −3.11 dB, respectively]. This suggests that the response could be explained by the fact that when the sound source gets farther, moving from 2 to 3 m, the BI remains almost equal [BI = −7.37 and −7.70 dB, respectively].

Another way of organizing the data is to plot the partial correlation for one cue against the other. This is shown in Figure 8B. This representation exposes the general pattern of association across stimuli. A two-tailed one-sample t-test showed that the average partial correlations between responses and BI are different from zero [t[9] = −8.4, p = 1.5 × 10⁻⁵, Cohen's d = −2.65, 95% CIs [−3.95, −1.21]] while this is not the case for the DRR [t[9] = −2.1, p = 0.07]. These results indicate that subjects tend to rely more consistently on the binaural intensity in order to judge the distance to a sound source.
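
For a one-sample t-test, Cohen's d and the t statistic are linked by d = t / √n, where n is the number of observations [here n = 10, since the test has 9 degrees of freedom]. The short check below uses a hypothetical helper name and the rounded t value reported above.

```python
import math

def cohens_d_from_t(t_value, n):
    """Cohen's d for a one-sample t-test with n observations."""
    return t_value / math.sqrt(n)

d = cohens_d_from_t(-8.4, 10)
print(d)  # close to the reported -2.65; the small gap comes from rounding of t
```

The value obtained from the rounded t statistic is approximately −2.66, consistent with the reported effect size.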

Discussion

The results obtained in this work indicate that the spectrum of a sound can significantly affect ADP of far-field [1–6 m] sound sources located in reverberant environments. Results of both psychophysical experiments showed an effect of the stimulus' frequency on the response for both pure tones and filtered noise bands: ADP was less accurate for stimuli containing energy mainly in the low-frequency range. In agreement with this, the three performance measures studied in Experiment 2 [correlation coefficient, slope, and standard deviation] were significantly affected by the center frequency of the auditory stimuli.

Unlike the clear effect of the center frequency, the effect of bandwidth was less straightforward. The results of Experiment 2 showed a non-significant main effect of bandwidth on the correlation coefficient. Similar results were obtained in the near field by Brungart [1999] and Kopčo and Shinn-Cunningham [2011]. However, two complementary analyses suggested an effect of bandwidth on the response. First, for 1.5-kHz bands the correlation coefficient significantly increased with the bandwidth, as demonstrated by the positive slope obtained in the linear-regression analysis performed in Section Experiment 2: Apparent Source Distance for Broadband Noises. Second, the analysis of the slopes performed in Experiment 2 showed a significant effect of bandwidth for 0.5-kHz bands [the slope decreased as the bandwidth increased]. These results indicate that, depending on the band center frequency, an increase in bandwidth induces different effects on the apparent distance of the source. The question arising from this observation is whether this effect is due to the change in the bandwidth per se, or to the inclusion and exclusion of certain frequency ranges in the signal as a consequence of changing the bandwidth. The second hypothesis is supported by the fact that increasing the bandwidth can enhance ADP performance, as seen in Figure 5A for 1.5-kHz bands, but can also worsen the response, as seen in Figure 5B for 0.5-kHz bands. The difference between the 0.5- and 1.5-kHz bands of the same bandwidth lies in the frequency region covered in each case. From the psychophysical data one could infer that the frequencies included when increasing bandwidth are beneficial for certain bands, while they are detrimental for others.

Indeed, the results displayed in Figure 7A show that increasing the bandwidth has a different effect for 0.5-kHz bands than for 1.5-kHz bands. Modal resonances are sparser at low frequencies and consequently the frequency response is more erratic in this range. This is most evident in the low-frequency region of the response for the 1.5-oct. band centered at 0.5 kHz, where the ordering of the frequency-response curves with respect to source distance seems almost capricious [e.g., in some regions of the spectrum the frequency responses for sources located at D = 2 and 3 m are higher than for D = 1 m]. Therefore, adding this frequency region, as the bandwidth increases, is of little benefit for the reliability of binaural intensity [BI] as an accurate ADP cue. In fact, even though increasing the bandwidth makes the BI decay with source distance more monotonic, it also entails a decrease in the slope of the integrated BI decay [Figure 7B], which can be linked to the effect of the bandwidth on the slope of the ADP response for 0.5-kHz bands [see Figure 5B]. For the 1.5-kHz bands, the possible explanation is less clear, since the ordering of the frequency-response curves with respect to source distance is also non-monotonic in certain regions. However, it is noteworthy that a substantial dip in the frequency response falls within the 1/12-octave bandwidth, compressing the response curves and reducing the possibility of discriminating between them. Therefore, for the 1.5-kHz bands, it is reasonable to expect an improvement in ADP as the bandwidth is increased. This is reflected in a clear increase in the linearity of the integrated BI decay [Figure 7C] when the bandwidth is increased from 1/12 to 1/3 oct. For 4-kHz bands, since the frequency response shows a more homogeneous behavior, the increase in the linearity of the BI decay with bandwidth is much smaller [Figure 7D].
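The claim that modal resonances are sparser at low frequencies can be illustrated with the textbook formula for the mode frequencies of an ideal rigid-walled rectangular room. The sketch below is only illustrative: the room dimensions (6 × 4 × 3 m), the speed of sound, and the helper names `mode_frequencies` and `count_in_band` are assumptions, not the room or the analysis used in the experiments.

```python
import math
from itertools import product

C = 343.0  # assumed speed of sound in air, m/s

def mode_frequencies(lx, ly, lz, f_max):
    """Mode frequencies (Hz) of an ideal rectangular room up to f_max."""
    def max_index(length):
        # Largest mode index along one dimension that can stay below f_max.
        return int(2 * length * f_max / C) + 1

    freqs = []
    ranges = (range(max_index(l) + 1) for l in (lx, ly, lz))
    for nx, ny, nz in product(*ranges):
        if nx == ny == nz == 0:
            continue  # skip the trivial (0, 0, 0) solution
        f = (C / 2) * math.sqrt((nx / lx) ** 2 + (ny / ly) ** 2 + (nz / lz) ** 2)
        if f <= f_max:
            freqs.append(f)
    return sorted(freqs)

def count_in_band(freqs, center_hz, octaves):
    """Number of modes inside a band of the given width around center_hz."""
    low = center_hz * 2 ** (-octaves / 2)
    high = center_hz * 2 ** (octaves / 2)
    return sum(low <= f <= high for f in freqs)

# Hypothetical 6 x 4 x 3 m room; compare 1/12-octave bands.
modes = mode_frequencies(6.0, 4.0, 3.0, f_max=1600.0)
for center in (500.0, 1500.0):
    print(center, "Hz:", count_in_band(modes, center, 1 / 12), "modes")
```

In this idealized model the number of modes per band grows rapidly with frequency, so a narrow low-frequency band is shaped by comparatively few resonances, while a band of the same relative width at a higher center frequency averages over many more.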

Nevertheless, an increase in high-frequency content is not sufficient to obtain ADP responses comparable to those for PN. As revealed by comparing the bands from Experiment 2 with PN, the performance for the stimuli with the highest frequency content [4-kHz bands] was lower than for PN, while the highest correlation coefficients were obtained for the 1.5-kHz, 1.5-oct. band and, certainly, PN. The common characteristic of these stimuli is that they contain energy both in the low and in the high [above 2 kHz] regions of the audible spectrum [PN from 0.02 to 20 kHz, and the 1.5-kHz, 1.5-oct. band from 0.89 to 2.82 kHz]. This result shows that, in order to obtain an accurate perception of the auditory distance in a room, not only high-frequency, but also low-frequency components are required. The former requirement allows the sound level to be less affected by the modal resonances of the room, and therefore provides a consistent [i.e., decreasing and monotonic] relation between target distance and sound intensity, while the latter requirement contributes to the reverberant energy, reinforcing the DRR cue. This also implies that the minimum bandwidth required to obtain good ADP performance depends on the center frequency of the stimulus, since the 1.5-kHz, 1.5-oct. band showed a response similar to PN, while the 0.5- and 4-kHz bands of the same bandwidth did not.

Regarding the influence of the acoustical cues involved, the partial-correlation analysis suggests that, regardless of stimulus frequency and bandwidth, participants relied mostly on the BI rather than on DRR. This occurs even though the variation of the BI could lead to misjudgments of the source distance, and even though the variation of the DRR over the entire range of target distances was largely above threshold. This effect was evident in the response to 0.5-kHz bands, where non-monotonic changes of the BI correlate well with the response. However, although BI appears to be a good candidate to explain the frequency-dependent effect of room resonant modes on ADP, the correlational approach of our analysis does not allow us to be conclusive about the exact contribution of DRR and BI to the obtained response. Future studies in which each of these cues is manipulated in isolation are therefore needed to determine accurately how BI and DRR are affected by the sound spectrum for far-field sources located in reverberant environments.
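As a reminder of what a partial-correlation analysis measures, the sketch below computes the correlation between a response and one cue after removing the linear contribution of a second cue, using the standard first-order formula. The function names and every number in the example are hypothetical; they are not the data or the exact procedure of the study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation between x and y with the linear effect of z removed."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Made-up values for six target distances (1 to 6 m).
bi_db = [70.0, 66.0, 67.0, 63.0, 62.0, 60.0]      # non-monotonic intensity
drr_db = [6.0, 2.0, -1.0, -3.0, -5.0, -6.0]       # monotonic DRR
log_response = [0.05, 0.30, 0.25, 0.45, 0.52, 0.60]

print("response vs BI, controlling for DRR:",
      round(partial_corr(log_response, bi_db, drr_db), 2))
print("response vs DRR, controlling for BI:",
      round(partial_corr(log_response, drr_db, bi_db), 2))
```

Because BI and DRR both change with distance, their raw correlations with the response overlap; the partial correlation indicates how much each cue explains beyond the other, but it remains a correlational measure.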

Relation to Past Results

A direct comparison of our results with previous literature is not straightforward due to differences in methodology, stimulus characteristics, and the acoustical cues involved. Although none of the studies that used several stimuli with different spectra in the near or the far field considered both the intensity and the DRR cues simultaneously, it is nonetheless interesting to look for coincidences and differences with our results.

Two rigorous studies in which ADP was measured at different distances in response to stimuli with different spectra were conducted in the near field by Brungart [1999] and Kopčo and Shinn-Cunningham [2011]. Although both studies show an effect of frequency on ADP, the reported effect was the opposite of that obtained here: the correlation coefficient between distance and response was smaller for high-frequency stimuli than for low-frequency ones [particularly for sources in front of the listener]. In addition, neither study found a relationship between bandwidth and the response of listeners. These studies also show that the relative importance of the acoustical cues depends both on their availability and reliability. For example, Brungart [1999] found that amplitude-related cues dominate ADP in the median plane, while outside the median plane distance perception depends primarily on low-frequency binaural cues. On the other hand, Kopčo and Shinn-Cunningham [2011] found that the response in a virtual semi-reverberant environment can be explained by assuming a simple relationship between the near-ear DRR and the mean distance judgments. It is difficult to compare the results of these studies with those obtained here, mainly because both were conducted in the near field [where the low-frequency ILD cue dominates] and the stimulus intensity was roved [with the exception of the broadband stimulus in Brungart, 1999], excluding intensity from the available acoustical cues. In contrast, our results showed that, for the far field, stimuli containing only low-frequency components induced the lowest values of the correlation coefficient.

As in all preceding studies, we found a systematic underestimation of the source distance for high-frequency stimuli [4-kHz bands, Figure 4C]. There are two possible explanations for this underestimation. The first is related to the decrease of high-frequency content, relative to low-frequency content, when sound travels through air. It is therefore possible that, as reported by Coleman [1968], listeners associated high-frequency stimuli with shorter distances to the source. The same hypothesis was elaborated by Butler et al. [1980]. Although we tested each type of stimulus in separate blocks in order to reduce this effect, we cannot rule this hypothesis out. Another possible explanation is that the underestimation for 4-kHz bands was induced by the lower amount of reverberation [compared to wide-band noise] caused by the frequency-response characteristic of the room [see Table 1]. Previous studies have reported a systematic relationship between perceived distance and reverberation: the lower the reverberation, the closer the source is perceived. This explanation was also proposed by Butler et al. [1980] to account for the greater effect of frequency on the apparent distance obtained in an echoic, compared to an anechoic, environment. The results obtained for the 4-kHz bands are interesting because they suggest that the spectrum can affect ADP through the amount of reverberation present in the perceived stimulus. Here, the most accurate responses were obtained for stimuli containing energy in both the high and low regions of the audible spectrum, showing that reverberation was an important factor in the ADP response. However, as discussed before, reverberation per se does not guarantee an accurately perceived distance.

Contrary to what was reported in previous studies [Butler et al., 1980; Nielsen, 1992], our results do not show an overestimation of the perceived distance for low-frequency stimuli. This discrepancy can be partially explained by methodological differences between our experiment and the previous ones. While in previous studies the amplitude of the stimuli was fixed at the ears of the listeners, in our experiment we let the intensity and the DRR vary [as happens in a real environment] and, as we have shown, listeners' responses were mainly driven by intensity changes. For low-frequency stimuli, the reverberant energy is higher, hence the global intensity at the ears of the subject also increases. Therefore, even though more reverberation would induce an increase of the perceived distance, the rise of the global intensity dominates, inducing subjects to report shorter distances.
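The interplay between reverberant energy, global intensity, and DRR described above can be sketched with the classical diffuse-field approximation, in which direct energy follows the inverse-square law and reverberant energy is constant throughout the room. This is an idealization that ignores the room modes discussed earlier; the critical-distance values and the function names `total_level_db` and `drr_db` are assumptions chosen for illustration.

```python
import math

def total_level_db(r_m, r_crit_m):
    """Direct plus reverberant level, in dB re the direct sound at 1 m.

    r_crit_m is the critical distance, where direct and reverberant
    energies are equal (assumed ideal diffuse reverberant field).
    """
    direct = 1.0 / r_m ** 2             # inverse-square law
    reverberant = 1.0 / r_crit_m ** 2   # independent of source distance
    return 10.0 * math.log10(direct + reverberant)

def drr_db(r_m, r_crit_m):
    """Direct-to-reverberant energy ratio in dB under the same model."""
    return 20.0 * math.log10(r_crit_m / r_m)

# Hypothetical critical distances: smaller means more reverberant energy.
for r_crit in (1.0, 4.0):
    print(f"critical distance {r_crit} m")
    for r in (1, 2, 3, 4, 5, 6):
        print(f"  {r} m: level {total_level_db(r, r_crit):6.2f} dB, "
              f"DRR {drr_db(r, r_crit):6.2f} dB")
```

In this model, more reverberant energy (a shorter critical distance) raises the overall level at every distance and flattens its decay between 1 and 6 m, while the DRR keeps falling by about 6 dB per doubling of distance.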

Our results showed that the resonant modes of the room strongly affected the apparent distance of the source for stimuli centered at low frequencies. Interestingly, for these stimuli, the room modes induced non-monotonic BI changes that correlate very well with the listeners' response. Previous works have shown that, in isolation, intensity provides more reliable distance information than DRR [Zahorik et al., 2005; Kolarik et al., 2013]. However, in this case BI was not in isolation. In contrast, stimuli centered at 0.5 kHz induced the highest levels of reverberant energy within the room. Experiments by Kolarik et al. [2013] showed that for broadband sounds the perceptual weight of DRR as an ADP cue increases considerably in highly reverberant environments, providing information as accurate as that provided by intensity. Moreover, several studies have shown that ADP is most accurate when both DRR and intensity are available [Nielsen, 1992; Bronkhorst and Houtgast, 1999; Ronsse and Wang, 2012]. While our results do not contradict those obtained in the aforementioned studies, they show that the relative influence of intensity and DRR cues in ADP also depends on the spectrum of the auditory stimulus. This is evident in the inaccurate responses obtained with noise bands centered at 0.5 kHz, despite the fact that both intensity and high levels of reverberation were available to the listeners. Perhaps the reason why DRR was not a reliable ADP cue in our study is that, for 0.5-kHz bands, the spectra of direct and reverberant sound closely resemble each other, making it difficult to discriminate between them. This could induce listeners to interpret reverberation as part of the direct sound. In connection with this possible explanation, there is also a debate about the ability of the nervous system to segregate the direct and reverberant sounds and compute the DRR. It has been proposed that the auditory system derives this cue from physical characteristics of the signal that covary with the direct and the reverberant sound, such as changes in the spectrum, temporal pattern, monaural changes in the spectral centroid or in frequency-to-frequency variability in the signal [Larsen et al., 2008], and interaural coherence [Bronkhorst, 2002]. In this line, our results suggest that for reverberation to be an effective ADP cue, the auditory stimulus must contain energy in both the low and high regions of the spectrum.
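For reference, the physical DRR that the auditory system would have to approximate is usually obtained from a room impulse response by comparing the energy in a short window around the direct-sound peak with the energy arriving afterwards. The sketch below follows that idea; the window length, the synthetic impulse response, and the function name `drr_from_impulse_response` are assumptions and do not reproduce the acoustical analysis of this study.

```python
import math

def drr_from_impulse_response(h, half_window):
    """DRR in dB: energy around the direct peak over energy after it.

    h is a list of impulse-response samples; half_window is the number of
    samples kept on each side of the largest peak as "direct" sound.
    """
    peak = max(range(len(h)), key=lambda i: abs(h[i]))
    start = max(0, peak - half_window)
    stop = peak + half_window + 1
    direct = sum(v * v for v in h[start:stop])
    reverberant = sum(v * v for v in h[stop:])
    return 10.0 * math.log10(direct / reverberant)

# Synthetic response: silence, a direct impulse, a gap, then a decaying tail.
tail = [0.3 * (0.95 ** k) * (-1) ** k for k in range(300)]
h = [0.0] * 5 + [1.0] + [0.0] * 4 + tail
print(f"DRR = {drr_from_impulse_response(h, half_window=2):.1f} dB")
```

This computation requires separate access to the direct and reverberant portions of the response, which is exactly what a listener does not have; hence the proposals that the cue is derived indirectly from signal properties that covary with it.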

Many previous studies support the idea that the presence of reverberation enhances the auditory perception of distance [Mershon and King, 1975; Mershon et al., 1989; Bronkhorst and Houtgast, 1999; Zahorik, 2002a,b; Kopčo and Shinn-Cunningham, 2011; Kolarik et al., 2015]. However, the results obtained here show that, when the frequency and bandwidth of the stimuli are varied, it is not always true that more reverberation leads to a better estimation of the distance to a sound source. For example, we observed that reducing the frequency of the stimuli for a given bandwidth is always detrimental to the accuracy of the response. This detrimental effect is, in part, a consequence of the existence of narrow, sparse, and prominent resonant peaks in the frequency response of the room, which cause a non-monotonic behavior of the BI of the stimulus with distance. The magnitude of the effect and the frequency range will depend on the characteristics of the particular room, but for sufficiently low and narrow noise bands a negative effect of the reverberation on the accuracy of the distance estimate is very likely. Therefore, the benefit of DRR as an ADP cue cannot be generalized. In addition, this cue is useful only as long as the listener is able to discriminate between direct and reverberant sound, an issue that is not currently addressed in the literature. Further experiments are necessary to determine the influence of the spectrum on this ability and the effectiveness of the DRR cue for estimating auditory distance.

Author Contributions

IS, PE, and RV designed the study. IS, EC, EA, and RV performed the experiments. IS and ME performed the acoustical recordings and analysis. PE performed the statistical analysis of the behavioral data. IS, PE, ME, and RV wrote the paper.

Funding

This work was supported by grants from Universidad Nacional de Quilmes [UNQ: PUNQ 1394/15] and the Consejo Nacional de Investigaciones Científicas y Técnicas [CONICET: PIP-11220130100573 CO]. Neither institution was involved in study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the article for publication.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We wish to thank Dr. Ramiro Aguilar for his useful technical help and constructive comments on the manuscript.

References

Antsalo, P., Karjalainen, M., Makivirta, A., and Valimaki, V. [2004, May]. “Perception of temporal decay of low-frequency room modes,” in Audio Engineering Society Convention 116 [Berlin: Audio Engineering Society].

Bass, H. E., Sutherland, L. C., Zuckerwar, A. J., Blackstock, D. T., and Hester, D. M. [1995]. Atmospheric absorption of sound: further developments. J. Acoust. Soc. Am. 97, 680–683. doi: 10.1121/1.412989

Blauert, J. [1997]. Spatial Hearing [Revised Edition]. Cambridge, MA: MIT Press.

Bronkhorst, A. W. [2002, September]. “Modeling auditory distance perception in rooms,” in Proceedings of the EAA Forum Acusticum Sevilla [Sevilla: European Acoustics Association].

Brungart, D. S., Durlach, N. I., and Rabinowitz, W. M. [1999]. Auditory localization of nearby sources. II. Localization of a broadband source. J. Acoust. Soc. Am. 106, 1956–1968.

Butler, R. A., Levy, E. T., and Neff, W. D. [1980]. Apparent distance of sounds recorded in echoic and anechoic chambers. J. Exp. Psychol. Hum. Percept. Perform. 6, 745.

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. [2003]. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. Mahwah, NJ: Routledge.

Farina, A. [2000]. “Simultaneous measurement of impulse response and distortion with a swept-sine technique,” in Audio Engineering Society Convention 108 [Paris: Audio Engineering Society].

Fazenda, B. M., Stephenson, M., and Goldberg, A. [2015]. Perceptual thresholds for the effects of room modes as a function of modal decay. J. Acoust. Soc. Am. 137, 1088–1098. doi: 10.1121/1.4908217

Holm, S. [1979]. A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6, 65–70.

Ingard, U. [1953]. A review of the influence of meteorological conditions on sound propagation. J. Acoust. Soc. Am. 25, 405–411.

Kolarik, A., Cirstea, S., and Pardhan, S. [2013]. Discrimination of virtual auditory distance using level and direct-to-reverberant ratio cues. J. Acoust. Soc. Am. 134, 3395–3398. doi: 10.1121/1.4824395

Kolarik, A. J., Moore, B. C., Zahorik, P., Cirstea, S., and Pardhan, S. [2015]. Auditory distance perception in humans: a review of cues, development, neuronal bases, and effects of sensory loss. Attent. Percept. Psychophys. 78, 373–395. doi: 10.3758/s13414-015-1015-1

Kuttruff, H. [2016]. Room Acoustics. Boca Raton, FL: CRC Press.

Larsen, E., Iyer, N., Lansing, C. R., and Feng, A. S. [2008]. On the minimum audible difference in direct-to-reverberant energy ratio. J. Acoust. Soc. Am. 124, 450–461. doi: 10.1121/1.2936368

Levy, E. T., and Butler, R. A. [1978]. Stimulus factors which influence the perceived externalization of sound presented through headphones. J. Audit. Res. 18, 41–50.

Mershon, D. H., Ballenger, W. L., Little, A. D., McMurtry, P. L., and Buchanan, J. L. [1989]. Effects of room reflectance and background noise on perceived auditory distance. Perception 18, 403–416. doi: 10.1068/p180403

Mershon, D. H., and King, L. E. [1975]. Intensity and reverberation as factors in the auditory perception of egocentric distance. Attent. Percept. Psychophys. 18, 409–415. doi: 10.3758/BF03204113

Nielsen, S. H. [1992]. “Auditory distance perception in different rooms,” in Audio Engineering Society Convention 92 [Vienna: Audio Engineering Society].

Petersen, J. [1990]. Estimation of loudness and apparent distance of pure tones in a free field. Acta Acust. United Acust. 70, 61–65.

Ronsse, L. M., and Wang, L. M. [2012]. Effects of room size and reverberation, receiver location, and source rotation on acoustical metrics related to source localization. Acta Acust. United Acust. 98, 768–775. doi: 10.3813/AAA.918558

Schroeder, M. R. [1979]. Integrated-impulse method measuring sound decay without using impulses. J. Acoust. Soc. Am. 66, 497–500. doi: 10.1121/1.383103

Spiousas, I., Etchemendy, P. E., Vergara, R. O., Calcagno, E. R., and Eguia, M. C. [2015]. An auditory illusion of proximity of the source induced by sonic crystals. PLoS ONE 10:e0133271. doi: 10.1371/journal.pone.0133271

Steiger, J. H. [2004]. Beyond the F test: effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychol. Methods 9:164. doi: 10.1037/1082-989X.9.2.164

What is the best cue for determining the distance to a sound source?

Intensity works best as a distance cue when the sound source or the listener is moving.

What is auditory distance perception?

Auditory distance perception plays a major role in spatial awareness, enabling location of objects and avoidance of obstacles in the environment. However, it remains under-researched relative to studies of the directional aspect of sound localization.

What cues are used for auditory localization?

Sound localization plays a critical role in animal survival. Three cues can be used to compute sound direction: interaural timing differences [ITDs], interaural level differences [ILDs] and the direction-dependent spectral filtering by the head and pinnae [spectral cues].

What are the distance cues?

Distance cues are any of the auditory or visual cues that enable an individual to judge the distance of the source of a stimulus. Auditory distance cues include the intensity of familiar sounds [e.g., voices], intensity differences between the ears, and changes in spectral content.
