AR / VR Science Note 004: Sound Cues and 3D Localization
Hearing is arguably one of the human body’s most important primary senses. Hearing allows us to communicate with others, is often the first source of information about threats to safety, and reveals extraordinary levels of detail about our surroundings. The careful addition of a spatial audio component to virtual and augmented reality simulations can, in a variety of application areas, enhance usability and the overall sense of “presence” within an artificial space.
The Ears Point the Eyes
The human brain has the capability to identify the location of sound sources in the environment with considerable precision. Neuroscientists refer to this capability as localization. At the heart of this capacity is the fact that we have two spatially separated ears, elegant inner ear transducers and sensors, as well as powerful cognitive centers in our brain for analysis of the incoming acoustic signals. Most animals capable of hearing also have this ability, although acuity varies widely. For example, owls have exceptional sound localization abilities, as do elephants, cats, and mice. Conversely, horses, cows, and goats are notable for poor capabilities in this area (Heffner and Heffner, 1992).
There are three primary “cues” identified as used by the human brain to establish the spatial position of sound sources in our environment: interaural time differences (ITD), interaural intensity differences (IID), and spectral cues.
Two of these primary sound cues are most effective in determining the location of sounds in the horizontal plane.
Interaural time difference (ITD) is the difference in the arrival time of a sound between the two ears. When sound arrives at our head from either side of the median plane, there is a difference in length of the path a sound travels to each ear. Thus, the sound will arrive at one ear before the other. The greatest ITD is experienced when a sound source is directly to the left or right side of a listener’s head.
Given that the speed of sound at sea level is ~340 m/s (761 mph), it takes approximately 0.6 ms for sound to travel the width of the average adult human head. It is important to note that when a sound source is directly in front, back, above, or beneath you (that is, anywhere on the median plane), the interaural time difference is zero, meaning no directional information is derived.
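The ITD implied by this geometry can be sketched numerically. The snippet below uses Woodworth's spherical-head approximation, a standard simplification; the 8.75 cm head radius and the function name are illustrative assumptions, not values given in the text.

```python
import math

def itd_woodworth(azimuth_deg, head_radius_m=0.0875, c=340.0):
    """Approximate interaural time difference (seconds) for a distant
    source, using Woodworth's spherical-head model.

    azimuth_deg: source angle from straight ahead
    (0 = directly in front, 90 = directly to one side).
    """
    theta = math.radians(azimuth_deg)
    # Extra path to the far ear = the arc around the head (r * theta)
    # plus the straight-line segment (r * sin(theta)).
    return (head_radius_m / c) * (theta + math.sin(theta))

# A source on the median plane (0 degrees) produces no ITD; a source
# directly to the side produces the maximum, roughly 0.66 ms for an
# 8.75 cm head radius.
print(itd_woodworth(0))    # 0.0
print(itd_woodworth(90))   # ~0.00066 s
```

Note that the spherical-head maximum (~0.66 ms) is slightly larger than the straight-line figure of ~0.6 ms quoted above, because sound must travel around the head rather than through it.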
Interaural intensity difference (IID) is the difference in the intensity of a sound between the two ears. If a sound originates from directly in front of or behind a listener, the intensity will be the same in both ears. On the other hand, if a sound originates from a position off to the side, the intensity will be slightly different in each ear, for two reasons. First, the distance the sound must travel is different for each ear, and sound intensity decreases with distance. Second, the head interferes with the sound wave and forms what is referred to as an acoustic shadow near the distant ear.
This phenomenon is most pronounced with higher-frequency sounds because the sound waves are of short enough wavelengths that they are effectively blocked by the head (Heeger, 2006; Van Wanrooij et al., 2004). Low-frequency sounds (&lt;1500 Hz) have wavelengths longer than an average adult head is wide. These waves effectively bend around the head (diffraction) and do not produce shadows (Harding, 2006).
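The distance component of IID can be sketched with the inverse-square law. The function below is a minimal illustration, assuming free-field propagation; the head-shadow component, which dominates at high frequencies, is deliberately left out, and the example distances are hypothetical.

```python
import math

def iid_distance_db(d_near_m, d_far_m):
    """Intensity difference (dB) between the two ears due purely to
    inverse-square falloff of sound with distance. The head-shadow
    effect (dominant at high frequencies) is not modeled here."""
    return 20.0 * math.log10(d_far_m / d_near_m)

# For a nearby source (near ear at 0.5 m, far ear at 0.7 m) the
# distance term alone contributes ~2.9 dB. For a distant source
# (10.0 m vs 10.2 m) it is nearly zero, which is why the acoustic
# shadow carries most of the IID cue at range.
print(round(iid_distance_db(0.5, 0.7), 1))    # 2.9
print(round(iid_distance_db(10.0, 10.2), 2))  # 0.17
```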
Judging the elevation of a sound source, particularly anywhere along the midline of the head, is a different matter and requires more sophisticated processing, as there are no variations in arrival time or intensity. This brings us to the third primary sound cue.
Pinna spectral cues, known to sound engineers as head-related transfer functions, are changes in the frequency profile of sounds resulting from the size and shape of your head and shoulders, as well as the curves and ducts of your outer ear—the pinna. The physical geometry of the pinna amplifies certain frequencies and attenuates others, in effect changing the shape of the frequency spectrum. This reliance on the unique shape of the outer ear, as opposed to differences in the sound arriving at different points along the binaural axis, makes this a monaural cue. Neuroscientists have discovered neurons in the audio centers of the brain that appear to be tuned to these spectral changes (Letowski and Letowski, 2012).
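In spatial audio software, these spectral cues are typically applied by convolving a dry (mono) source with a measured head-related impulse response (HRIR) for each ear. The sketch below shows the mechanics only; the HRIR arrays are placeholder values, not real measurements.

```python
import numpy as np

# Minimal sketch of HRTF-style filtering: convolve a dry mono signal
# with a per-ear impulse response to produce a binaural pair.
fs = 44100
t = np.arange(fs) / fs
source = np.sin(2 * np.pi * 440.0 * t)           # dry 440 Hz tone, 1 s

hrir_left = np.array([0.0, 0.9, 0.3, 0.1])       # hypothetical HRIRs,
hrir_right = np.array([0.0, 0.0, 0.5, 0.2, 0.1]) # not measured data

left = np.convolve(source, hrir_left)
right = np.convolve(source, hrir_right)

# Trim the convolution tails and stack into a 2-channel signal.
binaural = np.stack([left[:len(source)], right[:len(source)]])
print(binaural.shape)  # (2, 44100)
```

Real systems use individually or generically measured HRIR sets (hundreds of directions per ear) and interpolate between them as the source or the listener's head moves.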
In addition to the primary sound cues, there are several other phenomena identified by scientists as also contributing to our overall sound localization abilities.
Attenuation, or the reduction in the strength of a sound, can also aid in sound localization because high frequencies in air are dampened faster than low frequencies. Thus, muffled sounds can indicate a distant source. Conversely, high-frequency content of familiar sounds can indicate a source that is closer.
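This frequency-dependent loss can be caricatured in a few lines. The snippet below combines inverse-square spreading with an air-absorption term that grows with frequency; the absorption coefficient is a toy figure chosen for illustration, not a value from ISO 9613 or from the text.

```python
import math

def distance_loss_db(freq_hz, distance_m):
    """Rough sketch of why distant sounds arrive muffled: total loss
    combines inverse-square spreading with air absorption that grows
    with frequency. The f^2 absorption coefficient is a toy value."""
    spreading = 20.0 * math.log10(max(distance_m, 1.0))  # re 1 m
    absorption = 2e-9 * freq_hz ** 2 * distance_m        # toy model
    return spreading + absorption

# At 100 m, a 100 Hz component loses ~40 dB (almost all spreading),
# while a 10 kHz component loses ~60 dB: the high frequencies are
# dampened faster, so the distant sound is muffled.
print(round(distance_loss_db(100.0, 100.0), 1))    # 40.0
print(round(distance_loss_db(10000.0, 100.0), 1))  # 60.0
```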
Reverberation levels can also aid in localization as reverberant energy builds over time, allowing source location to be represented relatively faithfully during the early portion of a sound, although this representation becomes increasingly degraded later in the stimulus (Devore et al., 2009).
Doppler shift is particularly important for sound sources that are moving toward or away from a listener (Schasse et al., 2012). This movement of the sound source results in spectrum shifts between higher and lower pitch (or lower to higher) and provides the listener additional information on where the sound source will be in succeeding moments.
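The standard Doppler relation for a source moving along the line of sight to a stationary listener makes this concrete. The function name and the siren example are illustrative; the formula itself is the textbook one.

```python
def doppler_shift_hz(f_source_hz, v_source_ms, c=340.0):
    """Observed frequency for a source moving directly toward
    (positive v) or away from (negative v) a stationary listener,
    at speed of sound c in m/s."""
    return f_source_hz * c / (c - v_source_ms)

# A 440 Hz siren approaching at 30 m/s is heard sharper (~482.6 Hz);
# the same siren receding is heard flatter (~404.3 Hz).
print(round(doppler_shift_hz(440.0, 30.0), 1))   # 482.6
print(round(doppler_shift_hz(440.0, -30.0), 1))  # 404.3
```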
Head movement has been shown to improve the accuracy of sound localization (McAnally and Martin, 2014), particularly in those situations described earlier where sound sources are located along the median plane or within the cone of confusion.
This AR / VR Science Note is based on content drawn from the book Practical Augmented Reality: A Guide to the Technologies, Applications and Human Factors for AR and VR (Pearson / Addison-Wesley Professional, Fall 2016). Reproduced by the author with permission, Pearson © 2017.