This study examines the plausibility of Auditory Augmented Reality (AAR) realized with position-dynamic binaural synthesis over headphones. An established method to evaluate the plausibility of AAR asks participants to decide whether they are listening to the virtual or the real version of a sound object. To date, this method has only been used to evaluate AAR systems for seated listeners. The AAR realization examined in this study instead allows listeners to turn in arbitrary directions and walk towards, past, and away from a real loudspeaker that reproduced sound only virtually. The experiment was conducted in two parts. In the first part, the subjects were asked whether they were listening to the real or the virtual version, not knowing that it was always the virtual version. In the second part, the real versions of the scenes, in which the loudspeaker actually reproduced sound, were added. Two different source positions, three different test stimuli, and two different sound levels were considered. Seventeen volunteers, including five experts, participated. In the first part, none of the participants noticed that the virtual reproduction was active throughout the different test scenes. The inexperienced listeners tended to accept the virtual reproduction as real, while experts distributed their answers approximately equally. In the second part, experts could identify the virtual version quite reliably. For inexperienced listeners, the individual results varied enormously. Since the presence of the headphones influences the perception of the real sound field, this shadowing effect had to be considered in the creation of the virtual sound source as well. This requirement still limits the ecological validity of test methods that include the real version. Although the results indicate that the availability of a hidden reference leads to a more critical evaluation, it is crucial to be aware that the presence of the headphones slightly distorts the reference. This issue seems more vital to the plausibility estimates achieved with this evaluation method than the increased freedom of motion.
Augmented Reality (AR) aims at adding virtual elements to the real environment (Azuma, 1997; Sicaru et al., 2018). Auditory Augmented Reality (AAR) describes the enrichment of a listener’s actual environment with virtual sound sources or other virtual acoustic elements like reflectors or obstacles causing acoustic shadows. A common approach to realize AAR is to use dynamic binaural synthesis over headphones or hearables (Jot and Lee, 2016; Russell et al., 2016; Garí et al., 2019; Nagele et al., 2021). In such reproduction, the position and orientation of the listener’s head are tracked, and the headphone signals are adjusted without a noticeable delay by convolving the dry mono source signal with the corresponding binaural room impulse responses (BRIRs) (Lindau, 2009; Brandenburg et al., 2020). A BRIR characterizes the transfer path of the sound from the sound source through the room to both ears of the listener or, as a substitute, to the microphones in the ears of a head (and torso) simulator.
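The core of such a rendering step can be illustrated with a minimal Python sketch. It is not the implementation used in this study: the function names, the pose format (x, y, yaw in degrees), and the weighting of metres against degrees are hypothetical choices made only for illustration. The sketch selects the measured BRIR whose pose is closest to the tracked listener pose and convolves the dry mono signal with its left and right channels.

```python
import numpy as np


def nearest_brir_index(grid_poses, listener_pose):
    """Return the index of the measured pose closest to the tracked pose.

    Poses are (x_m, y_m, yaw_deg). The weighting of 90 degrees against
    1 metre is an arbitrary assumption for this sketch.
    """
    grid = np.asarray(grid_poses, dtype=float)
    pose = np.asarray(listener_pose, dtype=float)
    d_pos = np.linalg.norm(grid[:, :2] - pose[:2], axis=1)
    # Wrap the yaw difference into [-180, 180) before taking its magnitude.
    d_yaw = np.abs((grid[:, 2] - pose[2] + 180.0) % 360.0 - 180.0)
    return int(np.argmin(d_pos + d_yaw / 90.0))


def render_binaural(mono, brir):
    """Convolve a dry mono signal with a two-channel BRIR.

    `brir` has shape (taps, 2); the result has shape
    (len(mono) + taps - 1, 2) with columns (left, right).
    """
    mono = np.asarray(mono, dtype=float)
    brir = np.asarray(brir, dtype=float)
    left = np.convolve(mono, brir[:, 0])
    right = np.convolve(mono, brir[:, 1])
    return np.stack([left, right], axis=1)
```

A real-time system would additionally process the signal block-wise and cross-fade between filters when the selected BRIR changes; both steps are omitted here for brevity.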