Spatial Audio in Virtual Reality

Two speakers directed at a VR headset


As Virtual Reality (VR) technology continues to evolve, creating truly immersive experiences is becoming both more feasible and more crucial for user engagement. While visual elements often steal the spotlight in discussions about VR, the role of spatial audio in crafting a convincing, realistic environment cannot be overstated. This article delves into the spatial audio techniques available for enhancing immersion in VR settings, from Head-related Transfer Functions (HRTFs) to object-based audio. It also explores the considerations involved in implementing these techniques, the latest research developments in the field, the challenges faced, and what the future holds for spatial audio in VR. Whether you are a game developer, a sound engineer, or simply curious about the expanding universe of VR, it is worth understanding how spatial audio shapes our virtual experiences.

Spatial Audio Techniques

A number of different spatial audio techniques can be used to create immersive and realistic sound experiences in VR. Each has its own advantages and disadvantages, and the best choice depends on the specific needs of the application.

Here is a brief overview of some of the most common spatial audio techniques used in VR:

Head-related Transfer Functions (HRTF):

Head-related transfer functions (HRTFs) are mathematical functions that describe how the shape of a person’s head and ears affects the way they perceive sound. HRTFs can be used to create personalised spatial audio experiences by tailoring the sound to each individual user’s anatomy.

HRTFs work by simulating the way that sound waves interact with the human head and ears. This allows sound engineers to create audio that sounds very similar to how the human brain perceives sound in the real world.
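In practice, HRTF rendering reduces to convolving a dry mono signal with a measured left-ear and right-ear impulse response (HRIR) for the source's direction. The sketch below illustrates the idea in Python; the short filter coefficients are illustrative placeholders, not measured HRIR data, which would come from a set such as the MIT KEMAR measurements.

```python
# Minimal sketch of binaural rendering: convolve a mono signal with a
# left/right head-related impulse response (HRIR) pair. The HRIRs below
# are toy coefficients for illustration, not measured data.

def convolve(signal, kernel):
    """Plain FIR convolution (equivalent to numpy.convolve, mode='full')."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

def render_binaural(mono, hrir_left, hrir_right):
    """Return a (left, right) channel pair simulating a source at the
    direction the HRIR pair was measured for."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Toy HRIRs for a source off to the listener's right: the right ear gets
# a louder, earlier signal; the left ear a quieter, delayed one
# (interaural level and time differences).
hrir_right = [0.9, 0.3]
hrir_left = [0.0, 0.0, 0.4, 0.15]

mono = [1.0, 0.5, -0.5, -1.0]
left, right = render_binaural(mono, hrir_left, hrir_right)
```

Real renderers interpolate between HRIRs measured at many directions and usually perform the convolution in the frequency domain for efficiency.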

Binaural Recording:

Binaural recording is a technique for capturing sound that simulates the way that the human brain perceives sound in the real world. This is done by using two microphones placed in the ears of a dummy head to record audio.

The resulting recording is a stereo recording that sounds very similar to how the human brain would perceive sound if the listener was actually in the room where the recording was made.

Binaural recordings are often used to create spatial audio experiences for headphones, as they can provide a very immersive and realistic listening experience.
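The directional cues a dummy head captures implicitly can also be quantified. One of the two main horizontal-plane cues, the interaural time difference (ITD), is commonly approximated with Woodworth's spherical-head formula. A sketch, where the head radius is an assumed average value and azimuth is measured from straight ahead in radians:

```python
import math

HEAD_RADIUS = 0.0875   # metres, an assumed average head radius
SPEED_OF_SOUND = 343.0  # metres per second

def woodworth_itd(azimuth):
    """Interaural time difference in seconds for a frontal source at the
    given azimuth, using Woodworth's spherical-head approximation:
    ITD = (r / c) * (sin(azimuth) + azimuth)."""
    return HEAD_RADIUS / SPEED_OF_SOUND * (math.sin(azimuth) + azimuth)
```

For a source directly to one side (90 degrees), this gives roughly 0.66 ms, which matches the commonly cited maximum ITD for an average head.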

Wave-based Rendering:

Wave-based rendering is a technique for simulating the propagation of sound waves in a virtual environment. This allows sound engineers to create realistic audio effects, such as echoes and reverberation.

Wave-based rendering works by simulating the way that sound waves travel through the air. This allows sound engineers to create audio that sounds very similar to how the human brain would perceive sound in a real environment.

Wave-based rendering is often used to create spatial audio experiences for VR games, as it can help to create a more immersive and realistic gaming experience.
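A full wave-based solver numerically propagates the wave equation through the scene, which is beyond a short example, but the echoes such simulation produces can be sketched with the closely related image-source idea: each reflection is modelled as a delayed, attenuated copy of the source. In the sketch below, the sample rate, absorption coefficient, and 1/r distance attenuation are all illustrative assumptions:

```python
SPEED_OF_SOUND = 343.0  # metres per second
SAMPLE_RATE = 48000     # assumed output sample rate

def image_source_echo(direct_dist, reflect_dist, absorption=0.3):
    """Return (delay_samples, gain) taps for the direct path and for one
    wall reflection. The reflection travels a longer path (its distance
    measured via the mirrored 'image source'), so it arrives later and
    quieter; the wall absorbs a fraction of its energy."""
    def tap(dist, wall_loss=1.0):
        delay = round(dist / SPEED_OF_SOUND * SAMPLE_RATE)
        gain = wall_loss / max(dist, 1.0)  # simple 1/r attenuation
        return delay, gain
    return tap(direct_dist), tap(reflect_dist, 1.0 - absorption)
```

Summing many such taps, one per reflection path, yields the echo and reverberation effects described above; true wave-based methods additionally capture diffraction and interference that ray-style approximations miss.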


Ambisonics:

Ambisonics is a surround sound format that uses multiple speakers or headphones to create a 360-degree sound field. Ambisonics recordings are made using a special type of microphone array that captures the sound field from all directions.

Ambisonics is a very flexible spatial audio format, and it can be used to create a wide range of spatial audio experiences, from simple stereo recordings to complex 360-degree sound fields.

Ambisonics is often used to create spatial audio experiences for VR movies and videos, as it can provide a very immersive and realistic viewing experience.
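A mono source can also be encoded synthetically into a first-order Ambisonics (B-format) signal of four channels, W, X, Y, and Z. The sketch below uses the traditional convention in which the omnidirectional W channel is attenuated by 1/√2; angles are in radians, with azimuth measured counter-clockwise from straight ahead:

```python
import math

def encode_first_order(sample, azimuth, elevation=0.0):
    """Encode one mono sample into first-order B-format (W, X, Y, Z).
    W is omnidirectional; X, Y, Z weight the sample by direction
    cosines, capturing front/back, left/right, and up/down."""
    w = sample / math.sqrt(2)
    x = sample * math.cos(azimuth) * math.cos(elevation)
    y = sample * math.sin(azimuth) * math.cos(elevation)
    z = sample * math.sin(elevation)
    return w, x, y, z
```

Because the sound field is stored independently of any speaker layout, the same B-format signal can later be decoded to headphones, a stereo pair, or a full loudspeaker array, which is what makes the format so flexible.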

Object-based Audio:

Object-based audio is a spatial audio format that treats each sound source as a separate object in 3D space. This allows sound engineers to have more control over the placement of sounds in a virtual environment.

Object-based audio is a relatively new spatial audio format, but it is becoming increasingly popular for VR applications. Because each source carries its own position metadata, the mix can be rendered to any speaker layout or to headphones at playback time, and individual sounds can be moved, occluded, or re-prioritised independently, advantages that channel-based formats cannot easily match.

Object-based audio is often used to create spatial audio experiences for VR games, as it can help to create a more realistic and immersive gaming experience.
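The core idea, position metadata travelling with each source and being rendered per listener at playback time, can be sketched with a simple data structure. The 1/r gain rule below is a deliberately simplified stand-in for a real object renderer such as those found in Atmos or game audio engines:

```python
import math

class SoundObject:
    """One sound source treated as a discrete object in 3D space: the
    audio and its position metadata stay together until render time."""
    def __init__(self, name, position, level=1.0):
        self.name = name
        self.position = position  # (x, y, z) in metres
        self.level = level

def render_gain(obj, listener_pos):
    """Distance-based gain for one object relative to one listener,
    using simple 1/r attenuation clamped inside a 1 metre radius."""
    dx, dy, dz = (o - l for o, l in zip(obj.position, listener_pos))
    dist = math.sqrt(dx * dx + dy * dy + dz * dz)
    return obj.level / max(dist, 1.0)
```

A real renderer would additionally compute each object's direction for HRTF or panning, but the separation of concerns is the same: authoring fixes the objects, rendering adapts them to the listener.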

These are just a few of the most common spatial audio techniques used in VR. As the technology continues to evolve, we can expect to see even more innovative and creative applications for spatial audio in VR.

Head Tracking:

Head tracking is a technique for tracking the position and orientation of the user’s head in space. This information can be used to update the spatial audio in real time, so that the sound always appears to be coming from the correct direction.

Head tracking is essential for creating immersive and realistic spatial audio experiences in VR. Without head tracking, the sound would always appear to be coming from the same direction, even if the user turned their head. This would be very unrealistic and would break the immersion of the experience.

Head tracking is often used in conjunction with other spatial audio techniques, such as HRTFs and binaural recording. This allows sound engineers to create spatial audio experiences that are both immersive and realistic.
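Combining head tracking with a spatialiser amounts to recomputing, every frame, each source's direction relative to where the listener is currently facing, so the sound stays fixed in the world as the head turns. A minimal top-down 2D sketch, assuming yaw in radians measured counter-clockwise:

```python
import math

def relative_azimuth(source_pos, listener_pos, head_yaw):
    """Angle of the source relative to the listener's facing direction.
    As head_yaw updates each frame from the headset's tracker, the
    renderer re-spatialises with this angle, keeping the source
    anchored in world space."""
    dx = source_pos[0] - listener_pos[0]
    dy = source_pos[1] - listener_pos[1]
    world_angle = math.atan2(dy, dx)
    # Wrap the result into (-pi, pi]
    return (world_angle - head_yaw + math.pi) % (2 * math.pi) - math.pi
```

For example, a source straight ahead sits at relative azimuth 0; after the listener turns 90 degrees to the left, the same source sits 90 degrees to their right, and the spatialiser's HRTF selection changes accordingly.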

Applications of Spatial Audio in VR

Spatial audio significantly elevates the immersive quality of Virtual Reality (VR) environments, forging realistic auditory-visual cross-modal associations.


Gaming:

Spatial audio in VR gaming enhances player immersion by accurately representing sound sources within the game environment. It helps create a believable world in which audio cues assist players in navigating and interacting with the game. One popular game that provides a very realistic soundscape is Half-Life: Alyx, where a great deal of work went into creating and cleaning up the sound effects for use with HRTF.


Entertainment:

In VR concerts or movies, spatial audio contributes to a more immersive and enjoyable experience by realistically reproducing the acoustics of different environments. VR concerts have become popular in recent years, especially during the pandemic, and many platforms provide these experiences, such as AmazeVR, concertVR, and live concerts available through the NextVR app for Oculus.

Education and Training:

Spatial audio in educational VR applications can aid in creating realistic scenarios for training or learning. It helps simulate real-world auditory experiences, which can be crucial for training purposes. Lele presents different use cases for VR in military training simulations, including scenarios involving bad weather or an equipment malfunction [1]. Another study shows how VR can be used to train professionals, using firefighters as an example [2]. Other areas where it can have an impact are education [3] and the medical sector [4]. These are only a few examples of VR research in these areas, with many more available.

Auditory-visual cross-modal associations are essential as they contribute to a more realistic and engaging VR experience. When the auditory cues match the visual stimuli accurately, it creates a coherent and believable VR environment which is critical for immersion and user engagement.

Implementing Spatial Audio in VR

Implementing spatial audio effectively in Virtual Reality (VR) environments is critical for user immersion. Various tools and middleware platforms offer different approaches and features to achieve this. Unity’s built-in audio tools provide a straightforward, cost-effective means for basic 3D spatialisation. While easy to use, these native tools are somewhat limited in terms of advanced features like HRTF.

Wwise by Audiokinetic offers a highly customisable and feature-rich platform, well suited to complex audio behaviours. Its power comes with a steeper learning curve and may incur additional licensing costs. FMOD by Firelight Technologies serves as a middle-ground solution, offering an intuitive UI and a wide array of built-in effects, albeit with licensing requirements for larger projects.

Google’s Resonance Audio used to be an excellent choice for efficient and high-quality spatialisation. However, it’s worth noting that the platform is now deprecated and is scheduled for removal in Unity version 2020.1. This change makes it less viable for long-term projects.

Finally, Steam Audio by Valve specializes in physically-based sound propagation for extremely realistic audio environments. While it’s free and well-documented, it does demand more computational resources and focuses less on aspects like interactive music and dialogue.

Each tool or middleware has its own advantages and challenges, and the right choice will depend on your project’s specific audio needs, your budget, and the expertise of your development team.

Research & Developments in Spatial Audio for VR

While most existing research in VR has primarily focused on visual elements, a truly immersive experience must engage multiple senses. However, there’s a trade-off: incorporating additional technology to stimulate more senses can potentially overwhelm users, thus diminishing their sense of presence in the virtual environment. Spatial audio offers a unique advantage here; it can enhance immersion without requiring cumbersome additional hardware.

A study by Potter et al. investigated the relative impact of spatial audio fidelity and video resolution on perceived audio-visual quality and immersion. The study considered three different configurations for both audio (monaural, binaural with head tracking, and binaural with head tracking and room acoustic rendering) and video (resolutions of 0.5 megapixels per eye, 1.5 megapixels per eye, and 2.5 megapixels per eye) and found that both factors significantly influenced immersion and audio-visual quality. Notably, adding room acoustic rendering to head-tracked binaural audio improved immersion to the same extent as a five-fold increase in video resolution [5].

Research has also emphasised the role of spatial audio in facilitating navigation and object detection in 3D environments. Studies show that integrating auditory cues with visual stimuli can notably improve the speed and accuracy of navigation tasks [6].

In the realm of object-based audio formats, Dolby Atmos is gaining traction for its ability to offer a more realistic representation of sound within a 3D space. Though one study found no significant improvement in the precision of spatial location when using Dolby Atmos as opposed to 5.1 and 7.1 channel-based surround sound, participants reported greater confidence in their spatial judgments [7].

In practical applications, the popular online game Overwatch became the first to support Dolby Atmos over headphones, enhancing the player’s ability to locate opponents. Valve also incorporated HRTF audio into one of their first-person shooter games, further improving the realism and immersion experienced by users [8].

Ongoing research in sound spatialisation extends beyond the realm of Virtual Reality (VR) but has significant implications for enhancing spatial audio within VR environments. A study by Mickiewicz and Kosmenda proposes an innovative technique to heighten the realism of audio recordings through sound spatialization using intensity impulse responses. This research has applications in various domains, including audio production, gaming, and VR experiences [9]. 


Challenges

Despite advancements in spatial audio, significant challenges remain. One key issue lies in the use of HRTFs. While generalised HRTFs are convenient, they often fall short of providing a realistic auditory experience. Personalised HRTFs offer a more authentic sound landscape but are complex to create and implement. Some HRTF libraries, like the KEMAR library, provide generalised HRTF profiles that can be matched to users for close approximations, although the results remain imperfect. HRTFs do solve certain issues inherent in stereo sound, such as difficulties in discerning front-back and elevation cues [8].

Another obstacle is the lack of open standards for spatial audio plugins, which restricts interoperability among different development toolkits. This makes pre-planning crucial, as your choice in hardware and software can dictate your spatial audio options. For instance, using an Oculus headset with Meta’s integration package limits you to the Oculus Spatialiser plugin. This plugin might lack certain features like full audio occlusion, which are available in more feature-rich alternatives like Steam Audio [8].

Future research is likely to focus on several key areas: personalised HRTFs, cross-modal interactions, machine learning and AI integrations, standardisation, and real-time acoustic simulations. Progress in these domains will simplify the creation of immersive audio environments, encouraging broader adoption of VR across various applications that stand to benefit from advanced audio spatialisation.


Conclusion

Various techniques in spatial audio can be synergistically applied to create a more realistic and immersive VR environment. When designing sound and music in such settings, several key factors should be considered: the desired level of immersion, the objectives of the game or application, the role of audio in meeting these objectives, and budget constraints. While striving for the utmost realism in spatial audio is an option, it may not be necessary for all scenarios, especially those that don't demand high levels of immersion. As advancements continue in the field of spatial audio within VR, many of these considerations will become increasingly nuanced. With the ongoing improvements in technology and understanding, VR applications will soon be capable of handling a broader range of scenarios, rendering some of these questions less critical. In summary, the future of spatial audio in VR holds the promise of more accessible and nuanced auditory experiences, transforming the way we interact with virtual worlds.


[1] A. Lele, ‘Virtual reality and its military utility’, J Ambient Intell Human Comput, vol. 4, no. 1, pp. 17–26, Feb. 2013, doi: 10.1007/s12652-011-0052-4.

[2] A. Grabowski and K. Jach, ‘The use of virtual reality in the training of professionals: with the example of firefighters’, Computer Animation and Virtual Worlds, vol. 32, no. 2, p. e1981, 2021, doi: 10.1002/cav.1981.

[3] C. Dede, ‘Immersive Interfaces for Engagement and Learning’, Science (American Association for the Advancement of Science), vol. 323, no. 5910, pp. 66–69, 2009, doi: 10.1126/science.1167311.

[4] T. D. Parsons and A. S. Phillips, ‘Virtual reality for psychological assessment in clinical practice.’, Practice Innovations, vol. 1, no. 3, pp. 197–217, Sep. 2016, doi: 10.1037/pri0000028.

[5] T. Potter, Z. Cvetković, and E. De Sena, ‘On the Relative Importance of Visual and Spatial Audio Rendering on VR Immersion’, Frontiers in Signal Processing, vol. 2, 2022, Accessed: Feb. 15, 2023. [Online]. Available:

[6] M. Gröhn, T. Lokki, and T. Takala, ‘Comparison of auditory, visual, and audiovisual navigation in a 3D space’, ACM Trans. Appl. Percept., vol. 2, no. 4, pp. 564–570, Oct. 2005, doi: 10.1145/1101530.1101558.

[7] T. Oramus and P. Neubauer, ‘COMPARISON OF PERCEPTION OF SPATIAL LOCALIZATION BETWEEN CHANNEL AND OBJECT BASED AUDIO’, presented at the Audio Engineering Society Convention 148, Audio Engineering Society, May 2020. Accessed: Oct. 02, 2023. [Online]. Available:

[8] J. Broderick, J. Duggan, and S. Redfern, ‘The Importance of Spatial Audio in Modern Games and Virtual Environments’, in 2018 IEEE Games, Entertainment, Media Conference (GEM), Aug. 2018, pp. 1–9. doi: 10.1109/GEM.2018.8516445.

[9] W. Mickiewicz and K. Kosmenda, ‘Spatialization of sound recordings using intensity impulse responses’, in 2023 27th International Conference on Methods and Models in Automation and Robotics (MMAR), Aug. 2023, pp. 264–268. doi: 10.1109/MMAR58394.2023.10242446.

Featured Photo by Polina Tankilevitch
