3D Audio

3D audio, or "immersive" audio, has been studied extensively in the academic world, has been used in various forms in computer gaming, and is starting to be used in cinema and the home. But what has really brought focus onto it more recently is Virtual Reality. We have been somewhat obsessed with what can be done here for a rather long time and think this is very exciting!

Some History

Once upon a time, there was mono. Spatially, this is nice and simple - sounds just came from your speaker, but that's enough to hear what is going on and give a sense of distance (the first dimension of our 3D).

Later on, the "stereo" that we know so well came along. Largely developed by Alan Blumlein, this produces a simple sound stage using two channels of audio, typically by varying the relative level of sounds in two speakers.
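As a rough illustration of "varying the relative level of sounds in two speakers", here is a minimal sketch of a constant-power pan law. This is one common choice among several and is an assumption on our part, not a description of Blumlein's original circuits; the function name and the -1 to +1 pan range are made up for the example.

```python
import math

def constant_power_pan(pan):
    """Return (left_gain, right_gain) for pan in [-1.0, 1.0].

    -1.0 is hard left, 0.0 is centre, +1.0 is hard right.
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must be between -1.0 and 1.0")
    # Map pan onto a quarter circle so left^2 + right^2 stays at 1,
    # which keeps the total acoustic power roughly constant.
    angle = (pan + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)

# Centre position: equal level in both speakers.
left, right = constant_power_pan(0.0)
```

Multiplying each sample of a mono sound by these two gains gives the left and right speaker feeds.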

"Surround" systems (5.1 and 7.1 etc.) are common in cinema and some homes. These put extra speakers around the audience and work in much the same way as stereo. Audio is assigned to a number of speaker channels, typically by varying the level of sounds in two neighbouring speakers. These surround systems have been extended to include raised speakers for a more "3D" experience.

Recently, "object"-based systems have started to be used in cinema. These typically use a surround "bed" plus a number of individual foreground sounds, which are only assigned to particular speakers when the audio reaches a particular room and the exact speaker locations are known. This is good, because it means that the cinema mix can be put together in a way that plays consistently in rooms with different, and often complex, speaker layouts, potentially with raised speakers. Computer games have effectively been using objects for a long time, because sounds cannot be assigned to particular speakers until the game is played.

In the meantime, the academic world has been studying a variety of different ways to produce realistic 3D audio. The "VBAP" family of technologies is probably the most natural next step from what has come before, but there are other more complex technologies out there, in particular the "Wavefield Synthesis" and "Ambisonic" families. We work with Ambisonics, which was originally developed in the 1970s, and more specifically with "Higher Order Ambisonics" ("HOA"), which is much more detailed spatially. Ambisonics has the unusual feature that it can carry a detailed 3D scene using a fixed set of channels known as "B-Format", but these channels are not fed directly to speakers. Instead, they need to be assigned to particular speakers by a "decoding" process when the audio reaches a particular room. So, as with objects, we can handle different rooms and speaker layouts.
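To make the encode-then-decode idea concrete, here is a minimal first-order sketch. It assumes the traditional B-Format convention (W scaled by 1/sqrt(2), X forward, Y left, Z up, angles in radians) and uses a naive projection decoder for a regular horizontal ring of speakers. The function names and the square layout are hypothetical, and this is far simpler than a real HOA decoder.

```python
import math

def encode_first_order(sample, azimuth, elevation=0.0):
    """Encode one mono sample into first-order B-Format (W, X, Y, Z).

    Assumed convention: azimuth is measured anticlockwise from straight
    ahead, and W carries a 1/sqrt(2) scaling.
    """
    w = sample / math.sqrt(2.0)
    x = sample * math.cos(azimuth) * math.cos(elevation)
    y = sample * math.sin(azimuth) * math.cos(elevation)
    z = sample * math.sin(elevation)
    return w, x, y, z

def decode_horizontal(bformat, speaker_azimuths):
    """Naive projection decode to a regular horizontal ring of speakers."""
    w, x, y, _z = bformat  # height is ignored in this horizontal-only sketch
    n = len(speaker_azimuths)
    return [
        (math.sqrt(2.0) * w + 2.0 * (x * math.cos(a) + y * math.sin(a))) / n
        for a in speaker_azimuths
    ]

# Hypothetical square layout: front-left, back-left, back-right, front-right.
square = [math.radians(d) for d in (45, 135, 225, 315)]

# The scene is encoded once, without knowing anything about the speakers...
scene = encode_first_order(1.0, math.radians(45))

# ...and is only turned into speaker feeds once the layout is known.
feeds = decode_horizontal(scene, square)
```

The same `scene` could be passed to `decode_horizontal` with a different list of angles, which is the point: the B-Format channels describe the sound scene, not the speakers.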

The academic world knows a lot more about how the human hearing system works these days too. We can model the acoustic behaviour of the head using Head Related Transfer Functions (HRTFs) and related technologies, and use this to synthesise "binaural" 3D sound on headphones. This has been used in significant numbers of computer games since the 1990s, but interest has increased hugely because of the level of immersion possible in Virtual Reality systems.
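In the time domain, applying an HRTF amounts to convolving a sound with a pair of impulse responses, one per ear. The sketch below shows only that basic step. The impulse responses are toy placeholders invented for the example (a direct path to the left ear and a later, quieter arrival at the right ear), not measured HRTF data, and the function names are hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def render_binaural(mono, hrir_left, hrir_right):
    """Filter a mono signal with one impulse response per ear."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Toy impulse responses, NOT measured data: the right ear hears the sound
# two samples later and at 60% of the level, as if the source were on the left.
toy_hrir_left = [1.0, 0.0, 0.0]
toy_hrir_right = [0.0, 0.0, 0.6]

left_ear, right_ear = render_binaural([1.0, 0.5], toy_hrir_left, toy_hrir_right)
```

Real HRTF sets are much longer, vary with direction, and are normally applied with fast (FFT-based) convolution rather than the nested loop used here.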

It turns out that you can do exceptional binaural with HOA, and HOA B-Format can be rotated easily, so it is perfect for use with Virtual Reality head tracking. And HOA-based rendering engines can handle objects easily enough too!
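To give a feel for why rotation is cheap, here is a minimal first-order sketch of a rotation about the vertical axis, as might be used to compensate for head yaw. It assumes X points forward, Y points left and angles are anticlockwise in radians; the function names are hypothetical. Higher orders need larger rotation matrices, but the principle is the same: a small matrix multiply per sample, independent of how many sounds are in the scene.

```python
import math

def rotate_yaw(w, x, y, z, angle):
    """Rotate a first-order scene anticlockwise about the vertical axis.

    W (omnidirectional) and Z (height) are unaffected by a yaw rotation.
    """
    c, s = math.cos(angle), math.sin(angle)
    return w, x * c - y * s, x * s + y * c, z

def compensate_head_yaw(w, x, y, z, head_yaw):
    """Counter-rotate the scene so it stays fixed as the head turns."""
    return rotate_yaw(w, x, y, z, -head_yaw)

# A source straight ahead (X = 1, Y = 0). The listener turns 90 degrees
# to their left, so the source should end up on their right (Y = -1).
w, x, y, z = compensate_head_yaw(0.7071, 1.0, 0.0, 0.0, math.radians(90))
```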

The Mathematical "Heavy Lifting"

Rapture3D and the O3A studio plugins use "Higher Order Ambisonics" (HOA) to drive your speakers or headphones. The maths is hard, but we worry about all that - and the results are worth it!

On speakers, HOA works better and better as you add more detail and channels and, like its close relative "Wavefield Synthesis", can manage the acoustic pressure field to generate essentially correct sound in a region of space, not just at a "sweet spot". Unlike Wavefield Synthesis, however, it doesn't insist on huge numbers of speakers. Instead, Rapture3D does the best it can and blends in more robust techniques where there isn't enough detail, or a large enough number of speakers, for more "holographic" approaches. Results are typically rather better than with conventional panning.

We think we have a superb HOA decoder in Rapture3D. We're using a range of new techniques, including handling of irregular speaker layouts, management of wavefront curvature - and new psychoacoustic cues. If you're mathematically inclined, you might be interested in some Technical Notes.

Rapture3D also has support for surround stereo, HCTC crosstalk cancelled stereo, and most importantly binaural 3D on stereo headphones, using six different HRTFs to help you find one that matches your head shape. Rapture3D "Advanced" can even handle personalised HRTFs.

Of course, the Rapture3D game engine does more than just play back HOA. It mixes game objects to HOA itself before rendering the whole 3D mix, live. And fast! This approach is entirely viable on mobile these days, particularly with Rapture3D Universal.

In the studio, the O3A plugins provide a rich suite of creative tools to help you make mixes using HOA easily. O3A uses third order ambisonics and needs 16 audio channels to work. This gives a good level of spatial detail without overloading your computer.
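The channel count follows from the ambisonic order: a full 3D scene of order N uses (N + 1) squared channels, so third order gives 16. A tiny sketch (the function name is just for illustration):

```python
def hoa_channel_count(order):
    """Number of channels in a full 3D ambisonic scene of the given order."""
    if order < 0:
        raise ValueError("order must be zero or greater")
    return (order + 1) ** 2

# First order needs 4 channels, second order 9, third order 16.
counts = {order: hoa_channel_count(order) for order in range(1, 4)}
```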

What Does This All Mean?

This means that Rapture3D can:

Produce "binaural" 3D on headphones, using a choice of HRTF.
Play back pre-rendered 3D mixes put together with our O3A studio plugins (with head-tracking), or "conventional" game sound objects.
On speakers, make better use of your existing multichannel speaker layout to produce a more realistic and immersive soundfield.
Handle 3D speaker layouts and place sounds above and below, as well as to the front, back and sides.
Use "irregular" speaker layouts. You can tell the software where your speakers are and Rapture3D will work out the best way to use them (Rapture3D "Advanced" edition only).

The O3A plugins are aimed at the professional studio. These have a wide range of creative possibilities. For instance, with these plugins you can:

Place any number of sounds in any direction into a 3D audio scene.
"Upmix" content from existing formats (e.g. stereo or 5.1).
Work with real 3D recordings from ambisonic microphones.
See the soundfield graphically so there is no confusion about what's going on.
Manipulate 3D mixes using rotation, reflection, movement, reverb, spatial EQ and much more.
Use a single O3A master to produce many output mixes such as stereo, binaural, 5.1, 7.1, 7.1.2, Auro-3D, 22.2 and more, or keep the O3A audio itself and decode it live in Rapture3D or YouTube.

You know you want to...