HOA Technical Notes - Decoding
In Higher Order Ambisonics, the main audio stream in use is B-Format. This contains an awful lot of information about sound directivity, and it is deliberately unaware of the speakers that you will use for playback. Because of this, it must be "decoded" for playback.
There are lots of different ways to decode a Higher Order Ambisonic stream. Michael Gerzon did provide a relatively formal definition of "Ambisonic Decoding" for first order "Classic" Ambisonics; however, many modern decoders do not work this way.
Rapture3D
One of the main features of our Rapture3D software is a high-quality Higher Order Ambisonic decoder. The same decoder is used in the game engine and the music playback engine. The idea is that you configure it once, and then everything "just works".
The "user" and "game" editions come with decoders for a collection of preset rigs that correspond to the standard Windows speaker layouts (such as stereo, 5.1, 7.1) and some variants (such as 3D7.1). These decoders are actually generated using the Rapture3D Decoder Generator, which is available directly in the "advanced" edition. This lets you set up decoders for fairly arbitrary speaker layouts.
Rapture3D uses some very sophisticated techniques for decoding, taking into account soundfield reproduction (which is a form of "acoustic holography"), wavefront curvature, psychoacoustic cues, HRTFs and more. It also supports HRTF-based headphone decoding, output for surround processors and crosstalk cancelled stereo.
Reference Decoders
Over the years, quite a few poor decoders have appeared, which have not always given Ambisonics a good name. Here, we recommend some simple low-order decoders. We prefer the Rapture3D ones, but if you're writing a new decoder and it sounds worse than these, you're doing something wrong!
These decoders are represented as matrices. To apply one, take the relevant components from each sample frame of your B-Format and multiply them by the matrix to produce the corresponding sample frame to feed to the speakers. Except for the mono and hexagon decoders, these are based on the decoders provided by Csound's bformdec1 opcode at the time of writing (2011-05-12); any errors are our own.
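As a rough illustration of the matrix multiplication described above, here is a minimal Python sketch. The function names `decode_frame` and `decode_stream` and the two sample frames are made up for this example; the coefficients are copied from the Simple M+S Stereo table in these notes.

```python
def decode_frame(matrix, frame):
    """Multiply one B-Format sample frame by a decoder matrix.

    matrix: one row of coefficients per speaker, one column per
            B-Format component the decoder uses.
    frame:  the matching B-Format components for a single sample.
    Returns one output sample per speaker.
    """
    return [sum(c * s for c, s in zip(row, frame)) for row in matrix]


def decode_stream(matrix, frames):
    """Decode a sequence of B-Format frames into speaker frames."""
    return [decode_frame(matrix, frame) for frame in frames]


# Coefficients copied from the Simple M+S Stereo table.
# Columns are (W, Y); rows are (Left, Right).
STEREO_MS = [
    [0.7071, 0.5000],
    [0.7071, -0.5000],
]

# Two made-up sample frames, each holding (W, Y).
frames = [(1.0, 0.0), (0.5, 0.5)]
speaker_frames = decode_stream(STEREO_MS, frames)
```

Only the components named in a table's column headings are passed in; any other B-Format channels are simply ignored by that decoder.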
Mono
The W channel provides an omnidirectional response, so it can be used directly to provide a good mono output.
Simple M+S Stereo
This is a simple decode equivalent to an M+S microphone array at the listening point. This works better than a front-facing arrangement for many (but not all) purposes. Note that there are lots of ways to do stereo decodes of B-Format. Also, see more on "Virtual Microphones" below (this decode is equivalent to two side-facing cardioids, though we've halved the gain).
| | W In | Y In |
|---|---|---|
| Left | 0.7071 | 0.5000 |
| Right | 0.7071 | -0.5000 |
First Order Quad
This is a first order "in-phase" decoder.
| | W In | X In | Y In |
|---|---|---|---|
| Front Left | 0.3536 | 0.1768 | 0.1768 |
| Back Left | 0.3536 | -0.1768 | 0.1768 |
| Back Right | 0.3536 | -0.1768 | -0.1768 |
| Front Right | 0.3536 | 0.1768 | -0.1768 |
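To give a feel for what "in-phase" means in practice, the sketch below pans a test source around the quad decoder and looks at the speaker gains. Two things here are assumptions rather than statements from these notes: the encoding equations (W = s/√2, X = s·cos θ, Y = s·sin θ, with azimuth θ measured anticlockwise from the front), and the reading of "in-phase" as "no speaker is ever fed an inverted copy of the source". The helper names are made up for the example.

```python
import math

# Coefficients copied from the First Order Quad table; columns are (W, X, Y).
QUAD = {
    "Front Left": (0.3536, 0.1768, 0.1768),
    "Back Left": (0.3536, -0.1768, 0.1768),
    "Back Right": (0.3536, -0.1768, -0.1768),
    "Front Right": (0.3536, 0.1768, -0.1768),
}


def encode_horizontal(azimuth_deg, signal=1.0):
    """Assumed first order horizontal encoding of a single source.

    The azimuth is in degrees, anticlockwise from straight ahead.
    """
    a = math.radians(azimuth_deg)
    return (signal / math.sqrt(2.0), signal * math.cos(a), signal * math.sin(a))


def speaker_gains(azimuth_deg):
    """Gain each quad speaker receives for a unit source at this azimuth."""
    w, x, y = encode_horizontal(azimuth_deg)
    return {name: cw * w + cx * x + cy * y
            for name, (cw, cx, cy) in QUAD.items()}


# A source in the direction of the Front Left speaker (45 degrees).
gains = speaker_gains(45.0)

# Smallest gain seen on any speaker while sweeping the source in 1 degree steps.
lowest = min(min(speaker_gains(a).values()) for a in range(360))
```

Under these assumptions the Front Left speaker receives a gain of about 0.5, the opposite speaker receives essentially nothing, and `lowest` never drops meaningfully below zero.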
Second Order 5.0
This is a second order decoder provided by Bruce Wiggins, targeting an ITU 5.0 speaker layout, compatible with DVD 5.1 etc.
| | W In | X In | Y In | U In | V In |
|---|---|---|---|---|---|
| Front Left | 0.405 | 0.320 | 0.310 | 0.085 | 0.125 |
| Front Right | 0.405 | 0.320 | -0.310 | 0.085 | -0.125 |
| Front Centre | 0.085 | 0.040 | 0.000 | 0.045 | 0.000 |
| Surround Left | 0.635 | -0.335 | 0.280 | -0.080 | 0.080 |
| Surround Right | 0.635 | -0.335 | -0.280 | -0.080 | -0.080 |
Second Order Hexagon
The speakers here are assumed to be set out anticlockwise, with the first speaker at 11 o'clock and the last one at 1 o'clock. This is an "in-phase" decoder.
| | W In | X In | Y In | U In | V In |
|---|---|---|---|---|---|
| Front Left | 0.2357 | 0.1987 | 0.1147 | 0.0321 | 0.0556 |
| Left | 0.2357 | 0.0000 | 0.2294 | -0.0643 | 0.0000 |
| Back Left | 0.2357 | -0.1987 | 0.1147 | 0.0321 | -0.0556 |
| Back Right | 0.2357 | -0.1987 | -0.1147 | 0.0321 | 0.0556 |
| Right | 0.2357 | 0.0000 | -0.2294 | -0.0643 | 0.0000 |
| Front Right | 0.2357 | 0.1987 | -0.1147 | 0.0321 | -0.0556 |
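As a sanity check on the layout description, the following sketch converts clock positions to azimuths and compares them with the directions implied by the X and Y coefficients in the table. The intermediate clock positions (9, 7, 5 and 3 o'clock) are our assumption of a regular hexagon; the notes only state the first and last positions. The helper names are hypothetical.

```python
import math

# (X, Y) coefficients copied from the Second Order Hexagon table, in table order.
HEXAGON_XY = [
    ("Front Left", 0.1987, 0.1147),
    ("Left", 0.0000, 0.2294),
    ("Back Left", -0.1987, 0.1147),
    ("Back Right", -0.1987, -0.1147),
    ("Right", 0.0000, -0.2294),
    ("Front Right", 0.1987, -0.1147),
]

# Assumed clock positions for a regular hexagon, anticlockwise from 11 o'clock.
HOURS = [11, 9, 7, 5, 3, 1]


def clock_to_azimuth(hour):
    """Clock position to degrees anticlockwise from the front (12 o'clock)."""
    return (-30.0 * hour) % 360.0


def implied_azimuth(x, y):
    """Direction in which a row's first order (X, Y) coefficients point."""
    return math.degrees(math.atan2(y, x)) % 360.0


def angle_difference(a, b):
    """Signed difference a - b in degrees, wrapped into [-180, 180)."""
    return ((a - b + 180.0) % 360.0) - 180.0


differences = [
    angle_difference(implied_azimuth(x, y), clock_to_azimuth(hour))
    for (_, x, y), hour in zip(HEXAGON_XY, HOURS)
]
```

Each entry in `differences` comes out at a small fraction of a degree, which is consistent with the rounding of the coefficients to four decimal places.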
Third Order Octagon
The speakers here are assumed to be set out anticlockwise, with the first speaker roughly at 11 o'clock and the last one roughly at 1 o'clock. This is an "in-phase" decoder.
| | W In | X In | Y In | U In | V In | P In | Q In |
|---|---|---|---|---|---|---|---|
| NNW | 0.1768 | 0.1732 | 0.0718 | 0.0530 | 0.0530 | 0.0048 | 0.0115 |
| WNW | 0.1768 | 0.0718 | 0.1732 | -0.0530 | 0.0530 | -0.0115 | -0.0048 |
| WSW | 0.1768 | -0.0718 | 0.1732 | -0.0530 | -0.0530 | 0.0048 | -0.0115 |
| SSW | 0.1768 | -0.1732 | 0.0718 | 0.0530 | -0.0530 | -0.0115 | 0.0048 |
| SSE | 0.1768 | -0.1732 | -0.0718 | 0.0530 | 0.0530 | -0.0048 | -0.0115 |
| ESE | 0.1768 | -0.0718 | -0.1732 | -0.0530 | 0.0530 | 0.0115 | 0.0048 |
| ENE | 0.1768 | 0.0718 | -0.1732 | -0.0530 | -0.0530 | -0.0048 | 0.0115 |
| NNE | 0.1768 | 0.1732 | -0.0718 | 0.0530 | -0.0530 | 0.0115 | -0.0048 |
First Order Cube
This is a first order "in-phase" decoder.
| | W In | X In | Y In | Z In |
|---|---|---|---|---|
| Front Lower Left | 0.1768 | 0.0722 | 0.0722 | -0.0722 |
| Front Upper Left | 0.1768 | 0.0722 | 0.0722 | 0.0722 |
| Back Lower Left | 0.1768 | -0.0722 | 0.0722 | -0.0722 |
| Back Upper Left | 0.1768 | -0.0722 | 0.0722 | 0.0722 |
| Back Lower Right | 0.1768 | -0.0722 | -0.0722 | -0.0722 |
| Back Upper Right | 0.1768 | -0.0722 | -0.0722 | 0.0722 |
| Front Lower Right | 0.1768 | 0.0722 | -0.0722 | -0.0722 |
| Front Upper Right | 0.1768 | 0.0722 | -0.0722 | 0.0722 |
Virtual Microphones
It's possible to extract simple microphone responses from the B-Format stream. This is particularly relevant when decoding for stereo.
- For an omnidirectional response, simply use the W channel and multiply it by 1.4142.
- For a figure-of-eight response, take the scalar product of the microphone direction vector and the X, Y and Z components (for instance, a front-facing response would multiply <1,0,0> by <X,Y,Z>, giving 1X+0Y+0Z=X as the output).
- For a cardioid response, add the omnidirectional and figure-of-eight responses together (you can vary this to produce hypercardioid responses etc.).
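The three recipes above can be sketched in a few lines of Python. The function names and the weighting parameters are made up for this example, and the direction is assumed to be a unit vector in the same axes as X, Y and Z.

```python
import math


def omni(w):
    """Omnidirectional response: the W channel scaled by sqrt(2) (about 1.4142)."""
    return math.sqrt(2.0) * w


def figure_of_eight(direction, x, y, z):
    """Figure-of-eight response: scalar product of direction and <X, Y, Z>."""
    dx, dy, dz = direction
    return dx * x + dy * y + dz * z


def virtual_mic(direction, w, x, y, z, omni_weight=1.0, eight_weight=1.0):
    """Weighted sum of the omni and figure-of-eight responses.

    Equal weights give a cardioid; changing the balance between the two
    weights moves the pattern towards other shapes such as hypercardioid.
    """
    return (omni_weight * omni(w)
            + eight_weight * figure_of_eight(direction, x, y, z))


# One made-up B-Format sample frame.
w, x, y, z = 0.3, 0.1, 0.2, 0.0

front_eight = figure_of_eight((1.0, 0.0, 0.0), x, y, z)  # equals X

# Left-facing cardioid at half gain.
left_cardioid = virtual_mic((0.0, 1.0, 0.0), w, x, y, z,
                            omni_weight=0.5, eight_weight=0.5)

# The Left row of the Simple M+S Stereo table, applied to the same frame.
left_from_table = 0.7071 * w + 0.5000 * y
```

The last two values agree to within the rounding of the table, which matches the remark in the stereo section that the M+S decode is a pair of side-facing cardioids at half gain.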
It is tempting to build multichannel decoders by feeding simple virtual microphone responses in the directions of the various speakers. This does not work well, particularly for speaker layouts that are not regular.