HOA Technical Notes - Decoding

In Higher Order Ambisonics, the main audio stream in use is B-Format. This contains an awful lot of information about sound directivity, and it is deliberately unaware of the speakers that you will use for playback. Because of this, it must be "decoded" for playback.

There are lots of different ways to decode a Higher Order Ambisonic stream. Michael Gerzon did provide a relatively formal definition of "Ambisonic Decoding" for first order "Classic" Ambisonics, but many modern decoders do not work this way.

Rapture3D

One of the main features of our Rapture3D software is a high quality Higher Order Ambisonic decoder. The same decoder is used in the game engine and the music playback engine. The idea is that you configure it once, and then everything "just works".

The "user" and "game" editions come with decoders for a collection of preset rigs that correspond to the standard Windows speaker layouts (such as stereo, 5.1, 7.1) and some variants (such as 3D7.1). These decoders are actually generated using the Rapture3D Decoder Generator, which is available directly in the "advanced" edition. This lets you set up decoders for fairly arbitrary speaker layouts.

Rapture3D uses some very sophisticated techniques for decoding, taking into account soundfield reproduction (which is a form of "acoustic holography"), wavefront curvature, psychoacoustic cues, HRTFs and more. It also supports HRTF-based headphone decoding, output for surround processors and crosstalk-cancelled stereo.

Reference Decoders

Over the years, quite a few poor decoders have appeared, which have not always given Ambisonics a good name. Here, we recommend some simple low-order decoders. We prefer the Rapture3D ones, but if you're writing a new decoder and it sounds worse than these, you're doing something wrong!

These decoders are represented as matrices. To apply them, take the relevant components from each sample frame of your B-Format and multiply them by the matrix to produce the corresponding sample frame to feed to the speakers. Except for the mono and hexagon decoders, these are based on the decoders provided by Csound's bformdec1 opcode at the time of writing (2011-05-12), any errors being our own.
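As a rough illustration of the matrix step described above, here is a minimal Python sketch. It uses the First Order Quad coefficients listed later in these notes. The names QUAD_DECODER, decode_frame and decode_stream are hypothetical, chosen for this example only, and are not part of Rapture3D or Csound. The sample frame assumes a unit-amplitude source at front left with W scaled by 0.7071, which is an assumption about the encoding rather than something these notes specify.

```python
# Rows: one per speaker. Columns: gains applied to the W, X and Y inputs.
# Values copied from the First Order Quad table in these notes.
QUAD_DECODER = [
    [0.3536,  0.1768,  0.1768],  # Front Left
    [0.3536, -0.1768,  0.1768],  # Back Left
    [0.3536, -0.1768, -0.1768],  # Back Right
    [0.3536,  0.1768, -0.1768],  # Front Right
]


def decode_frame(matrix, frame):
    """Multiply one B-Format sample frame by a decoder matrix.

    `frame` holds only the components the decoder uses, in column order
    (here W, X, Y). Returns one output sample per speaker.
    """
    for row in matrix:
        if len(row) != len(frame):
            raise ValueError("frame length must match the matrix column count")
    return [sum(gain * sample for gain, sample in zip(row, frame))
            for row in matrix]


def decode_stream(matrix, frames):
    """Decode a sequence of B-Format frames, one frame at a time."""
    return [decode_frame(matrix, frame) for frame in frames]


# Assumed test frame: source at 45 degrees to the left of front.
# W = 0.7071, X = cos(45 deg) = 0.7071, Y = sin(45 deg) = 0.7071.
speaker_feeds = decode_frame(QUAD_DECODER, [0.7071, 0.7071, 0.7071])
```

For this frame the speaker feeds come out at about 0.5 for Front Left, 0.25 for Back Left and Front Right, and 0.0 for Back Right. No speaker is driven with inverted polarity.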

Mono

The W channel provides an omnidirectional response, so it can be used directly to provide a good mono output.

Simple M+S Stereo

This is a simple decode equivalent to an M+S microphone array at the listening point. This works better than a front-facing arrangement for many (but not all) purposes. Note that there are lots of ways to do stereo decodes of B-Format. Also, see more on "Synthetic Microphones" below (this decode is equivalent to two side-facing cardioids, though we've halved the gain).

         W In     Y In
Left     0.7071   0.5000
Right    0.7071  -0.5000

First Order Quad

This is a first order "in-phase" decoder.

              W In     X In     Y In
Front Left    0.3536   0.1768   0.1768
Back Left     0.3536  -0.1768   0.1768
Back Right    0.3536  -0.1768  -0.1768
Front Right   0.3536   0.1768  -0.1768

Second Order 5.0

This is a second order decoder provided by Bruce Wiggins, targeting an ITU 5.0 speaker layout, compatible with DVD 5.1 etc.

                 W In    X In    Y In    U In    V In
Front Left       0.405   0.320   0.310   0.085   0.125
Front Right      0.405   0.320  -0.310   0.085  -0.125
Front Centre     0.085   0.040   0.000   0.045   0.000
Surround Left    0.635  -0.335   0.280  -0.080   0.080
Surround Right   0.635  -0.335  -0.280  -0.080  -0.080

Second Order Hexagon

The speakers here are assumed to be set out anticlockwise, with the first speaker at 11 o'clock and the last one at 1 o'clock. This is an "in-phase" decoder.

              W In     X In     Y In     U In     V In
Front Left    0.2357   0.1987   0.1147   0.0321   0.0556
Left          0.2357   0.0000   0.2294  -0.0643   0.0000
Back Left     0.2357  -0.1987   0.1147   0.0321  -0.0556
Back Right    0.2357  -0.1987  -0.1147   0.0321   0.0556
Right         0.2357   0.0000  -0.2294  -0.0643   0.0000
Front Right   0.2357   0.1987  -0.1147   0.0321  -0.0556
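The layout description above can be turned into speaker angles with a short sketch. This assumes azimuths measured in degrees anticlockwise from straight ahead, so that 11 o'clock is 30 degrees and 1 o'clock is 330 degrees. It also assumes that the X and Y gains of each row point towards that row's speaker, which appears to hold for the hexagon table but is our reading of the numbers rather than a stated rule. The function names ring_azimuths and row_azimuth are hypothetical.

```python
import math


def ring_azimuths(speaker_count):
    """Azimuths (degrees, anticlockwise from front) for a regular ring
    with no speaker dead ahead, starting just left of front."""
    step = 360.0 / speaker_count
    return [step / 2.0 + step * index for index in range(speaker_count)]


def row_azimuth(x_gain, y_gain):
    """Direction implied by a decoder row's X and Y gains, in [0, 360)."""
    return math.degrees(math.atan2(y_gain, x_gain)) % 360.0


# X and Y gains copied from the Second Order Hexagon table, in row order.
HEXAGON_XY = [
    (0.1987, 0.1147),    # Front Left
    (0.0000, 0.2294),    # Left
    (-0.1987, 0.1147),   # Back Left
    (-0.1987, -0.1147),  # Back Right
    (0.0000, -0.2294),   # Right
    (0.1987, -0.1147),   # Front Right
]

expected = ring_azimuths(6)
implied = [row_azimuth(x, y) for x, y in HEXAGON_XY]
```

Here expected is 30, 90, 150, 210, 270 and 330 degrees, and the angles implied by the table agree to within rounding of the four-decimal coefficients. Calling ring_azimuths(8) gives a first speaker at 22.5 degrees, which matches the "roughly at 11 o'clock" wording used for the octagon.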

Third Order Octagon

The speakers here are assumed to be set out anticlockwise, with the first speaker roughly at 11 o'clock and the last one roughly at 1 o'clock. This is an "in-phase" decoder.

      W In     X In     Y In     U In     V In     P In     Q In
NNW   0.1768   0.1732   0.0718   0.0530   0.0530   0.0048   0.0115
WNW   0.1768   0.0718   0.1732  -0.0530   0.0530  -0.0115  -0.0048
WSW   0.1768  -0.0718   0.1732  -0.0530  -0.0530   0.0048  -0.0115
SSW   0.1768  -0.1732   0.0718   0.0530  -0.0530  -0.0115   0.0048
SSE   0.1768  -0.1732  -0.0718   0.0530   0.0530  -0.0048  -0.0115
ESE   0.1768  -0.0718  -0.1732  -0.0530   0.0530   0.0115   0.0048
ENE   0.1768   0.0718  -0.1732  -0.0530  -0.0530  -0.0048   0.0115
NNE   0.1768   0.1732  -0.0718   0.0530  -0.0530   0.0115  -0.0048

First Order Cube

This is a first order "in-phase" decoder.

                    W In     X In     Y In     Z In
Front Lower Left    0.1768   0.0722   0.0722  -0.0722
Front Upper Left    0.1768   0.0722   0.0722   0.0722
Back Lower Left     0.1768  -0.0722   0.0722  -0.0722
Back Upper Left     0.1768  -0.0722   0.0722   0.0722
Back Lower Right    0.1768  -0.0722  -0.0722  -0.0722
Back Upper Right    0.1768  -0.0722  -0.0722   0.0722
Front Lower Right   0.1768   0.0722  -0.0722  -0.0722
Front Upper Right   0.1768   0.0722  -0.0722   0.0722

Virtual Microphones

It's possible to extract simple microphone responses from the B-Format stream. This is particularly relevant when decoding for stereo.

  • For an omnidirectional response, simply use the W channel and multiply it by 1.4142.
  • For a figure-of-eight response, take the scalar product of the microphone direction vector and the X, Y and Z components (for instance, a front-facing response would multiply <1,0,0> by <X,Y,Z>, giving 1X+0Y+0Z=X as the output).
  • For a cardioid response, add the omnidirectional and figure-of-eight responses together (you can vary this to produce hypercardioid responses etc.).
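The three recipes above can be sketched in a few lines of Python. The function names and the omni_weight and eight_weight parameters are hypothetical, introduced only for this illustration. The direction is assumed to be a unit vector, with positive Y pointing left (which is what the sign of the Y gains in the stereo table suggests).

```python
import math


def omni(w):
    """Omnidirectional response: the W channel scaled by 1.4142."""
    return math.sqrt(2.0) * w


def figure_of_eight(direction, x, y, z):
    """Scalar product of the microphone direction with <X, Y, Z>."""
    dx, dy, dz = direction
    return dx * x + dy * y + dz * z


def virtual_microphone(direction, w, x, y, z,
                       omni_weight=1.0, eight_weight=1.0):
    """Weighted sum of the omni and figure-of-eight responses.

    Equal weights give the cardioid described above. Other ratios give
    other patterns (for example, less omni tends towards hypercardioid).
    """
    return (omni_weight * omni(w)
            + eight_weight * figure_of_eight(direction, x, y, z))


# One arbitrary B-Format sample frame, used only to exercise the functions.
w, x, y, z = 0.3, 0.2, -0.4, 0.1

front_eight = figure_of_eight((1.0, 0.0, 0.0), x, y, z)          # equals X
left_cardioid = virtual_microphone((0.0, 1.0, 0.0), w, x, y, z)
left_stereo = 0.5 * left_cardioid  # half-gain, left-facing cardioid
```

With the gain halved, the left-facing cardioid works out to 0.7071 W + 0.5 Y, which is the Left row of the Simple M+S Stereo table.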

It is tempting to build multichannel decoders by feeding simple virtual microphone responses in the directions of the various speakers. This does not work well, particularly for speaker layouts that are not regular.