Conclusion – Reflections on Immersive Music Production

This project set out to explore how immersive audio formats can be used as an integral part of music production rather than as an additional or purely technical layer. Over the course of the project, it became clear that working in 3D audio fundamentally affects compositional, arrangement-related, and production decisions. Spatial considerations do not emerge only at the mixing stage, but influence songwriting, recording strategies, and performance choices from an early point onward.

A central insight of the project is that spatial width and motion are most effective when used deliberately and in contrast to more focused passages. Excessive or constant spatial expansion can reduce musical impact, whereas controlled changes in spatial density and focus can significantly enhance the perceived energy of specific song sections. In this context, immersive audio proved particularly valuable for shaping structural contrasts, clarifying arrangements, and reducing perceptual masking through spatial distribution rather than aggressive spectral processing.

From a technical perspective, the comparative use of Ambisonics and Dolby Atmos workflows provided valuable insights into different production philosophies. Ambisonics offered a flexible and performance-efficient environment for exploratory spatial work, while Dolby Atmos proved especially practical for structured production workflows and distribution on current streaming platforms. Neither approach emerged as universally superior; instead, their strengths depended on artistic intent, playback context, and production requirements.

Overall, the project demonstrates that immersive audio can serve as a meaningful compositional and narrative tool in contemporary music production—provided that spatial decisions remain grounded in musical intention and listener perception. Rather than treating 3D audio as a novelty, this work argues for its thoughtful integration as an expressive dimension that supports, rather than overshadows, the music itself.

Acknowledgements

I would like to sincerely thank Alois Sontacchi for his continuous support throughout this project. Our discussions were consistently insightful and inspiring, not only in relation to this work, but also beyond its immediate scope. A special thanks also goes to Benjamin Pohler, who was always available for short (or longer) conversations and quick exchanges of ideas.

Workflow Comparison: Ambisonics vs. Dolby Atmos

Based on practical experience gained throughout the project, both workflows revealed distinct strengths and limitations that influenced artistic decisions, technical handling, and playback outcomes.

One noticeable difference concerned vertical spatial resolution. In the Ambisonics workflow, access to a continuous vertical sound field allowed for more flexible and coherent vertical movements. In contrast, the Dolby Atmos setup used in this project did not include a top center speaker. This limitation became particularly apparent in sections where vertical motion played a structural or emotional role, such as moments where sound elements were intended to move upwards. During playback in the Cube, this difference was emphasized further, as the upper loudspeaker layer consists of five speakers that could not all be individually addressed using the chosen Dolby Atmos configuration.

Despite this limitation, the Dolby Atmos workflow proved to be highly efficient and reliable. The integration of the Dolby Atmos Renderer directly into Cubase and Nuendo allowed for seamless monitoring across different loudspeaker layouts, as well as quick evaluation of stereo downmixes and binaural renders. This level of integration significantly simplified workflow management and made it easy to check translation across formats within a familiar DAW environment.

In comparison, working with Ambisonics in Reaper was considerably more performance efficient. Even with large sessions consisting of 120 to 150 tracks, CPU usage remained comparatively low. The IEM Plugin Suite offered a powerful and intuitive set of tools for spatial encoding and decoding, reverberation, and sound design, enabling many creative possibilities with minimal system load. This made Ambisonics particularly suitable for exploratory work and complex spatial experimentation.

Another key difference lay in signal organization and processing philosophy. The Ambisonics workflow encouraged early grouping and encoding strategies. The Dolby Atmos workflow, on the other hand, offered greater flexibility for multichannel summing and corrective processing at the subgroup level, particularly through the use of multichannel-capable plugins. While both approaches were effective, they led to different working habits and influenced how spatial and tonal decisions were made during mixing.

From a distribution perspective, the Dolby Atmos workflow proved to be more practical. At the time of writing, immersive music releases on major streaming platforms require delivery in the ADM format. Working directly within a Dolby Atmos environment allows for a straightforward ADM export that aligns with current industry standards for music distribution. This made the Dolby-based workflow particularly suitable for release-oriented productions, whereas Ambisonics workflows typically require additional conversion steps before meeting platform-specific delivery requirements.

Overall, neither workflow proved universally superior. Instead, each approach offered specific advantages depending on artistic intent, technical requirements, and playback context. The comparative use of both workflows throughout the project contributed significantly to a deeper understanding of immersive music production practices.

Practical Limitations and Session Transfer Issues

Although not directly related to the spatial workflows themselves, practical challenges arose during the transfer of sessions to the production studio system. Due to compatibility issues between different versions of the FabFilter plugins (notably Pro-Q 3 and Pro-Q 4), session interchange became unexpectedly time-consuming.

Sessions created with older plugin versions could not be opened using newer versions, and vice versa. Attempts to work around this limitation, such as using user presets, were unsuccessful, requiring all equalization settings to be recreated manually. This significantly increased preparation time and highlighted an often-overlooked aspect of production workflows: plugin version compatibility across different systems.

EAR Production Suite Experiments

As part of the ongoing series on spatial mixing approaches in practice, this post focuses on experimental tests conducted with the EAR Production Suite (EPS). These experiments were carried out at a late stage of the project and aimed to explore alternative ADM-based playback and conversion workflows.

The tests took place during the weekend prior to the final presentation, which significantly limited the time available for extended troubleshooting and deeper investigation.

The EAR Production Suite is a set of VST plugins developed by BBC R&D and IRT under the EBU, designed to enable immersive audio production using the Audio Definition Model (ADM). It allows for importing, exporting, and monitoring ADM content for various loudspeaker configurations based on ITU-R BS.2051, using the ITU ADM Renderer. The suite is primarily optimized for Reaper and serves as a reference implementation for ADM-based workflows[1].
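Since ADM is ultimately just XML embedded in a BW64/RF64 file (in the `axml` chunk), one way to sanity-check an export outside the DAW is to read that chunk directly. The following is a minimal Python sketch that lists the audioObject names of an ADM file; the file name is a placeholder, and it assumes chunk sizes fit the 32-bit RIFF size field (i.e., no ds64 indirection is needed):

```python
import struct
import xml.etree.ElementTree as ET

def list_adm_objects(path):
    """List audioObject names from the 'axml' chunk of a WAV/RF64/BW64 file.

    Minimal sketch: assumes the axml chunk size fits the 32-bit size field
    (no ds64 lookup), which holds for typical music-length ADM files.
    """
    with open(path, "rb") as f:
        riff_id, _, form = struct.unpack("<4sI4s", f.read(12))
        if riff_id not in (b"RIFF", b"RF64", b"BW64") or form != b"WAVE":
            raise ValueError("not a RIFF/RF64/BW64 WAVE file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                return []
            chunk_id, size = struct.unpack("<4sI", header)
            if chunk_id == b"axml":
                root = ET.fromstring(f.read(size))
                return [el.get("audioObjectName")
                        for el in root.iter()
                        if el.tag.endswith("audioObject")]
            f.seek(size + (size & 1), 1)  # chunks are word-aligned

print(list_adm_objects("mix_adm.wav"))  # hypothetical file name
```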

Using the EAR Production Suite, I tested alternative playback and conversion approaches, including rendering ADM content into Ambisonics formats. However, during these tests, unexpected behavior occurred, such as excessive spatial spread and routing inconsistencies. Resolving these issues would have required more extensive investigation and testing.

Due to limited working time in the Cube and the need for a fail-safe playback solution, I ultimately decided against further experimentation with the EAR Production Suite in this context. Instead, the fully channel-based rendering approach, as mentioned before, was chosen for all listening examples used in the presentation.


[1] “EAR Production Suite,” accessed February 6, 2026, https://ear-production-suite.ebu.io//.

Dolby Atmos – Workflow Comparison and Technical Reflection

Continuing the series on spatial mixing approaches in practice, this post focuses on the Dolby Atmos workflow I used for Alter Me and Caught In Dreams, and on the practical steps taken to prepare ADM exports and playback in the IEM Cube.

For the Dolby Atmos productions, I decided to work in Cubase and Nuendo, as the Dolby Atmos Renderer is already fully integrated into both environments. This allowed for a streamlined workflow without the need for external rendering tools[1].

After the stereo mixes of Alter Me and Caught In Dreams had reached an advanced stage, the sessions were converted into Dolby Atmos projects. Cubase provides an automated conversion process in which all existing tracks are initially routed into a standard bed configuration.

For my workflow, I used the standard bed primarily for reverberation. I also used an Ambisonics bus with the Room Encoder and the FDN Reverb as a reverb send. Since the standard bed in Dolby Atmos is limited to a maximum configuration of 7.1.2, I deliberately avoided placing direct sound sources in this bed. Instead, I created a so-called object bed: 11 objects placed at the exact positions of the loudspeakers in the IEM production studio, which uses a 7.1.4 configuration.

Routing signals into this object bed allowed me to address individual loudspeakers, provided that the loudspeaker positions were correctly defined. While this spatial correspondence was largely accurate in the production studio, minor deviations remained due to differences between the virtual speaker layout and the physical setup (for example, top speakers mounted at a higher elevation).
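To illustrate what such an object bed looks like in numbers, the sketch below lists nominal positions for the 11 objects (the 7.1.4 layout minus the LFE) in ADM-style Cartesian room coordinates. The exact values are my assumption for illustration; as noted above, the renderer's room model and the physical speaker mounting shift them in practice.

```python
# Nominal object positions mimicking a 7.1.4 loudspeaker layout, expressed in
# ADM-style Cartesian room coordinates (X: -1 left .. +1 right, Y: +1 front ..
# -1 back, Z: 0 ear level .. +1 top). Values are illustrative assumptions,
# not measured positions from the production studio.
OBJECT_BED_714 = {
    "L":   (-1.0,  1.0, 0.0),
    "R":   ( 1.0,  1.0, 0.0),
    "C":   ( 0.0,  1.0, 0.0),
    "Lss": (-1.0,  0.0, 0.0),
    "Rss": ( 1.0,  0.0, 0.0),
    "Lrs": (-1.0, -1.0, 0.0),
    "Rrs": ( 1.0, -1.0, 0.0),
    "Ltf": (-1.0,  0.5, 1.0),
    "Rtf": ( 1.0,  0.5, 1.0),
    "Ltr": (-1.0, -0.5, 1.0),
    "Rtr": ( 1.0, -0.5, 1.0),
}
assert len(OBJECT_BED_714) == 11  # 7.1.4 minus the LFE channel
```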

Subgroup structure and processing

In addition to object-based routing, I made extensive use of subgroups. Instrument groups such as drums, guitars, and vocals were routed into dedicated multichannel buses. For example, the drum signals were routed into a 7.1.4 drum bus, allowing for internal panning decisions as well as group-based processing.

Within these subgroup buses, summing and tonal shaping were carried out using multichannel-capable plugins, primarily from the FabFilter suite. Compared to the Ambisonics workflow, this approach provided greater flexibility for summing and corrective processing at the group level, while the overall structural logic of the routing remained similar.

Signals involving pronounced movement or spatial automation were routed directly to objects. In cases where a sound source only changed position briefly within a song, the track was often routed into the object bed and automated using the track’s multipanner rather than being continuously treated as a Dolby Atmos object.

LFE handling

The Low Frequency Effects (LFE) channel was deliberately not used in this workflow. Although the LFE channel is part of the standard Dolby Atmos workflow, it is often omitted in music production. Excluding it kept the separation between the standard bed and the object bed clear, since any signal intended for the LFE channel must be routed through the bed. This decision helped maintain a clean and predictable routing structure.

Export and playback preparation for IEM CUBE

At the end of the production process, an ADM file was rendered directly from Cubase. For playback preparation in the Cube, several approaches were tested with the goal of ensuring a stable and reliable setup for the final presentation of this project.

The ADM file was imported into Nuendo and up-rendered to a 9.1.6 configuration. At the time of production, I was not aware that the Nuendo version in the production studio also supported a 9.1.6 setup. In retrospect, creating the object bed directly in 9.1.6 would have been the more precise solution.

The up-rendered 9.1.6 mix was then exported as a channel-based 16-channel WAV file. This file was routed manually and directly to the corresponding loudspeakers in the Cube, ensuring full control over playback and eliminating potential uncertainties related to rendering or decoding behavior.
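For the manual patch, a simple channel map is enough. The sketch below shows one plausible ordering of the 16 channels; the actual order must be verified against the DAW's 9.1.6 bus layout before patching, since orderings differ between hosts.

```python
# Assumed 9.1.6 channel order for the 16-channel WAV export (an assumption
# for illustration -- check the DAW's channel layout before patching).
CH_916 = [
    "L", "R", "C", "LFE",
    "Lss", "Rss", "Lrs", "Rrs",                 # side and rear surrounds
    "Lw", "Rw",                                 # front wides
    "Ltf", "Rtf", "Ltm", "Rtm", "Ltr", "Rtr",   # six-speaker top layer
]

for i, speaker in enumerate(CH_916, start=1):
    print(f"WAV channel {i:2d} -> Cube loudspeaker {speaker}")
```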


References

[1] “Getting Started in Dolby Atmos with Steinberg Cubase and Nuendo,” accessed February 8, 2026, https://professionalsupport.dolby.com/s/article/Getting-Started-in-Dolby-Atmos-with-Steinberg-Cubase-and-Nuendo?language=en_US.

Ambisonics – Workflow Comparison and Technical Reflection

Ambisonics Workflow

When it came to mixing in 3D audio, I decided to begin my first immersive mixing experiments using Ambisonics in Reaper rather than Dolby Atmos. This decision was mainly influenced by the IEM Plugin Suite, which provides intuitive and flexible tools for Ambisonics mixing and made the initial entry into 3D audio more accessible.

I chose to work with fifth-order Ambisonics for this project to achieve a more accurate and immersive rendering of diffuseness, spaciousness, and spatial depth. While first-order Ambisonics might seem sufficient due to the even nature of diffuse sound fields, in practice its low spatial resolution leads to high directional correlation during playback, which significantly impairs the perception of these spatial qualities. Higher-order Ambisonics, in contrast, improves the mapping of uncorrelated signals and preserves spatial impressions much more effectively. Psychoacoustic research has shown that an Ambisonic order of three or higher is required to perceptually preserve decorrelation between neighboring loudspeakers, which is crucial for rendering depth and diffuseness. Fifth-order Ambisonics further enhances this, particularly outside the sweet spot, providing a more consistent spatial experience across a larger listening area. As demonstrated in the IEM CUBE, a fifth-order system allows nearly the entire horizontal listening plane—in this case, a 12 × 10 m concert space—to become a valid and perceptually plausible playback zone [1].

Thus, fifth-order Ambisonics is not only a practical choice for immersive production in larger spaces, but it also strikes an effective balance between spatial resolution, technological complexity, and perceptual benefit [2].
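The practical cost of this choice is channel count: a full-sphere Ambisonics signal of order N occupies (N + 1)² channels on every Ambisonics bus, which is easy to verify:

```python
# Full-sphere (3D) Ambisonics of order N uses (N + 1)^2 channels.
for order in (1, 3, 5):
    print(f"order {order}: {(order + 1) ** 2} channels")
# order 1: 4 channels
# order 3: 16 channels
# order 5: 36 channels
```

At fifth order, every Ambisonics bus therefore carries 36 channels, which is one reason the routing strategy described further below keeps processing in stereo for as long as possible.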

I also had the opportunity to experience this myself during a small listening test we conducted with Matthias Frank. We listened to first-, third-, and fifth-order Ambisonics in a blind comparison and were asked to rate certain spatial parameters like spatial depth or localization. The first order was quite easy to identify due to its limited spatial resolution. However, distinguishing between third- and fifth-order Ambisonics proved to be much more challenging, as the differences were often subtle and less immediately perceptible.

After that, I began setting up the routing, which turned out to be one of the most underestimated parts of this project. Similar to a traditional stereo production, I created a structure of groups and subgroups, but adapted it for Ambisonics. For example, in the drum section, encoding happens at the main drum group via the IEM Multi Encoder. All individual channels are routed into that group, allowing me to process them using conventional stereo plugins before spatializing them, which saves CPU resources while maintaining flexibility in the early mixing stages.

Within the drum routing, I created subgroups for kick, snare, overheads, and the “Droom”, allowing for finer control and processing. When dealing with coherent signals, such as double-tracked guitars, I first routed both signals (panned hard L & hard R) into a stereo group to conserve CPU power by processing them together. This group is then routed into a master guitar group that handles Ambisonics encoding. Since the L and R signals remain separated, they can still be treated independently in the encoder and placed individually in the 3D field.
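As a sketch, the resulting structure looks roughly like this (track and bus names are hypothetical; encoding to 36 channels happens only at the outermost group):

```python
# Hypothetical session tree: leaves are individual tracks, inner nodes are
# stereo groups for conventional processing, and the top-level groups carry
# the IEM Multi Encoder (5th order = 36 channels).
SESSION_ROUTING = {
    "Drums -> Multi Encoder (36 ch)": {
        "Kick (stereo group)": ["kick_in", "kick_out"],
        "Snare (stereo group)": ["snare_top", "snare_bottom"],
        "Overheads (stereo group)": ["oh_l", "oh_r"],
        "Droom (stereo group)": ["droom_l", "droom_r"],
    },
    "Guitars -> Multi Encoder (36 ch)": {
        "Double-tracked pair (hard L/R stereo group)": ["gtr_take_1", "gtr_take_2"],
    },
}
```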

I followed the same approach with vocals, organizing them into groups before routing them into the Multi Encoder. For specific ad-libs, I used the Granular Encoder to create glitchy, scattered spatial effects.

To add a sense of depth and immersion to the vocals, I used a small amount of FDN Reverb for diffuse reverberation and the Room Encoder for early reflections — all plugins from the IEM Suite.

Finding this optimal signal flow took considerable time and experimentation. It was a major learning process to understand how to best structure a large session for Ambisonics.


References

[1] Franz Zotter and Matthias Frank, Ambisonics: A Practical 3D Audio Theory for Recording, Studio Production, Sound Reinforcement, and Virtual Reality, Springer Topics in Signal Processing (Springer International Publishing, 2019), 19:18–20, https://doi.org/10.1007/978-3-030-17207-7.

[2] Zotter and Frank, Ambisonics, 19:18–20.

Workflow Comparison and Technical Reflection

As part of the ongoing series on spatial mixing approaches in practice, this post shifts the focus from artistic decisions to a technical reflection on the workflows used throughout the project. The following sections outline how different immersive production approaches influenced working methods, creative flexibility, and playback outcomes.

Workflow Overview

This chapter outlines the different production and mixing workflows used throughout the project. While all recordings were carried out using the same studio environment and similar recording setups, two distinct immersive audio workflows were applied during the course of the project.

The first workflow is based on Ambisonics and reflects my initial approach to immersive music production. This workflow was primarily explored during the production of Standby and served as an entry point into working beyond stereo formats.

As the project progressed, a second workflow based on Dolby Atmos was introduced and applied to the subsequent tracks Alter Me and Caught In Dreams. This shift allowed for a comparative evaluation of both approaches in terms of practical handling, artistic possibilities, and production implications.

All projects had about 120–150 individual tracks. Recording was carried out using Cubase and Reaper, depending on the session requirements. Ambisonics mixing was performed in Reaper, while Dolby Atmos productions were realized using Cubase 15 and Nuendo 13. The following blog entries describe both workflows separately, focusing on their respective structures and characteristics.

Motion and Vertical Movement as Structural Tools – Spatial Mixing Approaches in Practice

Continuing the series on spatial mixing approaches in practice, this post focuses on two spatial strategies applied in Caught In Dreams that intentionally challenge listener perception. Both examples explore motion and verticality as expressive devices and examine their role as structural and narrative tools within immersive music production.

Motion as Creative Risk

An experimental spatial decision was made during a two-bar drum fill preceding the second chorus. In this section, the drum signal is rotated around the listener. This moment coincides with the lyric “turning nights into nightmares” and was intended to briefly destabilize the listening perspective.

This decision was approached deliberately as a creative risk. While the movement can be perceived as engaging and expressive, it also raises questions regarding distraction and musical focus. The example was included to provoke reflection on how much spatial motion is appropriate within groove-based music and where the boundary between expressive effect and overuse may lie.

Vertical Movement as Formal Break

A further spatial strategy occurs during a short bridge following the second chorus. This section represents a moment of realization, expressed in the lyrics “I woke up and realized it was just a dream.” At this point, multiple elements—including ride cymbals, guitars, and vocals—are shifted upward in the vertical dimension.

This vertical movement functions as a formal break rather than a continuous effect. After this section, the mix collapses back toward a more frontal and dry presentation, reintroducing a mono-oriented guitar similar to the intro. The contrast emphasizes the narrative shift and prepares the listener for the final section of the song.

The spatial strategies discussed above were realized using two different immersive audio workflows. The following blog posts provide a comparative reflection on these workflows and their implications for music production and playback.

Reduced Masking Through Spatial Placement – Spatial Mixing Approaches in Practice

Caught In Dreams

As part of the ongoing series on spatial mixing approaches in practice, this post shifts the focus from Alter Me to the second track discussed in detail: Caught In Dreams. The following sections outline the song’s emotional context and a key spatial mixing strategy applied during its production.

Song Context and Emotional Arc

Caught In Dreams addresses the realization that certain dreams and ideals can become dangerous illusions. The song reflects a gradual loss of grounding driven by the desire for more, leading to a feeling of being trapped within one’s own expectations. While the track maintains a dreamy and indie-inspired character, it also aims to confront the listener with the consequences of losing balance and perspective.

Reduced Masking Through Spatial Placement

A central advantage of immersive mixing in Caught In Dreams lies in the increased spatial capacity compared to stereo production. By distributing sound sources across multiple loudspeakers rather than concentrating them within a left–right panorama, significantly more space is available. This spatial separation reduces the need for aggressive EQing and helps to minimize masking between competing elements.

As a result, overlapping frequency ranges—for example in the low-mid region—become less problematic, as spatial separation supports perceptual differentiation between sources.

The use of a dedicated center speaker further contributes to this effect. Unlike a phantom center, which relies on equal energy from the left and right channels, a discrete center channel allows the lead vocal to be placed alone in one speaker. This reinforces intelligibility and reduces interference with other centrally positioned elements.
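The underlying pan-law arithmetic is easy to demonstrate. A rough sketch, using noise as a stand-in for a vocal and assuming the common constant-power (-3 dB) pan law:

```python
import numpy as np

# A phantom center feeds the same signal to L and R, typically at -3 dB each
# (constant-power pan law), so the vocal image depends on both paths summing
# coherently at the listener. A discrete center carries the vocal alone.
rng = np.random.default_rng(1)
vocal = rng.standard_normal(48_000)        # 1 s of noise as a stand-in
left = right = vocal * 10 ** (-3 / 20)     # -3 dB per side
phantom_power = np.mean(left**2) + np.mean(right**2)
print(np.isclose(phantom_power, np.mean(vocal**2), rtol=1e-2))  # True: power preserved
```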

A direct comparison between the stereo vocal mix and the immersive version demonstrates that the 3D mix achieves a more open vocal sound with reduced masking, not primarily through equalization, but through spatial distribution. This example highlights how immersive audio can create mix clarity by reallocating elements in space rather than by removing frequency content.

Vocal Arrangement and Spatial Density – Spatial Mixing Approaches in Practice

As part of the ongoing series on spatial mixing approaches in practice, this post continues the analysis of Alter Me. After examining spatial width and impact, the focus now shifts to vocal arrangement and spatial density as key compositional tools in immersive mixing.

Vocal Arrangement and Spatial Density

Vocal production played a central role in my spatial productions and mixes. The lead vocal remains dry and clearly localized in the center channel, providing a stable perceptual anchor throughout the song. Reverberation and delay are routed to the surrounding channels instead, keeping the center itself dry.

In the verses, vocal processing is kept relatively restrained, using slapback delay and reduced reverb to maintain focus. In the chorus, longer delay throws and increased reverberation are introduced to enhance perceived size.

Backing vocals are treated as a spatial and structural element rather than merely as additional layers. In the verses, they are reduced in number, less widely distributed, and processed with minimal reverb. In the chorus, backing vocals become more numerous, more saturated, spatially wider, and more reverberant. This increase in spatial density contributes significantly to the perceived size of the chorus while maintaining a clearly localized lead vocal.

Spatial Width and Impact – Spatial Mixing Approaches in Practice

The following blog posts focus on selected spatial mixing approaches applied in practice during the production of this EP. Rather than providing complete production breakdowns, the emphasis lies on specific spatial decisions that were consciously made to support musical structure, narrative development, and listener perception.

The series begins with Alter Me and examines how spatial width, focus, and contrast were used as compositional tools within an immersive mixing context. Subsequent posts will expand on these ideas by exploring additional spatial strategies applied in other tracks of the project.

Alter Me – Spatial Mixing Decisions

Song Context and Narrative Function

Alter Me is conceived as a dialog with one’s own addiction. The song portrays addiction as an internal voice that initially appears supportive and reassuring, but gradually reveals its manipulative and destructive nature. As the song progresses, this internal conflict becomes more explicit, culminating in an emotional outburst during the chorus.

The spatial design of the track was used to support this narrative by differentiating between internal and external perspectives and by reinforcing contrasts between sections.

Spatial Width and Impact

The introduction of Alter Me consists of a single guitar, a snare roll, and several sustained E-bow layers. These E-bow sounds are spatially distributed and move around the listener, creating a highly immersive and enveloping sound field. The intention was to represent the intrusive and surrounding nature of the “addiction voice” before the band enters.

When the full band enters, the spatial strategy changes noticeably. Drums, bass, and guitars are deliberately focused toward the front, and the overall spatial width is reduced. During production, it became clear that an extremely wide and immersive intro can reduce the perceived impact of the band entry. By slightly narrowing the spatial image before the entry, the contrast between intro and chorus is increased, resulting in a stronger sense of impact and energy.

This observation was particularly noticeable during studio monitoring and binaural listening. Interestingly, playback in the Cube emphasized different aspects of this contrast, highlighting how playback environments can influence spatial perception.

Spatial width is further enhanced by layering multiple, largely uncorrelated signals: different performances and variations in timing, timbre, and spatial position contribute to a wider and more complex spatial image.
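A crude way to see why this works is to compare the correlation of identical versus slightly offset layers; a minimal sketch, using white noise as a stand-in for doubled performances:

```python
import numpy as np

# Even a small timing offset between two otherwise similar layers collapses
# their correlation, which the ear reads as width. White noise stands in for
# a doubled performance; real takes also differ in timbre and pitch.
rng = np.random.default_rng(0)
take1 = rng.standard_normal(48_000)
take2 = np.roll(take1, 240)               # ~5 ms shift at 48 kHz

def corr(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(round(corr(take1, take1), 3))  # -> 1.0 (identical layers: mono, narrow)
print(round(corr(take1, take2), 3))  # -> close to 0 (decorrelated: wide)
```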