Exploring the Edges of Concert Design: Between Practice and Research

Title image: Luis Miehlich, “Cartographies – Ein Halbschlafkonzert (2023) – Pieces for Ensemble, Electronics & Video,” luismiehlich, accessed May 25, 2025, https://luismiehlich.com/.

In addition to developing the technical toolset, I've started to dig deeper into the research side of my project, trying to better understand the evolving field my creative and technical work inhabits. What started as an effort to clarify the conceptual underpinnings of my practical project turned into a broader exploration of a field that is, in many ways, still defining itself: concert design.

This term may sound straightforward, but its scope is definitely not. Concert design is not just about programming a setlist or choosing a venue; it is about crafting the entire experiential and spatial context of a performance. It treats every element of the concert as creative material: from basic decisions like seating arrangements (or why not lie down instead?) to interactivity, from sonic spatialization to the architecture of the space. Everything is understood as part of the material designers can work with.

A Field Still Taking Shape

What struck me early on is how fragmented this field still is. There are, of course, technical resources on specific aspects such as stage lighting, but only a handful of academic sources explicitly use the term concert design in this holistic sense, and even fewer attempt to define it systematically. Among them, Martin Tröndle stands out for his effort to create a structured framework through the emerging field of Concert Studies. Another name, more on the practical side, is Folkert Uhde.

Yet, when looking beyond academic texts, I found countless artistic projects that embody the principles of concert design even if their creators never labeled them as such. Here I want to point out the ambient scene, from early experiments and non-academic reflections by Brian Eno up to very recent formats by Luis Miehlich, for example. This suggests a noticeable gap: while practice is vibrant and evolving, theoretical reflection and a shared language are still catching up.

Research Process

To navigate this space, I tried out different keywords targeting disciplinary intersections, such as “immersive performance,” “audience interaction,” and “spatial dramaturgy.”

This led me to other fields that offer interesting work worth getting into:

Theater studies turned out to be a goldmine, offering both practical and theoretical insights into spatial and participatory performance. There is a whole tradition here, featuring big names like Bertolt Brecht.

But what really surprised me, even though it might seem obvious in hindsight, was the relevance of game design. Its inherently interactive nature naturally shapes the work with sound and music. The spaces in which players interact may be virtual, but the recipients' interaction with their surroundings still has to be considered during the design process. I think there is huge potential to examine here as well, though it opens the frame to an extent that exceeds this project.

Future Steps: From Reflection to Contribution

The more I researched, the clearer it became that I cannot simply rely on existing research. One way to deal with that is to contribute to the field as both a designer and a researcher, for example in the following ways:

  • Provide an overview of the evolving field, both as a practical discipline and as an academic one; this can serve as a starting point.
  • Reach out to leading voices in the field (e.g., Martin Tröndle, Experimental Concert Research) for interviews; these conversations feed directly into the next point.
  • Identify needs and gaps from the perspective of practitioners and researchers: What do they lack? What could help them frame, evaluate, or communicate their work?

Ultimately, this could lead to the development of a manual or evaluation guide; something that can serve as a conceptual and practical tool for artists and designers and help them contribute to the exploration of performative spatial sound and the field of concert design.

From Sound Design to Concert Design

This research journey runs in parallel to my technical development of a spatial sound toolkit (→ previous blog entry), but it also stands on its own. It is an interesting experience for me to locate my work within a broader context and to build a bridge between my individual artistic practice and shared disciplinary structures. This might not be my future field of work, but I feel I can take this approach of locating my work with me as a strategy and apply it to future projects, both to elevate them and to communicate them better to outsiders.

Sources:

Martin Tröndle, ed., Das Konzert II: Beiträge zum Forschungsfeld der Concert Studies (Bielefeld: transcript Verlag, 2018), https://doi.org/10.1515/9783839443156.

“Folkert Uhde Konzertdesign,” accessed May 25, 2025, https://www.folkertuhdekonzertdesign.de/.

Brian Eno, “Ambient Music,” in Audio Culture: Readings in Modern Music, ed. Christoph Cox and Daniel Warner (New York: Continuum, 2004).

Luis Miehlich, “Cartographies – Ein Halbschlafkonzert (2023) – Pieces for Ensemble, Electronics & Video,” luismiehlich, accessed May 25, 2025, https://luismiehlich.com/.

“Re-Cartographies, by Luis Miehlich,” Bandcamp, accessed May 25, 2025, https://woolookologie.bandcamp.com/album/re-cartographies.

Prototyping V: Image Extender – Image sonification tool for immersive perception of sounds from images and new creation possibilities

Integration of AI object recognition into the automated audio file search process:

After setting up the initial interface for the freesound.org API and confirming that everything works with test tags and basic search filters, the next major milestone is now in motion: AI-based object recognition using the Google Gemini API.

The idea is to feed in an image (or a batch of them), let the AI detect what’s in it, and then use those recognized tags to trigger an automated search for corresponding sounds on freesound.org. The integration already loads the detected tags into an array, which is then automatically passed on to the sound search. This allows the system to dynamically react to the content of an image and search for matching audio files — no manual tagging needed anymore.
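To make this pipeline a bit more concrete, here is a minimal sketch of how the glue between image, AI tags, and sound search could look in Python. It is only an illustration under my own assumptions: the model name, the prompt wording, and the `search_freesound_for_tags` helper are placeholders, not the actual project code.

```python
# Sketch: image -> object tags (Gemini) -> automated freesound search.
# Assumes the google-generativeai package; model name and helper are placeholders.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_GEMINI_API_KEY")       # assumption: key handling
model = genai.GenerativeModel("gemini-1.5-flash")     # assumption: model name

def detect_tags(image_path: str) -> list[str]:
    """Ask the model for a comma-separated list of visible objects."""
    image = Image.open(image_path)
    prompt = ("List the objects visible in this image as a short, "
              "comma-separated list of single words.")
    response = model.generate_content([prompt, image])
    return [tag.strip().lower() for tag in response.text.split(",") if tag.strip()]

tags = detect_tags("test_images/street_scene.jpg")    # e.g. ["car", "tree", "bird"]
# The tag array is then handed to the automated sound search,
# e.g. search_freesound_for_tags(tags)  (hypothetical helper).
```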

So far, the detection is working pretty reliably for general categories like “bird”, “car”, “tree”, etc. But I’m looking into models or APIs that offer more fine-grained recognition. For instance, instead of just “bird”, I’d like it to say “sparrow”, “eagle”, or even specific songbird species if possible. This would make the whole sound mapping feel much more tailored and immersive.

A list of test images still has to be prepared, but there is already a testing matrix covering different objects, situations, scenery, and technical variations.

On the freesound side, I’ve got the basic query parameters set up: tag search, sample rate, file type, license, and duration filters. There’s room to expand this with additional parameters like rating, bit depth, and maybe even a random selection toggle to avoid repetition when the same tag comes up multiple times.
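As a rough sketch of how such a query could be assembled (the filter field names reflect my reading of the freesound API documentation and should be double-checked; the API key handling is simplified):

```python
# Sketch: freesound text search with the filters mentioned above.
# Filter field names (samplerate, type, license, duration, avg_rating, bitdepth)
# are my reading of the freesound API docs and should be verified.
import random
import requests

API_URL = "https://freesound.org/apiv2/search/text/"
API_KEY = "YOUR_FREESOUND_API_KEY"

def search_sound(tag: str, randomize: bool = True):
    filters = " ".join([
        "samplerate:48000",
        "type:wav",
        'license:"Creative Commons 0"',
        "duration:[1 TO 15]",
        # possible extensions mentioned above:
        # "avg_rating:[4 TO *]",
        # "bitdepth:24",
    ])
    params = {
        "query": tag,
        "filter": filters,
        "fields": "id,name,previews,license,duration",
        "token": API_KEY,
    }
    results = requests.get(API_URL, params=params, timeout=10).json().get("results", [])
    if not results:
        return None
    # Random selection toggle to avoid always returning the same file per tag.
    return random.choice(results) if randomize else results[0]
```

The commented-out lines stand in for the planned rating and bit-depth filters, and the `randomize` flag sketches the random selection toggle.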

Coming up: I’ll be working on whether to auto-play or download the selected audio files, and starting to test how the AI-generated tags influence the mood and quality of the soundscape. The long-term plan includes layering sounds, adjusting volumes, experimenting with EQ and filtering — all to make the playback more natural and immersive.
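One possible interim answer to the auto-play vs. download question is to fetch the MP3 preview that, as far as I can tell, freesound returns with every search result, and then hand the saved file to a local player. The preview field name is my assumption about the response format:

```python
# Sketch: download the HQ MP3 preview of a search result (field name assumed
# from the freesound API response format) so it can be played back locally.
import requests

def download_preview(sound_result: dict, out_path: str) -> str:
    preview_url = sound_result["previews"]["preview-hq-mp3"]   # assumption
    audio = requests.get(preview_url, timeout=30)
    audio.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(audio.content)
    return out_path

# downloaded = download_preview(search_sound("bird"), "previews/bird.mp3")
# The saved file could then be auto-played with any local audio backend.
```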

Prototyping IV: Image Extender – Image sonification tool for immersive perception of sounds from images and new creation possibilities

Tests on automated audio file search via the freesound.org API:

For further use in the automated audio file search for recognized objects, I tested the freesound.org API and programmed a first interface for testing purposes. The first thing I had to do was request an API key from freesound.org. After that I noticed a point worth thinking about for my project: the key allows 5000 requests per year, so I will research possibilities for extending this. For testing, 5000 is more than enough.

The current code already searches with a few test tags and offers options to filter the searches by sample rate, duration, license, and file type. More filter options might be added next, such as rating, bit depth, and perhaps random file selection so the result is not always the same for a given tag.

Next steps also include either downloading the selected file or just playing it automatically. Then there will be tests on using the tags from the AI image recognition code for this automated search. Later in the process I will have to figure out the playback of multiple files, volume staging, and filtering or EQing methods to deal with masking effects, etc.

Test GUI for automated sound searching via the freesound.org API

IRCAM Forum Workshops 2025 – Promenade Sonore

Sound Meets the City: Nadine Schütz’s Promenade Sonore Transforms a Footbridge into a Living Instrument

We first encountered Nadine Schütz’s fascinating work during her presentation at the IRCAM Forum Workshops 2025, where she introduced her project Promenade Sonore: Vent, Soleil, Pluie (“Wind, Sun, Rain”). The talk offered deep insights into her creative process and the technical and ecological thinking behind the installation.

In the heart of Saint-Denis, just north of Paris, Swiss sound artist Nadine Schütz has reimagined the way we move through and experience urban space. Her project Promenade Sonore: Vent, Soleil, Pluie (“Wind, Sun, Rain”) is not just a public art installation—it’s a multi-sensory experience that turns an ordinary walk across a footbridge into an acoustic encounter with the environment.

Commissioned by Plaine Commune and developed in close collaboration with architect-engineer Marc Mimram, the installation is located on the Pleyel footbridge, a key link between the neighborhoods of Pleyel and La Plaine. Rather than adding passive sound or music, Schütz has embedded three sculptural sound instruments directly into the architecture of the bridge, each one activated by a different natural element: wind, sun, and rain.

These instruments aren’t just symbolic; they actually respond to the environment in real time. Wind passes through a metal structure that produces soft, organ-like tones. When sunlight hits specific points, it activates solar-powered chimes or sound emitters. During rainfall, the structure becomes percussive, resonating with the rhythm of droplets. The bridge becomes a living, breathing instrument that reacts to weather conditions, turning nature into both performer and composer.

What makes Promenade Sonore truly compelling is how seamlessly it blends technology, ecology, and design. It’s not loud or intrusive—it doesn’t drown out the urban soundscape. Instead, it subtly enhances the auditory experience of the city, encouraging passersby to slow down and listen. It transforms a utilitarian space into a space of poetic reflection.

Schütz’s work is rooted in the idea that sound can deepen our connection to place. In this project, she brings attention to the sonic qualities of weather and architecture—things we often overlook in our fast-paced, screen-driven lives. The soundscape is never the same twice: it shifts with the wind, the angle of the sun, or the mood of the rain. Every walk across the bridge is a unique composition.

More than just an artistic gesture, Promenade Sonore is part of a broader vision of “land-sound” design—a practice Schütz has pioneered that treats sound as an essential component of landscape and urban planning. In doing so, she challenges traditional boundaries between art, science, and infrastructure.

Visit to the Pleyel footbridge

We had the chance to visit the Pleyel footbridge ourselves—and it was a one-of-a-kind experience. Walking across the bridge, immersed in the subtle interplay of environmental sound and sculptural form, was both meditative and inspiring. While on site, we also conducted our own field recordings to capture the dynamic soundscape as it unfolded in real time. Listening through headphones, the bridge became even more alive—each gust of wind, each shifting light pattern, each ambient tone weaving into a delicate, ever-changing composition.

IRCAM Forum Workshops 2025 – ACIDS

From 26 to 28 March, we (the Sound Design master's programme, second semester) had the incredible opportunity to visit IRCAM (Institut de Recherche et Coordination Acoustique/Musique) in Paris as part of a student excursion. For anyone passionate about sound, music technology, and AI, visiting IRCAM is like stepping into new fields of research and discussion and seeing prototypes in action. One of my personal highlights was learning about the ACIDS team (Artificial Creative Intelligence and Data Science) and their research projects, RAVE (Real-time Audio Variational autoEncoder) and AFTER (Audio Features Transfer and Exploration in Real-time).

ACIDS – Team

The ACIDS team is a multidisciplinary group of researchers working at the intersection of machine learning, sound synthesis, and real-time audio processing, with a broad focus on computational audio research. During our visit, they gave us an inside look at their latest developments, including demonstrations from the IRCAM Forum Workshops (March 26–28, 2025), where they showcased some of their most exciting advancements. Besides their very good and catchy (and also a bit funny) presentation, I want to showcase two of their projects.

RAVE (Real-Time Neural Audio Synthesis)

One of the most impressive projects we explored was RAVE (Real-time Audio Variational autoEncoder), a deep learning model for high-quality audio synthesis and transformation. Unlike traditional digital signal processing, RAVE uses a latent space representation of sound, allowing for intuitive and expressive real-time manipulation.

Overall architecture of the proposed approach. Blocks in blue are the only ones optimized, while blocks in grey are fixed or frozen operations.

Key Innovations

  1. Two-Stage Training:
    • Stage 1: Learns compact latent representations using a spectral loss.
    • Stage 2: Fine-tunes the decoder with adversarial training for ultra-realistic audio.
  2. Blazing Speed:
    • Runs 20× faster than real-time on a laptop CPU, thanks to a multi-band decomposition technique.
  3. Precision Control:
    • Post-training latent space analysis balances reconstruction quality vs. compactness.
    • Enables timbre transfer and signal compression (2048:1 ratio).

Performance

  • Outperforms NSynth and SING in audio quality (MOS: 3.01 vs. 2.68/1.15) with fewer parameters (17.6M).
  • Handles polyphonic music and speech, unlike many restricted models.

You can explore RAVE’s code and research on their GitHub repository and learn more about its applications on the IRCAM website.
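To give a rough idea of how this looks in practice: RAVE models can be exported as TorchScript files which, as far as I understand the project's documentation, can then be loaded for offline timbre transfer roughly like this. The model file name is a placeholder, and the exact tensor shapes and encode/decode interface should be checked against the repository.

```python
# Sketch: timbre transfer with an exported RAVE model (TorchScript).
# The encode/decode interface reflects my understanding of exported models;
# "vintage.ts" is a placeholder file name.
import torch
import torchaudio

model = torch.jit.load("vintage.ts").eval()

audio, sr = torchaudio.load("input.wav")            # (channels, samples)
x = audio.mean(dim=0, keepdim=True).unsqueeze(0)    # mono, shape (1, 1, samples)

with torch.no_grad():
    z = model.encode(x)    # compress the signal into the latent space
    # latent manipulations (interpolation, bias, noise) could happen here
    y = model.decode(z)    # resynthesize audio from the latent trajectory

# Assumes the input sampling rate matches the model's training rate.
torchaudio.save("output.wav", y.squeeze(0), sr)
```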

AFTER

While many AI audio tools focus on raw sound generation, what sets AFTER apart is its sophisticated control mechanisms—a priority highlighted in recent research from the ACIDS team. As their paper states:

“Deep generative models now synthesize high-quality audio signals, shifting the critical challenge from audio quality to control capabilities. While text-to-music generation is popular, explicit control and example-based style transfer better capture the intents of artists.”

How AFTER Achieves Precision

The team’s breakthrough lies in separating local and global audio information:

  • Global (timbre/style): Captured from a reference sound (e.g., a vintage synth’s character).
  • Local (structure): Controlled via MIDI, text prompts, or another audio’s rhythm/melody.

This is enabled by a diffusion autoencoder that builds two disentangled representation spaces, enforced through:

  1. Adversarial training to prevent overlap between timbre and structure.
  2. A two-stage training strategy for stability.

Detailed overview of the method: input signals are passed to structure and timbre encoders, which provide semantic encodings that are further disentangled through confusion maximization. These are used to condition a latent diffusion model to generate the output signal. Input signals are identical during training but distinct at inference.

Why Musicians Care

In tests, AFTER outperformed existing models in:

  • One-shot timbre transfer (e.g., making a piano piece sound like a harp).
  • MIDI-to-audio generation with precise stylistic control.
  • Full “cover version” generation—transforming a classical piece into jazz while preserving its melody.

Check out AFTER’s progress on GitHub and stay updated via IRCAM’s research page.

References

Caillon, Antoine, and Philippe Esling. “RAVE: A Variational Autoencoder for Fast and High-Quality Neural Audio Synthesis.” arXiv preprint arXiv:2111.05011 (2021). https://arxiv.org/abs/2111.05011.

Demerle, Nils, Philippe Esling, Guillaume Doras, and David Genova. “Combining Audio Control and Style Transfer Using Latent Diffusion.” 

Prototyping III: Image Extender – Image sonification tool for immersive perception of sounds from images and new creation possibilities

Research on sonification of images / video material and different approaches – focus on RGB

The paper by Kopecek and Ošlejšek presents a system that enables visually impaired users to perceive color images through sound using a semantic color model. Each primary color (such as red, green, or blue) is assigned a unique sound, and colors in an image are approximated by the two closest primary colors. These are represented through two simultaneous tones, with volume indicating the proportion of each color. Users can explore images by selecting pixels or regions using input devices like a touchscreen or mouse. The system calculates the average color of the selected area and plays the corresponding sounds. Distinct audio cues indicate image boundaries, and sounds can be either synthetic or instrument-based, with timbre and pitch helping to differentiate them. Users can customize colors and sounds for a more personalized experience. This approach allows for dynamic, efficient exploration of images and supports navigation via annotated SVG formats.

Image separation by Kopecek and Ošlejšek
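To illustrate the two-tone idea in code, here is a small sketch of my own: a pixel colour is approximated by its two closest primaries, and their assigned tones are mixed with volumes reflecting how close each primary is. The palette, frequencies, and inverse-distance weighting are my own simplifications, not the exact model from the paper.

```python
# Sketch of the two-tone idea: approximate a pixel colour by its two closest
# primaries and mix their assigned tones with volumes reflecting the proportions.
# Palette, frequencies, and weighting are illustrative choices only.
import numpy as np

PRIMARIES = {          # primary colour -> assigned tone frequency (Hz), assumed
    "red":   ((255, 0, 0), 440.0),
    "green": ((0, 255, 0), 660.0),
    "blue":  ((0, 0, 255), 880.0),
}

def sonify_pixel(rgb, duration=0.5, sr=44100):
    """Return an audio buffer mixing the tones of the two closest primaries."""
    dists = {name: np.linalg.norm(np.array(rgb) - np.array(c))
             for name, (c, _) in PRIMARIES.items()}
    first, second = sorted(dists, key=dists.get)[:2]
    # Closer primary -> louder tone (inverse-distance weighting).
    w1, w2 = 1.0 / (dists[first] + 1e-6), 1.0 / (dists[second] + 1e-6)
    w1, w2 = w1 / (w1 + w2), w2 / (w1 + w2)
    t = np.linspace(0, duration, int(sr * duration), endpoint=False)
    tone = lambda f: np.sin(2 * np.pi * f * t)
    return w1 * tone(PRIMARIES[first][1]) + w2 * tone(PRIMARIES[second][1])

buffer = sonify_pixel((200, 90, 30))   # a reddish-orange pixel -> mostly "red" tone
```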

The review by Sarkar, Bakshi, and Sa offers an overview of various image sonification methods designed to help visually impaired users interpret visual scenes through sound. It covers techniques such as raster scanning, query-based, and path-based approaches, where visual data like pixel intensity and position are mapped to auditory cues. Systems like vOICe and NAVI use high and low-frequency tones to represent image regions vertically. The paper emphasizes the importance of transfer functions, which link image properties to sound attributes such as pitch, volume, and frequency. Different rendering methods—like audification, earcons, and parameter mapping—are discussed in relation to human auditory perception. Special attention is given to color sonification, including the semantic color model introduced by Kopecek and Ošlejšek, which improves usability through clearly distinguishable tones. The paper also explores applications in fields such as medical imaging, algorithm visualization, and network analysis, and briefly touches on sound-to-image conversions.

Principles of the image-to-sound mapping

Matta, Rudolph, and Kumar propose the theoretical system “Auditory Eyes,” which converts visual data into auditory and tactile signals to support blind users. The system comprises three main components: an image encoder that uses edge detection and triangulation to estimate object location and distance; a mapper that translates features like motion, brightness, and proximity into corresponding sound and vibration cues; and output generators that produce sound using tools like Csound and tactile feedback via vibrations. Motion is represented using effects like Doppler shift and interaural time difference, while spatial positioning is conveyed through head-related transfer functions. Brightness is mapped to pitch, and edges are conveyed through tone duration. The authors emphasize that combining auditory and tactile information can create a richer and more intuitive understanding of the environment, making the system potentially very useful for real-world navigation and object recognition.

References

Kopecek, Ivan, and Radek Ošlejšek. 2008. “Hybrid Approach to Sonification of Color Images.” In Third 2008 International Conference on Convergence and Hybrid Information Technology, 721–726. IEEE. https://doi.org/10.1109/ICCIT.2008.152.

Sarkar, Rajib, Sambit Bakshi, and Pankaj K Sa. 2012. “Review on Image Sonification: A Non-visual Scene Representation.” In 1st International Conference on Recent Advances in Information Technology (RAIT-2012), 1–5. IEEE. https://doi.org/10.1109/RAIT.2012.6194495.

Matta, Suresh, Heiko Rudolph, and Dinesh K Kumar. 2005. “Auditory Eyes: Representing Visual Information in Sound and Tactile Cues.” In Proceedings of the 13th European Signal Processing Conference (EUSIPCO 2005), 1–5. Antalya, Turkey. https://www.researchgate.net/publication/241256962.

Prototyping II: Image Extender – Image sonification tool for immersive perception of sounds from images and new creation possibilities

Expanded research on sonification of images / video material and different approaches:

Yeo and Berger (2005) write in “A Framework for Designing Image Sonification Methods” about the challenge of mapping static, time-independent data like images into the time-dependent auditory domain. They introduce two main concepts: scanning and probing. Scanning follows a fixed, pre-determined order of sonification, whereas probing allows for arbitrary, user-controlled exploration. The paper also discusses the importance of pointers and paths in defining how data is mapped to sound. Several sonification techniques are analyzed, including inverse spectrogram mapping and the method of raster scanning (which was already explained in the Prototyping I blog entry), with examples illustrating their effectiveness. The authors suggest that combining scanning and probing offers a more comprehensive approach to image sonification, allowing for both global context and local feature exploration. Future work includes extending the framework to model human image perception for more intuitive sonification methods.

Sharma et al. (2017) explore action recognition in still images using Natural Language Processing (NLP) techniques in “Action Recognition in Still Images Using Word Embeddings from Natural Language Descriptions.” Rather than training visual action detectors, they propose detecting prominent objects in an image and inferring actions based on object relationships. The Object-Verb-Object (OVO) triplet model predicts verbs using object co-occurrence, while word2vec captures semantic relationships between objects and actions. Experimental results show that this approach reliably detects actions without computationally intensive visual action detectors. The authors highlight the potential of this method in resource-constrained environments, such as mobile devices, and suggest future work incorporating spatial relationships and global scene context.

Iovino et al. (1997) discuss developments in Modalys, a physical modeling synthesizer based on modal synthesis, in “Recent Work Around Modalys and Modal Synthesis.” Modalys allows users to create virtual instruments by defining physical structures (objects), their interactions (connections), and control parameters (controllers). The authors explore the musical possibilities of Modalys, emphasizing its flexibility and the challenges of controlling complex synthesis parameters. They propose applications such as virtual instrument construction, simulation of instrumental gestures, and convergence of signal and physical modeling synthesis. The paper also introduces single-point objects, which allow for spectral control of sound, bridging the gap between signal synthesis and physical modeling. Real-time control and expressivity are emphasized, with future work focused on integrating Modalys with real-time platforms.

McGee et al. (2012) describe Voice of Sisyphus, a multimedia installation that sonifies a black-and-white image using raster scanning and frequency domain filtering in “Voice of Sisyphus: An Image Sonification Multimedia Installation.” Unlike traditional spectrograph-based sonification methods, this project focuses on probing different image regions to create a dynamic audio-visual composition. Custom software enables real-time manipulation of image regions, polyphonic sound generation, and spatialization. The installation cycles through eight phrases, each with distinct visual and auditory characteristics, creating a continuous, evolving experience. The authors discuss balancing visual and auditory aesthetics, noting that visually coherent images often produce noisy sounds, while abstract images yield clearer tones. The project draws inspiration from early experiments in image sonification and aims to create a synchronized audio-visual experience engaging viewers on multiple levels.

Software Interface for Voice of Sisyphus (McGee et al., 2012)

Roodaki et al. (2017) introduce SonifEye, a system that uses physical modeling sound synthesis to convey visual information in high-precision tasks, in “SonifEye: Sonification of Visual Information Using Physical Modeling Sound Synthesis.” They propose three sonification mechanisms: touch, pressure, and angle of approach, each mapped to sounds generated by physical models (e.g., tapping on a wooden plate or plucking a string). The system aims to reduce cognitive load and avoid alarm fatigue by using intuitive, natural sounds. Two experiments compare the effectiveness of visual, auditory, and combined feedback in high-precision tasks. Results show that auditory feedback alone can improve task performance, particularly in scenarios where visual feedback may be distracting. The authors suggest applications in medical procedures and other fields requiring precise manual tasks.

Dubus and Bresin review mapping strategies for the sonification of physical quantities in “A Systematic Review of Mapping Strategies for the Sonification of Physical Quantities.” Their study analyzes 179 publications to identify trends and best practices in sonification. The authors find that pitch is the most commonly used auditory dimension, while spatial auditory mapping is primarily applied to kinematic data. They also highlight the lack of standardized evaluation methods for sonification efficiency. The paper proposes a mapping-based framework for characterizing sonification and suggests future work in refining mapping strategies to enhance usability.

References

Yeo, Woon Seung, and Jonathan Berger. 2005. “A Framework for Designing Image Sonification Methods.” In Proceedings of ICAD 05-Eleventh Meeting of the International Conference on Auditory Display, Limerick, Ireland, July 6-9, 2005.

Sharma, Karan, Arun CS Kumar, and Suchendra M. Bhandarkar. 2017. “Action Recognition in Still Images Using Word Embeddings from Natural Language Descriptions.” In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV). DOI: 10.1109/WACVW.2017.17.

Iovino, Francisco, Rene Causse, and Richard Dudas. 1997. “Recent Work Around Modalys and Modal Synthesis.” In Proceedings of the International Computer Music Conference (ICMC).

McGee, Ryan, Joshua Dickinson, and George Legrady. 2012. “Voice of Sisyphus: An Image Sonification Multimedia Installation.” In Proceedings of the 18th International Conference on Auditory Display (ICAD-2012), Atlanta, USA, June 18–22, 2012.

Roodaki, Hessam, Navid Navab, Abouzar Eslami, Christopher Stapleton, and Nassir Navab. 2017. “SonifEye: Sonification of Visual Information Using Physical Modeling Sound Synthesis.” IEEE Transactions on Visualization and Computer Graphics 23, no. 11: 2366–2371. DOI: 10.1109/TVCG.2017.2734320.

Dubus, Gaël, and Roberto Bresin. 2013. “A Systematic Review of Mapping Strategies for the Sonification of Physical Quantities.” PLoS ONE 8(12): e82491. DOI: 10.1371/journal.pone.0082491.

Prototyping I: Image Extender – Image sonification tool for immersive perception of sounds from images and new creation possibilities

Shift in the project's intention due to the time plan:

To ensure the feasibility of this project, the topic has been narrowed down: the focus, or main purpose, will be the artistic approach. The tool will still combine direct image-to-audio translation with translation via sonification into a more abstract form. The main use cases will be generating unique audio samples for creative applications, such as sound design for interactive installations, brand audio identities, or soundscapes matching an image, as well as serving as a versatile instrument for experimental media artists and a display tool for image information.

Through further research on different ways of sonifying image data and through development of the sonification language itself, the translation and display purposes will become clearer over the following weeks.

Testing of Google Gemini API for AI Object and Image Recognition:

The first tests of the Google Gemini API went well. There are different models for dedicated object recognition and for image recognition itself, which can be combined to analyze pictures in terms of objects and, partly, scenery. These models (SSD, EfficientNet, …) produce similar but not always identical results. It might be an option to make the model selectable by the user, so that in case of a failure a different model can be tried and may give better results. Scenery recognition itself tends to be a problem; it may be worth trying out different APIs.

The data we get from this AI model consists of a tag for each recognized object or piece of image content and a probability percentage.

The next steps for the direct translation into realistic sound representations will be to test whether the freesound.org API can be used to search directly and automatically for the recognized object tags and to load matching audio files. These search calls also need to filter by the copyright type of the sounds, and a choosing rule / selection algorithm needs to be created.
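A first version of such a choosing rule could simply threshold the recognition confidence before handing tags to the search, with the license restriction handled in the freesound query itself (as sketched in the Prototyping V entry above). The threshold value and the example detections are placeholders:

```python
# Sketch of a simple choosing rule: keep only confidently recognized tags,
# deduplicate them, and pass them on to the license-filtered freesound search.
# Threshold and example detections are placeholders for illustration.
CONFIDENCE_THRESHOLD = 0.6

def select_tags(detections):
    """detections: list of (tag, probability) pairs from the recognition model."""
    return sorted({tag.lower() for tag, prob in detections
                   if prob >= CONFIDENCE_THRESHOLD})

example = [("bird", 0.92), ("tree", 0.85), ("bench", 0.41)]
print(select_tags(example))   # -> ['bird', 'tree']
```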

Research on sonification of images / video material and different approaches:

The world of image sonification is rich with diverse techniques, each offering unique ways to transform visual data into auditory experiences. One of the most straightforward methods is raster scanning, introduced by Yeo and Berger. This technique maps the brightness values of grayscale image pixels directly to audio samples, creating a one-to-one correspondence between visual and auditory data. By scanning an image line by line, from top to bottom, the system generates a sound that reflects the texture and patterns of the image. For example, a smooth gradient might produce a steady tone, while a highly textured image could result in a more complex, evolving soundscape. The process is fully reversible, allowing for both image sonification and sound visualization, making it a versatile tool for artists and researchers alike. This method is particularly effective for sonifying image textures and exploring the auditory representation of visual filters, such as “patchwork” or “grain” effects (Yeo and Berger 2006).

Principle raster scanning (Yeo and Berger, 2006)
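As a tiny illustration of the principle (my own minimal sketch, not the authors' implementation), a grayscale image can be flattened row by row and its brightness values written directly as audio samples:

```python
# Minimal sketch of raster scanning: read a grayscale image row by row and
# write its brightness values directly as audio samples (my own illustration,
# not the implementation from Yeo and Berger).
import numpy as np
from PIL import Image
from scipy.io import wavfile

SAMPLE_RATE = 44100

img = np.asarray(Image.open("input.png").convert("L"), dtype=np.float32)
samples = img.flatten() / 127.5 - 1.0        # row-major scan, map 0..255 -> -1..1
wavfile.write("raster_scan.wav", SAMPLE_RATE, samples.astype(np.float32))
```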

In contrast, Audible Panorama (Huang et al. 2019) automates sound mapping for 360° panorama images used in virtual reality (VR). It detects objects using computer vision, estimates their depth, and assigns spatialized audio from a database. For example, a car might trigger engine sounds, while a person generates footsteps, creating an immersive auditory experience that enhances VR realism. A user study confirmed that spatial audio significantly improves the sense of presence. The paper contains an interesting concept regarding choosing a random audio file from a sound library to avoid producing similar or identical results. It also mentions the aspect of postprocessing the audio files, which would be relevant for the Image Extender project as well.

Principle of Audible Panorama (Huang et al. 2019)

Another approach, HindSight (Schoop, Smith, and Hartmann 2018), focuses on real-time object detection and sonification in 360° video. Using a head-mounted camera and neural networks, it detects objects like cars and pedestrians, then sonifies their position and danger level through bone conduction headphones. Beeps increase in tempo and pan to indicate proximity and direction, providing real-time safety alerts for cyclists.

Finally, Sonic Panoramas (Kabisch, Kuester, and Penny 2005) takes an interactive approach, allowing users to navigate landscape images while generating sound based on their position. Edge detection extracts features like mountains or forests, mapping them to dynamic soundscapes. For instance, a mountain ridge might produce a resonant tone, while a forest creates layered, chaotic sounds, blending visual and auditory art. The paper also mentions different approaches to sonification itself, for example the idea of mapping on a micro level (timbre, pitch, and melody) and a macro level (rhythm and form).

Principle of Sonic Panoramas (Kabisch, Kuester, and Penny 2005)

Each of these methods (raster scanning, Audible Panorama, HindSight, and Sonic Panoramas) demonstrates the versatility of sonification as a tool for transforming visual data into sound, and they are all worth keeping in mind while developing my own sonification language or mapping method. They also lead to further research: checking some of the useful references they cite promises a deeper understanding of sonification and an extension of the possibilities.

References

Huang, Haikun, Michael Solah, Dingzeyu Li, and Lap-Fai Yu. 2019. “Audible Panorama: Automatic Spatial Audio Generation for Panorama Imagery.” In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 1–11. Glasgow, Scotland: ACM. https://doi.org/10.1145/3290605.3300851.

Kabisch, Eric, Falko Kuester, and Simon Penny. 2005. “Sonic Panoramas: Experiments with Interactive Landscape Image Sonification.” In Proceedings of the 2005 International Conference on Artificial Reality and Telexistence (ICAT), 156–163. Christchurch, New Zealand: HIT Lab NZ.

Schoop, Eldon, James Smith, and Bjoern Hartmann. 2018. “HindSight: Enhancing Spatial Awareness by Sonifying Detected Objects in Real-Time 360-Degree Video.” In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, 1–12. Montreal, QC, Canada: ACM. https://doi.org/10.1145/3173574.3173717.

Yeo, Woon Seung, and Jonathan Berger. 2006. “Application of Raster Scanning Method to Image Sonification, Sound Visualization, Sound Analysis and Synthesis.” In Proceedings of the 9th International Conference on Digital Audio Effects (DAFx-06), 311–316. Montreal, Canada: DAFx.