Product II: Image Extender

Dual-Model Vision Interface – OpenAI × Gemini Integration for Adaptive Image Understanding

Following the foundational phase of last week, where the OpenAI API Image Analyzer established a structured evaluation framework for multimodal image analysis, the project has now reached a significant new milestone. The second release integrates both OpenAI’s GPT-4.1-based vision models and Google’s Gemini (MediaPipe) inference pipeline into a unified, adaptive system inside the Image Extender environment.

Unified Recognition Interface

In The current version, the recognition logic has been completely refactored to support runtime model switching.
A dropdown-based control in Google Colab enables instant selection between:

  • Gemini (MediaPipe) – for efficient, on-device object detection and panning estimation
  • OpenAI (GPT-4.1 / GPT-4.1-mini) – for high-level semantic and compositional interpretation

Non-relevant parameters such as score threshold or delegate type dynamically hide when OpenAI mode is active, keeping the interface clean and focused. Switching back to Gemini restores all MediaPipe-related controls.
This creates a smooth dual-inference workflow where both engines can operate independently yet share the same image context and visualization logic.

Architecture Overview

The system is divided into two self-contained modules:

  1. Image Upload Block – handles external image input and maintains a global IMAGE_FILE reference for both inference paths.
  2. Recognition Block – manages model selection, executes inference, parses structured outputs, and handles visualization.

This modular split keeps the code reusable, reduces side effects between branches, and simplifies later expansion toward GUI-based or cloud-integrated applications.

OpenAI Integration

The OpenAI branch extends directly from Last week but now operates within the full environment.
It converts uploaded images into Base64 and sends a multimodal request to gpt-4.1 or gpt-4.1-mini.
The model returns a structured Python dictionary, typically using the following schema:

{

    “objects”: […],

    “scene_and_location”: […],

    “mood_and_composition”: […],

    “panning”: […]

}

A multi-stage parser (AST → JSON → fallback) ensures robustness even when GPT responses contain formatting artifacts.

Prompt Refinement

During development, testing revealed that the English prompt version initially returned empty dictionaries.
Investigation showed that overly strict phrasing (“exclusively as a Python dictionary”) caused the model to suppress uncertain outputs.
By softening this instruction to allow “reasonable guesses” and explicitly forbidding empty fields, the API responses became consistent and semantically rich.

Debugging the Visualization

A subtle logic bug was discovered in the visualization layer:
The post-processing code still referenced German dictionary keys (“objekte”, “szenerie_und_ort”, “stimmung_und_komposition”) from Last week.
Since the new English prompt returned English keys (“objects”, “scene_and_location”, etc.), these lookups produced empty lists, which in turn broke the overlay rendering loop.
After harmonizing key references to support both language variants, the visualization resumed normal operation.

Cross-Model Visualization and Validation

A unified visualization layer now overlays results from either model directly onto the source image.
In OpenAI mode, the “panning” values from GPT’s response are projected as vertical lines with object labels.
This provides immediate visual confirmation that the model’s spatial reasoning aligns with the actual object layout, an important diagnostic step for evaluating AI-based perception accuracy.

Outcome and Next Steps

The project now represents a dual-model visual intelligence system, capable of using symbolic AI interpretation (OpenAI) and local pixel-based detection (Gemini).

Next steps

The upcoming development cycle will focus on connecting the openAI API layer directly with the Image Extender’s audio search and fallback system.

Zwischen Bild und Ton – Kritische Bewertung der Masterarbeit “Automatic Sonification of Video Sequences” von Andrea Corcuera Marruffo

Grundlegendes

Autorin: Andrea Corcuera Marruffo
Titel: Automatic Sonification of Video Sequences through Object Detection and Physical Modelling
Hochschule: Aalborg University Copenhagen
Studiengang: MSc Sound and Music Computing
Jahr: 2017

Die Arbeit von Andrea Corcuera Marruffo untersucht die automatische Erzeugung von Foley-Sounds aus Videosequenzen. Ziel ist es, audiovisuelle Inhalte algorithmisch zu sonifizieren, indem visuelle Informationen, z.B. Materialeigenschaften oder Objektkollisionen, mithilfe von Convolutional Neural Networks (nutzung des YOLO models) analysiert und anschließend physikalisch modellierte Klänge synthetisiert werden. Damit positioniert sich die Arbeit an der Schnittstelle von Klangsynthese, teilweise software und coding und Wahrnehmung, ein Feld, das in der Medienproduktion wie auch in der künstlerischen Forschung zunehmende Relevanz besitzt und entsprechend auch überschneidungen zum Grundkonzept meiner vorstehenden Masterarbeit.

Das „Werkstück“ besteht aus einem funktionalen Prototypen, der Videos analysiert, Objekte klassifiziert und deren Interaktionen in synthetisierte Klänge übersetzt. Ergänzt wird dieses Tool durch eine Evaluation, in der audiovisuelle Stimuli hinsichtlich ihrer Plausibilität und wahrgenommenen Qualität getestet werden.

Bewertung

systematisch anhand der Beurteilungskriterien des Studiengangs CMS

(1) Gestaltungshöhe

Die Arbeit zeigt eine sehr gute technische Tiefe und eine klare methodische Struktur. Der Aufbau ist logisch, die Visualisierungen (z. B. Flussdiagramme, Spektrogramme) sind nachvollziehbar und unterstützen das Verständnis des Prozesses.

(2) Innovationsgrad

Der Ansatz, Foley-Sound automatisch (unter dem Einsatz von „physical modelling“) zu generieren, wurde zum Zeitpunkt der Veröffentlichung (2017) nur vereinzelt erforscht. Die Verbindung von Object Detection und Physical Modelling stellt daher einen innovativen Beitrag im Bereich „Computational Sound Design“ dar.

(3) Selbstständigkeit

Die Arbeit zeigt eine deutliche Eigenleistung. Die Autorin erstellt ein eigenes Dataset, modifiziert Trainingsdaten und implementiert das YOLO Model in einer angepassten Form. Auch die Syntheseparameter werden experimentell abgeleitet. Die Eigenständigkeit ist daher sowohl konzeptionell als auch technisch vorhanden.

(4) Gliederung und Struktur

Die Struktur folgt einem klassischen wissenschaftlichen Aufbau. Theorie, Implementierung, Evaluation, Schlussfolgerung. Kapitel sind klar fokussiert, jedoch teils stark technisch geprägt, was die Lesbarkeit für fachfremde Leser einschränken kann. Eine visuellere Darstellung der Evaluationsmethodik hätte das eventuell verbessert.

(5) Kommunikationsgrad

Die Arbeit ist insgesamt verständlich und präzise formuliert. Fachtermini werden sorgfältig eingeführt, Abbildungen sind beschriftet und logisch eingebunden. Der sprachliche Stil ist sachlich, allerdings manchmal zu stark an technischer Dokumentation orientiert. Narrative Reflexionen zu Designentscheidungen oder ästhetischen Überlegungen fehlen weitgehend, was anhand des Studiengangs, welcher sich nicht hauptsächlich an design orientiert verständlich und nachvollziehbar ist.

(6) Umfang der Arbeit

Mit über 30 Seiten Haupttext und zusätzlichem Anhang ist der Umfang angemessen. Die Balance zwischen Theorie, Umsetzung und Evaluation ist gelungen. Die empirische Studie mit 15 Proband bleibt jedoch relativ klein, wodurch die statistische Aussagekraft begrenzt ist.

(7) Orthographie, Sorgfalt und Genauigkeit

Die Arbeit ist durchgängig formal korrekt und methodisch sorgfältig dokumentiert. Kleinere sprachliche Unschärfen („he first talkie film“) mindern den Gesamteindruck kaum. Zitate und Quellenverweise sind konsistent.

(8) Literatur Das Literaturverzeichnis zeigt eine solide theoretische Fundierung. Es werden gängige Quellen zu Sound Synthesis, Modal Modelling und Neural Networks verwendet (Smith, Farnell, Van den Doel). Allerdings wären aktueller Medien- oder Wahrnehmungsforschung (durch z. B. Sonic Interaction Design, Embodied Sound Studies) noch eine spannende Ergänzung hinsichtlich Forschungsliteratur gewesen.

Abschließende Einschätzung

Insgesamt überzeugt die Arbeit durch ihren innovativen Ansatz, die methodische Präzision und die gelungene Umsetzung eines komplexen Systems. Die Evaluation zeigt kritisch die Grenzen des Modells auf (Objektgenauigkeit und Synchronisationsprobleme), was die Autorin reflektiert und nachvollziehbar einordnet.

Stärken: klare Struktur, hohes technisches Niveau, origineller Forschungsansatz, eigenständige Implementierung.
Schwächen: begrenzte ästhetische Reflexion, kleine Stichprobe in der Evaluation, eingeschränkte Materialvielfalt.

Critical Review: “Sound response to physicality – Artistic expressions of movement sonification” by Aleksandra Joanna Słyż (Royal College of Music, 2022)

by Verena Schneider, CMS24 Sound Design Master 

The master thesis “Sound Response to Physicality: Artistic Expressions of Movement Sonification” was written by Aleksandra Joanna Słyż in 2022 at the Royal College of Music in Stockholm (Kungliga Musikhögskolan; Stockholm, Sweden).

Introduction

I chose Aleksandra Słyż’s master thesis because her topic immediately resonated with my own research interests. In my master project I am working with the x-IMU3 motion sensor to track surf movements and transform them into sound for a surf documentary.
During my research process, the question of how to sonify movement data became central, and Słyż’s work gave me valuable insights into which parameters can be used and how the translation from sensor to sound can be conceptually designed.

Her thesis, Sound response to physicality, focuses on the artistic and perceptual dimensions of movement sonification. Through her work Hypercycle, she explores how body motion can control and generate sound in real time, using IMU sensors and multichannel sound design. I found many of her references—such as John McCarthy and Peter Wright’s Technology as Experience—highly relevant for my own thesis.

Gestaltungshöhe – Artistic Quality and Level of Presentation

Słyż’s thesis presents a high level of artistic and conceptual quality. The final piece, Hypercycle, is a technically complex and interdisciplinary installation that connects sound, body, and space. The artistic idea of turning the body into a musical instrument is powerful, and she reflects deeply on the relation between motion, perception, and emotion.

Visually, the documentation of her work is clear and professional, though I personally wished for a more detailed sonic description. The sound material she used is mainly synthesized tones—technically functional, but artistically minimal. As a sound designer, I would have enjoyed a stronger exploration of timbre and spatial movement as expressive parameters.

Innovationsgrad – Innovation and Contribution to the Field

Using motion sensors for artistic sonification is not entirely new, yet her combination of IMU data, embodied interaction, and multichannel audio gives the project a strong contemporary relevance. What I found innovative was how she conceptualized direct and indirect interaction—how spectators experience interactivity even when they don’t control the sound themselves.

However, from a technical point of view, the work could have been more transparent. I was missing a detailed explanation of how exactly she mapped sensor data to sound parameters. This part felt underdeveloped, and I see potential for future work to document such artistic systems more precisely.

Selbstständigkeit – Independence and Original Contribution

Her thesis clearly shows independence and artistic maturity. She worked across disciplines—combining psychology, music technology, and perception studies—and reflected on her process critically. I especially appreciated that she didn’t limit herself to the technical side but also integrated a psychological and experiential perspective.

As someone also working with sensor-based sound, I can see how much self-direction and experimentation this project required. The depth of reflection makes the work feel authentic and personal.

Gliederung und Struktur – Structure and Coherence

The structure of the thesis is logical and easy to follow. Each chapter begins with a quote that opens the topic in a poetic way, which I found very effective. She starts by explaining the theoretical background, then moves toward the technical discussion of IMU sensors, and finally connects everything to her artistic practice.

Her explanations are written in clear English, and she carefully defines all important terms such as sonificationproprioception, and biofeedback. Even readers with only basic sound design knowledge can follow her reasoning.

Kommunikationsgrad – Communication and Expression

The communication of her ideas is well-balanced between academic precision and personal reflection. I like that she uses a human-centered language, often describing how the performer or spectator might feel within the interactive system.

Still, the technical documentation of the sonification process could be more concrete. She briefly shows a Max/MSP patch, but I would have loved to understand more precisely how the data flow—from IMU to sound—was built. For future readers and practitioners, such details would be extremely valuable.

Umfang – Scope and Depth

The length of the thesis (around 50 pages) feels appropriate for the topic. She covers a wide range of areas: from sensor technology and perception theory to exhibition practice and performance philosophy.
At the same time, I had the impression that she decided to keep the technical parts lighter, focusing more on conceptual reflection. For me, this makes the thesis stronger as an artistic reflection, but weaker as a sound design manual.

Orthography, Accuracy, and Formal Care

The thesis is very carefully written and proofread. References are consistent, and the terminology is accurate. She integrates both scientific and artistic citations, which gives the text a professional academic tone.
The layout is clear, and the visual elements (diagrams, performance photos) are well placed.

Literature – Quality and Relevance

The literature selection is one of the strongest aspects of this work. She cites both technical and philosophical sources—from G. Kramer’s Sonification Report to McCarthy & Wright’s Technology as Experience and Tanaka & Donnarumma’s The Body as Musical Instrument.
For me personally, her bibliography became a guide for my own research. I found new readings that I will also include in my master thesis.

Final Assessment – Strengths, Weaknesses, and Personal Reflection

Overall, Sound response to physicality is a well-balanced, thoughtful, and inspiring thesis that connects technology, perception, and art.
Her biggest strength lies in how she translates complex sensor-based interactions into human experience and emotional resonance. The way she conceptualizes embodied interaction and indirect interactivity is meaningful and poetic.

The main weakness, in my opinion, is the lack of detailed technical documentation—especially regarding how the IMU data was mapped to sound and multichannel output. As someone building my own sonification system with the x-IMU3 and contact microphones, I would have loved to see the exact data chain from sensor to audio.

Despite that, her work inspired me profoundly. It reminded me that the psychological and experiential dimensions of sound are just as important as the data itself. In my own project, where I sonify the movement of a surfboard and the feeling of the ocean, I will carry this understanding forward: that sonification is not only about data translation but about shaping human experience through sound.

Opera meets code: Philippe Manoury’s Die letzten Tage der Menschheit

At our visit at the IRCAM-institute during our Paris-excursion I visited a panel talk, that described the workflow in creating a multi-media opera, that lies at the intersection of traditional opera and contemporary music technology and that struck me: Die letzten Tage der Menschheit (The Last Days of Mankind) by French composer Philippe Manoury. Based on the extensive anti-war drama by Austrian writer Karl Kraus, the work premiered at the Cologne Opera in June 2025 and reflects on themes of conflict, media, and societal collapse.

The Material

Karl Kraus wrote Die letzten Tage der Menschheit during and after World War I. The text consists of over 220 short scenes, depicting fragments of daily life, political rhetoric, and journalistic distortion that led to the chaos of the war. Due to its scale and structure, Kraus himself considered the piece impossible to stage in its entirety.

Manoury’s adaptation condenses the material into a three-hour opera. Rather than present a straightforward narrative, the production offers a layered and often disjointed sequence of impressions and reflections. Manoury and director Nicolas Stemann refer to the result as a “Thinkspiel”, a hybrid of the German Spiel (play) and the English “think”, suggesting a theatre of ideas rather than linear storytelling.

Blending Acoustic and (digital)Electronic Practice

Manoury, known for his work with live electronics, collaborated closely with IRCAM (Institut de Recherche et Coordination Acoustique/Musique) in developing this opera. He used tools such as Antescofo, a real-time score-following system that syncs live instrumental input with preprogrammed electronic components, and PureData, a visual programming environment designed for audio synthesis and spatial control.

The system enables audio to follow performers in real time, allowing electronics to respond to spoken text, instrumental timing, and stage movement. Manoury worked with Miller Puckette, the creator of PureData, to develop new modules tailored to the opera’s needs, including a granular speech-processing system that tracks vocal input spatially on stage.

This setup allowed for integration of a full orchestra, live electronics, spoken word, and multimedia, with a focus on flexibility and performer interaction during rehearsals and live performance.

Structure and Staging

The opera is divided into two distinct parts. The first presents loosely chronological scenes from the First World War, focusing on figures such as politicians, journalists, and ordinary citizens. The second part is meant to be a reflection and takes a more abstract and philosophical tone, exploring themes such as violence, historical memory, and self-destruction.

A newly introduced character, Angelus Novus acts as an observer throughout the piece. Performed by mezzo-soprano Anne Sofie von Otter, the character provides continuity and commentary across the fragmented scenes.

The staging involves video projections, live camera feeds, war imagery, and a modular stage design. The visual components are used not for spectacle but to support the opera’s shifting focus and tonal contrasts.

A Contemporary Approach to Historical Events

Die letzten Tage der Menschheit does not aim for easy accessibility. Its structure, sound design, and subject matter are complex and at times demanding. However, the production reflects current interests in combining artistic disciplines and using digital tools to reinterpret historical works.

Rather than retell World War I history, the opera focuses on atmosphere and fragmentation, using both musical and technological language to examine how war, media, and misinformation interact, which in my opinion is as relevant as ever in the face of current events.

Sources:

https://antescofo-doc.ircam.fr

https://www.oper.koeln/de/produktionen/die-letzten-tage-der-menschheit/1018

https://www.philippemanoury.org/8584-2/

https://de.wikipedia.org/wiki/Die_letzten_Tage_der_Menschheit

https://www.youtube.com/watch?v=yG9OFe2IE7A

SURFBOARD PROTOTYPE CONSTRUCTION

The base model and final prototype selected for this project is built on top of my own personal shortboard. It is measuring 5 feet 9 inches in length and is made for faster maneuvers like the cutback because of its short length and small volume. Considering these factors the board was selected due to its size and shape, which offer a wider range of motion and faster changes of speed and rotation in comparison to a longboard. Also, the dynamical movement and the internal board vibrations will be different than the one of a longboard or a board with a higher volume. Before the construction, a planning session was conducted with the Noa team to identify the ideal locations for sensor placement, cable routing, mounting of the housing, and material usage considering the exposure to saltwater.

Noa surfboards is a small factory for shaping mostly shortboards and riverboards. With their own shaping studio, they represent one of the few professional shapers in the region of Austria and Germany. This studio was chosen for the professional knowledge and experience of shaping to develop a well-functioning and safe protype.  

Looking at the building phase of the protype, Noa Surfboards proposed embedding the piezo disc underneath the front-foot zone of the deck. This area is perfect to capture the movement of the surfer, while not being under strong impact of the bodyweight of the surfer. In order to integrate the microphone in the body of the board a rectangular section of the fiberglass top layer was carefully removed. In the next step the piezo disc was mounted directly to the raw material. To protect the microphone from external impacts and the saltwater multiple layers of fiberglass cloth were laid over the sensor and encapsulate the mic completely. 

Another critical technical step was to route the cable from the embedded mic to the waterproof electronics box. Therefore, a narrow channel was drilled on the side of the box for the cable to enter. 

Inside the case, the Zoom H4n recorder and x-IMU3 sensor were suspended in a foam block designed to isolate the electronics from board vibrations and strong impacts. 

  1. Evaluation of the prototype

SURF SKATE SIMULATION AND TEST RECORDINGS

Purpose of the Simulation

Before deploying the system in ocean conditions, a controlled test was performed using a surf skate on land in order to structure the synchronization part of the different media in advance. Therefore, the simulation served multiple purposes:

  • First, to test the stability and functionality of the hardware setup under strong movements
  • To collect and analyze motion data from surfing-like movements like the cutback using the ximu3 sensor
  • To test and evaluate the contact microphone’s responsiveness to board interaction and different movement patterns
  • To practice audiovisual synchronization between footage an external camera setup, the Zoom H4n recorder, the contact microphone and the x-IMU3 motion data.

Therefore, the surf skate was chosen because of its closely representation of  the body movement and board rotation then surfing. Especially the cutback movement can be imitated by using a skate ramp.  

This testing setup consists of the following tools:

  • A Carver-style surf skateboard
  • The x-IMU3 sensor mounted on the bottom of the board to capture movement dynamics
  • The Piezo contact microphone taped next to the motion sensor on the bottom of the board. After testing the microphone was placed in the middle of the skateboard deck in order to capture the movement of both axes of the board at the same amount of loudness. Placing the microphone closer to the wheels of the board would result in much more noise in the recording due to the internal rotation of the axes. 
  • The Zoom H4n recorder was help in the hand of the skater and was connected to closed over ear headphones. 
  • Using the external film camera Sony Alpha 7iii the whole test was captured. This additional recording was helpful later in the synchronization part. 

The board was ridden in a skate ramp simulating the composition of the wave. ON the top of the ramp the cutback movement can be executed. 

A skateboard with headphones and a remote

AI-generated content may be incorrect.

At the start of the recording session, all devices were synchronized through a short impulse sound (hitting on the board) recorded on all three devices: Zoom, GoPro, and x-IMU3. The single surf skate tackes lasted approximately 2 minutes of recording and were repeated multiple times. 
The data recorded consists of:

  • accelerometer, gyroscope, orientation from the x-IMU3
  • Mono WAV audio from the contact mic
  • 1080p video footage from the external camera

The files were transferred and loaded into the respective analysis environments:

The x-IMU3 data was decoded using the official GUI and exported as CSV files;

The WAV audio was imported into REAPER and cross-referenced with the GoPro’s audio to align the sync impulse;

Motion data was plotted using Python and matched frame-by-frame to movement events in the video.

The result was a perfectly aligned audio-motion-video composite, usable both for analysis and composition.

  1.  Observations and Results

The contact mic successfully captured vibrational data including surface noise, carving intensity, and road texture;

The x-IMU3 data revealed clear peaks in angular velocity during simulated cutbacks and sharp turns;

The GoPro footage confirmed that movement gestures correlated well with sonic and motion data markers;

The Pelican case and foam provided sufficient shock insulation and no overheating or component failure occurred;

The synchronization method using a single impulse sound proved highly reliable.

The surf skate test validated the concept and highlighted important considerations:

Movement-based sonic gestures are highly expressive and usable for composition;

Vibration sensitivity of the contact mic is sufficient for detailed sound capture;

The sync strategy will work equally well in ocean sessions with minor adjustments;

Battery and storage life are adequate for short-to-medium-length surf sessions;

Cable insulation and structural mounting are durable under stress.

This test confirmed the system’s readiness for its full application in Morocco, where ocean sessions will build upon the structure and learnings of this simulation.

SOUND DESIGN METHODOLOGY

The motivation of using different methodes to do the sound design for this surf movie comes from a lack of surf movies and documentaries that use sound design based on field recordings in this area. With this project I want to showcase how many layers of realness can be added to a surf documentary by using on set field recordings paired with sensor data to convey this experience of surfing on a much deeper level. 
Therefore, the Sound design of this project is not seen as a post-processing effect but as an integral part of how motion and environmental interaction are perceived. The core idea and mission is to treat the surfer’s movement as a gestural performance and dance that modulates audio based on what is actually happening in the image. With the help of Pure Data, a modular synthesis environment, the motion data is mapped on audio processing parameters to underline this immersive and expressive sonic storyline.

Starting with the different sound design inputs that will be used in the Surf film, the primary audio material comes from a contact microphone imbedded in the surfboard. These are real, physical vibrations, bumps, hits, and subtle board resonances create the basic sonic texture of the piece. These raw recordings are used as:

  • Input for audio modulation
  • Triggers or modulating sources for effects like pitch shifting, filtering, and delay

Second core sound source is the Zoom H4n recorder mounted on the nose of the board. Here the focus lies strongly on field recordings in order to capture the raw sonic experience of the surf session. 
Furthermore, the data of the sensor will be adjusting the soundscape, translating raw data into modulation for sound design. 
Also, the internal audio of the GoPro Hero3 will be used to synchronize data in post processing and the recorded video will be a visual representation of the full experience.  

Looking at the mapping part of the project, the x-IMU3 sensor provides multiple streams of data like acceleration, gyroscopic rotation, and orientation, that are mapped to sound parameters. Each data of movement is used differently:

Acceleration (X, Y, Z) modulates filter cutoff, grain density, or reverb size. Here the exact usage of modulation parameters will be discussed in the postproduction phase of the project. 

Angular velocity controls pitch shift, stereo panning, or feedback loops. 

Orientation (Euler angles or quaternion) is used to gate effects or trigger events based on the recorded movement thresholds.

The mappings will be adjusted in the following process and are designed to reflect the physical sensation of surfing in the most accurate way possible. Looking at the movement that is most important, the Cutback move, here a sharp move will translate in a spike in angular velocity. This spike can be translated in a big glitch sound effect. Here more research and test will be needed in order to find the best parameter settings for this modulation. 

One possibility of audio modulation in Pure Data will be the granular synthesis. It allows to create evolving textures from short segments, like grain noise of the recorded contact mic sounds. 
Further examples of possible modulations: 

  • Grain size – (short = more textural, long = more tonal)
  • Playback speed and pitch
  • Density and overlap of grains

Looking at the storyline of the surf documentary one can pinpoint the following narrative structure of the sound design: 

Before the surf / coastline of Morocco

To catch the stressful coastal live of Morocco field recordings will be used to translate this feeling of stress, being overwhelmed (red mind).  Here the recordings will be done by the Zoom H4n recorder. 

Entering the water/ Paddling Phase

As the surfer enters the water the stressful coastal sounds fade, and the listener will be surrounded by the sound of the ocean. Here it is important to translate the soundscape, which the surfer actually perceives. No further sound modulation is added here. The theory of the blue mind points out how much the noise of the ocean can regulate the nervous system. This will be translated to the sound design of this section of the movie, giving the listener the feeling of being in the present. 

Catching the wave

As soon as the surfer catches the wave and manages to stand up on the wave the dramaturgical main part of the composition begins. This will be initialized by a strong impact on the contact microphone, triggered by the jump of the person. This will also be measurable on the motion sensor with increase of speed. At this point of the composition the sound modulation starts. 

Riding the wave / Cutbacks: At this stage of the movie the person feels a sensation of absolute presence and high focus. This natural high state gives a feeling that is hardly describable in words or images. Here the Sound Desing carries the listener through. Granular synthesis, stereo modulation and filtered resonance reflecting the physical and spiritual intensity in this moment. Here the tool of sound modulation is chosen intentionally to also create a contrast between the paddling stage of the movie.

End of the riding / Hit of the wave

In the end of the movie the surfer will fall in the water creating a strong and impactful ending of the whole experience. This sudden cut will be auditory through a big amount of noise of the underwater recording. Nothing more than muffled wave sounds will be heard to empathize the feeling of being underwater. Sonic textures will decay leaving with a feeling of stillness after this intense movement. 

With the help of this sonic structure both the physical and emotional journey of a surf session is captured and represented.

Considering the final sound piece a stereo format is the first output. Also including spatial depth will be achieved through modulation and stereo imaging based on the recorded motion data. Volume normalization and dynamic range control are applied in Davinci Resolve, however by respecting the intention of the sound piece to add less additional audio modulation by a software and only using techniques of audio manipulation using the sensory data. 

The final audio and movie is intended for headphone or multichannel playback in an installations or possible surf exhibitions.

HARDWARE SYSTEM SURFBOARD


  1. 1.1. OVERVIEW OF THE SETUP
    The hardware setup of this project was developed to function and withstand under the challenging environmental conditions typical for surfing. Therefore, the full equipment needs to not only be made for saltwater exposure, but also be strong enough to handle strong hits and impacts. The sunlight, and hot temperatures also act as another impactor. Therefore, building components were selected based on their stability, mobility, and compactness. The complete system includes a waterproof Pelican 1050 case mounted on the surfboard, containing a Zoom H4n audio recorder, a piezoelectric contact microphone and an x-IMU3 motion sensor. An externally mounted GoPro Hero 3 camera records video and sound. The interior of the Peli case is filed with protective foam to minimize shock and mechanical disturbance. Concluding, the arrangement was optimized to allow a smooth operation during surfing while maintaining robust data acquisition.

1.2. MOTION SENSOR – X-IMU3

The x-IMU3, was developed by x-io Technologies. It is a compact inertial measurement tool (IMU) capable of logging tri-axis accelerometer, gyroscope, magnetometer and orientation data with timestamp precision. For this application, the sensor operated in standalone mode and will be charged by an external small power bank later retrieval. After each recording session, the x-IMU3 GUI and SDK were used to decode. ximu3 binary files into structured CSV datasets (x-io Technologies, 2024). These data streams are then available for the synchronization part with audio and video recordings. Furthermore, these recorded values will be used to manipulate the recorded audio using Pure Data.

The x-IMU3 sensor was selected due to its reliability, sampling rate of up to 500 Hz, and OSC-compatible output structure. This enables later integration with sound synthesis software’s in the later process. The sensor is placed in the box cushioned within protective foam in the Pelican case to minimize noise artifacts caused by board vibration.

1.3. CONTACT MICROPHONE – PIEZO DISC
In order to add another dimension to the sound recording by capturing board vibrations and internal mechanical changes, a piezoelectric contact microphone was mounted beneath the surfboard wax layer, at the right side of the nose, near the front foot position. Unlike traditional microphones, piezo elements record vibrations through physical material contact, making them suitable for capturing impactful sound events. Also, due the good implementation movements of the surfer on the board are recorded very well. The sensor is routed to the case using a sealed cable channel and insulation to prevent water from getting in the box or inside the board.
This microphone setup allows for the recording of impactful events such as hits, flex, and frictional interactions between the board, the water and the surfer. These signals, together with the recordings of the zoom, form the primary audio source used in the sonic interpretation of the surf session. This implementation of a piezo mic in a surfboard has not been done or documented before and is therefore an innovative approach which is of course interesting for sound engineers, as well as surfers and surfboard builder (Truax, 2001).

1.4. AUDIO RECORDER – ZOOM H4N
The audio data was recorded using a Zoom H4n Handy Recorder, configured to capture a mono signal from the contact microphone. The recorder was selected for its portability, sound quality (24-bit/44.1 kHz), and dual XLR/TRS inputs. It was housed inside the Pelican case using closed-cell foam to dampen mechanical noise. Battery-powered operation and SD card storage enabled autonomous recording during mobile sessions.
Gain levels were calibrated before each session to preserve signal integrity and prevent clipping. The system was designed to ensure consistent signal acquisition even under dynamic surf conditions (Zoom Corporation, 2023).

1.5. VISUAL SYNCHRONIZATION – GOPRO HERO 3
To also have a video output of the surf session, GoPro Hero 3 camera is mounted at the board’s nose. This video material served as both documentation and reference for synchronization. Here, the synchronization of different audio sources and the sensor data is challenging but will made easier with having audiovisual references. For example, a double tapping on the board can help synchronize image to sound. The GoPro’s audio, while limited in quality, served as another layer reference for alignment.
In addition, the video recordings serve also as a tool to analyze body posture, movement patterns, and spatial context (Watkinson, 2013). The surf movie will be consisting of many shots taken by the GoPro and will support the surf film with an immersive camera angle.


1.6. ENCLOSURE AND MOUNTING
– PELICAN CASE 1050
The Zoom Recorder, sensor, power bank and cables of the contact microphone are enclosed in a Pelican 1050 Micro Case. This model was selected for its IP67-rated waterproof sealing, shock resistance, and small form, making it not too bulky on the board, but still big enough to fit all the necessary equipment.
Moving forward, the case is mounted to the surfboard using strong glue and surfboard wax and is incorporated in the general body of the board. In order to connect the contact microphone from outside to the inside, one hole was made in the box. This hole is again sealed with silicone caulk to make it leak and saltwater proof.

Inside, the box a special Peli foam is inserts to prevent internal motion and a fixation for the sensor and the recorder.
The case and cabling configuration underwent field testing, including simulated riding on a surf skate and controlled submersion for a specific amount of time, to ensure no leakage will occur during recording

Post 1: Listening to the Ocean

– The Emotional Vision Behind Surfboard Sonification

Surfing is more than just a sport. For many surfers, it is a ritual, a form of meditation, and an experience of deep emotional release. There is a unique silence that exists out on the water. It is not the absence of sound but the presence of something else: a sense of connection, stillness, and immersion. This is where the idea for “Surfboard Sonification” was born. It began not with technology, but with a feeling. A moment on the water when the world quiets, and the only thing left is motion and sensation.

The project started with a simple question: how can one translate the feeling of surfing into sound? What if we could make that feeling audible? What if we could tell the story of a wave, not through pictures or words, but through vibrations, resonance, and sonic movement?

My inspiration came from both my personal experiences as a surfer and from sound art and acoustic ecology. I was particularly drawn to the work of marine biologist Wallace J. Nichols and his theory of the “Blue Mind.” According to Nichols, being in or near water has a scientifically measurable impact on our mental state. It relaxes us, improves focus, and connects us to something larger than ourselves. It made me wonder: can we create soundscapes that replicate or amplify that feeling?

In addition to Nichols’ research, I studied the sound design approaches of artists like Chris Watson and Jana Winderen, who work with natural sound recordings to create immersive environments. I also looked at data-driven artists such as Ryoji Ikeda, who transform abstract numerical inputs into rich, minimalist sonic works.

The goal of Surfboard Sonification was to merge these worlds. I wanted to use real sensor data and field recordings to tell a story. I did not want to rely on synthesizers or artificial sound effects. I wanted to use the board itself as an instrument. Every crackle, vibration, and movement would be captured and turned into music—not just any music, but one that feels like surfing.

The emotional journey of a surf session is dynamic. You begin on the beach, often overstimulated by the environment. There is tension, anticipation, the chaos of wind, people, and crashing waves. Then, as you paddle out, things change. The noise recedes. You become attuned to your body and the water. You wait, breathe, and listen. When the wave comes and you stand up, everything disappears. It’s just you and the ocean. And then it’s over, and a sense of calm returns.

This narrative arc became the structure of the sonic composition I set out to create. Beginning in noise and ending in stillness. Moving from overstimulation to focus. From red mind to blue mind.

To achieve this, I knew I needed to design a system that could collect as much authentic data as possible. This meant embedding sensors into a real surfboard without affecting its function. It meant using microphones that could capture the real vibrations of the board. It meant synchronizing video, sound, and movement into one coherent timeline.

This was not just an artistic experiment. It was also a technical challenge, an engineering project, and a sound design exploration. Each part of the system had to be carefully selected and tested. The hardware had to survive saltwater, sun, and impact. The software had to process large amounts of motion data and translate it into sound in real time or through post-processing.

And at the heart of all this was one simple but powerful principle, spoken to me once by a surf teacher in Sri Lanka:

“You are only a good surfer if you catch a wave with your eyes closed.”

That phrase stayed with me. It encapsulates the essence of surfing. Surfing is not about seeing; it’s about sensing. Feeling. Listening. This project was my way of honoring that philosophy—by creating a system that lets us catch a wave with our ears.

This blog series will walk through every step of that journey. From emotional concept to hardware integration, from dry-land simulation to ocean deployment. You will learn how motion data becomes music. How a surfboard becomes a speaker. And how the ocean becomes an orchestra.

In the next post, I will dive into the technical setup: the sensors, microphones, recorders, and housing that make it all possible. I will describe the engineering process behind building a waterproof, surfable, sound-recording device—and what it took to embed that into a real surfboard without compromising performance.

But for now, I invite you to close your eyes. Imagine paddling out past the break. The sound of your breath, the splash of water, the silence between waves. This is the world of Surfboard Sonification. And this is just the beginning.

References

Nichols, W. J. (2014). Blue Mind. Little, Brown Spark.

Watson, C. (n.d.). Field recording artist.

Winderen, J. (n.d.). Jana Winderen: Artist profile. https://www.janawinderen.com

Ikeda, R. (n.d.). Official site. https://www.ryojiikeda.com

Truax, B. (2001). Acoustic Communication. Ablex Publishing.

Puckette, M. S. (2007). The Theory and Technique of Electronic Music. World Scientific Publishing Company.

Building the Panner: Creating an interface for Sound, Space, and Interaction

After thinking about the concept for my sound toolkit, the next step in my development focused on the implementation of a central feature: the panner interface. This module allows both creators and audiences to explore and interact with sound in space, directly connecting objects within a room to specific sonic materials.

Mapped Space and Sculpted Sound

The basic functionality of the panner is simple in concept but provides an intuitive experience: it lets users navigate a mapped room and “find” interesting objects through their sonic feature. These objects are linked to compositional materials; for instance, looping ambient pads that are distributed over all of the objects. As you move across the interface, you transition between these materials, and with that inherently between the acoustic properties of each object, they begin to transform what you hear.

This movement isn’t just technical; it’s compositional. Further the potential is there, that the listener becomes part of the performance, shaping the sonic outcome through their interaction with the panning-position; references for similar ideas and use-cases can be found in spatial audio, game sound, and interactive installation art.

Introducing Triggers

To deepen the interaction, I added another layer to the interface: object-based triggers. These can be placed on top of objects in the room and are activated through user interaction. Each trigger is connected to a collection of sound events; sonic gestures that may be specific to certain objects.

What makes these events interesting is that they can be tailored to the object’s qualities. A metallic object, for instance, might trigger sharp industrial sounds, while a soft, fabric-covered object could respond with warm filtered tones. But of course the creative potential is broad. So for example the compositional logic could be based also on affordances; a concept introduced by psychologist James J. Gibson.

Affordance refers to the perceived and actual properties of an object that determine how it could be used. In this context, a desk might afford work or stress, and thus be linked to fast-paced or “busy” sounds.
(Source: Gibson, James J. “The Theory of Affordances” The Ecological Approach to Visual Perception. Boston: Houghton Mifflin, 1979)


Triggers play back events using randomized selection, similar to round-robin techniques used in video games. This ensures variation and prevents the experience from becoming predictable or repetitive; especially useful in exhibition settings, where visitors move at their own pace and may stay for different durations. With just six triggers each holding eight events, you already have 48 sonic elements that can be recombined into an evolving aleatoric composition.

Between Creator Tool and Public Interface

Importantly, this panner isn’t only meant for audiences; it’s also built to serve creators as a composition tool. Implemented as a Max for Live plug-in, I further provide an Ableton Live session template that simplifies the setup, which now consists of the following steps:

  • Load a map of the room.
  • Place objects using the provided visual grid.
  • Begin composing within the sessions structure without worrying about the technical backend.

The final panning interface itself can also serve as a user interface for an audience. The most simple solution for this would be the use of Max/MSP’s presentation mode, which of course already works. This dual-purpose design supports both easy prototyping for composers and a potential for more public oriented contexts like e.g. exhibitions, offering flexibility to musicians, designers, and curators alike.

What’s Next: Integration and Testing

The next planned development steps for this specific elemnt of my toolbox include:

  • Adding OSC integration, so creators can use external XY controller apps (e.g., on smartphones or tablets) to interact with the panner in real-time.
  • User testing with other creators, to gain feedback on interface design, usability, and creative workflows.

As someone used to designing tools mainly for my own use, this phase marks an important shift. Building something for others has pushed me to rethink how I structure code, name parameters, and guide the user. This process has also begun to improve my own workflow, making it easier for me to revisit and repurpose tools in the future.

Closing Thoughts

This latest phase of development has brought together many of the themes I’ve been exploring; from spatial sound and interaction to composition, psychology, and usability. The panner is not just a technical feature; it’s a conceptual lens for thinking about how space, sound, and interface design come together to shape musical experience and my workflow as musician.