Critical review of “Microbiophonic Emergences” by Adomas Palekas (Master’s Thesis, Institute of Sonology, 2024)

Adomas Palekas’s master’s thesis, entitled Microbiophonic Emergences, could be described as an interdisciplinary mixture of artistic reflection, philosophical speculation, and experimental sound practice. Combining ecological thought and artistic research, the text examines the relationship between sound, life, and perception. Drawing upon the Gaia hypothesis and Goethean science, the author advances a more sensitive and ethical mode of listening wherein the boundaries between art and scientific observation dissolve.

Overall, the presentation of the work is careful and visually well structured, though it significantly deviates from academic conventions. A clearly defined research question, hypothesis, or structured methodology is lacking. Instead, the text comes across as a long essay on listening, nature, and non-human agency. Its first half is dedicated to theoretical reflections on sound as a living force, while the second half introduces a series of artistic experiments and installations entitled Kwass Fermenter, Microbial Music I–III (On Bread, Compost and Haze, Aerials), Rehydration, Infection, Spectrum of Mutations: Myosin III and Kwassic Motion. According to the author, these works constitute a coherent artistic ecosystem in which microorganisms and sonic feedback interact.

This framing of sonification as bi-directional means that sound is not only generated from biological data but is also sent back into the system to affect it; the approach thus seeks to turn sonification into a dialogue rather than a mere representation. The claim of originality, however, feels somewhat overstated: bi-directional or feedback-based sonification has been explored conceptually and practically by many artists and researchers before him, mostly within bio-art and ecological sound practices. Palekas himself mentions only one precedent when, in fact, a wide range of comparable works deal with translating biological activity into sound and re-introducing it into the same system. His treatment of the topic is therefore limited and lacks deep contextual awareness, giving the impression of a rediscovery of ideas that are already well established in the discipline.

The thesis shows clear artistic independence and a strong personal vision. Palekas’s voice is consistent, and his writing reflects genuine curiosity and sensitivity. Yet it is this very independence that isolates his research from the broader academic and artistic discourse. Dialogue with other practitioners or with theoretical perspectives is largely missing, apart from the few philosophical sources mentioned above. The limited literature review weakens the credibility of his theoretical framework and makes it difficult to situate the work within contemporary sound studies or bio-art research.

The structuring of the thesis is much closer to a philosophical narrative than to a scientific report. The chapters are connected more intuitively than logically. Because explicit methodological framing is absent, the reader has to reconstruct the logic of the experiments from poetic descriptions. For example, the sonification tests with fermentation are recounted in narrative terms, occasionally mentioning sensors, mappings, and feedback without providing detailed diagrams, parameter lists, or reproducible data.

From a communicational point of view, the thesis is well-written and easy to read. Palekas’s prose is expressive and reflective; his philosophical passages are a pleasure to read. At the same time, this lyricism too often supplants analytical clarity. The experimental results remain fuzzy; the measurements are given “by ear,” not through numerical analysis, and the reader cannot tell whether the effects observed are significant or only subjective impressions.

In scope and depth, the thesis is ambitious but uneven. It tries to combine philosophy, biology, and sound art, but the practical documentation remains superficial. The experiments are deficient in calibration and control conditions, as well as in quantitative evidence. The author himself recognizes that fermentation is hardly predictable and thus difficult to reproduce. But this admission only underlines the fragility of his conclusions. Without a presentation of clear data or even replicable protocols, the whole project remains conceptual rather than empirical.

Accuracy and attention to detail are only partial: the author provides some information about equipment and process – for example, the relative calibration of the CO₂ sensors and the use of Arduino and Pure Data – but no consistent system for reporting values, frequencies, and time spans. References to appendices and videos are incomplete, and none of the cited sound recordings or code is available. As a result, the project can be neither scientifically evaluated nor reproduced.

The literature review is selective: in situating his thought within broader ecological and philosophical frameworks, Palekas barely engages with the rich corpus of research on bio-sonification, microbial sensing, and feedback sound systems. The absence of these sources heightens the sense of isolation: the thesis feels self-contained rather than in conversation with a field.

This gap between theory and documentation also calls the quality of the artistic artifact into question. The installations and performances he describes conceptually are only incompletely documented. It is not clear whether the works were created for this thesis or collated from previous projects. Without recordings, schematics, or step-by-step documentation, the artistic and technical outcomes cannot be evaluated. Put differently, Microbiophonic Emergences is a strong artistic and philosophical statement but only a partially successful academic thesis. Its conceptual strength lies in the ethical rethinking of listening, the poetic vision of sound as life, and the attempt to dissolve the hierarchy between observer and observed.

The work unfortunately lacks methodological rigor, detailed evidence, and sufficient contextual grounding. While Palekas seeks to establish a dialogue between humans and microbes, the outcome remains speculative and unverified. The claimed invention of bi-directional sonification is not genuinely new, and the thesis overlooks the numerous past projects that have already explored a similar feedback relationship between sound and living systems. Overall, the work succeeds as a reflective, imaginative exploration of sound and ecology but falls short as a systematically researched academic document. While it evokes curiosity and wonder, it requires far stronger methodological and contextual grounding to meet the standards of a master’s thesis.

Product III: Image Extender – Image sonification tool for immersive perception of sounds from images and new creation possibilities

Intelligent Sound Fallback Systems – Enhancing Audio Generation with AI-Powered Semantic Recovery

After refining Image Extender’s sound layering and spectral processing engine, this week’s development shifted focus to one of the system’s most practical yet creatively crucial challenges: ensuring that the generation process never fails silently. In previous iterations, when a detected visual object had no directly corresponding sound file in the Freesound database, the result was often an incomplete or muted soundscape. The goal of this phase was to build an intelligent fallback architecture—one capable of preserving meaning and continuity even in the absence of perfect data.

Closing the Gap Between Visual Recognition and Audio Availability

During testing, it became clear that visual recognition is often more detailed and specific than what current sound libraries can support. Object detection models might identify entities like “Golden Retriever,” “Ceramic Cup,” or “Lighthouse,” but audio datasets tend to contain more general or differently labeled entries. This mismatch created a semantic gap between what the system understands and what it can express acoustically.

The newly introduced fallback framework bridges this gap, allowing Image Extender to adapt gracefully. Instead of stopping when a sound is missing, the system now follows a set of intelligent recovery paths that preserve the intent and tone of the visual analysis while maintaining creative consistency. The result is a more resilient, contextually aware sonic generation process—one that doesn’t just survive missing data, but thrives within it.

Dual Strategy: Structured Hierarchies and AI-Powered Adaptation

Two complementary fallback strategies were introduced this week: one grounded in structured logic, and another driven by semantic intelligence.

The CSV-based fallback system builds on the ontology work from the previous phase. Using the tag_hierarchy.csv file, each sound tag is part of a parent–child chain, creating predictable fallback paths. For example, if “tiger” fails, the system ascends to “jungle,” and then “nature.” This rule-based approach guarantees reliability and zero additional computational cost, making it ideal for large-scale batch operations or offline workflows.
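
To make the mechanism concrete, here is a minimal sketch of how such a parent-chain lookup could work, assuming tag_hierarchy.csv exposes "tag" and "parent" columns (illustrative names) and that an existing Freesound query helper is available (called search_freesound here as a placeholder):

import pandas as pd

# Load the hierarchy once; "tag" and "parent" are assumed column names.
hierarchy = pd.read_csv("tag_hierarchy.csv")
parent_of = dict(zip(hierarchy["tag"], hierarchy["parent"]))

def csv_fallback_chain(tag: str, max_depth: int = 5) -> list[str]:
    """Return the tag followed by its ancestors, e.g. tiger -> jungle -> nature."""
    chain = [tag]
    while tag in parent_of and pd.notna(parent_of[tag]) and len(chain) <= max_depth:
        tag = parent_of[tag]
        chain.append(tag)
    return chain

def find_sound_with_fallback(tag: str):
    for candidate in csv_fallback_chain(tag):
        result = search_freesound(candidate)   # placeholder for the existing Freesound query
        if result:
            return result
    return None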

In contrast, the AI-powered semantic fallback uses GPT-based reasoning to dynamically generate alternative tags. When the CSV offers no viable route, the model proposes conceptually similar or thematically related categories. A specific bird species might lead to the broader concept of “bird sounds,” or an abstract object like “smartphone” could redirect to “digital notification” or “button click.” This layer of intelligence brings flexibility to unfamiliar or novel recognition results, extending the system’s creative reach beyond its predefined hierarchies.
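
A hedged sketch of how such a GPT-based tag suggestion could be implemented with the openai Python client follows; the prompt wording and the choice of gpt-4.1-mini are illustrative assumptions rather than the project’s exact configuration:

from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def ai_fallback_tags(tag: str, n: int = 3) -> list[str]:
    """Ask the model for broader or related sound-search tags (illustrative prompt)."""
    response = client.chat.completions.create(
        model="gpt-4.1-mini",   # assumed model choice for the fallback step
        messages=[{
            "role": "user",
            "content": f"Suggest {n} broader or thematically related Freesound search tags "
                       f"for the visual object '{tag}'. Answer with a comma-separated list only.",
        }],
    )
    text = response.choices[0].message.content
    return [t.strip() for t in text.split(",") if t.strip()]

# e.g. ai_fallback_tags("smartphone") might yield ["digital notification", "button click", "electronic beep"]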

User-Controlled Adaptation

Recognizing that different projects require different balances between cost, control, and creativity, the fallback mode is now user-configurable. Through a simple dropdown menu, users can switch between CSV Mode and AI Mode.

  • CSV Mode favors consistency, predictability, and cost-efficiency—perfect for common, well-defined categories.
  • AI Mode prioritizes adaptability and creative expansion, ideal for complex visual inputs or unique scenes.

This configurability not only empowers users but also represents a deeper design philosophy: that AI systems should be tools for choice, not fixed solutions.

Toward Adaptive and Resilient Multimodal Systems

This week’s progress marks a pivotal evolution from static, database-bound sound generation to a hybrid model that merges structured logic with adaptive intelligence. The dual fallback system doesn’t just fill gaps; it embodies the philosophy of resilient multimodal AI, where structure and adaptability coexist in balance.

The CSV hierarchy ensures reliability, grounding the system in defined categories, while the AI layer provides flexibility and creativity, ensuring the output remains expressive even when the data isn’t. Together, they form a powerful, future-proof foundation for Image Extender’s ongoing mission: transforming visual perception into sound not as a mechanical translation, but as a living, interpretive process.

Product II: Image Extender – Image sonification tool for immersive perception of sounds from images and new creation possibilities

Dual-Model Vision Interface – OpenAI × Gemini Integration for Adaptive Image Understanding

Following the foundational phase of last week, where the OpenAI API Image Analyzer established a structured evaluation framework for multimodal image analysis, the project has now reached a significant new milestone. The second release integrates both OpenAI’s GPT-4.1-based vision models and Google’s Gemini (MediaPipe) inference pipeline into a unified, adaptive system inside the Image Extender environment.

Unified Recognition Interface

In the current version, the recognition logic has been completely refactored to support runtime model switching.
A dropdown-based control in Google Colab enables instant selection between:

  • Gemini (MediaPipe) – for efficient, on-device object detection and panning estimation
  • OpenAI (GPT-4.1 / GPT-4.1-mini) – for high-level semantic and compositional interpretation

Non-relevant parameters such as score threshold or delegate type dynamically hide when OpenAI mode is active, keeping the interface clean and focused. Switching back to Gemini restores all MediaPipe-related controls.
This creates a smooth dual-inference workflow where both engines can operate independently yet share the same image context and visualization logic.
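
A minimal sketch of how this runtime switch could be wired in a Colab notebook is shown below; the #@param form syntax is standard Colab, while the helper functions and parameter names are placeholders, not the actual Image Extender code:

# Colab form field: choose the recognition engine at runtime (identifiers are illustrative)
MODEL_BACKEND   = "Gemini (MediaPipe)"  #@param ["Gemini (MediaPipe)", "OpenAI (GPT-4.1)", "OpenAI (GPT-4.1-mini)"]
SCORE_THRESHOLD = 0.5                   #@param {type:"number"}

if MODEL_BACKEND.startswith("OpenAI"):
    # MediaPipe-specific controls (score threshold, delegate) are simply ignored in this mode
    results = run_openai_analysis(IMAGE_FILE, model=MODEL_BACKEND)                   # placeholder helper
else:
    results = run_mediapipe_detection(IMAGE_FILE, score_threshold=SCORE_THRESHOLD)   # placeholder helper

draw_overlay(IMAGE_FILE, results)  # shared visualization path for both engines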

Architecture Overview

The system is divided into two self-contained modules:

  1. Image Upload Block – handles external image input and maintains a global IMAGE_FILE reference for both inference paths.
  2. Recognition Block – manages model selection, executes inference, parses structured outputs, and handles visualization.

This modular split keeps the code reusable, reduces side effects between branches, and simplifies later expansion toward GUI-based or cloud-integrated applications.

OpenAI Integration

The OpenAI branch extends directly from last week’s analyzer but now operates within the full environment.
It converts uploaded images into Base64 and sends a multimodal request to gpt-4.1 or gpt-4.1-mini.
The model returns a structured Python dictionary, typically using the following schema:

{
    "objects": […],
    "scene_and_location": […],
    "mood_and_composition": […],
    "panning": […]
}
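
For reference, here is a hedged sketch of what such a Base64 single-request call can look like with the openai client; the prompt text is illustrative and the real Image Extender prompt may differ:

import base64
from openai import OpenAI

client = OpenAI()

def analyze_image(path: str, model: str = "gpt-4.1-mini") -> str:
    """Send one multimodal request: the image as a Base64 data URL plus the analysis prompt."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe objects, scene_and_location, mood_and_composition and panning "
                         "as a Python dictionary. Reasonable guesses are allowed; no empty fields."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content  # raw text, parsed downstream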

A multi-stage parser (AST → JSON → fallback) ensures robustness even when GPT responses contain formatting artifacts.
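
A compact sketch of that AST → JSON → fallback idea, with an illustrative regex for stripping Markdown fences (not the project’s exact implementation):

import ast
import json
import re

def parse_model_output(raw: str) -> dict:
    """Strip Markdown fences, then try ast.literal_eval, then json.loads, then fall back safely."""
    cleaned = raw.strip()
    cleaned = re.sub(r"^```[a-zA-Z]*\s*", "", cleaned)   # leading ```python / ```json
    cleaned = re.sub(r"\s*```$", "", cleaned)            # trailing fence
    try:
        return ast.literal_eval(cleaned)                 # Python-dict style output
    except (ValueError, SyntaxError):
        pass
    try:
        return json.loads(cleaned)                       # strict JSON output
    except json.JSONDecodeError:
        return {"objects": [], "scene_and_location": [],
                "mood_and_composition": [], "panning": []}   # safe empty structure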

Prompt Refinement

During development, testing revealed that the English prompt version initially returned empty dictionaries.
Investigation showed that overly strict phrasing (“exclusively as a Python dictionary”) caused the model to suppress uncertain outputs.
By softening this instruction to allow “reasonable guesses” and explicitly forbidding empty fields, the API responses became consistent and semantically rich.

Debugging the Visualization

A subtle logic bug was discovered in the visualization layer:
The post-processing code still referenced the German dictionary keys (“objekte”, “szenerie_und_ort”, “stimmung_und_komposition”) from last week.
Since the new English prompt returned English keys (“objects”, “scene_and_location”, etc.), these lookups produced empty lists, which in turn broke the overlay rendering loop.
After harmonizing key references to support both language variants, the visualization resumed normal operation.
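
The harmonization can be as simple as accepting both key variants; the key names below are the ones quoted above, and parsed stands for the already-parsed response dictionary:

def get_field(data: dict, english_key: str, german_key: str) -> list:
    """Accept both the new English and the legacy German response keys."""
    return data.get(english_key) or data.get(german_key) or []

objects = get_field(parsed, "objects", "objekte")
scene   = get_field(parsed, "scene_and_location", "szenerie_und_ort")
mood    = get_field(parsed, "mood_and_composition", "stimmung_und_komposition")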

Cross-Model Visualization and Validation

A unified visualization layer now overlays results from either model directly onto the source image.
In OpenAI mode, the “panning” values from GPT’s response are projected as vertical lines with object labels.
This provides immediate visual confirmation that the model’s spatial reasoning aligns with the actual object layout, an important diagnostic step for evaluating AI-based perception accuracy.
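
A possible matplotlib rendering of those vertical panning lines is sketched below; it assumes panning values normalised to 0–1 across the image width, which is an assumption for illustration rather than the post’s stated convention:

import matplotlib.pyplot as plt
from PIL import Image

def plot_panning(image_path: str, panning: list[dict]) -> None:
    """Overlay one labelled vertical line per object.
    Each entry is assumed to look like {"label": "dog", "pan": 0.3} with pan in 0..1."""
    img = Image.open(image_path)
    w, h = img.size
    plt.imshow(img)
    for entry in panning:
        x = entry["pan"] * w
        plt.axvline(x=x, color="red", linewidth=1)
        plt.text(x, h * 0.05, entry["label"], color="red", rotation=90)
    plt.axis("off")
    plt.show()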

Outcome and Next Steps

The project now represents a dual-model visual intelligence system, capable of using symbolic AI interpretation (OpenAI) and local pixel-based detection (Gemini).

Next steps

The upcoming development cycle will focus on connecting the OpenAI API layer directly with the Image Extender’s audio search and fallback system.

Product I: Image Extender – Image sonification tool for immersive perception of sounds from images and new creation possibilities

OpenAI API Image Analyzer – Structured Vision Testing and Model Insights

Adaptive Visual Understanding Framework
In this development phase, the focus was placed on building a robust evaluation framework for OpenAI’s multimodal models (GPT-4.1 and GPT-4.1-mini). The primary goal: systematically testing image interpretation, object detection, and contextual scene recognition while maintaining controlled cost efficiency and analytical depth.

upload of image (image source: https://www.trumau.at/)
  1. Combined Request Architecture
    Unlike traditional multi-call pipelines, the new setup consolidates image and text interpretation into a single API request. This streamlined design prevents token overhead and ensures synchronized contextual understanding between categories. Each inference returns a structured Python dictionary containing three distinct analytical branches:
    • Objects – Recognizable entities such as animals, items, or people
    • Scene and Location Estimation – Environment, lighting, and potential geographic cues
    • Mood and Composition – Aesthetic interpretation, visual tone, and framing principles

For each uploaded image, the analyzer prints three distinct lists per model, side by side. This offers a straightforward way to assess interpretive differences without complex metrics. In practice, GPT-4.1 tends to deliver slightly more nuanced emotional and compositional insights, while GPT-4.1-mini prioritizes concise, high-confidence object recognition.

results of the image object analysis and model comparison

Through the unified format, post-processing can directly populate separate lists or database tables for subsequent benchmarking, minimizing parsing latency and data inconsistencies.

  2. Robust Output Parsing
    Because model responses occasionally include Markdown code blocks (e.g., ```python fences around the dictionary), the parsing logic was redesigned with a multi-layered interpreter using regex sanitation and dual parsing strategies (AST > JSON > fallback). This guarantees that even irregularly formatted outputs are safely converted into structured datasets without manual intervention. The system thus sustains analytical integrity under diverse prompt conditions.
  3. Model Benchmarking: GPT-4.1-mini vs. GPT-4.1
    The benchmark test compared inference precision, descriptive richness, and token efficiency between the two models. While GPT-4.1 demonstrates deeper contextual inference and subtler mood detection, GPT-4.1-mini achieves near-equivalent recognition accuracy at approximately one-tenth of the cost per request. For large-scale experiments (e.g., datasets exceeding 10,000 images), GPT-4.1-mini provides the optimal balance between granularity and economic viability.
  4. Token Management and Budget Simulation
    A real-time token tracker revealed an average consumption of ~1,780 tokens per image request. Given GPT-4.1-mini’s rate of $0.003 / 1k tokens, a one-dollar operational budget supports roughly 187 full image analyses (the short calculation below reproduces this estimate). This insight forms the baseline for scalable experimentation and budget-controlled automation workflows in cloud-based vision analytics.
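
The budget estimate can be reproduced in a few lines, using only the figures quoted above:

TOKENS_PER_IMAGE = 1780        # average measured consumption
PRICE_PER_1K     = 0.003       # USD per 1k tokens (GPT-4.1-mini rate quoted above)
BUDGET_USD       = 1.00

cost_per_image    = TOKENS_PER_IMAGE / 1000 * PRICE_PER_1K   # ≈ $0.00534
images_per_budget = int(BUDGET_USD // cost_per_image)        # ≈ 187 analyses
print(f"{images_per_budget} image analyses for ${BUDGET_USD:.2f}")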

The next development phase will integrate this OpenAI-driven visual analysis directly into the Image Extender environment. This integration marks the transition from isolated model testing toward a unified generative framework.

Between Image and Sound – Critical Review of the Master’s Thesis “Automatic Sonification of Video Sequences” by Andrea Corcuera Marruffo

Basic Information

Author: Andrea Corcuera Marruffo
Title: Automatic Sonification of Video Sequences through Object Detection and Physical Modelling
Institution: Aalborg University Copenhagen
Programme: MSc Sound and Music Computing
Year: 2017

Andrea Corcuera Marruffo’s thesis investigates the automatic generation of Foley sounds from video sequences. The aim is to sonify audiovisual content algorithmically: visual information, e.g. material properties or object collisions, is analysed with convolutional neural networks (using the YOLO model), and physically modelled sounds are then synthesized from the results. The work thus positions itself at the intersection of sound synthesis, software development, and perception, a field of growing relevance in media production as well as in artistic research, and one that also overlaps with the core concept of my own master’s project described above.

The “Werkstück” (practical component) consists of a functional prototype that analyses videos, classifies objects, and translates their interactions into synthesized sounds. This tool is complemented by an evaluation in which audiovisual stimuli are tested for plausibility and perceived quality.

Assessment

systematically based on the assessment criteria of the CMS programme

(1) Gestaltungshöhe – Artistic and Technical Quality

The thesis demonstrates very good technical depth and a clear methodological structure. The organisation is logical, and the visualisations (e.g. flow charts, spectrograms) are easy to follow and support the reader’s understanding of the process.

(2) Innovationsgrad – Degree of Innovation

The approach of generating Foley sound automatically (using physical modelling) had been explored only sporadically at the time of publication (2017). The combination of object detection and physical modelling therefore constitutes an innovative contribution to the field of computational sound design.

(3) Selbstständigkeit – Independence

The thesis shows a clear independent contribution. The author builds her own dataset, modifies training data, and implements the YOLO model in an adapted form. The synthesis parameters are also derived experimentally. The independence is therefore evident both conceptually and technically.

(4) Gliederung und Struktur – Structure and Organisation

The structure follows a classic scientific outline: theory, implementation, evaluation, conclusion. The chapters are clearly focused but at times heavily technical, which may limit readability for readers outside the field. A more visual presentation of the evaluation methodology might have improved this.

(5) Kommunikationsgrad – Communication

Overall, the thesis is clearly and precisely written. Technical terms are introduced carefully, and figures are labelled and logically integrated. The style is factual but at times leans too heavily toward technical documentation. Narrative reflections on design decisions or aesthetic considerations are largely absent, which is understandable given that the programme is not primarily design-oriented.

(6) Umfang der Arbeit – Scope

With more than 30 pages of main text plus an appendix, the scope is appropriate. The balance between theory, implementation, and evaluation is well handled. The empirical study with 15 participants remains relatively small, however, which limits its statistical power.

(7) Orthographie, Sorgfalt und Genauigkeit – Orthography, Care, and Accuracy

The thesis is formally correct throughout and methodologically carefully documented. Minor linguistic slips (“he first talkie film”) barely affect the overall impression. Citations and references are consistent.

(8) Literatur – Literature

The bibliography shows a solid theoretical foundation. Standard sources on sound synthesis, modal modelling, and neural networks are used (Smith, Farnell, Van den Doel). More recent media and perception research (e.g. sonic interaction design, embodied sound studies) would have been a valuable addition to the research literature.

Concluding Assessment

Overall, the thesis convinces through its innovative approach, methodological precision, and the successful implementation of a complex system. The evaluation critically highlights the limits of the model (object detection accuracy and synchronisation problems), which the author reflects on and contextualises convincingly.

Strengths: clear structure, high technical level, original research approach, independent implementation.
Weaknesses: limited aesthetic reflection, small evaluation sample, restricted variety of materials.

Critical Review: “Sound response to physicality – Artistic expressions of movement sonification” by Aleksandra Joanna Słyż (Royal College of Music, 2022)

by Verena Schneider, CMS24 Sound Design Master 

The master’s thesis “Sound Response to Physicality: Artistic Expressions of Movement Sonification” was written by Aleksandra Joanna Słyż in 2022 at the Royal College of Music in Stockholm (Kungliga Musikhögskolan, Stockholm, Sweden).

Introduction

I chose Aleksandra Słyż’s master thesis because her topic immediately resonated with my own research interests. In my master project I am working with the x-IMU3 motion sensor to track surf movements and transform them into sound for a surf documentary.
During my research process, the question of how to sonify movement data became central, and Słyż’s work gave me valuable insights into which parameters can be used and how the translation from sensor to sound can be conceptually designed.

Her thesis, Sound response to physicality, focuses on the artistic and perceptual dimensions of movement sonification. Through her work Hypercycle, she explores how body motion can control and generate sound in real time, using IMU sensors and multichannel sound design. I found many of her references—such as John McCarthy and Peter Wright’s Technology as Experience—highly relevant for my own thesis.

Gestaltungshöhe – Artistic Quality and Level of Presentation

Słyż’s thesis presents a high level of artistic and conceptual quality. The final piece, Hypercycle, is a technically complex and interdisciplinary installation that connects sound, body, and space. The artistic idea of turning the body into a musical instrument is powerful, and she reflects deeply on the relation between motion, perception, and emotion.

Visually, the documentation of her work is clear and professional, though I personally wished for a more detailed sonic description. The sound material she used is mainly synthesized tones—technically functional, but artistically minimal. As a sound designer, I would have enjoyed a stronger exploration of timbre and spatial movement as expressive parameters.

Innovationsgrad – Innovation and Contribution to the Field

Using motion sensors for artistic sonification is not entirely new, yet her combination of IMU data, embodied interaction, and multichannel audio gives the project a strong contemporary relevance. What I found innovative was how she conceptualized direct and indirect interaction—how spectators experience interactivity even when they don’t control the sound themselves.

However, from a technical point of view, the work could have been more transparent. I was missing a detailed explanation of how exactly she mapped sensor data to sound parameters. This part felt underdeveloped, and I see potential for future work to document such artistic systems more precisely.

Selbstständigkeit – Independence and Original Contribution

Her thesis clearly shows independence and artistic maturity. She worked across disciplines—combining psychology, music technology, and perception studies—and reflected on her process critically. I especially appreciated that she didn’t limit herself to the technical side but also integrated a psychological and experiential perspective.

As someone also working with sensor-based sound, I can see how much self-direction and experimentation this project required. The depth of reflection makes the work feel authentic and personal.

Gliederung und Struktur – Structure and Coherence

The structure of the thesis is logical and easy to follow. Each chapter begins with a quote that opens the topic in a poetic way, which I found very effective. She starts by explaining the theoretical background, then moves toward the technical discussion of IMU sensors, and finally connects everything to her artistic practice.

Her explanations are written in clear English, and she carefully defines all important terms such as sonification, proprioception, and biofeedback. Even readers with only basic sound design knowledge can follow her reasoning.

Kommunikationsgrad – Communication and Expression

The communication of her ideas is well-balanced between academic precision and personal reflection. I like that she uses a human-centered language, often describing how the performer or spectator might feel within the interactive system.

Still, the technical documentation of the sonification process could be more concrete. She briefly shows a Max/MSP patch, but I would have loved to understand more precisely how the data flow—from IMU to sound—was built. For future readers and practitioners, such details would be extremely valuable.

Umfang – Scope and Depth

The length of the thesis (around 50 pages) feels appropriate for the topic. She covers a wide range of areas: from sensor technology and perception theory to exhibition practice and performance philosophy.
At the same time, I had the impression that she decided to keep the technical parts lighter, focusing more on conceptual reflection. For me, this makes the thesis stronger as an artistic reflection, but weaker as a sound design manual.

Orthography, Accuracy, and Formal Care

The thesis is very carefully written and proofread. References are consistent, and the terminology is accurate. She integrates both scientific and artistic citations, which gives the text a professional academic tone.
The layout is clear, and the visual elements (diagrams, performance photos) are well placed.

Literature – Quality and Relevance

The literature selection is one of the strongest aspects of this work. She cites both technical and philosophical sources—from G. Kramer’s Sonification Report to McCarthy & Wright’s Technology as Experience and Tanaka & Donnarumma’s The Body as Musical Instrument.
For me personally, her bibliography became a guide for my own research. I found new readings that I will also include in my master thesis.

Final Assessment – Strengths, Weaknesses, and Personal Reflection

Overall, Sound response to physicality is a well-balanced, thoughtful, and inspiring thesis that connects technology, perception, and art.
Her biggest strength lies in how she translates complex sensor-based interactions into human experience and emotional resonance. The way she conceptualizes embodied interaction and indirect interactivity is meaningful and poetic.

The main weakness, in my opinion, is the lack of detailed technical documentation—especially regarding how the IMU data was mapped to sound and multichannel output. As someone building my own sonification system with the x-IMU3 and contact microphones, I would have loved to see the exact data chain from sensor to audio.

Despite that, her work inspired me profoundly. It reminded me that the psychological and experiential dimensions of sound are just as important as the data itself. In my own project, where I sonify the movement of a surfboard and the feeling of the ocean, I will carry this understanding forward: that sonification is not only about data translation but about shaping human experience through sound.

Post 1: Listening to the Ocean

– The Emotional Vision Behind Surfboard Sonification

Surfing is more than just a sport. For many surfers, it is a ritual, a form of meditation, and an experience of deep emotional release. There is a unique silence that exists out on the water. It is not the absence of sound but the presence of something else: a sense of connection, stillness, and immersion. This is where the idea for “Surfboard Sonification” was born. It began not with technology, but with a feeling. A moment on the water when the world quiets, and the only thing left is motion and sensation.

The project started with a simple question: how can one translate the feeling of surfing into sound? What if we could make that feeling audible? What if we could tell the story of a wave, not through pictures or words, but through vibrations, resonance, and sonic movement?

My inspiration came from both my personal experiences as a surfer and from sound art and acoustic ecology. I was particularly drawn to the work of marine biologist Wallace J. Nichols and his theory of the “Blue Mind.” According to Nichols, being in or near water has a scientifically measurable impact on our mental state. It relaxes us, improves focus, and connects us to something larger than ourselves. It made me wonder: can we create soundscapes that replicate or amplify that feeling?

In addition to Nichols’ research, I studied the sound design approaches of artists like Chris Watson and Jana Winderen, who work with natural sound recordings to create immersive environments. I also looked at data-driven artists such as Ryoji Ikeda, who transform abstract numerical inputs into rich, minimalist sonic works.

The goal of Surfboard Sonification was to merge these worlds. I wanted to use real sensor data and field recordings to tell a story. I did not want to rely on synthesizers or artificial sound effects. I wanted to use the board itself as an instrument. Every crackle, vibration, and movement would be captured and turned into music—not just any music, but one that feels like surfing.

The emotional journey of a surf session is dynamic. You begin on the beach, often overstimulated by the environment. There is tension, anticipation, the chaos of wind, people, and crashing waves. Then, as you paddle out, things change. The noise recedes. You become attuned to your body and the water. You wait, breathe, and listen. When the wave comes and you stand up, everything disappears. It’s just you and the ocean. And then it’s over, and a sense of calm returns.

This narrative arc became the structure of the sonic composition I set out to create. Beginning in noise and ending in stillness. Moving from overstimulation to focus. From red mind to blue mind.

To achieve this, I knew I needed to design a system that could collect as much authentic data as possible. This meant embedding sensors into a real surfboard without affecting its function. It meant using microphones that could capture the real vibrations of the board. It meant synchronizing video, sound, and movement into one coherent timeline.

This was not just an artistic experiment. It was also a technical challenge, an engineering project, and a sound design exploration. Each part of the system had to be carefully selected and tested. The hardware had to survive saltwater, sun, and impact. The software had to process large amounts of motion data and translate it into sound in real time or through post-processing.

And at the heart of all this was one simple but powerful principle, spoken to me once by a surf teacher in Sri Lanka:

“You are only a good surfer if you catch a wave with your eyes closed.”

That phrase stayed with me. It encapsulates the essence of surfing. Surfing is not about seeing; it’s about sensing. Feeling. Listening. This project was my way of honoring that philosophy—by creating a system that lets us catch a wave with our ears.

This blog series will walk through every step of that journey. From emotional concept to hardware integration, from dry-land simulation to ocean deployment. You will learn how motion data becomes music. How a surfboard becomes a speaker. And how the ocean becomes an orchestra.

In the next post, I will dive into the technical setup: the sensors, microphones, recorders, and housing that make it all possible. I will describe the engineering process behind building a waterproof, surfable, sound-recording device—and what it took to embed that into a real surfboard without compromising performance.

But for now, I invite you to close your eyes. Imagine paddling out past the break. The sound of your breath, the splash of water, the silence between waves. This is the world of Surfboard Sonification. And this is just the beginning.

References

Nichols, W. J. (2014). Blue Mind. Little, Brown Spark.

Watson, C. (n.d.). Field recording artist.

Winderen, J. (n.d.). Jana Winderen: Artist profile. https://www.janawinderen.com

Ikeda, R. (n.d.). Official site. https://www.ryojiikeda.com

Truax, B. (2001). Acoustic Communication. Ablex Publishing.

Puckette, M. S. (2007). The Theory and Technique of Electronic Music. World Scientific Publishing Company.

Experiment VIII: Embodied Resonance – ecg_to_cc.py

After converting heart-rate data into drums and HRV energy into melodic tracks, I needed a tiny helper that outputs nothing but automation. The idea was to pull every physiologic stream—HR, SDNN, RMSSD, VLF, LF, HF plus the LF / HF ratio—and stamp each one into its own MIDI track as CC-1 so that in the DAW every signal can be mapped to a knob, filter or layer-blend. The script must first look at all CSV files in a session, find the global minimum and maximum for every column, and apply those bounds consistently. Timing stays on the same 75 BPM / 0.1 s grid we used for the drums and notes, making one CSV row equal one 1⁄32-note. With that in mind, the core design is just constants, a value-to-CC rescaler, a loop that writes control changes, and a batch driver that walks the folder.

If you want to see the full script, visit: https://github.com/ninaeba/EmbodiedResonance

# ─── Default Configuration ───────────────────────────────────────────
DEF_BPM       = 75             # Default tempo
CSV_GRID      = 0.1            # Time between CSV samples (s) ≙ 1/32-note @75 BPM
SIGNATURE     = "8/8"          # Eight-cell bar; enough for pure CC data
CC_NUMBER     = 1              # We store everything on Mod Wheel
CC_TRACKS     = ["HR", "SDNN", "RMSSD", "VLF", "LF", "HF", "LF_HF"]
BEAT_PER_GRID = 1 / 32         # Grid resolution in beats

The first helper, map_to_cc, squeezes any value into the 0-127 range. It is the only place where scaling happens, so if you ever need logarithmic behaviour you change one line here and every track follows.

import numpy as np  # used by all scaling helpers in the fragments below

def map_to_cc(val: float, vmin: float, vmax: float) -> int:
    norm = np.clip((val - vmin) / (vmax - vmin), 0, 1)
    return int(round(norm * 127))

generate_cc_tracks loads a single CSV, back-fills gaps, creates one PrettyMIDI container and pins the bar length with a time-signature event. Then it iterates over CC_TRACKS; if the column exists it opens a fresh instrument named after the signal and sprinkles CC messages along the grid. Beat-to-second conversion is the same one-liner used elsewhere in the project.

for name in CC_TRACKS:
    if name not in df.columns:
        continue
    vals = df[name].to_numpy()
    vmin, vmax = minmax[name]
    inst = pm.Instrument(program=0, is_drum=False, name=f"{name}_CC1")

    for i in range(len(vals) - 1):
        beat = i * BEAT_PER_GRID * 4          # 1/32-note → quarter-note space
        t    = beat * 60.0 / bpm
        cc_val = map_to_cc(vals[i], vmin, vmax)
        inst.control_changes.append(
            pm.ControlChange(number=CC_NUMBER, value=cc_val, time=t)
        )

    midi.instruments.append(inst)

The wrapper main is pure housekeeping: parse CLI flags, list every *.csv, compute global min-max per signal, and then hand each file to generate_cc_tracks. Consistent scaling is guaranteed because min-max is frozen before any writing starts.

minmax: dict[str, tuple[float, float]] = {}
for name in CC_TRACKS:
    vals = []
    for f in csv_files:
        df = pd.read_csv(f, usecols=lambda c: c.upper() == name)
        if name in df.columns:
            vals.append(df[name].to_numpy())
    if vals:
        all_vals = np.concatenate(vals)
        minmax[name] = (np.nanmin(all_vals), np.nanmax(all_vals))

Each CSV turns into *_cc_tracks.mid containing seven tracks—HR, SDNN, RMSSD, VLF, LF, HF, LF_HF—each packed with CC-1 data at 1⁄32-note resolution. Drop the file into Ableton, assign a synth or effect per track, map the Mod Wheel to whatever parameter makes sense, and the physiology animates the mix in real time.

As a result, we get CC automation that we can easily map to something else or just copy automation somewhere else:

Here is an example of HR from patient 115 mapped from 0 to 127 in CC:

Experiment VII: Embodied Resonance – ecg_to_notes.py

The next step after the drum generator is a harmonic layer built from frequency-domain HRV metrics.
The script should do exactly what the drum code did for HR: scan all source files, find the global minima and maxima of every band, then map those values to pitches. Very-low-frequency (VLF) energy lives in the first octave, low-frequency (LF) in the second, high-frequency (HF) in the third. Each band is written to its own MIDI track so a different instrument can be assigned later, and every track also carries two automation curves: one CC lane for the band’s amplitude and one for the LF ⁄ HF ratio. In the synth that ratio will cross-fade between a sine and a saw: a large LF ⁄ HF—often a stress marker—makes the tone brighter and scratchier.

Instead of all twelve semitones, the script confines itself to the seven notes of C major or C minor. The mode flips in real time: if LF ⁄ HF drops below two the scale is major, above two it turns minor. Timing is flexible: each band can have its own step length and note duration so VLF moves more slowly than HF.

Below is a concise walk-through of ecg_to_notes.py. Key fragments of the code are shown inline for clarity. If you want to see the full script, visit: https://github.com/ninaeba/EmbodiedResonance

# ----- core tempo & bar settings -----
DEF_BPM   = 75      # fixed song tempo
CSV_GRID  = 0.1     # raw data step in seconds
SIGNATURE = "8/8"   # eight cells per bar (fits VLF rhythms)

The script assigns a dedicated time grid, rhythmic value, and starting octave to each band.
Those values are grouped in three parallel constants so you can retune the behaviour by editing one block.

# ----- per-band timing & pitch roots -----
VLF_CELL_SEC = 0.8; VLF_STEP_BEAT = 1/4 ; VLF_NOTE_LEN = 1/4 ; VLF_ROOT = 36  # C2
LF_CELL_SEC  = 0.4; LF_STEP_BEAT  = 1/8 ; LF_NOTE_LEN  = 1/8 ; LF_ROOT  = 48  # C3
HF_CELL_SEC  = 0.2; HF_STEP_BEAT  = 1/16; HF_NOTE_LEN  = 1/16; HF_ROOT  = 60  # C4

Major and minor material is pre-baked as two simple interval lists. A helper translates any normalised value into a scale degree, then into an absolute MIDI pitch; another helper turns the same value into a velocity between 80 and 120 so you can hear amplitude without a compressor.

MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]
MINOR_SCALE = [0, 2, 3, 5, 7, 10, 11]

def normalize_note(val, vmin, vmax, scale, root):
    norm  = np.clip((val - vmin) / (vmax - vmin), 0, 0.999)
    step  = scale[int(norm * len(scale))]
    return root + step

def map_velocity(val, vmin, vmax):
    norm = np.clip((val - vmin) / (vmax - vmin), 0, 1)
    return int(round(80 + norm * 40))

signals_to_midi opens one CSV, grabs the four relevant columns, and sets up three PrettyMIDI instruments—one per band. Inside a single loop the three arrays are processed identically, differing only in timing and octave. At each step the script decides whether we are in C major or C minor, converts the current value to a note and velocity, and appends the note to the respective instrument.

for name, vals, cell_sec, step_beat, note_len, root, vmin, vmax, inst in [
    ("VLF", vlf_vals, VLF_CELL_SEC, VLF_STEP_BEAT, VLF_NOTE_LEN, VLF_ROOT, vlf_min, vlf_max, inst_vlf),
    ("LF",  lf_vals,  LF_CELL_SEC,  LF_STEP_BEAT,  LF_NOTE_LEN,  LF_ROOT,  lf_min,  lf_max,  inst_lf),
    ("HF",  hf_vals,  HF_CELL_SEC,  HF_STEP_BEAT,  HF_NOTE_LEN,  HF_ROOT,  hf_min,  hf_max,  inst_hf)
]:
    for i, idx in enumerate(range(0, len(vals), int(cell_sec / CSV_GRID))):
        t        = idx * CSV_GRID
        scale    = MINOR_SCALE if lf_hf_vals[idx] > 2 else MAJOR_SCALE
        pitch    = normalize_note(vals[idx], vmin, vmax, scale, root)
        velocity = map_velocity(vals[idx], vmin, vmax)
        duration = note_len * 60 / DEF_BPM
        inst.notes.append(pm.Note(velocity, pitch, t, t + duration))

Every note time-stamp is paired with two controller messages: the first transmits the band amplitude, the second the LF / HF stress proxy. Both are normalised to the full 0-127 range so they can drive filters, wavetable morphs, or anything else in the synth.

inst.control_changes.append(pm.ControlChange(CC_SIGNAL, int(round(norm_val  * 127)), t))
inst.control_changes.append(pm.ControlChange(CC_RATIO,  int(round(norm_lf_hf * 127)), t))

The main wrapper gathers all CSVs in the chosen folder, computes global min–max values so every file shares the same mapping, and calls signals_to_midi on each source. Output files are named *_signals_notes.mid, each holding three melodic tracks plus two CC lanes per track. Together with the drum generator this completes the biometric groove: pulse drives rhythm, spectral power drives harmony, and continuous controllers keep the sound evolving.

Here is what we get if we use one mapping for all patients:

And here is an example of the MIDI files we get if we use an individual mapping for each patient separately:

Experiment VI: Embodied Resonance – ecg_to_drum.py

To generate heart-rate–driven drums I set out to write a script that builds a drum pattern whose density depends on HR intensity. I fixed the tempo at 75 BPM because the math is convenient: 0.1 s of data ≈ one 1/32-note at 75 BPM. I pictured a bar with 16 cells and designed a map that decides which cell gets filled next; the higher the HR, the more cells are activated. Before rendering, a helper scans the global HR minimum and maximum and slices that span into 16 equal zones (an 8-zone fallback is available) so the percussion always scales correctly. For extra flexibility, the script also writes SDNN and RMSSD into the MIDI file as two separate CC automation lanes. In Ableton, I can map those controllers to any parameter, making it easy to experiment with the track’s sonic texture.

Let’s take a closer look at the code. If you want to see the full script, visit: https://github.com/ninaeba/EmbodiedResonance

This script converts raw heart-rate data into a drum track.
Every 0.1-second CSV sample lines up with a 1⁄32-note at the fixed tempo of 75 BPM, so medical time falls neatly onto a musical grid. The pulse controls how many cells inside each 16-step bar are filled, while SDNN and RMSSD are embedded as two CC lanes for later sound-design tricks.

Default configuration (all constants live in one block)
STEP_BEAT    = 1 / 32    # grid resolution in beats
NOTE_LENGTH  = 1 / 32    # each hit lasts a 32nd-note
DEF_BPM      = 75        # master tempo
CSV_GRID     = 0.1       # CSV step in seconds
SEC_PER_CELL = 0.1       # duration of one pattern cell
SIGNATURE    = "16/16"   # one bar = 16 cells
CC_SDNN      = 11        # long-term HRV
CC_RMSSD     = 1         # short-term HRV

The spatial order of hits is defined by a map that spreads layers across the bar, so extra drums feel balanced instead of bunching at the start

 CELL_MAP = [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]

generate_rules slices the global HR span into sixteen equal zones and binds each zone to one cell-and-note pair; the last rule catches any pulse above the top threshold

def generate_rules(hr_min, hr_max, cells):
    thresholds = np.linspace(hr_min, hr_max, cells + 1)[1:]
    rules = [(thr, CELL_MAP[i], 36 + i) for i, thr in enumerate(thresholds)]
    rules[-1] = (np.inf, rules[-1][1], rules[-1][2])
    return rules

Inside csv_to_midi the script first writes two continuous-controller streams—one for SDNN, one for RMSSD—normalised to 0-127 at every grid tick

sdnn_norm  = np.clip((sdnn  - sdnn_min)  / (sdnn_max  - sdnn_min),  0, 1)
rmssd_norm = np.clip((rmssd - rmssd_min) / (rmssd_max - rmssd_min), 0, 1)
inst.control_changes.append(pm.ControlChange(number=CC_SDNN,  value=int(round(sdnn_norm  * 127)), time=t))
inst.control_changes.append(pm.ControlChange(number=CC_RMSSD, value=int(round(rmssd_norm * 127)), time=t))

A cumulative-pattern list is built so rhythmic density rises without ever muting earlier layers

CUM_PATTERNS = []
acc = []
for (thr, cell, note) in rules:
    acc.append((cell, note))         # keep previous hits
    CUM_PATTERNS.append(list(acc))   # zone n holds n+1 hits

During rendering each bar/cell slot is visited, the current HR is interpolated onto that exact moment, its zone is looked up, velocity is set to the rounded pulse value, and only the cells active in that zone fire

hr   = np.interp(t_sample, times, hr_vals)
zone = next(i for i, (thr, _, _) in enumerate(rules) if hr <= thr)
vel  = int(round(hr))

for (z_cell, note) in CUM_PATTERNS[zone]:
    if z_cell != cell:
        continue
    beat = (bar * cells + cell) * STEP_BEAT * 4      # grid step → quarter-note beats
    t0   = beat * 60.0 / bpm
    note_len = NOTE_LENGTH * 4 * 60.0 / bpm          # one 32nd-note in seconds (0.1 s at 75 BPM)
    inst.notes.append(pm.Note(velocity=vel, pitch=note, start=t0, end=t0 + note_len))


Before any writing starts the entry-point scans the whole folder to gather global minima and maxima for HR, SDNN and RMSSD, guaranteeing identical zone limits and CC scaling across every file in the batch

hr_all    = np.concatenate([pd.read_csv(f)["HR"].to_numpy()    for f in csv_files])
sdnn_all  = np.concatenate([pd.read_csv(f)["SDNN"].to_numpy()  for f in csv_files])
rmssd_all = np.concatenate([pd.read_csv(f)["RMSSD"].to_numpy() for f in csv_files])
hr_min,    hr_max    = np.nanmin(hr_all),    np.nanmax(hr_all)
sdnn_min,  sdnn_max  = np.nanmin(sdnn_all),  np.nanmax(sdnn_all)
rmssd_min, rmssd_max = np.nanmin(rmssd_all), np.nanmax(rmssd_all)
rules = generate_rules(hr_min, hr_max, cells)

Run the converter like this:

 python ecg_to_drum.py path/to/csv_folder --tempo 75 --time-signature 16/16

Each CSV becomes a _drums.mid file that contains a single drum-kit track whose note density follows heart-rate zones and whose CC 11 and CC 1 envelopes mirror long- and short-term variability—ready to animate filters, reverbs or whatever you map them to in the DAW.

Here are examples of generated MIDI files with drums. On the left side, there are drums made with individual mapping for every patient separately, and on the right side, mapping is one for all patients. We can see they resemble our HR graphs.