Product XI: Image Extender

From Notebook Prototype to Local, Exhibitable Software

This iteration was less about adding new conceptual capabilities and more about solidifying the system as an actual, deployable artifact. The core task was migrating the image extender from its experimental form into a standalone local application. What sounds like a technical refactor turned out to be a decisive shift in how the system is meant to exist, be used, and be encountered.

Until now, the notebook environment functioned as a kind of protected laboratory. It encouraged rapid iteration, verbose configuration, and exploratory branching. Moving out of that space meant confronting a different question: what does this system look like when it stops being a research sketch and starts behaving like software?

The transition from Colab-style execution to a locally running script forced a re-evaluation of assumptions that notebooks quietly hide:

  • Implicit state becomes explicit
  • Execution order must be deterministic
  • Errors can no longer be “scrolled past”
  • Configuration must be intentional, not convenient

Porting the logic meant flattening the notebook’s narrative structure into a single, readable execution flow. Cells that once assumed context had to be restructured into functions, initialization stages, and clearly defined entry points. This wasn’t just cleanup; it was an architectural clarification.

In the notebook, ambiguity is tolerated. In running software, it accumulates as friction.
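
Concretely, the flattening looked something like the minimal skeleton below. This is an illustrative sketch only; the function and configuration names are placeholders, not the project’s actual identifiers.

    # Illustrative skeleton of the flattened execution flow.
    # All names below are placeholders, not the project's actual identifiers.
    import json

    def load_config(path: str) -> dict:
        # Configuration is read from an explicit file instead of scattered cells.
        with open(path) as f:
            return json.load(f)

    def init_models(config: dict):
        # A single initialization stage for everything the notebook set up implicitly.
        ...

    def run(config: dict, models) -> None:
        # One deterministic execution flow instead of cells run in arbitrary order.
        ...

    def main() -> None:
        config = load_config("config.json")
        models = init_models(config)
        run(config, models)

    if __name__ == "__main__":
        main()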

Reduction as Design: Cutting Options to Increase Clarity

One of the more deliberate changes during this phase was a reduction in exposed settings. The notebook version allowed extensive tweaking: model switches, resolution variants, prompt behaviors, fallback paths, all of which were useful during development but overwhelming in a public-facing context.

For the exhibition version, optionality became noise.

Instead of presenting the system as a configurable toolkit, I reframed it as a guided instrument. Core behaviors remain intact, but the number of visible parameters was intentionally constrained. This aligns with a recurring principle in the project: flexibility should live inside the system, not on its surface.
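
A toy sketch of that principle, with entirely hypothetical parameter names: the full configuration survives internally, while only a deliberately narrow surface is writable from the interface.

    # Hypothetical sketch: flexibility inside the system, not on its surface.
    INTERNAL_CONFIG = {
        "model_variant": "default",      # fixed for the exhibition build
        "resolution": (1024, 1024),
        "fallback_enabled": True,
        "prompt_strategy": "guided",
    }

    # The only parameters a visitor can touch.
    EXPOSED_PARAMETERS = {"input_image", "extension_direction"}
    USER_SETTINGS: dict = {}

    def apply_user_setting(name: str, value) -> None:
        # Reject anything outside the deliberately narrow public surface.
        if name not in EXPOSED_PARAMETERS:
            raise ValueError(f"{name!r} is internal and not user-configurable")
        USER_SETTINGS[name] = value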

Adapting for Exhibition: Y2K as Interface Language

Alongside the structural changes, the interface was visually adapted to match the exhibition context. The decision to lean into a Y2K-inspired color palette wasn’t purely aesthetic; it functioned as a form of contextual grounding.

The visual layer needed to communicate that this is not a neutral utility, but a situated artifact. The Y2K styling introduced:

  • High-contrast synthetic colors
  • Clear visual hierarchy
  • A subtle nod to early digital optimism and machinic playfulness

Rather than competing with the system’s conceptual weight, the styling makes its artificiality explicit.

Stability Over Novelty

Another quiet but important shift was prioritizing stability over feature expansion. The migration process exposed several edge cases that were easy to ignore in a notebook but unacceptable in a live context: silent failures, unclear loading states, brittle dependencies.

Addressing these didn’t add visible functionality, but doing so fundamentally changed how trustworthy the system feels. In an exhibition setting, reliability is part of the experience. A system that hesitates or crashes invites interpretation for the wrong reasons.

Here, robustness became a form of authorship.

Reframing the System’s Status

By the end of this iteration, the most significant change wasn’t technical; it was ontological. The system is no longer best described as “a notebook that does something interesting.” It is now a runnable, bounded piece of software, designed to be encountered without explanation.

This transition marks a subtle but important moment in the project’s lifecycle:

  • From private exploration to public behavior
  • From configurable experiment to opinionated instrument
  • From development environment to exhibited system

The constraints introduced in this phase don’t limit future growth; they define a stable core from which growth can happen meaningfully.

If earlier updates were about expanding the system’s conceptual reach, this one was about giving it a body.

Product VIII: Image Extender

Iterative Workflow and Feedback Mechanism

The primary objective for this update was to shift the system from a linear generative pipeline to a nonlinear, interactive sound design environment.

System Architecture & Implementation of Interactive Components

The existing pipeline, comprising image analysis (object detection, semantic tagging), importance-weighted sound search, audio processing (equalization, normalization, panoramic distribution based on visual coordinates), and temporal randomization, was extended with a state-preserving session layer and an interactive control interface, implemented within the Colab notebook ecosystem.

Data Structure & State Management
A critical prerequisite for interactivity was the preservation of all intermediate audio objects and their associated metadata. The system was refactored to maintain a global, mutable data structure: a list of processed_track objects, sketched in code after the field list below. Each object encapsulates:

  • The raw audio waveform (as a NumPy array).
  • Semantic source tag (e.g., “car,” “rain”).
  • Track type (ambience base or foreground object).
  • Temporal onset and duration within the mix.
  • Panning coefficient (derived from image x-coordinate).
  • Initial target loudness (LUFS, derived from object importance scaling).
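
A minimal sketch of this structure as a Python dataclass; the field names (and the ProcessedTrack class name) are assumptions inferred from the list above, not the actual identifiers.

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class ProcessedTrack:
        waveform: np.ndarray   # raw audio samples
        tag: str               # semantic source tag, e.g. "car" or "rain"
        track_type: str        # "ambience" base or "foreground" object
        onset_s: float         # onset within the mix, in seconds
        duration_s: float      # duration within the mix, in seconds
        pan: float             # panning coefficient from the image x-coordinate
        target_lufs: float     # initial loudness target from importance scaling

    # Global, mutable session state, as described above.
    processed_tracks: list[ProcessedTrack] = []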

Dynamic Mixing Console Interface
A GUI panel was generated post-sonification, featuring the following interactive widgets for each processed_track:

  • Per-Track Gain Sliders: Linear potentiometers (range 0.0 to 2.0) controlling amplitude multiplication. Adjustment triggers an immediate recalculation of the output sum via a create_current_mix() function, which performs a weighted summation of all tracks based on the current slider states.
  • Play/Stop Controls: Buttons invoking a non-blocking, threaded audio playback engine (using IPython.display.Audio and threading), allowing real-time auditioning without blocking the interface. Both mechanisms are sketched below.
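
A simplified, mono-only sketch of both mechanisms, reusing the ProcessedTrack structure above and assuming ipywidgets for the controls; panning and equalization are omitted for brevity.

    import threading
    import numpy as np
    import ipywidgets as widgets
    from IPython.display import Audio, display

    SAMPLE_RATE = 44100          # assumed project-wide sample rate
    gain_sliders = []            # one FloatSlider per processed track

    def create_current_mix() -> np.ndarray:
        # Weighted summation of all tracks based on current slider states.
        starts = [int(t.onset_s * SAMPLE_RATE) for t in processed_tracks]
        length = max(s + len(t.waveform) for s, t in zip(starts, processed_tracks))
        mix = np.zeros(length)
        for track, start, slider in zip(processed_tracks, starts, gain_sliders):
            mix[start:start + len(track.waveform)] += slider.value * track.waveform
        peak = np.abs(mix).max()
        return mix / peak if peak > 0 else mix   # avoid clipping after summation

    def build_console() -> None:
        # One gain slider per track, in the 0.0-2.0 range described above.
        for track in processed_tracks:
            slider = widgets.FloatSlider(value=1.0, min=0.0, max=2.0,
                                         step=0.01, description=track.tag)
            gain_sliders.append(slider)
            display(slider)

    def play_mix() -> None:
        # Non-blocking audition: render and display the mix on a worker thread.
        threading.Thread(
            target=lambda: display(Audio(create_current_mix(), rate=SAMPLE_RATE)),
            daemon=True,
        ).start()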

On-Demand Sound Replacement Engine
The most significant functional addition is the per-track “Search & Replace” capability. Each track’s GUI includes a dedicated search button (🔍). Its event handler executes the following algorithm:

  1. Tag Identification: Retrieves the original semantic tag from the target processed_track.
  2. Targeted Audio Retrieval: Calls a modified search_new_sound_for_tag(tag, exclude_id_list) function. This function re-executes the original search logic, including query formulation, Freesound API calls, descriptor validation (e.g., excluding excessively long or short files), and fallback strategies, while maintaining a session-specific exclusion list to avoid re-selecting previously used sounds.
  3. Consistent Processing: The newly retrieved audio file undergoes an identical processing chain as in the initial pipeline: target loudness normalization (to the original track’s LUFS target), application of the same panning coefficient, and insertion at the identical temporal position.
  4. State Update & Mix Regeneration: The new audio data replaces the old waveform in the processed_track object. The create_current_mix() function is invoked, seamlessly integrating the new sonic element while preserving all other user adjustments (e.g., volume levels of other tracks).
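
Put together, the handler might look like the following sketch. search_new_sound_for_tag is the function named above (assumed here to return the new waveform and its Freesound id); normalize_to_lufs and apply_pan are hypothetical stand-ins for the pipeline’s existing processing steps.

    session_exclude_ids = []     # sounds already selected in this session

    def on_search_clicked(track_index):
        # Sketch of the 🔍 button handler, following steps 1-4 above.
        track = processed_tracks[track_index]

        # 1. Tag identification
        tag = track.tag

        # 2. Targeted retrieval, skipping previously used sounds
        new_waveform, sound_id = search_new_sound_for_tag(tag, session_exclude_ids)
        session_exclude_ids.append(sound_id)

        # 3. Consistent processing: same loudness target and panning coefficient;
        #    the temporal position is preserved via track.onset_s.
        new_waveform = normalize_to_lufs(new_waveform, track.target_lufs)
        new_waveform = apply_pan(new_waveform, track.pan)

        # 4. State update and mix regeneration; other tracks' gains are untouched.
        track.waveform = new_waveform
        create_current_mix()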

Integrated Feedback & Evaluation Module
To formalize user evaluation and gather data for continuous system improvement, a structured feedback panel was integrated adjacent to the mixing controls. This panel captures:

  • A subjective 5-point Likert scale rating.
  • Unstructured textual feedback.
  • Automated attachment of complete session metadata (input image description, derived tags, importance values, processing parameters, and the final processed_track list).
  • Automated sending of the feedback via email.

This design explicitly closes the feedback loop, treating each user interaction as a potential training or validation datum for future algorithmic refinements.
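
A minimal sketch of the submission step, assuming Python’s standard library for the email transport; the addresses and SMTP host are placeholders, and only an excerpt of the session metadata is shown.

    import json
    import smtplib
    from email.message import EmailMessage

    def send_feedback(rating: int, comment: str) -> None:
        # Bundle the rating, free text, and session metadata into one message.
        payload = {
            "rating": rating,                                # 1-5 Likert value
            "comment": comment,                              # unstructured text
            "tracks": [t.tag for t in processed_tracks],     # metadata excerpt
        }
        msg = EmailMessage()
        msg["Subject"] = "Image Extender session feedback"
        msg["From"] = "noreply@example.com"                  # placeholder
        msg["To"] = "feedback@example.com"                   # placeholder
        msg.set_content(json.dumps(payload, indent=2))
        with smtplib.SMTP("localhost") as smtp:              # assumed mail relay
            smtp.send_message(msg)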