Explore I: Image Extender – an image sonification tool for immersive perception of sounds from images and new creative possibilities

The project would be a program that uses either AI-based content recognition or a dedicated sonification algorithm, translating the perception of sight into auditory equivalents (cross-modal metaphors).

Examples of cross-modal metaphors (Görne, 2017, p. 53)

This approach could serve two main audiences:

1. Visually Impaired Individuals:
The tool would provide an alternative to traditional audio descriptions, aiming instead to deliver a sonic experience that evokes the ambiance, spatial depth, or mood of an image. Instead of giving direct descriptive feedback, it would use non-verbal soundscapes to create an “impression” of the scene, engaging the listener’s perception intuitively. A strict sonification language might therefore be a good approach, possibly even better than simply playing back the literal sounds of the depicted scene, although a mixture of both could also work.

2. Artists and Designers:
The tool could generate unique audio samples for creative applications, such as sound design for interactive installations, brand audio identities, or cinematic soundscapes. By enabling the synthesis of sound based on visual data, the tool could become a versatile instrument for experimental media artists.

Purpose

The core purpose would be to combine both of the purposes above: a tool that supports perception and aids creation within the same suite.

The dual purpose of accessibility and creativity is central to the project’s design philosophy, but balancing these objectives poses a challenge. While the tool should serve as a robust aid for visually impaired users, it also needs to function as a practical and flexible sound design instrument.

The final product could then be used both by people who benefit from the added perception of images and screen content, and by artists or designers as a creative tool.

Primary Goal

A primary goal is to establish a sonification language that is intuitive, consistent, and adaptable to a variety of images and scenes. This “language” would ideally be flexible enough for creative expression yet structured enough to provide clarity for visually impaired users. Using a dynamic, adaptable set of rules tied to image data, the tool would be able to translate colors, textures, shapes, and contrasts into specific sounds.
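
To make this concrete, the sketch below shows one possible rule set of this kind: it reduces an image to a few global features and maps them to sound parameters. It assumes Pillow and NumPy, and the specific mappings (brightness to pitch, saturation to timbre, contrast to modulation) are illustrative assumptions, not a finished sonification language:

```python
# Minimal sketch of a rule-based sonification mapping (illustrative only).
# Assumes Pillow and NumPy; the mapping rules are placeholder assumptions,
# not a finished sonification language.
import numpy as np
from PIL import Image

def image_to_sound_params(path):
    hsv = np.asarray(Image.open(path).convert("HSV"), dtype=float) / 255.0
    hue = hsv[..., 0].mean()         # dominant color tendency
    saturation = hsv[..., 1].mean()  # overall color intensity
    brightness = hsv[..., 2].mean()  # overall lightness
    contrast = hsv[..., 2].std()     # rough texture/contrast measure
    return {
        "pitch_hz": 220.0 * 2 ** brightness,   # brighter -> higher pitch
        "partials": int(1 + saturation * 7),   # more saturated -> richer timbre
        "mod_rate_hz": 0.5 + contrast * 8.0,   # more contrast -> faster rhythm
        "hue_position": hue,                   # hue -> stereo/filter position
    }

print(image_to_sound_params("example.jpg"))
```

Keeping the rules in one small, explicit table like this would make the language easy to test, document, and revise with users.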

To make the tool accessible and enjoyable, careful attention needs to be paid to the balance of sound complexity. Testing with visually impaired individuals will be essential for calibrating the audio to avoid overwhelming or confusing sensory experiences. Adjustable parameters could allow users to tailor sound intensity, frequency, and spatialization, giving them control while preserving the underlying sonification framework. It is important to focus on realistic and achievable goals first.

  • planning of the methods (structure)
  • research and data collection
  • simple prototyping of the key concept
  • testing phases
  • implementation in a standalone application
  • UI design and mobile optimization

The prototype will evolve in stages, with usability testing playing a key role in refining functionality. Early feedback from visually impaired testers will be invaluable in shaping how soundscapes are structured and controlled. Incorporating adjustable settings will likely be necessary to allow users to customize their experience and avoid potential overstimulation. However, this customization could complicate the design if the aim is to develop a consistent sonification language. Testing will help to balance these needs.
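
One way to reconcile customization with a consistent language is to keep the core mapping fixed and expose only playback-level controls. A minimal sketch, building on the parameter dictionary from the mapping example above; the setting names and ranges are illustrative assumptions:

```python
# Sketch of user-adjustable playback settings layered on top of a fixed
# core mapping; the setting names and ranges are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class UserSettings:
    intensity: float = 0.8      # overall loudness scaling, 0..1
    pitch_scale: float = 1.0    # shifts the whole pitch range up or down
    spatial_width: float = 1.0  # stereo spread: 0 = mono, 1 = full width

def apply_settings(params: dict, s: UserSettings) -> dict:
    adjusted = dict(params)
    adjusted["pitch_hz"] = params["pitch_hz"] * s.pitch_scale
    adjusted["gain"] = s.intensity
    # keep the hue-based position, but narrow it toward the center
    adjusted["pan"] = (params["hue_position"] - 0.5) * s.spatial_width + 0.5
    return adjusted
```

Because the settings only scale the output, the underlying sonification language stays the same for every user.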

Initial development will target desktop environments, with plans to expand to smartphones. A mobile-friendly interface would allow users to access sonification on the go, making it easier to engage with images and scenes from any device.

In general, the tool could lead to a different perception of sound in connection with images and other visuals.

Needed Components

Technological Basis:

Programming Language & IDE:
The primary development of the image recognition could be done in Python, which offers strong libraries for image processing, machine learning, and integration with sound engines. Wekinator could also be a good starting point for the communication, for example via OSC.
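
As a sketch of that pipeline, the snippet below streams extracted image features to a locally running Wekinator instance over OSC using the python-osc package. Wekinator listens on port 6448 at the /wek/inputs address by default; the feature list reuses the illustrative mapping from above:

```python
# Sketch of streaming image features to Wekinator over OSC with the
# python-osc package. Wekinator's default input is port 6448, address
# /wek/inputs; the feature list reuses the illustrative mapping above.
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("127.0.0.1", 6448)  # local Wekinator instance

def send_features(params: dict):
    features = [params["pitch_hz"], params["partials"],
                params["mod_rate_hz"], params["hue_position"]]
    # Wekinator expects a flat list of float inputs on /wek/inputs
    client.send_message("/wek/inputs", [float(f) for f in features])
```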

Sonification Tools:
Pure Data or Max/MSP are ideal choices for creating the audio processing and synthesis framework, as they enable fine-tuned audio manipulation. These platforms can map visual data inputs (like color or shape) to sound parameters (such as pitch, timbre, or rhythm).
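
Before building the actual Pd or Max patch, the same parameter mapping could be sanity-checked with a quick offline render in Python. This sketch, assuming NumPy and the standard-library wave module, turns the illustrative parameters into a short additive tone:

```python
# Quick offline render of mapped parameters as an additive tone, useful
# for sanity-checking mappings before building the Pd/Max patch.
# Assumes NumPy; writes a mono 16-bit WAV with the standard wave module.
import wave
import numpy as np

def render_preview(params: dict, seconds=2.0, rate=44100, path="preview.wav"):
    t = np.linspace(0, seconds, int(rate * seconds), endpoint=False)
    signal = np.zeros_like(t)
    # "partials" shapes the timbre: harmonics with 1/n amplitude falloff
    for n in range(1, params["partials"] + 1):
        signal += np.sin(2 * np.pi * params["pitch_hz"] * n * t) / n
    # "mod_rate_hz" adds a simple amplitude rhythm (tremolo)
    signal *= 0.5 * (1.0 + np.sin(2 * np.pi * params["mod_rate_hz"] * t))
    pcm = (signal / np.abs(signal).max() * 32767).astype(np.int16)
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(rate)
        f.writeframes(pcm.tobytes())
```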

Testing Resources:
A set of test images and videos will be required to refine the tool’s translations across various visual scenarios.

Existing Inspirations and References:

– Melobytes: Software that converts images to music, highlighting the potential for creative auditory representations of visuals.

– VOSIS: A synthesizer that filters visual data based on grayscale values, demonstrating how sound synthesis can be based on visual texture.

– image-sonification.vercel.app: A platform that creates audio loops from RGB values, showing how color data can be translated into sound.

– BeMyEyes: An app that provides auditory descriptions for visually impaired users, emphasizing the importance of accessibility in technology design.

Academic Foundations:

Literature on sonification, psychoacoustics, and synthesis will support the development of the program. These fields will help inform how sound can effectively communicate complex information without overwhelming the listener.

References / Sources

Görne, Thomas. Sounddesign: Klang, Wahrnehmung, Emotion. Munich: Hanser, 2017.