Product VII: Image Extender

Room-Aware Mixing – From Image Analysis to Coherent Acoustic Spaces

Instead of attempting to recover exact physical properties, the system derives normalized, perceptual room parameters from visual cues such as geometry, materials, furnishing density, and openness. These parameters are intentionally abstracted to work with algorithmic reverbs.

The introduced parameters are:

  • room_detected (bool)
    True if the image depicts a closed indoor space, False for an outdoor or open environment.
  • room_size (0.0–1.0)
    Represents the perceived acoustic size of the room (small rooms → short decay, large spaces → long decay).
  • damping (0.0–1.0)
    Estimates high-frequency absorption based on visible materials (soft furnishings, carpets, curtains vs. glass and hard walls).
  • wet_level (0.0–1.0)
    Describes how reverberant the space naturally feels, and therefore how much reverb signal is mixed in.
  • width (0.0–1.0)
    Estimates perceived stereo width derived from room proportions and openness.
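
As a minimal sketch of how these normalized values might drive an algorithmic reverb, the function below clamps each value and maps room_size to a decay time. The function name to_reverb_settings, the output keys, and the 0.3 to 4.0 second decay range are illustrative assumptions, not values from the project. Only the direction (larger room_size, longer decay) is taken from the parameter description above.

```python
def clamp01(value):
    """Clamp a number to the normalized range 0.0-1.0."""
    return max(0.0, min(1.0, float(value)))


def to_reverb_settings(room, min_decay_s=0.3, max_decay_s=4.0):
    """Map normalized room parameters to hypothetical reverb settings.

    The decay range is an assumed example, not a project value.
    """
    if not room.get("room_detected", False):
        # Assumption of this sketch: open/outdoor scenes get no room reverb.
        return {"enabled": False}
    size = clamp01(room["room_size"])
    return {
        "enabled": True,
        # Linear interpolation: small rooms -> short decay, large -> long.
        "decay_s": min_decay_s + size * (max_decay_s - min_decay_s),
        "hf_damping": clamp01(room["damping"]),
        "wet": clamp01(room["wet_level"]),
        "stereo_width": clamp01(room["width"]),
    }
```

A linear mapping is the simplest choice here. A perceptually tuned curve could replace it without changing the interface.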

All parameters are stored flat within the same dictionary as objects, panning, and importance values, forming a single coherent scene representation.
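
The flat layout could look roughly like the following. The keys "objects", "panning", and "importance", along with all values, are placeholders chosen for illustration. Only the five room parameter names come from the list above. The helper extract_room is likewise hypothetical.

```python
ROOM_KEYS = ("room_detected", "room_size", "damping", "wet_level", "width")

scene = {
    # Hypothetical per-object entries; key names and values are assumed.
    "objects": ["piano", "window"],
    "panning": {"piano": -0.4, "window": 0.6},
    "importance": {"piano": 0.9, "window": 0.2},
    # Room parameters sit at the same level, not in a nested "room" dict.
    "room_detected": True,
    "room_size": 0.35,
    "damping": 0.7,
    "wet_level": 0.25,
    "width": 0.5,
}


def extract_room(scene_dict):
    """Return only the room parameters from the flat scene dictionary."""
    return {key: scene_dict[key] for key in ROOM_KEYS if key in scene_dict}
```

Keeping everything in one dictionary means a single object can be passed through the pipeline, at the cost of having to filter keys when only the room description is needed.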

Dereverberation: Explored, Then Intentionally Abandoned

As part of this phase, automatic analysis of existing reverberation (estimation of RT60, the reverberation time, and DRR, the direct-to-reverberant ratio) and dereverberation were evaluated.
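
For orientation, RT60 is the time it takes the reverberant energy to decay by 60 dB. The sketch below shows the textbook non-blind estimate (Schroeder backward integration with a line fit between -5 and -25 dB) on a known impulse response. This is not the method evaluated in the project: estimating RT60 blindly from arbitrary recordings is considerably harder. The function names and the fit range are assumptions.

```python
import math


def schroeder_decay_db(impulse_response):
    """Energy decay curve in dB via backward integration of squared samples."""
    remaining = 0.0
    curve = [0.0] * len(impulse_response)
    for i in range(len(impulse_response) - 1, -1, -1):
        remaining += impulse_response[i] ** 2
        curve[i] = remaining
    if not curve or curve[0] <= 0.0:
        raise ValueError("impulse response contains no energy")
    total = curve[0]
    return [10.0 * math.log10(e / total) if e > 0.0 else float("-inf")
            for e in curve]


def estimate_rt60(impulse_response, sample_rate, start_db=-5.0, end_db=-25.0):
    """Fit a line to the decay between start_db and end_db, extrapolate to -60 dB."""
    decay = schroeder_decay_db(impulse_response)
    points = [(i / sample_rate, d) for i, d in enumerate(decay)
              if end_db <= d <= start_db]
    if len(points) < 2:
        raise ValueError("not enough decay range for a fit")
    mean_t = sum(t for t, _ in points) / len(points)
    mean_d = sum(d for _, d in points) / len(points)
    # Least-squares slope in dB per second (negative for a decaying response).
    slope = (sum((t - mean_t) * (d - mean_d) for t, d in points)
             / sum((t - mean_t) ** 2 for t, _ in points))
    return -60.0 / slope
```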

The outcome:

  • Computationally expensive, especially in Google Colab
  • Inconsistent and often unsatisfactory audio results
  • High complexity with limited practical benefit

Decision:
Dereverberation is not pursued further in this project. Instead, the system relies on:

  • Consistent room estimation
  • Controlled, unified reverb application
  • Preventive design rather than corrective processing

The next step is to analyze the individual sounds, in particular their RT60 and DRR values, so that, when a closed room is detected, less reverb is applied to sounds that are already reverberant.
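
One way such a per-sound adjustment might look, purely as a sketch: the scene-wide wet_level is scaled down for sounds whose estimated DRR indicates they already carry reverberation. The function per_sound_wet and both dB thresholds are placeholders, not project values.

```python
def per_sound_wet(scene_wet, drr_db, dry_drr_db=15.0, wet_drr_db=0.0):
    """Scale the scene wet level by how dry a sound already is.

    drr_db is the sound's estimated direct-to-reverberant ratio in dB.
    At or above dry_drr_db the sound receives the full scene wet level;
    at or below wet_drr_db it receives none. Both thresholds are assumed.
    """
    if dry_drr_db <= wet_drr_db:
        raise ValueError("dry_drr_db must be greater than wet_drr_db")
    dryness = (drr_db - wet_drr_db) / (dry_drr_db - wet_drr_db)
    dryness = max(0.0, min(1.0, dryness))
    return scene_wet * dryness
```

For example, with these placeholder thresholds a sound with a DRR of 7.5 dB would receive half of the scene's wet level.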