IRCAM Forum Workshops 2025 – ACIDS

From March 26 to 28, we (the second-semester Sound Design master's students) had the incredible opportunity to visit IRCAM (Institut de Recherche et Coordination Acoustique/Musique) in Paris as part of a student excursion. For anyone passionate about sound, music technology, and AI, a visit to IRCAM means stepping into new fields of research, joining discussions and seeing prototypes in action. One of my personal highlights was learning about the ACIDS team (Artificial Creative Intelligence and Data Science) and two of their research projects: RAVE (Real-time Audio Variational autoEncoder) and AFTER (Audio Features Transfer and Exploration in Real-time).

ACIDS – Team

The ACIDS team is a multidisciplinary group of researchers working at the intersection of machine learning, sound synthesis, and real-time audio processing. Their name stands for Artificial Creative Intelligence and Data Science, which reflects their focus on creative applications of machine learning and data science. During our visit, they gave us an inside look at their latest developments, including demonstrations from the IRCAM Forum Workshops (March 26–28, 2025), where they showcased some of their most exciting advancements. Besides their excellent, catchy (and somewhat funny) presentation, I want to showcase two of their projects.

RAVE (Real-Time Neural Audio Synthesis)

One of the most impressive projects we explored was RAVE (Real-time Audio Variational autoEncoder), a deep learning model for high-quality audio synthesis and transformation. Unlike traditional digital signal processing, RAVE uses a latent space representation of sound, allowing for intuitive and expressive real-time manipulation.

Overall architecture of the proposed approach. Blocks in blue are the only ones optimized,
while blocks in grey are fixed or frozen operations.

Key Innovations

  1. Two-Stage Training:
    • Stage 1: Learns compact latent representations using a spectral loss.
    • Stage 2: Fine-tunes the decoder with adversarial training for ultra-realistic audio.
  2. Blazing Speed:
    • Runs 20× faster than real-time on a laptop CPU, thanks to a multi-band decomposition technique.
  3. Precision Control:
    • Post-training latent space analysis balances reconstruction quality vs. compactness.
    • Enables timbre transfer and signal compression (2048:1 ratio).
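
To put the numbers above into perspective, here is a small back-of-the-envelope calculation in Python. It is only a sketch: the 48 kHz sampling rate is an assumption made for illustration, and reading the 2048:1 ratio as a temporal downsampling factor is my interpretation. The 20× speed factor is taken from the list above; the function names are hypothetical.

```python
def latent_frame_rate(sample_rate_hz, compression_ratio):
    """Latent frames per second if one frame summarizes `compression_ratio` samples."""
    return sample_rate_hz / compression_ratio


def synthesis_time(audio_seconds, realtime_factor):
    """Seconds of computation needed to synthesize `audio_seconds` of audio."""
    return audio_seconds / realtime_factor


SAMPLE_RATE_HZ = 48_000  # assumption, not stated in this post

print(latent_frame_rate(SAMPLE_RATE_HZ, 2048))  # 23.4375 latent frames per second
print(synthesis_time(60.0, 20.0))               # 3.0 seconds for one minute of audio
```

Under these assumptions, one second of audio is described by only about 23 latent frames, which helps explain why manipulating the latent space in real time is feasible.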

Performance

  • Outperforms NSynth and SING in audio quality (MOS: 3.01 vs. 2.68/1.15) with fewer parameters (17.6M).
  • Handles polyphonic music and speech, unlike many restricted models.

You can explore RAVE’s code and research on their GitHub repository and learn more about its applications on the IRCAM website.

AFTER

While many AI audio tools focus on raw sound generation, what sets AFTER (Audio Features Transfer and Exploration in Real-time) apart is its sophisticated control mechanisms, a priority highlighted in recent research from the ACIDS team. As their paper states:

“Deep generative models now synthesize high-quality audio signals, shifting the critical challenge from audio quality to control capabilities. While text-to-music generation is popular, explicit control and example-based style transfer better capture the intents of artists.”

How AFTER Achieves Precision

The team’s breakthrough lies in separating local and global audio information:

  • Global (timbre/style): Captured from a reference sound (e.g., a vintage synth’s character).
  • Local (structure): Controlled via MIDI, text prompts, or another audio’s rhythm/melody.

This is enabled by a diffusion autoencoder that builds two disentangled representation spaces, enforced through:

  1. Adversarial training to prevent overlap between timbre and structure.
  2. A two-stage training strategy for stability.

Detailed overview of our method. Input signal(s) are passed to structure and timbre encoders, which provide semantic encodings that are further disentangled through confusion maximization. These are used to condition a latent diffusion model to generate the output signal. Input signals are identical during training but distinct at inference.

Why Musicians Care

In tests, AFTER outperformed existing models in:

  • One-shot timbre transfer (e.g., making a piano piece sound like a harp).
  • MIDI-to-audio generation with precise stylistic control.
  • Full “cover version” generation—transforming a classical piece into jazz while preserving its melody.

Check out AFTER’s progress on GitHub and stay updated via IRCAM’s research page.

References

Caillon, Antoine, and Philippe Esling. “RAVE: A Variational Autoencoder for Fast and High-Quality Neural Audio Synthesis.” arXiv preprint arXiv:2111.05011 (2021). https://arxiv.org/abs/2111.05011.

Demerle, Nils, Philippe Esling, Guillaume Doras, and David Genova. “Combining Audio Control and Style Transfer Using Latent Diffusion.” 

Prototyping III: Image Extender – Image sonification tool for immersive perception of sounds from images and new creation possibilities

Research on sonification of images / video material and different approaches – focus on RGB

The paper by Kopecek and Ošlejšek presents a system that enables visually impaired users to perceive color images through sound using a semantic color model. Each primary color (such as red, green, or blue) is assigned a unique sound, and colors in an image are approximated by the two closest primary colors. These are represented through two simultaneous tones, with volume indicating the proportion of each color. Users can explore images by selecting pixels or regions using input devices like a touchscreen or mouse. The system calculates the average color of the selected area and plays the corresponding sounds. Distinct audio cues indicate image boundaries, and sounds can be either synthetic or instrument-based, with timbre and pitch helping to differentiate them. Users can customize colors and sounds for a more personalized experience. This approach allows for dynamic, efficient exploration of images and supports navigation via annotated SVG formats.
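
The two-tone idea can be sketched in a few lines of Python. This is a minimal sketch under my own assumptions, not the authors' implementation: the palette, the Euclidean distance in RGB space and the inverse-distance volume rule are all hypothetical choices, and the function names are invented for illustration.

```python
import math

# Hypothetical palette; the paper's actual primaries and their sounds may differ.
PRIMARIES = {
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
    "cyan": (0, 255, 255),
    "magenta": (255, 0, 255),
}


def average_color(pixels):
    """Average RGB color of a selected region given as a list of (r, g, b) tuples."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))


def two_tone_mix(color):
    """Return the two nearest primaries with volumes that sum to 1.

    The nearer primary gets the larger volume (assumed inverse-distance rule).
    """
    ranked = sorted(PRIMARIES, key=lambda name: math.dist(color, PRIMARIES[name]))
    first, second = ranked[0], ranked[1]
    d1 = math.dist(color, PRIMARIES[first])
    d2 = math.dist(color, PRIMARIES[second])
    volume_first = d2 / (d1 + d2)
    return [(first, volume_first), (second, 1.0 - volume_first)]


region = [(255, 0, 0), (255, 255, 0)]  # half red, half yellow
print(two_tone_mix(average_color(region)))
```

For a region that is half red and half yellow, both tones would play at equal volume; an orange pixel would play the same two tones with slightly different volumes.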

Image separation by Kopecek and Ošlejšek

The review by Sarkar, Bakshi, and Sa offers an overview of various image sonification methods designed to help visually impaired users interpret visual scenes through sound. It covers techniques such as raster scanning, query-based, and path-based approaches, where visual data like pixel intensity and position are mapped to auditory cues. Systems like vOICe and NAVI use high and low-frequency tones to represent image regions vertically. The paper emphasizes the importance of transfer functions, which link image properties to sound attributes such as pitch, volume, and frequency. Different rendering methods—like audification, earcons, and parameter mapping—are discussed in relation to human auditory perception. Special attention is given to color sonification, including the semantic color model introduced by Kopecek and Ošlejšek, which improves usability through clearly distinguishable tones. The paper also explores applications in fields such as medical imaging, algorithm visualization, and network analysis, and briefly touches on sound-to-image conversions.

Principles of the image-to-sound mapping

Matta, Rudolph, and Kumar propose the theoretical system “Auditory Eyes,” which converts visual data into auditory and tactile signals to support blind users. The system comprises three main components: an image encoder that uses edge detection and triangulation to estimate object location and distance; a mapper that translates features like motion, brightness, and proximity into corresponding sound and vibration cues; and output generators that produce sound using tools like Csound and tactile feedback via vibrations. Motion is represented using effects like Doppler shift and interaural time difference, while spatial positioning is conveyed through head-related transfer functions. Brightness is mapped to pitch, and edges are conveyed through tone duration. The authors emphasize that combining auditory and tactile information can create a richer and more intuitive understanding of the environment, making the system potentially very useful for real-world navigation and object recognition.
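
Two of the mappings mentioned above can be illustrated with a short Python sketch. Everything concrete here is an assumption of mine: the frequency range, the exponential pitch scale, the head radius, and the use of Woodworth's spherical-head approximation for the interaural time difference (a common textbook formula, swapped in here for illustration, not taken from the paper).

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 degrees Celsius
HEAD_RADIUS_M = 0.0875      # assumed average head radius


def brightness_to_frequency(brightness, f_min=220.0, f_max=880.0):
    """Map brightness in [0, 1] to a frequency in Hz on an exponential scale."""
    b = min(max(brightness, 0.0), 1.0)
    return f_min * (f_max / f_min) ** b


def interaural_time_difference(azimuth_deg):
    """Woodworth approximation of the ITD in seconds, for azimuths in [-90, 90]."""
    theta = math.radians(azimuth_deg)
    return HEAD_RADIUS_M / SPEED_OF_SOUND_M_S * (theta + math.sin(theta))


print(brightness_to_frequency(0.5))      # mid-grey lands one octave above f_min
print(interaural_time_difference(90.0))  # source directly to one side
```

An exponential pitch scale is used because equal brightness steps then correspond to equal musical intervals, which tends to be easier to hear than equal steps in Hz.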

References

Kopecek, Ivan, and Radek Ošlejšek. 2008. “Hybrid Approach to Sonification of Color Images.” In Third 2008 International Conference on Convergence and Hybrid Information Technology, 721–726. IEEE. https://doi.org/10.1109/ICCIT.2008.152.

Sarkar, Rajib, Sambit Bakshi, and Pankaj K Sa. 2012. “Review on Image Sonification: A Non-visual Scene Representation.” In 1st International Conference on Recent Advances in Information Technology (RAIT-2012), 1–5. IEEE. https://doi.org/10.1109/RAIT.2012.6194495.

Matta, Suresh, Heiko Rudolph, and Dinesh K Kumar. 2005. “Auditory Eyes: Representing Visual Information in Sound and Tactile Cues.” In Proceedings of the 13th European Signal Processing Conference (EUSIPCO 2005), 1–5. Antalya, Turkey. https://www.researchgate.net/publication/241256962.

Comparison of Different AI Video Tools

As the first step of my research into AI and AI-assisted video tools, I gained a comprehensive overview of the common providers and put the various tools through an initial test.

Below you will find a detailed list of the main features, pricing structures and my personal experiences with each tool. I close with a conclusion that summarizes my findings so far and gives a first assessment of the best applications for different requirements.

Adobe Firefly Video Model

The Adobe Firefly Video Model is aimed primarily at professional users in the film and media industry who need high-quality AI-generated clips. Its integration into Adobe Premiere Pro makes it especially attractive for existing Adobe users. In use, Firefly impresses with the high quality of the generated 5-second clips, but its current features are still fairly limited compared to other AI video tools.

Main features:

  • Generation of 5-second clips in 1080p
  • Integration into Adobe Premiere Pro
  • Focus on quality and realistic rendering

Pricing:

Free / included in Creative Cloud: 1,000 generative credits for standard image and vector graphics features such as “Text to Image” and “Generative Fill”, plus 2 AI videos

  • Basic: €11.08 per month for 20 clips of 5 seconds each
  • Extended: €33.26 per month for 70 clips of 5 seconds each
  • Premium: price on request for studios and high volumes

Conclusion:

+ Works very well overall, with a simple and logical interface and very good generated videos (more on this in the second blog post, “First Application”)

+ Under “Motion” there is a selection of the most common camera movements (zoom in/out, pan left/right/up/down, static or handheld)

– Unfortunately only 2 trial videos are possible, each limited to 5 seconds

–> For the project I may buy Adobe Firefly Standard for 1–2 months (depending on how intensively I use it and on the length of the final product, perhaps even the extended version)

(Source: https://firefly.adobe.com/?media=video )

RunwayML

RunwayML is a versatile AI platform that specializes in creating and editing videos. With its user-friendly interface, it lets you generate videos from text, images or video clips. Particularly noteworthy is the text-to-video feature, which creates realistic video sequences from simple text prompts. RunwayML also lets you export the finished videos directly, which makes the workflow considerably easier.

Pricing:

  • Basic: free, 125 one-time credits, up to 3 video projects, 5 GB storage.
  • Standard: $15 per user/month (billed monthly), 625 credits/month, unlimited video projects, 100 GB storage.
  • Pro: $35 per user/month (billed monthly), 2250 credits/month, advanced features, 500 GB storage.
  • Unlimited: $95 per user/month (billed monthly), unlimited video generations, all features included.
  • Source: https://runwayml.com/pricing

There is also “Runway for Educators”. You can apply for it, which I will definitely try (it provides a one-time grant of 5,000 credits).

Side note: Runway is incorporated into the design and filmmaking curriculums at UCLA, NYU, RISD, Harvard and countless other universities around the world. Request discounted resources to support your students.

Conclusion: looks very promising overall; I will definitely test it in more detail,

and I will submit a request for Runway for Educators

–> Taking out a subscription for the duration of the project is also worth considering, but I will decide that depending on the application and the results

(Source: https://runwayml.com )

Midjourney

Midjourney is an AI-based image generator that produces high-quality, artistic images from text descriptions. The platform is known for its ability to create vivid, detailed images that match the user's specifications. However, Midjourney focuses mainly on image generation and offers no dedicated text-to-video features.

Pricing:

  • Basic: $10 per month, limited usage.
  • Standard: $30 per month, extended usage.
  • Pro: $60 per month, unlimited usage.

Conclusion:

However, it can be combined well with the other two AI tools, e.g. image creation with Midjourney and “animation/motion” in the other programs

+ A great AI tool in itself; in particular, the feature that 4 images are generated and can be referred back to by reference delivers great results

– Somewhat more “complicated” than other AI tools because prompts require a “certain language”, but once you have understood it, this makes no big difference

(Source: https://www.midjourney.com/home https://www.victoriaweber.de/blog/midjourney )

Sora

Sora is an AI model developed by OpenAI that creates realistic videos from text prompts.

–  Text-to-video generation: Sora can create short video clips of up to 20 seconds in various aspect ratios (landscape, portrait, square). Users describe scenes in text, which the AI then turns into moving images.

–  Remix: This feature lets you replace, remove or reinterpret elements in existing videos in order to make creative adjustments.

–  Re-Cut: Sora lets you re-cut and rearrange videos to create alternative versions or improved sequences.

Pricing:

– Plus:
$20/month
Includes the ability to explore your creativity through video
Up to 50 videos (1,000 credits)
Limited relaxed videos
Up to 720p resolution and 10s duration videos

– Pro:
$200/month
Includes unlimited generations and the highest resolution for high-volume workflows
Up to 500 videos (10,000 credits)
Unlimited relaxed videos
Up to 1080p resolution and 20s duration videos

Conclusion:

+ Great tool with a more intuitive interface; especially attractive because I already have a ChatGPT Plus subscription, so unlike with Adobe no additional subscription is needed for the basic functions

+ The start page is also inspiring, showing lots of inspiration and other users' videos. None of the other tools was structured like this or stimulated creativity so strongly and quickly. It is especially helpful that the prompts are always shown, which gives insight into how prompts need to be formulated to get good results

+ The tutorial section is also very well done

(Source: https://sora.com/subscription )

OVERALL CONCLUSION:

For the rest of my research and project process, I will continue to test the various AI-assisted video tools intensively and run extensive experiments.

Sora has surprised me most positively so far, because getting started was extremely straightforward thanks to my ChatGPT Plus subscription. For the other AI tools, I am still checking which providers best suit my requirements and whether a subscription is worthwhile. Adobe and Runway are currently at the top of my list. With Runway in particular, I hope to obtain an educator subscription so that I can use the tool to its full extent.

Prototyping II: Image Extender – Image sonification tool for immersive perception of sounds from images and new creation possibilities

Expanded research on sonification of images / video material and different approaches:

In “A Framework for Designing Image Sonification Methods,” Yeo and Berger (2005) address the challenge of mapping static, time-independent data such as images into the time-dependent auditory domain. They introduce two main concepts: scanning and probing. Scanning follows a fixed, pre-determined order of sonification, whereas probing allows for arbitrary, user-controlled exploration. The paper also discusses the importance of pointers and paths in defining how data is mapped to sound. Several sonification techniques are analyzed, including inverse spectrogram mapping and raster scanning (which was already explained in the Prototyping I blog entry), with examples illustrating their effectiveness. The authors suggest that combining scanning and probing offers a more comprehensive approach to image sonification, allowing for both global context and local feature exploration. Future work includes extending the framework to model human image perception for more intuitive sonification methods.
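
The difference between scanning and probing can be shown with a tiny Python sketch. It assumes a grayscale image stored as nested lists and uses hypothetical function names; it only illustrates the two access patterns, not the authors' framework.

```python
def scan(image):
    """Scanning: visit every pixel in a fixed, pre-determined order (row by row)."""
    for row in image:
        for value in row:
            yield value


def probe(image, pointer):
    """Probing: read the pixel at an arbitrary, user-chosen (x, y) pointer."""
    x, y = pointer
    return image[y][x]


image = [
    [0, 64],
    [128, 255],
]

print(list(scan(image)))     # [0, 64, 128, 255]
print(probe(image, (1, 0)))  # 64
```

In a real tool, the scanned values would drive a sound over time, while the probe pointer would follow the mouse or a finger on a touchscreen.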

Sharma et al. (2017) explore action recognition in still images using Natural Language Processing (NLP) techniques in “Action Recognition in Still Images Using Word Embeddings from Natural Language Descriptions.” Rather than training visual action detectors, they propose detecting prominent objects in an image and inferring actions based on object relationships. The Object-Verb-Object (OVO) triplet model predicts verbs using object co-occurrence, while word2vec captures semantic relationships between objects and actions. Experimental results show that this approach reliably detects actions without computationally intensive visual action detectors. The authors highlight the potential of this method in resource-constrained environments, such as mobile devices, and suggest future work incorporating spatial relationships and global scene context.

Iovino et al. (1997) discuss developments in Modalys, a physical modeling synthesizer based on modal synthesis, in “Recent Work Around Modalys and Modal Synthesis.” Modalys allows users to create virtual instruments by defining physical structures (objects), their interactions (connections), and control parameters (controllers). The authors explore the musical possibilities of Modalys, emphasizing its flexibility and the challenges of controlling complex synthesis parameters. They propose applications such as virtual instrument construction, simulation of instrumental gestures, and convergence of signal and physical modeling synthesis. The paper also introduces single-point objects, which allow for spectral control of sound, bridging the gap between signal synthesis and physical modeling. Real-time control and expressivity are emphasized, with future work focused on integrating Modalys with real-time platforms.
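
The core idea of modal synthesis can be sketched as a sum of exponentially decaying sinusoids, one per vibration mode. This is the generic textbook formulation, not the Modalys API; the mode values and the function name below are made up for illustration.

```python
import math


def modal_tone(modes, duration_s=1.0, sample_rate=44_100):
    """Render a struck-object tone as a sum of decaying sinusoidal modes.

    `modes` is a list of (frequency_hz, amplitude, decay_per_second) tuples.
    """
    n_samples = round(duration_s * sample_rate)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        samples.append(sum(
            amp * math.exp(-decay * t) * math.sin(2.0 * math.pi * freq * t)
            for freq, amp, decay in modes
        ))
    return samples


# Hypothetical modes of a small metal bar: frequency, amplitude, decay rate.
bar_modes = [(440.0, 0.6, 3.0), (1212.0, 0.3, 5.0), (2376.0, 0.1, 8.0)]
tone = modal_tone(bar_modes, duration_s=0.01)
print(len(tone))  # 441 samples
```

Changing the frequencies and decay rates changes the perceived material and shape of the virtual object, which is what makes this family of methods attractive for instrument design.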

McGee et al. (2012) describe Voice of Sisyphus, a multimedia installation that sonifies a black-and-white image using raster scanning and frequency domain filtering in “Voice of Sisyphus: An Image Sonification Multimedia Installation.” Unlike traditional spectrograph-based sonification methods, this project focuses on probing different image regions to create a dynamic audio-visual composition. Custom software enables real-time manipulation of image regions, polyphonic sound generation, and spatialization. The installation cycles through eight phrases, each with distinct visual and auditory characteristics, creating a continuous, evolving experience. The authors discuss balancing visual and auditory aesthetics, noting that visually coherent images often produce noisy sounds, while abstract images yield clearer tones. The project draws inspiration from early experiments in image sonification and aims to create a synchronized audio-visual experience engaging viewers on multiple levels.

Software Interface for Voice of Sisyphus (McGee et al., 2012)

Roodaki et al. (2017) introduce SonifEye, a system that uses physical modeling sound synthesis to convey visual information in high-precision tasks, in “SonifEye: Sonification of Visual Information Using Physical Modeling Sound Synthesis.” They propose three sonification mechanisms: touch, pressure, and angle of approach, each mapped to sounds generated by physical models (e.g., tapping on a wooden plate or plucking a string). The system aims to reduce cognitive load and avoid alarm fatigue by using intuitive, natural sounds. Two experiments compare the effectiveness of visual, auditory, and combined feedback in high-precision tasks. Results show that auditory feedback alone can improve task performance, particularly in scenarios where visual feedback may be distracting. The authors suggest applications in medical procedures and other fields requiring precise manual tasks.

Dubus and Bresin review mapping strategies for the sonification of physical quantities in “A Systematic Review of Mapping Strategies for the Sonification of Physical Quantities.” Their study analyzes 179 publications to identify trends and best practices in sonification. The authors find that pitch is the most commonly used auditory dimension, while spatial auditory mapping is primarily applied to kinematic data. They also highlight the lack of standardized evaluation methods for sonification efficiency. The paper proposes a mapping-based framework for characterizing sonification and suggests future work in refining mapping strategies to enhance usability.

References

Yeo, Woon Seung, and Jonathan Berger. 2005. “A Framework for Designing Image Sonification Methods.” In Proceedings of ICAD 05-Eleventh Meeting of the International Conference on Auditory Display, Limerick, Ireland, July 6-9, 2005.

Sharma, Karan, Arun CS Kumar, and Suchendra M. Bhandarkar. 2017. “Action Recognition in Still Images Using Word Embeddings from Natural Language Descriptions.” In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV). DOI: 10.1109/WACVW.2017.17.

Iovino, Francisco, Rene Causse, and Richard Dudas. 1997. “Recent Work Around Modalys and Modal Synthesis.” In Proceedings of the International Computer Music Conference (ICMC).

McGee, Ryan, Joshua Dickinson, and George Legrady. 2012. “Voice of Sisyphus: An Image Sonification Multimedia Installation.” In Proceedings of the 18th International Conference on Auditory Display (ICAD-2012), Atlanta, USA, June 18–22, 2012.

Roodaki, Hessam, Navid Navab, Abouzar Eslami, Christopher Stapleton, and Nassir Navab. 2017. “SonifEye: Sonification of Visual Information Using Physical Modeling Sound Synthesis.” IEEE Transactions on Visualization and Computer Graphics 23, no. 11: 2366–2371. DOI: 10.1109/TVCG.2017.2734320.

Dubus, Gaël, and Roberto Bresin. 2013. “A Systematic Review of Mapping Strategies for the Sonification of Physical Quantities.” PLoS ONE 8(12): e82491. DOI: 10.1371/journal.pone.0082491.

Prototyping I: Image Extender – Image sonification tool for immersive perception of sounds from images and new creation possibilities

Shift of intention of the project due to time plan:

To keep the project feasible, I am narrowing the topic: the main purpose of the project will be the artistic approach. The tool will still combine direct image-to-audio translation with translation via sonification into a more abstract form. The main use cases will be generating unique audio samples for creative applications, such as sound design for interactive installations, brand audio identities, or soundscapes that match an image. It should also serve as a versatile instrument for experimental media artists and as a display tool for image information.

Further research on different ways of sonifying image data, together with the development of the sonification language itself, will make the translation and display purpose clearer over the following weeks.

Testing of Google Gemini API for AI Object and Image Recognition:

The first tests of the Google Gemini API started well. There are different models for dedicated object recognition and for image recognition itself, which can be combined to analyze pictures in terms of objects and, partly, scenery. These models (SSD, EfficientNet, …) produce similar but not always identical results. One option is to let the user select the model, so that if one model fails, a different one can be tried and may give better results. Scenery recognition itself tends to be a problem; trying out different APIs may help.

The data returned by this AI model consists of a tag for each recognized object or image content, together with a probability expressed as a percentage.

The next step towards a direct translation into realistic sound representations is to test whether the freesound.org API can be used to search automatically for the recognized object tags and load matching audio files. These search calls also need to filter by the license type of the sounds, and a selection rule or algorithm needs to be created.
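
A first sketch of this step could look as follows. It only builds the search URLs and does not send any request. The endpoint, the `filter`, `fields` and `token` parameters and the license name follow my reading of the Freesound API v2 documentation and should be checked against the current docs; the detection format (tag plus score between 0 and 1), the threshold and the function names are assumptions.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Freesound API v2 text search; verify before use.
FREESOUND_SEARCH_URL = "https://freesound.org/apiv2/search/text/"


def confident_tags(detections, min_score=0.5):
    """Keep tags whose probability is at least `min_score`, most confident first."""
    kept = [d for d in detections if d["score"] >= min_score]
    kept.sort(key=lambda d: d["score"], reverse=True)
    return [d["tag"] for d in kept]


def build_search_url(tag, api_token, license_name="Creative Commons 0"):
    """Build a search URL for one tag, restricted to a single license type."""
    params = {
        "query": tag,
        "filter": f'license:"{license_name}"',
        "fields": "id,name,license,previews",
        "token": api_token,
    }
    return FREESOUND_SEARCH_URL + "?" + urlencode(params)


# Hypothetical output of the object recognition step.
detections = [
    {"tag": "dog", "score": 0.91},
    {"tag": "tree", "score": 0.32},
    {"tag": "car", "score": 0.64},
]

for tag in confident_tags(detections):
    print(build_search_url(tag, api_token="YOUR_API_KEY"))
```

Filtering by confidence before searching keeps unlikely tags from adding unrelated sounds; the threshold of 0.5 is only a starting point and would need tuning.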

Research on sonification of images / video material and different approaches:

The world of image sonification is rich with diverse techniques, each offering unique ways to map visual data into auditory experiences. One of the most straightforward methods is raster scanning, introduced by Yeo and Berger. This technique maps the brightness values of grayscale image pixels directly to audio samples, creating a one-to-one correspondence between visual and auditory data. By scanning an image line by line, from top to bottom, the system generates a sound that reflects the texture and patterns of the image. For example, a smooth gradient might produce a steady tone, while a highly textured image could result in a more complex, evolving soundscape. The process is fully reversible, allowing for both image sonification and sound visualization, making it a versatile tool for artists and researchers alike. This method is particularly effective for sonifying image textures and exploring the auditory representation of visual filters, such as “patchwork” or “grain” effects (Yeo and Berger 2006).
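
The one-to-one mapping and its reversibility can be demonstrated with a minimal Python sketch. It uses nested lists as a stand-in for an 8-bit grayscale image; the scaling to the range [-1, 1] and the function names are my own assumptions rather than the authors' implementation.

```python
def image_to_samples(image):
    """Raster-scan rows from top to bottom; map brightness 0..255 to [-1, 1]."""
    return [value / 127.5 - 1.0 for row in image for value in row]


def samples_to_image(samples, width):
    """Inverse mapping: turn samples back into rows of `width` pixels."""
    pixels = [round((s + 1.0) * 127.5) for s in samples]
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


image = [
    [0, 128, 255],
    [64, 32, 200],
]

samples = image_to_samples(image)
print(samples_to_image(samples, width=3) == image)  # True
```

Played back at an audio sampling rate, each image row becomes a very short slice of sound, so repeating patterns across rows turn into periodic, pitched components.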

Principle raster scanning (Yeo and Berger, 2006)

In contrast, Audible Panorama (Huang et al. 2019) automates sound mapping for 360° panorama images used in virtual reality (VR). It detects objects using computer vision, estimates their depth, and assigns spatialized audio from a database. For example, a car might trigger engine sounds, while a person generates footsteps, creating an immersive auditory experience that enhances VR realism. A user study confirmed that spatial audio significantly improves the sense of presence. The paper contains an interesting concept: choosing a random audio file from a sound library to avoid producing similar or identical results. It also mentions post-processing of the audio, which would be relevant for the Image Extender project as well.
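
The random-selection idea is easy to sketch in Python. The library contents, file names and function name below are hypothetical; the snippet only shows how one file per detected tag could be drawn at random so that repeated runs do not always sound the same.

```python
import random

# Hypothetical sound library: object tag -> candidate audio files.
SOUND_LIBRARY = {
    "car": ["car_engine_01.wav", "car_engine_02.wav", "car_passing.wav"],
    "person": ["footsteps_gravel.wav", "footsteps_wood.wav"],
}


def pick_sounds(tags, library, rng=None):
    """Pick one random file per detected tag; tags without entries are skipped."""
    if rng is None:
        rng = random.Random()
    return {tag: rng.choice(library[tag]) for tag in tags if tag in library}


print(pick_sounds(["car", "bird", "person"], SOUND_LIBRARY))
```

Passing a seeded `random.Random` instance makes a result reproducible, which is handy when a particular soundscape should be recreated later.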

principle audible panorama (Huang et al. 2019)

Another approach, HindSight (Schoop, Smith, and Hartmann 2018), focuses on real-time object detection and sonification in 360° video. Using a head-mounted camera and neural networks, it detects objects like cars and pedestrians, then sonifies their position and danger level through bone conduction headphones. Beeps increase in tempo and pan to indicate proximity and direction, providing real-time safety alerts for cyclists.
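
A mapping of this kind could be sketched as follows. All numbers (maximum distance, beep intervals, the linear mapping and the 90° panning range) are assumptions for illustration and are not taken from the HindSight paper; the function name is hypothetical.

```python
def beep_parameters(distance_m, azimuth_deg,
                    max_distance_m=30.0,
                    min_interval_s=0.1, max_interval_s=1.0):
    """Return (interval_s, pan) for a detected object.

    Closer objects beep faster (shorter interval). Pan runs from -1 (left)
    to +1 (right), with azimuth 0 meaning straight ahead.
    """
    # 0.0 = object at or beyond max distance, 1.0 = object at the listener.
    closeness = 1.0 - min(max(distance_m / max_distance_m, 0.0), 1.0)
    interval = min_interval_s + (1.0 - closeness) * (max_interval_s - min_interval_s)
    pan = max(-1.0, min(1.0, azimuth_deg / 90.0))
    return interval, pan


print(beep_parameters(30.0, 0.0))   # far away, straight ahead: slow, centered
print(beep_parameters(3.0, -45.0))  # close, front-left: fast, panned left
```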

Finally, Sonic Panoramas (Kabisch, Kuester, and Penny 2005) takes an interactive approach, allowing users to navigate landscape images while generating sound based on their position. Edge detection extracts features like mountains or forests, mapping them to dynamic soundscapes. For instance, a mountain ridge might produce a resonant tone, while a forest creates layered, chaotic sounds, blending visual and auditory art. It also mentions different approaches for sonification itself. For example the idea of using micro (timbre, pitch and melody) and macro level (rhythm and form) mapping.

principle sonic panoramas (Kabisch, Kuester, and Penny 2005)

Each of these methods (raster scanning, Audible Panorama, HindSight, and Sonic Panoramas) demonstrates the versatility of sonification as a tool for transforming visual data into sound. I will keep these different approaches in mind when developing my own sonification language or mapping method. They also point to further research: checking some of the references used in these works should give a deeper understanding of sonification and extend the possibilities.

References

Huang, Haikun, Michael Solah, Dingzeyu Li, and Lap-Fai Yu. 2019. “Audible Panorama: Automatic Spatial Audio Generation for Panorama Imagery.” In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 1–11. Glasgow, Scotland: ACM. https://doi.org/10.1145/3290605.3300851.

Kabisch, Eric, Falko Kuester, and Simon Penny. 2005. “Sonic Panoramas: Experiments with Interactive Landscape Image Sonification.” In Proceedings of the 2005 International Conference on Artificial Reality and Telexistence (ICAT), 156–163. Christchurch, New Zealand: HIT Lab NZ.

Schoop, Eldon, James Smith, and Bjoern Hartmann. 2018. “HindSight: Enhancing Spatial Awareness by Sonifying Detected Objects in Real-Time 360-Degree Video.” In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, 1–12. Montreal, QC, Canada: ACM. https://doi.org/10.1145/3173574.3173717.

Yeo, Woon Seung, and Jonathan Berger. 2006. “Application of Raster Scanning Method to Image Sonification, Sound Visualization, Sound Analysis and Synthesis.” In Proceedings of the 9th International Conference on Digital Audio Effects (DAFx-06), 311–316. Montreal, Canada: DAFx.

Explore II: Image Extender – Image sonification tool for immersive perception of sounds from images and new creation possibilities

The Image Extender project bridges accessibility and creativity, offering an innovative way to perceive visual data through sound. With its dual-purpose approach, the tool has the potential to redefine auditory experiences for diverse audiences, pushing the boundaries of technology and human perception.

The project is designed as a dual-purpose tool for immersive perception and creative sound design. By leveraging AI-based image recognition and sonification algorithms, the tool will transform visual data into auditory experiences. This innovative approach is intended for:

1. Visually Impaired Individuals
2. Artists and Designers

The tool will focus on translating colors, textures, shapes, and spatial arrangements into structured soundscapes, ensuring clarity and creativity for diverse users.

  • Core Functionality: Translating image data into sound using sonification frameworks and AI algorithms.
  • Target Audiences: Visually impaired users and creative professionals.
  • Platforms: Initially desktop applications with planned mobile deployment for on-the-go accessibility.
  • User Experience: A customizable interface to balance complexity, accessibility, and creativity.

Working Hypotheses and Requirements

  • Hypotheses:
    1. Cross-modal sonification enhances understanding and creativity in visual-to-auditory transformations.
    2. Intuitive soundscapes improve accessibility for visually impaired users compared to traditional methods.
  • Requirements:
    • Develop an intuitive sonification framework adaptable to various images.
    • Integrate customizable settings to prevent sensory overload.
    • Ensure compatibility across platforms (desktop and mobile).

    Subtasks

    1. Project Planning & Structure

    • Define Scope and Goals: Clarify key deliverables and objectives for both visually impaired users and artists/designers.
    • Research Methods: Identify research approaches (e.g., user interviews, surveys, literature review).
    • Project Timeline and Milestones: Establish a phased timeline including prototyping, testing, and final implementation.
    • Identify Dependencies: List libraries, frameworks, and tools needed (Python, Pure Data, Max/MSP, OSC, etc.).

    2. Research & Data Collection

    • Sonification Techniques: Research existing sonification methods and metaphors for cross-modal (sight-to-sound) mapping, along with other approaches that could blend into the overall sonification strategy.
    • Image Recognition Algorithms: Investigate AI image recognition libraries and frameworks (e.g., OpenCV, TensorFlow, PyTorch).
    • Psychoacoustics & Perceptual Mapping: Review how different sound frequencies, intensities, and spatialization affect perception.
    • Existing Tools & References: Study tools like Melobytes, VOSIS, and BeMyEyes to understand features, limitations, and user feedback.
    Object detection with the Python YOLO library
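
One way to think about the spatialization part of this research is to place each detected object in the stereo field according to where it sits in the image. The following is a minimal Python sketch under stated assumptions: the detections list is made-up sample data (not output from a real YOLO run), the function names pan_from_box and equal_power_gains are hypothetical, and equal-power panning is only one of several possible panning laws.

```python
import math

def pan_from_box(x_min, x_max, image_width):
    """Map a bounding box's horizontal center to a pan position in [0, 1]."""
    center = (x_min + x_max) / 2.0
    # Clamp so boxes that touch or exceed the image border stay in range.
    return min(max(center / image_width, 0.0), 1.0)

def equal_power_gains(pan):
    """Return (left, right) gains for pan in [0, 1] using equal-power panning."""
    angle = pan * math.pi / 2.0
    return math.cos(angle), math.sin(angle)

# Hypothetical detections: (label, x_min, x_max) in pixels on a 640-pixel-wide image.
detections = [("dog", 0, 128), ("car", 256, 384), ("tree", 576, 640)]

for label, x_min, x_max in detections:
    pan = pan_from_box(x_min, x_max, 640)
    left, right = equal_power_gains(pan)
    print(f"{label}: pan={pan:.2f} left={left:.2f} right={right:.2f}")
```

With equal-power panning, the squared gains always sum to one, so an object keeps roughly the same perceived loudness as it moves from left to right.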

    3. Concept Development & Prototyping

    • Develop Sonification Mapping Framework: Define rules for mapping visual elements (color, shape, texture) to sound parameters (pitch, timbre, rhythm).
    • Simple Prototype: Create a basic prototype that integrates:
      • AI content recognition (Python + image processing libraries).
      • Sound generation (Pure Data or Max/MSP).
      • Communication via OSC (e.g., using Wekinator).
    • Create or collect Sample Soundscapes: Generate initial soundscapes for different types of images (e.g., landscapes, portraits, abstract visuals).
    Example of Pure Data with the rem library (image to sound in Pure Data by Artiom Constantinov)
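
The mapping rules described in this step could be sketched roughly as follows. This is only an illustration, not the project's actual framework: the choice of hue for pitch, saturation for timbre brightness, and value for loudness is an assumption, as are the function names color_to_sound and midi_to_hz and the default note range.

```python
import colorsys

def color_to_sound(r, g, b, low_note=48, high_note=84):
    """Map an 8-bit RGB color to (midi_note, brightness, amplitude).

    Assumed mapping: hue -> pitch, saturation -> timbre brightness,
    value -> loudness. All three outputs are illustrative only.
    """
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    midi_note = low_note + round(h * (high_note - low_note))
    return midi_note, s, v

def midi_to_hz(note):
    """Convert a MIDI note number to a frequency in Hz (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

# Sample colors chosen for illustration.
for name, rgb in [("red", (255, 0, 0)), ("green", (0, 255, 0)), ("grey", (128, 128, 128))]:
    note, brightness, amplitude = color_to_sound(*rgb)
    print(f"{name}: note={note} ({midi_to_hz(note):.1f} Hz) "
          f"brightness={brightness:.2f} amplitude={amplitude:.2f}")
```

Hue is circular, so a linear hue-to-pitch mapping like this one jumps at the red boundary. A real framework would probably need to handle that, for example by mapping hue to pitch class instead.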

    4. User Experience Design

    • UI/UX Design for Desktop:
      • Design intuitive interface for uploading images and adjusting sonification parameters.
      • Mock up controls for adjusting sound complexity, intensity, and spatialization.
    • Accessibility Features:
      • Ensure screen reader compatibility.
      • Develop customizable presets for different levels of user experience (basic vs. advanced).
    • Mobile Optimization Plan:
      • Plan for responsive design and functionality for smartphones.

    5. Testing & Feedback Collection

    • Create Testing Scenarios:
      • Develop a set of diverse images (varying in content, color, and complexity).
    • Usability Testing with Visually Impaired Users:
      • Gather feedback on the clarity, intuitiveness, and sensory experience of the sonifications.
      • Identify areas of overstimulation or confusion.
    • Feedback from Artists/Designers:
      • Assess the creative flexibility and utility of the tool for sound design.
    • Iterate Based on Feedback:
      • Refine sonification mappings and interface based on user input.

    6. Implementation of Standalone Application

    • Develop Core Application:
      • Integrate image recognition with sonification engine.
      • Implement adjustable parameters for sound generation.
    • Error Handling & Performance Optimization:
      • Ensure efficient processing for high-resolution images.
      • Handle edge cases for unexpected or low-quality inputs.
    • Cross-Platform Compatibility:
      • Ensure compatibility with Windows and macOS, and plan for future mobile deployment.

    7. Finalization & Deployment

    • Finalize Feature Set:
      • Balance between accessibility and creative flexibility.
      • Ensure the sonification language is both consistent and adaptable.
    • Documentation & Tutorials:
      • Create user guides for visually impaired users and artists.
      • Provide tutorials for customizing sonification settings.
    • Deployment:
      • Package as a standalone desktop application.
      • Plan for mobile release (potentially a future phase).

    Technological Basis Subtasks:

    1. Programming: Develop core image recognition and processing modules in Python.
    2. Sonification Engine: Create audio synthesis patches in Pure Data/Max/MSP.
    3. Integration: Implement OSC communication between Python and the sound engine.
    4. UI Development: Design and code the user interface for accessibility and usability.
    5. Testing Automation: Create scripts for automating image-sonification tests.
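
To make the integration step more concrete, here is a hedged sketch of how a Python module might hand a value to the sound engine over OSC. It builds an OSC 1.0 message by hand with the standard library only. In practice a dedicated OSC library would likely be used instead. The address /sonify/pitch, the port 9000, and the function names are assumptions made for illustration.

```python
import socket
import struct

def osc_string(text):
    """Encode an OSC string: ASCII bytes, null-terminated, padded to a multiple of 4."""
    data = text.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)

def osc_message(address, *values):
    """Build an OSC 1.0 message that carries float32 arguments only."""
    type_tags = "," + "f" * len(values)
    payload = b"".join(struct.pack(">f", float(v)) for v in values)  # big-endian floats
    return osc_string(address) + osc_string(type_tags) + payload

def send_osc(message, host="127.0.0.1", port=9000):
    """Send one OSC message as a single UDP datagram (host and port are assumed)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (host, port))

# Hypothetical address pattern; the receiving patch would have to listen for it.
packet = osc_message("/sonify/pitch", 440.0)
print(len(packet), packet)

# Uncomment to actually send the packet to a sound engine listening on UDP port 9000:
# send_osc(packet)
```

On the Pure Data side, a packet like this could presumably be picked up with a UDP receiver such as [netreceive -u -b 9000] followed by [oscparse], though the exact patch depends on the Pd version and setup.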

    Possible academic foundations for further research and work:

    Chatterjee, Oindrila, and Shantanu Chakrabartty. “Using Growth Transform Dynamical Systems for Spatio-Temporal Data Sonification.” arXiv preprint, 2021.

    Chion, Michel. Audio-Vision. New York: Columbia University Press, 1994.

    Görne, Tobias. Sound Design. Munich: Hanser, 2017.

    Hermann, Thomas, Andy Hunt, and John G. Neuhoff, eds. The Sonification Handbook. Berlin: Logos Publishing House, 2011.

    Schick, Adolf. Schallwirkung aus psychologischer Sicht. Stuttgart: Klett-Cotta, 1979.

    Sigal, Erich. “Akustik: Schall und seine Eigenschaften.” Accessed January 21, 2025. mu-sig.de.

    Spence, Charles. “Crossmodal Correspondences: A Tutorial Review.” Attention, Perception, & Psychophysics, 2011.

    Ziemer, Tim. Psychoacoustic Music Sound Field Synthesis. Cham: Springer International Publishing, 2020.

    Ziemer, Tim, Nuttawut Nuchprayoon, and Holger Schultheis. “Psychoacoustic Sonification as User Interface for Human-Machine Interaction.” International Journal of Informatics Society, 2020.

    Ziemer, Tim, and Holger Schultheis. “Three Orthogonal Dimensions for Psychoacoustic Sonification.” Acta Acustica United with Acustica, 2020.

    1.10 AI Companions vs. Traditional Therapy

    Can Technology Replace Human Connection?

    The rise of AI companions has sparked a significant debate: can technology truly replace human therapists in addressing mental health issues? AI-driven systems like Woebot and Wysa offer cognitive-behavioral therapy (CBT) techniques, providing instant support to users. However, while these AI companions are effective in alleviating feelings of loneliness and offering immediate assistance, they still fall short in replicating the depth of human connection provided by traditional therapy.

    Image Source: Vice

    AI as a Complementary Tool

    AI companions offer several advantages, such as accessibility, 24/7 availability, and anonymity, making them valuable tools for individuals who may not have immediate access to human therapists. For instance, 48% of people in the U.S. reported experiencing some form of mental health issue, and AI solutions could help bridge the gap where human therapists are unavailable or overwhelmed by demand. However, they lack the nuanced empathy and relational depth that human therapists bring to therapeutic conversations. Research indicates that while AI companions can provide immediate relief, they do not guarantee substantial long-term improvements in mental health.

    The Future of Mental Health Care

    Rather than replacing human therapists, AI companions could become part of a hybrid model. AI can handle initial assessments and offer support between therapy sessions, while human therapists provide ongoing treatment for deeper emotional and psychological issues. This collaborative approach can provide a more comprehensive mental health support system, blending the best of both worlds. For example, AI companions have been shown to reduce loneliness among seniors, enhancing their overall well-being.

    Effectiveness of AI in Addressing Mental Health Issues

    AI companions have demonstrated effectiveness in managing certain mental health conditions:

    Anxiety and Depression: AI-driven applications can provide immediate support and coping strategies for individuals experiencing anxiety and depression. They offer tools like mood tracking, mindfulness exercises, and cognitive-behavioral techniques to help users manage symptoms.

    Stress Management: AI companions can assist in stress reduction by guiding users through relaxation techniques, meditation, and providing real-time feedback on stress levels.

    However, AI companions are less effective in addressing:

    Severe Mental Health Disorders: Conditions such as schizophrenia, bipolar disorder, and severe personality disorders require comprehensive treatment plans that include medication management and intensive psychotherapy, areas where AI companions currently fall short.

    Crisis Situations: In cases of acute mental health crises, such as suicidal ideation or severe self-harm, immediate human intervention is crucial. AI companions are not equipped to handle such emergencies and may not provide the necessary support.

    Sources

    1. “AI In Mental Health: Opportunities And Challenges In Developing Intelligent Digital Therapies.” Forbes. Accessed: Jan. 25, 2024. [Online.] Available: https://www.forbes.com/sites/bernardmarr/2023/07/06/ai-in-mental-health-opportunities-and-challenges-in-developing-intelligent-digital-therapies/
    2. “AI Therapists vs. Human Therapists: Complementary Roles in Mental Health.” mindpeace.ai. Accessed: Jan. 25, 2024. [Online.] Available: https://mindpeace.ai/blog/ai-therapists-vs-human-therapists
    3. “Artificial intelligence in mental health care.” American Psychological Association. Accessed: Jan. 25, 2024. [Online.] Available: https://www.apa.org/practice/artificial-intelligence-mental-health-care
    4. “Exploring the Pros and Cons of AI in Mental Health Care.” Active Minds. Accessed: Jan. 25, 2024. [Online.] Available: https://www.activeminds.org/blog/exploring-the-pros-and-cons-of-ai-in-mental-health-care/
    5. “Can AI Companions Help Heal Loneliness? | Eugenia Kuyda | TED.” YouTube. Accessed: Jan. 25, 2024. [Online.] Available: https://www.youtube.com/watch?v=-w4JrIxFZRA
    6. Lee, E. E., Torous, J., De Choudhury, M., Depp, C. A., Graham, S. A., Kim, H. C., Paulus, M. P., Krystal, J. H., & Jeste, D. V. (2021). Artificial Intelligence for Mental Health Care: Clinical Applications, Barriers, Facilitators, and Artificial Wisdom. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 6(9), 856-864. https://doi.org/10.1016/j.bpsc.2021.02.001
    7. “Mental Health Apps and the Role of AI in Emotional Wellbeing.” Mya Care. Accessed: Jan. 25, 2024. [Online.] Available: https://myacare.com/blog/mental-health-apps-and-the-role-of-ai-in-emotional-wellbeing
    8. Thakkar, A., Gupta, A., & De Sousa, A. (2024). Artificial Intelligence in Positive Mental Health: A Narrative Review. Frontiers in Digital Health, 6, 1280235. https://doi.org/10.3389/fdgth.2024.1280235
    9. “‘They thought they were doing good but it made people worse’: why mental health apps are under scrutiny.” The Guardian. Accessed: Jan. 25, 2024. [Online.] Available: https://www.theguardian.com/society/2024/feb/04/they-thought-they-were-doing-good-but-it-made-people-worse-why-mental-health-apps-are-under-scrutiny
    10. “Why Some Mental Health Apps Aren’t Helpful?” Greater Good Magazine. Accessed: Jan. 25, 2024. [Online.] Available: https://greatergood.berkeley.edu/article/item/why_some_mental_health_apps_arent_helpful

    1.9 The Emotional Intelligence of AI: Can Chatbots Truly Understand Us?

    As AI technology advances, chatbots are evolving to recognize emotional cues, providing support in mental health, companionship, and conversational interfaces. By integrating techniques such as natural language processing (NLP), sentiment analysis, and machine learning, these systems aim to simulate empathy and create meaningful interactions. However, the development of empathetic AI comes with challenges, including technological limitations, ethical concerns, and potential risks of over-dependence.

    Advancements in Empathetic Algorithms

    Empathetic algorithms are designed to detect, interpret, and respond to human emotions using methods such as NLP, voice tone recognition, and facial expression analysis. For example:

    • Woebot employs cognitive-behavioral therapy (CBT) techniques to guide users through stress and anxiety management, leveraging emotional cues from conversations.
    • Wysa uses sentiment analysis to provide customized mindfulness exercises and mood tracking tools for emotional resilience.

    Beyond mental health, empathetic algorithms are being integrated into other sectors like education and customer service, tailoring interactions based on emotional cues to improve engagement and satisfaction.

    Chatbots as Relationship Simulators

    LLMs such as GPT power chatbots like Replika AI and Character AI, which simulate human-like relationships. Replika AI enables users to design virtual companions for friendship, mentorship, or even romantic connections, raising questions about emotional reliance and blurred boundaries between humans and machines. Character AI allows users to interact with AI representations of fictional or historical figures, blending entertainment with relationship simulation.

    Replika, Image Source: Every

    These developments reflect themes from the movie Her, where an AI operating system becomes a deeply personal companion. While such systems offer emotional support, they highlight risks like over-dependence, which could potentially hinder real-life emotional interactions.

    Movie Her, Image Source: IMDb

    The Role of Empathy in AI

    Empathetic AI is transforming human-AI interactions by making them more intuitive and emotionally aligned. However, achieving true emotional intelligence in machines remains a significant challenge:

    • Complex Emotions: Emotions are shaped by individual, cultural, and situational factors, making them difficult for AI to interpret consistently.
    • Simulated Empathy: Current AI systems simulate empathy by mimicking human responses rather than genuinely understanding emotions.
    • Ethical Concerns: Privacy risks arise from AI’s reliance on sensitive emotional data, making transparency and data security essential.

    Applications and Insights from Research

    Recent studies emphasize how empathetic algorithms can enhance human emotional intelligence by fostering emotional awareness and resilience. For instance:

    • Educational AI systems: Tailor learning environments to students’ emotional states, adapting content based on signs of frustration or confusion.
    • Healthcare applications: Use empathetic AI to assess patients’ emotional needs and deliver personalized support, improving outcomes for individuals with anxiety or depression.

    Despite these advancements, challenges such as cultural biases in emotion recognition and the need for interdisciplinary collaboration remain key areas for growth.

    Sources

    1. “Character.ai: Young people turning to AI therapist bots.” BBC. Accessed: Jan. 24, 2025. [Online.] Available: https://www.bbc.com/news/technology-67872693?utm_source=chatgpt.com
    2. “‘Maybe we can role-play something fun’: When an AI companion wants something more.” BBC. Accessed: Jan. 24, 2025. [Online.] Available: https://www.bbc.com/future/article/20241008-the-troubling-future-of-ai-relationships?utm_source=chatgpt.com
    3. “Replika CEO Eugenia Kuyda says it’s okay if we end up marrying AI chatbots.” The Verge. Accessed: Jan. 24, 2025. [Online.] Available: https://www.theverge.com/24216748/replika-ceo-eugenia-kuyda-ai-companion-chatbots-dating-friendship-decoder-podcast-interview?utm_source=chatgpt.com
    4. Velagaleit, S. B., Choukaier, D., Nuthakki, R., Lamba, V., Sharma, V., & Rahul, S. (2024). Empathetic Algorithms: The Role of AI in Understanding and Enhancing Human Emotional Intelligence. Journal of Electrical Systems, 20-3s, 2051–2060. https://doi.org/10.52783/jes.1806
    5. “Woebot Health – Mental Health Chatbot.” Woebot Health. Accessed: Jan. 24, 2025. [Online.] Available: https://woebothealth.com/
    6. “Wysa – Everyday Mental Health.” Wysa. Accessed: Jan. 24, 2025. [Online.] Available: https://www.wysa.com/

    1.7 Privacy vs. Personalization: Navigating Ethical Challenges in AI Mental Health Apps

    AI-driven mental health apps offer a remarkable combination of personalization and accessibility, providing users with tailored experiences based on their unique needs. For example, apps like Talkspace utilize AI to detect crisis moments and recommend immediate interventions, while platforms such as Wysa offer personalized exercises based on user interactions. However, these benefits come with significant privacy and ethical challenges. To deliver personalized support, such tools rely on sensitive data such as user emotions, behavioral patterns, and mental health histories. This raises critical questions about how this data is collected, stored, and used.

    Image Source: Government Technology Insider

    Ensuring privacy in these apps requires robust safeguards, including encryption, secure data storage, and compliance with regulations like GDPR in Europe and HIPAA in the United States. These laws mandate transparency, requiring developers to clearly explain how user data is handled. Companies like Headspace exemplify these practices by encrypting user data, limiting employee access, and providing users with the option to control data-sharing settings. Headspace also rigorously tests its AI for safety, particularly in detecting high-risk situations, and connects users to appropriate resources when needed.

    Beyond privacy, ethical concerns about fairness and inclusivity in AI algorithms are prominent. If the data used to train these algorithms isn’t diverse, the resulting tools may be less effective, or even harmful, for underrepresented groups. For example, biases in language or cultural context can lead to misunderstandings or inappropriate recommendations, potentially alienating users. To address this, platforms must ensure their datasets are diverse and representative, integrate cultural sensitivity into their development processes, and conduct ongoing audits to identify and rectify biases. Headspace’s AI Council, a group of clinical and diversity experts, serves as a model for embedding equity and inclusivity in AI tools.

    Transparency is another key pillar for ethical AI in mental health. Users must be informed about how the AI works, the types of data it collects, and its limitations. For example, AI is not a replacement for human empathy, and users should be made aware of when to seek professional help. Clear communication builds trust and empowers users to make informed choices about their mental health.

    While AI-driven mental health apps can enhance engagement and outcomes through personalization, the trade-off between privacy and functionality must be carefully managed. Ethical design practices, such as secure data handling, bias mitigation, and transparent user communication, are essential for balancing these priorities. By addressing these challenges proactively, developers can ensure that these tools support mental health effectively while respecting users’ rights and diversity.

    Sources

    1. “AI principles at Headspace.” Headspace. Accessed: Jan. 14, 2025. [Online.] Available: https://www.headspace.com/ai
    2. Basu, A., Samanta, S., Sur, S., & Roy, A. Digital Is the New Mainstream. Kolkata, India: Sister Nivedita University, 2023.
    3. “Can AI help with mental health? Here’s what you need to know.” Calm. Accessed: Jan. 14, 2025. [Online.] Available: https://www.calm.com/blog/ai-mental-health
    4. Coghlan, S., Leins, K., Sheldrick, S., Cheong, M., Gooding, P., & D’Alfonso, S. (2023). To chat or bot to chat: Ethical issues with using chatbots in mental health. Digital Health, 9, 1–11. https://doi.org/10.1177/20552076231183542
    5. Hamdoun, S., Monteleone, R., Bookman, T., & Michael, K. (2023). AI-based and digital mental health apps: Balancing need and risk. IEEE Technology and Society Magazine, 42(1), 25–36. https://doi.org/10.1109/MTS.2023.3241309
    6. Valentine, L., D’Alfonso, S., & Lederman, R. (2023). Recommender systems for mental health apps: Advantages and ethical challenges. AI & Society, 38(4), 1627–1638. https://doi.org/10.1007/s00146-021-01322-w

    1.6 How AI Is Reshaping Mental Health Support

    Artificial intelligence is revolutionizing mental health care by breaking down barriers like cost, stigma, and accessibility. With features like chatbots, biofeedback, and voice analysis, AI offers innovative solutions for mental health support. While AI can’t replace human therapists, its ability to complement traditional care makes it a valuable tool.

    Venture capital reports reveal that mental health is the fastest-growing marketplace category, with a growth rate exceeding 200% in 2023. This surge reflects a rising demand for accessible mental health solutions as AI continues to play a critical role in meeting that need.

    How AI Powers Mental Health Apps

    AI-Driven Chatbots

    AI chatbots provide immediate, tailored support for users in need:

    • Wysa offers CBT-based exercises and mindfulness prompts, creating a safe space for users to manage stress and anxiety.
    • Woebot adapts its conversations to users’ emotions, providing tools for real-time mental health management.
    • Cass combines emotional support and psychoeducation, offering adaptive responses that cater to individual needs.

    In May 2023, Inflection AI launched Pi, a bot designed for emotional support and conversational companionship. Unlike other chatbots, Pi openly acknowledges its limitations, avoiding the pretense of being human while focusing on honest and straightforward interactions.

    Wearables and Biofeedback

    Wearable devices enhance AI’s ability to provide real-time insights into users’ mental states:

    • Moodfit and Spring Health use wearable data, like heart rate and stress levels, to deliver personalized mental health strategies.
    • Kintsugi analyzes vocal biomarkers to detect signs of anxiety or depression, offering users actionable insights based on their voice patterns.
    Image Source: 9to5Mac

    These integrations bridge the gap between physical and emotional health, empowering users to take control of their well-being.

    Opportunities in AI Mental Health Care

    AI’s advantages lie in its ability to make mental health support more accessible, personalized, and inclusive:

    • Immediate and affordable: tools like Headspace’s Ebb and Wysa provide around-the-clock support at a fraction of the cost of traditional therapy.
    • Engagement and effectiveness: a 2022 review found that AI tools could improve engagement and reduce symptoms of anxiety and depression. However, experts emphasize that AI works best as a supplement, not a substitute, for traditional therapy. As Dr. Chris Mosunic of Calm explains, “Having a human in the driver’s seat with improved therapy AI tools might be just the right blend to maximize engagement, efficacy, and safety.”
    • Personalized support: apps like Woebot and Youper adapt their recommendations to the user’s changing emotional needs, creating a more tailored experience.
    Image Source: Business Wire

    Challenges and Ethical Considerations

    While AI offers promising solutions, it also presents challenges:

    • Limited empathy: AI tools often lack the emotional depth of human therapists, which can leave users feeling unsupported in complex situations.
    • Bias and inclusivity: non-diverse training data can lead to biased responses, potentially failing marginalized communities that rely more heavily on these tools due to systemic barriers.
    • Privacy concerns: AI tools require access to sensitive data. Apps like Talkspace use encryption to protect user information, but trust in data security remains a significant hurdle.

    As these tools evolve, balancing innovation with ethical responsibility will be critical – a topic that will be explored further in upcoming articles.

    Sources

    1. A. Fiske, P. Henningsen, & A. Buyx. (2019). Your robot therapist will see you now: Ethical implications of embodied artificial intelligence in psychiatry, psychology, and psychotherapy. Journal of Medical Internet Research, 21(5), e13216. https://doi.org/10.2196/13216
    2. A. Thakkar, A. Gupta, & A. De Sousa. (2024). Artificial intelligence in positive mental health: A narrative review. Frontiers in Digital Health, 6. https://doi.org/10.3389/fdgth.2024.1280235
    3. “Can AI help with mental health? Here’s what you need to know.” Calm. Accessed: Jan. 4, 2025. [Online.] Available: https://www.calm.com/blog/ai-mental-health
    4. “Meet Ebb | AI Mental Health Companion.” Headspace. Accessed: Jan. 4, 2025. [Online.] Available: https://www.headspace.com/ai-mental-health-companion
    5. P. Gual-Montolio, I. Jaén, V. Martínez-Borba, D. Castilla, & C. Suso-Ribera. (2022). Using artificial intelligence to enhance ongoing psychological interventions for emotional problems in real- or close to real-time: A systematic review. International Journal of Environmental Research and Public Health, 19(13), 7737. https://doi.org/10.3390/ijerph19137737
    6. “Rise of AI therapists.” VML. Accessed: Jan. 4, 2025. [Online.] Available: https://www.vml.com/insight/rise-of-ai-therapists