🎮Foley or music? Both! A guide to listening and designing video games with a different perspective
- Lautaro Dichio
- Apr 14, 2025
- 11 min read
Hello! I'm Lautaro Dichio, a technical sound designer for video games, educator, researcher, and one of the creators of AVA (Audio para Videojuegos Argentina), a community that connects educators, students, and professionals in game audio across the country.
Several years ago, I completed my undergraduate thesis at the National University of Quilmes, where I researched how sound behaves in 3D video games and virtual reality. From that experience, I continued developing new ideas in articles, conferences, and in my current doctoral research in Science and Technology, where I explore the use of video games as a pedagogical tool in teaching music and technology.
In recent years, I have taught at UNA (National University of the Arts), UNQ (National University of Quilmes), and UCH (Champagnat University), in degree and diploma programs dedicated to sound production for video games. In those spaces, we worked not only with tools like FMOD, Wwise, Unity, and Unreal; I also proposed a broader perspective: thinking of sound design not only as a technical craft but as an art that requires multiple layers of prior analysis and a particular sensitivity. Creating sounds for interactive environments involves understanding gameplay, immersion, emotion, narrative, and perception. It is a deeply interdisciplinary process in which sound becomes a driving force for artistic exploration and a way of conveying ideas through the game experience.
Some of you may already know that this year I decided to pause my teaching activities to fully focus on new personal and professional projects. But I didn’t want to miss the opportunity to share everything I’ve learned over the years. That’s why this blog was born: to open up some ideas I’ve developed in research and academic work, but in a more approachable, direct, and conversational format. It’s specially designed for curious people, students, and those taking their first steps into the world of game audio.
🎧 What sounds do we hear when we play?
Before moving on to more complex analysis, I think it’s important to take a moment and describe what we will be working with throughout this blog. In other words, we need to make an initial cut that allows us to name and understand the basic sound elements that make up the video game experience. This starting point doesn’t aim to box in or reduce the richness of interactive sound but rather to provide a common framework that will help us think, create, and discuss better in future posts. Because before exploring how sounds intersect or transform, we need to be clear about what those pieces are, how they are constructed, and what role they play in the game.
It may seem like an obvious question, but it’s not.
What sounds do we find in a video game?
How do we classify them?
How do we think about them when designing or analyzing them?
Often, from a technical or even academic perspective, video game sound is thought of as a simple extension of cinema or music adapted to a new medium. But there’s a fundamental difference that completely changes the rules of the game: interaction.
In video games, sound doesn’t just accompany. It guides, responds, builds worlds, creates expectations, provides feedback, and conveys emotions. Every sound serves an active experience. It’s not just about listening: it’s about interpreting and acting accordingly. This creates a continuous feedback loop: the player’s actions generate sounds; those sounds, in turn, provide information that influences gameplay decisions; and those new decisions generate new sounds. The player is thus listening and acting at the same time, continuously. Sound stops being decorative and becomes a key engine of the game.
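That action–sound–decision loop can be sketched as a toy Python program. Everything here is invented for illustration (the action names, the sound names, the mappings); no real engine API is being used:

```python
# Toy sketch of the feedback loop: action -> sound -> information -> new action.
# All names are illustrative, loosely based on the racing example.

def sound_for(action: str) -> str:
    """Map a player action to the sound it produces in the game world."""
    return {
        "accelerate": "engine_rev",
        "brake": "tire_screech",
        "off_track": "gravel_rumble",
    }.get(action, "silence")

def decide(heard: str) -> str:
    """The player interprets what they hear and chooses the next action."""
    return {
        "tire_screech": "ease_off",     # screeching tires -> ease off the brakes
        "gravel_rumble": "steer_back",  # gravel under the wheels -> return to the track
    }.get(heard, "accelerate")

action = "brake"
for _ in range(3):
    heard = sound_for(action)   # the action generates a sound...
    action = decide(heard)      # ...and that sound informs the next decision
    print(heard, "->", action)
```

The point of the sketch is the shape of the loop, not the mappings themselves: every iteration, what the player hears is both an output of the last action and an input to the next one.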
A very clear example of this can be seen in racing video games. The sound of the engine, the wheels, the asphalt surface, the gear shifts, radio instructions, the DRS... this entire sonic universe provides us with crucial real-time information about how to drive. Beyond the visual or haptic feedback (like the steering wheel or the triggers on the controller), it is the auditory feedback that often makes the difference between taking a turn well or going off track.
Each new sound, like the change in road texture or the screech of a tire rubbing against a curb, forces us to react, adjust our driving, and anticipate what’s coming next. And those decisions generate new sounds. It’s a constant back-and-forth that is only possible thanks to a sound design created to interact with the gameplay.
In the next example, we’ll be able to hear exactly that: how different sounds modify the way we drive. From how and when to shift gears, to identifying the boundaries of the track. Everything is heard, everything is interpreted... and everything influences.
Sound, then, is not just a simple accompaniment: it is an active part of the game system. That’s why it’s not enough to think of it solely from an aesthetic or narrative standpoint. We need to ask ourselves what function each sound serves, how it is constructed, and how it connects with the player’s experience.
In my thesis, I proposed an approach that intersects two dimensions: on one side, how a sound is made, its sonic composition, and on the other, what it’s for, meaning its function within the video game. This intersection between technique and use not only helps us identify what we are hearing, but also understand what role it plays in the interactive dynamic.
Because in video games, where everything changes in real-time, the boundaries between different sounds become blurred. The same sound can serve different roles depending on the context, the game design, or even the interpretation of the player. This flexibility, so characteristic of the medium, forces us to rethink how we classify and analyze interactive sound.
Furthermore, this approach aims to be a meeting point between two worlds that are often thought of separately: the technical and the artistic. Organizing sounds based on their composition and function allows us not only to better understand the sonic language of video games but also to open pathways for more conscious, expressive, and sensitive creation.
This will be the starting point of the blog: an invitation to listen, think, and design interactive sound from a broad and deep perspective.
🧩 Classifying to Understand
For a long time, the study of audio in video games was approached as a mere extension of cinematic or musical languages. However, as we’ve begun to see, video games present a radically different structure, where interaction completely changes the way we listen to, produce, and analyze sound.
In response to this, ludomusicology emerged as a discipline created to fill this theoretical gap and open up new questions. Its initial focus was on music within the interactive context, but over time it expanded its scope to include other elements of the digital soundscape. Thus, a field was formed that no longer just talks about music but about sound in a broad sense, including effects, environments, voices, and any other sonic element that influences the gaming experience. Perhaps at another time, we’ll take a deeper dive into the discipline, but that is not the objective of this post.
As we have been analyzing, in this context, sound is not just an accompaniment or decoration. It is a response, an atmosphere, a stimulus, and also a language. It is an active part of the game design, and for this reason, we need tools that help us understand it in all its complexity.
From my work, I propose first a way of classifying the sounds in video games that stems from two complementary axes:
The Sonic Composition: how the sounds are made, what materials and techniques make them up.
Its functional use: what role they play within the game system, whether in terms of gameplay, narrative, or immersion.
This approach seeks to bridge the gap between the technical and the artistic, without reducing sound to a fixed category. Because in video games, where interaction and perception are constantly shifting variables, a single sound can take on multiple identities depending on the moment, the context, or the way it is played.
That’s why, more than categorizing, classification is a way of understanding: of recognizing patterns, relationships, and tensions in a sound ecosystem that is constantly changing.
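The two axes can be made concrete with a tiny data model. This is just a hypothetical sketch of the classification idea, not a tool from any engine; the field values are examples I'm inventing here:

```python
from dataclasses import dataclass

# Hypothetical model of the two complementary axes:
# composition = how the sound is made; function = what it does in the game.

@dataclass
class GameSound:
    name: str
    composition: str  # e.g. "field recording", "synthesis", "voice acting"
    function: str     # e.g. "gameplay feedback", "immersion", "narrative"

footstep = GameSound("footstep_glass", "edited recording", "gameplay feedback")
radio = GameSound("car_radio", "licensed music", "immersion")

for s in (footstep, radio):
    print(f"{s.name}: made as {s.composition}, works as {s.function}")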
🔊 The five major sound elements
With this framework in mind, I proposed dividing the sound universe of video games into five major categories. They are not rigid compartments or absolute truths, but they help organize analysis and design:
🎼 Music
Use: Generates immersion, emotions, rhythm. It accompanies the narrative and often guides the experience. In many games, it serves as an emotional compass: if tense music plays, you know something is about to happen.
Composition: Instruments, voices, electronics, musical composition. In video games, it is always adapted to situations, which makes the music adaptive (we will explore this in more detail on another occasion).
📌 In The Legend of Zelda: Breath of the Wild, the music dynamically changes based on what you're doing: whether you're climbing a mountain or cooking a soup, a leitmotif awaits you.
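Adaptive music often boils down to a game-state parameter driving which musical layers are audible. A minimal toy version of that idea, assuming invented layer and state names (a real project would drive this through FMOD or Wwise parameters rather than a hand-rolled mixer):

```python
# Toy adaptive-music mixer: the gameplay state selects per-layer volumes,
# so the same piece of music can shift character without stopping.
# States, layers, and values are all invented for this example.

def layer_volumes(state: str) -> dict:
    """Return per-layer volumes (0.0-1.0) for a given gameplay state."""
    presets = {
        "explore": {"strings": 1.0, "percussion": 0.0},
        "combat": {"strings": 0.4, "percussion": 1.0},
    }
    # Unknown states fall back to silence for every layer.
    return presets.get(state, {"strings": 0.0, "percussion": 0.0})

print(layer_volumes("explore"))
print(layer_volumes("combat"))
```

Crossfading between these presets as the state changes is what makes the music feel like it "knows" what you're doing.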
🌲 Ambiences
Use: Places the player in an environment. Not just a specific location, but also a sense of time. They tell you whether you're in an abandoned temple, a bustling city, or a prehistoric jungle.
Composition: These can be made with field recordings, foley, or even sound effects. The source doesn't matter as much as its effect: creating an immersive atmosphere that's consistent with the virtual world.
📌 In Red Dead Redemption 2, if you sit down to fish, the environment sounds like a Sunday in the countryside: crickets, birds, and the wind through the trees. It makes you want to stay there.
👣 Foley
Use: Brings life to the objects and actions of the virtual world (footsteps, doors, interactions). It makes the world “respond” to what you do.
Composition: Edited recordings that emphasize clarity and function. The goal is naturalness, but also effectiveness so that the player understands what happened, when, and where.
📌 In The Last of Us, the sound of walking on glass or in the mud completely changes the tension... and sometimes even gives you away.
💥 Sound effects
Use: Like foley, they represent actions or events, but they are usually more abstract or intense. They often mark key moments of gameplay (explosions, powers, transitions).
Composition: Sounds designed from scratch, experimental combinations, synthesis, sound collages. This is the most creative area of sound design. Because it involves composing sound based on different logic than foley, it's categorized separately.
📌 The sound of collecting coins in Mario is so iconic that if you hear it in another context, you still know which game it’s from.
🗣️ Voice
Use: Voices convey narrative information, build characters, and, above all, humanize the experience. Sometimes a scream or a laugh can be more powerful than a thousand words.
Composition: Recordings of voice actors and actresses, with or without processing. Sometimes they can also be part of the music or the environment, depending on how they are used.
📌 In Portal, GLaDOS’s sarcastic voice is a key part of the game’s humor. An AI with more personality than many NPCs.
🧰 Design and Analysis with Perspective
This categorization also helps create more refined tools when thinking about sound design, which we will explore in future posts. Knowing whether a sound functions as an ambient sound, effect, music, or voice can completely change how we position it in space, how we process it, or how much importance we give it in the mix.
For example, a sound that we initially consider as foley might work better if we treat it as an effect, emphasizing its symbolic or stylized character. Or a musical piece that is perceived as ambient can be redefined if we treat it as a voice of the environment, an entity that “speaks” about the world. These decisions are not merely technical; they are influenced by expressive, narrative, and perceptive considerations.
This perspective places the player's experience at the forefront. Because at the end of the day, it’s not just about how the sounds are made, but how they are heard, interpreted, and integrated into the logic of the game. Sound design is more than a sum of well-made sounds: it’s a network of relationships, meanings, and functions within the interactive system. And this is where this categorization becomes a powerful tool: not to pigeonhole, but to open up new questions and approaches.
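One practical consequence of the category decision is routing: in most middleware setups, the category a sound is assigned to determines which mixer bus, and therefore which processing chain, it passes through. Here is an illustrative routing table; the bus names and behaviors are made up for the example, not taken from any specific engine:

```python
# Illustrative mapping from sound category to mixer bus. Reclassifying
# a sound changes how it is spatialized, processed, and prioritized.
# All bus names and comments are invented for this sketch.

CATEGORY_TO_BUS = {
    "music": "bus_music",        # ducked while dialogue plays
    "ambience": "bus_ambience",  # wide reverb, low priority
    "foley": "bus_sfx",          # spatialized, distance-attenuated
    "effect": "bus_sfx",
    "voice": "bus_dialogue",     # sidechains the music bus
}

def route(sound_name: str, category: str) -> str:
    """Return the bus a sound would be sent to, given its category."""
    bus = CATEGORY_TO_BUS.get(category, "bus_sfx")
    return f"{sound_name} -> {bus}"

# The same source, classified two different ways, gets two different treatments:
print(route("distant_hum", "ambience"))
print(route("distant_hum", "music"))
```

This is exactly the kind of decision the categorization is meant to inform: calling the hum "music" instead of "ambience" is not a label change, it changes what the player ends up hearing.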
🧠 A categorization, many possibilities
The categorization of sound elements in video games (music, ambiences, foley, sound effects, and voice) serves as a guide for making clearer decisions in design, implementation, or analysis. Each category has its own production logic, purpose, and impact. The proposal I've been working on is, in fact, a synthesis: a refined reduction of the many theories, books, and approaches that have passed through my practical and academic experience.
As we've been analyzing, beyond serving as an organizational tool, this way of thinking helps us understand how elements relate to one another. Because the categories are not closed boxes. Often, they overlap, combine, or transform into one another. And it's in those intermediate areas where the most interesting aspects of the interactive medium start to appear. The boundary between music and environment can blur; a voice can acquire an ambient or emotional function; an effect can have a rhythmic and structural load akin to music.
These mixtures are not an accident or a mistake: they are part of the video game's own language. And understanding them is key to designing with more intention and analyzing with greater depth.
🌱 Towards sound hybrids
The goal of this categorization proposal is to think about sound hybrids: sounds that do not exclusively fit into one category, but rather emerge from the exchange, combination, or transformation between different types of elements.
A classic and spectacular example of this is the radio in GTA: San Andreas. The radio in the car can function as music, due to its emotional charge and rhythmic connection with driving. But it is also ambiance because it is spatially placed inside the car and contextualizes the environment. At the same time, it acts as foley or effect, as it represents the operation of the car's stereo system. And as if that weren’t enough, it functions as voice: the lyrics of the songs, the radio shows, and advertisements provide relevant information about the plot, the characters, or the state of the world.
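A simple way to capture a hybrid like this is to replace the single label with a set of weights across categories. The numbers below are invented purely to illustrate the idea; nothing in the game actually quantifies the radio this way:

```python
# Toy representation of a sonic hybrid: one source carries weights
# across several categories instead of a single label.
# The weights are invented for illustration only.

gta_radio = {
    "music": 0.4,     # emotional and rhythmic charge while driving
    "ambience": 0.3,  # spatially placed inside the car, contextualizes it
    "voice": 0.2,     # lyrics, radio shows, and ads carry narrative info
    "foley": 0.1,     # represents the car's stereo system operating
}

assert abs(sum(gta_radio.values()) - 1.0) < 1e-9  # weights form a whole
dominant = max(gta_radio, key=gta_radio.get)
print("dominant role:", dominant)
```

The useful part of the model is not the numbers but the shape: a hybrid is a distribution over categories, with a dominant role that can shift depending on context or on how the player listens.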
This type of sound construction cannot be analyzed from a single category. A more complex framework is needed. For this, I rely on the work of Jerrold Levinson, especially his text Hybrid Art Forms (1984), where he proposes three logics of combination between art forms: juxtaposition, fusion, and transformation.
In juxtaposition, the elements coexist as differentiated layers: for example, background music accompanying a landscape without merging with it.
In fusion, the elements combine so closely that they lose their original identity, giving rise to a new sonic form (like a design that is both voice and music).
In transformation, one element absorbs characteristics of the other, generating a hybrid with predominance: a sound that belongs to one category but behaves according to the logic of another (for example, an environment that functions as a narrative guide).
These models help us understand how sonic hybrids are constructed and perceived in video games. Some are deliberately sought after by designers and programmers; others emerge spontaneously from interaction. In these latter cases, it is the player who completes the process of combination through their perception and interpretation during the game.
Thus, sonic hybrids are not just stylistic mixes: they are products of design, technique, narrative, and perceptual experience. Thinking about them forces us to move beyond rigid classifications and embrace the complexity inherent in the interactive medium. Because, in the end, that is what video games invite us to do: to think of sound as a living, changing, and deeply contextual system.
🔜 What's coming next
In the next post, we’ll dive deep into the world of sonic hybrids: when the boundaries between music, ambience, foley, or voice blur, and new sonic forms appear that can only exist in a video game.
After that, I'll dedicate a post to each of the five elements, with concrete examples, analysis, and design strategies.
If you’re interested in this approach, have questions, ideas, or want to share your experience with sound in video games, I’m all ears in the comments. Thanks for stopping by and listening!
📚 Recommended Readings
If you're interested in continuing to explore this approach, here are some sources that helped me develop it in my thesis:
📘 Karen Collins – Game Sound: An Introduction to the History, Theory, and Practice of Video Game Music and Sound Design: A classic for understanding the emergence of the field of video game audio from a historical and technical perspective.
🎧 Winifred Phillips – A Composer’s Guide to Game Music: Clear and useful if you’re interested in musical composition in interactive environments.
🧠 Kristine Jørgensen – On the Functional Aspects of Computer Game Audio: A deeper look into how sound serves functions within gameplay.
📖 Ulrik Schmidt – Ambience and Ubiquity: Ideal for thinking about sound from a more philosophical and sensory dimension.
📄 My Thesis – Hybridization of Ambience-Foley in the Context of Three-Dimensional and Virtual Reality Video Games: Available for download. Here, I develop these five elements in more detail and explore how they intersect in what I call sonic hybrids.