
🔊 Sonic Hybrids: A New Way to Think About Audio in Games

  • Writer: Lautaro Dichio
  • Apr 21
  • 14 min read

Before diving into this new entry, I want to thank everyone for the warm reception the last one received. It was truly exciting to see the content circulate, spark comments and questions, and—above all—inspire people to keep thinking about video game audio from a different perspective. That’s the main goal of this blog: to open up the conversation, not close it down.


So, I want to invite you (yes, you reading this) to comment, ask questions, share examples—or even disagree. If anything you read here sparks an idea, a doubt, or a new point of view, you’re more than welcome to share it. And if you think the opposite of what I write—even better! Because that’s where the most solid frameworks come from: theories built through shared experiences. This isn’t about preaching a single truth, but about putting different ways of listening, thinking, and designing into play.


🌱 Back to the Starting Point


In the previous post, we began opening up questions and possible approaches for analyzing sound in video games. We looked at the need to rethink what we’re actually dealing with when we work with game audio. I proposed a simple but powerful model to organize sounds into five major categories: music, ambience, effects, foley, and voice. That model gave us a shared foundation—a sort of map to begin navigating from.


We also introduced the concept of the sound hybrid: a flexible category that appears when a sound no longer fits neatly into a single function or type. These are living sounds—they change, react, and are reinterpreted. They resist being boxed in, and that’s exactly why they invite us to listen more attentively and design with more freedom. A sound might be part of the environment, but also react to a specific action. It might be designed as an effect, but be perceived as music. It might function as voice, but provide environmental depth like a background element.


Today, we’re diving deep into that world: the world of hybrids. We’ll look at what they are, how they work, why they appear, and what they teach us about how interactive sound builds meaning in games.


🎭 What Is a Sound Hybrid?


A sound hybrid is a sound that blurs the lines between two or more traditional categories of sound design. It’s a blend, a crossover, a middle ground between functions or ways of being perceived. It’s not just about layering—it’s about creating a new sound identity or function that emerges from the mix.


In games, this happens all the time. Not always because it was intentionally designed that way (though sometimes it is), but because of how player interaction and perception make it inevitable. Sound in games isn’t fixed—it’s a dynamic experience built in real time, in relation to what’s happening on screen, what the player is doing, and how they interpret what they hear.


That’s why the same sound can carry multiple meanings. In Guitar Hero, for example, the guitar sound functions as music, but also as foley—because it’s tied directly to the act of “playing” a physical instrument. The player triggers a note, but that note also represents a physical action, a performance. Within one gesture, we have music, mechanics, and symbolic function all coexisting. In the clip below, featuring “One” by Metallica, you can hear and see how music, gesture, and sound design merge into a hybrid experience—where playing, listening, and gaming become one.

Guitar Hero clip with “One” by Metallica. Music, movement, and sound design merge into a hybrid experience where playing, listening, and gaming become one.

In other cases, like The Last of Us, a simple footstep can function as foley, ambience, and an alert signal. The sound of stepping on broken glass is synced with the animation (foley), it blends into and enriches the surrounding environment (ambience), and it alters gameplay logic when enemies are nearby—it becomes a message, a danger signal (effect). And that’s not all—it tells us something about the place we’re in. That glass isn’t just there; its sound carries a story. It tells us about the space, about time, about what might’ve happened before. It’s narrative in audio form—a sonic trace that expands the game world without saying a word.
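To make that multi-role behavior concrete, here’s a minimal, engine-agnostic sketch in Python. Everything in it—the `Footstep` event, the helper functions, the audibility threshold—is hypothetical and invented for illustration, not Naughty Dog’s actual system; the point is simply that a single sound event can feed three functions at once.

```python
from dataclasses import dataclass

@dataclass
class Footstep:
    surface: str      # e.g. "broken_glass"
    loudness: float   # 0.0 (silent) to 1.0 (very loud)

def on_footstep(step: Footstep, enemies_in_earshot: list[float]) -> None:
    """One sound event, three simultaneous functions (hypothetical sketch)."""
    # 1) Foley: the sample is synced to the walk animation.
    play_sample(f"foley/steps/{step.surface}.wav", volume=step.loudness)

    # 2) Ambience: a quiet copy feeds the room's reverb send,
    #    so the step also colors the surrounding space.
    send_to_reverb(f"foley/steps/{step.surface}.wav", wet=0.3)

    # 3) Effect / signal: loud surfaces alert any enemy close enough.
    for distance in enemies_in_earshot:
        if step.loudness / max(distance, 1.0) > 0.5:   # made-up threshold
            alert_enemy(distance)

# Stand-ins for engine calls, so the sketch runs on its own:
def play_sample(path, volume): print(f"play {path} @ {volume:.2f}")
def send_to_reverb(path, wet): print(f"reverb send {path} wet={wet}")
def alert_enemy(distance): print(f"enemy at {distance}m alerted!")

on_footstep(Footstep("broken_glass", loudness=0.9), enemies_in_earshot=[1.5, 8.0])
```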


The video below takes you through locations in The Last of Us, focusing on footsteps and the act of walking. The attention to detail in the sound design is remarkable (especially for a game of this genre), and it shows how each space tells its story through sound textures. It’s a long video, so I recommend jumping around and listening to different sections.

The Last of Us clip: Exploring locations and soundscapes. Listen to how footsteps turn into spatial storytelling—every surface, every echo, every creak tells us something about the space we’re in.

These crossovers aren’t mistakes or confusions. On the contrary, they’re one of the richest features of the interactive medium. Sound hybrids open up expressive possibilities, expand the range of interpretation, and force us to think of design in more fluid, less rigid ways.


👂 Perception at the Center: Beyond the Technical and Visual


When we talk about sound hybrids, it’s crucial to understand that it’s not only about how a sound was designed, but also (and often more importantly) about how the player perceives it. Perception isn’t a passive layer of the experience—it’s an active part of the interaction. And it’s in the space between design intent and how it’s heard that all the richness appears.

A single sound can be interpreted in many ways depending on context, game state, the player’s focus, or their expectations. A soft melody might feel like ambient background when you’re exploring peacefully—but if you suspect someone’s watching you, that same music might feel like a warning, or even like a living presence in the game world.


Take Inside by Playdead, for example. Some sounds might be classified as foley by design, but due to their spatial positioning, repetition, or rhythm, they end up functioning like music. A lever being pulled, a mechanism turning, footsteps on metal—they all build tension, rhythm, and atmosphere, even though they’re not “music” in the traditional sense. And yet we hear them as if they were.

Scene from Inside. No composed music, but pure musicality in sound construction: each sound sustains the atmosphere like a part of a score.

Another striking case is Hellblade: Senua’s Sacrifice, where auditory perception is at the heart of the experience. The internal voices, recorded using binaural techniques, resist easy classification. Are they voices? Are they ambience? A narrator? Enemies? They’re all of that at once. And how the player interprets them completely changes their relationship to the game. It’s a perfect example of how sound can construct subjectivity, disorientation, and emotion through perception.

Clip from Hellblade: Senua’s Sacrifice. The binaural design of the voices makes perception part of the conflict and turns sound into direct emotional experience.

🔀 Types of Sound Hybrids


To better understand how sound hybrids work, I’m borrowing and adapting a classification from philosopher Jerrold Levinson, especially from his text Hybrid Art Forms (1984). He proposes that hybrids emerge through three main forms of combination: juxtaposition, fusion, and transformation. We mentioned this briefly in the last post, but now we’ll go deeper into each one with specific audio examples from games. These logics help us observe how a single sound can cross functional boundaries and construct new meanings depending on design, gameplay, or player perception.


🧷 Juxtaposition


Juxtaposition occurs when two elements exist side by side without losing their identity. Each maintains its own characteristics and can be perceived separately within the same unit. It’s coexistence without mixing.


Applied to sound: a juxtaposed hybrid is a single sound that contains two distinct functions (like foley and music, or effect and voice), which don’t blend or transform into each other, but are presented together while remaining separate.


In Far Cry 6, when the character drives a vehicle with the radio on, sometimes they start singing along. This creates a sound hybrid: on one hand, you have a recognizable song (diegetic music), and on the other, the character’s voice joining in (interactive voice). Both functions are heard separately—one doesn’t absorb the other—but coexist in the same composite sound. It’s not just dialogue over a song, nor a musical piece featuring vocals: it’s a single hybrid sound event built through juxtaposition.
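As a rough illustration of juxtaposition, here’s a small Python sketch in which the two functions share one sound event but live on separate buses, each keeping its own identity. The bus names and the `drive_with_radio` function are hypothetical, not Ubisoft’s implementation.

```python
class Bus:
    """A toy mixer bus that just reports what it plays."""
    def __init__(self, name: str):
        self.name = name

    def play(self, clip: str) -> None:
        print(f"[{self.name}] {clip}")

music_bus = Bus("diegetic_music")   # the car radio
voice_bus = Bus("character_voice")  # the character singing along

def drive_with_radio(song: str, character_sings: bool) -> None:
    music_bus.play(song)                        # function 1: music
    if character_sings:
        voice_bus.play(f"{song} (sung along)")  # function 2: voice
    # Neither bus processes the other: the two functions are heard
    # side by side, as one composite event built by juxtaposition.

drive_with_radio("radio/track_03", character_sings=True)
```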


🔄 Fusion


Fusion happens when two elements completely merge, to the point where their boundaries disappear and a new indivisible form arises. We can no longer tell where one ends and the other begins.

In sound terms, a hybrid by fusion combines different functions (like music and ambience) into a unit that can’t be perceptually separated. The original identity of each part dissolves in the new result.


In Shadow of the Tomb Raider, during the Day of the Dead scene in a Mexican market, the soundscape is designed to create an immersive atmosphere. Traditional music, background voices, footsteps, cloth textures, objects—they’re not perceived as separate layers but as a single flowing sound block. Music, ambience, and diegetic sound all blend into one cohesive texture, where each element behaves as part of the environment. There’s no hierarchy: everything sounds like the world.
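A hedged sketch of what fusion might look like at the mix level: every layer is routed through the same environment bus with identical level and processing, so nothing is allowed to step forward as a separate function. The file names and settings are invented for illustration.

```python
# Fusion sketch: individual layers lose their boundaries because they
# all pass through the same processing at the same level, with no
# hierarchy. Hypothetical names; not the game's real mixer.

market_layers = [
    "market/traditional_band_loop",   # "music"
    "market/crowd_voices_loop",       # "voice"
    "market/footsteps_cloth_props",   # "foley"
    "market/night_air_tone",          # "ambience"
]

def fused_environment(layers: list[str]) -> None:
    for clip in layers:
        # Same volume, same reverb, same EQ for every layer:
        # everything sounds like the world, nothing like a category.
        print(f"env bus <- {clip} (vol=0.5, reverb=market_square, lowpass=8kHz)")

fused_environment(market_layers)
```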


🔁 Transformation


Transformation appears when an element changes form or function over time, moving from one category to another. It’s not a blend or layering—it’s a process of change.

In audio, a transformational hybrid is one where a sound (like voice or foley) becomes something else, altering its perceived meaning for the player. The sound remains formally continuous, but its function and significance shift.


Returning to Hellblade, the internal voices initially sound like inner narration. But depending on the moment, they may act as ambience, as interactive cues (guiding the player), or even as music, underscoring emotional sequences. These voices shift category depending on perception. They’re still “voices,” but they don’t always serve the same purpose.
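Here’s a minimal state-driven sketch of transformation: one continuous voice stream whose treatment—and therefore its perceived function—changes with game state. The states, parameters, and roles are assumptions for illustration, not Ninja Theory’s actual system.

```python
# Transformation sketch: the same voice stream, reclassified over time
# by how it is treated. All values are hypothetical.

TREATMENTS = {
    "exploring":  {"pan": "wide",     "reverb": 0.7, "role": "ambience"},
    "navigating": {"pan": "front",    "reverb": 0.2, "role": "interactive cue"},
    "combat":     {"pan": "rhythmic", "reverb": 0.5, "role": "musical underscore"},
}

def voices_for_state(state: str) -> None:
    t = TREATMENTS[state]
    print(f"voices: pan={t['pan']}, reverb={t['reverb']} -> heard as {t['role']}")

for state in ["exploring", "navigating", "combat"]:
    voices_for_state(state)   # same stream, shifting category
```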


⚖️ Dominance and Balance: A Deeper Dive


So far, we’ve seen that sound hybrids can combine in different ways—juxtaposition, fusion, transformation. But when we look at actual game examples, more questions arise: How are these combinations perceived? Is there a dominant function in each hybrid? Or do they coexist in balance?


To explore this, let’s introduce a new layer to the model: a closer look at how those functions are heard when combined. Which one leads? Which one supports?


We can distinguish two behaviors:

  • Dominant Hybrid: One function clearly stands out. For example, it mostly sounds like music, even if it includes foley or ambient elements. That dominant function shapes how we interpret the sound.

  • Balanced Hybrid: Functions coexist without one taking over. The result is a stable, complex unit where everything sounds at the same level. There are no clear hierarchies—it’s a network of meanings in equilibrium.


This approach can apply to any pair among the five base categories (music, ambience, effects, foley, voice). Focusing on two at a time helps us see how these relationships are built and how they impact the audio experience.
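To make the dominant/balanced distinction operational, here’s a toy Python sketch that compares the perceived level of each function inside a hybrid. The 6 dB margin is an arbitrary assumption—real perception is far messier—but it shows the idea in miniature.

```python
# Toy classifier for hybrids: if one function's level clearly exceeds
# the rest, call the hybrid dominant; otherwise, balanced.

def classify_hybrid(levels_db: dict[str, float], margin_db: float = 6.0) -> str:
    ordered = sorted(levels_db.items(), key=lambda kv: kv[1], reverse=True)
    (top, top_db), (_, second_db) = ordered[0], ordered[1]
    if top_db - second_db >= margin_db:
        return f"dominant hybrid (led by {top})"
    return "balanced hybrid (no clear hierarchy)"

print(classify_hybrid({"music": -6.0, "foley": -18.0}))      # dominant
print(classify_hybrid({"ambience": -12.0, "foley": -13.5}))  # balanced
```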


A particularly rich case is the ambient–foley hybrid. These two elements often overlap, blur, and redefine each other. Let’s zoom in on that.


⏸️ PAUSE


Okay, cool—concepts, classifications, types of combinations, perception, balance... but what’s the point of all this?


What’s it for? Why bother going in circles if what we want is to make or analyze sound in video games? Sometimes it all seems like too much theory, too little practical use. But that’s not the case. This all has a very clear purpose.


The idea behind this model isn’t just to understand how sound hybrids work in isolation. What we’re building is a tool that lets us take things a step further: to analyze entire mixes within video games. But we’re not quite there yet—that’ll take a few more posts.


Because that’s where things get really interesting. It’s not just about how two functions combine or are perceived in a single sound, but about how all sound elements interact with each other within the overall mix (music, foley, ambience, voice, effects). What role does each one play? What place does it occupy? And also (this is key): what happens when it’s not there?


Because silence speaks too. And in this case, we’re not talking about general silence—we’re talking about the silence of each function. What happens when there’s no music? When there’s no voice? When an ambience fades out, or when foley disappears? Omission is also a choice, and it shapes the mix just as much as the things that are heard.


All of this will help us better understand what each element does in a game’s sound mix. What it’s supposed to do. What it actually does. And what it stops doing when it’s gone.
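If you want to try this yourself, here’s a tiny sketch of the listening exercise implied above: mute one function at a time and pay attention to what each one contributes—and what its silence changes. The bus names simply mirror the five base categories of the model.

```python
# Listening-exercise sketch: solo the mix minus one function at a time.

FUNCTIONS = ["music", "ambience", "effects", "foley", "voice"]

def mute_one_at_a_time(active: list[str]) -> None:
    for muted in active:
        remaining = [f for f in active if f != muted]
        print(f"mute {muted:<8} -> listening to: {', '.join(remaining)}")

mute_one_at_a_time(FUNCTIONS)
```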

Let’s get back into it.



🌲⚙️ The Ambient–Foley Hybrid


If there’s a place where all of this comes to life, it’s in ambient–foley hybrids. They’re constantly stepping on each other’s toes—similar, but not the same. And when they mix, the results vary widely depending on how they combine and how they’re perceived.


On one side, we have ambience—a surrounding sound field that places us in time and space, without a center. On the other, foley—detailed, specific sounds that bring actions and objects to life. Both build the game world, but from different logics. When combined, they can do so in various ways. And that’s where the model of combination + perception helps us understand what’s going on.


🛠️ Ambient–Foley Hybrid with Foley Dominance


In this type of hybrid, the ambient layer is built using foley sounds, even without direct interaction. What dominates isn’t so much the idea of an enveloping sound field, but the presence and detail of each individual sound.

A clear example is Horizon Zero Dawn. In many natural landscapes, we don’t hear a diffuse or ethereal background, but distinct, defined sounds: branches cracking, insects, small mechanical movements from robotic fauna. Even though we’re not triggering these sounds directly, they’re full of nuance and presence. The environment is perceived as ambience—but it’s constructed with a foley logic. That’s why foley takes the lead.
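This construction logic—discrete one-shots scattered in time and space instead of a single looping bed—is easy to sketch. The sample names, distances, and timing ranges below are invented; this is a schematic of the technique, not Guerrilla’s pipeline.

```python
import random

# Ambience built with foley logic: no loop, just detailed one-shots
# scattered at irregular intervals around the player.

ONE_SHOTS = ["branch_crack", "insect_flutter", "servo_whir", "leaves_rustle"]

def scatter_ambience(duration_s: float) -> None:
    t = 0.0
    while t < duration_s:
        clip = random.choice(ONE_SHOTS)
        pan = random.uniform(-1.0, 1.0)      # where around the player
        dist = random.uniform(2.0, 30.0)     # how far away
        print(f"t={t:5.1f}s  {clip:<14} pan={pan:+.2f} dist={dist:4.1f}m")
        t += random.uniform(0.5, 4.0)        # irregular, foley-like timing

random.seed(7)
scatter_ambience(15.0)
```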


🌬️ Ambient–Foley Hybrid with Ambient Dominance


Here we have the opposite. The environment sounds like a cohesive, immersive field, and any foley elements inserted into it don’t break the ambient illusion—they blend in without shifting the overall logic.

A good example is BioShock Infinite. In the floating city of Columbia, you walk through beaches, fairs, plazas, and hear music, voices, ambient noise. Sometimes you encounter a radio, a machine, a fairground game, or a live band. You can get close, even interact. Technically, these are foley or effect sounds. But they feel like part of the ambient layer. Perception is still dominated by the scene’s immersive logic.


⚖️ Balanced Ambient–Foley Hybrid


And then there are cases where ambient and foley are presented equally, with neither taking over. The result is a hybrid texture where both functions intertwine without hierarchy.

This happens in Gone Home, a game about exploring an empty house. You turn on lights, play tapes, open doors, switch on TVs. Everything makes a sound (and sounds good). But it doesn’t feel like isolated actions or background noise: ambience and foley are so well integrated that what you hear is a single unified whole. You can’t fully separate which is which. That ambiguity is part of the game’s charm. What you do affects the environment—and the environment responds. Foley and ambience live in balance.


🌍 Programmed vs. Perceived Ambience


So far, we’ve broken down the idea of sound hybrids—how functions cross, how those crossings are built, and what role they play in games. But when we take a closer look at ambient–foley hybrids, something else emerges. Something not only about how sounds combine, but how we understand them, how we hear them, and what happens when that hearing shifts.


Because in ambient–foley hybrids, the line between one and the other gets so thin—so sensitive to player attention—that we need to think differently. It’s not enough to ask what function a sound performs—we need to ask what was programmed and what is perceived. That’s where this new idea comes in: the difference between programmed ambience and perceived ambience.


🎛️ What Is Programmed Ambience?


Programmed ambience is what the sound designer creates. It’s what’s defined, implemented, and tweaked in the game engine. It’s made of audio tracks, layers, technical and aesthetic criteria. It starts from an intention: to generate a sense of environment. It’s a constructed experience, based on what the designer wants the player to hear.


But like any construction, it’s shaped by cultural, narrative, and design decisions. It’s not just about copying reality, but representing it with limited resources and clear goals. Some games need simple, efficient backgrounds (like a jungle or city loop), while others allow for more complexity and layering.


👂 What Is Perceived Ambience?


This is where things get more interesting. Perceived ambience is not what the game programs, but what the player interprets as ambience. It’s how the player organizes what they hear. And that perception might be very different from what the designer had in mind.


The same sound might be part of the ambience one moment and be perceived as foley the next. What does it depend on? Attention, interaction, context. If the player approaches a running TV, observes it, uses it—its sound becomes salient: it becomes foley. But if they ignore it, stop paying attention, it fades back into the ambience. Interaction changes everything. When the player touches, activates, ignores, or modifies something, their way of listening changes too. The programmed ambience can vanish, shift, or be rebuilt depending on the experience. At that point, what matters is not just the design—it’s the listening.
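Here’s a speculative sketch of the TV example: the same emitter slides between an “ambience” treatment and a “foley” treatment as a function of player attention, reduced here to distance and facing. Every threshold is invented purely for illustration.

```python
# Perceived function as a product of attention (hypothetical model).

def perceived_function(distance_m: float, facing_it: bool) -> str:
    # Closer + facing = more attention; the same emitter gets a
    # salient, detailed treatment instead of a diffuse one.
    attention = (1.0 / max(distance_m, 0.5)) + (0.5 if facing_it else 0.0)
    if attention > 0.6:
        return "foley (salient: full volume, dry, detailed)"
    return "ambience (background: attenuated, filtered, diffuse)"

for d, facing in [(1.0, True), (4.0, False), (10.0, False)]:
    print(f"dist={d:4.1f}m facing={facing} -> {perceived_function(d, facing)}")
```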


🎧 Perception Builds the Ambience


This idea gets even stronger if we stop thinking of ambience as just a collection of sounds and start thinking of it as a non-centered sound field. Ambience is what surrounds you. And if no sound stands out too much—if none break the illusion—the whole thing is perceived as a single, immersive unit. When that unit breaks (because something grabs your attention or you interact), the balance shifts and a hybrid emerges.

This constant dance between attention and immersion is what defines perceived ambience. Ulrik Schmidt puts it beautifully: ambience isn’t a sound object—it’s an experience of being surrounded. It’s ubiquitous, fluid, without hierarchy. The opposite, in fact, is foley, which needs focus, clarity, and direction.


🎬 To Wrap Up (For Now)


This entry took us on a deep journey. We started with the idea of the sound hybrid and opened new questions: How do sound functions combine? How are those combinations perceived? What role do they play in the game? And what happens when those functions overlap or disappear?

We developed a model that helps us understand not just what sounds do, but how we listen to them and how they relate to each other within the game’s overall sound mix. Along the way, we explored concepts like dominance, balance, programmed ambience, perceived ambience, attention, and immersion—all essential if we want to see sound not just as decoration, but as a system.

This way of thinking opens up new paths for designing, analyzing, and talking about what we hear when we play. And if this helps with that—even just a little—then it’s already doing its job.


🔜 What’s Next


So far, we’ve focused on one specific crossover: ambient and foley. But that’s not the only one.


In the next post, we’ll explore other functional pairings: music and effects, voice and ambience, foley and music… and we’ll keep digging into what happens when sounds break free from traditional categories and start building new forms of meaning.


See you there.


💬 What Do You Think?


That’s all for this entry—but not for the conversation. As always, the goal here isn’t to close topics, but to open them up.


So here are a few questions for us to keep thinking together. If any of them resonate (or bother you, or spark something else), feel free to leave a comment, message me, or share it with someone who might enjoy it:


  • Have you ever perceived a sound differently from how it was clearly designed to be heard?

  • Can you recall a moment in a game where ambience spoke louder than dialogue or music?

  • Do you think viewing sound as a system (and not just decoration) changes the way we design or play?

  • Are there other functional crossovers (beyond ambient–foley) that you’d like to analyze or share?

  • What examples would you add to keep thinking about these hybrids?


Every contribution counts. The questions this blog raises aren’t here to be “answered correctly”—they’re here to help us listen from new perspectives. So if anything sparked an idea, a memory, a critique, or a doubt… I’m all ears.


Want me to analyze a sound from your favorite game in the next post? Drop it in the comments and I’ll add it to the list.


📚 Further Reading


If this post left you curious to keep exploring, dig deeper, or even argue with these ideas (please do!), here are some recommended reads that complement and expand on what we’ve covered:


  • 🎓 Lautaro Dichio – “La percepción sonora como marco de análisis”. The theoretical core of this post. It goes deeper into the sound hybrid concept (especially ambient–foley), the difference between programmed and perceived ambience, and the intersection of perception theory, sound design, and interactivity. Available online, and great for both designers and researchers.

  • 📖 Ulrik Schmidt – “Ambience and Ubiquity” (2013). A philosophical and perceptual look at what sonic ambience is, how it’s built, and how it’s experienced. Many of this post’s ideas about immersion, attention, and spatiality are in direct dialogue with this text.

  • 🎬 Walter Murch – “Dense Clarity, Clear Density” (2005). A classic in cinematic sound design, very helpful for thinking about how ambience is built in games too. Murch argues that after three simultaneous sounds, perception begins to create a “field”—a concept we referenced here to explore how a sense of soundscape emerges.

  • 👂 Sutojo et al. – “Gestalt Principles in Auditory Perception” (2020). Though Gestalt laws were originally formulated for visual perception, they apply to sound as well. This paper helps us understand how we group, differentiate, and prioritize sound elements when listening. A perfect complement to the sections on perception and hybrids.


