At some point or another, everyone should learn how audio files work. This knowledge may seem trivial or unimportant, but it can come in handy when recording music, creating a podcast or optimizing your music library.

This post will explore the various factors affecting audio quality and audio file size. Striking the perfect balance between the two isn't easy, but you should know enough to feel comfortable and experiment for yourself by the end.

Note: To put this knowledge into practice, you'll want to grab a free audio editor like Audacity or one of its alternatives. Learning those tools is beyond the scope of this piece.

1. Sample Rate

In real life, sound is a wave. When someone speaks or claps their hands, what you're actually hearing is a change in pressure that travels through the air and eventually hits your eardrums.

But how do we capture that sound and convert it into digital data? We can't just record the full sound wave as it is; instead, we have to take periodic "snapshots" of the sound over time. When you play it all back in sequence, you get an approximate recreation of the original sound.

Image: sample rate illustration (Image Credit: Pluke/Wikimedia)

Each snapshot is called a sample, and the number of samples taken per second is called the sample rate (or sampling rate). An analog-to-digital converter takes these snapshots at a fixed interval. Because the sample rate counts events per second, it is measured in hertz (Hz), the unit of frequency.

The shorter the interval between samples, the higher the sample rate. Higher sample rates produce more accurate recordings but also require more data to store each second of recorded sound.
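
To make the "snapshot" idea concrete, here is a minimal Python sketch that samples a pure sine tone at two different sample rates. The function name `sample_sine` and the 1 kHz test tone are made up for this illustration; a real analog-to-digital converter does this in hardware.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, duration_s):
    """Return snapshots of a sine wave, taken sample_rate_hz times per second."""
    n_samples = round(sample_rate_hz * duration_s)
    return [
        math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]

# One millisecond of a 1 kHz tone, captured at two sample rates.
coarse = sample_sine(1000, 8_000, 0.001)   # 8 snapshots of one wave cycle
fine = sample_sine(1000, 48_000, 0.001)    # 48 snapshots of the same cycle

print(len(coarse), len(fine))  # 8 48
```

Both lists describe the same millisecond of sound, but the second one holds six times as many numbers, which is exactly why higher sample rates cost more storage.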

For example, CD-quality audio uses a sample rate of 44.1 kHz (44,100 samples per second), whereas TV and DVD audio typically uses a sample rate of 48 kHz. For a 10-minute uncompressed 16-bit mono recording, the former comes to about 52.9 MB, while the latter comes to about 57.6 MB.
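
If you'd like to check figures like these yourself, the arithmetic is short. This sketch assumes a 16-bit mono recording and decimal megabytes (1 MB = 1,000,000 bytes); the function name is just an illustration, and results can differ slightly from figures computed with other unit conventions.

```python
def uncompressed_size_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Size of the raw audio data in bytes, ignoring any file header."""
    return sample_rate_hz * bit_depth * channels * seconds // 8

# A 10-minute (600-second) mono recording, assuming 16 bits per sample.
for rate in (44_100, 48_000):
    size = uncompressed_size_bytes(rate, bit_depth=16, channels=1, seconds=600)
    print(f"{rate} Hz: {size / 1_000_000:.1f} MB")

# 44100 Hz: 52.9 MB
# 48000 Hz: 57.6 MB
```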

You can drop to 32 kHz for speech-only recordings and not experience much loss in quality, but stick to 44.1 kHz if music is involved or if you need utmost quality. Dropping to 22.05 kHz will sound closer to AM radio.

2. Bitrate

Bitrate is not the same thing as sample rate. A lot of people tend to conflate the two, but it's important that you don't. To understand bitrate, you first need to understand bit depth: if the sample rate is how often the snapshots of sound are taken, then bit depth is how much data is recorded during each snapshot.

To illustrate, imagine a sound wave as a stream of water, and you're trying to capture (i.e., record) that water with a bucket. Sample rate would be how often you dip your bucket into the stream, while bit depth would be the size of your bucket. Bit depth is measured in bits. Each additional bit doubles the number of distinct volume levels a sample can represent.

Image: bit depth illustration (Image Credit: Aquegg/Wikimedia)

The higher the bit depth, the more data is captured per sample. This leads to a more accurate recording at the expense of more space required to store that data.

But if you reduce the bit depth too much, sound data gets lost. Audio CDs use 16 bits per sample, while DVD and Blu-ray discs support up to 24 bits per sample.
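
Here is a rough Python sketch of what bit depth means in practice. It is a simplified model rather than how any particular recorder works: samples are treated as numbers between -1.0 and 1.0 and rounded to the nearest value a signed integer of the given size can hold. The function names are invented for this example.

```python
def quantization_levels(bit_depth):
    """Number of distinct values a sample of this bit depth can take."""
    return 2 ** bit_depth

def quantize(value, bit_depth):
    """Round a sample in [-1.0, 1.0] to the nearest storable level (simplified)."""
    max_int = 2 ** (bit_depth - 1) - 1
    return round(value * max_int) / max_int

for bits in (8, 16, 24):
    error = abs(quantize(0.3, bits) - 0.3)
    print(f"{bits}-bit: {quantization_levels(bits):,} levels, rounding error {error:.8f}")
```

The rounding error shrinks as the bit depth grows, which is the "bigger bucket" from the analogy above.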

Bitrate is how much actual sound data is processed per second (usually expressed in kilobits per second). To get the uncompressed bitrate, you multiply the sample rate by the bit depth and by the number of channels. A single channel of CD audio with a 44.1 kHz sample rate and a 16-bit depth has an uncompressed bitrate of 44100*16 = 705,600 bits per second, i.e., 705.6 kbps. Stereo doubles that to 1,411.2 kbps.

To give you an idea of the difference in file size, let's consider a five-minute (300-second) uncompressed song recorded in two-channel stereo:

  1. 44.1 kHz/16-bit: 44100*16*2 = 1411200 bits per second (1.4 Mbps)
  2. 192 kHz/24-bit: 192000*24*2 = 9216000 bits per second (9.2 Mbps)

Multiply each bitrate by the length of the recording in seconds, then divide by eight to convert megabits (Mb) to megabytes (MB):

  1. 1.4*300 = 420 Mb, or 52.5 MB
  2. 9.2*300 = 2760 Mb, or 345 MB
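
The same calculation as a small Python sketch, in case you want to plug in your own numbers. The function names are made up for this example, and megabytes are decimal (1 MB = 1,000,000 bytes). Because it doesn't round the bitrate first, the results come out slightly higher than the rounded figures above.

```python
def bitrate_bps(sample_rate_hz, bit_depth, channels):
    """Uncompressed bitrate in bits per second."""
    return sample_rate_hz * bit_depth * channels

def size_megabytes(bitrate, seconds):
    """File size in MB for a recording of the given length."""
    return bitrate * seconds / 8 / 1_000_000

cd = bitrate_bps(44_100, 16, 2)        # 1,411,200 bits per second
hi_res = bitrate_bps(192_000, 24, 2)   # 9,216,000 bits per second

print(f"CD quality: {size_megabytes(cd, 300):.1f} MB")      # CD quality: 52.9 MB
print(f"192/24:     {size_megabytes(hi_res, 300):.1f} MB")  # 192/24:     345.6 MB
print(f"Ratio:      {hi_res / cd:.1f}x")                    # Ratio:      6.5x
```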

So audio recorded at 192 kHz/24-bit takes roughly six and a half times as much space, but it all boils down to what you want to do with the audio recording. Sometimes the full bitrate isn't needed in a given snapshot, such as when there's silence.

In that case, you can use variable bitrate (VBR) encoding, which is supported by MP3, OGG, AAC, and WMA. In the past, VBR wasn't widely supported, but nowadays that isn't much of an issue.

3. Stereo vs. Mono

This point is pretty straightforward, so I'll keep it brief. Mono means one channel, while Stereo means two channels. The two channels in a stereo audio file can be referred to as the "left" and "right" channels.

With a pair of headphones, you'll be able to hear one of the stereo channels in one ear and the other stereo channel in the other ear. When listening to a mono audio file, you'll hear the same exact channel in both ears.

In a sense, stereo audio files are essentially two mono audio files in one, which means that an uncompressed stereo audio file is twice as big as a mono audio file, assuming the sample rate, bit depth, source sound, etc. are the same between the two. So the easiest way to cut an audio file's size roughly in half is to convert it from stereo to mono.
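
Converting stereo to mono usually means blending the two channels together. One common approach is to average each left sample with its matching right sample, as in this sketch. The sample values are made up, and audio editors may offer other mixing options.

```python
def stereo_to_mono(left, right):
    """Average matching left/right samples into a single channel."""
    return [(l + r) // 2 for l, r in zip(left, right)]

# Four hypothetical 16-bit samples per channel.
left = [100, -200, 3000, 0]
right = [300, 200, 1000, 0]

mono = stereo_to_mono(left, right)
print(mono)                                # [200, 0, 2000, 0]
print(len(left) + len(right), len(mono))   # 8 4
```

The mono version stores half as many samples as the stereo original, which is where the size saving comes from.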

For voice-only recordings, mono is almost always preferred since it makes the sound powerful, clear, and upfront. But if you want to record two or more vocalists in a room with unique acoustics, the vocals should be stereo.

Similarly, podcast recordings can be mono as well. However, in music recordings, stereo is what makes a lot of music sound more three-dimensional, as if the music is playing around you rather than at you (i.e., mono sounds flatter).

4. Compression

If you're working with WAV files, the only way to reduce file size is by tinkering with one of the above settings (sample rate, bit depth, or number of channels). For everything else, compression is the biggest factor in audio file size. There are two kinds of compression:

  • Lossy compression removes "unnecessary" data from the audio, such as sounds that are beyond the hearing range of most people. Once compressed, this discarded data can't be recovered.
  • Lossless compression takes an audio file and packs it down as much as possible using mathematical algorithms. However, it must be decompressed at the time of playback, which requires more processing power. No actual data is lost.

The compression mode you want to use depends upon the intended use of the audio file. Generally, you should go with lossless compression when you want to store an exact copy of the source material and lossy compression when an imperfect copy is good enough for day-to-day usage.
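
To see what "no actual data is lost" means, here is a small Python sketch. Note the assumptions: it uses `zlib`, a general-purpose compressor from Python's standard library, as a stand-in for a real lossless audio codec like FLAC, and the test signal is a synthetic 440 Hz tone, which packs down far better than real music would.

```python
import math
import struct
import zlib

# Hypothetical test signal: one second of a 440 Hz tone, 16-bit mono at 44.1 kHz.
samples = [
    round(32767 * math.sin(2 * math.pi * 440 * n / 44_100))
    for n in range(44_100)
]
raw = struct.pack(f"<{len(samples)}h", *samples)  # little-endian 16-bit integers

packed = zlib.compress(raw, 9)       # pack the data down as far as zlib can
restored = zlib.decompress(packed)   # unpack it again, as a player would

print(len(raw), "bytes before,", len(packed), "bytes after")
print("Identical after round trip:", restored == raw)
```

A lossy encoder would fail that last check by design, because the discarded data is gone for good.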

For example, you might want to preserve your ripped CD collection in FLAC (if storage space isn't an issue) and use MP3 copies on your phone. If you don't know much about compression, here's our complete guide on how file compression works and a list of tools to compress large audio files effectively.

5. File Format

Once you've decided to go with lossy compression, you have to decide which file format is best for you. As of this writing, the three most popular options are MP3, OGG, and AAC. To learn more, read our guide on the comparison of various audio file formats.

MP3 is the most popular by far, mainly because it was the first of the three to arrive on the scene. AAC is technically better than MP3 but doesn't have the same usage rate. OGG is good too, but not many devices support it, so stick with MP3 or AAC.

Regardless of which one you use, you'll end up compressing to a target bitrate. If we assume you're going to use the MP3 format, then these are the five most common bitrates currently used:

  • 64 kbps is AM radio quality. Perfect for talk-only podcasts because voices aren't as complex as music.
  • 96 kbps is FM radio quality. Music will sound fine, but you'll be able to tell that it isn't full-bodied, mainly because certain hearable frequencies were removed.
  • 128 kbps is often described as near-CD quality, although actual CD audio runs at about 1,411 kbps. This is as standard as it gets. Music sounds "good enough" for most folks at this bitrate.
  • 256 kbps is high audio quality. You may notice certain sounds and instruments that were not detectable at lower bitrates.
  • 320 kbps is the best audio quality MP3 offers, since it's the highest bitrate the standard supports. You probably won't be able to tell it apart from the uncompressed original, even if you consider yourself to be an audiophile.

In terms of file size reduction, compared with uncompressed CD audio (1,411.2 kbps), an MP3 compressed to 128 kbps discards approximately 90% of the original data, whereas an MP3 compressed to 320 kbps discards about 77%.
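
You can work out the size reduction for any target bitrate with a few lines of Python. This sketch assumes the source is uncompressed CD audio (16-bit, 44.1 kHz, stereo) and uses decimal megabytes. The function names are made up for this example, and the percentages describe data size only, not how the result sounds.

```python
CD_BITRATE_KBPS = 44_100 * 16 * 2 / 1000   # 1411.2 kbps, uncompressed CD audio

def reduction_percent(mp3_kbps, source_kbps=CD_BITRATE_KBPS):
    """How much smaller the MP3 is than the uncompressed source, in percent."""
    return (1 - mp3_kbps / source_kbps) * 100

def mb_per_minute(kbps):
    """Storage needed for one minute of audio at this bitrate, in MB."""
    return kbps * 60 / 8 / 1000

for kbps in (64, 96, 128, 256, 320):
    print(f"{kbps} kbps: {reduction_percent(kbps):.0f}% smaller, "
          f"{mb_per_minute(kbps):.2f} MB per minute")
```

At 128 kbps the result is about 91% smaller than the CD source, and at 320 kbps about 77% smaller.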

Also, if you have an MP3 and an AAC both compressed to the same bitrate, the AAC will often sound better because it uses a more advanced compression algorithm. This means you can get more "quality per megabyte" with AAC than MP3.

Optimize Your Audio File Sizes

Understanding these five factors will help you decide how best to record and compress the music and podcasts you create, which music formats to purchase, and which streaming services to use.