Everyone should, at some point or another, learn how audio files work. This knowledge may seem trivial or unimportant, but it can really come in handy — such as when recording music, creating a podcast, or optimizing your music library.
In this post, we’re going to explore the various factors that affect audio quality and audio file size. Striking the perfect balance between the two isn’t easy, of course, but by the end you should know enough to feel comfortable and experiment for yourself.
Note: To put this knowledge into practice, you’ll want to grab a free audio editor like Audacity or one of the many Audacity alternatives out there. Learning those tools is beyond the scope of this piece.
1. Sample Rate
In real life, sound is a wave. When someone speaks or claps their hands, what you’re actually hearing is a change in pressure that travels through the air and eventually hits your eardrums.
But how do we capture that sound and convert it into digital data? We can’t just record the full sound wave as it is; instead, we have to take periodic “snapshots” of the sound over time. When you play it all back in sequence, you get an approximate recreation of the original sound.
Each snapshot is called a sample and the interval used between each snapshot is called the sample rate. The shorter the interval, the faster the frequency. Faster frequencies produce more accurate recordings, but also require more data to store each second of recorded sound.
For example, CD quality audio uses a sample frequency of 44.1 KHz (or 44,100 samples per second) whereas TV and DVD quality audio uses a sample freqency of 48 KHz. Given a 10-minute uncompressed mono audio recording, the former might be 51.7 MB while the latter would be 56.3 MB.
You can drop to 32 KHz for speech-only recordings and not experience much loss in quality, but stick to 44.1 KHz if music is involved or if you need utmost quality. Dropping to 22.05 KHz will sound closer to AM radio.
Bitrate is not the same thing as sample rate. A lot of people tend to conflate the two, but it’s important that you don’t. First of all, if sample rate is how often the snapshots of sound are taken, then bit depth is how much data is recorded during each snapshot.
To illustrate, imagine a sound wave as a stream of water and you’re trying to capture (i.e. record) that water with a bucket. Sample rate would be how often you dip your bucket into the stream while bit depth would be the size of your bucket.
The higher the bit depth, the more data is captured per sample. This leads to a more accurate recording at the expense of more space required to store that data. But if you reduce the bit depth too much, sound data gets lost.
Bitrate is how much actual sound data is processed per second; in this case, you multiply the sample rate by the bit depth. A CD audio file with a 44.1 KHz sample rate and a 16-bit depth would have an uncompressed bitrate of 705.6 kbps.
For more on optimal bitrates, read the last section in this article on File Formats.
Sometimes the full bitrate isn’t needed in a given snapshot, such as when there’s silence. In that case, you can use variable bitrate (VBR), which is supported by MP3, OGG, AAC, and WMA. In the past, VBR wasn’t widely supported, but nowadays isn’t much of an issue.
3. Stereo vs. Mono
This point is pretty straightforward so I’ll keep it brief. Mono means one channel while stereo means two channels. The two channels in a stereo audio file can be referred to as the “left” and “right” channels.
With a pair of headphones, you’ll be able to hear one of the stereo channels in one ear and the other stereo channel in the other ear. When listening to a mono audio file, you’ll hear the same exact channel in both ears.
In a sense, stereo audio files are essentially two mono audio files in one — which means that a stereo audio file is always twice as big as a mono audio file, assuming the sample rate, bit depth, source sound, etc. are the same between the two.
So the easiest way to instantly cut an audio file size in half is to convert it from stereo to mono. For voice-only recordings, mono is almost always preferred for this reason.
Note that stereo is what makes a lot of music sound more 3D, as if the music is playing around you rather than at you (i.e. mono sounds flatter). But a lot of people can’t tell the difference, so you might be fine with it. Only you can decide if it’s worth the cut.
If you’re working with WAV files, the only way to reduce file size is by tinkering with one of the above settings (sample rate, bit depth, or number of channels). For everything else, compression is the biggest factor in audio file size.
There are two kinds of compression:
- Lossy compression removes “unnecessary” data from the audio, such as sounds that are beyond the hearing range of most people. Once compressed, this discarded data can’t be recovered.
- Lossless compression takes an audio file and packs it down as much as possible using mathematical algorithms, but these must be decompressed at the time of playback, which requires more processing power. No actual data is lost.
Lossless compression results in the same quality as uncompressed audio, but even at its best, lossless compression results in file sizes that are at least twice as large as lossy compression. For optimal file sizes, go with lossy compression.
If you’ve never compressed an audio file before, or if you’re looking for a good tool to get the job done, consider using one of these easy and effective ways to compress audio.
5. File Format
Once you’ve decided to go with lossy compression, you have to decide which file format is best for you. As of this writing, the three most popular options are MP3, OGG, and AAC. Learn more in our comparison of audio file formats.
MP3 is the most popular by far, mainly because it was the first of the three to arrive on the scene. AAC is technically better than MP3, but doesn’t have the same usage rate. OGG is good too, but not many devices support it, so stick with MP3 or AAC.
Regardless of which one you use, you’ll end up compressing to a target bitrate. If we assume you’re going to use the MP3 format, then these are the five most common bitrates currently used:
- 64 kbps is AM radio quality. Perfect for talk-only podcasts because voices aren’t as complex as music.
- 96 kbps is FM radio quality. Music will sound fine, but you’ll be able to tell that it isn’t full-bodied, mainly because certain hearable frequencies were removed.
- 128 kbps is CD audio quality. This is as standard as it gets. Music sounds “good enough” for most folks at this bitrate.
- 256 kbps is high audio quality. You may notice certain sounds and instruments that were not detectable at lower bitrates.
- 320 kbps is best audio quality. You can go higher, but you probably won’t be able to tell the difference — even if you consider yourself to be an audiophile.
In terms of file size reduction, an MP3 compressed to 128 kbps loses approximately 90% of the original sound data, whereas an MP3 compressed to 320 kbps only loses about 60%.
Also, if you have an MP3 and an AAC both compressed to the same bitrate, the AAC will often sound better because it uses a more advanced compression algorithm. This means you can get more “quality per megabyte” with AAC than MP3.
Understanding these five factors will not only help you to decide the best way to record and compress music and/or podcasts that you’ve created, but can also help you decide what kind of music formats to purchase or which streaming services to use.
As a listener, what’s your preferred file format and bitrate for music? As a creator, what settings do you use for your music or podcasts? Let us know with a comment below!