We’ve all heard of file compression. Anyone who regularly downloads files from the web is familiar with formats like ZIP and RAR, and anyone who edits media files knows that compression is necessary to share images, music and videos on the web without using up all of your bandwidth. File compression is at the core of how the web works, you might argue, because it allows us to share files that would otherwise take too long to transfer. But how does it work?
It’s nothing magical, but it is the result of a lot of hard work by many very smart people. Let’s explore how file compression works by looking over the two main types of compression – lossless and lossy.
Just a warning – I’m going to oversimplify things here in an attempt to make this readable by non-math majors. Check out the linked-to Wikipedia articles for more depth, and Wikipedia’s sources for even more.
Lossless Compression
Lossless compression basically works by removing redundancy. What does that mean? Let’s simplify things. This stack of bricks will represent our data:

As you can see we’ve got two red bricks, five yellow and three blue. The simplest way to represent this is as you see above: the bricks themselves. But it’s not the only way I can represent this. I could also do this:

In the above image you can see the exact same information – two red, five yellow and three blue – but it takes up significantly less space. I’ve represented redundant bricks using numbers, meaning I need only three bricks to represent ten.
This gives you a rough idea how lossless compression is possible. Information that’s redundant is replaced with instructions telling the computer how much identical data repeats. Another simplified example:
fffffffuuuuuuuuuuuu
Can be “compressed” to:
f7u12
This is only one method of lossless compression, of course, but it points to how this is possible. Other math tricks are used, but the main thing to remember about lossless compression is that while space is temporarily saved, it is possible to reconstruct the original file entirely from the compressed one. If you see three bricks with numbers you know exactly how to make the stack. No information is lost, just as the name lossless implies.
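If you'd like to see the "f7u12" trick in code, here's a minimal sketch in Python. The function names `rle_encode` and `rle_decode` are my own inventions for illustration, and the sketch assumes the input text never contains digits (otherwise the counts would be ambiguous). Real compression tools use far more sophisticated schemes than this.

```python
from itertools import groupby
import re


def rle_encode(text: str) -> str:
    """Collapse each run of identical characters into <char><count>."""
    return "".join(f"{char}{len(list(run))}" for char, run in groupby(text))


def rle_decode(encoded: str) -> str:
    """Expand every <char><count> pair back into the original run.

    Assumption: the original text contained no digits.
    """
    pairs = re.findall(r"(\D)(\d+)", encoded)
    return "".join(char * int(count) for char, count in pairs)


original = "f" * 7 + "u" * 12      # the example string from above
packed = rle_encode(original)
print(packed)                      # f7u12

# Lossless means the round trip gives back exactly what we started with.
assert rle_decode(packed) == original
```

The same idea covers the bricks: encoding "RRYYYYYBBB" gives "R2Y5B3", which is three "bricks" standing in for ten.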
Programs like WinZip are based on lossless compression. They remove this redundant information when you compress (or “zip”) the file and restore it when you uncompress (or “unzip”). Nothing is lost.
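You can try the zip/unzip round trip yourself with Python's built-in `zlib` module, which implements DEFLATE, the method ZIP files typically use. This is only a rough sketch with made-up sample data, not a look inside WinZip itself.

```python
import zlib

# Made-up sample data with plenty of repetition.
original = b"the same line of text, over and over\n" * 200

compressed = zlib.compress(original)     # "zip"
restored = zlib.decompress(compressed)   # "unzip"

print(len(original), "bytes ->", len(compressed), "bytes")

# Nothing is lost: the restored bytes match the original exactly.
assert restored == original
```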
In the image world, PNG files also use lossless compression. This is why they offer a smaller file size for images with lots of uniform space: that redundant information is represented using instructions.
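To get a feel for why uniform areas shrink so well, here's a small experiment, again using Python's `zlib` (PNG uses the same DEFLATE method internally). The "pixels" below are just raw bytes I made up for the comparison, not a real image file.

```python
import os
import zlib

size = 10_000
uniform = bytes([255]) * size   # like a patch of pure white: one value repeated
noisy = os.urandom(size)        # random bytes: no redundancy to exploit

uniform_size = len(zlib.compress(uniform))
noisy_size = len(zlib.compress(noisy))

print("uniform:", uniform_size, "bytes")
print("noisy:  ", noisy_size, "bytes")

# The repetitive data compresses far better than the random data.
assert uniform_size < noisy_size
```

The random data barely shrinks at all, because there's no repetition for the compressor to describe with instructions.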
Of course, this is all an oversimplification, but it gets the basic point across. Read more about lossless compression on Wikipedia, if you’re interested.
Lossy Compression
Of course, there’s only so much you can accomplish using only lossless methods. Happily they’re not the only option: you can also simply remove information. This is called lossy compression, and it’s not as crazy as it sounds; in fact, you probably have many files on your computer made using lossy compression.
An MP3, for example. If you’re like most people your computer stores thousands of them for you, but did you know they don’t contain all of the audio information the original recording did? Some sounds, which humans cannot or can barely hear, are removed as part of the compression. The more you compress a file the more information is removed, which is why an overly compressed file will start to sound muddy.
Lossy compression is mostly used for media files – pictures, sound and video. Using lossy compression for a text file would be problematic, as the resulting information would be garbled. It’s not always necessary for media files to include all the information, however.
Another example of lossy compression is the JPEG image. Generally speaking images seen on the web do not need to be as high-quality as images intended for printing. As such, you can discard a lot of fine detail in a web image, even if the result would look awful printed.
Of course, repeatedly compressing a file using lossy methods decreases the quality – every time you do it more data is lost. Below is a photo I’ve compressed three times to demonstrate this:

You can see from left to right how the quality decreases. It may not matter, depending on what the image will be used for, and that’s why lossy compression exists.
It’s important to remember that files compressed using lossy methods actually lose data, meaning you cannot recreate the original file from one compressed using lossy methods. It’s obvious when you think about it, but many printing projects have been ruined for lack of understanding this key point.
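Here's a toy illustration of why there's no way back. It rounds a handful of made-up numbers to the nearest ten, which is a very rough stand-in for throwing away detail. The `quantize` function is hypothetical and is not how MP3 or JPEG actually work; it only shows the principle that discarded information can't be recovered.

```python
def quantize(samples, step):
    """Round each sample to the nearest multiple of `step`, discarding detail."""
    return [step * round(s / step) for s in samples]


original = [3, 14, 16, 92, 64, 37, 89, 78]   # made-up sample values
lossy = quantize(original, 10)

print(lossy)   # [0, 10, 20, 90, 60, 40, 90, 80]

# 92 and 89 both became 90, so nothing in the lossy data says which was which.
assert lossy != original
```

Fewer distinct values means less to store, but once 92 and 89 have both turned into 90, the original is gone for good.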
I’ve really only scratched the surface here, so please: read more about lossy compression on Wikipedia. It’s kind of fascinating.
Conclusion
Compression helped make the web what it is. In the days of dialup, photos would never have reached our browsers without compressed images, at least not at an acceptable speed. Compressed video makes sites like YouTube possible, and anyone who uses file sharing networks is familiar with ZIP and RAR files.
Do you have anything to add? I’m sure I’ve missed some key points so educate me (and the other readers) in the comments below.
Image Credit: Spring image via Shutterstock
27 Comments
Nice explanation :)
Is a jpeg photo “lossy” or “lossless” ?? You say: “Another example of lossless compression is the JPEG image.” but you have it under the “lossy” header.
I believe it’s a typo. JPEG is lossy.
JPEG is lossy (lower quality), PNG is lossless (maintains quality).
Oops! That’s a typo.
So – are you going to fix the typo in the article – or just leave it there to confuse future generations ???
Funny story: I don’t actually have the ability to edit articles. I’ve made the editors aware, so it should be fixed soon.
Thanks – Done.
That was a quite nice explanation. I did know that jpg files were compressed, but I thought that png files weren’t, due to their bigger size. This actually creates a very nice comparison between lossy and lossless compression: a file usually becomes 5 times smaller when converted from png to jpg!
Nice explanation. I’ll refer people to it instead of doing the work myself next time ;)
Good explanation, but the missing data in MP3s is very noticeable if you know what to listen for
Depends how much you compress it, but yes: you can hear it. You can see the compression in images and videos too, if you know what you’re looking for, but many decided the trade off is worth it.
Thanks for this! :D
Had to learn this in college! Thanks for the article.
there are six yellow bricks :))
I thought i knew this,but now i see i was wrong…thanks
Thank you! Finally a simple explanation, I like how you showed it with the LEGO.
I did not understand this in college.But now I do.
Nice explanation Justin, I enjoyed reading. Thanks
I use WinRar- any idea what is the default compression offered (lossy/loss-less)?
Winrar and software like it is completely lossless. It’s why the file is exactly the same as before once you extract it.
Thanks Justin.
“Another example of lossless compression is the JPEG image.” LOSSY.
Good Examples!
Clear explanation. Now I understand how those softwares give our files ‘magic slimming pills’.
Now is there any cloud storage that enable us to upload compressed folder, then choose just one file from that folder to download?
If it exists I don’t know about it. I’ll come back here if I find anything.
Clear as water. Thanks for illustration that make things more simple to understand.
Simplest definition i have ever seen.
Thanks bro