How Does File Compression Work? [MakeUseOf Explains]

We’ve all heard of file compression. Anyone who regularly downloads files from the web is familiar with formats like ZIP and RAR, and anyone who edits media files knows that compression is necessary to share images, music and videos on the web without using up all of your bandwidth. File compression is at the core of how the web works, you might argue, because it allows us to share files that would otherwise take too long to transfer. But how does it work?

It’s nothing magical, but it is the result of a lot of hard work by many very smart people. Let’s explore how file compression works by looking over the two main types of compression – lossless and lossy.

Just a warning – I’m going to oversimplify things here in an attempt to make this readable by non-math majors. Check out the linked-to Wikipedia articles for more depth, and Wikipedia’s sources for even more.

Lossless Compression

Lossless compression basically works by removing redundancy. What does that mean? Let’s simplify things. This stack of bricks will represent our data:

[Image: lossless before]

As you can see we’ve got two red bricks, five yellow and three blue. The simplest way to represent this is as you see above: the bricks themselves. But it’s not the only way I can represent this. I could also do this:

[Image: lossless after]

In the above image you can see the exact same information – two red, five yellow and three blue – but it takes up significantly less space. I’ve represented redundant bricks using numbers, meaning I need only three bricks to represent ten.

This gives you a rough idea of how lossless compression is possible. Information that’s redundant is replaced with instructions telling the computer how many times the identical data repeats. Another simplified example:

fffffffuuuuuuuuuuuu

Can be “compressed” to:

f7u12

This is only one method of lossless compression, of course, but it points to how this is possible. Other math tricks are used, but the main thing to remember about lossless compression is that while space is temporarily saved, it is possible to reconstruct the original file entirely from the compressed one. If you see three bricks with numbers you know exactly how to make the stack. No information is lost, just as the name lossless implies.
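The “f7u12” trick above has a name, run-length encoding, and it’s small enough to sketch in a few lines of Python. This is a toy version, not what ZIP or RAR actually do: the function names are my own, and it assumes the text being compressed contains no digits, since digits are used for the counts.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Replace each run of identical characters with the character plus its count."""
    return "".join(f"{char}{len(list(run))}" for char, run in groupby(text))


def rle_decode(encoded: str) -> str:
    """Rebuild the original text from character/count pairs."""
    result = []
    i = 0
    while i < len(encoded):
        char = encoded[i]
        i += 1
        digits = ""
        # Collect every digit that follows the character (counts can exceed 9).
        while i < len(encoded) and encoded[i].isdigit():
            digits += encoded[i]
            i += 1
        result.append(char * int(digits))
    return "".join(result)


original = "f" * 7 + "u" * 12   # seven f's followed by twelve u's
compressed = rle_encode(original)
print(compressed)                         # f7u12
print(rle_decode(compressed) == original)  # True
```

Notice that text without long runs gets bigger, not smaller: “abc” becomes “a1b1c1”. That’s one reason real tools rely on smarter methods than this.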

Programs like WinZip are based on lossless compression. They remove this redundant information when you compress (or “zip”) the file and restore it when you uncompress (or “unzip”). Nothing is lost.
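You can watch this round trip happen with Python’s built-in zlib module, which implements DEFLATE, the method commonly used inside ZIP files. This is only a sketch of the idea, not WinZip’s own code, and the sample text is made up.

```python
import zlib

# Highly repetitive sample data (made up for this example).
original = b"the same phrase, repeated over and over. " * 200

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), "bytes before")
print(len(compressed), "bytes after")
print(restored == original)  # True: nothing was lost
```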

In the image world, PNG files also use lossless compression. This is why they offer a smaller file size for images with lots of uniform space: that redundant information is represented using instructions.
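To see why uniform areas compress so well, here’s a rough experiment with the same DEFLATE method that PNG uses internally. It doesn’t build a real PNG (the format also filters the pixel data first); it just compares a run of identical “pixels” with a run of unpredictable ones. The pixel count and values are arbitrary choices of mine.

```python
import random
import zlib

PIXELS = 10_000

# A "flat" image region: every pixel has the same value.
uniform = bytes([200]) * PIXELS

# A "busy" region: every pixel is unpredictable (seeded so runs are repeatable).
rng = random.Random(0)
noisy = bytes(rng.randrange(256) for _ in range(PIXELS))

uniform_size = len(zlib.compress(uniform))
noisy_size = len(zlib.compress(noisy))

print("uniform region:", uniform_size, "bytes")
print("noisy region:  ", noisy_size, "bytes")
```

The flat region shrinks to a tiny fraction of its size, while the noisy one barely shrinks at all, because there’s no redundancy to remove.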

Of course, this is all an oversimplification, but it gets the basic point across. Read more about lossless compression on Wikipedia, if you’re interested.

Lossy Compression

Of course, there’s only so much you can accomplish using only lossless methods. Happily they’re not the only option: you can also simply remove information. This is called lossy compression, and it’s not as crazy as it sounds; in fact, you probably have many files on your computer made using lossy compression.

Take an MP3, for example. If you’re like most people, your computer stores thousands of them for you, but did you know they don’t contain all of the audio information the original recording did? Some sounds, which humans cannot or can barely hear, are removed as part of the compression. The more you compress a file, the more information is removed, which is why an overly compressed file will start to sound muddy.
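Real MP3 encoding is far more sophisticated than anything I can show here, but the “more compression, more loss” trade-off can be sketched with a much simpler trick: rounding every audio sample to a coarser and coarser grid. Everything below (the sample data, the function names, the step sizes) is made up for illustration.

```python
import math

# One cycle of a sine wave as 16 whole-number samples (made-up "audio").
samples = [round(1000 * math.sin(2 * math.pi * n / 16)) for n in range(16)]


def quantize(values, step):
    """Round every value to the nearest multiple of `step`, discarding fine detail."""
    return [round(v / step) * step for v in values]


def average_error(original, approx):
    """Average distance between each original sample and its rounded version."""
    return sum(abs(a - b) for a, b in zip(original, approx)) / len(original)


steps = (10, 100, 500)
errors = [average_error(samples, quantize(samples, step)) for step in steps]

for step, error in zip(steps, errors):
    print(f"step {step}: average error {error}")
```

A bigger step leaves fewer possible values to store, which saves space, but the rounded samples drift further from the originals.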

Lossy compression is mostly used for media files – pictures, sound and video. Using lossy compression for a text file would be problematic, as the resulting information would be garbled. It’s not always necessary for media files to include all the information, however.

Another example of lossy compression is the JPEG image. Generally speaking, images seen on the web do not need to be as high-quality as images intended for printing. As such, you can remove a lot of detail from a web image, even if the result would look awful printed.

Of course, repeatedly compressing a file using lossy methods decreases the quality – every time you do it more data is lost. Below is a photo I’ve compressed three times to demonstrate this:

[Image: a photo compressed three times]

You can see from left to right how the quality decreases. It may not matter, depending on what the image will be used for, and that’s why lossy compression exists.

It’s important to remember that files compressed using lossy methods actually lose data, meaning you cannot recreate the original file from one compressed using lossy methods. It’s obvious when you think about it, but many printing projects have been ruined for lack of understanding this key point.
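Here’s a toy illustration of why there’s no way back. It isn’t how JPEG works; it just rounds numbers to the nearest multiple of 50, with function names and “photos” I made up. Two different originals end up with identical compressed data, so nothing can tell you afterwards which one you started with.

```python
def lossy_compress(values, step=50):
    """Keep only which multiple of `step` each value is nearest to."""
    return [round(v / step) for v in values]


def decompress(codes, step=50):
    """Best-effort reconstruction: the fine detail is gone for good."""
    return [code * step for code in codes]


# Two slightly different "photos" (made-up pixel values).
photo_a = [101, 148, 203, 251]
photo_b = [99, 152, 198, 249]

print(lossy_compress(photo_a) == lossy_compress(photo_b))  # True
print(decompress(lossy_compress(photo_a)))                 # [100, 150, 200, 250]
```

The reconstructed values are close to both photos but match neither, which is exactly what happens when you try to “restore” a heavily compressed image for print.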

I’ve really only scratched the surface here, so please: read more about lossy compression on Wikipedia. It’s kind of fascinating.

Conclusion

Compression helped make the web what it is. In the days of dialup, photos could not have reached our browsers without compressed images, at least not at an acceptable speed. Compressed video makes sites like YouTube possible, and anyone who uses file sharing networks is familiar with ZIP and RAR files.

Do you have anything to add? I’m sure I’ve missed some key points so educate me (and the other readers) in the comments below.

Image Credit: Spring image via Shutterstock


27 Comments

Roman Vávra

Nice explanation :)

Mike Merritt

Is a jpeg photo “lossy” or “lossless” ?? You say: “Another example of lossless compression is the JPEG image.” but you have it under the “lossy” header.

Joel Lee

I believe it’s a typo. JPEG is lossy.

Florin Ardelian

JPEG is lossy (lower quality), PNG is lossless (maintains quality).

Justin Pot

Oops! That’s a typo.

Mike Merritt

So – are you going to fix the typo in the article – or just leave it there to confuse future generations ???

Justin Pot

Funny story: I don’t actually have the ability to edit articles. I’ve made the editors aware, so it should be fixed soon.

Mike Merritt

Thanks – Done.

Manuel Guillermo López Buenfil

That was a quite nice explanation. I did know that jpg files were compressed, but I thought that png files weren’t, due to their bigger size. This actually creates a very nice comparison between lossy and lossless compression: a file usually becomes 5 times smaller when converted from png to jpg!

Eric Wilborn

Nice explanation. I’ll refer people to it instead of doing the work myself next time ;)

Craig Friday

Good explanation, but the missing data in MP3s is very noticeable if you know what to listen for.

Justin Pot

Depends how much you compress it, but yes: you can hear it. You can see the compression in images and videos too, if you know what you’re looking for, but many have decided the trade-off is worth it.

Raj Sarkar

Thanks for this! :D

Harish Jonnalagadda

Had to learn this in college! Thanks for the article.

wickedbros

there are six yellow bricks :))

Igor Rizvi?

I thought I knew this, but now I see I was wrong… thanks

Alex Perkins

Thank you! Finally a simple explanation, I like how you showed it with the LEGO.

Jacob Mathew

I did not understand this in college. But now I do.

Sam Kar

Nice explanation Justin, I enjoyed reading. Thanks

I use WinRAR – any idea what the default compression offered is (lossy/lossless)?

Justin Pot

WinRAR and software like it are completely lossless. That’s why the file is exactly the same as before once you extract it.

Grr

Thanks Justin.

Doc

“Another example of lossless compression is the JPEG image.” LOSSY.

Kaashif Haja

Good Examples!

Lisa Santika Onggrid

Clear explanation. Now I understand how this software gives our files ‘magic slimming pills’.

Now, is there any cloud storage that lets us upload a compressed folder, then choose just one file from that folder to download?

Justin Pot

If it exists I don’t know about it. I’ll come back here if I find anything.

Douglas Mutay

Clear as water. Thanks for the illustrations that make things simpler to understand.

Haider-Bakhsh Janyaro

Simplest definition I have ever seen.
Thanks bro