We’ve all heard of file compression. Anyone who regularly downloads files from the web is familiar with formats like ZIP and RAR, and anyone who edits media files knows that compression is necessary to share images, music and videos on the web without using up all of your bandwidth. File compression is at the core of how the web works, you might argue, because it allows us to share files that would otherwise take too long to transfer. But how does it work?

It’s nothing magical, but it is the result of a lot of hard work by many very smart people. Let’s explore how file compression works by looking over the two main types of compression – lossless and lossy.

Just a warning – I’m going to oversimplify things here in an attempt to make this readable by non-math majors. Check out the linked-to Wikipedia articles for more depth, and Wikipedia’s sources for even more.

## Lossless Compression

Lossless compression basically works by removing redundancy. What does that mean? Let’s simplify things. This stack of bricks will represent our data:

As you can see we’ve got two red bricks, five yellow and three blue. The simplest way to represent this is as you see above: the bricks themselves. But it’s not the only way I can represent this. I could also do this:

In the above image you can see the exact same information – two red, five yellow and three blue – but it takes up significantly less space. I’ve represented redundant bricks using numbers, meaning I need only three bricks to represent ten.

This gives you a rough idea of how lossless compression is possible. Information that’s redundant is replaced with instructions telling the computer how many times identical data repeats. Another simplified example:

`fffffffuuuuuuuuuuuu`

Can be “compressed” to:

`f7u12`

This is only one method of lossless compression, of course, but it points to how this is possible. Other math tricks are used, but the main thing to remember about lossless compression is that while the compressed file takes up less space, it is still possible to reconstruct the original entirely from the compressed version. If you see three bricks with numbers you know exactly how to make the stack. No information is lost, just as the name lossless implies.
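The counting trick above is usually called run-length encoding. Here’s a minimal Python sketch of it, just to show that the round trip really does give back the original. The function names `rle_encode` and `rle_decode` are my own, and this toy version assumes the text contains no digits.

```python
import re
from itertools import groupby


def rle_encode(text: str) -> str:
    """Replace each run of identical characters with the character and its count."""
    return "".join(f"{char}{len(list(run))}" for char, run in groupby(text))


def rle_decode(encoded: str) -> str:
    """Rebuild the original text from character/count pairs (no digits allowed in the text)."""
    pairs = re.findall(r"(\D)(\d+)", encoded)
    return "".join(char * int(count) for char, count in pairs)


original = "f" * 7 + "u" * 12          # seven f's followed by twelve u's
encoded = rle_encode(original)

print(encoded)                          # f7u12
print(rle_decode(encoded) == original)  # True: nothing was lost
```

Real compressors are far smarter than this, but the principle is the same: describe the repetition instead of storing it.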

Programs like WinZip are based on lossless compression. They remove this redundant information when you compress (or “zip”) the file and restore it when you uncompress (or “unzip”). Nothing is lost.
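You can try the zip/unzip round trip yourself in a few lines of Python. This sketch uses the standard `zlib` module, which implements DEFLATE, the method most ZIP files rely on. I’m not claiming this is exactly what WinZip does internally, and the sample data is made up for illustration.

```python
import zlib

# Made-up, highly repetitive sample data.
original = b"yellow brick " * 500

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), "bytes before,", len(compressed), "bytes after")
print(restored == original)  # True: the restored data is byte-for-byte identical
```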

In the image world, PNG files also use lossless compression. This is why they offer a smaller file size for images with lots of uniform space: that redundant information is represented using instructions.
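To see why uniform areas help, here’s a rough sketch that compresses two made-up rows of “pixels” with `zlib` (PNG’s compression is built on the same DEFLATE algorithm). This is not a real PNG encoder, only an illustration: one row is a single flat color, the other is pseudo-random noise.

```python
import random
import zlib

size = 10_000

# A flat, uniform "image": every pixel has the same value.
uniform = bytes([255]) * size

# A noisy "image": pseudo-random pixel values (seeded so runs are repeatable).
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(size))

uniform_size = len(zlib.compress(uniform))
noisy_size = len(zlib.compress(noisy))

print("uniform:", uniform_size, "bytes")
print("noisy:  ", noisy_size, "bytes")
```

The uniform data shrinks to a tiny fraction of its size, while the noise barely compresses at all, because there’s no redundancy to remove.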

Of course, this is all an oversimplification, but it gets the basic point across. Read more about lossless compression on Wikipedia, if you’re interested.

## Lossy Compression

Of course, there’s only so much you can accomplish using only lossless methods. Happily they’re not the only option: you can also simply remove information. This is called lossy compression, and it’s not as crazy as it sounds; in fact, you probably have many files on your computer made using lossy compression.

An MP3, for example. If you’re like most people your computer stores thousands of them for you, but did you know they don’t contain all of the audio information the original recording did? Some sounds, which humans cannot or can barely hear, are removed as part of the compression. The more you compress a file the more information is removed, which is why an overly compressed file will start to sound muddy.
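Real MP3 encoders rely on models of human hearing that are well beyond this article, so treat the following as a loose analogy only. This sketch throws away detail by rounding made-up “audio samples” to coarser and coarser steps. The `quantize` and `total_error` helpers are hypothetical names of my own.

```python
def quantize(samples, step):
    """Round every sample to the nearest multiple of `step`, discarding finer detail."""
    return [step * round(sample / step) for sample in samples]


def total_error(original, approximated):
    """Sum of the absolute differences between the original and the approximation."""
    return sum(abs(a - b) for a, b in zip(original, approximated))


samples = [3, 14, 15, 92, 65, 35, 89, 79]  # made-up sample values

light = quantize(samples, 4)    # mild compression: small rounding steps
heavy = quantize(samples, 32)   # aggressive compression: large rounding steps

print(light, "error:", total_error(samples, light))
print(heavy, "error:", total_error(samples, heavy))
```

Bigger steps mean fewer distinct values to store, but the result drifts further from the original, which is the same trade-off that makes an overly compressed file sound muddy.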

Lossy compression tends to mostly be used for media files – pictures, sound and video. Using lossy compression for a text file would be problematic, as the resulting information would be garbled. It’s not always necessary for media files to include all the information, however.

Another example of lossy compression is the JPEG image. Generally speaking, images seen on the web do not need to be as high-quality as images intended for printing. As such, you can remove a lot of fine detail from a web image, even if the result would look awful printed.

Of course, repeatedly compressing a file using lossy methods decreases the quality – every time you do it more data is lost. Below is a photo I’ve compressed three times to demonstrate this:

You can see from left to right how the quality decreases. It may not matter, depending on what the image will be used for, and that’s why lossy compression exists.

It’s important to remember that files compressed using lossy methods actually lose data, meaning you cannot recreate the original file from one compressed using lossy methods. It’s obvious when you think about it, but many printing projects have been ruined for lack of understanding this key point.
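Here’s a small sketch of why there’s no way back. It uses a deliberately crude lossy scheme of my own invention (keep every second value, throw the rest away) on made-up data. Two different originals end up as the same compressed data, so nothing can tell them apart afterwards.

```python
def downsample(samples):
    """Crude lossy 'compression': keep every second sample and discard the rest."""
    return samples[::2]


def upsample(kept):
    """Best-effort reconstruction: repeat each kept sample twice."""
    return [value for value in kept for _ in range(2)]


original_a = [10, 12, 20, 18, 30, 33]
original_b = [10, 99, 20, 0, 30, 7]

# Two different originals collapse to the same compressed data...
print(downsample(original_a) == downsample(original_b))  # True

# ...so the reconstruction is only a guess, not the original.
print(upsample(downsample(original_a)))  # [10, 10, 20, 20, 30, 30]
```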

I’ve really only scratched the surface here, so please: read more about lossy compression on Wikipedia. It’s kind of fascinating.

## Conclusion

Compression helped make the web what it is. In the days of dialup, photos would never have reached our browsers without compressed images, at least not at an acceptable speed. Compressed video makes sites like YouTube possible, and anyone who uses file sharing networks is familiar with ZIP and RAR files.

Do you have anything to add? I’m sure I’ve missed some key points so educate me (and the other readers) in the comments below.

Image Credit: Spring image via Shutterstock

1. L
July 2, 2017 at 12:58 am

2. nat
April 18, 2017 at 1:32 pm

LOVED the Article

3. Vineetha
February 25, 2017 at 7:50 am

Superb

4. 6Block
February 3, 2017 at 1:45 pm

There's 6 Yellow blocks not 5.

• Zack milner
April 28, 2017 at 1:40 pm

Shut up

• definitely6
May 25, 2017 at 2:25 am

good catch

• Joshua
July 19, 2017 at 10:25 pm

Exactly my observation!

5. Carol Farmer
January 27, 2017 at 11:26 pm

Ridiculous! No help at all! Just a lot of blah blah blah!

• hckkkker6969
February 3, 2017 at 2:31 pm

this is why nobody wants to be your friend carrot

• ye xd
June 1, 2017 at 3:48 am

lol carrot nice

January 12, 2017 at 10:32 pm

Very nice explanation

January 12, 2017 at 10:31 pm

Very nice way of explaining

8. Martin Pereda
August 27, 2016 at 5:57 pm

What's up, I didn't understand it, help please

• hello_world
December 11, 2016 at 5:49 am

puto

• Joshua
July 19, 2017 at 10:25 pm

Did you really need all this? Nobody is going to understand it anyway

9. Di
August 17, 2016 at 2:52 pm

Thanks for the article! Loved it. Ehmm... one comment on lossless compression: two red, five yellow and three blue is LOSSY since it lost one yellow brick (There are 6!) tc! ;)

10. Anonymous
August 9, 2016 at 5:49 am

I know some compression works like this:
it will find common strings (example: 01001100 [Yes I know the compressors don't see it as binary just let me explain]) or what it finds to be common, and will change that into say a 7 or some unicode character or basically something that the software/compression software recognizes and when the file is being rebuilt will change that 7 back into the example string. So in theory using lossless compression methods that do this, could you keep compressing a file using those different methods, aka: file > file.zip > file.zip.rar > file.zip.rar.7z > file.zip.rar.7z.tar and etc (I don't know if those ones are lossless or not but I'm just making an example.) would the file get smaller every time? Of course it would, right? So basically I'm asking is that possible, one of the main reasons I had this question is because sometimes I download files that are .tar.gz so does that mean they compressed the .tar into a .gz? Or am I just stupid. (Hopefully not that one).

-Sorry it's a long question, just something that's been bothering me.

• C. Lupus
August 9, 2016 at 10:12 pm

So depending on the size of that header, your compressed file could end up bigger than the original file if compressed too many times.

It all depends on the algorithms used.

• anon
March 30, 2017 at 7:00 pm

I have repeatedly compressed files with lossless methods and the file stops getting any smaller. That tells me I have reached the limit for that method (lossless). Another method might find more compression, but it is clearly less and less effective.

11. Magicrafter13
August 9, 2016 at 5:37 am

I know some compression works like this:
it will find common strings (example: 01001100 [Yes I know the compressors don't see it as binary just let me explain]) or what it finds to be common, and will change that into say a 7 or some unicode character or basically something that the software/compression software recognizes and when the file is being rebuilt will change that 7 back into the example string. So in theory using lossless compression methods that do this, could you keep compressing a file using those different methods, aka: file > file.zip > file.zip.rar > file.zip.rar.7z > file.zip.rar.7z.tar and etc (I don't know if those ones are lossless or not but I'm just making an example.) would the file get smaller every time? Of course it would, right? So basically I'm asking is that possible, one of the main reasons I had this question is because sometimes I download files that are .tar.gz so does that mean they compressed the .tar into a .gz? Or am I just stupid. (Hopefully not that one).

-Sorry it's a long question, just something that's been bothering me.

• Joshua
July 19, 2017 at 10:35 pm

I am glad it's bothering you :)

12. Magicrafter13
August 9, 2016 at 5:30 am

This isn't really a question about the article, but I know this much: some (maybe all, but I think just most) compression methods involve finding common strings. What I mean is, if a certain block of code is really common, they find those and replace them with a smaller block, possibly even one character that the compression software recognizes, so that when the file is reconstructed it replaces that block/single character with whatever it's representing. I know that explanation might not make sense, so tell me if it doesn't. Anyway, my question is: theoretically, if you compressed a file with, say, .zip, then compressed the .zip into a .rar, then compressed that into a .7z, and so on into other formats that use the method I just mentioned, would each compression help make the file smaller? (We are only talking about lossless compressors, because obviously lossy would be smaller since, as you said yourself, it loses data.)

-Sorry for this long question, it's just something that's been on my mind. the main reason is sometimes I download files that are like .tar.gz or something so I assume the .tar was compressed using .gz

October 23, 2016 at 4:06 am

Actually, tar files don't offer any compression. The gz does all the work of compressing it. Tar just bundles the files together as is, with no compression, and gzip then compresses that bundle.

• Joshua
July 19, 2017 at 10:35 pm

I am glad you apologized for the long question :)

13. L Brock
August 1, 2016 at 1:16 am

Good simple explanation.

14. Ashwath
July 24, 2016 at 10:44 pm

Beautifully Written! Loved the way you described it!

15. Arun
July 7, 2016 at 9:27 am

Very Simple.Awesome......
thanx...

16. pearl
July 5, 2016 at 12:12 pm

Very clear, helpful and interesting. Thanks a lot

• superjim1000JimjIM
July 28, 2016 at 2:41 am

You stop that

• Bebazled_chipmonkey
July 28, 2016 at 2:54 am

Ding dong, phr33 b8 plox

wow! how could I ever eat all this b8!

• Joshua
July 19, 2017 at 10:34 pm

I am glad you ate all this :)

• Joshua
July 19, 2017 at 10:34 pm

17. puppy0cam
June 3, 2016 at 4:19 am

what about using division to make the character amount go down by getting the data into the zeroes and ones and then you use division on it to bring it to a smaller value.

18. Jason
June 1, 2016 at 8:19 pm

Great examples, I was always curious how ZIP files could compress so much data.

May 21, 2016 at 12:26 pm

Straight and to the point. Really enjoyed while reading the article.

• Joshua
July 19, 2017 at 10:34 pm

I am glad you enjoyed :)

20. Stuart
April 7, 2016 at 11:02 pm

Simple but brilliant. You've made the complicated sound simple. Not an easy thing to do! Well done.

21. Anthony
March 17, 2016 at 8:11 pm

An excellent explanation. Thanks!

• Justin Pot
March 22, 2016 at 2:07 pm

22. ankit sarode
February 6, 2016 at 5:01 pm

nice explanation :)

• Justin Pot
February 6, 2016 at 5:14 pm

I'm glad it was useful for you!

23. Anonymous
February 2, 2016 at 9:34 pm

thanks man. great explanation.

24. Steve Yeldon
January 25, 2016 at 3:14 pm

There are 6 yellow bricks.

• Justin Pot
January 25, 2016 at 3:26 pm

Compression errors happen.

• James
March 9, 2016 at 11:23 am

"I see what you did there."

• Varun
April 8, 2016 at 8:11 pm

well said lol

• Anonymous
February 2, 2016 at 9:35 pm

true.

• Joshua
July 19, 2017 at 10:33 pm

I am glad you found it to be truthful :)

25. Mary
January 10, 2016 at 6:58 pm

Thank you for the simplicity (compression?) in this information. I am using it for an entirely different application!

• foobar
March 26, 2016 at 11:56 pm

Funny stuff m8

• Joshua
July 19, 2017 at 10:28 pm

I am glad you find it funny

26. Nachos ho
December 6, 2015 at 8:52 am

I have to say,thanks, humour is the line. Needing to cross it is apparent from what I read but i cannot add two more than enough of the same thing blah blah. F7U12 beast

• Justin Pot
December 6, 2015 at 4:24 pm

Umm...thanks? I think? I'm not sure what you're trying to say but I'm glad you stopped by.

• aneesh joshi
March 6, 2016 at 1:43 pm

i think he used lossy compression on that...

• Joshua
July 19, 2017 at 10:28 pm

I am glad you thought I used lossy compression :)

27. Pooya
November 16, 2015 at 10:11 pm

awesome explanation,
Of course, repeatedly compressing a file using lossless methods decreases the quality.
its about lossy compressing not lossless.

• Justin Pot
November 16, 2015 at 11:22 pm

Three years later with a great correction, thanks so much!

28. Anonymous
August 31, 2015 at 10:02 am

Wonderful explanation.
Before doing Compression on my site i used to upload compressed images that i used to compress from http://CompressPic.com. Now i have used Gzip Compression script that does compression on runtime. How that works, here i found my answer. It is really awesome.
Good Job.

• Joshua
July 19, 2017 at 10:29 pm

I am glad you found it awesome :)

29. Anonymous
July 20, 2015 at 6:26 pm

Awesome explanation bro. Thank you. :)

30. Haider-Bakhsh Janyaro
November 20, 2012 at 4:19 am

Simplest definition i have ever seen.
Thanks bro

31. Douglas Mutay
October 31, 2012 at 11:07 am

Clear as water. Thanks for illustration that make things more simple to understand.

32. Lisa Santika Onggrid
October 14, 2012 at 11:28 am

Clear explanation. Now I understand how those softwares give our files 'magic slimming pills'.

Now is there any cloud storage that enable us to upload compressed folder, then choose just one file from that folder to download?

• Justin Pot
October 14, 2012 at 1:48 pm

If it exists I don't know about it. I'll come back here if I find anything.

• puppy0cam
June 3, 2016 at 4:22 am

Doing that would mean uncompressing the entire thing, so what you're asking is to upload a compressed file and then have the cloud decompress it and store the contents.

33. Kaashif Haja
October 14, 2012 at 2:31 am

Good Examples!

34. Doc
October 13, 2012 at 9:08 pm

"Another example of lossless compression is the JPEG image." LOSSY.

35. Sam Kar
October 13, 2012 at 12:58 pm

Nice explanation Justin, I enjoyed reading. Thanks

I use WinRar- any idea what is the default compression offered (lossy/loss-less)?

• Justin Pot
October 13, 2012 at 1:50 pm

Winrar and software like it is completely lossless. It's why the file is exactly the same as before once you extract it.

• Grr
October 13, 2012 at 2:56 pm

Thanks Justin.

36. Jacob Mathew
October 13, 2012 at 6:01 am

I did not understand this in college.But now I do.

• Joshua
July 19, 2017 at 10:29 pm

I am glad you do :)

37. Alex Perkins
October 12, 2012 at 5:47 pm

Thank you! Finally a simple explanation, I like how you showed it with the LEGO.

38. Igor Rizvi?
October 12, 2012 at 1:30 pm

I thought i knew this,but now i see i was wrong...thanks

• Joshua
July 19, 2017 at 10:32 pm

I am glad you're wrong :)

39. wickedbros
October 12, 2012 at 1:16 pm

there are six yellow bricks :))

• Joshua
July 19, 2017 at 10:30 pm

I am glad you found it :)

October 12, 2012 at 12:15 pm

Had to learn this in college! Thanks for the article.

41. Raj Sarkar
October 12, 2012 at 9:39 am

Thanks for this! :D

42. Craig Friday
October 12, 2012 at 5:13 am

Good explanation, but the missing data in MP3s is very noticeable if you know what to listen for

• Justin Pot
October 13, 2012 at 1:51 pm

Depends how much you compress it, but yes: you can hear it. You can see the compression in images and videos too, if you know what you're looking for, but many decided the trade off is worth it.

43. Eric Wilborn
October 12, 2012 at 2:54 am

Nice explanation. I'll refer people to it instead of doing the work myself next time ;)

• Joshua
July 19, 2017 at 10:30 pm

I am glad you'll refer to this :)

44. Manuel Guillermo López Buenfil
October 11, 2012 at 11:59 pm

That was a quite nice explanation. I did know that jpg files were compressed, but I thought that png files weren't, due to their bigger size. This actually creates a very nice comparison between lossy and lossless compression: a file usually becomes 5 times smaller when converted from png to jpg!

45. Mike Merritt
October 11, 2012 at 8:45 pm

Is a jpeg photo "lossy" or "lossless" ?? You say: "Another example of lossless compression is the JPEG image." but you have it under the "lossy" header.

• Joel Lee
October 11, 2012 at 9:00 pm

I believe it's a typo. JPEG is lossy.

• Florin Ardelian
October 11, 2012 at 9:12 pm

JPEG is lossy (lower quality), PNG is lossless (maintains quality).

• Justin Pot
October 13, 2012 at 1:52 pm

Oops! That's a typo.

• Mike Merritt
October 15, 2012 at 9:00 pm

So - are you going to fix the typo in the article - or just leave it there to confuse future generations ???

• Justin Pot
October 16, 2012 at 6:16 pm

Funny story: I don't actually have the ability to edit articles. I've made the editors aware, so it should be fixed soon.

• Mike Merritt
October 19, 2012 at 3:32 pm

Thanks - Done.

• Joshua
July 19, 2017 at 10:31 pm

I am glad you found it :)

46. Roman Vávra
October 11, 2012 at 8:23 pm

Nice explanation :)