We’ve all heard of file compression. Anyone who regularly downloads files from the web is familiar with formats like ZIP and RAR, and anyone who edits media files knows that compression is necessary to share images, music and videos on the web without using up all of your bandwidth. File compression is at the core of how the web works, you might argue, because it allows us to share files that would otherwise take too long to transfer. But how does it work?

It’s nothing magical, but it is the result of a lot of hard work by many very smart people. Let’s explore how file compression works by looking over the two main types of compression – lossless and lossy.

Just a warning – I’m going to oversimplify things here in an attempt to make this readable by non-math majors. Check out the linked-to Wikipedia articles for more depth, and Wikipedia’s sources for even more.

Lossless Compression

Lossless compression basically works by removing redundancy. What does that mean? Let’s simplify things. This stack of bricks will represent our data:

[Image: a stack of red, yellow and blue bricks]

As you can see we’ve got two red bricks, five yellow and three blue. The simplest way to represent this is as you see above: the bricks themselves. But it’s not the only way I can represent this. I could also do this:

[Image: three bricks, one of each colour, each labelled with a number]

In the above image you can see the exact same information – two red, five yellow and three blue – but it takes up significantly less space. I’ve represented redundant bricks using numbers, meaning I need only three bricks to represent ten.

This gives you a rough idea how lossless compression is possible. Information that’s redundant is replaced with instructions telling the computer how much identical data repeats. Another simplified example:

fffffffuuuuuuuuuuuu

Can be “compressed” to:

f7u12

This is only one method of lossless compression, of course, but it points to how this is possible. Other math tricks are used, but the main thing to remember about lossless compression is that although the compressed file takes up less space, it is possible to reconstruct the original file entirely from it. If you see three bricks with numbers you know exactly how to make the stack. No information is lost, just as the name lossless implies.
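The f7u12 trick above is usually called run-length encoding. Here is a minimal Python sketch of it, just to show the round trip. The function names are my own, and it assumes the text being compressed contains no digits (otherwise the counts would be ambiguous).

```python
import re
from itertools import groupby


def rle_encode(text: str) -> str:
    """Replace each run of a repeated character with the character and the run length."""
    return "".join(f"{char}{len(list(run))}" for char, run in groupby(text))


def rle_decode(encoded: str) -> str:
    """Rebuild the original text from character/count pairs such as 'f7u12'."""
    pairs = re.findall(r"(\D)(\d+)", encoded)
    return "".join(char * int(count) for char, count in pairs)


original = "f" * 7 + "u" * 12          # seven f's followed by twelve u's
compressed = rle_encode(original)
restored = rle_decode(compressed)

print(compressed)             # f7u12
print(restored == original)   # True: nothing was lost
```

Note that a string with no repeats, like "abc", would come out longer ("a1b1c1"), which is why real programs use smarter methods than this one.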

Programs like WinZip are based on lossless compression. They remove this redundant information when you compress (or “zip”) the file and restore it when you uncompress (or “unzip”). Nothing is lost.

In the image world, PNG files also use lossless compression. This is why they offer a smaller file size for images with lots of uniform space: that redundant information is represented using instructions.
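To see how much redundancy matters, here is a rough Python sketch using the standard zlib module, which implements DEFLATE, the method used inside PNG and, typically, ZIP files. The two byte strings are stand-ins I made up: one for a large area of a single colour, one for data with no repetition at all.

```python
import os
import zlib

uniform = b"\x00" * 100_000    # stand-in for a big patch of one colour
noisy = os.urandom(100_000)    # stand-in for data with no repeating patterns

uniform_packed = zlib.compress(uniform)
noisy_packed = zlib.compress(noisy)

# The uniform data shrinks to a tiny fraction; the random data barely changes.
print(len(uniform_packed), len(noisy_packed))

# Lossless: decompressing gives back exactly the original bytes.
assert zlib.decompress(uniform_packed) == uniform
assert zlib.decompress(noisy_packed) == noisy
```

The exact sizes depend on the zlib version and settings, but the gap between the two should be enormous either way.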

Of course, this is all an oversimplification, but it gets the basic point across. Read more about lossless compression on Wikipedia, if you’re interested.

Lossy Compression

Of course, there’s only so much you can accomplish using only lossless methods. Happily they’re not the only option: you can also simply remove information. This is called lossy compression, and it’s not as crazy as it sounds; in fact, you probably have many files on your computer made using lossy compression.

An MP3, for example. If you’re like most people your computer stores thousands of them for you, but did you know they don’t contain all of the audio information the original recording did? Some sounds, which humans cannot or can barely hear, are removed as part of the compression. The more you compress a file the more information is removed, which is why an overly compressed file will start to sound muddy.

Lossy compression is mostly used for media files – pictures, sound and video. Using lossy compression for a text file would be problematic, as the resulting information would be garbled. It’s not always necessary for media files to include all the information, however.

Another example of lossy compression is the JPEG image. Generally speaking images seen on the web do not need to be as high-quality as images intended for printing. As such, you can remove a lot of fine detail from a web image, even if the result would look awful printed.

Of course, repeatedly compressing a file using lossy methods decreases the quality – every time you do it more data is lost. Below is a photo I’ve compressed three times to demonstrate this:

[Image: the same photo compressed three times, with quality decreasing from left to right]

You can see from left to right how the quality decreases. It may not matter, depending on what the image will be used for, and that’s why lossy compression exists.

It’s important to remember that files compressed using lossy methods actually lose data, meaning you cannot recreate the original file from one compressed using lossy methods. It’s obvious when you think about it, but many printing projects have been ruined for lack of understanding this key point.
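Here is a toy Python sketch of that idea. It is not how MP3 or JPEG actually work; it simply rounds each number to the nearest multiple of a step size, which is one crude way to throw detail away. The sample values, step sizes and function names are all invented for illustration.

```python
def lossy_compress(samples: list[int], step: int) -> list[int]:
    """Store only the nearest multiple of `step` for each sample (as a small code)."""
    return [(s + step // 2) // step for s in samples]


def decompress(codes: list[int], step: int) -> list[int]:
    """Turn the codes back into sample values; the rounded-off detail is gone."""
    return [c * step for c in codes]


def max_error(a: list[int], b: list[int]) -> int:
    """Largest difference between corresponding samples."""
    return max(abs(x - y) for x, y in zip(a, b))


original = [12, 13, 15, 18, 40, 41, 43, 200]

codes = lossy_compress(original, step=10)
first_gen = decompress(codes, step=10)

# Compress the already-compressed data again, this time more aggressively.
second_gen = decompress(lossy_compress(first_gen, step=25), step=25)

print(codes)       # [1, 1, 2, 2, 4, 4, 4, 20]
print(first_gen)   # [10, 10, 20, 20, 40, 40, 40, 200]
print(second_gen)  # [0, 0, 25, 25, 50, 50, 50, 200]
print(max_error(original, first_gen), max_error(original, second_gen))  # 5 13
```

The codes are smaller and more repetitive than the original numbers, so they take less space, but there is no way to get 12, 13, 15 and 18 back from 10, 10, 20 and 20.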

I’ve really only scratched the surface here, so please: read more about lossy compression on Wikipedia. It’s kind of fascinating.

Conclusion

Compression helped make the web what it is. In the days of dialup, photos would never have reached our browsers without compressed images, at least not at an acceptable speed. Compressed video makes sites like YouTube possible, and anyone who uses file sharing networks is familiar with ZIP and RAR files.

Do you have anything to add? I’m sure I’ve missed some key points so educate me (and the other readers) in the comments below.

Image Credit: Spring image via Shutterstock

  1. Martin Pereda
    August 27, 2016 at 5:57 pm

    What's up, I didn't understand it, help please

  2. Di
    August 17, 2016 at 2:52 pm

    Thanks for the article! Loved it. Ehmm... one comment on lossless compression: two red, five yellow and three blue is LOSSY since it lost one yellow brick (There are 6!) tc! ;)

  3. Magicrafter13 Gaming
    August 9, 2016 at 5:49 am

    I know some compression works like this:
    it will find common strings (example: 01001100 [Yes I know the compressors don't see it as binary just let me explain]) or what it finds to be common, and will change that into say a 7 or some unicode character or basically something that the software/compression software recognizes and when the file is being rebuilt will change that 7 back into the example string. So in theory using lossless compression methods that do this, could you keep compressing a file using those different methods, aka: file > file.zip > file.zip.rar > file.zip.rar.7z > file.zip.rar.7z.tar and etc (I don't know if those ones are lossless or not but I'm just making an example.) would the file get smaller every time? Of course it would, right? So basically I'm asking is that possible, one of the main reasons I had this question is because sometimes I download files that are .tar.gz so does that mean they compressed the .tar into a .gz? Or am I just stupid. (Hopefully not that one).

    -Sorry it's a long question, just something that's been bothering me.

    • C. Lupus
      August 9, 2016 at 10:12 pm

      What you're talking about doesn't put headers and metadata into the equation. Compression is basically a tradeoff so everytime you compress a file you are adding a certain amount of metadata to it.

      So depending on the size of that header, your compressed file could end up bigger than the original file if compressed too many times.

      It all depends on the algorithms used.

  5. Magicrafter13
    August 9, 2016 at 5:30 am

    This isn't really a question about the article but I know this much some (maybe all but I think just most) compression methods involve finding common strings, what I mean is if a certain block of code is really common, anyway they find those and replace them with a smaller block, possibily even one character that the compression software recognizes so that when it is reconstructed it replaces that block/single character with whatever it's representing. I know that explanation might not make sense so tell me if it doesn't, but anyway my question is, theoretically if you compressed a file with say .zip, then compressed the .zip into a .rar, then compressed that into a .7z and so on into other different formats that use the method I just mentioned above would each compression help make the file smaller (and we are only talking about Lossless compressors because obviously lossy would be smaller since you said it yourself, it loses data.)

    -Sorry for this long question, it's just something that's been on my mind. the main reason is sometimes I download files that are like .tar.gz or something so I assume the .tar was compressed using .gz

    • tomysshadow
      October 23, 2016 at 4:06 am

      Actually, tar files don't offer any compression. The gz does all the work of compressing it. Tar just bundles the files together as is, with no compression, and gzip then compresses that bundle.

  6. L Brock
    August 1, 2016 at 1:16 am

    Good simple explanation.

  7. Ashwath
    July 24, 2016 at 10:44 pm

    Beautifully Written! Loved the way you described it!

  8. Arun
    July 7, 2016 at 9:27 am

    Very Simple.Awesome......
    thanx...

  9. pearl
    July 5, 2016 at 12:12 pm

    Very clear, helpful and interesting. Thanks a lot

    • superjim1000JimjIM
      July 28, 2016 at 2:41 am

      You stop that

      • Bebazled_chipmonkey
        July 28, 2016 at 2:54 am

        Ding dong, phr33 b8 plox

        wow! how could I ever eat all this b8!

        Also R8 me thread

  10. puppy0cam
    June 3, 2016 at 4:19 am

    what about using division to make the character amount go down by getting the data into the zeroes and ones and then you use division on it to bring it to a smaller value.

  11. Jason
    June 1, 2016 at 8:19 pm

    Great examples, I was always curious how ZIP files could compress so much data.

  12. Mohammad Sharaf Ali
    May 21, 2016 at 12:26 pm

    Straight and to the point. Really enjoyed while reading the article.

  13. Stuart
    April 7, 2016 at 11:02 pm

    Simple but brilliant. You've made the complicated sound simple. Not an easy thing to do! Well done.

  14. Anthony
    March 17, 2016 at 8:11 pm

    An excellent explanation. Thanks!

    • Justin Pot
      March 22, 2016 at 2:07 pm

      I'm glad it was useful!

  15. ankit sarode
    February 6, 2016 at 5:01 pm

    nice explanation :)

    • Justin Pot
      February 6, 2016 at 5:14 pm

      I'm glad it was useful for you!

  16. Grayson Cash
    February 2, 2016 at 9:34 pm

    thanks man. great explanation.

  17. Steve Yeldon
    January 25, 2016 at 3:14 pm

    There are 6 yellow bricks.

    • Justin Pot
      January 25, 2016 at 3:26 pm

      Compression errors happen.

      • James
        March 9, 2016 at 11:23 am

        "I see what you did there."

      • Varun
        April 8, 2016 at 8:11 pm

        well said lol

    • Grayson Cash
      February 2, 2016 at 9:35 pm

      true.

  18. Mary
    January 10, 2016 at 6:58 pm

    Thank you for the simplicity (compression?) in this information. I am using it for an entirely different application!

    • foobar
      March 26, 2016 at 11:56 pm

      Funny stuff m8

  19. Nachos ho
    December 6, 2015 at 8:52 am

    I have to say,thanks, humour is the line. Needing to cross it is apparent from what I read but i cannot add two more than enough of the same thing blah blah. F7U12 beast

    • Justin Pot
      December 6, 2015 at 4:24 pm

      Umm...thanks? I think? I'm not sure what you're trying to say but I'm glad you stopped by.

      • aneesh joshi
        March 6, 2016 at 1:43 pm

        i think he used lossy compression on that...

  20. Pooya
    November 16, 2015 at 10:11 pm

    awesome explanation,
    please correct this part :
    Of course, repeatedly compressing a file using lossless methods decreases the quality.
    its about lossy compressing not lossless.

    • Justin Pot
      November 16, 2015 at 11:22 pm

      Three years later with a great correction, thanks so much!

  21. Parvaiz Ahmad
    August 31, 2015 at 10:02 am

    Wonderful explanation.
    Before doing Compression on my site i used to upload compressed images that i used to compress from http://CompressPic.com. Now i have used Gzip Compression script that does compression on runtime. How that works, here i found my answer. It is really awesome.
    Good Job.

  22. achu unnikrishnan
    July 20, 2015 at 6:26 pm

    Awesome explanation bro. Thank you. :)

  23. Haider-Bakhsh Janyaro
    November 20, 2012 at 4:19 am

    Simplest definition i have ever seen.
    Thanks bro

  24. Douglas Mutay
    October 31, 2012 at 11:07 am

    Clear as water. Thanks for illustration that make things more simple to understand.

  25. Lisa Santika Onggrid
    October 14, 2012 at 11:28 am

    Clear explanation. Now I understand how those softwares give our files 'magic slimming pills'.

    Now is there any cloud storage that enable us to upload compressed folder, then choose just one file from that folder to download?

    • Justin Pot
      October 14, 2012 at 1:48 pm

      If it exists I don't know about it. I'll come back here if I find anything.

    • puppy0cam
      June 3, 2016 at 4:22 am

      Doing that would mean uncompressing the entire thing, so what you're asking is to upload a compressed file and have the cloud decompress it and then store the contents.

  26. Kaashif Haja
    October 14, 2012 at 2:31 am

    Good Examples!

  27. Doc
    October 13, 2012 at 9:08 pm

    "Another example of lossless compression is the JPEG image." LOSSY.

  28. Sam Kar
    October 13, 2012 at 12:58 pm

    Nice explanation Justin, I enjoyed reading. Thanks

    I use WinRar- any idea what is the default compression offered (lossy/loss-less)?

    • Justin Pot
      October 13, 2012 at 1:50 pm

      Winrar and software like it is completely lossless. It's why the file is exactly the same as before once you extract it.

      • Grr
        October 13, 2012 at 2:56 pm

        Thanks Justin.

  29. Jacob Mathew
    October 13, 2012 at 6:01 am

    I did not understand this in college.But now I do.

  30. Alex Perkins
    October 12, 2012 at 5:47 pm

    Thank you! Finally a simple explanation, I like how you showed it with the LEGO.

  31. Igor Rizvi?
    October 12, 2012 at 1:30 pm

    I thought i knew this,but now i see i was wrong...thanks

  32. wickedbros
    October 12, 2012 at 1:16 pm

    there are six yellow bricks :))

  33. Harish Jonnalagadda
    October 12, 2012 at 12:15 pm

    Had to learn this in college! Thanks for the article.

  34. Raj Sarkar
    October 12, 2012 at 9:39 am

    Thanks for this! :D

  35. Craig Friday
    October 12, 2012 at 5:13 am

    Good explanation, but the missing data in MP3s is very noticeable if you know what to listen for

    • Justin Pot
      October 13, 2012 at 1:51 pm

      Depends how much you compress it, but yes: you can hear it. You can see the compression in images and videos too, if you know what you're looking for, but many decided the trade off is worth it.

  36. Eric Wilborn
    October 12, 2012 at 2:54 am

    Nice explanation. I'll refer people to it instead of doing the work myself next time ;)

  37. Manuel Guillermo López Buenfil
    October 11, 2012 at 11:59 pm

    That was a quite nice explanation. I did know that jpg files were compressed, but I thought that png files weren't, due to their bigger size. This actually creates a very nice comparison between lossy and lossless compression: a file usually becomes 5 times smaller when converted from png to jpg!

  38. Mike Merritt
    October 11, 2012 at 8:45 pm

    Is a jpeg photo "lossy" or "lossless" ?? You say: "Another example of lossless compression is the JPEG image." but you have it under the "lossy" header.

    • Joel Lee
      October 11, 2012 at 9:00 pm

      I believe it's a typo. JPEG is lossy.

    • Florin Ardelian
      October 11, 2012 at 9:12 pm

      JPEG is lossy (lower quality), PNG is lossless (maintains quality).

    • Justin Pot
      October 13, 2012 at 1:52 pm

      Oops! That's a typo.

      • Mike Merritt
        October 15, 2012 at 9:00 pm

        So - are you going to fix the typo in the article - or just leave it there to confuse future generations ???

        • Justin Pot
          October 16, 2012 at 6:16 pm

          Funny story: I don't actually have the ability to edit articles. I've made the editors aware, so it should be fixed soon.

        • Mike Merritt
          October 19, 2012 at 3:32 pm

          Thanks - Done.

  39. Roman Vávra
    October 11, 2012 at 8:23 pm

    Nice explanation :)
