What is File Compression?

You can squash computer files to take up less space

File compression is the practice of writing (saving) files so that they take up less space on the storage medium than would otherwise be the case (whether the storage medium is a hard drive, USB flash drive, cloud storage or whatever).

How does it work?

Let’s take one simple example. Suppose that the file contains 12 consecutive occurrences of the letter “A” – ie “AAAAAAAAAAAA”. This “string” of 12 characters would normally require 12 bytes of storage space (one for each character). Now suppose that, instead of saving this as one byte per character, we save it as “12A”. In that case, we would have reduced the storage requirement to three bytes (or two, if the count 12 is stored as a single number rather than as the two characters “1” and “2”). This simple scheme is known as “run-length encoding”. It is a simplification, but I think it’s good enough to give an idea of what we mean by compression. For an accessible, but slightly fuller, explanation of file compression, see the How Stuff Works article on the subject.
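To make the “12A” idea concrete, here is a minimal Python sketch of run-length encoding. It is an illustration rather than how any real tool stores files: the function names `rle_encode` and `rle_decode` are my own, and because the counts are written as ordinary digit characters, it assumes the text itself contains no digits.

```python
import re


def rle_encode(text: str) -> str:
    """Run-length encode text, e.g. "AAAB" -> "3A1B".

    Assumption: the text contains no digits, because digits are used for the counts.
    """
    if any(ch.isdigit() for ch in text):
        raise ValueError("this simple scheme cannot encode digits")
    out = []
    i = 0
    while i < len(text):
        run_char = text[i]
        run_len = 1
        # Count how many times the same character repeats in a row.
        while i + run_len < len(text) and text[i + run_len] == run_char:
            run_len += 1
        out.append(f"{run_len}{run_char}")
        i += run_len
    return "".join(out)


def rle_decode(encoded: str) -> str:
    """Reverse rle_encode, e.g. "12A" -> "AAAAAAAAAAAA"."""
    # Each run is one or more digits followed by a single non-digit character.
    return "".join(ch * int(count) for count, ch in re.findall(r"(\d+)(\D)", encoded))


original = "AAAAAAAAAAAA"
packed = rle_encode(original)
print(packed)                      # 12A
print(len(original), len(packed))  # 12 3
assert rle_decode(packed) == original
```

Note that this naive scheme can backfire: text with no repeats, such as “ABC”, becomes “1A1B1C”, which is twice as long. Real compressors use cleverer schemes, but exploiting repetition is a big part of how they work too.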

Lossless Compression

If the file to be compressed is, for instance, a program file, then it is essential that, when it is decompressed and run, every piece of information in it is exactly the same as it was before compression. Compression that restores all of the original information in this way is known as “lossless compression”. As it happens, many program files are already “optimised” for size, so you may not gain much by compressing them.
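Lossless compression is easy to try for yourself. This Python sketch uses the standard `zlib` module, which implements DEFLATE, the method most zip tools use by default. The sample text is just made-up repetitive data standing in for a real file, and the final check confirms that decompressing gives back every byte exactly.

```python
import zlib

# Made-up sample data with plenty of repetition (a stand-in for any file).
original = b"File compression saves space. " * 200

compressed = zlib.compress(original, 9)  # level 9: smallest output, slowest
restored = zlib.decompress(compressed)

print(len(original), "->", len(compressed), "bytes")

# Lossless: every byte comes back exactly as it was.
assert restored == original
```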

Lossy Compression

Imagine a photograph that has a large area that is all one colour. We have already seen that, using compression, we do not have to record the colour of every dot (pixel). Instead, we can just record the number of pixels and the colour (plus the location of the pixels in the picture, of course). Now imagine that the same area of the photograph is not one colour but a mixture of three shades so close to each other that the human eye can barely (if at all) see the difference. If we treat all three shades as a single colour, we can make the file smaller than if we record the three shades separately.

So, the compression is higher (ie the resulting file is smaller), but we have lost some of the information in the file: the different shades that we think we don’t need. This is known as “lossy compression”. We have compressed the file more than lossless compression could, but we’ve lost some information in the process. In practice, when saving a file in a lossy format, we are often given the option of deciding where to set the trade-off between the size and the quality of the saved image.
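Here is a toy Python model of the lossy idea, assuming a single row of greyscale brightness values (0 to 255) rather than a real photograph; it is not how JPEG actually works. The hypothetical `step` parameter plays the part of the quality setting: `quantize` snaps each value to the nearest multiple of `step`, so near-identical shades merge, and `run_lengths` then stores the repeats compactly, as in the “12A” example.

```python
def quantize(pixels, step):
    """Lossy step: snap each brightness value to the nearest multiple of `step`.

    Shades closer together than `step` tend to collapse into one value.
    """
    return [round(p / step) * step for p in pixels]


def run_lengths(pixels):
    """Lossless step: collapse repeated neighbouring values into [count, value] pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][1] == p:
            runs[-1][0] += 1
        else:
            runs.append([1, p])
    return runs


# A hypothetical row of 12 pixels: two patches, each made of three barely different shades.
row = [200, 201, 199, 200, 208, 209, 207, 208, 200, 201, 199, 200]

for step in (1, 4, 20):
    approx = quantize(row, step)
    error = max(abs(a - b) for a, b in zip(row, approx))
    print(f"step={step}: runs={len(run_lengths(approx))}, max error={error}")
```

With these made-up values, step 1 keeps all 12 runs with no error, step 4 gives 3 runs with every pixel at most 1 shade out, and step 20 gives a single run but some pixels are now 9 shades out: a smaller file at the cost of quality.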

Other Costs of Compression

As well as potentially losing some information in the file, there are other costs to compressing files, such as:

  • Time costs – it takes time to compress a file as it is being saved, and more time to decompress it ready for use
  • Flexibility – if a file has been compressed using a particular program or technique, then that technique has to be available to decompress it
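Both costs can be seen in a short Python sketch using the standard `zlib` and `bz2` modules. The sample data is made up, and the timings will vary from machine to machine, so none are shown here. The second half shows the flexibility problem: data compressed with bz2 cannot be read by zlib’s decompressor, only by bz2’s.

```python
import bz2
import time
import zlib

# Made-up sample data (about 1.85 MB of repetitive text).
data = b"Some fairly repetitive example text. " * 50_000

# Time cost: compressing and decompressing both take measurable time.
start = time.perf_counter()
packed = zlib.compress(data, 9)
middle = time.perf_counter()
zlib.decompress(packed)
end = time.perf_counter()
print(f"compress: {middle - start:.4f} s, decompress: {end - middle:.4f} s")

# Flexibility cost: the decompressor has to match the compressor.
bz2_packed = bz2.compress(data)
try:
    zlib.decompress(bz2_packed)
except zlib.error as exc:
    print("zlib cannot read bz2 data:", exc)

# The matching tool restores the data without trouble.
assert bz2.decompress(bz2_packed) == data
```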

So, how do you decide?

There’s a trade-off between the size of the file on the one hand and, on the other hand, the costs of compressing it: loss of information, time and flexibility. This gets even more complicated when you consider that there might be a time benefit in compressing a file (eg before uploading it to a cloud storage account, since a smaller file uploads faster), but also a time cost in performing the compression (eg manually compressing a file into a format such as a “zip file”).
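The upload example can be worked through with some simple arithmetic. In this Python sketch, all the figures (file size, upload speed, compression speed and how much the file shrinks) are hypothetical, and it ignores the time needed to decompress the file later.

```python
def seconds_without_compression(size_mb: float, upload_mb_s: float) -> float:
    """Time to upload the file as it is."""
    return size_mb / upload_mb_s


def seconds_with_compression(size_mb: float, upload_mb_s: float,
                             ratio: float, compress_mb_s: float) -> float:
    """Time to compress the file first, then upload the smaller result.

    ratio = compressed size / original size (0.5 means half the size).
    """
    compress_time = size_mb / compress_mb_s
    upload_time = size_mb * ratio / upload_mb_s
    return compress_time + upload_time


# Hypothetical figures: a 100 MB file, a 1 MB/s upload link,
# and a compressor that runs at 50 MB/s and halves the file size.
print(seconds_without_compression(100, 1))        # 100.0
print(seconds_with_compression(100, 1, 0.5, 50))  # 52.0
```

With these made-up numbers, compressing first wins comfortably. But if the file barely shrinks (a `ratio` close to 1, as with a jpg), the compression time is simply added on top, and you gain nothing or even lose a little.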

Clearly, there’s not room or time to go into all the details and implications here. I’m just attempting to give some idea of what file compression means and the factors that might come into play when deciding whether or not to compress a file.

However, I’ll mention one type of compressed file because it is ubiquitous in computing – the “jpg” (or JPEG) file, pronounced “jay peg”. This is the most common format for saving photographs. It is a compressed format and, more than that, a “lossy” one. Whenever you save an image in “jpg” format, the saving process makes the file smaller by discarding some of the information, and it does this again every time you save. So, if you edit a jpg file (eg by cropping it or removing red-eye), save it and close it, then open it, edit it some more, save it, edit it again... you can gradually reduce the quality of the image. If you have to work in jpg format (because you don’t know any other way), then try to do all your editing in one session.

There is not much point in trying to compress a jpg further, for instance by creating a zip file of it: because the image data is already compressed, zipping it will usually save very little space. Of course, you may wish to zip jpg files so that you can put many files into one zip file, but that’s a matter for another day…
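A quick Python experiment shows why. To a general-purpose compressor, the data inside a jpg already looks close to random, so this sketch uses random bytes from `os.urandom` as a stand-in (an assumption, not a real image) and compares them with very repetitive text, using the standard `zlib` module.

```python
import os
import zlib

repetitive = b"AAAAAAAAAAAA" * 10_000  # 120,000 bytes of the letter A
random_data = os.urandom(120_000)       # stand-in for already-compressed data

for name, data in (("repetitive text", repetitive), ("random bytes", random_data)):
    packed = zlib.compress(data, 9)
    print(f"{name}: {len(data)} -> {len(packed)} bytes")
```

On a typical run, the repetitive text shrinks dramatically, while the random bytes come out slightly larger than they went in, because the compressor adds a little bookkeeping and has no patterns to exploit.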

See this previous blog post for more on digital image file formats.