Lesson 5: Lossy vs. Lossless Compression
Students learn the difference between lossy and lossless compression by experimenting with a simple lossy compression widget for compressing text. Students then research three real-world compressed file formats to fill in a research guide. Throughout the process they review the skills and strategies used to research computer science topics online, in particular to cope with situations when they don't have the background to fully understand everything they're reading (a common situation even for experienced CS students).
The first goal of this lesson is straightforward: understand what lossy compression is, how it differs from lossless compression, and when and why each might be used. Students should see a number of examples of this distinction throughout the lesson and should leave able to describe the relative benefits of each.
The second goal of this lesson is to build up students' research skills both for the project they will complete in the next lesson and for the Explore PT at the end of the year. Students will need practice finding reliable sources, reading technical articles, and synthesizing information. The teacher's role in calling out the skills being used, not merely the facts being found, is significant.
Getting Started (15 mins)
Activity (40 mins)
Wrap-up (5 mins)
Students will be able to:
- Explain the difference between lossy and lossless compression.
- Explain the relative benefits or drawbacks of different file formats, particularly in terms of how they compress information.
- Identify reliable sources of information when doing research.
- Explain the difference between open source and licensed software.
- Prepare print or digital copies of File Formats and Compression Activity Guide
For the Teacher
- KEY - File Formats and Compression - Answer Key
For the Students
- Lossless Compression - a data compression algorithm that allows the original data to be perfectly reconstructed from the compressed data.
- Lossy Compression - (or irreversible compression) a data compression method that uses inexact approximations, discarding some data to represent the content. Most commonly seen in image formats like .jpg.
Getting Started (15 mins)
Goal: As you circulate, first make sure that all groups try typing their own text into the app. They should see that the app keeps the first character of every word but removes all vowels from the rest of the word.
During the share-out, aim to hear from multiple groups. If you ask leading questions, you might direct the conversation towards the following points.
- The text is certainly "smaller" than before so at least it might be compression.
- This is different from what they saw with the text compression widget though. Some information is being entirely thrown away and if you only had the compressed text there's no way to get back the original text.
- That said, you can usually still read the text. The meaning hasn't been lost even if some parts of the original message could never be perfectly recovered.
- The word "lossy" probably has something to do with the fact that some letters (information) are entirely lost here.
Your goal in this discussion isn't for students to arrive at a definition. You're aiming for all students to see that this type of compression is different from what they've seen before, and to prepare them for the remarks that follow.
Quick Discovery: Lossy Text Compression
Prompt: With a partner, go to the Lossy Text Compression App - App Lab. Use it for a couple minutes, discuss with your partner what's happening, and then answer the following questions.
- Should this “count” as text compression? Why or why not?
- What do you think the word “lossy” refers to?
Discuss: Give pairs of students a couple of minutes to explore the app. Then ask them to discuss what they are seeing. Finally, ask them to write down brief responses to the two prompts. Once all pairs have had a chance to write responses, have a few pairs share their opinions with the class.
Lossless vs. Lossy Compression
This is an example of compression since the total number of bits needed to represent the message is reduced. As we've seen, though, it's different from the compression we saw with the text compression widget.
The text compression widget uses "lossless" compression. This means that it is possible to reverse the compression and recover the original information (message) in its entirety. This is what the dictionary was for.
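The idea of a perfect round trip can be sketched in a few lines of Python. This is only an illustration, not how the text compression widget works: it swaps in Python's standard zlib module, which uses a different lossless technique, and the sample message is made up.

```python
import zlib

# Hypothetical sample message with lots of repetition, chosen for illustration.
message = ("pitter patter " * 20).encode("utf-8")

compressed = zlib.compress(message)      # lossless compression
restored = zlib.decompress(compressed)   # reverse the compression

print(len(message), "bytes before,", len(compressed), "bytes after")
print("Perfectly recovered:", restored == message)
```

The comparison at the end is what makes the scheme lossless: every byte of the original comes back.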
The text compression we just saw is "lossy" compression. This means that it isn't possible to perfectly reverse the compression and recover the original message. Some of it is lost forever. If you saw the word “fd” it could be “food”, “feed”, “feud”, or “fad”. You might be able to guess what it was supposed to be, but there’s no algorithm that will always give you the original message.
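The behavior described above can be sketched in Python. This is a guess at the rule, not the app's actual code: the function name is made up, and it assumes that only a, e, i, o, and u count as vowels and that words are separated by single spaces.

```python
# Assumption: the app treats only these letters as vowels.
VOWELS = set("aeiouAEIOU")

def lossy_compress(text: str) -> str:
    """Keep each word's first character, then drop vowels from the rest."""
    compressed_words = []
    for word in text.split(" "):
        if not word:
            # Preserve empty pieces produced by repeated spaces.
            compressed_words.append(word)
            continue
        rest = "".join(ch for ch in word[1:] if ch not in VOWELS)
        compressed_words.append(word[0] + rest)
    return " ".join(compressed_words)

# Four different words collapse to the same compressed text,
# so no program could know which one to restore.
for word in ["food", "feed", "feud", "fad"]:
    print(word, "->", lossy_compress(word))
```

Because several inputs map to the same output, a matching decompress function cannot be written; that is the sense in which information is lost.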
Vocabulary: Display these definitions
- Lossless Compression: a data compression algorithm that allows the original data to be perfectly reconstructed from the compressed data.
- Lossy Compression: (or irreversible compression) a data compression method that uses inexact approximations, discarding some data to represent the content. Most commonly seen in image formats like .jpg.
Goal: This prompt serves two roles. First, it gives students a chance to practice using the vocabulary you just introduced. Use their responses to judge whether students understand the distinction between the two types of compression. If necessary, review the two before continuing to the main activity.
Second, it prompts students to begin thinking critically about the tradeoffs between the two compression types, which they will be doing for the remainder of the lesson.
Prompt: Have students briefly discuss the prompt below with a neighbor.
- When you use lossy compression you lose the ability to decompress your information and get back a perfect copy. Even so, people use lossy compression all the time. Can you think of reasons or situations where someone would still use lossy compression?
Discuss: Have pairs briefly discuss with one another. Circulate and listen to ideas as they're discussed. Then have a couple groups share with the whole room.
Transition: We’ve been looking at image file formats. And we’ve also seen text compression. Both of those attempted to render perfectly every piece of information.
Both the image file format and the text compression scheme we used were lossless. Lossy compression schemes usually take advantage of the fact that a human is supposed to interpret the data at the other end, and human brains are good at filling the gaps when information is missing.
Activity (40 mins)
Formats in Everyday Life: The file extension you often see on a file (for example: myPhoto.jpg) is really just an indicator to the computer of how the underlying bits are organized, so the computer can interpret them. If you change the name of the file to myPhoto.gif, that does not magically change the underlying bits; all you’ve done is mislabel them. A program that trusts the extension will try to interpret the file as a GIF when really the bits are in JPG format, and it may fail to open the file.
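The point that a format lives in the bits rather than in the file name can be sketched in Python. JPG, GIF, and PNG files each begin with a well-known byte sequence, so a program can check those bytes instead of trusting the extension. The function name and the sample bytes below are made up for illustration; a real program would read the bytes from the file itself.

```python
# Well-known byte sequences that appear at the start of each format.
SIGNATURES = {
    "JPG": [b"\xff\xd8\xff"],
    "GIF": [b"GIF87a", b"GIF89a"],
    "PNG": [b"\x89PNG\r\n\x1a\n"],
}

def sniff_format(data: bytes) -> str:
    """Guess a file's format from its first bytes, ignoring its name."""
    for name, prefixes in SIGNATURES.items():
        if any(data.startswith(prefix) for prefix in prefixes):
            return name
    return "unknown"

# Hypothetical stand-in for the first bytes of myPhoto.jpg.
photo_bytes = b"\xff\xd8\xff\xe0" + b"\x00" * 16

# Renaming the file to myPhoto.gif would not change these bytes,
# so the check still reports JPG.
print(sniff_format(photo_bytes))
```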
Group: Place students in pairs. Each group will only need a single computer.
Distribute: File Formats and Compression - Activity Guide to each group.
Rapid Research - Compression Formats
JPG: Ask all groups to research the answers to the first column in the activity guide together. In particular remind them to keep track of their sources of information.
Share: Have pairs share their answers with another pair. If there are disagreements have them discuss with one another the sources of their information and try to arrive at a shared solution.
Discuss: Share the solution to the first column as a class. Resolve any discrepancies. Afterwards briefly acknowledge the following points.
- JPG uses lossy compression. Many images on the web are converted to JPG. Odds are if you've seen a grainy or pixelated image on the Internet, it's a JPG.
- Lossy compression like this results in a lower quality image with fewer details, but as humans we can still tell what's in the picture. JPG makes a lot of sense especially when you don't need a high quality image and bandwidth is limited.
- This is a free and open format, but even so there have been disputes about its ownership.
Researching CS Topics: When researching a computer science topic (e.g. as students will have to do on the Explore PT) it is important to remember that this is a unique and separate skill. It often requires students to read text that they don't completely understand or which uses advanced vocabulary. At other times they'll need to consult sources typically considered unreliable, like online forums or Wikipedia. Part of the goal of this lesson is calling out the fact that this is a separate skill, practicing its individual components, and preparing students to do it more independently in the following lesson.
Record Research Strategies: Throughout this activity consider writing and recording strategies that students are using in order to conduct their research. This could be on a poster, the board, or some kind of shared digital notes. Students are going to have another chance to research formats in the following lesson, so an important goal in this lesson is highlighting how students are succeeding at doing this research, not just the information they're finding.
Prompt: Let's think for a minute about how you are doing your research. What kinds of sources are you finding? How are you actually reading these sources?
Discuss: Have students briefly share their thoughts at their tables before discussing as a class. Use the discussion to lead into the comments below.
Conducting research online about CS topics is a skill. Often it means hunting through articles you don't completely understand for the key pieces of information that you do. Other subjects like history or math may have classic agreed upon texts but in the world of CS you'll often end up on Wikipedia or online forums. This is ok. We're going to keep working on this skill of reading technical articles. If you sometimes are confused that's ok and entirely normal. There is no one on earth who understands everything about CS with how large a subject it is and how quickly it's changing. When you're doing research as a computer scientist it usually means sticking with it even if a lot of the content doesn't make sense at first.
MP3 and PNG: Have pairs fill in the MP3 and PNG columns of the table.
Share: Once both columns have been filled out have pairs share their results across the table again.
Discuss: Share out the results of the research with the whole class, reviewing the results.
Wrap-up (5 mins)
Goal: This discussion is primarily a way to check that students have understood the difference between lossy and lossless compression. Lossy compression is fine if you just need a "good enough" version of something and care about saving space or bandwidth. For example, if you don't want to use up the data plan on your phone, you would choose images that use lossy compression, since they are smaller to download.
Prompt: Lossy compression seems to be "worse" than lossless compression but obviously both are being used all the time. Write down three reasons or situations where someone would be willing to use lossy compression even though it means some loss of quality.
Discuss: Students should silently list their responses, then share with a neighbor, then finally discuss with the entire class. Have multiple pairs or tables share their responses.
Today we accomplished a lot. We explored the difference between lossy and lossless compression, we practiced reading and researching a CS topic, and we learned a little bit about the complicated world of file formats! Next time we'll close out this unit by digging a little deeper on all of these topics.
When to use lossy compression?
In general, what are reasons that someone would choose to use lossy compression? Write a brief response below including at least one example of a situation when lossy compression would be appropriate.
CSTA K-12 Computer Science Standards (2011)
CD - Computers & Communication Devices
- CD.L2:4 - Use developmentally appropriate, accurate terminology when communicating about technology.
CL - Collaboration
- CL.L2:3 - Collaborate with peers, experts and others using collaborative practices such as pair programming, working in project teams and participating in group active learning activities.
CT - Computational Thinking
- CT.L2:7 - Represent data in a variety of ways including text, sounds, pictures and numbers.
- CT.L3A:6 - Analyze the representation and trade-offs among various forms of digital information.
Computer Science Principles
3.3 - There are trade offs when representing information as digital data.
3.3.1 - Analyze how data representation, storage, security, and transmission of data involve computational manipulation of information. [P4]
- 3.3.1A - Digital data representations involve trade offs related to storage, security, and privacy concerns.
- 3.3.1C - There are trade offs in using lossy and lossless compression techniques for storing and transmitting data.
- 3.3.1D - Lossless data compression reduces the number of bits stored or transmitted but allows complete reconstruction of the original data
- 3.3.1E - Lossy data compression can significantly reduce the number of bits stored or transmitted at the cost of being able to reconstruct only an approximation of the original data.
- 3.3.1G - Data is stored in many formats depending on its characteristics (e.g., size and intended use)
7.3 - Computing has a global effect -- both beneficial and harmful -- on people and society.
7.3.1 - Analyze the beneficial and harmful effects of computing. [P4]
- 7.3.1F - Open source and licensing of software and content raise legal and ethical concerns.
- 7.3.1Q - Open source and free software have practical, business, and ethical impacts on widespread access to programs, libraries, and code.
7.5 - An investigative process is aided by effective organization and selection of resources. Appropriate technologies and tools facilitate the accessing of information and enable the ability to evaluate the credibility of sources.
7.5.2 - Evaluate online and print sources for appropriateness and credibility [P5]
- 7.5.2A - Determining the credibility of a source requires considering and evaluating the reputation and credentials of the author(s), publisher(s), site owner(s), and/or sponsor(s).
- 7.5.2B - Information from a source is considered relevant when it supports an appropriate claim or the purpose of the investigation.
CSTA K-12 Computer Science Standards (2017)
AP - Algorithms & Programming
- 3A-AP-20 - Evaluate licenses that limit or restrict use of computational artifacts when using resources such as libraries.
DA - Data & Analysis
- 2-DA-09 - Refine computational models based on the data they have generated.
- 3A-DA-09 - Translate between different bit representations of real-world phenomena, such as characters, numbers, and images.