Lesson 11: Structuring Data
In this lesson, students go further into the collection and interpretation of data, including cleaning and visualizing data. Students first look at how presenting data in different ways can help people understand it better, and they then create visualizations of their own data. Using the results of a preferred pizza topping survey, students must decide what to do with data that does not easily fit into the visualization scheme that they have chosen. Finally, students look at which parts of this process can be automated by a computer and which need a human to make decisions.
This lesson demonstrates that raw data must be interpreted in some way to help people use it to make decisions. Students engage in both visualization and cleaning of data, and they see how data can be misinterpreted if it is not cleaned properly. Students also experience working with data by hand and with computational tools, and they see how data must be structured in particular ways to be used by a computer.
Identify and remove irrelevant data from a data set.
As students clean their data in the digital activity, circulate and ask them about the choices that they are making. You may also use the discussion afterwards as a time for them to explain what data they identified as irrelevant.
Create a bar chart based on a set of data.
Activity Guide: The bar chart should be filled in. Answers may vary slightly, but should overall be approximately the same as in the exemplar.
Explain why a set of data must be cleaned before a computer can use it.
Activity Guide: Students should identify data that needs to be cleaned and explain why it is problematic in its raw form. You can also use the discussion afterwards to prompt students to give a more explicit explanation of why it is necessary.
Warm Up (5 mins)
Visualizing Data (70 mins)
Wrap Up (15 mins)
Students will be able to:
- Identify and remove irrelevant data from a data set.
- Create a bar chart based on a set of data.
- Explain why a set of data must be cleaned before a computer can use it.
Heads Up! Please make a copy of any documents you plan to share with students.
For the Teachers
- Structuring Data - Exemplar
- Pizza Data (csv download) - Optional Resource
- Pizza Data (GSheets) - Optional Resource
For the Students
- Structuring Data - Activity Guide
Warm Up (5 mins)
It's possible to have students look at this level themselves, but they will be off the computer for the next activity, so it may be easier to display the level on the board.
Prompt: Show Code Studio Level 2 at the front of the room.
Ask students to think for themselves for a moment, then discuss their answers with a partner.
Students should understand that different forms of data make it easier for people to make decisions. They should also see that people often do best with visuals, such as the bar chart, while computers do better with numbers, such as the table.
Discuss: Have students share out their answers to the questions on the board.
Sometimes the "raw" data, the way the information is first collected, needs to be put in a different form so that humans and computers can more easily understand what it means.
Visualizing Data (70 mins)
Group students into pairs and give each pair a copy of the activity guide.
Read the instructions together as a class, ensuring that students understand the problem that they are trying to solve (choosing a pizza topping for the pizza party).
Students should see that there are several ways that answers might be difficult to categorize, whether they are completely irrelevant, not specific enough, or not a given choice. Ignore spelling for now if kids don't bring it up.
Students are asked to create the bar chart for the set of raw data given. Some of the answers will not easily fall into the given choices. Encourage students to use their best judgment on the answers that are difficult to put into the chart, and remind them that these challenges are a normal part of the data problem-solving process.
Discuss: After students finish making the chart and filling out the reflection questions, have students share their answers with the class.
You can also complete this activity using Google Sheets or Excel. The relevant spreadsheet files are linked in Level 3 of the online lesson or in the resource links area of this lesson plan.
We've made this chart by hand, but it's also possible for the computer to make it for us. This is especially useful when you have lots of data.
Send students to Lesson 11, Puzzle 3, and ask them to follow the instructions on the level.
Students should note that the computer used all the answers in the chart, even ones that were irrelevant. They should also note that different spellings of the same choice were not grouped together.
Prompt: Ask students to discuss in pairs why the chart looks the way it does, then share their answers with the class. Why wasn't the computer able to put everything into the correct category?
When we created our charts, we knew that we needed to leave off some of the answers that didn't make sense, and that some answers, such as "peppers" and "green peppers", actually meant the same thing. We also put everything that had been misspelled into the correct category. Computers don't know how to do this, because they don't actually understand what a "pepper" is, or that a misspelled word is the same as a correctly spelled word. That means that we have to clean the data before the computer is able to use it.
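For teachers who want to show this point in code, here is a minimal Python sketch. It is not the code behind the Code Studio app, and the answers in `raw_answers` are invented for illustration rather than taken from the lesson's survey. It tallies write-in answers exactly as typed, which is roughly what happens when a computer charts uncleaned data.

```python
from collections import Counter

# Hypothetical raw write-in answers (not the lesson's actual survey data).
raw_answers = [
    "peppers", "green peppers", "pepppers",
    "cheese", "Cheese", "I will be absent",
]

# Counter treats every distinct string as its own category, so spelling
# variants and irrelevant answers each get a separate count.
raw_counts = Counter(raw_answers)

# Print a simple text bar chart, one row per distinct answer.
for answer, count in raw_counts.items():
    print(f"{answer:<18}{'#' * count}")
```

Each of the six strings ends up with its own bar, even though a person would read three of them as "peppers" and would leave "I will be absent" off the chart.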
Tell students that they will create a new column of "clean" data that will be easier for the computer to interpret.
Send students on to Puzzle 4.
Model: Click on a topping in the "Cleaned Data" list and edit or delete it. Demonstrate that when you delete or change answers in the clean data column, the chart automatically changes.
Ask students to finish in pairs, cleaning the data until only the seven original choices are shown, then decide which pizza topping is the best choice.
Students should notice that some data, such as minor misspellings, could be easily "cleaned" into an appropriate category, but that other data, such as "I will be absent", needed to be removed from the set entirely.
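The same two kinds of decisions, correcting an answer or removing it, can be sketched in Python as follows. This is only an illustration under stated assumptions: the `CLEANING_RULES` table and the sample answers are hypothetical, and a person still has to write every rule, which is the human part of the process.

```python
from collections import Counter

# Hypothetical lookup table written by a person: raw answer -> clean category.
# A value of None marks an answer that should be removed entirely.
CLEANING_RULES = {
    "peppers": "peppers",
    "green peppers": "peppers",
    "pepppers": "peppers",
    "cheese": "cheese",
    "i will be absent": None,
}

def clean_answers(raw_answers):
    """Return cleaned answers; drop any marked None or missing from the table."""
    cleaned = []
    for answer in raw_answers:
        # Ignore capitalization and stray spaces before looking up the rule.
        category = CLEANING_RULES.get(answer.strip().lower())
        if category is not None:
            cleaned.append(category)
    return cleaned

# Hypothetical sample data for illustration only.
raw_answers = ["peppers", "green peppers", "pepppers",
               "cheese", "Cheese ", "I will be absent"]
clean_counts = Counter(clean_answers(raw_answers))
```

Note that this sketch silently drops any answer that has no rule. In a real survey you might prefer to list those answers so a person can decide what to do with them.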
Prompt: What changes did you need to make to the data? Were there any answers that you needed to throw away completely? Why?
In the end, students should realize that constraining a user's choices by using multiple choice rather than a write-in answer makes it easier for a computer to use the data.
Prompt: This was a lot of work, and it was only about fifty votes. How much time do you think it would take to clean the data for a nationwide survey? Can you think of any ways to make sure that we got clean data from the beginning, to save us all of this work?
Allow students to discuss in pairs, then share out with the class.
When we work with large amounts of data, we want to automate as much of the problem-solving process as we can. Because computers can't make the same connections that people can, people have to help organize data in a way that computers can understand. That means either cleaning the data, or collecting data in a way that makes sure it's clean when we get it.
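Collecting data so that it is clean from the start can be sketched in Python as a vote recorder that only accepts answers from a fixed list, much like a multiple choice question. The names `ALLOWED_CHOICES` and `record_vote` and the toppings shown are assumptions made for this example; they are not the lesson's seven choices.

```python
# Hypothetical list of allowed choices (not the lesson's actual toppings).
ALLOWED_CHOICES = ["cheese", "peppers", "mushrooms"]

def record_vote(votes, choice):
    """Add choice to votes only if it is one of the allowed choices.

    Returns True if the vote was recorded and False if it was rejected.
    """
    if choice not in ALLOWED_CHOICES:
        return False
    votes.append(choice)
    return True

votes = []
accepted = record_vote(votes, "peppers")    # an allowed choice
rejected = record_vote(votes, "pepppers")   # a misspelling is not allowed
```

Because the misspelled answer is rejected when it is entered, the `votes` list never needs to be cleaned afterwards.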
Wrap Up (15 mins)
Prompt: Have students reflect on their development of the five practices of CS Discoveries (Problem Solving, Persistence, Creativity, Collaboration, Communication). Choose one of the following prompts as you deem appropriate.
Choose one of the five practices in which you believe you demonstrated growth in this lesson. Write something you did that exemplified this practice.
Choose one practice you think you can continue to grow in. What’s one thing you’d like to do better?
Choose one practice you thought was especially important for the activity we completed today. What made it so important?
Best Class Pet
Here are three different ways to show the results of a vote for best class pet.
Which one makes it easiest for a human to make a decision about which pet is the most popular?
Which one makes it easiest for a computer to make a decision?
The pizza party data has been put into an app for you, and the answers from another class have been added. Because this is an app, we can automate the creation of the bar chart from the given data.
- Click "Run" to see the list of answers that the classes have given.
- Discuss with a partner what you think the chart of this data will look like.
- Click "Show Chart" to see for yourself.
The pizza party data has also been put into a spreadsheet for you, if you would like to use it instead. To use the spreadsheet, you'll need to make your own copy of it.
When people work with data, they know to leave off answers that don't make sense, and they know that some answers, such as "peppers", "pepppers" and "green peppers", actually mean the same thing. Computers don't know how to do this, so we have to clean the data before the computer is able to use it.
This version of the app has a second column for data to be cleaned.
- Click "Run" to see the new list of answers to be cleaned.
- Click on each answer that needs to be cleaned and correct it so that the computer will chart it properly. (You may want to delete some answers entirely.)
- When you are finished, click "Show Chart" to see the new chart of cleaned answers.
CSTA K-12 Computer Science Standards (2017)
DA - Data & Analysis
- 2-DA-08 - Collect data using computational tools and transform the data to make it more useful and reliable.