Unit 2 - Digital Information (Last update: May 2016)
Chapter 2: Manipulating and Visualizing Data
- What is the relationship between data, information and knowledge?
- What are the best ways to find, see, and extract meaningful trends and patterns from raw data?
- Where and how does human bias affect the collection, processing and interpretation of data?
- 1.3 Computing can extend traditional forms of human expression and experience.
- 3.1 People use computer programs to process information to gain insight and knowledge.
- 3.2 Computing facilitates exploration and the discovery of connections in information.
- 3.3 There are trade-offs when representing information as digital data.
- 7.1 Computing enhances communication, interaction, and cognition.
- 7.3 Computing has a global effect -- both beneficial and harmful -- on people and society.
Unplugged, External Tools, Individual and Group Discovery
In this kickoff to the Data Unit, students begin thinking about how data is collected and what can be learned from it. To begin the lesson, students will take a short online quiz that supposedly determines something interesting or funny about their personality. Afterwards they will brainstorm other sources of data in the world around them, leading to a discussion of how that data is collected. This discussion motivates the introduction of the Class Data Tracker project that will run through the second half of this unit. Students will take the survey for the first time and be shown what the results will look like. To close the class, students will make predictions of what they will find when all the data has been collected in a couple of weeks.
External Tools, Research, Presentation
Students use the Google Trends tool in order to visualize historical search data. They will need to identify interesting trends or patterns in their findings and will attempt to explain those trends, based on their own experience or through further research online. Afterwards, students will present their findings to ensure they are correctly identifying patterns in a visualization and are providing plausible explanations of those patterns.
Research, Class Discussion
This lesson asks students to consider carefully the assumptions they make when interpreting data and data visualizations. The class begins by examining how the Google Flu Trends project tried and failed to use search trends to predict flu outbreaks. Students then read a report on the Digital Divide, which highlights how access to technology differs widely by personal characteristics like race and income. This report challenges a widespread assumption that data collected online is representative of the population at large. To practice identifying assumptions in data analysis, students are provided a series of scenarios in which data-driven decisions are made based on flawed assumptions. They will need to identify the assumptions being made (most notably those related to the digital divide) and explain why these assumptions lead to incorrect conclusions.
Analyzing Artifacts, Group Discovery, Class Discussion
This is a pretty fun lesson that has two main parts. First, students warm up by reflecting on the reasons data visualizations are used to communicate about data. This leads to the main activity, in which students look at some collections of (mostly bad) data visualizations, rate them, explain why a good one is effective, and also suggest a fix for a bad one.
External Tools, Individual Skill Building, Tutorial
Now that students have had the chance to see and evaluate various data visualizations, they will learn to make visualizations of their own. This lesson teaches students how to build visualizations from provided datasets. The levels in Code Studio provide a detailed walkthrough of how to use Google Sheets to create several different kinds of charts. While this lesson focuses on the Google Sheets tool, other tools may be substituted at the teacher’s discretion, and MS Excel support is coming soon to the lesson.
External Tools, Collaborative Artifact Creation, Writing
In this lesson, students will collaboratively investigate some datasets and use visualization tools to “discover a data story.” The lesson assumes that students know how to use some kind of visualization tool; in the previous lesson we used the charting tools of a basic spreadsheet program. Students should be working with a partner but without much teacher hand-holding. Most of the time should be spent with students poking around the data and trying to discover connections and trends using data visualization tools. It is up to them to discover a trend, make a chart, and accurately write about it.
External Tools, Analyzing, Group Skill Building
In this lesson, students begin working with the data that they have been collecting since the first lesson of the chapter in the class “data tracker.” They are introduced to the first step in analyzing data: cleaning the data. Students will follow a guide in Code Studio that demonstrates the common techniques of filtering and sorting data, and they will use those techniques to familiarize themselves with the contents of the data. Then they will correct errors they find in the data by either hand-correcting invalid values or deleting them. Finally, they will categorize any free-text columns that were collected to prepare them for analysis. This lesson introduces many new skills with spreadsheets and reveals the sometimes subjective nature of data analysis.
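For teachers who want to see the same cleaning steps outside a spreadsheet, here is a minimal sketch in Python. It is not part of the lesson materials: the column names, the sample rows, the correction table, and the mood categories are all invented for illustration. It hand-corrects one invalid value, deletes rows that cannot be repaired, categorizes a free-text column, and sorts the result.

```python
# Hypothetical raw survey rows; the columns and values are invented for illustration.
raw_rows = [
    {"student": "A", "hours_sleep": "7", "mood": "Happy!"},
    {"student": "B", "hours_sleep": "eight", "mood": "tired"},
    {"student": "C", "hours_sleep": "6.5", "mood": " sleepy "},
    {"student": "D", "hours_sleep": "70", "mood": "great"},
]

# Hand-corrections for invalid values whose intended meaning is obvious.
CORRECTIONS = {"eight": "8"}

# Mapping free text to categories is a judgment call; another analyst
# could reasonably choose different groupings.
MOOD_CATEGORIES = {
    "happy!": "positive",
    "great": "positive",
    "tired": "negative",
    "sleepy": "negative",
}


def clean_rows(rows):
    """Correct, filter, categorize, and sort the raw survey rows."""
    cleaned = []
    for row in rows:
        # Apply a hand-correction if one exists for this value.
        text = CORRECTIONS.get(row["hours_sleep"], row["hours_sleep"])
        try:
            hours = float(text)
        except ValueError:
            continue  # not a number and not correctable: delete the row
        if not 0 <= hours <= 24:
            continue  # impossible value for hours in a day: delete the row
        # Normalize the free text, then look up its category.
        mood = MOOD_CATEGORIES.get(row["mood"].strip().lower(), "other")
        cleaned.append(
            {"student": row["student"], "hours_sleep": hours, "mood": mood}
        )
    # Sort by hours of sleep to make the contents easier to scan.
    return sorted(cleaned, key=lambda r: r["hours_sleep"])


cleaned = clean_rows(raw_rows)
```

Choosing to delete student D's row rather than guess that "70" meant "7" is exactly the kind of subjective decision the lesson asks students to notice.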
External Tools, Artifact Creation, Analyzing
In this lesson students learn how to create their own summary tables from raw data. A summary table typically represents one or more aggregations (groupings of items) and computations that are performed on the raw dataset. In most spreadsheet programs, a summary table is called a pivot table. In the lesson, students learn how to make pivot tables in Google Sheets using a provided dataset. Then students turn to the data they’ve collected as a class and, with their partner, use pivot tables to investigate it further.
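The idea behind a summary table can also be sketched in a few lines of Python. This is an illustration only, not how Google Sheets implements pivot tables; the rows, the column names, and the `summarize` helper are hypothetical. The sketch groups rows by one column (the aggregation) and then computes a count and an average for each group (the computations).

```python
from collections import defaultdict

# Hypothetical cleaned rows; the columns and values are invented for illustration.
rows = [
    {"day": "Mon", "mood": "positive", "hours_sleep": 7.0},
    {"day": "Mon", "mood": "negative", "hours_sleep": 6.0},
    {"day": "Tue", "mood": "positive", "hours_sleep": 8.0},
    {"day": "Tue", "mood": "positive", "hours_sleep": 9.0},
]


def summarize(rows, group_key, value_key):
    """Group rows by group_key, then compute a count and average of value_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {
        group: {"count": len(values), "average": sum(values) / len(values)}
        for group, values in groups.items()
    }


summary = summarize(rows, "mood", "hours_sleep")
for mood, stats in summary.items():
    print(mood, stats["count"], stats["average"])
```

Calling `summarize(rows, "day", "hours_sleep")` instead regroups the same raw data by day, which mirrors swapping the row field in a pivot table.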
Practice PT, External Tools, Artifact Creation
For this Practice PT students will analyze the data that they have been collecting as a class in order to demonstrate their ability to discover, visualize, and present a trend or pattern they find in the data. Leading up to this lesson, students will have been working in pairs to clean and summarize their data. Students should complete this project individually but can get feedback on their ideas from their data-cleaning partner.
The lessons in this chapter often have two things going on at once. In the background, the class collects some data about itself every day (the “class data tracker project”) so that there is a body of data to process later on.
In the interim, students are learning about and developing skills with spreadsheet and visualization tools. The teacher should connect the skills students are learning in the exercises to potential things they might do with the class data. The pedagogical “insight” behind the data tracker project is that because the students are themselves the subject of the data, and collected it themselves, they will have some natural intuitions about interesting avenues for investigation. We want to build toward the enduring understanding that (3.2) Computing facilitates exploration and the discovery of connections in information.
The tasks students perform in these lessons are done from the computer scientist’s perspective. That means making sure that data types match the ways we anticipate computing on them (don’t collect text when you need a number), cleaning data after it is inevitably “dirtied,” and then performing some aggregations and visualizations to look for patterns. Along the way we need to understand how human bias can be introduced at each step so that we can accurately convey what any patterns in the data are or are not telling us. These activities help build toward the enduring understanding that (3.3) There are trade-offs when representing information as digital data.
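A small Python sketch shows why the advice about data types matters. The survey answers below are hypothetical. When numbers are collected as text, sorting compares them character by character, and arithmetic is not possible until they are converted.

```python
# Hypothetical answers to "How many hours did you sleep?", collected as text.
answers_as_text = ["9", "10", "8"]

# Text is sorted character by character, so "10" lands before "8" and "9".
sorted_as_text = sorted(answers_as_text)

# Converting to numbers first gives the ordering and arithmetic we expect.
answers_as_numbers = [int(answer) for answer in answers_as_text]
sorted_as_numbers = sorted(answers_as_numbers)
average = sum(answers_as_numbers) / len(answers_as_numbers)

print(sorted_as_text)
print(sorted_as_numbers)
print(average)
```

Spreadsheets can show the same behavior when a column holds text instead of numbers, which is one reason the class survey should collect numeric answers as numbers from the start.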