Lesson 5: Identifying People With Data
Students begin this lesson by investigating some of the world’s biggest data breaches to get a sense of how frequently data breaches happen within companies and organizations, and what kinds of data and information are lost or given up. Afterwards, students will use the Data Privacy Lab tool to investigate just how easily they could be uniquely identified with a few seemingly innocuous pieces of information. At the conclusion of the lesson, students will research themselves online to determine just how much someone could learn about them by conducting the same searches and “connecting the dots.”
While there are many potential benefits associated with the collection and analysis of large amounts of data, these advances pose a constant risk to our collective security and privacy. Large-scale data breaches mean that the details of our personal, professional, and financial lives may be at risk. In order to prevent personal data from being linked to an individual person, personally identifying information, such as name, address, or identification number, is often removed from publicly available data. Nevertheless, through the use of computational analysis, it is often possible to “re-identify” individuals within data, based on seemingly innocuous information. As more of our lives is digitized, questions of security and privacy become ever more prevalent.
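For teachers who want a concrete picture of "re-identification," the following is a minimal sketch, not part of the original lesson materials. It assumes two small, made-up data sets: one with names removed but birth date, ZIP code, and gender kept, and one public list that includes names. All records, names, and function names here are hypothetical.

```python
# Hypothetical, made-up records for illustration only.
# "De-identified" data: names removed, but a few ordinary fields kept.
health_records = [
    {"birth_date": "1990-04-12", "zip": "02138", "gender": "F", "diagnosis": "asthma"},
    {"birth_date": "1985-11-03", "zip": "02139", "gender": "M", "diagnosis": "diabetes"},
    {"birth_date": "1990-04-12", "zip": "02139", "gender": "F", "diagnosis": "migraine"},
]

# A separate public list (for example, a directory) that includes names.
public_directory = [
    {"name": "Alex Example", "birth_date": "1985-11-03", "zip": "02139", "gender": "M"},
    {"name": "Sam Sample", "birth_date": "1990-04-12", "zip": "02138", "gender": "F"},
    {"name": "Pat Placeholder", "birth_date": "1972-01-30", "zip": "02138", "gender": "F"},
]

# Fields that appear in both data sets and can be used to link them.
LINKING_FIELDS = ("birth_date", "zip", "gender")


def key(record):
    """Build a comparable key from the linking fields of one record."""
    return tuple(record[field] for field in LINKING_FIELDS)


def reidentify(anonymous, named):
    """Return (name, anonymous_record) pairs where exactly one named person matches."""
    names_by_key = {}
    for person in named:
        names_by_key.setdefault(key(person), []).append(person["name"])

    matches = []
    for record in anonymous:
        candidates = names_by_key.get(key(record), [])
        if len(candidates) == 1:  # only an unambiguous match counts
            matches.append((candidates[0], record))
    return matches


linked = reidentify(health_records, public_directory)
for name, record in linked:
    print(name, "->", record["diagnosis"])
```

In this toy data, two of the three "anonymous" records link back to a single named person, even though no name or ID number was ever stored with them.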
Getting Started (10 mins)
Activity (30 mins)
Students will be able to:
- Explain privacy concerns that arise through the mass collection of data
- Use online search tools to find and connect information about a person or topic of interest.
- Explain how multiple sources of data can be combined in order to uncover new knowledge or information.
- Analyze the personal privacy and security concerns that arise with any use of computational systems.
- Familiarize yourself with the external web sites and tools involved in this lesson.
- KEY Research Yourself
Heads Up! Please make a copy of any documents you plan to share with students.
For the Students
Getting Started (10 mins)
Explore: World’s Biggest Data Breaches Visualization
Students will have been thinking primarily about the beneficial effects of collecting and analyzing data. This look at data breaches is intended to be a transition into a set of lessons exploring the potential harmful effects of collecting data, specifically with regard to privacy and security.
Direct students towards the external website: World's Biggest Data Breaches Visualization - Web Site (link in code studio). They should spend a couple of minutes browsing through the different breaches there. Ask them to make a few notes about the following questions:
- What kind of data is being lost? And how much?
- What kinds of issues could arise from this data getting into the wrong hands?
Discuss: In small groups or as a class, have students share their findings. The main points to draw out from this conversation are:
- All kinds of personal data, from usernames to social security numbers and credit card information, are lost fairly regularly.
- This information can be used to steal money or identities, get access to classified information, blackmail people, etc.
We’ve spent a lot of time looking at potential benefits of collecting and analyzing data. As we’ve already seen today, however, there are some risks associated with collecting all of this information. If it falls into the wrong hands or is used in ways we didn’t intend, there may be serious risks imposed on our privacy or security. We’re going to start looking more deeply at this problem.
Activity (30 mins)
Data Privacy Lab: How easily can you be identified?
In the data breaches we just looked at, some fairly important pieces of information were stolen. Credit card numbers, passport information, or government security clearances are obviously not something we’d like to fall into the wrong hands. Other pieces of information, however, don’t seem that bad. So what if people know your ZIP code? So what if people know your birthday? This is information we usually share without a second thought.
If the Data Privacy Lab tool does not work as well for some individuals, they could try the birthday and ZIP code of a parent or close relative.
Try to keep the Data Privacy Lab portion of the activity to 10-15 minutes. Keep in mind that the main part of the activity is the second half, when students research themselves.
- Direct students to the Data Privacy Lab - Web Site (scroll to the bottom of the page).
- Students should type in their information (birthday, ZIP code, and gender) to determine how many other people share those characteristics.
- In most instances, they will find that those three pieces of information can uniquely identify them.
- "Why is it significant that you are one of only a few people with your birthday, gender, and ZIP code? What concerns does this raise?"
In small groups or as a full class, students should discuss their responses. The main points to draw out from this conversation are:
- We can be uniquely identified from just a few pieces of information.
- Even information we would not normally consider to be “sensitive” can still be used to identify us.
- There are security and privacy concerns raised as a result of most information about us being available online.
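To illustrate why combining a few ordinary fields narrows things down so quickly, here is a minimal sketch that counts how many people in a small list are the only one with a given combination of fields. It is not how the Data Privacy Lab tool works internally; the records and function names are made up for illustration.

```python
from collections import Counter

# Hypothetical, made-up people: (birth_date, gender, zip_code).
people = [
    ("2004-03-15", "F", "98101"),
    ("2004-03-15", "F", "98101"),
    ("2004-03-15", "M", "98101"),
    ("2003-07-22", "F", "98101"),
    ("2004-03-15", "F", "98102"),
]


def group_sizes(records, fields):
    """Count how many records share each combination of the chosen fields.

    `fields` is a tuple of column positions, e.g. (2,) for ZIP code only.
    """
    return Counter(tuple(record[i] for i in fields) for record in records)


def count_unique(records, fields):
    """Number of records whose combination of fields is shared with no one else."""
    return sum(1 for size in group_sizes(records, fields).values() if size == 1)


zip_only = count_unique(people, (2,))
zip_and_gender = count_unique(people, (1, 2))
all_three = count_unique(people, (0, 1, 2))

print("Unique by ZIP only:", zip_only)
print("Unique by gender + ZIP:", zip_and_gender)
print("Unique by birthday + gender + ZIP:", all_three)
```

In this toy list, each added field leaves more people in a group of one: 1 person is unique by ZIP code alone, 2 by gender and ZIP code, and 3 by all three fields together.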
As we just saw, there are security and privacy issues that are raised, even when small, seemingly unimportant pieces of information are available online. Most of the time, we don’t actually think about what kinds of information are available about us, or how someone might connect the dots with that information.
Research Yourself Online
Activity Guide - Research Yourself
Timing the Activity: Students should be given 15-20 minutes to research themselves, filling in their findings on the Activity Guide. This activity can likely grow or shrink to fit the time you have in your period, but leave time for a wrap-up discussion at the end of the class.
Alternate Version: Some students may not have extensive online presences. In these instances, you can ask students to research another member of the community (public official, business person, community leader, etc.). There should still be plenty of fodder for discussion later.
Distribute: Activity Guide - Research Yourself. Students will work individually and will need access to a computer and the Internet. They will be asked to research themselves online, making note of any and all pieces of information they are able to find. Some guidelines follow:
- They should focus their attention on information that is already publicly available (e.g., through a Google search, on the public pages of their school website, a social network, etc.).
- If students are prevented from accessing some sites on the school’s network, they should still list information they know is publicly available elsewhere.
- Students should try to make connections between the data they find. “If I knew this about me and that about me, then I’d also know …”
Class shares their findings
Time permitting, it's very interesting to share findings.
Wrap Up Goal
This activity is aimed at opening the conversation about privacy and security in a highly personal way. The main goal of this closing discussion is just to share what students found about themselves.
Rather than a whole-class presentation, you might put students in pairs or small groups. This would reduce both the time needed and the risk of potentially sensitive information being over-exposed.
- "What information were you able to find about yourself? Were you able to make connections in the data you collected to figure out anything else? Were you concerned about anything you were able to find?"
Discuss: Students should share their findings, either in small groups or as a class. The main points to draw out from this conversation are below:
- A great deal of information about us is freely and easily available online.
- By making connections in this data or to other sources of data, it is possible to form a more complete picture of who we are and what we do.
- There are security and privacy concerns raised by the data we post online about ourselves.
- See the exemplar response to the worksheet, available in the teacher-only area of lesson 3 in code studio.
- See other assessment items in code studio.
You may want to check out Chapter 2 of Blown to Bits, which goes into some depth about issues and concerns with data and privacy. In particular, pages 32-35 are related to this lesson.
It takes a bit more reading, but the Data Privacy Lab project out of Harvard has another fascinating (and scary) project called The Data Map.
- You could use much of the information there for more rigorous research into how data can be used to identify people.
- AP Practice Response - Identify the Data Concern
- Student Overview
AP Practice - Identify the Data Concern
One component of the AP Explore Performance Task is describing a data concern related to a computing innovation.
2d. Using specific details, describe
Two of the responses below qualify as a data storage, privacy or security concern and two do not. Can you identify the two that do?
Response A: Facial recognition technology stores data mapping a user's face, for example to unlock a phone. A privacy concern for this technology is that governments could force technology companies to turn over this data, allowing them to passively and continuously monitor the movements of their citizens without their knowledge or consent.
Response B: Software that tracks soccer player movements on the field can be used to generate new statistics that help value contributions of individual players. A data concern is that this information may be used to justify getting rid of less productive players.
Response C: Social networks allow users to share vast amounts of private information about their lives. A security concern of this technology is that this publicly available data may enable stalkers or other criminals to identify potential targets.
Response D: Self-driving vehicles store vast amounts of information about their location and the world around them. A data concern for the trucking industry is that all of this information could be coordinated to make trucks more efficient causing many people who drive trucks for a living to lose their jobs.
Here's the scoring guide for this part of the question
Choose the two (2) responses that are data concerns
Choose the two responses (A, B, C, or D) that would earn the point as a data storage, security, or privacy concern. Then justify why you chose them.
- Check Your Understanding
Consider the following scenario:
In order to dampen the effect of a potential data breach or accidental release of records, a health care company has decided to remove a lot of personally identifiable information from its health records, like names, phone numbers, and so on. In its place, along with all medical information, they plan to store ONLY the gender, age, and ZIP code of the patient.
Give your opinion: Is this health care company doing enough to protect the personal information of patients? If yes, explain why this is the best they can do. If no, explain what they should do instead. (Limit your response to a few sentences).
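When reviewing responses to this scenario, a back-of-the-envelope estimate can help frame the discussion. The sketch below is not part of the original assessment. Every number in it is a rough assumption chosen for illustration rather than a real statistic, and it assumes people are spread evenly across combinations, which is not true in practice.

```python
# All numbers below are rough assumptions for illustration, not real statistics.
ZIP_POPULATION = 10_000   # assumed number of residents in one ZIP code
GENDERS = 2               # assumed number of gender categories recorded
AGES = 100                # assumed range of ages in years
BIRTHDAYS_PER_YEAR = 365  # ignoring leap days


def average_group_size(population, combinations):
    """Average number of people per combination if people were spread evenly."""
    return population / combinations


# Storing age (in years) versus storing a full birth date.
age_combos = GENDERS * AGES
birth_date_combos = GENDERS * AGES * BIRTHDAYS_PER_YEAR

by_age = average_group_size(ZIP_POPULATION, age_combos)
by_birth_date = average_group_size(ZIP_POPULATION, birth_date_combos)

print(f"gender + age + ZIP: about {by_age:.0f} people per combination")
print(f"gender + birth date + ZIP: about {by_birth_date:.2f} people per combination")
```

Under these assumptions, storing age instead of a full birth date puts each patient in a much larger group on average. Because real populations are uneven, however, a patient with an uncommon age or in a sparsely populated ZIP code could still end up in a group of one.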
Computer Science Principles
3.2 - Computing facilitates exploration and the discovery of connections in information.
3.2.2 - Use large data sets to explore and discover information and knowledge. [P3]
- 3.2.2D - Maintaining privacy of large data sets containing personal information can be challenging.
3.3 - There are trade offs when representing information as digital data.
3.3.1 - Analyze how data representation, storage, security, and transmission of data involve computational manipulation of information. [P4]
- 3.3.1B - Security concerns engender tradeoffs in storing and transmitting information.
- 3.3.1F - Security and privacy concerns arise with data containing personal information.
7.3 - Computing has a global effect -- both beneficial and harmful -- on people and society.
7.3.1 - Analyze the beneficial and harmful effects of computing. [P4]
- 7.3.1G - Privacy and security concerns arise in the development and use of computational systems and artifacts.
- 7.3.1J - Technology enables the collection, use, and exploitation of information about, by, and for individuals, groups, and institutions.
- 7.3.1K - People can have instant access to vast amounts of information online; accessing this information can enable the collection of both individual and aggregate data that can be used and collected.
- 7.3.1L - Commercial and governmental curation of information may be exploited if privacy and other protections are ignored.
CSTA K-12 Computer Science Standards (2017)
IC - Impacts of Computing
- 3A-IC-24 - Evaluate the ways computing impacts personal, ethical, social, economic, and cultural practices.
- 3A-IC-29 - Explain the privacy concerns related to the collection and generation of data through automated processes that may not be evident to users.
- 3A-IC-30 - Evaluate the social and economic implications of privacy in the context of safety, law, or ethics.
NI - Networks & the Internet
- 3A-NI-05 - Give examples to illustrate how sensitive data can be affected by malware and other attacks.