Optional Lesson: Hard Problems - Traveling Salesperson Problem

Overview

In this lesson, students examine a classic problem in computer science, the Traveling Salesperson Problem (TSP). Students solve small instances of the problem, try to formulate algorithms to solve it, and discuss why these algorithms take a long time for computers (and humans) to compute. Students see how the TSP grows in size much faster than the problem of adding characters to a password. Even though we use encryption to motivate a desire to learn about computationally hard problems, they are valuable to know about, in and of themselves. This lesson covers some territory about how we reason formally and mathematically about algorithms and figuring out how “hard” something is for a computer to do.

Purpose

In this final algorithm detour, students are introduced to the Traveling Salesperson Problem, a classic problem in computer science for which there is no known algorithm to solve it, other than brute force. The number of possible solutions grows extremely fast, even for small inputs, and quickly becomes “unreasonable” to solve, making it a computationally hard problem. The ideas of computationally hard problems are leveraged for encryption to make ciphers that take an unreasonable amount of time to crack (as in thousands of trillions of years), but computationally hard problems are also important in their own right. There are many problems for which we wish we had reasonable algorithmic solutions - especially in medical fields - and we’re still on the hunt to find them. No one has yet mathematically proven whether or not the problems we currently think are “hard” actually are.

Agenda

Getting Started

Activity

Extended Learning

Assessment

Wrap-up

View on Code Studio

Objectives

Students will be able to:

  • Describe the TSP.
  • Explain why TSP is computationally hard.
  • Solve the TSP on small graphs.
  • Connect the properties of computationally hard problems with desirable properties for encryption.

Preparation

  • Note that Wrap-up for this lesson is more extensive than usual. Some preparation to familiarize yourself with the ideas is recommended.
  • (Optional) Copies of the Worksheet for each student

Links

Heads Up! Please make a copy of any documents you plan to share with students.

For the Teachers

For the Students

Teaching Guide

Getting Started

Recall: Say: In a previous lesson we looked at the problem of trying to crack passwords and learned it would take a computer a very long time to try every possible combination of letters. We learned that the longer a password is, the harder it is to crack. This is because every character you add multiplies the number of possible passwords by the number of possibilities for that character. That’s pretty fast growth.

Say: Computationally Hard Problems are problems that force a computer to run through many possibilities to find the right answer. In cryptography, our desire is to have an encryption function that is easy compute if you have the key, but really hard if you don’t.

Transition: Today we’re going to look at one of the most famous problems that is thought to be computationally hard. We’ll see if we can find an algorithm to solve it, and along the way, we’ll get a better sense of what “computationally hard” really means.

Activity

Introduce the traveling salesperson problem.

Distribute the Introduction to the Traveling Salesperson Problem - Worksheet.

  • Give individual students time to work on finding the shortest routes for the small examples in the worksheet.
  • Make sure that students to compare their answers.
    • If two partners disagree, they should resolve it to find the shortest tour.

Teaching Tips

Note: This problem at first might seem like the Shortest Path problem we looked at in a previous lesson, but it’s actually a much, much harder problem to solve with an algorithm. In fact there is no known algorithm to solve the TSP, other than by “brute force” -- trying every single possible tour of the nodes and then picking the one that gives the shortest possible tour.

If you ask students how they arrived at their solutions they might say, “I tried a bunch of different paths and kept track of the shortest.” Or some form of greedy algorithm: “I want to avoid that large edge” or “I want to use that small edge.” These are good strategies, but they can always be thwarted with the question, “How do you know for sure it’s the shortest?”

Discuss.

Goal: The goal of the discussion is to bring out the fact that this would be a much harder problem for computers to solve than the shortest path problem or minimum spanning tree problem. “Hard” means that there is no known way to find the correct answer without generating ALL possible answers. Connect this problem to our desire to understand problems that are computationally hard for computers for encryption.

Prompts:

  • What makes this problem harder than the shortest path problem?
    • First of all you need to make a closed path (route) of all vertices without revisiting vertices. This is a harder thing to do than just finding a path from one vertex to another. Just constructing a valid route is difficult.
    • Second, you don’t know where to start. Any edge could be part of the shortest tour. And once you pick an edge to start with, you still have no way to choose the next one. There is no way to eliminate possibilities from the start or as you go.
  • What kinds of algorithms did you think about in trying to solve the problem?
    • Students might suggest strategies that leverage what they know about solving the minimum spanning tree and shortest path problems from previous lessons, such as starting with the least-cost edge in the graph, and then “greedily” connecting nodes by picking subsequent least-cost edges.
    • This is a good heuristic and might yield a tour that is better than the worst possible tour, but there is no guarantee it’s the best.
    • The only algorithm that is known to find the absolute shortest tour every time is brute force, generating every possible tour and comparing them.

Extended Learning

  • If you do a quick Google search on the Traveling Salesperson Problem, you will find many many interesting resources. For example, here is a two-minute video that shows a nice visualization of some heuristics for TSP that give pretty good tours, but still not optimal, because there are just too many possibilities to consider.

Assessment

Questions:

1) Describe what it means for a problem to be “computationally hard.”

2) What strategies do people use to solve large computationally hard problems?

3) Why are computationally hard problems important in encryption strategies?

Wrap-up

Teaching Tips

This is a fairly extended wrap-up that asks the teacher to lead students through some reasoning about the traveling salesperson problem. Some preparation is recommended.

The entire section below could serve as a script for the session.

Say:

If I told you that we could make an encryption scheme that was as hard to crack as it is to solve the traveling salesperson problem, would you feel reasonably assured of your security?

To answer that, let’s figure out just how hard TSP is.

Explain Exponential Growth - How many possible tours are there?

Many people for many years have tried to design an algorithm that works in all cases to find the exact optimum route for the traveling salesperson problem, with no success. So there is currently no algorithm to find the exact answer besides “brute force,” trying all the possibilities.

Since we have to calculate the distances of all the tours anyway, we might as well count how many tours there are in the first place, just so we know how many calculations we have to perform.

For this demonstration, let’s also consider a version of the problem faced by, say, an airline. For an airline, there are cities all over the world, and you can fly from any one directly to any other. So for an airline, there is a possible path that connects every possible node; we say the graph is “fully connected.”

Let’s see how the problem grows very quickly, by constructing all the possible tours for larger and larger numbers of nodes.

Let’s start with 3 nodes. There is only one possible tour. Easy.

With 4 nodes, there are only 3 possible tours. Not bad.

Here’s where it starts to get crazy. With 5 nodes there are 12 possible tours. How can you figure this out? The way shown below was to reason that a 5th node could be added between each pair of nodes for every possible 4-node tour. Here, the actual distances don’t matter because we’re just trying to count the number of tours.

Below is another view of the same thing. In this version we took each 4-node tour and “broke” one of the edges, connecting it to the new 5th node. Just a different way to think about it.

For 6 nodes we could repeat the process: add a node to every possible edge of the 12 solutions 5-node solutions. Since each of 12 solutions above has 5 edges, it means we have 12*5 = 60 tours!

Notice how fast this grows: between a graph with 3 nodes and a graph of 6 nodes, our total number of possible tours grew from 1 to 60. This problem grows very quickly!

With just 10 nodes, this grows to about 181,440 possible tours.

With just 26 nodes, this grows to a 25-digit number: 7,755,605,021,665,492,992,000,000. (By comparison, the width of the entire observable universe(!!!!) in miles is roughly a 25-digit number.)

With 100 nodes you’re up to a roughly a 155-digit number.

The math: With n nodes there are (n-1)!/2 (“n-1 factorial divided by 2”) possible tours. n! (or “n factorial”) is n(n-1)(n-2)(n-3)...(2)(1). So for example 10! = 10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1. You can confirm the formula with the examples above: 5 nodes = (5-1)! / 2 = 12.

Reasonable or Unreasonable Time

Computers work fast, but they have limits. In computer science, we have an actual mathematical hard line between reasonable and unreasonable runtimes.

"Reasonable" means the number of things the computer has to do is proportional to the size of the input to the problem. For example, the Minimum Spanning Tree and Shortest Path Problems are “reasonable” because they had algorithms that solved them by considering every edge in the graph once. The amount of time it takes is proportional to the number of edges. If the number of edges is n, even if there was an algorithm that had to look at the edge n^2 times, or n^3 times, that’s still reasonable.

“Unreasonable” means the number of things the computer has to do grows as an exponent of the size of the input. So if you discovered that an algorithm made the computer do 2^n things, that’s not reasonable, because it means every time the size of the input (n) gets bigger, the solution gets massively further out of reach. n! is another running time that is considered unreasonable. In real life, “unreasonable” problems would take a modern computer trillions and trillions of years to churn through all the possibilities.

So the brute force solution to TSP is unreasonable -- at least as far as we know! In fact, it's an open question how to solve this problem efficiently. If anyone finds a solution that runs in a reasonable time, a lot of security and encryption algorithms are in trouble, because many are based on the fact that this (and related problems) are unreasonable to solve.

So now we know what “hard” is for a computer. And in the next lesson, we’ll discuss how to leverage hard problems like this one to create public keys that are tough for computers to crack.

View on Code Studio

Unit 4: Lesson 1 - What is Big Data?

Background

The explosion in computing power over the last several decades has led to a similarly astounding growth in the amount of data that is transmitted, collected, and analyzed every day. Nearly every industry and field is making use of data to better understand what is happening around them and develop new and creative solutions to the problems they face. As the scale of data gets bigger and bigger, new techniques and tools are necessary to be able to meaningfully make use of it.

Vocabulary

Click to enlarge.
  • Moore's Law: the observation that computing power roughly doubles every two years.
  • Big Data: a broad term for datasets so large or complex that traditional data processing applications are inadequate.

Lesson

  • Video - Big Data is Better Data
  • Where do we use data? Where does it come from?
  • Explore a data visualization and identify its sources

Resources

Continue

  • Levels
  • 2
  • 3
  • (click tabs to see student view)
View on Code Studio

Teaching Tip

Moore's Law is discussed at length in the lesson plan as a lead-in to the "Big Data Sleuth" activity.

Moore's Law - a predication made by Gordon Moore in 1965 that computing power will double every 1.5-2 years, it has remained more or less true ever since.

For purposes of the AP exam a looser definition is supported - Moore's Law: the observation that computing power roughly doubles every two years.

Student Instructions

View on Code Studio

Teaching Tip

The typical threshold for "Big Data" is when common desktop tools and techniques for handling, processing and manipulating data are not enough. This is not a hard and fast definition, but the important note is that what makes it "big" is not only a function of the number of bytes, but how it needs to be processed as well.

Student Instructions

Standards Alignment

View full course alignment

CSTA K-12 Computer Science Standards (2011)

CT - Computational Thinking
  • CT.L3B:1 - Classify problems as tractable, intractable, or computationally unsolvable.
  • CT.L3B:2 - Explain the value of heuristic algorithms to approximate solutions for intractable problems.
  • CT.L3B:3 - Critically examine classical algorithms and implement an original algorithm.
  • CT.L3B:4 - Evaluate algorithms by their efficiency, correctness, and clarity.

Computer Science Principles

4.2 - Algorithms can solve many but not all computational problems.
4.2.1 - Explain the difference between algorithms that run in a reasonable time and those that do not run in a reasonable time. [P1]
  • 4.2.1A - Many problems can be solved in a reasonable time.
  • 4.2.1B - Reasonable time means that as the input size grows, the number of steps the algorithm takes is proportional to the square (or cube, fourth power, fifth power, etc.) of the size of the input.
  • 4.2.1C - Some problems cannot be solved in a reasonable time, even for small input sizes.
  • 4.2.1D - Some problems can be solved but not in a reasonable time. In these cases, heuristic approaches may be helpful to find solutions in reasonable time.
4.2.3 - Explain the existence of undecidable problems in computer science. [P1]
  • 4.2.3A - An undecidable problem may have instances that have an algorithmic solution, but there is no algorithmic solution that solves all instances of the problem.
4.2.4 - Evaluate algorithms analytically and empirically for efficiency, correctness, and clarity. [P4]
  • 4.2.4A - Determining an algorithm’s efficiency is done by reasoning formally or mathematically about the algorithm.
  • 4.2.4B - Empirical analysis of an algorithm is done by implementing the algorithm and running it on different inputs.
  • 4.2.4C - The correctness of an algorithm is determined by reasoning formally or mathematically about the algorithm, not by testing an implementation of the algorithm.