Lesson 7: Encoding and Sending Formatted Text

Overview

In this lesson, students are first introduced to the standard number-to-text encoding scheme used in computers and on the Internet known as ASCII encoding. Students will invent a communication protocol that uses only plain text ASCII characters to encode fancier formatting for text such as fonts, colors, sizes, etc. Students will demonstrate their protocol by using the Internet Simulator to send an encoded message to a partner, who must correctly interpret the formatting and draw the result on a piece of paper.

Purpose

This lesson gives a glimpse into "coding languages" by having students invent a way to use plain ASCII text to encode other text. At this point we really begin to see how layers upon layers of encodings - all tracing back to binary - work together to encode complex information.

We also want to make a connection to the Internet and protocols. Information traveling across the Internet will often need to contain both the contents of the message itself and information that helps to format, route, or interpret that data.

Developing means for differentiating between these two types of information has led to the creation of a number of ubiquitous protocols and languages. HTML (short for HyperText Markup Language) is the language in which the content and formatting of a web page are written. And the protocol HTTP, or HyperText Transfer Protocol, is another ASCII-based protocol that is the foundation of communication on the web - it was designed to send and receive web page data over the internet. In both cases, plain ASCII text is imbued with deeper meaning through the development of well-defined protocols.

Agenda

Getting Started (15 mins)

Activity

Wrap-up

Assessment

Extended Learning

View on Code Studio

Objectives

Students will be able to:

  • Describe the ASCII encoding scheme.
  • Design/invent a protocol for sending formatted text using the Internet Simulator.
  • Invent a text formatting language.
  • Explain the connection between binary and more complex encodings of formatted text.

Preparation

  • (Optional) Poster Paper
  • Markers or Crayons
  • Section prepped to use Internet Simulator in Code Studio.

Links

Heads Up! Please make a copy of any documents you plan to share with students.

For the Teachers

For the Students

Vocabulary

  • ASCII - American Standard Code for Information Interchange; the universally recognized raw text format that any computer can understand
  • code - (v) to write code, or to write instructions for a computer.
  • Protocol - A set of rules governing the exchange or transmission of data between devices.

Teaching Guide

Getting Started (15 mins)

Remarks

In previous lessons we explored how to encode numbers in binary, and you also developed protocols for sending a list of numbers. Today we’re going to take that method one step further and look at how we can encode text with a binary representation. Hopefully you are beginning to realize that if we can figure out a way to represent information as a set of numbers, then we can encode it in bits and store that information in a computer or send it over the Internet.

Teaching Tip

Possible encoding

  • A conventional way to do this is to make a simple mapping of the 26 letters of the alphabet to numbers, something similar to the example on the right. This is not, however, the only solution.
  • Students should have the freedom to invent a text-encoding scheme however they like. Some students may wish to give common phrases or words their own sequences of bits rather than letter-to-number mappings.
  • If students are moving quickly, encourage them to add more functionality. How will their protocols encode punctuation, capital letters, and special characters?

Think Pair Share - How would you encode text?

Prompt:

"One of the most powerful uses of the internet is sending text to people. Since the internet can only send bits around we need a way to encode text with bits..."

"If it were up to you, how would you encode text in binary? Quickly, jot down an idea for encoding text."

  • Provide students a couple of minutes to write their thoughts.
  • Emphasize that they do not need to actually specify every detail of the scheme but merely need to outline its structure.

Discussion Goal

Points to draw out in the discussion:

  • Most probably invented a scheme that mapped letters of the alphabet and other text characters to numbers.
  • There are tradeoffs in deciding how many bits per character you think you need. How much do students' suggestions vary?
  • In order to transmit text data we're all going to have to agree on an encoding scheme (foreshadow ASCII)

Discuss

Have students compare and contrast encoding schemes with their neighbor first, then open up the discussion to the class. Here are a few prompts:

  • Did you and your neighbor come up with the exact same idea? What was different?
  • How many bits does your encoding scheme require? For example, how many bits would you need to say "hello"?
  • Did you account for anything besides the letters of the alphabet (or whole words)?
  • Of the encoding schemes mentioned so far, which one is "best"? Why?

Introduce the ASCII encoding scheme.

Remarks

You just invented your own scheme for encoding text with numbers. It turns out that there is a standard encoding for most of the symbols you can type on an American keyboard.

That encoding is called the American Standard Code for Information Interchange or ASCII (pronounced: “Ask-ee”).

Teaching Tip

Time permitting you might send students off to do their own "rapid research" on ASCII and report back. You should be able to draw out the same points.

  • Project or share an ASCII table for your students. Here’s an example chart.
  • Then present the key components of this encoding scheme.

    • ASCII codes were originally 7 bits long and so there are 128 possible values.
    • 0-31 are “control characters” that were originally used to control various aspects of machines and printers. Most are now rarely used, though a few (such as tab, line feed, and carriage return) remain common.
    • 32-126 are printable characters and include the numbers 0-9, all 26 letters (both lowercase and uppercase), and many common punctuation symbols.
    • 127 is the symbol for delete.
    • Over time, 8 bits became a standard “chunk-size” for encoding information. ASCII made the transition to this 8-bit encoding by just adding an extra 0 to the front of the old 7-bit codes.
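
The ranges listed above can be checked with a few lines of Python. This is a minimal sketch that relies only on the built-in `ord` and `format` functions; the helper names `classify`, `to_7bit`, and `to_8bit` are made up for illustration.

```python
def classify(code):
    """Label a 7-bit ASCII code using the ranges described above."""
    if 0 <= code <= 31:
        return "control"
    if 32 <= code <= 126:
        return "printable"
    if code == 127:
        return "delete"
    raise ValueError("not a 7-bit ASCII code")


def to_7bit(ch):
    """Original 7-bit pattern for a single ASCII character."""
    return format(ord(ch), "07b")


def to_8bit(ch):
    """8-bit form: the old 7-bit code with an extra 0 added to the front."""
    return "0" + to_7bit(ch)


print(ord("A"), to_7bit("A"), to_8bit("A"))  # 65 1000001 01000001
print(classify(ord(" ")), classify(ord("~")), classify(10))  # printable printable control
```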

Quick Activity: write your name in ASCII codes

Using the ASCII table, translate your first name from letters to numbers.

  • Write your name as: "Name!" (capital first letter, exclamation point at the end)
  • Give students a minute to do this
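
For checking student answers quickly, the translation can be sketched in Python. The sample name "Ada" and the helper name `name_to_ascii` are placeholders, not part of the lesson materials.

```python
def name_to_ascii(name):
    """Return the decimal ASCII code for each character in the name."""
    return [ord(ch) for ch in name]


codes = name_to_ascii("Ada!")
print(codes)  # [65, 100, 97, 33]

# Three-digit form, matching the style of a typical ASCII table.
print(" ".join(f"{code:03d}" for code in codes))  # 065 100 097 033
```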

Transitional Remark

  • Having a standardized protocol like ASCII to encode text enables us to send and receive textual information.
  • This is very useful, but there are still instances when we will want even greater expressive power in our digital communications.

Activity

Formatting Text Challenge: Create a protocol for encoding formatted text

Introduction:

"What if you wanted to send formatted text that included things like the ability to underline, bold, or italicize words....specify a different font size, or color?"

Teaching Tip

You might poll the class for other kinds of formatting you see with text. Some things that might come out:

  • tables or grids of text
  • different font faces (e.g. Arial, Times, Gill Sans, etc.)
  • placement of text on the page, like floating text boxes, etc.

Things like this:

Remarks

Today your challenge is to:

  • Invent a protocol for sending formatted text
  • Use the Internet Simulator to test out your protocol.

You will also notice that the Internet Simulator has been updated so that you can now type ASCII text characters to send.

Transition to Code Studio

Teaching Tip

Before starting the activity you might want to have students just practice sending ASCII text messages to each other to get a feel for the new environment.

While doing this you should point out that everything is still binary underneath the hood.

You can use the My Device tab in the Internet Simulator to turn different encoding schemes on or off.

  • Students should notice the new ability to send text.

Activity Goal

Students must define a protocol that allows for the encoding of text formatting, while only making use of the printable ASCII character set, i.e. 32-126.

A conventional way to do this is to specify a set of “reserved words” or characters that should be interpreted as formatting instructions. For example, HTML uses angle brackets <...> with opening and closing tags, such as <b>...</b> to mark bold text.

Students who have seen HTML prior to this class may come up with schemes for the simple problems easily; give them some of the more challenging encodings, such as adding column formatting to their text, placing text at an arbitrary location on the page (like a text box), creating tables, etc.
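
One way a team's reserved-word scheme might be decoded is sketched below in Python. The tag names and the `decode` helper are hypothetical, chosen only for illustration; they are not taken from the activity guide, and the tags are not all real HTML. The sketch assumes every tag is lowercase letters inside angle brackets and treats anything else as ordinary text.

```python
import re

# Hypothetical reserved words; a student protocol could choose any others.
TAGS = {"b", "i", "u", "red", "blue", "black", "small", "medium", "large"}


def decode(message):
    """Split a message into (text, active_formats) runs."""
    active = []  # formats currently switched on, in the order they were opened
    runs = []
    # The capture group keeps the tags themselves in the split result.
    for token in re.split(r"(</?[a-z]+>)", message):
        if not token:
            continue
        match = re.fullmatch(r"<(/?)([a-z]+)>", token)
        if match and match.group(2) in TAGS:
            closing, name = match.groups()
            if closing:
                if name in active:
                    active.remove(name)
            else:
                active.append(name)
        else:
            # Unrecognized tags fall through and are kept as plain text.
            runs.append((token, tuple(active)))
    return runs


print(decode("Hello <b>big <red>world</red></b>!"))
# [('Hello ', ()), ('big ', ('b',)), ('world', ('b', 'red')), ('!', ())]
```

Because the tags themselves are written with printable ASCII characters, the whole message stays within codes 32-126. A design question worth raising with students: how would this scheme send a literal "<b>" as text?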

Invent a protocol for sending formatted text with the Internet Simulator

(from the activity guide)

Directions

Work with a partner or in a small team to develop a protocol that allows you to send formatted text.

Guidelines

Both the text and the formatting instructions must be derived from the printable ASCII character set (i.e. codes 32-126).

Teaching Tip

Give teams time to develop their protocols, either on their Activity Guides or on a separate poster, document, slideshow, etc.

Encourage students to iteratively test their protocols to make sure they have not overlooked any gaps.

Your protocol must encode at least:

  • bold, italics, and underlining
  • three different font sizes (large, medium, and small)
  • three different font colors (red, black, blue)

Demonstrating your protocol:

  • You will demonstrate that your protocol works by sending a message with the Internet Simulator.
  • You will send a message and the recipient must be able to faithfully draw (or produce in some fashion) the formatted text, based only on the data she received. Here's a sample message:

Develop Your Protocol

Use the space below to brainstorm ideas for your protocol. Iteratively improve your protocol by testing it out with simple sample messages.

Demonstrate:

  • Test students’ protocols by providing a formatted message to one member of the team and having them send the message to their partner using the Internet Simulator.
  • Students may recreate the message by hand or in a text document and compare results to the intended message.
  • For stronger proof, ask one member of each group to move to the other side of the room or hallway.

Wrap-up

Content Corner

What students likely did in the activity was invent a text-based code. Whether they are formatting languages like HTML or Markdown, or programming languages like Java, C, or Python, all of these languages have one thing in common: they use ASCII text to encode other text or information.

Don't be shy about telling students that they just invented a coding language. At this point in the course a code and a protocol are very similar. Even though it's probably something no one else will use, the process students just went through gives a taste of inventing any kind of formal language or protocol that ultimately needs to be interpreted and processed by a computer.

Discuss the results of the activity

Discuss: Use a group discussion strategy to address these questions:

  • Were most groups successful?
  • If not, what caused the most trouble?
  • Were some components of the challenge easier to address than others?

Compare/contrast encoding schemes with HTML

Optional Demonstration

  • Most web browsers allow you to view the source code for a web site (e.g. Chrome allows this in Developer Tools).
  • Point out the tagging system used to structure the text of a website. Ask students to consider how similar or different this protocol is to their own.
  • W3Schools Introductory HTML: http://www.w3schools.com/html/html_intro.asp
  • Today’s activity is motivated by the real-life challenge of bringing web pages to life.

  • Students more familiar with HTML / CSS may recognize many of these ideas, but it can still be instructive to show the class that much of the information required to view a web page isn’t the content itself, but information on how it should be formatted.

Discussion Goal

There are many ways to answer this question. Look for any response that acknowledges:

  • sequences of binary states can be used to represent numbers
  • numbers can be assigned to letters of the alphabet to encode text
  • with plain text you can make a code you can use to apply other meanings (or formats) to text
  • You can invent a "formatting language" (like HTML) to represent different ways you want text messages to appear.
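
The layers in this list can be traced end to end with a short Python sketch. The message and its `<b>` tags are only an illustrative stand-in for whatever formatting language a team invented, and 8 bits per character is assumed.

```python
# Layer 3: formatted text written in a plain-text formatting language.
message = "<b>Hi</b>"

# Layer 2: each character becomes a number (its ASCII code).
codes = [ord(ch) for ch in message]
print(codes)  # [60, 98, 62, 72, 105, 60, 47, 98, 62]

# Layer 1: each number becomes 8 bits.
bits = "".join(format(code, "08b") for code in codes)
print(len(bits))  # 72

# The receiver reverses the layers: bits -> numbers -> characters.
decoded = "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8))
print(decoded == message)  # True
```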

Discuss layers of encodings

Do a quick Think Pair Share or perhaps assign this question as written work.

Prompt:

"Take a moment to think about the layers of encodings that allowed for formatted text to be transmitted over the Internet."

"Imagine someone pointed to a piece of formatted text and asked: 'Can you explain to me how this is encoded in binary?' How would you explain it?"

  • Give students a moment to jot ideas down.
  • Discuss with neighbor.

  • Share explanations with the class.

Assessment

Lesson Assessment

Rubric:

Questions (found both on the Rubric and in Code Studio):

  • How many bits are required to store the number “150” in ASCII?
    • 3 bits
    • 8 bits
    • 16 bits
    • 24 bits
    • 32 bits
  • The word “Apple” translated into its ASCII number equivalent is:
    • 097 112 112 108 101
    • 097 108 108 111 119
    • 065 112 112 108 101
    • 065 110 110 105 101
    • 065 108 108 111 119
  • What problems arose in your efforts to create a working protocol? How did you think about the problems in order to solve them?
  • Describe one instance in which collaboration with a partner influenced the final protocol your team produced.

Chapter Assessment

Teaching Tip

Some questions on this assessment may seem ‘out there’ or only indirectly related to the material in the lessons. That is intentional, as it is a good simulation of the kinds of questions students might find on the real exam. In many cases, a student can probably use their judgment and intuition based on what they’ve learned to make a pretty good guess at a question.

However, as always, these resources are just a suggestion and you should use them as best suits your class and their needs. The goal of CSP is to grow participation in computer science, so if offering this as a high-stakes test early in the year would go against that goal, consider going through the assessment in a lower-stakes way (allow students to work with a partner, make the assessment worth fewer points, etc.) to practice for the future.

There is a multiple choice assessment for this chapter available on Code Studio. It can be found on the stage right after this lesson and uses the Lockable Stages feature. If you are new to Lockable Stages check out How to Administer a Locked Assessment.

Extended Learning

  • Additional Formatting Encoding Challenges:
    • Special characters not found in the ASCII encoding (e.g. ñ)
    • Multiple columns of text
    • A text box at any location on the screen
    • A table of information
  • Continue the exploration of HTML by determining how you would complete today’s activity in HTML. Further compare your own protocol with HTML.
  • Read Blown to Bits (www.bitsbook.com), Chapter 3, Ghosts in the Machine, pp. 73-80 (What You See Is Not What the Computer Knows), then answer the following questions:
    • Give an example of your own when just knowing what a computer did wasn’t sufficient - you really needed to know how and why it was doing what it was doing as well.
    • Talk about file metadata and how it “fingerprints” a file. Include a discussion of file metadata benefits and challenges.
  • Read Blown to Bits (www.bitsbook.com), Chapter 3, Ghosts in the Machine, pp. 80-88 (Representation, Reality, and Illusion), then answer the following questions:
    • How does highlighting in a PDF doc work? What are the computational ideas utilized?

Student Instructions

Unit 1: Lesson 7 - Encoding and Sending Formatted Text

Background

Since all information sent by a computer is encoded in bits, we need protocols to help us interpret these bits as meaningful information. In previous lessons we learned how to use the binary number system to allow us to represent numbers in bits. We now address the challenge of encoding text in bits and find that we can reuse and build upon much of the work we did to encode numbers. This pattern of reusing solutions to other problems arises frequently in computer science and allows the development of more complex systems.

ASCII encoding is the standard number-to-text encoding scheme used in computers and on the Internet. Each ASCII code is 7 bits long, so the scheme can encode 128 different symbols, each corresponding to one of the binary numbers between 0 and 127. Originally the encoding was designed for older printers and so the codes associated with the numbers 0-31 were designated as "control codes" which instruct the printer to perform certain actions. These codes are now largely out of use. Codes 32-126, however, contain many of the most commonly used symbols you use on the internet, including the lowercase and capital letters, the numbers 0-9, and most commonly used punctuation marks. Perhaps more remarkably, this printable character set is also used to represent all of the text formatting we use to add variety and emphasis to "plain-text" documents. The ability to represent formatting using only ASCII symbols is the result of cleverly designed protocols.

Vocabulary

  • ASCII - American Standard Code for Information Interchange. ASCII is the universally recognized raw text format that any computer can understand.
  • code - (v) to write code, or to write instructions for a computer.
  • Protocol - A set of rules governing the exchange or transmission of data between devices.

Lesson

  • Learn about the ASCII encoding scheme.
  • Invent a text formatting language.
  • Design a protocol for sending formatted text.
  • Discover the challenges of encoding meta-information.

Resources

  • Sending Formatted Text - Activity Guide (PDF | DOCX)
  • Rubric - Sending Formatted Text - Rubric (PDF | DOCX)

Sending Formatted Text Activity

Directions:
Work with a partner or in a small team to develop a protocol that allows you to send formatted text.

Guidelines:
- Both the text and the formatting instructions must be derived from the printable ASCII character set (i.e. codes 32-126).
- Your protocol must encode at least:
  - bold, italics, and underlining
  - three different font sizes (large, medium, and small)
  - three different font colors (red, black, blue)
- You will demonstrate that your protocol works by having the recipient be able to faithfully draw (or produce in some fashion) the formatted text based only on the data she received through an ASCII-text version of the Internet Simulator. A sample message can be found below.

Develop Your Protocol:
Use the worksheet handed out by your teacher to brainstorm ideas for your protocol. Iteratively improve your protocol by testing it out with simple sample messages.

Teaching Tip

47 characters + another 47 with shift = 94 total characters. You'd need 7 bits to encode that many characters, because 6 bits give only 64 patterns while 7 bits give 128.
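
The bit count in this tip can be checked with a short calculation. The sketch below is in Python, and the helper name `bits_needed` is made up for illustration; it finds the smallest number of bits whose patterns cover a given number of symbols.

```python
def bits_needed(symbol_count):
    """Smallest n such that 2**n patterns can cover symbol_count symbols."""
    if symbol_count < 1:
        raise ValueError("need at least one symbol")
    # (k - 1).bit_length() is the number of bits needed to count from 0 to k - 1.
    return max(1, (symbol_count - 1).bit_length())


print(bits_needed(94))   # 7
print(bits_needed(26))   # 5
print(bits_needed(128))  # 7
```

The same function can be used to compare students' own schemes, for example a letters-only scheme (26 symbols) against one that also covers capitals and punctuation.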

Student Instructions

Many languages do not use the characters of U.S. English. Suppose you wanted to be able to encode the characters of every language on earth within a single protocol. Guess how many characters would need to be encoded and calculate the number of bits that would be required per character. Then discuss the benefits and drawbacks of using this single unified system.

Teaching Tip

Answer

The answer is 24 because the ASCII text "150" is just 3 ASCII characters - same as any other 3-character text, "ABC", "DOG", whatever. And in ASCII every character is encoded as one byte which is 8 bits, and 8 * 3 = 24.
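
The difference between storing the text "150" and storing the number 150 can be sketched in Python. This assumes one 8-bit byte per ASCII character, as the answer above does.

```python
text = "150"

# As ASCII text: three characters, one byte each.
ascii_bits = len(text.encode("ascii")) * 8
print(ascii_bits)  # 24

# As a binary number: 150 is 10010110, which fits in 8 bits.
number_bits = (150).bit_length()
print(format(150, "b"), number_bits)  # 10010110 8
```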

Student Instructions

How many bits?

ASCII has an encoding for every character of the alphabet as well as encodings for numbers -- that is, encodings for the symbols of the digits 0-9. So here is a trick question: How many bits are required to store the text of the number "150" in ASCII?

Student Instructions

Respond to this prompt or to another as directed by your teacher.

How long did it take to send a formatted text message using your protocol?

How many extra bits did your encoding scheme need to communicate the text? Calculate the percentage of bits that were used for formatting instead of sending the actual text message.

percent = (formatting bits)/(total bits) * 100
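
As a worked instance of this formula, here is a minimal Python sketch. It assumes 8 bits per character, and the sample message uses a made-up `<b>` tag; substitute a message written in your own protocol.

```python
def formatting_overhead_percent(full_message, plain_text, bits_per_char=8):
    """Percentage of transmitted bits spent on formatting rather than text."""
    total_bits = len(full_message) * bits_per_char
    if total_bits == 0:
        raise ValueError("message is empty")
    formatting_bits = total_bits - len(plain_text) * bits_per_char
    return formatting_bits / total_bits * 100


# 9 characters sent in total, only 2 of them are the actual text "Hi".
percent = formatting_overhead_percent("<b>Hi</b>", "Hi")
print(round(percent, 2))  # 77.78
```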

Student Instructions

What problems arose in your efforts to create a working protocol? How did you think about the problems in order to solve them?

Describe one instance in which collaboration with a partner influenced the final protocol your team produced.

Standards Alignment

CSTA K-12 Computer Science Standards (2011)

CD - Computers & Communication Devices
  • CD.L2:6 - Describe the major components and functions of computer systems and networks.
  • CD.L3A:9 - Describe how the Internet facilitates global communication.
CL - Collaboration
  • CL.L2:3 - Collaborate with peers, experts and others using collaborative practices such as pair programming, working in project teams and participating in-group active learning activities.
CT - Computational Thinking
  • CT.L2:13 - Understand the notion of hierarchy and abstraction in computing including high level languages, translation, instruction set and logic circuits.
  • CT.L2:14 - Examine connections between elements of mathematics and computer science including binary numbers, logic, sets and functions.
  • CT.L2:7 - Represent data in a variety of ways including text, sounds, pictures and numbers.
  • CT.L2:8 - Use visual representations of problem states, structures and data (e.g., graphs, charts, network diagrams, flowcharts).
  • CT.L3A:6 - Analyze the representation and trade-offs among various forms of digital information.

Computer Science Principles

2.1 - A variety of abstractions built upon binary sequences can be used to represent all digital data.
2.1.1 - Describe the variety of abstractions used to represent data. [P3]
  • 2.1.1A - Digital data is represented by abstractions at different levels.
  • 2.1.1B - At the lowest level, all digital data are represented by bits.
  • 2.1.1C - At a higher level, bits are grouped to represent abstractions, including but not limited to numbers, characters, and color.
  • 2.1.1D - Number bases, including binary, decimal, and hexadecimal, are used to represent and investigate digital data.
  • 2.1.1E - At one of the lowest levels of abstraction, digital data is represented in binary (base 2) using only combinations of the digits zero and one.
2.1.2 - Explain how binary sequences are used to represent digital data. [P5]
  • 2.1.2B - In many programming languages, the fixed number of bits used to represent characters or integers limits the range of integer values and mathematical operations; this limitation can result in overflow or other errors.
  • 2.1.2D - The interpretation of a binary sequence depends on how it is used.
  • 2.1.2E - A sequence of bits may represent instructions or data.
  • 2.1.2F - A sequence of bits may represent different types of data in different contexts.
2.2 - Multiple levels of abstraction are used to write programs or create other computational artifacts
2.2.1 - Develop an abstraction when writing a program or creating other computational artifacts. [P2]
  • 2.2.1A - The process of developing an abstraction involves removing detail and generalizing functionality.
  • 2.2.1B - An abstraction extracts common features from specific examples in order to generalize concepts.
2.2.3 - Identify multiple levels of abstractions that are used when writing programs. [P3]
  • 2.2.3E - Binary data is processed by physical layers of computing hardware, including gates, chips, and components.
3.1 - People use computer programs to process information to gain insight and knowledge.
3.1.3 - Explain the insight and knowledge gained from digitally processed data by using appropriate visualizations, notations, and precise language. [P5]
  • 3.1.3A - Visualization tools and software can communicate information about data.
  • 3.1.3E - Interactivity with data is an aspect of communicating.
3.3 - There are trade offs when representing information as digital data.
3.3.1 - Analyze how data representation, storage, security, and transmission of data involve computational manipulation of information. [P4]
  • 3.3.1A - Digital data representations involve trade offs related to storage, security, and privacy concerns.
  • 3.3.1B - Security concerns engender tradeoffs in storing and transmitting information.
6.1 - The Internet is a network of autonomous systems.
6.1.1 - Explain the abstractions in the Internet and how the Internet functions. [P3]
  • 6.1.1A - The Internet connects devices and networks all over the world.
  • 6.1.1B - An end to end architecture facilitates connecting new devices and networks on the Internet.
  • 6.1.1C - Devices and networks that make up the Internet are connected and communicate using addresses and protocols.
  • 6.1.1D - The Internet and the systems built on it facilitate collaboration.
6.2 - Characteristics of the Internet influence the systems built on it.
6.2.2 - Explain how the characteristics of the Internet influence the systems built on it. [P4]
  • 6.2.2D - Interfaces and protocols enable widespread use of the Internet.
  • 6.2.2F - The Internet is a packet-switched system through which digital data is sent by breaking the data into blocks of bits called packets, which contain both the data being transmitted and control information for routing the data.
  • 6.2.2G - Standards for packets and routing include transmission control protocol/Internet protocol (TCP/IP).
  • 6.2.2H - Standards for sharing information and communicating between browsers and servers on the Web include HTTP and secure sockets layer/transport layer security (SSL/TLS).