TCP/IP

Norm Al Dude and Professor N. Erd
on the subject of TCP/IP

Nicky Erd was surfing the Internet. From the comfort of his chair he accessed a few news items, left a message to Vice President Gore about his dislike of the new gasoline that gives him headaches, and looked up some things in the genetic engineering references in the Library of Congress. Then he yawned and switched off his computer.

I asked, "How in the world could you talk to all these computers? Aren't they all different? Does your PC know so many languages?"

He smiled condescendingly and answered with his own question. "Don't you know about TCP/IP?"

"What in the world is that? Some nerd that does translation?"

"Oh, no - it's just a suite of communication protocols that allows computers of different types to interconnect. That's what the Internet is -- a network of networks, just like the telephone network, but instead of telephones, it interconnects computers."

"And what is a suite of protocols?"

N. Erd started to explain and we ended up talking the whole day. Here is an abbreviated report - and if occasionally I start to sound like him it's only because his PhDitis is catching.

"A suite of protocols" is a bunch of programs (software) that cooperate with each other to accomplish the task of providing interconnectivity between computer processes.

IBM has developed SNA (Systems Network Architecture) and APPN (Advanced Peer-to-Peer Networking); DEC has developed DNA (Digital Network Architecture); Apple has AppleTalk; Novell uses something called SPX/IPX (Sequenced Packet Exchange/Internetwork Packet Exchange); Xerox uses XNS (Xerox Network Services) and sold it to others such as Ungermann-Bass, 3Com and Banyan (Novell SPX/IPX is also an XNS grandchild); etc., etc. They all do the same thing, but with different bits and bytes.

Well, TCP/IP is yet another one of these. And, specifically, this is the protocol suite used over the Internet.

The whole thing started in the late 1960s when the US Government, through its Advanced Research Projects Agency (ARPA), decided to set up contract work with universities and corporate research community representatives to interconnect computers worldwide in a single network.

The problem was that many research centers and many government suppliers had to send and receive data to and from government centers, while the machines involved were incompatible with each other.

In 1969 the first internetwork (ARPANET) went into operation with 4 nodes, using routing devices that allowed data packet deliveries between otherwise incompatible computers. The packet switching technique proved to be better than nothing, and in fact the worldwide X.25 standard resulted from it. However, it was not well suited for military networks (not robust enough) because it used virtual circuits that could fail. DARPA (the Defense Advanced Research Projects Agency) had to relaunch the routing project when a message to warn the USS PUEBLO of the impending North Korean attempt to seize it failed to reach the ship, which was subsequently captured. In 1974, two dudes by the names of Vinton G. Cerf and Robert E. Kahn proposed a suite of protocols named TCP/IP (Transmission Control Protocol/Internet Protocol) that proved to respond well to the following requirements:

Ability to route data between subnetworks
Independence of subnetwork technology
Independence of host computer hardware
Independence of operating system
Tolerance of any error routes in subnetworks
Robust recovery from failures
Ability to add a new subnetwork and keep going

The TCP/IP protocol suite was a real winner from the start - and it was soon integrated into the UNIX operating system. It became the internetworking technology of choice for both government and non-government networks. It is now used for Internet access and routing and, with some modifications, it may be the technology for future internetworking as well. Let's see what TCP/IP consists of.

You see, a network with networking software is solving interconnectivity problems for basically three types of applications: file transfer, data-base inquiry and response, and electronic mail. The networking software handles common tasks for such applications such as how to access the desired file, what security to provide, and how to encode and present the information.

It's like a genie who comes out of this bottle called network and says, "Master, you have three wishes - what will they be? What presentation orders do you have? (Code, encryption, compression types; printing fonts, etc.) What session security options do you want? (Password, account name or number, log-on/log-off message, etc.) And, what operation do you want executed? (File transfer, inquiry-response, e-mail)." Then the genie of the network performs these functions under the "layer" of programs called the Application Layer (also known as the Process/Application Layer) in the TCP/IP protocol suite.

Figure 1: TCP/IP protocol stack

The most well-known applications in TCP/IP are:

telnet - Remote control of a distant computer port from a local keyboard
ftp - File Transfer Protocol
SMTP - Simple Mail Transfer Protocol (e-mail)
SNMP - Simple Network Management Protocol
ping - Packet Internet Groper (network reachability test)
TFTP - Trivial File Transfer Protocol (a 'quick and dirty' way to transfer files without security and without error control)

There are many others, but these are the 'core' applications. Except for ping (which is a simple diagnostic -- it basically sends a packet and expects it echoed back from the destination to see whether the destination was reached and how long it took to get there and back), the others are client-server applications. Files are transferred to/from the server from/to the client, e-mail is sent from a client to a mail-server, and is read from the server by a recipient client, and remote control of a host port is in fact control of a telnet server port by a telnet client.

For these applications each client runs "client software" or "front-end software" while the server runs "server software" or "back-end software". The client and server application programs interact, first by invoking the opening of a session (login message, password, address or name of resource to be accessed), then by creating a transmit and a receive buffer for that session in both the client and the server and selecting the proper presentation parameters (code, encryption, compression, print formats, etc.). Once this phase is over, the "packing/unpacking" layer is called.
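For the folks who want to see a client and a server shake hands, here is a minimal sketch in Python. It is not telnet or ftp, just a toy "shout it back" service. The names run_echo_server and echo_once, the loopback address 127.0.0.1 and the uppercasing trick are all assumptions made up for illustration.

```python
import socket
import threading


def run_echo_server(listener):
    """Back-end: accept one client, read its request, answer in upper case."""
    conn, _client_address = listener.accept()
    with conn:
        request = conn.recv(1024)
        conn.sendall(request.upper())


def echo_once(message):
    """Front-end: open a session to the server, send a message, return the reply."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Port 0 asks the operating system for any free port (an assumption made
    # so this sketch never collides with a real service).
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    server = threading.Thread(target=run_echo_server, args=(listener,))
    server.start()

    # The client side: connect to the server's address and port, then talk.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(message)
        reply = client.recv(1024)

    server.join()
    listener.close()
    return reply
```

Calling echo_once(b"hello") runs both halves on one machine; on a real network the two halves would live on different computers and only the address would change.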

There are two kinds of "packing" protocols: TCP (Transmission Control Protocol) and UDP (User Datagram Protocol).

TCP is a sophisticated full-duplex (both directions of transmission are used simultaneously) protocol that chops the file to be transmitted into pieces called 'segments' that can be as small as 21 bytes and as big as 64,000 bytes. Each segment is sequenced by the sending TCP and acknowledged by the receiving TCP. The receiving TCP controls the flow of segments by allocating a 'window' of 'so many bytes' that the transmitter can send at any time. In addition, TCP can flag data as "urgent" or "to be pushed", and can negotiate maximum segment size. The segments are transmitted in sequence and checked for accuracy (with a 16-bit error checking code called a checksum), and segments that arrive damaged or go unacknowledged are retransmitted.

The TCP sequence number, acknowledgment number, checksum, destination and source ports, flags and options are placed ahead of the data in a so-called 'header'.

Figure 2: TCP header

A TCP segment consists of its header and the data attached to it. The source and destination ports are used to define the client and server session numbers.
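Here is a rough Python sketch of what filling and reading such a header might look like. It covers only the fixed 20-byte part of the header (no options) and leaves the error check and urgent pointer at zero. The function names build_tcp_header and parse_tcp_header and the dictionary keys are my own inventions, not part of any real TCP program.

```python
import struct

# Fixed 20-byte TCP header in network byte order: source port, destination
# port, sequence number, acknowledgment number, header length plus flags,
# window, checksum, urgent pointer.
TCP_HEADER = struct.Struct("!HHIIHHHH")

FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def build_tcp_header(src_port, dst_port, seq, ack, flags, window):
    """Pack the header fields into 20 bytes (checksum and urgent pointer = 0)."""
    offset_and_flags = 5 << 12  # header length: 5 words of 32 bits each
    for name in flags:
        offset_and_flags |= FLAG_BITS[name]
    return TCP_HEADER.pack(src_port, dst_port, seq, ack,
                           offset_and_flags, window, 0, 0)


def parse_tcp_header(raw):
    """Unpack the first 20 bytes of a segment into named fields."""
    (src, dst, seq, ack, offset_and_flags,
     window, checksum, urgent) = TCP_HEADER.unpack(raw[:20])
    return {
        "source_port": src,
        "destination_port": dst,
        "sequence_number": seq,
        "acknowledgment_number": ack,
        "header_length_bytes": (offset_and_flags >> 12) * 4,
        "flags": sorted(n for n, bit in FLAG_BITS.items()
                        if offset_and_flags & bit),
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
    }
```

For example, parse_tcp_header(build_tcp_header(1025, 23, 1000, 2000, ["ACK", "PSH"], 4096)) gives back the same ports, numbers, flags and window that went in.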

UDP is a very simple protocol that attaches a header to the data consisting of a destination and source port number, a length and a checksum, and sends the data one packet (datagram) at a time with no sequencing or error control.

Figure 3: UDP header
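The error check carried in the UDP and TCP headers is a 16-bit ones' complement sum. A minimal Python sketch of that sum and of a UDP header built around it follows. One loud assumption: a real UDP checksum also covers a "pseudo-header" containing the IP addresses, which this sketch leaves out to stay short. The names internet_checksum and build_udp_datagram are made up for illustration.

```python
import struct


def internet_checksum(data):
    """16-bit ones' complement of the ones' complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold any carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_udp_datagram(src_port, dst_port, payload):
    """Return header + payload; the header is ports, length and checksum."""
    length = 8 + len(payload)  # the UDP header itself is 8 bytes
    # Simplification (assumption): checksum over header and payload only,
    # computed with the checksum field set to zero, no IP pseudo-header.
    draft = struct.pack("!HHHH", src_port, dst_port, length, 0)
    checksum = internet_checksum(draft + payload)
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload
```

The receiver can run internet_checksum over the whole datagram it got; with this simplified scheme, an undamaged datagram yields 0.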

If you folks ever read nerdic books that show 'headers' and 'segments' like these figures, you might wonder what they mean. These are 'bit maps', says N. Erd. I see them as chests of drawers. For example, the TCP header contains a drawer called 'acknowledgment number', one called 'sequence number', one called 'flags', one called 'window', etc.

The bits are extracted from each drawer in the transmitter and sent one by one through the network until they reach the receiver. At that point they get stored in their respective drawer (acknowledgment bits in the 'acknowledgment' drawer, sequence bits in the 'sequence' drawer, etc.). Then the mighty session or application looks in each drawer and interprets the bits to take action based on the results of that interpretation. TCP and UDP are what they call 'transport layer protocols' in the OSI (Open Systems Interconnection) layered protocol model.

If you read what I learned about routing, you'll see that the transport layer is responsible for

Chopping user messages into packets for the purpose of error control (it's better to have to retransmit only a packet that was found in error, than the whole message)
Flow control (there is only so much space available in the receive buffer - and if it's all occupied we can stop transmission at the next packet, in the middle of the message if necessary)
Sharing (multiplexing) links (packets from user A can be traveling the same link as those from user B, C, etc., without the message of one user having to wait for the passage of an entire message of another user)

These packets (called segments in TCP/IP) are only prepared for travel by TCP and - once arrived at destination - are reassembled by the receiving TCP program into the original message. TCP then delivers the message to the proper session/application (ftp, telnet, etc.). The delivery is done through the 'port'. So, TCP is not moving data physically - it is only 'packing' and 'unpacking' it. The programs and hardware in charge of moving data are called by TCP to do the work.
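The packing and unpacking can be sketched in a few lines of Python. This is a toy, assuming each segment is just a pair of (sequence number, chunk of bytes), where the sequence number is the position of the chunk's first byte in the message. The names segment_message and reassemble are made up for illustration.

```python
def segment_message(message, max_segment_size):
    """Chop a message into (sequence_number, chunk) pairs."""
    return [(offset, message[offset:offset + max_segment_size])
            for offset in range(0, len(message), max_segment_size)]


def reassemble(segments):
    """Put segments back in order and rebuild the original message.

    Raises ValueError when the segments do not line up end to end, for
    example because one of them was lost on the way.
    """
    message = b""
    for sequence_number, chunk in sorted(segments):
        if sequence_number != len(message):
            raise ValueError(f"segments do not line up at byte {sequence_number}")
        message += chunk
    return message
```

Shuffle the list that segment_message returns and reassemble still rebuilds the message, which is the whole point of sequence numbers: the network may deliver the pieces in any order.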

The 'network layer' program is responsible for putting the packets in an envelope, writing a destination address and a source address and some special delivery options on the envelope, and then requesting the 'data link layer' and the 'physical layer' to deliver the envelopes. These lower layers are the real movers. In TCP/IP, the 'network layer' is IP (Internet Protocol). The 'envelope' that IP uses to put the TCP segments in is actually another header. The envelope with the segment in it is called an 'IP Datagram'.

Figure 4: IP header

The IP header (chest of drawers) contains

Destination and source IP addresses
A 'type of service' which shows how this particular datagram is to be delivered (over which type of media - fiber, satellite, etc., and how fast)
A checksum (error checking code) for the header (IP is unreliable - it does not check data, only the header information is checked, and that's because if addresses are corrupted the datagram might fall into the wrong hands)
Information regarding fragmentation of this datagram - sometimes large IP datagrams (maximum 65,535 bytes) cannot be sent over some parts of the network that can only handle shorter segments or fragments, so the datagram is fragmented by the transmitter (or by a router along the way) and reassembled by the receiver (see Fig. 5)
Various options regarding this datagram, including how to route it, how to identify it (security labeling), how to trace the places through which it passes, how to time-stamp it for delay measurement, etc.
Finally, there are some other nerdic pieces of information such as its length, header length, software version, unique identifier, and a 'protocol' number which shows whether the segment in the datagram is TCP data or a UDP packet or some other type of datagram.
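To open the drawers of this chest, here is a minimal Python sketch that unpacks the fixed 20 bytes of an IP (version 4) header. The function name parse_ip_header and the dictionary keys are my own labels, and the sketch ignores options entirely.

```python
import socket
import struct

# Fixed 20-byte IPv4 header: version/header length, type of service, total
# length, identification, flags/fragment offset, time to live, protocol,
# header checksum, source address, destination address.
IP_HEADER = struct.Struct("!BBHHHBBH4s4s")

PROTOCOL_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}


def parse_ip_header(raw):
    """Unpack the first 20 bytes of a datagram into named fields."""
    (version_ihl, tos, total_length, identification, flags_fragment,
     ttl, protocol, checksum, source, destination) = IP_HEADER.unpack(raw[:20])
    return {
        "version": version_ihl >> 4,
        "header_length_bytes": (version_ihl & 0x0F) * 4,
        "type_of_service": tos,
        "total_length": total_length,
        "identification": identification,
        "more_fragments": bool(flags_fragment & 0x2000),
        "fragment_offset_bytes": (flags_fragment & 0x1FFF) * 8,
        "time_to_live": ttl,
        "protocol": PROTOCOL_NAMES.get(protocol, str(protocol)),
        "header_checksum": checksum,
        "source": socket.inet_ntoa(source),
        "destination": socket.inet_ntoa(destination),
    }
```

Feeding it the 20 bytes 45 00 00 73 00 00 40 00 40 11 b8 61 c0 a8 00 01 c0 a8 00 c7 (hexadecimal) shows a version 4 datagram of 115 bytes carrying UDP from 192.168.0.1 to 192.168.0.199.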

Figure 5: Fragmentation
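Fragmentation is mostly arithmetic, so a small Python sketch may help. It assumes a 20-byte IP header on every fragment and uses the rule that fragment offsets are counted in units of 8 bytes. The names fragment_payload and reassemble_fragments, and the dictionary layout of a fragment, are made up for illustration.

```python
def fragment_payload(payload, mtu, header_length=20):
    """Split a datagram's data into pieces that fit the link's maximum size."""
    # Each fragment's data (except the last) must be a multiple of 8 bytes,
    # because the offset drawer in the header counts in 8-byte units.
    max_data = (mtu - header_length) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    for start in range(0, len(payload), max_data):
        fragments.append({
            "offset_units": start // 8,
            "more_fragments": start + max_data < len(payload),
            "data": payload[start:start + max_data],
        })
    return fragments


def reassemble_fragments(fragments):
    """Receiver side: sort by offset and glue the data back together."""
    ordered = sorted(fragments, key=lambda fragment: fragment["offset_units"])
    return b"".join(fragment["data"] for fragment in ordered)
```

With 4,000 bytes of data and a link that handles 1,500 bytes at most, this gives three fragments of 1,480, 1,480 and 1,040 bytes with offsets 0, 185 and 370; only the last one has its 'more fragments' flag off.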

The way this stuff works is just like the routers article explains it. In the originating host, TCP or UDP segments the message and hands over the segments to IP. IP makes the segments into datagrams by writing the IP headers, then sends the datagrams to a 'default router' (which the TCP/IP nerds call a gateway). Each router examines each datagram's IP header and compares the destination IP address to that of the network under its supervision. If the addresses match, then the router admits the packet and sends it on the network to the destination host. If not, the router looks in some tables to find the next hop router (post office) where to send (route) the datagram. The last router (the one that matched the destination IP address to its network) has to physically deliver the datagram to the destination IP address host. Since there is absolutely no correspondence between an IP address and a MAC (Medium Access Control) physical address, the router has to have a table like a directory that shows which MAC address corresponds to which IP address. This table is called an Address Resolution Protocol (ARP) table. If the host's address is not in the table, the router will send an "ARP" packet asking "Host IP address - what's your MAC address, buddy?" The host responds (or some proxy for that host), and then the router sends the IP datagram to the physical MAC address found in the response.
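The ARP table lookup described above can be sketched as a Python dictionary with a fallback. Everything here is hypothetical: the function resolve_mac, the pretend LAN in HOSTS_ON_LAN, the MAC address, and fake_arp_request, which stands in for really broadcasting an ARP packet.

```python
def resolve_mac(ip_address, arp_table, send_arp_request):
    """Return the MAC address for ip_address, asking the LAN on a table miss."""
    if ip_address not in arp_table:
        # Not in the directory yet: ask "what's your MAC address, buddy?"
        # and remember the answer for next time.
        arp_table[ip_address] = send_arp_request(ip_address)
    return arp_table[ip_address]


# A pretend LAN (assumption): which MAC address answers for which IP address.
HOSTS_ON_LAN = {"190.21.23.41": "00:00:0c:12:34:56"}
requests_sent = []


def fake_arp_request(ip_address):
    """Stand-in for a real ARP broadcast; records that a request went out."""
    requests_sent.append(ip_address)
    return HOSTS_ON_LAN[ip_address]
```

The first call to resolve_mac("190.21.23.41", table, fake_arp_request) sends one request and fills the table; a second call for the same address is answered from the table without bothering the network.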

Figure 6: Routing flow

1.2 attaches network header to data. DA = 3.4, SA = 1.2.
1.2 sends packet over token ring to default router 1.5.
Router accepts and ACKs the packet.
Router examines destination network number (3), looks in routing tables for best path, drops token ring envelope and builds a WAN envelope, then sends packet over WAN.
Packet is sent from router to router via best path according to routing tables, based on destination network number.
Router 3.5 recognizes DA as its own, drops WAN envelope and builds Ethernet envelope, and places packet on LAN.
3.4 recognizes the DA as its own, drops the Ethernet frame, and forwards the data to upper layers.
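The steps above can be sketched as a toy Python program using the same short NETWORK.HOST addresses as the figure. The router 2.1 in the middle of the WAN, the two tables, and the name trace_route are assumptions I made up; real routers keep much richer tables.

```python
def trace_route(destination, first_router, routing_tables, attached_networks,
                max_hops=16):
    """Return the list of routers a packet visits on its way to destination."""
    destination_network = destination.split(".")[0]
    path = [first_router]
    router = first_router
    # Keep forwarding until a router finds the destination network is its own.
    while destination_network not in attached_networks[router]:
        if len(path) > max_hops:
            raise RuntimeError("too many hops, routing loop suspected")
        router = routing_tables[router][destination_network]
        path.append(router)
    return path


# Hypothetical topology: which network each router delivers to locally...
ATTACHED_NETWORKS = {"1.5": {"1"}, "2.1": {"2"}, "3.5": {"3"}}
# ...and, per router, the next hop for each remote destination network.
ROUTING_TABLES = {
    "1.5": {"3": "2.1"},
    "2.1": {"3": "3.5"},
    "3.5": {},
}
```

With these tables, a packet from 1.2 to 3.4 handed to default router 1.5 visits 1.5, then 2.1, then 3.5, which recognizes network 3 as its own and delivers the packet on its LAN.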

Now that we've explained all these things, let me show you what an IP address looks like (it's all very nerdic!).

An IP address is made of four decimal numbers, each between 0 and 255, separated by dots. Like 190.21.23.41. Some of the numbers are special (like 0.0.0.0 or 255.255.255.255) and are used to designate the default gateway, a broadcast or multicast address, or some reserved numbers for the nerds to play with.

A part of the address designates the network numbers, and the remaining part designates the host number. So, we may say an IP address has the format NETWORK.HOST. You see, computers don't like decimals - they like binary - so the neatly decimal numbers shown above are really transformed into binary by some magic that some N. Erd put in the computer's mind. A decimal number between 0 and 255 can be expressed in binary by using 8 binary digits (bits). For example a 5 would be 00000101, a 10 would be 00001010, a 254 would be 11111110, etc. I am too scared of these things to teach you, so take N. Erd's word for it and don't argue - or else you're doomed to do math. So an IP address, having 4 decimal numbers, needs 4 x 8 = 32 bits.
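For those who would rather let the computer do the math, here is a small Python sketch of the decimal-to-binary magic. The function name ip_to_bits is made up for illustration.

```python
def ip_to_bits(address):
    """Write a dotted decimal IP address as four groups of 8 bits."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not a valid IP address: {address}")
    # format(n, "08b") writes n in binary, padded with zeros to 8 digits.
    return ".".join(format(octet, "08b") for octet in octets)
```

For the example address, ip_to_bits("190.21.23.41") returns "10111110.00010101.00010111.00101001", which is 32 bits in all once you drop the dots.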

Traditionally, the conventions are that there are three types of networks.

Class A networks use the first octet (8 bits) to designate the network, with the first (high-order) bit set to 0. The last three octets designate the host. So the Class A network addresses are 1.H.H.H to 126.H.H.H, where H is used to designate the host address octets (0 and 127 are reserved, 127 for loopback testing). Class A addressing allows for 126 networks.
Class B networks use two octets to designate the network and two to designate the host. The network part must begin with 10. The Class B networks are 128.x.H.H to 191.x.H.H, where x is any number between 0 and 255. There can be 16,384 Class B networks.
Class C networks use three octets to designate the network and only one to designate the host. The network part begins with 110. The Class C networks are 192.x.x.H to 223.x.x.H, which allows for 2,097,152 networks.

As you can see, there are very few Class A networks, but each of them can accommodate millions of hosts. A Class B network supports only 65,534 hosts, while Class C only 254 hosts (host numbers made of all 0s or all 1s are not allowed).

There is also a Class D address (starts with 1110) used for multicasting and a Class E (1111) address reserved for the nerds.
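The class of an address can be read off its first octet, and the counts quoted above are just powers of two. A minimal Python sketch, with the made-up names address_class, networks_in_class and hosts_per_network:

```python
def address_class(address):
    """Return 'A', 'B', 'C', 'D' or 'E' from the leading bits of the address."""
    first_octet = int(address.split(".")[0])
    if first_octet < 128:   # leading bit 0
        return "A"
    if first_octet < 192:   # leading bits 10
        return "B"
    if first_octet < 224:   # leading bits 110
        return "C"
    if first_octet < 240:   # leading bits 1110
        return "D"
    return "E"              # leading bits 1111


# Network bits left after the fixed leading bits, and host bits, for B and C.
NETWORK_BITS = {"B": 14, "C": 21}
HOST_BITS = {"A": 24, "B": 16, "C": 8}


def networks_in_class(cls):
    return 2 ** NETWORK_BITS[cls]


def hosts_per_network(cls):
    # Host numbers of all 0s and all 1s are not allowed, hence the minus 2.
    return 2 ** HOST_BITS[cls] - 2
```

So address_class("190.21.23.41") is "B", networks_in_class("B") is 16,384 and hosts_per_network("C") is 254.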

With the proliferation of PCs and the Internet growing like bread with too much yeast, they are running out of addresses - and therefore some solutions are proposed to conserve address space, and even to change the system. But - so far - most existing routers work on the 'Class' assumption.

So, if you're given one IP address, how can you route packets that come from outside your local network to a specific destination in one office or department so that no other office or department sees those packets? The key is, you have to split your network into smaller parts called subnetworks, each with a router in charge, and each identified by a group of bits from within the host portion of the big IP address.

The problem is, how does a router know how many bits from the host portion are designating the subnetwork? This is solved by the nerds with something called a 'mask'. A mask is a binary 32-bit number that has a '1' in all bit positions that have to be examined in a packet IP address and a '0' in all bit positions that don't have to be examined, to determine what is the 'network' portion of the address.

Figure 7: Subnetworking and masking

You see, when a packet is to be allowed into the network or subnetwork by a router, the router has to find if the network portion of the datagram address coincides with its own. To do so, it looks at the datagram's address through a window partially covered by a curtain. The curtain covers the 'host' portion and lets the router see only the 'network' portion. The curtain is made of '0's, while the uncovered glass window is made of '1's. This way, the router can see through the 1's, the network bits, but cannot see through the 0's, the host bits. For example, if a company has a class B address but wants to divide its local network into 8 subnets, three additional bits are required to designate the subnet (2^3 = 8). In this case, the mask would have to have 32 bits with the first 19 equal to '1' and the last 13 equal to '0' (a class B network requires 16 bits, plus three for the subnetwork). So the mask for this network would be 11111111.11111111.11100000.00000000, or, according to N. Erd, 255.255.224.0.

Each subnetwork can have 2^13-2 (= 8,190) hosts in such a network (all 0s and all 1s are used for broadcasting and other dark purposes).
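The mask arithmetic can be checked with Python's standard ipaddress module. This is a sketch under assumptions: the class B network 190.21.0.0 is borrowed from the earlier example address, and the function names subnet_plan and same_subnet are made up.

```python
import ipaddress


def subnet_plan(network, subnet_bits):
    """Return (mask, number of subnets, hosts per subnet) for a split network."""
    parent = ipaddress.ip_network(network)
    # Borrow subnet_bits bits from the host portion to number the subnets.
    subnets = list(parent.subnets(prefixlen_diff=subnet_bits))
    first = subnets[0]
    # All-0s and all-1s host numbers are reserved, hence the minus 2.
    return str(first.netmask), len(subnets), first.num_addresses - 2


def same_subnet(address_a, address_b, mask):
    """Look at both addresses through the curtain and compare what shows."""
    mask_bits = int(ipaddress.ip_address(mask))
    a_bits = int(ipaddress.ip_address(address_a))
    b_bits = int(ipaddress.ip_address(address_b))
    return (a_bits & mask_bits) == (b_bits & mask_bits)
```

Here subnet_plan("190.21.0.0/16", 3) returns ("255.255.224.0", 8, 8190), matching the numbers worked out by hand. With that mask, 190.21.23.41 and 190.21.30.7 fall in the same subnet, while 190.21.40.7 does not.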

Folks, other TCP/IP protocols deal with error reporting (ICMP - Internet Control Message Protocol), building routing tables (RIP - Routing Information Protocol or OSPF - Open Shortest Path First), finding physical addresses associated with logical IP addresses (ARP - Address Resolution Protocol), establishing connection to other networks (PPP - Point-to-Point Protocol, EGP - Exterior Gateway Protocol), and finding IP addresses associated with names/directory services (DNS - Domain Name System), etc.

But I think you've had enough of this nonsense - and, frankly, so have I. Keep in mind, if you're reading this tutorial now, you navigated the Internet to here using TCP/IP!

You may want to look at some other publications that offer more information on TCP/IP and related topics.

So long, folks.

Yours,
Norm Al Dude
