TCP

General

Introduction

IP looks after the Network layer, routing packets from one host to another for particular services. The next layer up is the Transport layer, responsible for transferring data from one host to the other, looking after reliability, re-transmission of packets (if appropriate), flow control, etc.

There are many protocols in use at this layer. Under Unix/Linux the file /etc/protocols contains a list of over 100. TCP (Transmission Control Protocol) is one of the two principal protocols in TCP/IP networking. The other, UDP, is dealt with in the next chapter.

TCP is connection oriented, meaning that it maintains a connection between two hosts and that data flows sequentially along this connection, so it is delivered in order. TCP also has a mechanism to ensure that packets are acknowledged, and they are re-transmitted if lost. It also exercises flow and congestion control, for example slowing the rate at which packets are sent when the network is under load.

Port addresses

TCP and many other protocols use the concept of a port to identify a service. Both ends of a TCP connection have a port number, in the range 0 - 65535. A service provider will usually have a "well known" port, and on Unix/Linux these are listed in the file /etc/services. Examples are 80 for HTTP, 443 for HTTPS, 22 for SSH and 21 for FTP control. Ports in the range 0 - 1023 are usually reserved for system services. Clients will typically have a "transient" (or ephemeral) port chosen by the O/S. IANA suggests the range 49152 - 65535 for these, although some systems use a different range (Linux defaults to 32768 - 60999).
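The two port numbers of a connection can be inspected from a program. The following is a minimal Python sketch, assuming a loopback interface at 127.0.0.1 is available. It asks the O/S for any free port for the listener (port 0), connects a client to it, and prints the transient port the O/S picked for the client. The actual numbers will differ from run to run.

```python
import socket

# Port 0 asks the operating system to pick any free port for the listener.
server = socket.create_server(("127.0.0.1", 0))
server_port = server.getsockname()[1]

# The client does not choose its own port: the O/S assigns a transient one.
client = socket.create_connection(("127.0.0.1", server_port))
conn, peer = server.accept()

client_port = client.getsockname()[1]
print("server port:", server_port)
print("client port:", client_port)
print("server sees the client as:", peer)

for s in (client, conn, server):
    s.close()
```

A real service would bind to its well known port instead of port 0 so that clients know where to find it.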

Sockets

The programming abstraction for TCP is that of a socket. A socket represents a host IP address and a port number. Of course, there is more to it than that - a socket has state, such as being connected to another socket, and is used to send and receive packets.

The example used below is of an echo client/server. The client reads lines from the console and writes to the socket. The server reads each line from its socket and writes the line back to the client which prints it.

Pseudocode for the client is


connect a socket to the server
do
    read a line from the console
    write the line to the socket
    read the line back from the socket
    write the line to the console
until line == "BYE"
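
The client pseudocode above might look like this in Python. This is only a sketch under some assumptions: the function name echo_client and its parameters are made up for illustration, lines are taken from any iterable (such as sys.stdin) so the function is easy to try out, and the text is assumed to be UTF-8.

```python
import socket
import sys

def echo_client(host, port, lines, out=sys.stdout):
    """Send each line to the echo server and write the reply to out."""
    with socket.create_connection((host, port)) as sock, \
            sock.makefile("r", encoding="utf-8", newline="\n") as reader, \
            sock.makefile("w", encoding="utf-8", newline="\n") as writer:
        for line in lines:
            line = line.rstrip("\n")
            writer.write(line + "\n")
            writer.flush()              # push the line out to the socket now
            reply = reader.readline()   # wait for the echoed line
            if not reply:               # empty string: server closed the connection
                break
            out.write(reply)
            if line == "BYE":
                break
```

To read from the console, call echo_client("localhost", 2000, sys.stdin), where port 2000 is just an assumed port on which an echo server is listening.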
      

Pseudocode for the server is


while true
    accept an incoming socket
    do
        read a line from the socket
        if the line is empty
            break
        write the line back to the socket
    until line == "BYE"
    close the socket
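
A Python sketch of the server follows. It is one possible reading of the pseudocode, not the only one: it keeps reading lines on each accepted connection until the client sends BYE or disconnects, so that it matches the looping client, and it handles one client at a time. The names handle_client and serve are made up for illustration.

```python
import socket

def handle_client(conn):
    """Echo lines back to one client until it sends BYE or disconnects."""
    with conn, conn.makefile("rb") as reader:
        for raw in reader:              # one iteration per line received
            conn.sendall(raw)           # write the same bytes straight back
            if raw.strip() == b"BYE":
                break
    # leaving the with block closes the socket

def serve(listener):
    """Accept clients one at a time, forever."""
    while True:
        conn, _addr = listener.accept()
        handle_client(conn)
```

To run it, call serve(socket.create_server(("", 2000))), where port 2000 is again only an assumed port. The bytes are echoed without being decoded, so the server does not need to know the client's text encoding.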
      

Data format

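The data sent over a TCP connection is binary data, a sequence of bytes. An IP packet (including headers) can be at most 65535 bytes, but packets are more typically constrained by the MTU of the network, usually about 1500 bytes. Data larger than that is split across multiple packets, and the receiving TCP stack reassembles them in order. The API then presents a continuous stream of bytes: a single read may return only part of what the sender wrote, so a program that needs a whole message must keep reading until it has all of it.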
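Because a read on a TCP socket may return fewer bytes than were asked for, code that expects a fixed amount of data usually loops. The helper below is a common idiom rather than part of any library. The name recv_exactly is made up for illustration.

```python
import socket

def recv_exactly(sock, n):
    """Read exactly n bytes from a stream socket, however they were split up."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))   # may return fewer bytes than requested
        if not chunk:                     # empty result: the peer closed the connection
            raise ConnectionError("connection closed before all data arrived")
        buf += chunk
    return bytes(buf)
```

The echo example avoids the problem in a different way, by treating a newline as the end of each message.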

Stevens recommends against sending binary data without extra work: hosts may be little-endian or big-endian, some may have 32-bit integers while others have 64-bit ones, and some languages will add padding to data structures. Managing binary data is discussed in the next chapter.
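As a small illustration of the byte order problem, Python's struct module can pack an integer with an explicit size and byte order. The sketch below uses network (big-endian) byte order for a 32-bit unsigned integer. The function names are made up for illustration.

```python
import struct

def encode_u32(n):
    """Pack n as a 32-bit unsigned integer in network (big-endian) byte order."""
    return struct.pack("!I", n)

def decode_u32(data):
    """Unpack 4 bytes in network byte order back into an integer."""
    return struct.unpack("!I", data)[0]

print(encode_u32(1))            # b'\x00\x00\x00\x01'
print(struct.pack("<I", 1))     # little-endian layout: b'\x01\x00\x00\x00'
print(decode_u32(encode_u32(1)))
```

If both ends agree on the size and byte order, the host's own integer layout no longer matters.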

Text has many complexities too: character sets for different languages, different encodings and so on. This is discussed in a later chapter too.

For this example, we are sending data from a client to a server which doesn't attempt to do anything with it, and just sends it back unchanged to the client that sent it. So the complexities don't arise here and we just look at how to establish a connection and transfer data between the two.

Dual stack servers

Once there were only IPv4 hosts. At some mythical time in the future there may be only IPv6 hosts. In the meantime, there are dual stack hosts running both IPv4 and IPv6 stacks. Can IPv4 talk to IPv6 or vice versa? There is no way of squashing an IPv6 address into an IPv4 one, but there is a standard way of converting an IPv4 address into an IPv6 one (the IPv4-mapped form ::ffff:a.b.c.d). Most dual stack systems can do this automatically.
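The conversion can be tried out with Python's ipaddress module. The sketch below builds the IPv4-mapped IPv6 address for an IPv4 address and then recovers the original. The function name to_ipv4_mapped is made up for illustration, and 192.0.2.1 is a documentation address, not a real host.

```python
import ipaddress

def to_ipv4_mapped(ipv4_text):
    """Return the IPv4-mapped IPv6 address for a dotted IPv4 address."""
    v4 = ipaddress.IPv4Address(ipv4_text)
    # The mapped form is 80 zero bits, 16 one bits, then the 32-bit IPv4 address.
    return ipaddress.IPv6Address((0xFFFF << 32) | int(v4))

mapped = to_ipv4_mapped("192.0.2.1")
print(mapped)               # the IPv6 form of the address
print(mapped.ipv4_mapped)   # recovers the IPv4 address
```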

On a dual stack host, both IPv4 and IPv6 clients can talk to an IPv6 server. The IPv4 addresses are converted by the receiving O/S into IPv6 addresses and handled by the IPv6 server. The programming language APIs don't always make it clear how they are managing this, though.
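Python is one language where the choice is explicit. The sketch below asks for a single IPv6 listening socket that also accepts IPv4 clients, and falls back to an IPv4-only listener if the host cannot provide one. The function name make_listener is made up for illustration, and the fallback policy is an assumption rather than a recommendation.

```python
import socket

def make_listener(port):
    """Listen on IPv6 and, where the O/S allows it, on IPv4 as well."""
    if socket.has_dualstack_ipv6():
        try:
            # One IPv6 socket that also accepts IPv4 clients. They show up
            # with IPv4-mapped peer addresses such as ::ffff:192.0.2.1.
            return socket.create_server(
                ("", port), family=socket.AF_INET6, dualstack_ipv6=True)
        except OSError:
            pass    # IPv6 is compiled in but unusable on this host
    # Fall back to an IPv4-only listener.
    return socket.create_server(("", port))
```

An IPv4 client connecting to such a listener is reported with either a plain IPv4 peer address or its IPv4-mapped IPv6 form, depending on which branch was taken.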


Copyright © Jan Newmarch, jan@newmarch.name
Creative Commons License
"Network Programming using Java, Go, Python, Rust, JavaScript and Julia" by Jan Newmarch is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://jan.newmarch.name/NetworkProgramming/.
