There are many kinds of networks in the world. These range from the very old such as serial links, through to wide area networks made from copper and fibre, to wireless networks of various kinds, both for computers and for telecommunications devices such as phones. These networks obviously differ at the physical link layer, but in many cases they also differed at higher layers of the OSI stack.
Over the years there has been a convergence to the "internet stack" of IP and TCP/UDP. For example, Bluetooth defines physical layers and protocol layers, but on top of that is an IP stack so that the same internet programming techniques can be employed on many Bluetooth devices. Similarly, developing 4G wireless phone technologies such as LTE (Long Term Evolution) will also use an IP stack.
While IP provides the networking layer 3 of the OSI stack, TCP and UDP deal with layer 4. These are not the final word, even in the interenet world: SCTP has come from the telecommunications to challenge both TCP and UDP, while to provide internet services in interplanetary space requires new, under development protocols such as DTN. Nevertheless, IP, TCP and UDP hold sway as principal networking technologies now and at least for a considerable time into the future. Each langauge has some measure of support for this style of programming.
This chapter discusses the IP layer as this is fundamental to all IP networking programs.
The OSI model was devised using a committee process wherein the standard was set up and then implemented. Some parts of the OSI standard are obscure, some parts cannot easily be implemented, some parts have not been implemented.
The TCP/IP protocol was devised through a long-running DARPA project. This worked by implementation followed by RFCs (Request For Comment). TCP/IP is the principal Unix networking protocol. TCP/IP = Transmission Control Protocol/Internet Protocol.
The TCP/IP stack is shorter than the OSI one:
TCP is a connection-oriented protocol, UDP (User Datagram Protocol) is a connectionless protocol.
The IP layer provides a connectionless and unreliable delivery system. It considers each datagram independently of the others. Any association between datagrams must be supplied by the higher layers.
The IP layer supplies a checksum that includes its own header. The header includes the source and destination addresses.
The IP layer handles routing through an Internet. It is also responsible for breaking up large datagrams into smaller ones for transmission and reassembling them at the other end.
In order to use a service you must be able to find it. The Internet uses an address scheme for devices such as computers so that they can be located. This addressing scheme was originally devised when there were only a handful of connected computers, and very generously allowed upto 2^32 addresses, using a 32 bit unsigned integer. These are the so-called IPv4 addresses. In recent years, the number of connected (or at least directly addressable) devices has threatened to exceed this number, and so "any day now" we will switch to IPv6 addressing which will allow upto 2^128 addresses, using an unsigned 128 bit integer. The changeover is most likely to be forced by emerging countries, as the developed world has already taken nearly all of the pool of IPv4 addresses.
The address is a 32 bit integer which gives the IP address. This addresses down to a network interface card on a single device. The address is usually written as four bytes in decimal with a dot '.' between them, as in "127.0.0.1" or "184.108.40.206".
The IP address of any device is generally composed of two parts: the address of the network in which the device resides, and the address of the device within that network. Once upon a time, the split between network address and internal address was simple and was based upon the bytes used in the IP address.
This scheme doesn't work well if you want, say, 400 computers on a network. 254 is too small, while 65,536 (-2) is too large. In binary arithmetic terms, you want about 512 (-2). This can be achieved by using a 23 bit network address and 9 bits for the device addresses. Similarly, if you want upto 1024 (-2) devices, you use a 22 bit network address and a 10 bit device address.
Given an IP address of a device, and knowing how many bits N are used for the network address gives a relatively straightforward process for extracting the network address and the device address within that network. Form a "network mask" which is a 32-bit binary number with all ones in the first N places and all zeroes in the remaining ones. For example, if 16 bits are used for the network address, the mask is 11111111111111110000000000000000. It's a little inconvenient using binary, so decimal bytes are usually used. The netmask for 16 bit network addresses is 255.255.0.0, for 24 bit network addresses it is 255.255.255.0, while for 23 bit addresses it would be 255.255.254.0 and for 22 bit addresses it would be 255.255.252.0.
Then to find the network of a device, bit-wise AND it's IP address with the network mask, while the device address within the subnet is found with bit-wise AND of the 1's complement of the mask with the IP address.
The internet has grown vastly beyond original expectations. The initially generous 32-bit addressing scheme is on the verge of running out. There are unpleasant workarounds such as NAT addressing, but eventually we will have to switch to a wider address space. IPv6 uses 128-bit addresses. Even bytes becomes cumbersome to express such addresses, so hexadecimal digits are used, grouped into 4 digits and separated by a colon ':'. A typical address might be 2002:c0e8:82e7:0:0:0:c0e8:82e7.
These addresses are not easy to remember! DNS will become even more important. There are tricks to reducing some addresses, such as eliding zeroes and repeated digits. For example, "localhost" is 0:0:0:0:0:0:0:1, which can be shortened to ::1
Each address is divided into three components: the first is the network address used for internet routing. My ISP for example gives me a 56 bit network address for my home network. Within that, I have 16 bits in which to create subnets. Most homes for example will only have a single subnet. The last part is the device component, of 64 bits, often based on a hosts MAC address, but not necessarily.
IPv6 can be unicast or multicast. Unicast addresses are primarily of three types
fe80::c474:4605:44af:462c%eth0They are in the range fe80::10
For users, working with IP addresses is too difficult.
Consequently, most hosts are given a host name
These names are much easier for users to work with.
However, the names must be
resolved to IP addresses for most
network functions. The resolver may be a list
of hard-coded name-address pairs, but much more
common is to use the Domain Name Service (DNS).
This is a highly distributed service that maps names to IP
addresses, and sometimes IP addresses back to names.
We won't go into any of the details of DNS, but most of the rest of this chapter is concerned with how each language uses DNS services to get IP addresses from host names.
The world no longer accepts ASCII as the 'only' text encoding. An increasing number of organisations prefer to work in their own language such as Greek, Arabic, Thai, etc. IDN (Internationalized Domain Names) allows host names to be in any language. But DNS won't accept most of them and they have to be encoded into ASCII for lookup services to find them.
The actual domain name registered is the ASCII name, and s/w has to convert the language-specific name to the ASCII version. This will be discussed in the chapter on Text as it is a complex issue.
Copyright © Jan Newmarch, firstname.lastname@example.org
Based on a work at https://jan.newmarch.name/NetworkProgramming/ .