Messages passed between clients and servers contain an ordered sequence of bytes. In the last chapter we looked at encoding the string types of various languages into one byte-sequence form, the UTF-8 encoding.
But languages generally have a much wider range of data types: primitive types such as integers of various sizes, structures or records, arrays, maps and so on. On any host, these constructs will be represented at runtime as bytes in memory, but there isn't any obvious way of transferring these between hosts, particularly if the programs on each were written in different languages, since byte order, sizes and memory layout can all differ.
What is required is serialization and deserialization: converting the data objects of a programming language into a sequence of bytes, and vice versa. These operations are also known as marshalling and unmarshalling, although some may claim there is a subtle difference (which we shall ignore).
Serialization just encodes and decodes data to and from a sequence of bytes. RPC (remote procedure call) transfers these bytes across the network along with an operation to be executed remotely on that data, and returns a result. RPC will be discussed in a later chapter.
There are two principal serialization methods: encoding the data as text, or encoding it directly as bytes (a binary format).
The first has the advantage of readability: you can 'read' the data using tools such as wireshark, send it using telnet as a client, or send and receive it as a server using netcat. But it is wasteful of space: a one-byte integer can still take several bytes as a string (the value 255 takes three characters). It also has to be encoded as bytes for transmission and decoded on receipt.
Examples of text serialization include XML, JSON and YAML.
A byte format is much more efficient, but it is much harder to debug and to make sense of the packets. Examples of direct binary formats include CDR, CBOR, Protocol Buffers, XDR and many others.
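To put numbers on the space comparison, here is a small Python sketch that serializes the same integers as decimal text and as fixed-width big-endian binary using the standard `struct` module. The widths chosen (one byte for 255, four bytes otherwise) are assumptions for illustration. Note that text also needs some delimiter to mark where a number ends, which this sketch leaves out, and that for very small values such as 7 the text form (one byte) can even be shorter than a four-byte binary field.

```python
import struct

def to_text(n: int) -> bytes:
    # Text serialization: the decimal digits of n, encoded as UTF-8 (plain ASCII here)
    return str(n).encode("utf-8")

def to_bytes(n: int, fmt: str) -> bytes:
    # Byte serialization: fixed-width binary, big-endian ("network byte order").
    # fmt is a struct format code: "B" = 1-byte unsigned, "i" = 4-byte signed
    return struct.pack(">" + fmt, n)

samples = [(255, "B"), (1_000_000, "i"), (-2_000_000_000, "i")]
for n, fmt in samples:
    text, binary = to_text(n), to_bytes(n, fmt)
    # Both forms must round-trip back to the same value
    assert int(text.decode("utf-8")) == n
    assert struct.unpack(">" + fmt, binary)[0] == n
    print(f"{n}: text {len(text)} bytes, binary {len(binary)} bytes")
```

For these samples the text form takes 3, 7 and 11 bytes, against 1, 4 and 4 bytes for the binary form.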
The format of messages sent by one host to another must be understandable by the other. One way is for the format to be self-describing. For example, to send an int, it could be prefixed by a type indicator, as in
Of course, the receiver has to understand this formatting language!
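A minimal sketch of such a self-describing encoding, in Python. The tag values `TAG_INT` and `TAG_STR` and the length-prefixed string layout are hypothetical, invented for this example rather than taken from any real format. The point is that the receiver reads the tag first and only then knows how to interpret the bytes that follow.

```python
import struct

# Hypothetical type tags for this sketch (not from any real wire format)
TAG_INT = 0x01   # followed by a 4-byte big-endian signed integer
TAG_STR = 0x02   # followed by a 4-byte length, then that many UTF-8 bytes

def encode(value) -> bytes:
    if isinstance(value, bool):
        raise TypeError("bool is not supported in this sketch")
    if isinstance(value, int):
        return struct.pack(">Bi", TAG_INT, value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        return struct.pack(">BI", TAG_STR, len(data)) + data
    raise TypeError(f"unsupported type: {type(value).__name__}")

def decode(buf: bytes, offset: int = 0):
    """Decode one value starting at offset; return (value, next_offset)."""
    tag = buf[offset]
    offset += 1
    if tag == TAG_INT:
        (n,) = struct.unpack_from(">i", buf, offset)
        return n, offset + 4
    if tag == TAG_STR:
        (length,) = struct.unpack_from(">I", buf, offset)
        offset += 4
        return buf[offset:offset + length].decode("utf-8"), offset + length
    raise ValueError(f"unknown type tag: {tag:#04x}")

# The receiver needs no prior knowledge of the types: each value carries its own tag
message = encode(42) + encode("héllo")
pos, values = 0, []
while pos < len(message):
    value, pos = decode(message, pos)
    values.append(value)
print(values)  # [42, 'héllo']
```

The cost of this flexibility is an extra tag byte per value, and both sides must still agree on what each tag means.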
Examples include JSON, and XML with XML Schema data types.
The main alternative is to have a specification language for data types, in which every data structure used must be written. A compiler then translates this specification into code for the target programming languages to serialize and deserialize the data. As long as both hosts use code generated from the same specification by compliant compilers, they can understand the data.
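By contrast, with a specification language no type tags need to travel on the wire. The sketch below assumes a hypothetical specification written in a C-like notation and shows, hand-written in Python, the sort of code a compiler might generate from it: both sides simply write and read the fields in the agreed order and sizes. It is not the output of any real tool, and unlike XDR it does not pad strings to a multiple of four bytes.

```python
import struct
from dataclasses import dataclass

# Hypothetical specification, shared by sender and receiver:
#
#     struct Point {
#         int    x;      /* 4-byte signed, big-endian */
#         int    y;      /* 4-byte signed, big-endian */
#         string label;  /* 4-byte length, then UTF-8 bytes */
#     };
#
# Below is the kind of code a compiler might generate from it (written by hand here).

@dataclass
class Point:
    x: int
    y: int
    label: str

    def serialize(self) -> bytes:
        data = self.label.encode("utf-8")
        # No type tags: field order and sizes come from the specification
        return struct.pack(">iiI", self.x, self.y, len(data)) + data

    @classmethod
    def deserialize(cls, buf: bytes) -> "Point":
        x, y, length = struct.unpack_from(">iiI", buf, 0)
        label = buf[12:12 + length].decode("utf-8")
        return cls(x, y, label)

p = Point(3, -4, "origin offset")
wire = p.serialize()
print(len(wire), Point.deserialize(wire) == p)  # 25 True
```

If the receiver's code were generated from a different version of the specification, it would silently misread these bytes, which is why both hosts must share the same specification.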
Examples include Protocol Buffers, ONC, and XDR.
A third possibility is where the target programming language is fixed: the programs on all the hosts involved are written in the same language. Then the standard libraries for that language already 'know' the encoding used and can serialize and deserialize native objects. Examples include Gob (for Go), Pickle (for Python) and Java Object Serialization (for Java). I won't look any further at these in this book.
I will use this example in these three chapters. Suppose we have data about a person and their email addresses. Informally it could look like this
An example could be
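As a hypothetical illustration, such a record could be modelled in Python and written out as JSON text with the standard `json` module. The field names (`family_name`, `personal_name`, `kind`, `address`) and the sample values are placeholders chosen for this sketch only.

```python
import json
from dataclasses import asdict, dataclass, field

# Hypothetical field names and values, chosen for this sketch only
@dataclass
class Email:
    kind: str      # e.g. "home" or "work"
    address: str

@dataclass
class Person:
    family_name: str
    personal_name: str
    emails: list[Email] = field(default_factory=list)

def person_to_json(p: Person) -> str:
    # asdict() recursively turns nested dataclasses into dicts and lists
    return json.dumps(asdict(p))

def person_from_json(text: str) -> Person:
    d = json.loads(text)
    return Person(d["family_name"], d["personal_name"],
                  [Email(**e) for e in d["emails"]])

person = Person("Doe", "Jane", [Email("home", "jane@example.com"),
                                Email("work", "j.doe@example.org")])
text = person_to_json(person)
print(text)
assert person_from_json(text) == person
```

Because JSON is self-describing, the printed text carries the field names alongside the values; a schema-based binary encoding of the same record would omit them.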
Copyright © Jan Newmarch, email@example.com
Based on a work at https://jan.newmarch.name/NetworkProgramming/ .