Serialisation

General

Introduction

Messages passed between clients and servers consist of an ordered sequence of bytes. In the last chapter we looked at encoding the string types of various languages into a single byte-sequence form, the UTF-8 encoding.

But languages generally have a much wider range of data types: primitive types such as integers of various sizes, structures or records, arrays, maps and so on. On any host, these constructs are represented at runtime as bytes in memory, but there is no obvious way of transferring them between hosts, particularly if the programs on each host were written in different languages.

What is required is serialization of the data objects of a programming language into a sequence of bytes, and deserialization of those bytes back into data objects. These operations are also known as marshalling and unmarshalling, although some may claim there is a subtle difference (which we shall ignore).
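As a rough illustration of the round trip, the sketch below serializes a single integer to bytes and back. The helper names and the choice of a 4-byte, big-endian layout are assumptions made for this example only; the last line shows what happens when the receiver assumes a different layout from the sender.

```python
# A minimal sketch: serialize an unsigned 32-bit integer to bytes and back.
# serialize_u32 and deserialize_u32 are hypothetical helper names, and the
# 4-byte big-endian layout is an assumed convention, not a standard.

def serialize_u32(value: int) -> bytes:
    """Encode an integer as 4 bytes, most significant byte first."""
    return value.to_bytes(4, byteorder="big")


def deserialize_u32(data: bytes) -> int:
    """Decode 4 big-endian bytes back into an integer."""
    return int.from_bytes(data, byteorder="big")


wire = serialize_u32(258)
print(wire.hex())             # 00000102
print(deserialize_u32(wire))  # 258

# A receiver that assumes the opposite byte order recovers a different number.
print(int.from_bytes(wire, byteorder="little"))  # 33619968
```

Both sides have to agree on the size and byte order of the encoding; the bytes alone do not say which was used.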

Serialization or RPC?

Serialization just encodes and decodes data to and from a sequence of bytes. RPC (remote procedure call) transfers these bytes across the network along with an operation to be executed remotely on that data, and returns a result. RPC will be discussed in a later chapter.

Binary or text?

There are two principal serialization methods:

Serialize to text (which is then serialized to bytes)
Serialize directly to bytes

The first has the advantage of readability: you can 'read' the data using tools such as Wireshark, send it using telnet as a client, or send and receive as a server using netcat. But it is wasteful of space: a value that fits in one byte, such as 255, takes three bytes as a decimal string. The text also has to be encoded as bytes for transmission and decoded on receipt. Examples of text serialization include XML, JSON and YAML.

A binary format is much more compact, but the packets are much harder to read and debug. Examples of direct binary formats include CDR, CBOR, Protocol Buffers, XDR and many others.
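The size difference can be seen with a small sketch using only the Python standard library. The value and the choice of a 4-byte network-order integer for the binary form are assumptions for illustration, not a recommendation of either format.

```python
import json
import struct

value = 1234567890

# Text: JSON renders the number as decimal digits, then UTF-8 turns the
# digits into bytes, one byte per digit.
as_text = json.dumps(value).encode("utf-8")

# Binary: a fixed 4-byte unsigned integer in network (big-endian) byte order.
as_binary = struct.pack("!I", value)

print(as_text, len(as_text))            # b'1234567890' 10
print(as_binary.hex(), len(as_binary))  # 499602d2 4

# Both forms decode back to the same value.
from_text = json.loads(as_text.decode("utf-8"))
from_binary = struct.unpack("!I", as_binary)[0]
print(from_text == from_binary == value)  # True
```

The text form can be read directly in a packet capture, while the binary form needs a decoder that knows the layout.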

Self-describing or external description?

The format of messages sent by one host to another must be understandable by the other. One way is for the format to be self-describing. For example, an int could be sent prefixed by a type indicator, as in


	  "int": 42

Of course, the receiver has to understand this formatting language! Examples include JSON, and XML with schema data types.
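A sketch of the idea, assuming a made-up convention in which each message is a JSON object with a single key naming the type: the function names and the set of supported tags are hypothetical, chosen only to show the receiver dispatching on the type indicator.

```python
import json

# Hypothetical tag names mapped to the Python constructors that check them.
CONVERTERS = {"int": int, "string": str, "bool": bool}


def encode_tagged(value) -> bytes:
    """Wrap a value in a one-key JSON object whose key names its type."""
    # bool is tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        tag = "bool"
    elif isinstance(value, int):
        tag = "int"
    elif isinstance(value, str):
        tag = "string"
    else:
        raise TypeError(f"unsupported type: {type(value).__name__}")
    return json.dumps({tag: value}).encode("utf-8")


def decode_tagged(data: bytes):
    """Read the type tag from the message and convert the value accordingly."""
    message = json.loads(data.decode("utf-8"))
    (tag, value), = message.items()  # exactly one key is expected
    if tag not in CONVERTERS:
        raise ValueError(f"unknown type tag: {tag}")
    return tag, CONVERTERS[tag](value)


wire = encode_tagged(42)
print(wire)                 # b'{"int": 42}'
print(decode_tagged(wire))  # ('int', 42)
```

The receiver needs no prior knowledge of which type is coming, only of the tagging convention itself.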

The main alternative is to have a specification language for data types, in which every data structure used is described. A compiler then translates the specification into code for the target programming languages to serialize and deserialize the data. As long as both hosts use code generated from the same specification by compliant compilers, they can understand each other's data.

Examples include Protocol Buffers, ONC, and XDR.
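The following is only a loose analogy, not how any of these systems is actually used: a `struct` format string stands in for the shared specification, and two hand-written functions stand in for the code a compiler would generate. The record layout and all names are hypothetical.

```python
import struct

# Hypothetical shared specification, agreed by both hosts in advance:
# a signed 16-bit temperature followed by an unsigned 32-bit timestamp,
# both in network (big-endian) byte order with no padding.
READING_SPEC = struct.Struct("!hI")


def serialize_reading(temperature: int, timestamp: int) -> bytes:
    """Stand-in for generated serialization code."""
    return READING_SPEC.pack(temperature, timestamp)


def deserialize_reading(data: bytes) -> tuple:
    """Stand-in for generated deserialization code."""
    return READING_SPEC.unpack(data)


wire = serialize_reading(-5, 1700000000)
print(len(wire))                  # 6
print(deserialize_reading(wire))  # (-5, 1700000000)
```

The six bytes on the wire carry no type or field information at all; they can only be interpreted by a receiver that holds the same specification.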

A third possibility is where the target programming language is fixed: the hosts involved are all running programs written in the same language. Then the standard libraries for that language already 'know' the encoding used and can serialize and deserialize the language's native objects. Examples include Gob (for Go), Pickle (for Python) and Java Object Serialization (for Java). I won't look any further at these in this book.

Example

I will use this example in these three chapters. Suppose we have data about a person and their email addresses. Informally it could look like this:


Name {
    string family
    string personal
}

Email {
    string kind
    string address
}

Person {
    Name name
    Email[] emails
}

An example could be


Person {
    Name: {
              family: "Newmarch"
              personal: "Jan"
          }
    Email[]: {
              Email: {
                        kind: "home"
                        address: "jan@newmarch.name"
                     }
              Email: {
                        kind: "work"
                        address: "j.newmarch@boxhill.edu.au"
                     }
             }
}
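One possible rendering of this example in Python is sketched below, using dataclasses for the three types and JSON as the byte format. The class layout follows the informal description above; the helper functions `serialize_person` and `deserialize_person` are hypothetical names, and JSON is just one of many possible encodings.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Name:
    family: str
    personal: str


@dataclass
class Email:
    kind: str
    address: str


@dataclass
class Person:
    name: Name
    emails: List[Email] = field(default_factory=list)


def serialize_person(person: Person) -> bytes:
    """Convert the nested dataclasses to dicts, then to UTF-8 encoded JSON."""
    return json.dumps(asdict(person)).encode("utf-8")


def deserialize_person(data: bytes) -> Person:
    """Rebuild the dataclass objects from the decoded JSON structure."""
    raw = json.loads(data.decode("utf-8"))
    return Person(
        name=Name(**raw["name"]),
        emails=[Email(**entry) for entry in raw["emails"]],
    )


person = Person(
    name=Name(family="Newmarch", personal="Jan"),
    emails=[
        Email(kind="home", address="jan@newmarch.name"),
        Email(kind="work", address="j.newmarch@boxhill.edu.au"),
    ],
)

wire = serialize_person(person)
print(deserialize_person(wire) == person)  # True
```

Note that the deserializer has to rebuild the `Name` and `Email` objects explicitly, since the JSON text records only field names and values, not the Python types they came from.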

Resources

Comparison of data-serialization formats

Copyright © Jan Newmarch, jan@newmarch.name

"Network Programming using Java, Go, Python, Rust, JavaScript and Julia" by Jan Newmarch is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://jan.newmarch.name/NetworkProgramming/
