
Serialisation

Resources

Introduction

Clients and servers exchange data. Part of this data may be carried in the transport protocol, but much of it will be in the "body" of a message. If the data is just a stream of bytes then it is easy to send it across the network in exactly that form, since a stream of bytes is all the network knows how to send.

But most data has some structure to it, and most programming languages deal in highly structured data.

Serialisation is representing this data as a linear sequence of bytes, which can be de-serialised back into an equivalent form. A huge number of systems have addressed this issue.
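As a minimal sketch of the idea, the following Python fragment flattens a small structured record into bytes and rebuilds it. The record layout (sensor id, temperature, alarm flag) and the names `serialise` and `deserialise` are assumptions made for illustration only; they are not taken from any particular system.

```python
import struct

# Hypothetical sensor reading: (sensor id, temperature in Celsius, alarm flag).
reading = (7, 21.5, True)

# "<Hf?" = little-endian: unsigned 16-bit int, 32-bit float, boolean.
RECORD_FORMAT = "<Hf?"

def serialise(record):
    """Flatten the structured record into a linear sequence of bytes."""
    return struct.pack(RECORD_FORMAT, *record)

def deserialise(data):
    """Rebuild an equivalent record from the bytes."""
    return struct.unpack(RECORD_FORMAT, data)

wire_bytes = serialise(reading)     # this is what would cross the network
restored = deserialise(wire_bytes)  # the receiver's view of the record
print(len(wire_bytes), restored)
```

Both ends must agree on `RECORD_FORMAT` in advance. That shared agreement is exactly what the serialisation systems below try to manage, each in its own way.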

One key distinction between the different systems is whether an external specification of the data format must exist, may exist, or does not exist. XML may optionally be described by a schema; JSON itself does not have such a mechanism; Protocol Buffers require an external specification.

XML

XML was designed as an extensible markup language sitting between HTML and SGML. HTML has only a fixed set of tags. SGML allows any tags to be defined but is too complex. XML occupies the middle ground: any tags can be defined, and it remains computationally tractable.

XML document structures can be defined using XML Schema or RELAX NG. XML Schema includes a data-typing sublanguage which can be used to represent many complex structures in serialised form. In addition to basic types such as integers, booleans and strings, it can also be used to represent complex types such as records and sequences.

The downside of XML is its verbosity. Everything is represented as strings surrounded by tags (which are themselves strings). Using strings for everything takes space, and processing tags, parsing strings into integers and so on all cost computational power. This doesn't matter if you have enough power (e.g. the Raspberry Pi and above), but is likely to be impractical on a constrained device such as the Arduino.
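The cost can be seen in a small sketch using Python's standard `xml.etree.ElementTree` module. The element names `reading`, `sensor` and `temperature` are hypothetical, chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

# A hypothetical sensor reading serialised as XML.
xml_text = "<reading><sensor>7</sensor><temperature>21.5</temperature></reading>"

root = ET.fromstring(xml_text)

# Every value arrives as a string and must be parsed into a native type.
sensor = int(root.find("sensor").text)
temperature = float(root.find("temperature").text)

# Compare the characters carrying data with those spent on tags.
payload_chars = len("7") + len("21.5")
overhead_chars = len(xml_text) - payload_chars
print(sensor, temperature, payload_chars, overhead_chars)
```

In this small case the tags take far more characters than the values they wrap, and the receiver still has to convert each string into a number.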

JSON

JSON (JavaScript Object Notation) arose as a serialisation mechanism for JavaScript objects. It uses a fairly simple format which is more lightweight than XML. The structure of JavaScript objects makes it suitable for representation of many objects of other programming languages, and there are encoding and decoding engines for many languages.

JSON can represent the basic types of strings, numbers and booleans, and the structured types of arrays and objects, the latter as lists of key:value pairs. A typical example, from "Java API for JSON Processing: An Introduction to JSON", is

	
{
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": 10021
    },
    "phoneNumbers": [
        {
            "type": "home",
            "number": "212 555-1234"
        },
        {
            "type": "fax",
            "number": "646 555-4567" 
        }
    ] 
}
	
      

Like XML, JSON is a text format: every value is written out as characters, so string storage, parsing, etc. may be an issue for constrained devices.
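As an illustration of the decoding engines mentioned above, here is a short sketch using Python's standard `json` module on a cut-down version of the example. The variable names are assumptions for this sketch.

```python
import json

# A cut-down version of the example document above.
text = ('{"firstName": "John", "age": 25, '
        '"phoneNumbers": [{"type": "home", "number": "212 555-1234"}]}')

# Decoding maps JSON objects to dicts, arrays to lists, numbers to int/float.
person = json.loads(text)
first_phone = person["phoneNumbers"][0]["number"]

# Encoding without optional whitespace gives a slightly smaller message.
compact = json.dumps(person, separators=(",", ":"))
print(person["age"] + 1, first_phone, len(text), len(compact))
```

Dropping the whitespace saves a few bytes, but the numbers and keys are still sent as text and must still be parsed on arrival.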

Binary JSON (CBOR)

String processing is expensive, both computationally and in storage. There are several systems which provide a binary encoding of JSON, using native data formats for things like integers and reducing processing time in general.

There are several alternative schemes including

CBOR (Concise Binary Object Representation) is specified by an IETF RFC (RFC 7049, since superseded by RFC 8949). As such, it is likely to become the prominent binary encoding method.
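To give a feel for the encoding, the following is a minimal hand-written sketch of a CBOR encoder for a small subset of values: non-negative integers, booleans, text strings, arrays and maps. It is an illustration only, not a complete or validated implementation; the function names `cbor_head` and `cbor_encode` are made up for this sketch, and a real project would use an existing CBOR library.

```python
import struct

def cbor_head(major, n):
    """Encode the initial byte(s): a 3-bit major type plus the argument n."""
    if n < 24:
        return bytes([major << 5 | n])          # argument fits in the first byte
    for extra, fmt in ((24, ">B"), (25, ">H"), (26, ">I"), (27, ">Q")):
        if n < 1 << (8 * struct.calcsize(fmt)):
            return bytes([major << 5 | extra]) + struct.pack(fmt, n)
    raise ValueError("integer too large for this sketch")

def cbor_encode(value):
    """Encode a subset of JSON-like values as CBOR bytes."""
    if isinstance(value, bool):                 # test bool before int
        return b"\xf5" if value else b"\xf4"
    if isinstance(value, int) and value >= 0:
        return cbor_head(0, value)              # major type 0: unsigned integer
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return cbor_head(3, len(raw)) + raw     # major type 3: text string
    if isinstance(value, list):                 # major type 4: array
        return cbor_head(4, len(value)) + b"".join(cbor_encode(v) for v in value)
    if isinstance(value, dict):                 # major type 5: map
        body = b"".join(cbor_encode(k) + cbor_encode(v) for k, v in value.items())
        return cbor_head(5, len(value)) + body
    raise TypeError("type not handled in this sketch")

encoded = cbor_encode({"age": 25})
print(encoded.hex(), len(encoded))
```

Here the integer 25 travels as two bytes of binary rather than as the characters "2" and "5", and no quotes, colons or braces are needed. The receiver reads lengths and types from the leading bytes instead of scanning for delimiters.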

Google Protocol Buffers

Protocol Buffers are designed as a language-independent binary protocol with an external specification. Translators exist for many languages. This may become a major representation technique.
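The role of the external specification shows up in the bytes on the wire. The sketch below hand-encodes a single non-negative integer field in the style of the Protocol Buffers wire format, where each field is sent as a key (field number and wire type) followed by a variable-length integer. It is an illustration only: the helper names are made up, and it assumes a hypothetical specification that assigns field number 1 to an integer field. Real code would be generated from the specification by the Protocol Buffers compiler.

```python
def encode_varint(n):
    """Encode a non-negative integer, 7 bits per byte, low-order group first."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)   # high bit set: more bytes follow
        else:
            out.append(low7)          # high bit clear: last byte
            return bytes(out)

def encode_int_field(field_number, value):
    """Encode one integer field as key + varint value."""
    WIRE_TYPE_VARINT = 0
    key = (field_number << 3) | WIRE_TYPE_VARINT
    return encode_varint(key) + encode_varint(value)

# Assumed: the external specification gives this field the number 1.
message = encode_int_field(1, 150)
print(message.hex())
```

Note that the field name never appears in the message. Only the field number is sent, so the receiver needs the same external specification to know what field 1 means.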


Copyright © Jan Newmarch, jan@newmarch.name
Creative Commons License
"The Internet of Things - a techie's viewpoint" by Jan Newmarch is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Based on a work at https://jan.newmarch.name/IoT/.
