Serialisation: Protocol Buffers

General

Google Protocol Buffers

One of the most popular multi-language serialization techniques is Protocol Buffers from Google. Google supports the languages C++, C#, Dart, Go, Java and Python, and there is community support for many others, including the ones in this book.

It uses a binary format, and the format is defined by a specification language. A data structure is defined in this language, and compilers then generate language-specific versions which can be included in programs written in those languages.

We will define a Person datatype in this section and then for each of the languages dealt with will show a client and a server that can deal with messages sent using that data type.

The specification language is described in the Protocol Buffers Language Guide (proto3).

Suppose we have data about a person and their email addresses. Informally it could look like this:


Name {
    string family
    string personal
}

Email {
    string kind
    string address
}

Person {
    Name name
    Email[] emails
}
      

An example could be


Person {
    Name: {
              family: "Newmarch"
              personal: "Jan"
          }
    Email[]: {
              Email: {
                        kind: "home"
                        address: "jan@newmarch.name"
                     }
              Email: {
                        kind: "work"
                        address: "j.newmarch@boxhill.edu.au"
                     }
             }
}
      

The specification in the protocol buffers specification language would be as in personv3.proto :


syntax = "proto3";
package person;

message Person {

	message Name {
        	string family = 1;
        	string personal = 2;
	}

	message Email {
        	string kind = 1;
        	string address = 2;
	}

	Name name = 1;
	repeated Email email = 2;

}
      

Serialized, it could (but this is too simple) look something like


1 1 Newmarch 2 Jan 2 1 home 2 jan@newmarch.name 1 work 2 j.newmarch@boxhill.edu.au
      
This could be stored in a file, sent across the network, attached to a web page, etc. We will just use it for network data transmission.
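The real wire format is binary rather than text. As a rough sketch (hand-written for illustration, using neither the generated code nor the protobuf library), each field is written as a tag that combines the field number with a wire type, and strings and nested messages are preceded by their length. The helper names below are hypothetical, and the sketch only covers the length-delimited fields used in the Person example.

```python
def encode_varint(n):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


LENGTH_DELIMITED = 2  # wire type used for strings and nested messages


def encode_field(field_number, payload):
    """Encode one length-delimited field: tag, length, then the payload."""
    tag = (field_number << 3) | LENGTH_DELIMITED
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def encode_name(family, personal):
    # Field numbers 1 and 2 come from the Name message in personv3.proto
    return (encode_field(1, family.encode("utf-8")) +
            encode_field(2, personal.encode("utf-8")))


def encode_email(kind, address):
    # Field numbers 1 and 2 come from the Email message in personv3.proto
    return (encode_field(1, kind.encode("utf-8")) +
            encode_field(2, address.encode("utf-8")))


def encode_person(name_bytes, email_bytes_list):
    # A nested message is just another length-delimited payload
    body = encode_field(1, name_bytes)
    for email_bytes in email_bytes_list:
        body += encode_field(2, email_bytes)
    return body


name = encode_name("Newmarch", "Jan")
person = encode_person(name, [
    encode_email("home", "jan@newmarch.name"),
    encode_email("work", "j.newmarch@boxhill.edu.au"),
])
print(name)
print(len(person), "bytes in total")
```

The numbers 1 and 2 in the informal rendering above correspond to the field numbers packed into each tag; the lengths are what allow a reader to tell where each string or nested message ends.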

The protobuf compiler

The specification can be compiled into appropriate code for a number of languages using the protoc compiler. On my system, the packaged version is only 3.0.0 and is a couple of years old, while the current version (as of June 1, 2020) is 3.12.2.

There has been evolution, upward compatible for the specification language and the wire protocol, but not so clean for some of the language APIs. So it is best to use the latest version rather than the one in a distro's repositories. The latest version is on GitHub. The compiler itself comes in compiled versions for various platforms, such as protoc-3.12.2-linux-x86_64.zip for 64-bit Linux systems. This should be downloaded and unzipped into a suitable directory such as /usr/local/ for Linux.

In addition to this, you also need the translator specifications for each target programming language, such as protobuf-java-3.12.2.zip for Java. This gives you source code, which has to be built. It is easier to get a ready-built version from a site such as JAR Download. For example, for Java it is on the page Download com.google.protobuf protobuf-java JAR files with dependency.

Resources

Java

Compiling prototype files

Installing the protoc compiler was discussed in the last section. It can then be run on the example specification file personv3.proto by


      protoc --java_out=java-src-dir personv3.proto
  
where java-src-dir is where the generated files will be placed. As the specification file gave the package name as person, a subdirectory java-src-dir/person will be created, containing Personv3.java (note the capitalisation of the class name).

Structure of Java files

This file contains a definition of the class Personv3 in package person. For each message type in the prototype specification there is an inner class, here Personv3.Person, Personv3.Person.Name and Personv3.Person.Email.

The Builder design pattern (see Introduction to Creational Design Patterns: Builder) has been adopted for the generated Java code. So instead of calling e.g. a Person() constructor, you call Person.newBuilder() to construct a builder, which is then populated with its attributes using methods such as setName() and addEmail(). The Person object is then created by calling build() on the builder. Method chaining (see Method Chaining In Java with Examples) is used to avoid repeated statements on the same object.

The advantage is that code such as


      Personv3.Person.Name.Builder nameBldr = Personv3.Person.Name.newBuilder();
      nameBldr.setPersonal("Jan");
      nameBldr.setFamily("Newmarch");
      Personv3.Person.Name name = nameBldr.build();
  
can be reduced to

      Personv3.Person.Name name = Personv3.Person.Name.newBuilder()
                                          .setPersonal("Jan")
                                          .setFamily("Newmarch")
                                          .build();
  

The methods to populate a Name.Builder are setPersonal() and setFamily(). The methods to populate an Email.Builder are setKind() and setAddress(), while the methods to populate a Person.Builder are setName() and, for the repeated email field, addEmail().

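In addition, a Person object has methods such as writeTo() and toByteArray(), and the Person class has static parseFrom() methods, to allow writing to and reading from byte arrays or streams.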
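The builder idea is independent of Protocol Buffers. The following is a minimal sketch in Python, using hypothetical Name and NameBuilder classes (they are not the generated protobuf classes): the setters return the builder itself, which is what makes chaining possible, and build() produces an immutable object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    """Immutable value object, standing in for a generated message class."""
    family: str = ""
    personal: str = ""


class NameBuilder:
    """Hypothetical builder that collects values before creating a Name."""

    def __init__(self):
        self._family = ""
        self._personal = ""

    def set_family(self, family):
        self._family = family
        return self   # returning self allows calls to be chained

    def set_personal(self, personal):
        self._personal = personal
        return self

    def build(self):
        # Only at this point is the immutable object created
        return Name(family=self._family, personal=self._personal)


name = (NameBuilder()
        .set_personal("Jan")
        .set_family("Newmarch")
        .build())
print(name)
```

Once built, the object cannot be modified; attempting to assign to name.family raises an exception, which mirrors the read-only nature of the built Java message objects.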

Person client

The client creates the example Person using the various builders. It then connects to a server and sends the serialized binary form on an OutputStream to the server. Then it terminates.

The code is PersonClient.java:


import person.*;
import java.util.Arrays;
import java.io.*;
import java.net.*;


public class PersonClient {

    public static final int SERVER_PORT = 2001;
 
    public static void main(String[] args) {
	
	if (args.length != 1) {
            System.err.println("Usage: Client address");
            System.exit(1);
        }

	Personv3.Person.Name name = Personv3.Person.Name.newBuilder()
	    .setPersonal("Jan")
	    .setFamily("Newmarch")
	    .build();
	
	Personv3.Person.Email email1 = Personv3.Person.Email.newBuilder()
	    .setKind("private")
	    .setAddress("jan@newmarch.name")
	    .build();

	Personv3.Person.Email email2 = Personv3.Person.Email.newBuilder()
	    .setKind("work")
	    .setAddress("j.newmarch@boxhill.edu.au")
	    .build();

	Personv3.Person person = Personv3.Person.newBuilder()
	    .setName(name)
	    .addEmail(email1)
	    .addEmail(email2)
	    .build();
	
	String str = person.toString();
	System.out.println("Sending: " + str);

        InetAddress address = null;
        try {
            address = InetAddress.getByName(args[0]);
        } catch(UnknownHostException e) {
            e.printStackTrace();
            System.exit(2);
        }

        Socket sock = null;
        try {
            sock = new Socket(address, SERVER_PORT);
	    System.out.println("Connected");
        } catch(IOException e) {
            e.printStackTrace();
            System.exit(3);
        }
	
        OutputStream out = null;
        try {
            out = sock.getOutputStream();
        } catch(IOException e) {
            e.printStackTrace();
            System.exit(5);
        }
	try {
	    person.writeTo(out);
	} catch(IOException e) {
	    e.printStackTrace();
            System.exit(6);

	}
    }

}

To build the client, you need to include the source for the client, the generated Java files and the jar file for the Protobuf support. With the person directory, the jar file and the client file in the same directory, the Linux compile command is


      javac PersonClient.java person/Personv3.java -cp protobuf-java-3.12.2.jar
  
and the run command is

      java  -cp .:protobuf-java-3.12.2.jar PersonClient localhost
  

Person server

The server listens for connections and then reads a single Person on the connection's InputStream. It then prints the Person to stdout and terminates the connection. It is PersonServer.java:


import person.*;
import java.util.Arrays;
import java.io.*;
import java.net.*;


public class PersonServer {

    public static final int SERVER_PORT = 2001;
 
    public static void main(String[] args){
	
	ServerSocket s = null;
        try {
            s = new ServerSocket(SERVER_PORT);
        } catch(IOException e) {
            System.out.println(e);
            System.exit(1);
        }
        while (true) {
            Socket incoming = null;
            try {
                incoming = s.accept();
                System.out.println("Connected");
            } catch(IOException e) {
                System.out.println(e);
                continue;
            }

            handleSocket(incoming);
        }
    }

    public static void handleSocket(Socket incoming) {
        // try-with-resources closes the stream and the socket on exit,
        // so the connection really is terminated after one Person
        try (Socket sock = incoming;
             InputStream in = sock.getInputStream()) {
            // parseFrom() reads until the client closes its end of the connection
            Personv3.Person person = Personv3.Person.parseFrom(in);
            System.out.println("Receiving: " + person.toString());
        } catch(IOException e) {
            System.err.println(e.toString());
        }
    }
}

The Linux compile command is


      javac PersonServer.java person/Personv3.java -cp protobuf-java-3.12.2.jar
  
and the run command is

      java  -cp .:protobuf-java-3.12.2.jar PersonServer
  

Both the client and the server print the Person to stdout. Apart from the "Sending: " or "Receiving: " prefix, the output from each should be


name {
  family: "Newmarch"
  personal: "Jan"
}
email {
  kind: "private"
  address: "jan@newmarch.name"
}
email {
  kind: "work"
  address: "j.newmarch@boxhill.edu.au"
}
  

Go

Compiling prototype files

The compile command is


      protoc --go_out=go/src/person personv3.proto
  
Note that the generated file is put in a subdirectory src. This allows the Go compiler to find it from the GOPATH environment variable.

The Go code generator is not bundled with the protoc compiler, so you will need to install the Go plugin, protoc-gen-go, separately. The instruction at Protocol Buffer Basics: Go says


      go install google.golang.org/protobuf/cmd/protoc-gen-go
  
This seems to be incorrect now; possibly the documentation is ahead of the implementation. It currently needs to be

      go install github.com/golang/protobuf/cmd/protoc-gen-go
  
and in the applications using this, the imports need to include "github.com/golang/protobuf/proto".

Structure of files

The generated file is src/person/personv3.pb.go. It contains structures person.Person, person.Person_Name and person.Person_Email.

The proto package has functions Marshal() and Unmarshal().

Person client

The client creates the structures, calls proto.Marshal() to serialize the person, and then writes it to a server. It is PersonClient.go:


/* PersonClient
 */

package main

import (
	"fmt"
        "github.com/golang/protobuf/proto"
        "os"
	"net"
        "person"
)

func main() {

	if len(os.Args) != 2 {
                fmt.Fprint(os.Stderr, "Usage: ", os.Args[0], " host:port\n")
                os.Exit(1)
        }
        service := os.Args[1]


        name := person.Person_Name{
                Family:   "newmarch",
                Personal: "jan"}

        email1 := person.Person_Email{
                Kind:    "home",
                Address: "jan@newmarch.name"}
        email2 := person.Person_Email{
                Kind:    "work",
                Address: "j.newmarch@boxhill.edu.au"}

        emails := []*person.Person_Email{&email1, &email2}
        p := person.Person{
                Name:  &name,
                Email: emails,
        }
        // pass a pointer so that the generated String() method is used
        fmt.Println(&p)

        data, err := proto.Marshal(&p)
        checkError(err)

	conn, err := net.Dial("tcp", service)
        checkError(err)
        fmt.Println("Connected")

	_, err = conn.Write(data)
	checkError(err)


}

func checkError(err error) {
        if err != nil {
                fmt.Println("Fatal error ", err.Error())
                os.Exit(1)
        }
}

To run it, you need to set GOPATH to find all the relevant files. On my system it is


      GOPATH=/home/.../go:/usr/lib/go-1.10/
      go run PersonClient.go localhost:2001
  

Person server

The server listens, reads binary data and calls proto.Unmarshal(). It just has to be careful not to pass in the entire byte array, only the slice actually read. It is PersonServer.go:


/* PersonServer
 */
package main

import (
	"fmt"
	"net"
	"os"
	"github.com/golang/protobuf/proto"
	"person"
)

func main() {

	service := ":2001"
	tcpAddr, err := net.ResolveTCPAddr("tcp", service)
	checkError(err)

	listener, err := net.ListenTCP("tcp", tcpAddr)
	checkError(err)

	for {
		conn, err := listener.Accept()
		if err != nil {
			continue
		}
		fmt.Println("Connected")
		// run as a goroutine
		go handleClient(conn)
	}
}

func handleClient(conn net.Conn) {
	// close connection on exit
        defer conn.Close()

	// buffer for a single read; the example message fits within 128 bytes
	data := make([]byte, 128)

	nread, err := conn.Read(data)
	if err != nil {
		fmt.Println("Disconnecting")
		return
	}
	
	p := &person.Person{}
	err = proto.Unmarshal(data[0:nread], p)
	if err != nil {
		fmt.Printf("Error unmarshalling %s\n", err.Error())
		return
	}
	fmt.Println("Received: ", p)
}

func checkError(err error) {
	if err != nil {
		fmt.Fprintf(os.Stderr, "Fatal error: %s", err.Error())
		os.Exit(1)
	}
}

To run it, you need to set GOPATH to find all the relevant files. On my system it is

      GOPATH=/home/.../go:/usr/lib/go-1.10/
      go run PersonServer.go
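
The point about passing only the slice actually read can be illustrated without the Go library. The following is a hand-written sketch in Python with hypothetical helper names, not the behaviour of any particular protobuf implementation: the unused part of a zero-filled buffer decodes as a tag with field number 0, which is not a legal field number, so a decoder given the whole buffer has to reject it.

```python
def decode_varint(buf, pos):
    """Decode a base-128 varint starting at buf[pos]; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:       # high bit clear: this was the last byte
            return result, pos
        shift += 7


def decode_fields(buf):
    """Split a message of length-delimited fields into (field_number, payload) pairs."""
    fields = []
    pos = 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if field_number == 0:
            raise ValueError("illegal field number 0 at offset %d" % (pos - 1))
        if wire_type != 2:
            raise ValueError("this sketch only handles length-delimited fields")
        length, pos = decode_varint(buf, pos)
        if pos + length > len(buf):
            raise ValueError("field runs past the end of the data")
        fields.append((field_number, buf[pos:pos + length]))
        pos += length
    return fields


# A serialized Name (family = "Newmarch", personal = "Jan") placed in a
# zero-filled 128-byte buffer, as if it had just been read from a socket.
message = b"\x0a\x08Newmarch\x12\x03Jan"
buffer = bytearray(128)
buffer[:len(message)] = message
nread = len(message)

print(decode_fields(bytes(buffer[:nread])))   # only the bytes actually read
try:
    decode_fields(bytes(buffer))              # the whole buffer, padding included
except ValueError as e:
    print("whole buffer rejected:", e)
```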
  

Python

Compiling prototype files

The Python files are generated from the prototype by


      protoc personv3.proto --python_out=python
  

When I tried to compile my client (see later), I got the error


      AttributeError: module 'google.protobuf.descriptor' has no attribute '_internal_create_key'
  
I am now using the 3.12.2 version of protoc. It seems that the version installed in Python 3.6 (version 3.11.2) isn't quite compatible with this. I had to do

pip3 uninstall protobuf
pip3 install protobuf
  
to get the versions right.

Structure of files

The compiler creates just one file, personv3_pb2.py. It isn't nice to look at, as it uses Python meta-calls to generate classes, but you can write programs assuming that it has generated standard Python classes.

For the example we use, the classes are Person, Person.Name and Person.Email.

Person client

The client creates a Person and then fills in the fields. There are two ways to add a Name: create a Name object, assign its fields and then copy its value into person.name with CopyFrom(), or assign the fields of person.name directly. For the list of emails, add() new emails and fill in the fields of each email.

The client is PersonClient.py:



import socket
import fileinput
import sys

import personv3_pb2

PORT = 2001


if len(sys.argv) < 2:
    print('Usage: comm hostname')
    sys.exit(1)

host = sys.argv[1]

person = personv3_pb2.Person()

#name = person.Name()
#name.family = 'Newmarch'
#name.personal = 'Jan'
#person.name.CopyFrom(name)

person.name.family = 'Newmarch'
person.name.personal = 'Jan'

email1 = person.email.add()
email1.kind = 'private'
email1.address = 'jan@newmarch.name'

email2 = person.email.add()
email2.kind = 'work'
email2.address = 'j.newmarch@boxhill.edu.au'

print(person)

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.connect((host, PORT))

    s.sendall(person.SerializeToString())

Person server

The server listens for client connections and reads the data. It creates a Person object and then calls the method ParseFromString() on the (binary) data. Here the server just prints the created person and terminates the connection.

The server is PersonServer.py:


# https://realpython.com/python-sockets/#echo-server
# sequential server

import socket
import personv3_pb2

HOST = '' # INADDR_ANY
PORT = 2001

with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as s:
    s.bind((HOST, PORT))
    s.listen()
    while True:
        conn, addr = s.accept()
        with conn:
            print('Connected by', addr)
            while True:
                data = conn.recv(1024)
                if not data:
                    print('Disconnecting')
                    break
                person = personv3_pb2.Person()
                person.ParseFromString(data)
                print(person)
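
TCP delivers a stream of bytes, so a single recv() call is not guaranteed to return a whole serialized message. The following is a minimal sketch, assuming the client closes the connection after sending one message (as the client above does when its with block ends): a hypothetical helper, recv_until_closed(), gathers all the bytes before they are parsed. The demonstration uses socket.socketpair() in place of a real network connection, and placeholder bytes in place of a serialized Person.

```python
import socket


def recv_until_closed(conn, bufsize=1024):
    """Collect bytes from conn until the peer closes its end of the connection."""
    chunks = []
    while True:
        data = conn.recv(bufsize)
        if not data:              # empty bytes: the peer has closed
            break
        chunks.append(data)
    return b"".join(chunks)


# A connected pair of local sockets stands in for the client and server ends.
sender, receiver = socket.socketpair()
payload = b"x" * 5000             # placeholder data, larger than one recv() buffer
sender.sendall(payload)
sender.close()                    # closing marks the end of the message

received = recv_until_closed(receiver)
receiver.close()
print(len(received), "bytes received")
```

In the server above, the bytes returned by such a helper would then be passed to person.ParseFromString().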


Copyright © Jan Newmarch, jan@newmarch.name
Creative Commons License
"Network Programming using Java, Go, Python, Rust, JavaScript and Julia" by Jan Newmarch is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://jan.newmarch.name/NetworkProgramming/.
