One of the most popular multiple-language serialization techniques is Protocol Buffers from Google. Google supports the languages C++, C#, Dart, Go, Java and Python, and there is community support for many others, including the ones in this book.
It uses a binary format, and the format is defined by a specification language. A data structure is defined in this language, and compilers then generate language-specific versions which can be included in programs written in those languages.
We will define a Person datatype in this section and then, for each of the languages dealt with, show a client and a server that can exchange messages using that data type.
The specification language is described in the Protocol Buffers Language Guide (proto3).
Suppose we have data about a person and their email addresses. Informally it could look like this:
Name {
string family
string personal
}
Email {
string kind
string address
}
Person {
Name name
Email[] emails
}
An example could be:
Person {
Name: {
family: "Newmarch"
personal: "Jan"
}
Email[]: {
Email: {
kind: "home"
address: "jan@newmarch.name"
}
Email: {
kind: "work"
address: "j.newmarch@boxhill.edu.au"
}
}
}
The specification in the protocol buffers specification language would be as in personv3.proto:
syntax = "proto3";
package person;
message Person {
message Name {
string family = 1;
string personal = 2;
}
message Email {
string kind = 1;
string address = 2;
}
Name name = 1;
repeated Email email = 2;
}
Serialized, it could look something like this (a simplification of the real encoding):
1 1 Newmarch 2 Jan 2 1 home 2 jan@newmarch.name 1 work 2 j.newmarch@boxhill.edu.au
This could be stored in a file, sent across the network, attached to a web page, etc.
We will just use it for network data transmission.
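To make the simplified picture above a little more concrete, here is a minimal Python sketch of the real proto3 wire format for this message, written by hand without the protobuf library. Each field is written as a key, a varint holding (field_number << 3) | wire_type, followed by a value. Strings and embedded messages both use wire type 2 (length-delimited), so the value is a varint length followed by that many bytes. The function names encode_varint, length_delimited, string_field and encode_person are hypothetical illustrations, not part of any protobuf API.

```python
# Hand-rolled proto3 wire-format encoder for the Person message.
# Illustration only: real programs should use the protoc-generated code.
from __future__ import annotations


def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a base-128 varint, low 7 bits first."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def length_delimited(field_number: int, payload: bytes) -> bytes:
    """Key (field number + wire type 2), then the payload length, then the payload."""
    key = (field_number << 3) | 2
    return encode_varint(key) + encode_varint(len(payload)) + payload


def string_field(field_number: int, text: str) -> bytes:
    return length_delimited(field_number, text.encode("utf-8"))


def encode_person(family: str, personal: str,
                  emails: list[tuple[str, str]]) -> bytes:
    # Name is an embedded message, so it is itself length-delimited (field 1).
    name = string_field(1, family) + string_field(2, personal)
    data = length_delimited(1, name)
    # Each repeated Email is written as a separate field-2 entry.
    for kind, address in emails:
        email = string_field(1, kind) + string_field(2, address)
        data += length_delimited(2, email)
    return data


if __name__ == "__main__":
    wire = encode_person("Newmarch", "Jan",
                         [("home", "jan@newmarch.name"),
                          ("work", "j.newmarch@boxhill.edu.au")])
    print(len(wire), wire.hex(" "))
```

Run as a script, it reports a 79-byte message starting 0a 0f: 0a is field 1 (name) with wire type 2, and 0f is the 15-byte length of the embedded Name. The generated code in each language should normally produce the same bytes for this Person, which is what lets the Java, Go and Python programs interoperate.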
The specification can be compiled into appropriate code for a number of languages using the protoc compiler.
On my system, the packaged version is only at version 3.0.0 and is a couple of years old, while the current version (as of June 1, 2020) is 3.12.2.
The specification language and the wire protocol have evolved in an upward-compatible way, but the evolution has not been so clean for some of the language APIs.
So it is best to use the latest version rather than the one in a distro's repositories.
The latest version is on GitHub.
The compiler itself comes in compiled versions for various platforms, such as protoc-3.12.2-linux-x86_64.zip for 64-bit Linux systems. This should be downloaded and unzipped into a suitable directory such as /usr/local/ for Linux.
In addition to this, you also need the runtime support library for each target programming language, such as protobuf-java-3.12.2.zip for Java. This gives you source code, which has to be built.
It is easier to get a ready-built version from a site such as JAR Download. For example, for Java it is on the page Download com.google.protobuf protobuf-java JAR files with dependency.
Installing the protoc compiler was discussed in the last section.
For Java, it can then be run on the example specification file personv3.proto by
protoc --java_out=java-src-dir personv3.proto
where java-src-dir is where the generated files will be placed.
As the specification file gave the package name as person, a subdirectory java-src-dir/person will be created containing Personv3.java (note the capitalisation of the class name).
This file contains a definition of the class Personv3 in package person. For each message type in the .proto specification there is an inner class, here:
Personv3.Person
Personv3.Person.Name
Personv3.Person.Email
The Builder design pattern (see Introduction to Creational Design Patterns: Builder) has been adopted for the generated Java code.
So instead of calling e.g. a Person() constructor, you call Person.newBuilder() to get a Person.Builder, which is then populated with its attributes using methods such as setName() and addEmail().
The Person object is then created by calling build() on the builder.
Method chaining (see Method Chaining In Java with Examples) is used so that a sequence of calls on the same builder can be written as a single expression.
The advantage is that code such as
Personv3.Person.Name.Builder nameBldr = Personv3.Person.Name.newBuilder();
nameBldr.setPersonal("Jan");
nameBldr.setFamily("Newmarch");
Personv3.Person.Name name = nameBldr.build();
can be reduced to
Personv3.Person.Name name = Personv3.Person.Name.newBuilder()
.setPersonal("Jan")
.setFamily("Newmarch")
.build();
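The Builder pattern itself is not specific to Java or to protobuf. As a rough illustration of what the generated Name.Builder does, here is a hand-written Python analogue. Name, NameBuilder, set_family(), set_personal() and build() are hypothetical names chosen to mirror the Java methods; this is not generated code. The key point is that each setter returns the builder itself, which is what makes method chaining possible, while build() returns an immutable value.

```python
# A Python imitation of the Builder pattern used by the generated Java classes.
# Name and NameBuilder are hypothetical stand-ins, not generated protobuf code.
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    """Immutable value object, like a built protobuf message."""
    family: str = ""
    personal: str = ""


class NameBuilder:
    """Mutable builder; each setter returns self so calls can be chained."""

    def __init__(self) -> None:
        self._family = ""
        self._personal = ""

    def set_family(self, family: str) -> NameBuilder:
        self._family = family
        return self

    def set_personal(self, personal: str) -> NameBuilder:
        self._personal = personal
        return self

    def build(self) -> Name:
        return Name(family=self._family, personal=self._personal)


if __name__ == "__main__":
    name = NameBuilder().set_personal("Jan").set_family("Newmarch").build()
    print(name)  # Name(family='Newmarch', personal='Jan')
```

Making Name frozen mirrors the fact that built protobuf messages in Java are immutable; to get a modified copy you go back through a builder.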
The methods to populate a Name.Builder are setPersonal() and setFamily().
The methods to populate an Email.Builder are setKind() and setAddress(), while the methods to populate a Person.Builder are setName() and, for the repeated email field, addEmail().
In addition, a Person object has the methods
byte[] toByteArray()
static Person parseFrom(byte[] data)
void writeTo(OutputStream output)
static Person parseFrom(InputStream input)
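The parseFrom() methods reverse the encoding. A minimal Python sketch of that decoding step for Person follows, again written by hand for illustration rather than taken from any protobuf library; decode_varint, fields, decode_pair and parse_person are hypothetical names. It only understands the length-delimited wire type that Person uses, skips field numbers it does not recognise, and rejects field number 0, which the encoding never uses.

```python
# Minimal hand-written proto3 decoder for the Person message (illustration only).
# It supports just what Person uses: varint keys and length-delimited fields.
from __future__ import annotations


def decode_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Return (value, next_position) for the varint starting at data[pos]."""
    result, shift = 0, 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of this varint
            return result, pos
        shift += 7


def fields(data: bytes):
    """Yield (field_number, payload) for each length-delimited field in data."""
    pos = 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if field_number == 0:
            raise ValueError("invalid field number 0")
        if wire_type != 2:
            raise ValueError(f"unsupported wire type {wire_type}")
        length, pos = decode_varint(data, pos)
        if pos + length > len(data):
            raise ValueError("truncated field")
        yield field_number, data[pos:pos + length]
        pos += length


def decode_pair(data: bytes, first: str, second: str) -> dict:
    """Decode a message whose fields 1 and 2 are both strings."""
    out = {first: "", second: ""}  # proto3 default for a string is ""
    for number, payload in fields(data):
        if number == 1:
            out[first] = payload.decode("utf-8")
        elif number == 2:
            out[second] = payload.decode("utf-8")
        # any other field number is skipped
    return out


def parse_person(data: bytes) -> dict:
    person = {"name": None, "email": []}
    for number, payload in fields(data):
        if number == 1:
            person["name"] = decode_pair(payload, "family", "personal")
        elif number == 2:  # repeated: one entry per occurrence
            person["email"].append(decode_pair(payload, "kind", "address"))
    return person
```

The check for field number 0 matters in practice: if a decoder is handed a whole fixed-size buffer instead of just the bytes that were read, trailing zero bytes look like a key with field number 0, and a parser should reject them rather than accept them silently.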
The client creates the example Person using the various builders.
It then connects to a server and sends the serialized binary form on an OutputStream to the server. Then it terminates.
The code is PersonClient.java:
import person.*;
import java.util.Arrays;
import java.io.*;
import java.net.*;
public class PersonClient {
public static final int SERVER_PORT = 2001;
public static void main(String[] args) {
if (args.length != 1) {
System.err.println("Usage: Client address");
System.exit(1);
}
Personv3.Person.Name name = Personv3.Person.Name.newBuilder()
.setPersonal("Jan")
.setFamily("Newmarch")
.build();
Personv3.Person.Email email1 = Personv3.Person.Email.newBuilder()
.setKind("private")
.setAddress("jan@newmarch.name")
.build();
Personv3.Person.Email email2 = Personv3.Person.Email.newBuilder()
.setKind("work")
.setAddress("j.newmarch@boxhill.edu.au")
.build();
Personv3.Person person = Personv3.Person.newBuilder()
.setName(name)
.addEmail(email1)
.addEmail(email2)
.build();
String str = person.toString();
System.out.println("Sending: " + str);
InetAddress address = null;
try {
address = InetAddress.getByName(args[0]);
} catch(UnknownHostException e) {
e.printStackTrace();
System.exit(2);
}
Socket sock = null;
try {
sock = new Socket(address, SERVER_PORT);
System.out.println("Connected");
} catch(IOException e) {
e.printStackTrace();
System.exit(3);
}
OutputStream out = null;
try {
out = sock.getOutputStream();
} catch(IOException e) {
e.printStackTrace();
System.exit(5);
}
try {
person.writeTo(out);
out.flush();
// closing the socket lets the server's parseFrom() see end-of-stream
sock.close();
} catch(IOException e) {
e.printStackTrace();
System.exit(6);
}
}
}
To build the client, you need to include the source for the client, the generated Java files and the jar file for the Protobuf support.
With the person directory, the jar file and the client file in the same directory, the Linux compile command is
javac PersonClient.java person/Personv3.java -cp protobuf-java-3.12.2.jar
and the run command is
java -cp .:protobuf-java-3.12.2.jar PersonClient localhost
The server listens for connections and then reads a single Person from the connection's InputStream. It then prints the Person to stdout and terminates the connection. It is PersonServer.java:
import person.*;
import java.util.Arrays;
import java.io.*;
import java.net.*;
public class PersonServer {
public static final int SERVER_PORT = 2001;
public static void main(String[] args){
ServerSocket s = null;
try {
s = new ServerSocket(SERVER_PORT);
} catch(IOException e) {
System.out.println(e);
System.exit(1);
}
while (true) {
Socket incoming = null;
try {
incoming = s.accept();
System.out.println("Connected");
} catch(IOException e) {
System.out.println(e);
continue;
}
handleSocket(incoming);
}
}
public static void handleSocket(Socket incoming) {
// try-with-resources closes the stream and the socket when done
try (Socket sock = incoming;
InputStream in = sock.getInputStream()) {
// parseFrom() reads until the client closes its end of the connection
Personv3.Person person = Personv3.Person.parseFrom(in);
System.out.println("Receiving: " + person.toString());
} catch(IOException e) {
System.err.println(e.toString());
}
}
}
The Linux compile command is
javac PersonServer.java person/Personv3.java -cp protobuf-java-3.12.2.jar
and the run command is
java -cp .:protobuf-java-3.12.2.jar PersonServer
Both the client and the server print the Person to stdout.
The printed Person from each should be
name {
family: "Newmarch"
personal: "Jan"
}
email {
kind: "private"
address: "jan@newmarch.name"
}
email {
kind: "work"
address: "j.newmarch@boxhill.edu.au"
}
For Go, the compile command is
protoc --go_out=go/src/person personv3.proto
Note that the generated file is put under a src subdirectory.
This allows the Go compiler to find it from the GOPATH environment variable.
The Go code generator is not bundled with protoc, so you will need to install the Go protobuf plugin separately. The instruction at Protocol Buffer Basics: Go says
go install google.golang.org/protobuf/cmd/protoc-gen-go
This seems to be incorrect at present; possibly the documentation is ahead of the implementation. It currently needs to be
go install github.com/golang/protobuf/cmd/protoc-gen-go
and in the applications using this, the import needs to be
"github.com/golang/protobuf/proto"
The generated file is src/person/personv3.pb.go.
It contains the struct types person.Person, person.Person_Name and person.Person_Email.
The proto package has the functions Marshal() and Unmarshal().
The client creates the structures, calls proto.Marshal() to serialize the person, and then writes it to a server.
It is PersonClient.go:
/* PersonClient
 * Builds a Person, serializes it with proto.Marshal and sends it to host:port
 */
package main
import (
"fmt"
"github.com/golang/protobuf/proto"
"os"
"net"
"person"
)
func main() {
if len(os.Args) != 2 {
fmt.Fprint(os.Stderr, "Usage: ", os.Args[0], " host:port\n")
os.Exit(1)
}
service := os.Args[1]
name := person.Person_Name{
Family: "Newmarch",
Personal: "Jan"}
email1 := person.Person_Email{
Kind: "home",
Address: "jan@newmarch.name"}
email2 := person.Person_Email{
Kind: "work",
Address: "j.newmarch@boxhill.edu.au"}
emails := []*person.Person_Email{&email1, &email2}
p := person.Person{
Name: &name,
Email: emails,
}
// print via the generated String() method rather than copying the struct
fmt.Println(&p)
data, err := proto.Marshal(&p)
checkError(err)
conn, err := net.Dial("tcp", service)
checkError(err)
// close the connection when main returns
defer conn.Close()
fmt.Println("Connected")
_, err = conn.Write(data)
checkError(err)
}
func checkError(err error) {
if err != nil {
fmt.Println("Fatal error ", err.Error())
os.Exit(1)
}
}
To run it, you need to set GOPATH so that all the relevant files can be found.
On my system it is
GOPATH=/home/.../go:/usr/lib/go-1.10/
go run PersonClient.go localhost:2001
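The server listens, reads binary data and calls proto.Unmarshal(). It has to be careful not to pass in the entire byte array, only the slice actually read.
It also relies on the whole message (well under 128 bytes here) arriving in a single Read, which is typical for a small message but is not guaranteed by TCP.
It is PersonServer.go: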
/* PersonServer
 * Receives a serialized Person from each client and prints it
 */
package main
import (
"fmt"
"net"
"os"
"github.com/golang/protobuf/proto"
"person"
)
func main() {
service := ":2001"
tcpAddr, err := net.ResolveTCPAddr("tcp", service)
checkError(err)
listener, err := net.ListenTCP("tcp", tcpAddr)
checkError(err)
for {
conn, err := listener.Accept()
if err != nil {
continue
}
fmt.Println("Connected")
// handle each client in its own goroutine
go handleClient(conn)
}
}
func handleClient(conn net.Conn) {
// close connection on exit
defer conn.Close()
var data []byte
data = make([]byte, 128, 128)
nread, err := conn.Read(data)
if err != nil {
fmt.Println("Disconnecting")
return
}
p := &person.Person{}
err = proto.Unmarshal(data[0:nread], p)
if err != nil {
fmt.Println("Error unmarshalling:", err)
return
}
fmt.Println("Received: ", p)
}
func checkError(err error) {
if err != nil {
fmt.Fprintf(os.Stderr, "Fatal error: %s", err.Error())
os.Exit(1)
}
}
To run it, you need to set GOPATH so that all the relevant files can be found.
On my system it is
GOPATH=/home/.../go:/usr/lib/go-1.10/
go run PersonServer.go
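A serialized protobuf message carries no end marker of its own, so the servers need some other way to know where a message ends: the Java server, for example, reads until the client closes the connection, and the Go server relies on the message arriving in a single Read. If a connection is to carry more than one message, a common approach is to prefix each message with its length as a varint; the Java API offers writeDelimitedTo() and parseDelimitedFrom() for this. Below is a minimal Python sketch of the same framing idea. write_delimited and read_delimited are hypothetical helpers, not library functions, and the payloads are plain bytes standing in for serialized messages.

```python
# Varint length-prefix framing, the same idea as Java's writeDelimitedTo() /
# parseDelimitedFrom(). write_delimited and read_delimited are hypothetical helpers.
from __future__ import annotations
import io
from typing import BinaryIO, Optional


def write_delimited(stream: BinaryIO, payload: bytes) -> None:
    """Write len(payload) as a varint, then the payload itself."""
    n = len(payload)
    prefix = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        prefix.append(low7 | 0x80 if n else low7)  # high bit: more bytes follow
        if not n:
            break
    stream.write(bytes(prefix) + payload)


def _read_exact(stream: BinaryIO, size: int) -> bytes:
    """Keep reading until size bytes arrive; one read may return fewer."""
    buf = bytearray()
    while len(buf) < size:
        chunk = stream.read(size - len(buf))
        if not chunk:
            raise EOFError("stream ended inside a message")
        buf += chunk
    return bytes(buf)


def read_delimited(stream: BinaryIO) -> Optional[bytes]:
    """Return the next framed payload, or None at a clean end of stream."""
    length, shift = 0, 0
    while True:
        b = stream.read(1)
        if not b:
            if shift == 0:
                return None  # no more messages
            raise EOFError("stream ended inside a length prefix")
        length |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            break
        shift += 7
    return _read_exact(stream, length)


if __name__ == "__main__":
    buf = io.BytesIO()
    write_delimited(buf, b"first message")
    write_delimited(buf, b"second")
    buf.seek(0)
    while (msg := read_delimited(buf)) is not None:
        print(msg)
```

With a socket, conn.makefile("rb") gives a file-like object that can be passed as the stream argument, so a server can loop on read_delimited() until it returns None. Java's writeDelimitedTo() likewise writes the message size as a varint before the message bytes.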
The Python files are generated from the .proto specification by
protoc personv3.proto --python_out=python
When I tried to run my client (see later), I got the error
AttributeError: module 'google.protobuf.descriptor' has no attribute '_internal_create_key'
I am now using the 3.12.2 version of protoc.
It seems that the protobuf package installed for Python 3.6 (version 3.11.2) isn't quite compatible with code generated by this version.
I had to do
pip3 uninstall protobuf
pip3 install protobuf
to get the versions right.
The compiler just creates one file, personv3_pb2.py. It isn't nice to look at, as it uses Python metaclass calls to generate the classes. However, you can write programs assuming that it has generated standard Python classes.
For the example we use, the classes are Person, Person.Name and Person.Email.
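The idea of building classes at run time from a description can be shown with Python's built-in type(name, bases, namespace) call. The toy make_message_class() below is a hypothetical analogy only; the real personv3_pb2 module builds its classes from the compiled descriptor through the protobuf library's own machinery, and those classes do much more (serialization, nested types, validation). Like the generated classes, the toy ones give string fields the proto3 default of "" and refuse assignment to undeclared attributes.

```python
# Toy illustration of creating a class at run time with type(name, bases, namespace).
# make_message_class is hypothetical; it is NOT how personv3_pb2 is implemented.
def make_message_class(name, field_names):
    def __init__(self, **kwargs):
        for field in field_names:
            # proto3 string fields default to the empty string
            setattr(self, field, kwargs.pop(field, ""))
        if kwargs:
            raise TypeError(f"{name} has no fields {sorted(kwargs)}")

    def __repr__(self):
        values = ", ".join(f"{f}={getattr(self, f)!r}" for f in field_names)
        return f"{name}({values})"

    # __slots__ means assigning to an undeclared attribute raises AttributeError
    namespace = {"__slots__": tuple(field_names),
                 "__init__": __init__,
                 "__repr__": __repr__}
    return type(name, (object,), namespace)


Name = make_message_class("Name", ["family", "personal"])
Email = make_message_class("Email", ["kind", "address"])

if __name__ == "__main__":
    n = Name(personal="Jan")
    n.family = "Newmarch"
    print(n)  # Name(family='Newmarch', personal='Jan')
```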
The client creates a Person and then fills in the fields.
There are two ways to add a Name: create a Name object, assign its fields and then copy its value into person.name using CopyFrom(), or assign the fields of person.name directly. (Assigning a message object directly to person.name is not allowed, which is why CopyFrom() is needed in the first case.)
For the list of emails, call add() on the repeated field to create each new email, and fill in its fields.
The client is PersonClient.py:
import socket
import sys
import personv3_pb2

PORT = 2001

if len(sys.argv) < 2:
    print('Usage: comm hostname')
    sys.exit(1)
host = sys.argv[1]

person = personv3_pb2.Person()
# Alternative: build a separate Name and copy it in with CopyFrom()
#name = personv3_pb2.Person.Name()
#name.family = 'Newmarch'
#name.personal = 'Jan'
#person.name.CopyFrom(name)

# Simpler: assign the fields of the embedded name directly
person.name.family = 'Newmarch'
person.name.personal = 'Jan'

# add() appends a new, empty Email to the repeated field and returns it
email1 = person.email.add()
email1.kind = 'private'
email1.address = 'jan@newmarch.name'
email2 = person.email.add()
email2.kind = 'work'
email2.address = 'j.newmarch@boxhill.edu.au'
print(person)

# The socket is closed at the end of the with block, marking the end of the message
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.connect((host, PORT))
    s.sendall(person.SerializeToString())
The server listens for client connections and reads the data.
It creates a Person object and then calls the method ParseFromString() on the (binary) data.
Here the server just prints the created person and terminates the connection.
The server is PersonServer.py:
# https://realpython.com/python-sockets/#echo-server
# sequential server
import socket
import personv3_pb2

HOST = ''  # INADDR_ANY
PORT = 2001

with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as s:
    s.bind((HOST, PORT))
    s.listen()
    while True:
        conn, addr = s.accept()
        with conn:
            print('Connected by', addr)
            # A message may arrive in several pieces, so collect
            # everything until the client closes its end
            chunks = []
            while True:
                data = conn.recv(1024)
                if not data:
                    print('Disconnecting')
                    break
                chunks.append(data)
            person = personv3_pb2.Person()
            person.ParseFromString(b''.join(chunks))
            print(person)
Copyright © Jan Newmarch, jan@newmarch.name
"Network Programming using Java, Go, Python, Rust, JavaScript and Julia" by Jan Newmarch is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://jan.newmarch.name/NetworkProgramming/.