Protocol Design

Introduction

A client and server need to exchange information via messages. TCP and UDP provide the transport mechanisms to do this. The two processes also have to have a protocol in place so that message exchange can take place meaningfully.

Protocol Design

Some parameters are

Version control

A protocol used in a client/server system will evolve over time, changing as the system expands. This raises compatibility problems: a version 2 client will make requests that a version 1 server doesn't understand, whereas a version 2 server will send replies that a version 1 client won't understand.

Each side should ideally be able to understand messages for its own version and all earlier ones. It should be able to write replies to old style queries in old style response format.

The ability to speak earlier version formats may be lost if the protocol changes too much. In this case, you need to be able to ensure that no copies of the earlier version still exist (impossible, of course...).

Part of the protocol setup should involve version information.
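
One way to carry version information, sketched under assumptions: a hypothetical text handshake in which the client's first line is VERSION <n> and the server replies with the highest version both sides understand. The VERSION message, the class name and the version numbers are invented for illustration, and a StringReader/StringWriter pair stands in for the socket streams.

```java
import java.io.*;

// Hypothetical version handshake (not part of any real protocol):
// the client's first line is "VERSION <n>"; the server replies with
// "VERSION <m>", the highest version both sides understand, or "ERROR".
class VersionHandshake {
    static final int SERVER_MAX_VERSION = 2;   // assumed: this server speaks versions 1 and 2

    // Server side: read the client's offer, reply, and return the agreed
    // version (or -1 if no version could be agreed).
    static int negotiate(BufferedReader in, Writer out) throws IOException {
        String line = in.readLine();
        int agreed = -1;
        if (line != null && line.startsWith("VERSION ")) {
            try {
                int clientVersion = Integer.parseInt(line.substring("VERSION ".length()).trim());
                agreed = Math.min(clientVersion, SERVER_MAX_VERSION);
            } catch (NumberFormatException e) {
                // not a number: leave agreed at -1
            }
        }
        if (agreed < 1) {
            out.write("ERROR\r\n");
            agreed = -1;
        } else {
            out.write("VERSION " + agreed + "\r\n");   // later messages use this version
        }
        out.flush();
        return agreed;
    }

    public static void main(String[] args) throws IOException {
        // A version 3 client talking to this version 2 server settles on version 2.
        StringWriter reply = new StringWriter();
        int v = negotiate(new BufferedReader(new StringReader("VERSION 3\r\n")), reply);
        System.out.println(v + " " + reply.toString().trim());   // prints "2 VERSION 2"
    }
}
```

Choosing the minimum lets a newer client fall back to the older format, which is the "understand all earlier versions" behaviour described above.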

The Web

The Web is a good example of a system that is messed up by different versions. The protocol has been through three versions, and most servers and browsers use the latest one. The version is given in each request:

    request            version
    GET /              pre 1.0
    GET / HTTP/1.0     HTTP 1.0
    GET / HTTP/1.1     HTTP 1.1
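
A server has to recognise the version before it can reply in the right format. A minimal sketch, assuming a well-formed request line and simply splitting on whitespace (a real HTTP server is stricter); the class name is hypothetical.

```java
// Sketch: which HTTP version does a request line use?
// Assumes a well-formed line; a real server checks far more.
class HttpVersionSniffer {
    static String versionOf(String requestLine) {
        String[] parts = requestLine.trim().split("\\s+");
        if (parts.length == 2) {
            return "pre 1.0";                 // "GET /" carries no version field
        }
        if (parts.length == 3 && parts[2].startsWith("HTTP/")) {
            return parts[2];                  // "HTTP/1.0" or "HTTP/1.1"
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(versionOf("GET /"));           // prints "pre 1.0"
        System.out.println(versionOf("GET / HTTP/1.1"));  // prints "HTTP/1.1"
    }
}
```

The answer decides the reply format: a pre-1.0 request gets just the document, with no status line or headers.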

But the content of the messages has been through a large number of versions:

JNLP

Data Serialisation

Messages are sent across the network as a sequence of bytes, which has no structure beyond its linear order. Programming languages use structured data such as

For example, sending the following variable length table of two columns of variable length strings:

fred programmer
liping analyst
sureerat manager
could be done by
    3                // 3 rows, 2 columns assumed
    4 fred           // 4 char string,col 1
    10 programmer    // 10 char string,col 2
    6 liping         // 6 char string, col 1
    7 analyst        // 7 char string, col 2
    8 sureerat       // 8 char string, col 1
    7 manager        // 7 char string, col 2
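
A minimal Java sketch of this length-prefixed layout. The listing does not say how many bytes the counts take, so this assumes (hypothetically) one unsigned byte each, written with DataOutputStream.writeByte, followed by the string's 7-bit ASCII bytes; that caps rows and string lengths at 255. A byte array stands in for the socket.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// Length-prefixed two-column table of strings.
// Assumption: the row count and each length are one unsigned byte (0..255).
class LengthPrefixedTable {
    static byte[] encode(String[][] rows) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeByte(checkFits(rows.length));            // number of rows; 2 columns assumed
        for (String[] row : rows) {
            for (String cell : row) {
                byte[] chars = cell.getBytes(StandardCharsets.US_ASCII);
                out.writeByte(checkFits(chars.length));   // length of this string...
                out.write(chars);                         // ...then its characters
            }
        }
        out.flush();
        return bytes.toByteArray();
    }

    static String[][] decode(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int nRows = in.readUnsignedByte();
        String[][] rows = new String[nRows][2];
        for (int r = 0; r < nRows; r++) {
            for (int c = 0; c < 2; c++) {
                byte[] chars = new byte[in.readUnsignedByte()];
                in.readFully(chars);                      // exactly that many bytes
                rows[r][c] = new String(chars, StandardCharsets.US_ASCII);
            }
        }
        return rows;
    }

    private static int checkFits(int n) {
        if (n > 255) {
            throw new IllegalArgumentException(n + " does not fit in one byte");
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        String[][] table = {{"fred", "programmer"}, {"liping", "analyst"}, {"sureerat", "manager"}};
        byte[] wire = encode(table);
        System.out.println(wire.length + " bytes");   // prints "49 bytes"
        System.out.println(decode(wire)[2][0]);       // prints "sureerat"
    }
}
```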

Variable length things can alternatively have their length indicated by terminating them with an "illegal" value, such as '\0' for strings:

    3
    fred\0        
    programmer\0
    liping\0
    analyst\0
    sureerat\0
    manager\0
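
A sketch of the terminator approach for a single string, assuming 7-bit ASCII data; the class and method names are hypothetical. The reader keeps going until it sees the '\0', so the data itself must never contain that byte.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// Strings terminated by '\0': the terminator, not a length, marks the end.
class NullTerminatedStrings {
    static void writeString(OutputStream out, String s) throws IOException {
        if (s.indexOf('\0') >= 0) {
            throw new IllegalArgumentException("string contains the terminator");
        }
        out.write(s.getBytes(StandardCharsets.US_ASCII));
        out.write(0);                                   // the terminating '\0'
    }

    static String readString(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != 0) {                  // stop at the terminator
            if (b == -1) {
                throw new EOFException("stream ended before terminator");
            }
            buf.write(b);
        }
        return new String(buf.toByteArray(), StandardCharsets.US_ASCII);
    }
}
```

Writing "fred" then "programmer" produces 16 bytes: 4 + 1 and 10 + 1.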

To send the same data as a 3-row fixed table of two columns of strings of length 8 and 10 respectively:

    fred\0\0\0\0
    programmer
    liping\0\0
    analyst\0\0\0
    sureerat
    manager\0\0\0
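
A sketch of padding and unpadding one fixed-width field, assuming '\0' padding and ASCII text as in the listing; the widths (8 and 10 here) come from the protocol, not from the data. Names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Fixed-width fields: every string occupies exactly `width` bytes, padded with '\0'.
class FixedWidthField {
    static byte[] pad(String s, int width) {
        byte[] chars = s.getBytes(StandardCharsets.US_ASCII);
        if (chars.length > width) {
            throw new IllegalArgumentException("\"" + s + "\" is longer than " + width);
        }
        return Arrays.copyOf(chars, width);   // copyOf fills the extra bytes with 0
    }

    static String unpad(byte[] field) {
        int end = 0;
        while (end < field.length && field[end] != 0) {
            end++;                            // stop at the first '\0', or at full width
        }
        return new String(field, 0, end, StandardCharsets.US_ASCII);
    }
}
```

A string that exactly fills its field, such as "sureerat" in 8 bytes, has no '\0' at all, which is why unpad also stops at the field width.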

Any of these formats is okay, but the protocol must specify which one is used.

Java serialization

Java classes can be marked as implementing the interface Serializable. This interface has no methods, so no extra code has to be written for Serializable classes.

If a class is Serializable, then the methods ObjectOutputStream.writeObject() and ObjectInputStream.readObject() can be called. This allows an object (and all objects it references) to be written out (e.g. to a file) and then read back and restored.

This can be used to save and restore objects, to make them persistent.

If the client and server are both Java applications, then objects in one JVM can be written out to the other JVM, which gives mobile objects. This is used by, e.g., RMI (Remote Method Invocation). It cannot be used for general client/server applications, because a C, Perl or Ada program does not understand Java objects.
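
A minimal sketch of a serialization round trip. Employee is a hypothetical class mirroring the table above, and a byte array stands in for the file or socket; with a socket you would wrap sock.getOutputStream() and sock.getInputStream() instead. Both ends need the same class available for readObject() to succeed.

```java
import java.io.*;

// Hypothetical class for the sketch; implementing Serializable is all it takes.
class Employee implements Serializable {
    private static final long serialVersionUID = 1L;   // both JVMs must agree on this
    final String name;
    final String title;

    Employee(String name, String title) {
        this.name = name;
        this.title = title;
    }
}

class SerializationDemo {
    // writeObject writes the object and every object it references.
    static byte[] toBytes(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // readObject builds a new copy; the class must be available on this side too.
    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}
```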

Message Format

Usually, the first part of the message will be a message type.

The message types can be strings or integers. For example, HTTP uses integer status codes such as 404 to mean "not found". The messages from client to server and those from server to client are separate sets: "LOGIN" from client to server is a different message from "LOGIN" from server to client.

Data Format

There are two main data format choices: byte encoded or character encoded.

Byte format

In the byte format

The advantages are compactness and hence speed. The disadvantages are caused by the opaqueness of the data: errors may be harder to spot, the data is harder to debug, and special-purpose decoding functions may be required.

Pseudocode for a byte-format server is

    handleSocket() {
        while (true) {
            byte b = in.readByte()
            switch (b) {
                case MSG_1: ...
                case MSG_2: ...
                ...
            }
        }
    }
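
A runnable Java version of the byte-format loop, assuming a DataInputStream/DataOutputStream pair on the socket. MSG_ECHO, MSG_ADD and MSG_QUIT are hypothetical message types invented for the sketch. Unlike the pseudocode, Java's switch needs a break after each case.

```java
import java.io.*;

// Byte-format server loop: each message starts with a one-byte type.
// The three message types are hypothetical, for illustration only.
class ByteFormatHandler {
    static final byte MSG_ECHO = 1;   // body: a writeUTF string; reply: the same string
    static final byte MSG_ADD = 2;    // body: two ints; reply: their sum as an int
    static final byte MSG_QUIT = 3;   // no body; ends the session

    static void handle(DataInputStream in, DataOutputStream out) throws IOException {
        while (true) {
            byte type;
            try {
                type = in.readByte();
            } catch (EOFException e) {
                return;                             // client closed the connection
            }
            switch (type) {
                case MSG_ECHO:
                    out.writeUTF(in.readUTF());
                    break;
                case MSG_ADD:
                    out.writeInt(in.readInt() + in.readInt());
                    break;
                case MSG_QUIT:
                    return;
                default:
                    throw new IOException("unknown message type " + type);
            }
            out.flush();                            // send the reply now
        }
    }
}
```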

The Java classes to use for this representation are DataOutputStream (for writing) and DataInputStream (for reading).

For example, to write an array of four ints:

    int[] a = new int[4];
    DataOutputStream out = new DataOutputStream(...);
    out.writeByte(4);             // the count, as one byte
    for (int n = 0; n < 4; n++)
        out.writeInt(a[n]);       // each int is 4 bytes, big-endian
In this unit, you write to the output stream of a socket.

To read them in:

    int size = in.readByte();
    int[] a = new int[size];
    for (int n = 0; n < size; n++)
        a[n] = in.readInt();      // readInt() takes no argument; it returns the value
In this unit, you read from the input stream of a socket.
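
Putting the two fragments together, a self-contained round trip in which a ByteArrayOutputStream stands in for the socket's output stream (an assumption so the example runs on its own); the class name is hypothetical.

```java
import java.io.*;

// Round trip for the array example: a one-byte count, then four bytes per int.
class IntArrayCodec {
    static byte[] write(int[] a) throws IOException {
        if (a.length > Byte.MAX_VALUE) {
            throw new IllegalArgumentException("count must fit in one signed byte");
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();   // stands in for the socket
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeByte(a.length);
        for (int value : a) {
            out.writeInt(value);                   // always 4 bytes, big-endian
        }
        out.flush();
        return bytes.toByteArray();
    }

    static int[] read(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int size = in.readByte();                  // signed, so at most 127
        int[] a = new int[size];
        for (int n = 0; n < size; n++) {
            a[n] = in.readInt();
        }
        return a;
    }
}
```

Four ints always take 1 + 4 × 4 = 17 bytes, whatever their values.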

Unsigned types

Java does not support unsigned data types such as unsigned int or unsigned long. These can only be handled by reading them as signed and then, if they are negative, converting them to the next size up and adding 2^N, where N is the number of bits in the original type (256 for a byte, 65536 for a short, 2^32 for an int).
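
A sketch of the conversion for the three common sizes: a negative value is widened and 2^8, 2^16 or 2^32 is added. The class name is hypothetical.

```java
// Recovering unsigned values that Java has read as signed.
class Unsigned {
    static int fromByte(byte b) {
        return b < 0 ? b + 256 : b;                 // add 2^8
    }

    static int fromShort(short s) {
        return s < 0 ? s + 65536 : s;               // add 2^16
    }

    static long fromInt(int i) {
        return i < 0 ? i + 4294967296L : i;         // add 2^32, so the result needs a long
    }
}
```

Since Java 8 the library offers the same conversions as Byte.toUnsignedInt(), Short.toUnsignedInt() and Integer.toUnsignedLong(), and DataInputStream has readUnsignedByte() and readUnsignedShort() for reading directly off a stream.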

Character Format

In this mode, everything is sent as characters if possible. For example, the number 1024 is sent as the four-byte string "1024" instead of as a binary value (which needs at least two bytes). Data that is inherently binary may be uuencoded to change it into a 7-bit format and then sent as ASCII characters.
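
To make the size difference concrete, a sketch sending 1024 both as the characters "1024" and as a 16-bit big-endian value, the same two bytes DataOutputStream.writeShort would produce. Names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// The same number in character format and in byte format.
class TextVersusBinary {
    // Character format: one ASCII byte per decimal digit.
    static byte[] asText(int n) {
        return Integer.toString(n).getBytes(StandardCharsets.US_ASCII);
    }

    // Byte format: a 16-bit big-endian value.
    static byte[] asShort(int n) {
        return ByteBuffer.allocate(2).putShort((short) n).array();
    }
}
```

The text form grows with the number of digits, while a binary field has a fixed size; in exchange the text form can be read directly in a debugger or packet trace.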

In character format,

Pseudocode is

    handleSocket() {
        while (true) {
            line = in.readLine()
            if (line.startsWith(...)) {
                ...
            } else if (line.startsWith(...)) {
                ...
            }
        }
    }

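Character formats are easier to set up and easier to debug, but carry higher overheads: more bytes to send, plus the cost of converting values to and from characters. They also bring other problems, such as the character-set and newline differences discussed below.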

ASCII

The standard character sets are ASCII, a 7-bit code, and EBCDIC, an 8-bit code used on IBM mainframes. The Internet tends to expect ASCII because of its Unix origin, so EBCDIC characters need to be converted before being put on the wire. The Unix program dd may be useful for this.

The "standard" ASCII set allows some variations: characters such as '[' are not required to be present and may be replaced by other characters in national variants. The ISO 646 invariant character set is a subset of full ASCII and is totally portable.

The following table shows the printable ASCII characters (space omitted). All of them belong to ISO 646 except # $ @ [ \ ] ^ ` { | } ~, which national variants may replace.

! " # $ % & ' ( ) * + , - . /
0 1 2 3 4 5 6 7 8 9 : ; < = > ?
@ A B C D E F G H I J K L M N O
P Q R S T U V W X Y Z [ \ ] ^ _
` a b c d e f g h i j k l m n o
p q r s t u v w x y z { | } ~

ISO 8859

The European character sets are 8-bit sets. The first 128 characters (the 7-bit subset) are the same as ASCII. The top 128 codes (128 to 255) represent additional European characters, and these vary across the continent. The most common set is ISO 8859-1, covering Western Europe.

ISO 8859-2 and the later parts of ISO 8859 cover other European regions, plus alphabets such as Cyrillic (for Russian) and Hebrew.

The classes to use for 7-bit ASCII and 8-bit ISO 8859 are PrintStream (for writing) and a BufferedReader wrapped around an InputStreamReader (for reading).

The Character methods such as isLetter() also work correctly with ISO 8859 characters.

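The PrintStream methods write Java characters and strings as bytes, and BufferedReader.readLine() (over an InputStreamReader) turns bytes back into a Java string. Both use the platform's default character encoding unless one is named explicitly, so for 8-bit text it is safer to name the encoding, e.g. ISO-8859-1, on both sides.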
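
A minimal sketch of naming the encoding explicitly with StandardCharsets.ISO_8859_1 (Java 7 and later). An in-memory byte array stands in for the socket streams so the example runs on its own; with a socket you would pass sock.getOutputStream() and sock.getInputStream() instead. Names are hypothetical.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// Text streams with the charset named explicitly, so both ends agree
// regardless of each platform's default encoding.
class Latin1Streams {
    static byte[] writeLine(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Writer out = new OutputStreamWriter(bytes, StandardCharsets.ISO_8859_1);
        try {
            out.write(s + "\r\n");
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static String readLine(byte[] data) {
        BufferedReader in = new BufferedReader(
            new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.ISO_8859_1));
        try {
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In ISO 8859-1 every character is one byte, so "caf\u00e9" becomes six bytes including the "\r\n", with the é sent as 0xE9.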

Newline

In Unix, the newline character is '\n'. In MSDOS, it is the pair "\r\n". Text files in Unix need to be converted in order to be read properly by MSDOS, and vice versa.

Programs that write lines therefore differ between the two systems:

    out.print("abcd\n");   // Unix
    out.print("abcd\r\n"); //MSDOS
Note that "\r\n" is not the same as "\n\r".

The system property line.separator holds the local line terminator, so it is different for Unix and MSDOS:

    String separator = System.getProperty("line.separator");

The PrintStream.println(...) methods use this separator, so they write (local) text files correctly under Unix and MSDOS.

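For a network protocol the local separator is the wrong thing to use: the two applications must agree on either '\n' or "\r\n", whatever platform each runs on. Many Internet protocols, such as SMTP and HTTP, use "\r\n".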
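
A small sketch of sending protocol lines with an explicit "\r\n" through print() rather than println(), so the bytes on the wire are the same on every platform. The class and method names are hypothetical.

```java
import java.io.*;

// Protocol lines are terminated with "\r\n" explicitly; println() would
// use the local line.separator, which differs between Unix and MSDOS.
class ProtocolLines {
    static final String CR_LF = "\r\n";

    static void sendLine(PrintStream out, String line) {
        out.print(line + CR_LF);     // same bytes on every platform
        out.flush();
    }
}
```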

BufferedReader.readLine() accepts any of '\n', '\r' or "\r\n" as the end of a line.
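
A quick check of that behaviour, reading from a StringReader instead of a socket; the helper name is hypothetical. Note that "\n\r" is read as two line ends, so it produces an extra empty line.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

// BufferedReader.readLine() accepts "\n", "\r" or "\r\n" as a line end.
class LineEndings {
    static List<String> lines(String text) {
        List<String> result = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = in.readLine()) != null) {
                result.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

lines("unix\nmsdos\r\nmac\r") gives three lines, while lines("a\n\rb") gives "a", "" and "b".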

Unicode

Many Asian languages, such as Chinese and Japanese, are written with ideographic characters numbering in the thousands. They require at least 16-bit character coding.

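Unicode is the principal encoding at the moment. It was designed as a pure 16-bit code, large enough to cover all existing languages; since Unicode 2.0 it has been extended beyond 16 bits (code points up to U+10FFFF), and in the 16-bit UTF-16 form the extra characters are stored as pairs of 16-bit units (surrogate pairs).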

Java uses Unicode internally.

If the exchange protocol uses Unicode, then there are Java methods to read and write Unicode:

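DataInputStream.readUTF() and DataOutputStream.writeUTF() use only one of the possible UTF formats: a "modified" UTF-8, preceded by a two-byte length, which limits a string to 65535 encoded bytes.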

Note: UTF-8 saves space by using only 1 byte for each character in the ASCII subset. The Java version of UTF-8 does not quite conform to the standard: it uses 2 bytes for the null character instead of one, and encodes characters beyond 16 bits as two 3-byte surrogates instead of one 4-byte sequence.
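
A sketch showing the two-byte length prefix and the two-byte null character produced by writeUTF(), compared with standard UTF-8 from String.getBytes(StandardCharsets.UTF_8). A byte array stands in for the socket; the class name is hypothetical.

```java
import java.io.*;

// DataOutputStream.writeUTF: a 2-byte length, then *modified* UTF-8 bytes.
class ModifiedUtf8 {
    static byte[] writeUTF(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            new DataOutputStream(bytes).writeUTF(s);
        } catch (IOException e) {                 // includes strings too long to encode
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static String readUTF(byte[] data) {
        try {
            return new DataInputStream(new ByteArrayInputStream(data)).readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] modified = writeUTF("\0");             // 00 02 C0 80
        byte[] standard = "\0".getBytes(java.nio.charset.StandardCharsets.UTF_8);   // 00
        System.out.println(modified.length + " " + standard.length);   // prints "4 1"
    }
}
```

Because of these differences, writeUTF() is only a safe choice when the other end is also reading with readUTF() (or implements the same modified format).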

Unicode UTF

Other 16-bit character sets

Read/write other 16-bit sets

ISO 10646

The original 16-bit Unicode was not quite large enough. It could encode the Asian languages only by treating some Chinese, Japanese and Korean characters as though they were the same (Han unification). This is okay unless you have a mixed-language document containing both Chinese and Japanese: you won't always be able to tell which language a character belongs to.

ISO 10646 was designed as a 32-bit character set. It is large enough for all known character sets, including historic scripts such as Egyptian hieroglyphs. Its repertoire is kept synchronised with Unicode and uses the same code points, so languages with full Unicode support (Java since version 5) in effect support ISO 10646 as well.

Simple Example

A file transfer protocol, not as complex as the real FTP or even TFTP. This is a complete worked example of creating all components of a client-server application. It is a simple version of a file transfer program which includes messages in both directions, as well as the design of the messaging protocol.

Look at a simple non-client-server program that allows you to list files in a directory, change directory and copy files. For simplicity, all filenames and file contents will be assumed to be in 7-bit ASCII. The pseudo-code would be:


read line from user
while not eof do
  if line == dir
    list directory
  else if line == cd <dir>
    change directory
    if successful
      print new directory name
    else
      complain
  else if line == copy <file>
    if the file can be read
      copy file
    else
      complain
  else if line == quit
    quit
  else
    complain

  read line from user

A non-distributed application would just link the UI and file access code.

In a client-server situation, the client would be at the user end, talking to a server somewhere else. Some aspects of this program belong solely at the presentation end, such as getting the commands from the user. Some become messages from the client to the server, and some belong at the server end.

For a simple file transfer, assume that all files are at the server end, and we are only transferring ASCII files from the server to the client. The transferred file is to have the same name as the original file. The client side (including presentation aspects) will become:


read line from user
while not eof do
  if line == dir
    list directory                 (server)
  else if line == cd <dir>
    change directory               (server)
    if successful
      print new directory name
    else
      complain
  else if line == copy <file>
    if the file can be read        (server)
      copy file                    (server)
    else
      complain
  else if line == quit
    quit                           (server)
  else
    complain

  read line from user

where the lines marked (server) involve communication with the server.

Alternative presentation aspects

A GUI program, built with a toolkit such as VB or Motif, would allow directories to be displayed as lists, files to be selected, and actions such as change directory and get to be performed on them. The client would be controlled by actions associated with various events that take place in graphical objects. The pseudo-code might look like:


change dir button:
  if there is a selected directory
    change directory
    if successful
      update directory label
      list directory
      update directory list

get file button:
  if there is a selected file
    copy file

The functions called from the different UIs should be the same: changing the presentation should not change the networking code.

Protocol - informal

    client request      server response
    dir                 send list of files
    cd <dir>            change dir
                        send error if failed
                        send newdir if succeeded
    get <file>          check the file can be read
                        send error if failed
                        send file if succeeded
    quit                quit

Text protocol

Message format:

    client request      server response
    send "DIR"          send list of files, one per line,
                        terminated by a blank line
    send "CD <dir>"     change dir
                        send "ERROR" if failed
                        send "SUCCEEDED" + new directory name if succeeded
    send "GET <file>"   check the file can be read
                        send "ERROR" if failed
                        if succeeded, send contents of ASCII file,
                        prefixed by the number of lines
    send "QUIT"         close connection
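
The GET reply is the one message framed by a count instead of a terminator. A hypothetical sketch of that framing, assuming the count is sent as decimal text on its own line and every line ends with "\r\n" (the table specifies neither detail). GetReply and its methods are illustrative names, not the code omitted from the example that follows.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

// Hypothetical framing for the GET reply: "ERROR", or a line count
// followed by that many lines, each ending in "\r\n".
class GetReply {
    static final String CR_LF = "\r\n";

    // Server side: frame the file's lines (null means the file couldn't be read).
    static String encode(List<String> fileLines) {
        if (fileLines == null) {
            return "ERROR" + CR_LF;
        }
        StringBuilder reply = new StringBuilder();
        reply.append(fileLines.size()).append(CR_LF);   // number of lines first
        for (String line : fileLines) {
            reply.append(line).append(CR_LF);
        }
        return reply.toString();
    }

    // Client side: returns the lines, or null if the server sent ERROR.
    static List<String> decode(BufferedReader in) throws IOException {
        String first = in.readLine();
        if (first == null || first.equals("ERROR")) {
            return null;
        }
        int count = Integer.parseInt(first.trim());
        List<String> lines = new ArrayList<>();
        for (int n = 0; n < count; n++) {
            String line = in.readLine();
            if (line == null) {
                throw new EOFException("connection closed after " + n + " of " + count + " lines");
            }
            lines.add(line);
        }
        return lines;
    }
}
```

Sending the count first lets the file contain blank lines, which the DIR reply's blank-line terminator could not allow.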

Common code

Common definitions used by both client and server



/**
 * FileTransferTextConstants.java
 */

public class FileTransferTextConstants {

    public static final String CD = "CD";
    public static final String DIR = "DIR";
    public static final String GET = "GET";
    public static final String ERROR = "ERROR";
    public static final String SUCCEEDED = "SUCCEEDED";
    public static final String QUIT = "QUIT";
    public static final int PORT = 18889;
    public static final String CR_LF = "\r\n";
}// FileTransferTextConstants

Server code (incomplete)


import java.io.*;
import java.net.*;

public class FileTransferTextServer {
    
    public static void main(String argv[]) {
	ServerSocket s = null;
	try {
	    s = new ServerSocket(FileTransferTextConstants.PORT);
	} catch(IOException e) {
	    System.out.println(e);
	    System.exit(1);
	}

	while (true) {
	    Socket incoming = null;
	    try {
		incoming = s.accept();
	    } catch(IOException e) {
		System.out.println(e);
		continue;
	    }

	    new SocketHandler(incoming).start();
	}
    }
}

class SocketHandler extends Thread {

    Socket incoming;
    File clientDir = new File(".");
    BufferedReader reader;
    PrintStream out;

    SocketHandler(Socket incoming) {
	this.incoming = incoming;
    }

    public void run() {
	try {
	    reader = new BufferedReader(new InputStreamReader(
					incoming.getInputStream()));
	    out = new PrintStream(incoming.getOutputStream());
	    
	    while (true) {
		String line = reader.readLine();
		if (line == null) { 
		    break;
		}
		System.out.println("Received request: " + line);

		if (line.startsWith(FileTransferTextConstants.CD)) {
		    changeDirRequest(losePrefix(line, 
						FileTransferTextConstants.CD));
		} else if (line.startsWith(FileTransferTextConstants.DIR)) {
		    directoryRequest();
		} else if (line.startsWith(FileTransferTextConstants.GET)) {
		    // code omitted
		} else if (line.startsWith(FileTransferTextConstants.QUIT)) {
		    break;
		} else {
		    out.print(FileTransferTextConstants.ERROR + 
			      FileTransferTextConstants.CR_LF);
		}
		
	    }
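	} catch(IOException e) {
	    e.printStackTrace();
	} finally {
	    try {
		// close the connection however the loop ended
		incoming.close();
	    } catch(IOException e) {
		e.printStackTrace();
	    }
	}
    }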


    /**
     * Given that the string starts with the prefix,
     * get rid of the prefix and any following whitespace
     */
    public String losePrefix(String str, String prefix) {
	int index = prefix.length();
	String ret = str.substring(index).trim();
	return ret;

    }

    public void changeDirRequest(String dir) {
	File newDir = new File(clientDir, dir);
	if (newDir.isDirectory()) {
	    clientDir = newDir;
	    try {
		out.print(FileTransferTextConstants.SUCCEEDED + " " +
			  clientDir.getCanonicalPath() +
			  FileTransferTextConstants.CR_LF);
	    } catch(IOException e) {
		e.printStackTrace();
	    }
	} else {
	    out.print(FileTransferTextConstants.ERROR +
		      FileTransferTextConstants.CR_LF);
	}
    }

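    public void directoryRequest() {
	String[] fileNames = clientDir.list();
	if (fileNames == null) {
	    // list() returns null if the directory cannot be read
	    out.print(FileTransferTextConstants.ERROR +
		      FileTransferTextConstants.CR_LF);
	} else {
	    for (int n = 0; n < fileNames.length; n++) {
		out.print(fileNames[n] +
			  FileTransferTextConstants.CR_LF);
	    }
	}
	// always end with a blank line, even after ERROR, so the
	// client (which reads until a blank line) does not block
	out.print(FileTransferTextConstants.CR_LF);
    }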
}

Client code (incomplete)



/**
 * FileTransferTextClient.java
 */

/**
 * WARNING: the following code is okay as procedural code
 * but it sucks as O/O code
 */

import java.io.*;
import java.net.*;

public class FileTransferTextClient {

    private final static String  UI_DIR = "dir";
    private final static String  UI_CD = "cd";
    private final static String  UI_GET = "get";
    private final static String  UI_QUIT = "quit";

    protected Socket sock;
    protected BufferedReader reader;
    protected BufferedReader console;
    protected PrintStream writer;

    public static void main(String[] args){

	if (args.length != 1) {
	    System.err.println("Usage: Client address");
	    System.exit(1);
	}
	new FileTransferTextClient(args[0]);
    }

    public FileTransferTextClient(String server) {

	InetAddress address = null;
	try {
	    address = InetAddress.getByName(server);
	} catch(UnknownHostException e) {
	    e.printStackTrace();
	    System.exit(2);
	}

	sock = null;
	InputStream in = null;
	OutputStream out = null;
	try {
	    sock = new Socket(address, FileTransferTextConstants.PORT);
	    in = sock.getInputStream();
	    out = sock.getOutputStream();
	} catch(IOException e) {
	    e.printStackTrace();
	    System.exit(3);
	}

	reader = new BufferedReader(new InputStreamReader(in));
	writer = new PrintStream(out);
	console = new BufferedReader(new InputStreamReader(System.in));

	while (true) {
	    String line = null;
	    try {
		System.out.print("Enter request: ");
		line = console.readLine();
		System.out.println("Request was " + line);
	    } catch(IOException e) {
		e.printStackTrace();
		exit();
	    }
	    if (line == null) {
		// end of console input (e.g. Ctrl-D): treat it like "quit"
		exit();
	    }

	    if (line.equals(UI_DIR)) {
		directoryRequest();
	    } else if (line.startsWith(UI_CD)) {
		changeDirRequest(losePrefix(line, 
					    UI_CD));
	    } else if (line.startsWith(UI_GET)) {
		getFileRequest(losePrefix(line, 
					  UI_GET));
	    } else if (line.equals(UI_QUIT)) {
		exit();
	    } else {
		System.out.println("Unrecognised command");
	    }
	}
    }

    /**
     * Given that the string starts with the prefix,
     * get rid of the prefix and any whitespace
     */
    public String losePrefix(String str, String prefix) {
	int index = prefix.length();
	String ret = str.substring(index).trim();
	return ret;
    }

    public void exit() {
	try {
	    writer.print(FileTransferTextConstants.QUIT +
			 FileTransferTextConstants.CR_LF);
	    reader.close();
	    writer.close();
	    sock.close();
	} catch(Exception e) {
	    e.printStackTrace();
	}
	System.exit(0);
    }

    public void directoryRequest() {
	writer.print(FileTransferTextConstants.DIR + 
		     FileTransferTextConstants.CR_LF);

	System.out.println("Dir listing is:");
	String line = null;
	while (true) {
	    try {
		line = reader.readLine();
	    } catch(IOException e) {
		break;
	    }
	    if (line == null || line.equals("")) {  // connection closed, or end of listing
		break;
	    }
	    System.out.println(line);
	}
    }

    public void changeDirRequest(String dir) {
	writer.print(FileTransferTextConstants.CD + " " + dir + 
		     FileTransferTextConstants.CR_LF);

	String response = null;
	try {
	    response = reader.readLine();
	} catch (IOException e) {
	    e.printStackTrace();
	    return;
	}

	if (response == null) {
	    System.out.println("Server closed the connection");
	} else if (response.equals(FileTransferTextConstants.ERROR)) {
	    System.out.println("Error in CD request");
	} else if (response.startsWith(FileTransferTextConstants.SUCCEEDED)) {
	    String newdir = losePrefix(response, 
				       FileTransferTextConstants.SUCCEEDED);
	    System.out.println("Changed dir to " + newdir);
	} else {
	    System.out.println("Illegal response from server: " +
			       response);
	}
    }

    public void getFileRequest(String filename) {
	// code omitted
    }
} // FileTransferTextClient









This page is maintained by Jan Newmarch http://jan.newmarch.name
Copyright © Jan Newmarch, Monash University, 2007
This work is licensed under a Creative Commons License
The moral right of Jan Newmarch to be identified as the author of this page has been asserted.