When a procedure is called, it usually makes use of the stack, pushing parameters
onto the stack and reserving space for local variables.
C does not have call by reference, but only call by value. Most other procedural languages have both.
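As a small illustration of the point about C, here is a sketch in which the function names are made up for the example. The first function receives a copy of the argument; the second simulates call by reference by receiving an address, which is itself passed by value.

```c
/* Receives a copy of the caller's value: the caller's variable is untouched. */
void increment_copy(int n)
{
    n = n + 1;
}

/* Receives an address (itself passed by value) and changes the
   caller's variable through it. */
void increment_through_pointer(int *n)
{
    *n = *n + 1;
}
```

After `int x = 5; increment_copy(x);` the value of `x` is still 5, whereas `increment_through_pointer(&x);` changes it to 6. The second form relies on both functions sharing one address space, which matters later when pointers are considered as RPC parameters.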
The remote procedure call is intended to act like a procedure call, but to act across the network transparently.
The process makes a remote procedure call by pushing its parameters and a return address onto the stack, and jumping to the start of the procedure. The procedure itself is responsible for accessing and using the network.
After the remote execution is over, the procedure jumps back to the return address. The calling process then continues.
The server side stub has to wait for messages asking for a procedure to run.
It has to read the parameters, and present them in a suitable form to execute
the procedure locally. After execution, it has to send the results back to the calling process.
For example, the short int could use the first two bytes with the next two blank, or the other way round. The string could be prefixed by its length or be terminated by a sentinel value. If the length is sent, should it be an int? A short int? The ordinary int could be big-endian or little-endian.
The Sun RPC uses a standard format called XDR. The ordering is big-endian and the minimum size of any field is 32 bits. DCE uses a different format, as does Xerox Courier.
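To make the XDR rules concrete, here is a hand-written sketch of how a short int and a string might be laid out in a buffer: big-endian, every field a multiple of 32 bits, and the string sent as a 32-bit length followed by its bytes padded with zeros. The function names `put_u32`, `put_short` and `put_string` are invented for this illustration and are not the Sun XDR library routines; the caller is assumed to supply a large enough buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a 32-bit value most significant byte first (big-endian).
   Returns the number of bytes written, always 4. */
size_t put_u32(unsigned char *buf, uint32_t v)
{
    buf[0] = (unsigned char)(v >> 24);
    buf[1] = (unsigned char)(v >> 16);
    buf[2] = (unsigned char)(v >> 8);
    buf[3] = (unsigned char)v;
    return 4;
}

/* A short still occupies a full 32-bit field: it is sign-extended first. */
size_t put_short(unsigned char *buf, int16_t v)
{
    return put_u32(buf, (uint32_t)(int32_t)v);
}

/* A string is sent as a 32-bit length, then the bytes, then zero
   padding up to the next multiple of 4. Returns bytes written. */
size_t put_string(unsigned char *buf, const char *s)
{
    size_t len = strlen(s);
    size_t padded = (len + 3) & ~(size_t)3;
    size_t n = put_u32(buf, (uint32_t)len);

    memcpy(buf + n, s, len);
    memset(buf + n + len, 0, padded - len);
    return n + padded;
}
```

With this layout the short int 1 takes four bytes (00 00 00 01), and the string "hello" takes twelve: four for the length 5, five for the characters, and three of padding.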
The message could be formed using implicit typing. That is, only the values are sent, and it is assumed that both the client and the server know what the types are meant to be.
Alternatively, the types can be sent explicitly along with the values. ISO defines a type specification language for this called ASN.1 (Abstract Syntax Notation One). This increases message sizes, but is more reliable.
A pointer would refer to an address in the calling procedure's address space. The remote procedure could not assign a meaning to this as it would not have access to that address space. So passing pointers is usually not possible.
How about fixed size arrays? Variable sized arrays? Variant records? Floating point numbers?
Each RPC method must have a list of acceptable data types that can be passed across the network.
If this were done by hand, obscure errors would result, so it must be done automatically.
For a normal procedure call, the compiler is able to look at the specification of the procedure and do two things: generate the correct code for placing arguments on the stack when a procedure is called, and generate correct code for using these parameters within the procedure.
In RPC, this is more complex. The compiler must generate two separate stubs: a client stub embedded in the application, and a server stub for the remote machine.
The compiler must know which parameters are in parameters and which are out parameters. In parameters are sent from the client to the server; out parameters are sent back.
Languages like C have no concept of in or out parameters. Therefore the compiler cannot be a standard C compiler, and the specification of the procedures cannot be done in C.
A typical specification might be
What errors can occur in a remote procedure call?
In C, it may be possible to return an error value for some functions, but not for all. In Ada, if you have to use a function, then you cannot use its parameters to pass results back as you can with a procedure.
In Ada you can raise an exception, or in C generate a signal. However, Pascal has neither of these concepts.
There is no language-independent solution.
Unfortunately, the server may in fact have received the message and just be slow to respond. The request may then end up being executed twice or more. This can be avoided by including an identifier in the message, so that a retry of a request that has already been received is not executed again.
Preventing this is the at-most-once problem.
One solution is not to resend messages. In this case you hit the at-least-once problem: a lost request is never executed at all.
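The identifier idea can be sketched as follows. This is only an illustration under simplifying assumptions: the server keeps a small fixed-size table of request identifiers it has already handled, together with the reply it sent, and all the names (`handle_request`, `do_work`, `MAX_SEEN`) are invented here. A real server would also need to expire old entries.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SEEN 64   /* assumed table size, for illustration only */

struct seen_request {
    uint32_t id;      /* identifier carried in the request message */
    long reply;       /* reply that was sent for it */
};

static struct seen_request seen[MAX_SEEN];
static size_t seen_count = 0;
static int executions = 0;

/* Stand-in for the real remote procedure. */
static long do_work(long arg)
{
    executions++;
    return arg * 2;
}

/* Execute a request at most once: a retry with a known identifier
   gets the saved reply instead of a second execution. */
long handle_request(uint32_t id, long arg)
{
    for (size_t i = 0; i < seen_count; i++) {
        if (seen[i].id == id)
            return seen[i].reply;
    }

    long reply = do_work(arg);

    /* Simplification: once the table is full, new requests are no
       longer remembered. */
    if (seen_count < MAX_SEEN) {
        seen[seen_count].id = id;
        seen[seen_count].reply = reply;
        seen_count++;
    }
    return reply;
}

/* Number of times the procedure has really run. */
int execution_count(void)
{
    return executions;
}
```

If the client times out and resends request 7, the second call to `handle_request(7, ...)` returns the saved reply and the procedure itself runs only once.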
Sun RPC uses rpcgen to generate much of the networking code from a specification file.
A function in the specification may take at most one in parameter, and return at most one out parameter as the function result. If you want to use more than one in parameter, you have to wrap them up in a single structure, and similarly with the out parameters.
Multiple functions may be defined at once. They are numbered from one upwards, and any of these may be remotely executed.
The specification defines a program that will run remotely, made up of the functions. The program has a name, a version number and a unique identifying number (chosen by you).
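The rule about single structures can be pictured in plain C. The sketch below is not rpcgen output and does not use its calling convention; the names `divide`, `divide_args` and `divide_result` are hypothetical. It only shows a function that logically needs two inputs and two outputs being reshaped to take one structure and return one structure.

```c
/* The two in values, wrapped into one structure. */
struct divide_args {
    long dividend;
    long divisor;
};

/* The two out values, wrapped into one structure. */
struct divide_result {
    long quotient;
    long remainder;
};

/* One parameter in, one result out.
   The caller must make sure that divisor is not zero. */
struct divide_result divide(struct divide_args args)
{
    struct divide_result r;

    r.quotient = args.dividend / args.divisor;
    r.remainder = args.dividend % args.divisor;
    return r;
}
```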
For example, a program may have two local functions to find the date on a machine. The local definitions could be
rpcgen is a program that takes a specification file as a command line parameter and generates C source files that can be used as client and server stubs.
rpcgen run on rdate.x would generate files
Note that each function's return value is a pointer to the original data type. You are expected to write versions of the functions which use a variable to store the pointer value returned, and dereference this variable.
On the client side this is
Finally, the ``handle'' variable on the client side is set by a call
Putting this all together, here is an original, non-RPC program:
If this source is time_clnt.c, the compile command is
On the server side, the reverse must be carried out, to insert the original contents of the functions back into the RPC stubs:
If this source is time_svc.c, the compile command is
In a single computer there is only one clock, and usually only one CPU. It is easy to synchronize processes, because they can examine the clock.
For example, a process can run at a regular time each day by waking up, examining the clock, and going back to sleep if it is too early.
A backup program can examine the ``last backed up'' time and the ``last modified'' time of a file to decide whether or not it has changed since the last backup.
In a distributed system there are many clocks. Each one may have a different value of the time.
Suppose a backup program compares the last backup time, set by its own clock, with the last modified time, set by the remote clock. The backup may have been done at 10.00am, and the file may have been modified at 9.50am. Clearly it does not need to be backed up - unless the second machine has a clock that is more than 10 minutes slower than the first clock.
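The arithmetic of this example can be written down as a small test. The sketch assumes something the discussion above does not provide, namely a known upper bound on how slow the remote clock may be (`max_skew_seconds`); the function name `needs_backup` is also invented here.

```c
#include <stdbool.h>
#include <time.h>

/* Decide whether a file needs backing up when its modification time
   was stamped by a remote clock that may be slow by up to
   max_skew_seconds (an assumed bound). The file is skipped only if
   it was modified at least that long before the last backup. */
bool needs_backup(time_t last_backup, time_t last_modified,
                  long max_skew_seconds)
{
    double gap = difftime(last_backup, last_modified);

    return gap < (double)max_skew_seconds;
}
```

With a backup at 10.00am and a modification stamped 9.50am the gap is 600 seconds. If the clocks are assumed to agree (a bound of 0) the file is skipped, but with a bound of 900 seconds it has to be backed up again to be safe.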
Logical clocks are clocks that may or may not have the physically correct time (to within error). A clock will be ``wrong'' if it is out of synchronisation with other clocks. Each logical clock has a current time value.
If machine A communicates with machine B, then the time that the message left A must be before the time that it arrives at B. This is written A < B.
If two actions occur on the same machine, and A is before B according to the local clock, then again, A < B.
On the other hand, if two actions occur on two different machines, and there is no communication between them, then neither A < B nor B < A. The events are concurrent (as far as the logical time is concerned).
It is actually very simple to ensure that any two logical clocks in a system are synchronised. Firstly, observe that if two processes never communicate then they are synchronised.
Secondly, if machine A sends a message to B then it must arrive after it left A. So if A includes its own time in the message, then the message must arrive at B after this time.
If B's time is later anyway, fine. B is synchronised with A for that event. If B's time is earlier, then B must be slow. B advances its own time to A+1. It has now synchronised this event.
Similarly, every time two machines communicate they compare send-time to receive-time and advance any slow clock.
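The receive rule described above can be sketched in a few lines of C. The names `logical_clock` and `clock_receive` are made up for this illustration. The text does not say what happens when the two times are equal; the sketch assumes the receiver advances in that case too, since the arrival must be strictly after the send.

```c
/* A logical clock is just a counter; it need not match physical time. */
struct logical_clock {
    unsigned long time;
};

/* Called when a message stamped with the sender's time arrives.
   If the local clock is not already later than the send time, it is
   slow, and is advanced to one past the send time. Returns the local
   time assigned to the receive event. */
unsigned long clock_receive(struct logical_clock *local,
                            unsigned long sent_time)
{
    if (local->time <= sent_time)
        local->time = sent_time + 1;
    return local->time;
}
```

For example, if B's clock reads 5 and a message arrives that A stamped with 10, B's clock jumps to 11. If B's clock already reads 20, it is left alone.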