Unix Systems Programming using Java

Jan Newmarch, Distributed Information Laboratory
Information Sciences and Engineering, University of Canberra
jan@newmarch.name, http://jan.newmarch.name
(Currently at the CRC for Distributed Systems Technology)

The standard Java packages include some cross-platform support for common file, directory and process operations. This paper discusses an extension library that makes functions of the POSIX 1003.1 API available in addition to the standard packages.

1. Introduction

The Unix systems programming API has been standardised by a large number of POSIX standards documents from the IEEE. For the majority of programmers the most important of these is POSIX 1003.1 which covers process primitives and environments, directories and files, I/O primitives and device specific functions. Later standards cover other issues, such as Shells and Utilities (POSIX 1003.2) and Administration (POSIX 1387.2).

One of the unusual characteristics of POSIX 1003.1 is its use of the C language throughout. Whereas IEEE standards would normally be expected to be language independent, the history of Unix is so bound up with C that this was the only feasible route. The document claims that it will be the basis for definitions that are independent of particular programming languages. In fact, new language dependent versions have arisen, such as Ada (POSIX 1003.5) and Fortran (POSIX 1003.9).

Irrespective of the formal standards mechanisms, designers of libraries for other languages have used POSIX 1003.1 as the specification for their own libraries. For example, tcl [Ousterhout] and Perl [Wall] have well-developed Unix bindings that copy the syntax, and rely on the semantics, of POSIX 1003.1.

Java [Arnold] is the latest flavour in programming languages for all sorts of excellent reasons, despite some serious shortcomings. It is a full O/O language (although not as pure as Smalltalk or Eiffel); it is based on C/C++, but avoids the worst of both languages (not completely, though); by use of a virtual machine it is cross-platform (although some vendors have variant versions); it has a rich set of libraries that deal with, for example, graphics and networking. These are combining to produce the ``Web programming'' environment where applications or applets are written once to run on any platform. Java extends the open authoring environment of HTML to an open programming environment.

How do Unix and Java co-exist? Very well in general terms. A Java application can do exactly the same things in a Unix environment that it can do in a Mac or Windows environment. With care, applications can even be written that transparently handle such issues as forward- versus back-slashes for directory separators.
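As an illustration of the separator issue, the following is a minimal sketch that builds a path from its components with java.io.File, so that the platform supplies the separator. The class name SeparatorDemo and the method join() are invented for this example and are not part of any library discussed here.

```java
import java.io.File;

class SeparatorDemo {
    // Join path components using whatever separator the platform uses.
    static String join(String[] parts) {
        File f = new File(parts[0]);
        for (int i = 1; i < parts.length; i++) {
            f = new File(f, parts[i]);   // parent + child, separator chosen by the platform
        }
        return f.getPath();
    }

    public static void main(String[] args) {
        System.out.println(join(new String[] {"usr", "local", "bin"}));
    }
}
```

The code never mentions a slash or a backslash, which is the kind of care the paragraph above has in mind.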

Java currently lacks the depth of hooks into the POSIX API that tcl and Perl have. The standard libraries must continue to omit these hooks, or Java will lose its universal applicability. However, extensions by non-standard libraries can supply this depth, as long as they are done in an appropriate manner.

This paper reports on a project to make the POSIX API available to Java applications or applets using a non-standard library. It encapsulates POSIX calls within Java methods for a new set of classes in the posix package.

This project must be contrasted in aims and methods to the Microsoft policy with J++: their attempt is to make J++ the best environment for doing Windows programming in Java [J++], and this implicitly rejects the notion of a cross-platform environment. It may even turn out to be difficult for the programmer using J++ to be certain of how general their code is. The statements from Microsoft do not seem to dispel any uncertainty.

This project, on the other hand, is explicit about its introduction of non-standard features: all the new classes are placed in a package that is non-standard. No application using the posix package will ever pass the ``Java Pure'' tests. On the other hand, this package will develop by an open process method: the source code will be freely available to all, and can be freely modified. If interest is high enough, it can be divorced from individual control and given to a suitable co-ordinating authority.

2. What's in POSIX?

Most operating system APIs share common functionality. For example, both Unix and Windows have means to list the files in a directory and to change to that directory. Some of this common functionality has already been captured in the Java class File which can return a list of files in a directory. However, many useful methods are missing. For example, the ability to change directories does not appear to be there because it has different effects in DOS and in Unix (in DOS it changes directories for all processes, but in Unix only for the calling process).
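The directory listing that the Core class already supports can be sketched as follows, using only java.io.File. The wrapper class ListDirDemo and its method names() are invented for this example.

```java
import java.io.File;
import java.util.Arrays;

class ListDirDemo {
    // Return the sorted entry names of a directory, or an empty array if it cannot be listed.
    static String[] names(File dir) {
        String[] found = dir.list();   // null when dir is not a directory or an I/O error occurs
        if (found == null) {
            return new String[0];
        }
        Arrays.sort(found);
        return found;
    }

    public static void main(String[] args) {
        String[] entries = names(new File("."));
        for (int i = 0; i < entries.length; i++) {
            System.out.println(entries[i]);
        }
    }
}
```

Note that File.list() reports failure by returning null rather than by raising an exception, which is closer to the C convention than to the usual Java one.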

Some of the areas in which the POSIX API supplies extra functionality over the Java Core are given in the following (incomplete) table:

Files       multiple links
            file status
            access mode
            FIFOs
Processes   file creation mask
            execution times
            user id
            signals
            process groups

3. A POSIX Library

Java allows an application or applet to be built from application-specific files and from common libraries. The Java Core libraries supply a standard set of libraries that can be guaranteed in all environments with the Java logo (although at least one vendor is likely to lose this by non-compliance with this requirement). The Core libraries cover language features (java.lang), general I/O (java.io), useful utilities (java.util), networking (java.net), windowing (java.awt), etc, plus techniques such as remote method invocation (RMI) and Java Native Interface (JNI).

Any application or applet can add classes that are written in Java. These can be accessed from local copies or downloaded across the network. This is the principle behind the network computer, and is captured in the ``Pure Java'' certification. This validates that an application or applet should be able to run in any Java-conforming environment.

The key to all of this is that implementations of the Java Core classes must exist on all Java-conforming platforms. It is not often stated, but these Core classes are riddled with native, platform-dependent code. They have to be: the AWT relies on the native windowing system, and the network classes rely on the native TCP/IP stack. What Pure Java really means is: use no native code apart from that sanctioned by Sun.

The Java Virtual Machine interpreters are currently written in C. There is an interface between compiled Java and the interpreter which is formalised by a set of Java to native-code C functions. Regrettably, there are several sets of interface APIs: the original set from Java 1.0, which is still heavily used but not talked about anymore (eg in the AWT); the Java Native Interface (JNI) which is the new public standard [JNI]; the Microsoft set, which differs from JNI and is one of the points of contention between Sun and Microsoft.

This POSIX library is implemented using the Java Native Interface (JNI) API. It is a ``thin'' library, calling directly into the C API with very little extra work. It will need to be compiled for all native platforms, since it is written in C as well as Java. The intent is that it should run on all Posix platforms, but of course this has still to be tested.

4. Language Differences

4.1 Return Values

The POSIX system calls in C use the return value of functions for a variety of purposes. For example, in open() a ``handle'' to the file (a file descriptor) is returned for all future references to the file; in umask() the previous value of the mask is returned.

In Java, these uses may be irrelevant or deprecated. Handles to structures are rarely needed since in Java one is dealing with objects, and the reference to an object is a handle to that object. Special integer values (for example) are not needed.

Style issues may cause some problems. The Java Beans specification encourages the use of ``getter'' and ``setter'' methods. For example, the function umask() in C returns the old mask while also setting a new one. In Java Beans style, this would be two separate methods, getUmask() and setUmask().
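The two styles can be put side by side in a small sketch. The class UmaskSketch below keeps the mask in an ordinary static variable as a stand-in for the per-process value held by the kernel; it makes no native call, and its names are invented for this example.

```java
class UmaskSketch {
    // Stand-in for the per-process file creation mask (assumed initial value).
    private static int mask = 022;

    // C style: install the new mask and return the previous one in a single call.
    static int umask(int newMask) {
        int old = mask;
        mask = newMask & 0777;
        return old;
    }

    // Java Beans style: a separate accessor and mutator.
    static int getUmask() {
        return mask;
    }

    static void setUmask(int newMask) {
        mask = newMask & 0777;
    }

    public static void main(String[] args) {
        int old = umask(077);
        System.out.println("old mask " + Integer.toOctalString(old)
                + ", new mask " + Integer.toOctalString(getUmask()));
    }
}
```

In a real binding the getter is the awkward half: the C umask() call is the only way to read the mask, so getUmask() would presumably have to set a temporary value and then restore the old one.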

4.2 Data Types

There are some data representation issues that are difficult to resolve. Java has a more limited set of integer types, and does not include the unsigned values. This is discussed in more detail later.

Java does not allow mechanisms to perform address arithmetic. There does not seem to be any need to do this in the POSIX API, so this is not a problem.

4.3 Error handling

System calls may succeed or fail. There must be a way of signalling both cases. In C, the standard method to do this is by the function return value. For example, the function open() will return a non-negative integer (the file descriptor) if the call succeeds, or minus one if it fails.

In Java, errors are signalled by raising exceptions, and if no exception is raised then the method has succeeded.
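The translation from one convention to the other can be sketched as a thin wrapper. Everything below is hypothetical: cStyleOpen() merely imitates a native call that returns minus one on failure, and the nested PosixException is a stand-in for the exception class of the binding.

```java
class ErrorStyleDemo {
    // Hypothetical stand-in for the binding's exception class.
    static class PosixException extends Exception {
        PosixException(String message) {
            super(message);
        }
    }

    // Imitates a native call using the C convention: -1 on failure, a descriptor otherwise.
    static int cStyleOpen(String path) {
        return path.isEmpty() ? -1 : 3;
    }

    // Java convention: failure raises an exception, so any returned value is valid.
    static int open(String path) throws PosixException {
        int fd = cStyleOpen(path);
        if (fd == -1) {
            throw new PosixException("open failed for \"" + path + "\"");
        }
        return fd;
    }

    public static void main(String[] args) {
        try {
            System.out.println("descriptor " + open("/etc/passwd"));
            open("");
        } catch (PosixException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

The caller of open() no longer needs to test the result against a special value, which is the point made in the paragraph above.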

4.4 Decisions

For a first cut at a Java/POSIX binding all decisions were made in favour of the C programmer. This was based on the fact that the design decisions for the API had already been made and it was relatively easy to take these decisions unchanged into Java. It also allowed concentration on the (initially) ``harder'' bits which involved using the JNI API to map Java into C.

However, the resulting code did not fit the Java coding styles or standards. Programs using this had an awkward feel. It was discarded in favour of a more Java-like solution, reworking the POSIX API. The sections that follow discuss some of the possibilities and why this decision was made.

5. Class Structure

Related to the language issue is the classification of POSIX calls into Java classes. C++ can adopt the C API unchanged since it is a mixed OO/procedural language that will accept C functions unchanged. In Java, all functions must be methods of classes. This means that some changes must be made to the API anyway. This section discusses some of the issues in this.

The POSIX specification chapter on ``Files and Directories'' contains function calls in a number of categories. There is a set of functions that merely take the name of a file: these include chmod() and access(). Another set refers to a file as an object by using a file descriptor, such as open(), creat(), read() and write(). The C system calls stat() and fstat() take a structure and fill values in. The functions link() and unlink() really refer to file names within directories, and as such refer to directories rather than files, even though programmers tend to think of them as file rather than directory operations. The function umask() sets file creation flags for the current process, and is unrelated to files except in their creation.

The other chapters in the POSIX 1003.1 specification contain an equal mixture of categories. In other words, this specification is not drawn up on lines that are suitable for OO languages. Categorisation of C functions into OO classes and methods is not clear-cut.

5.1 Class Stat

A set of functions that map well into Java are the functions stat() and fstat() as they deal with the stat structure. The C stat structure has a number of fields and a set of C macros that act on one of these fields:
struct stat {
  mode_t st_mode;
  ino_t  st_ino;
  ...
};

/* constants */
#define S_IRUSR ...
#define S_IWUSR ...

/* macros */
#define S_ISDIR(mode) ...
#define S_ISCHR(mode) ...
This displays a number of characteristics of POSIX:
  • Structure field names have a prefix st_ to label them as belonging to structure stat
  • Native data types are hidden by typedef definitions, such as mode_t
  • To reduce name clashes, defined constants have a prefix S_
  • To reduce name clashes, macros have a prefix S_

The functions that manipulate this structure are stat() and fstat(). These fill in the fields with values that can be queried by a program. These system calls cannot be used to set values.

In Java terms, many of these devices for avoiding namespace problems can be dropped, since Java has its own mechanisms. The read-only nature of the fields can also be specified. The difference between the two system calls that fill in the stat structure corresponds to two different constructors. However, the type synonym problem is not handled well, and the binding may need to revert to base types. A Java class for this structure and its functions could be

public class Stat {

  // constants
  public static final int RWXU;
  public static final int RUSR;
  public static final int WUSR;
  // etc

  // constructors
  public Stat(String fileName) throws PosixException; // stat()
  public Stat(posix.File file) throws PosixException; // stat()
  public Stat(posix.OpenFile fd) throws PosixException; // fstat()

  // public field access
  public int getMode();
  public int getIno();
  public int getDev();
  // etc

  public boolean isDir();
  public boolean isChr();
  // etc
}
Note also that the constants such as RUSR are not assigned their ``well known'' values (such as 0400) in these definitions. The actual values are not part of POSIX and so should not be part of the Java class definitions. The technique of ``blank finals'' is used here, and is discussed later in implementation issues.
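How the predicate methods might sit on top of the mode field can be sketched as below. This is not the code of the binding: the class StatSketch takes the mode as a constructor argument instead of calling stat(), and the file-type masks are assumed values that are common on Linux but, as noted above, are not fixed by POSIX.

```java
class StatSketch {
    // Assumed, typical values; a real binding would obtain them from the platform.
    private static final int IFMT  = 0170000;  // mask selecting the file-type bits
    private static final int IFDIR = 0040000;  // directory
    private static final int IFCHR = 0020000;  // character device

    private final int mode;                    // read-only once constructed

    StatSketch(int mode) {
        this.mode = mode;
    }

    public int getMode() {
        return mode;
    }

    // Counterpart of the C macro S_ISDIR(mode).
    public boolean isDir() {
        return (mode & IFMT) == IFDIR;
    }

    // Counterpart of the C macro S_ISCHR(mode).
    public boolean isChr() {
        return (mode & IFMT) == IFCHR;
    }

    public static void main(String[] args) {
        StatSketch s = new StatSketch(040755);
        System.out.println("directory? " + s.isDir());
    }
}
```

Declaring the field final and offering only a getter is one way of expressing the read-only nature of the structure.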

5.2 Class File

File operations fall into two sets: those that require only filenames (such as unlink()), and those that require open files (such as read()).

The operations that only use filenames can form a class of their own. The Java Core library has a class java.io.File that plays a similar role in the platform-independent world. The Java class shows a range of decisions that can be made in this mapping.

  1. C functions could map onto static class methods of Java. This would make them most like the C calls. The methods could return the same values as the C calls, handled in the same way as in a C program. This would produce Java code such as
          if (File.unlink("/etc/passwd") == -1)
              ...
          
  2. C functions could map onto static class methods, but error handling would be done through exceptions rather than return values
          try {
              File.unlink("/etc/passwd");
          } catch(Exception e) {
              ....
          }
          
  3. Since each of the methods deals with a single file by name, a more OO approach would create an object with this name and call methods on it. Since we probably will not need such objects to have long-term existence there will often be no need to save them in variables. Using this, and returning C style error values gives
          if (new File("/etc/passwd").unlink() == -1)
             ...
          
  4. Finally, using objects plus exceptions gives
          try {
              new File("/etc/passwd").unlink();
          } catch(Exception e) {
              ...
          }
          
The first version will be most familiar to C programmers. This is the way that entrenched C programmers can still program in C while calling it Java. The last is preferred Java style. A Java binding can certainly be produced using the first method, and an early version did so. However, this was discarded in favour of the last method on the grounds that a binding that does not ``feel right'' would ultimately fail.
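The third and fourth styles can be tried out without any native code by borrowing java.nio.file.Files.delete() in place of the unlink() system call. The class NamedFile below is a hypothetical stand-in for the file class of the binding, and IOException takes the place of its exception class.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class UnlinkDemo {
    // Hypothetical stand-in for a class wrapping a file name.
    static class NamedFile {
        private final String path;

        NamedFile(String path) {
            this.path = path;
        }

        // Fourth style: a method on an object, with failure signalled by an exception.
        void unlink() throws IOException {
            Files.delete(Paths.get(path));
        }

        // Third style: the same operation, reporting a C-like status instead.
        int unlinkStatus() {
            try {
                unlink();
                return 0;
            } catch (IOException e) {
                return -1;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("unlink-demo", ".tmp");
        new NamedFile(tmp.toString()).unlink();
        System.out.println("still exists? " + Files.exists(tmp));
    }
}
```

The status-returning variant has to swallow the exception, and with it the reason for the failure, which is one practical argument for the exception style.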

5.3 Class OpenFile

The read() and write() functions act on open files. A file opened only for reading cannot be written to, and vice versa. Type safety suggests dividing open files into two classes, readable files and writable files. This is done in Core Java with the classes FileInputStream and FileOutputStream. However, in Posix a file can be opened as both readable and writable. This is a strong candidate for a set of classes using multiple inheritance.

Regrettably, Java does not have multiple inheritance so this does not map very nicely.

Despite duplication of code, the class ReadWriteFile must also be a direct child of OpenFile. This common parent class is needed for a number of reasons:

  • common constants such as O_NONBLOCK should be defined in a parent class
  • common methods such as close() should be defined in a parent class
  • some Posix calls use a file descriptor that does not distinguish between files opened for read, for write or for both, such as fstat(). Such calls (when turned into methods) will need a generic object of any of these types.

The class structure is thus: OpenFile is the direct parent of each of ReadFile, WriteFile and ReadWriteFile,

with details
public class OpenFile {

    public static final int NONBLOCK;
    // etc

    public void close();
    // etc
}

public class ReadFile extends OpenFile {

    public ReadFile(String path);

    public int read(byte buf[], int nbyte);
}

public class WriteFile extends OpenFile {

    public WriteFile(String path);

    public int write(byte buf[], int nbyte);
}

public class ReadWriteFile extends OpenFile {

    public ReadWriteFile(String path);

    public int read(byte buf[], int nbyte);
    public int write(byte buf[], int nbyte);
}
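One variation, not taken in the binding described here, is to use Java interfaces for the read and write capabilities and to keep the shared code in the parent class. The sketch below assumes this variation; the interface names CanRead and CanWrite are invented, and an in-memory byte array stands in for the file contents so that the example runs without native code.

```java
import java.util.Arrays;

class OpenFileSketch {
    interface CanRead  { int read(byte[] buf, int nbyte); }
    interface CanWrite { int write(byte[] buf, int nbyte); }

    // Common parent: shared state, close(), and helpers used by the subclasses.
    static abstract class OpenFile {
        protected byte[] data = new byte[0];   // stand-in for the file contents
        protected int pos = 0;                 // current read position
        private boolean open = true;

        void close() { open = false; }
        boolean isOpen() { return open; }

        protected int doRead(byte[] buf, int nbyte) {
            int n = Math.min(nbyte, data.length - pos);
            System.arraycopy(data, pos, buf, 0, n);
            pos += n;
            return n;                          // 0 at end of file
        }

        protected int doWrite(byte[] buf, int nbyte) {
            byte[] grown = Arrays.copyOf(data, data.length + nbyte);
            System.arraycopy(buf, 0, grown, data.length, nbyte);
            data = grown;
            return nbyte;
        }
    }

    static class ReadFile extends OpenFile implements CanRead {
        ReadFile(byte[] contents) { data = contents; }
        public int read(byte[] buf, int nbyte) { return doRead(buf, nbyte); }
    }

    static class WriteFile extends OpenFile implements CanWrite {
        public int write(byte[] buf, int nbyte) { return doWrite(buf, nbyte); }
    }

    static class ReadWriteFile extends OpenFile implements CanRead, CanWrite {
        public int read(byte[] buf, int nbyte) { return doRead(buf, nbyte); }
        public int write(byte[] buf, int nbyte) { return doWrite(buf, nbyte); }
    }
}
```

A method that needs only a descriptor, such as a constructor corresponding to fstat(), can still take the generic OpenFile, while a method that reads can ask for CanRead and so accept both ReadFile and ReadWriteFile.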

5.4 Class Pipe

Posix uses the un-named pipe as the primary means of inter-process communication, although there are plenty of other methods: named pipes, streams, shared memory, ports, etc. A pipe creates a pair of I/O channels, one used for read and one for write. A C pipe returns an array of two file descriptors. The programmer has to remember that index zero is used for reads, in analogy to file descriptor zero being the read descriptor. Similarly for index one, used for writes.

A Java version of pipe() can define a class that avoids the potential problem of poor programmer memory by making type-safe versions in a class Pipe:

public class Pipe {

  public Pipe();

  public ReadFile in();
  public WriteFile out();
}
The two public methods are both ``getter'' methods that return the appropriate OpenFile. However, by returning a subclass, it is only possible to use ``read'' methods on in() and ``write'' methods on out().
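The same type-safe shape can be tried with the Core classes PipedInputStream and PipedOutputStream. These connect threads inside one Java virtual machine, not separate processes, so the sketch below only illustrates the interface and is not a substitute for pipe(). The class name PipeSketch is invented for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;

class PipeSketch {
    private final PipedInputStream readEnd;
    private final PipedOutputStream writeEnd;

    PipeSketch() throws IOException {
        readEnd = new PipedInputStream();
        writeEnd = new PipedOutputStream(readEnd);   // connect the two ends
    }

    // The declared types allow only reads on in() and only writes on out().
    InputStream in() {
        return readEnd;
    }

    OutputStream out() {
        return writeEnd;
    }

    public static void main(String[] args) throws IOException {
        PipeSketch p = new PipeSketch();
        p.out().write("hello".getBytes(StandardCharsets.US_ASCII));
        p.out().close();
        byte[] buf = new byte[16];
        int n = p.in().read(buf);
        System.out.println(new String(buf, 0, n, StandardCharsets.US_ASCII));
    }
}
```

An attempt to call write() on the value returned by in() is rejected by the compiler, so there is no index to remember.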

5.5 System call dup2()

A typical systems programming activity is to create a pipeline of two separate processes, such as
ls | wc
In order to do this in C, it is necessary to create a pipe, perform a fork(), and then map the ends of the pipe onto the standard file descriptors using dup() or preferably dup2().

The dup2() C function call takes two parameters, the current file descriptor and the file descriptor it would ``like to be''. Typically one of the desired file descriptors is zero (for standard input) or one (for standard output). The ``current file descriptor'' will be an OpenFile object. The desired file descriptor could be an integer as in the C call, or could be another OpenFile object, since this binding is deprecating the need for an integer file descriptor. After the dup2(), reads/writes to either object should have the same effect.

For this to function seamlessly, there will need to be existing objects corresponding to file descriptors 0, 1 and 2. These can be done in a similar manner to the Core streams System.in and System.out.

These all give further fields and methods of class OpenFile:

public class OpenFile {

    protected native void dup2(int fd);
    public void dup2(OpenFile f);

    ....
}
with the standard files in a general Posix class
public class Posix {

    public static ReadFile stdin;
    public static WriteFile stdout;
    public static WriteFile stderr;
}

This means that we can now perform operations such as

byte[] buf = new byte[128];
ReadFile f = new ReadFile("/etc/passwd");
f.dup2(Posix.stdin);
f.read(buf, 128);            // reads from /etc/passwd
Posix.stdin.read(buf, 128);  // reads more from /etc/passwd

5.6 Class Process

Java already has threads, and in practice they sit on top of a threads package which may be supplied from the O/S or be a separate package. These allow multiple Java threads to share memory and objects, but otherwise run disjointly.

The traditional multi-tasking model in Unix has been via processes, which are more heavyweight and do not share memory (although they share some things, such as file descriptors). Processes are identified by a ``process id'' which is an unsigned integer. Each process has its own process id, and all but the initial process will have a parent process id.

There are a host of functions that work on the current process, such as getpid() and getuid(). Functions that work on other processes, such as kill(), can only use the process id since that is all the information that will typically be available.

In any running Java application there can only be one process (although there may be many threads). It hardly seems worthwhile creating an object of type Process if there can only be one of them. So instead of being able to create objects of this class, it is easier to define them as all being static methods of the class.

This binding is not complete for this class. A partial definition is

public class Process {

    public static int execvp(String file, String args[]);
    public static int fork();
}
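The static-only design can be sketched with a class that cannot be instantiated. Since no native code is available in this example, getpid() is borrowed from ProcessHandle, which is part of the standard library only from Java 9 onwards; the class name ProcessSketch is invented.

```java
final class ProcessSketch {
    // No instances: there is only ever one current process.
    private ProcessSketch() {
    }

    // Counterpart of the C call getpid(), using the Java 9+ ProcessHandle API.
    static long getpid() {
        return ProcessHandle.current().pid();
    }

    public static void main(String[] args) {
        System.out.println("process id " + getpid());
    }
}
```

The value is returned as a long, which sidesteps the question of how to represent an unsigned process id in a Java int.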

5.7 Class PosixException

A PosixException is raised when an error occurs. This contains an error message as produced by strerror().

6. Examples

6.1 Copying input to output

Copying from standard input to standard output may be done by
final int SIZE = 1024;
byte[] buf = new byte[SIZE];
int nread;
try {
    while ((nread = Posix.stdin.read(buf, SIZE)) != 0)
        Posix.stdout.write(buf, nread);
} catch(PosixException e) {
    System.err.println("Error in copy " + e.toString());
}
The while loop terminates on end of file. Any error is caught by the exception handler. This separates the two possibilities which are often combined in the C code of while ((nread = read(...)) > 0)
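For comparison, the same loop written against the Core java.io streams is sketched below. The Core streams mark end of file by returning minus one from read(), not zero, and report errors with IOException. The class name CopyDemo is invented for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class CopyDemo {
    // Copy everything from in to out and return the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[1024];
        long total = 0;
        int nread;
        while ((nread = in.read(buf)) != -1) {   // -1 marks end of stream in java.io
            out.write(buf, 0, nread);
            total += nread;
        }
        return total;
    }

    public static void main(String[] args) {
        try {
            copy(System.in, System.out);
            System.out.flush();
        } catch (IOException e) {
            System.err.println("Error in copy " + e);
        }
    }
}
```

Here too the end-of-file test and the error path are kept apart, with the loop condition handling the first and the exception handler the second.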

6.2 File information

Check read permissions to current directory:
Stat currDir = null;
try {
    currDir = new Stat(".");
} catch(PosixException e) {
    System.out.println("Stat failed on . " + e.toString());
    System.exit(1);
}

int mode = currDir.getMode();
if ((mode & Stat.ROTH) != 0)
    System.out.println(". is world readable");

6.3 A Pipeline

It is common in Posix systems programming to set up a pipeline, to fork and to execute another program within one or both of the processes. To perform the pipeline
ls | wc
in C can be done by (with no error checking)
int pfd[2];

pipe(pfd);
if (fork() == 0) {
    close(pfd[1]);
    dup2(pfd[0], 0);
    close(pfd[0]);

    char *args[] = {"wc", NULL};
    execvp("wc", args);
} else {
    close(pfd[0]);
    dup2(pfd[1], 1);
    close(pfd[1]);

    char *args[] = {"ls", NULL};
    execvp("ls", args);
}

The classes given in this binding allow this to be done in Java by

Pipe p = new Pipe();
if (Process.fork() == 0) {
    p.out().close();
    p.in().dup2(Posix.stdin);
    p.in().close();

    String args[] = new String[] {"wc"};
    Process.execvp("wc", args);
} else {
    p.in().close();
    p.out().dup2(Posix.stdout);
    p.out().close();

    String args[] = new String[] {"ls"};
    Process.execvp("ls", args);
}

7. Implementation Issues

7.1 Constants

POSIX defines all constants symbolically. For example, although ``everybody knows'' that the file mode ``readable by user'' has the value 0400, POSIX does not define it as such, but only by the symbolic value S_IRUSR. This is entirely correct: the actual value may depend on the platform and should not be pinned down in this standard.

This means that constants should not be hard-coded in the Java binding, but instead picked up from the local environment. In Java 1.0 this would only be possible for variables, and allowing constants to be variable would not be a good idea. From Java 1.1, constants (declared as final) need not have their value statically defined, but can be assigned to once, in an initialisation section such as a constructor. This allows a native code call to assign a value.

Alas, things are not quite straightforward. The Java compiler must ensure that the constants are assigned a value, and this cannot be hidden in native code inaccessible to the compiler. A dummy value must be used:

public static final int RUSR;
private static int rusr;  // dummy variable

protected static native void initialiseConstants();

static {
    initialiseConstants(); // sets rusr, etc

    RUSR = rusr;  // yuk
    ...
}
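A variation that appears to avoid the dummy variable is to have the lookup return the value, so that the assignment to the blank final is visible to the compiler. The sketch below assumes this variation. The method lookup() is an ordinary Java method standing in for a native call, and the octal values it returns are the usual ones, which POSIX does not guarantee.

```java
class ConstantsSketch {
    public static final int RUSR;   // blank finals: no value in the declaration
    public static final int WUSR;

    // Stand-in for a native method that would ask the C library for the value.
    private static int lookup(String name) {
        switch (name) {
            case "S_IRUSR": return 0400;   // assumed, typical value
            case "S_IWUSR": return 0200;   // assumed, typical value
            default: throw new IllegalArgumentException("unknown constant " + name);
        }
    }

    static {
        // Each blank final is assigned exactly once, in the static initialiser.
        RUSR = lookup("S_IRUSR");
        WUSR = lookup("S_IWUSR");
    }

    public static void main(String[] args) {
        System.out.println("RUSR = 0" + Integer.toOctalString(RUSR));
    }
}
```

With a real native method the declaration of lookup() would carry the native modifier and have no body, at the cost of one native call per constant.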

7.2 Types

In a similar vein, POSIX uses typedefs to hide native data types. For example, mode_t is usually a synonym for unsigned short. This could be a different type (although it is unlikely). Java does not have a mechanism for typedefs.

In version 1.0 of this binding, common assumptions were made about what the types resolve to, to give common code across all versions of Unix. However, in version 2 this is being rewritten to allow for different implementations and also to allow for different sizes of basic data types, such as 64-bit versus 32-bit integers. In this new version, a method such as getMode() is defined to return an interface value,


interface ModeT {
}

where the type is implemented in a variety of ways. For example,

class ModeT_32bit_signed implements ModeT {
     int mode;
}

The correct version is selected at compile time of the C library using a Perl script used by autoconf.
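How such implementations might cope with the unsigned C types can be sketched as follows. The method toLong() is an assumption made for this example (the interface given above is empty), as are the class names; the masking is the usual way of reading a signed Java value as unsigned.

```java
class ModeTSketch {
    interface ModeT {
        long toLong();   // assumed accessor: the value widened so that unsigned types fit
    }

    // For platforms where mode_t is an unsigned 16-bit type.
    static final class ModeT16Unsigned implements ModeT {
        private final short raw;
        ModeT16Unsigned(short raw) { this.raw = raw; }
        public long toLong() { return raw & 0xFFFFL; }
    }

    // For platforms where mode_t is an unsigned 32-bit type.
    static final class ModeT32Unsigned implements ModeT {
        private final int raw;
        ModeT32Unsigned(int raw) { this.raw = raw; }
        public long toLong() { return raw & 0xFFFFFFFFL; }
    }

    // Client code works against the interface and never sees the width.
    static boolean hasBits(ModeT mode, long bits) {
        return (mode.toLong() & bits) != 0;
    }

    public static void main(String[] args) {
        ModeT m = new ModeT16Unsigned((short) 0100644);
        System.out.println("mode as long: 0" + Long.toOctalString(m.toLong()));
    }
}
```

Without the mask, a 16-bit mode with its top bit set would turn into a negative number when widened.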

7.3 Varargs

C allows functions to have a variable number of arguments, such as printf(). This is not allowed in Java. This affects a few Posix functions such as execl() which cannot be implemented in Java. Fortunately, there are alternatives such as execvp() which replace the varargs with an array of args.

8. Conclusion

This paper has shown that it is possible to map a non-trivial subset of the POSIX 1003.1 API into Java classes and method calls. The mapping is currently incomplete but there appear to be no real problems in completing it.

Mapping POSIX constants into Java constants is messy but straightforward. However, mapping POSIX data types into Java causes problems that require build-time configuration of the native C libraries.

For this project to move forward will require a consensus effort, since there are many issues involved. It is currently being taken up by a project student, to flesh out the binding and to resolve issues. The library is available under an ``open source'' license from http://jan.newmarch.name/java/posix/.

9. Acknowledgements

The author is currently on a sabbatical program at the CRC for Distributed Systems Technology, and the work reported in this paper has been funded in part by the Co-operative Research Centre Program through the Department of Industry, Science & Tourism of the Commonwealth Government of Australia.

10. References

[Arnold]
K. Arnold and J. Gosling, The Java Programming Language, Addison-Wesley, 1996
[J++]
VisualJ++ Home Page, http://www.microsoft.com/visualj/
[JNI]
Java Native Interface, http://java.sun.com/products/jdk/1.2/docs/guide/jni/
[Ousterhout]
J. K. Ousterhout Tcl and the Tk Toolkit, Addison-Wesley, 1994
[Wall]
L. Wall and R. L. Schwartz Programming Perl, O'Reilly, 1993

Last modified: Wed Mar 3 10:42:00 EST 1999
Copyright ©Jan Newmarch