The standard Java packages include some cross-platform support for common file, directory and process operations. This paper discusses an extension library that makes functions of the POSIX 1003.1 API available in addition to the standard packages.
One of the unusual characteristics of POSIX 1003.1 is its use of the C language throughout. Whereas IEEE standards would normally be expected to be language independent, the history of Unix is so bound up with C that this was the only feasible route. The document claims that it will be the basis for definitions that are independent of particular programming languages. In fact, new language dependent versions have arisen, such as Ada (POSIX 1003.5) and Fortran (POSIX 1003.9).
Irrespective of the formal standards mechanisms, designers of libraries for other languages have used POSIX 1003.1 as the specification for their own libraries. For example, tcl [Ousterhout] and Perl [Wall] have well-developed Unix bindings that copy the syntax, and rely on the semantics, of POSIX 1003.1.
Java [Arnold] is the latest flavour in programming languages for all sorts of excellent reasons, despite some serious shortcomings. It is a full OO language (although not as pure as Smalltalk or Eiffel); it is based on C/C++, but avoids the worst of both languages (not completely, though); by use of a virtual machine it is cross-platform (although some vendors have variant versions); it has a rich set of libraries that deal with, for example, graphics and networking. These are combining to produce the ``Web programming'' environment where applications or applets are written once to run on any platform. Java extends the open authoring environment of HTML to an open programming environment.
How do Unix and Java co-exist? Very well in general terms. A Java application can do exactly the same things in a Unix environment that it can do in a Mac or Windows environment. With care, applications can even be written that transparently handle such issues as forward- versus back-slashes for directory separators.
Java currently lacks the depth of hooks into the POSIX API that tcl and Perl have. The standard must continue to have this lack or it will lose its universal applicability. However, extensions by non-standard libraries can supply this depth, as long as they are done in an appropriate manner.
This paper reports on a project to make the POSIX API available to Java applications or applets using a non-standard library. It encapsulates POSIX calls within Java methods for a new set of classes in the posix package.
This project must be contrasted in aims and methods to the Microsoft policy with J++: their attempt is to make J++ the best environment for doing Windows programming in Java [J++], and this implicitly rejects the notion of a cross-platform environment. It may even turn out to be difficult for the programmer using J++ to be certain of how general their code is. The statements from Microsoft do not seem to dispel any uncertainty.
This project, on the other hand, is explicit about the introduction of non-standard features, placing all the new classes in a package that is non-standard. No application using the posix package will ever pass the ``Pure Java'' tests. On the other hand, this package will be developed by an open process: the source code will be freely available to all, and can be freely modified. If interest is high enough, it can be divorced from individual control and given to a suitable co-ordinating authority.
The Java Core does offer some file support, such as the class File, which can return a list of files in a directory. However, many useful methods are missing. For example, the ability to change directories does not appear to be there, because it has different effects in DOS and Unix (in DOS it changes directories for all processes, but in Unix only for the calling process).
Some of the areas in which the POSIX API supplies extra functionality over the Java Core are given in the following (incomplete) table:
Area      | Extra POSIX functionality
Files     | multiple links
          | file status
          | access mode
          | FIFOs
Processes | file creation mask
          | execution times
          | user id
          | signals
          | process groups
The Java Core comprises packages for the language itself (java.lang), general I/O (java.io), useful utilities (java.util), networking (java.net), windowing (java.awt), etc, plus techniques such as remote method invocation (RMI) and the Java Native Interface (JNI).
Any application or applet can add classes that are written in Java. These can be accessed from local copies or downloaded across the network. This is the principle behind the network computer, and is captured in the ``Pure Java'' certification. This validates that an application or applet should be able to run in any Java-conforming environment.
The key to all of this is that implementations of the Java Core classes must exist on all Java-conforming platforms. It is not often stated, but these Core classes are riddled with native, platform-dependent code. They have to be: the AWT relies on the native windowing system, and the network classes rely on the native TCP/IP stack. What Pure Java really means is: use no native code apart from that sanctioned by Sun.
The Java Virtual Machine interpreters are currently written in C. There is an interface between compiled Java and the interpreter which is formalised by a set of Java to native-code C functions. Regrettably, there are several sets of interface APIs: the original set from Java 1.0, which is still heavily used (e.g. in the AWT) but not talked about anymore; the Java Native Interface (JNI), which is the new public standard [JNI]; and the Microsoft set, which differs from JNI and is one of the points of contention between Sun and Microsoft.
This POSIX library is implemented using the Java Native Interface (JNI) API. It is a ``thin'' library, calling directly into the C API with very little extra work. It will need to be compiled for each native platform, since it is written in C as well as Java. The intent is that it should run on all POSIX platforms, but of course this has still to be tested.
In open(), a ``handle'' to the file (a file descriptor) is returned for all future references to the file; in umask(), the previous value of the mask is returned.
In Java, these uses may be irrelevant or deprecated. Handles to structures are rarely needed since in Java one is dealing with objects, and the reference to an object is a handle to that object. Special integer values (for example) are not needed.
Style issues may cause some problems. The Java Beans specification encourages the use of ``getter'' and ``setter'' methods. For example, the function umask() in C returns the old mask while also setting a new one. In Java Beans style, this would be two separate methods, getUmask() and setUmask().
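A minimal sketch of the two styles follows. It uses a simulated mask held in a Java field rather than a real native call; the class name `UmaskSketch` and the initial value 022 are assumptions made for illustration only. Since POSIX offers no call that reads the mask without also setting it, a getter built on `umask()` has to install a temporary value and then restore the old one:

```java
final class UmaskSketch {
    // Stand-in for the per-process mask; a real binding would ask the kernel via native code.
    private static int mask = 022;

    // C style: install a new mask and return the previous one in a single call.
    static int umask(int newMask) {
        int old = mask;
        mask = newMask & 0777;
        return old;
    }

    // Beans-style setter: the previous value is simply discarded.
    static void setUmask(int newMask) {
        umask(newMask);
    }

    // Beans-style getter: set a temporary mask, then put the old one back.
    static int getUmask() {
        int old = umask(0);
        umask(old);
        return old;
    }

    public static void main(String[] args) {
        setUmask(027);
        System.out.println(Integer.toOctalString(getUmask())); // prints 27
    }
}
```

With a real native `umask()` underneath, a getter written this way is not atomic: for a brief moment the process mask is zero, which matters if another thread creates a file in between.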
There are some data representation issues that are difficult to resolve. Java has a more limited set of integer types, and does not include unsigned types. This is discussed in more detail later.
Java does not allow mechanisms to perform address arithmetic. There does not seem to be any need to do this in the POSIX API, so this is not a problem.
open() will return a non-negative integer (the file descriptor) if the call succeeds, or minus one if it fails.
In Java, errors are signalled by raising exceptions, and if no exception is raised then the method has succeeded.
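The contrast between the two conventions can be sketched as a thin wrapper that turns the minus-one sentinel into an exception. Everything here is hypothetical: `rawOpen()` is a stand-in for a native call (it fails only for an empty path), and the nested `PosixException` merely imitates the exception class of this binding.

```java
final class ErrorMappingSketch {
    // Hypothetical checked exception, standing in for the binding's PosixException.
    static class PosixException extends Exception {
        PosixException(String message) {
            super(message);
        }
    }

    // Stand-in for the native call: returns a descriptor, or -1 on failure.
    static int rawOpen(String path) {
        return path.isEmpty() ? -1 : 3;
    }

    // Java-style wrapper: the -1 sentinel never reaches the caller.
    static int open(String path) throws PosixException {
        int fd = rawOpen(path);
        if (fd == -1) {
            throw new PosixException("open failed for \"" + path + "\"");
        }
        return fd;
    }

    public static void main(String[] args) {
        try {
            System.out.println("descriptor " + open("/etc/passwd"));
            open(""); // simulated failure
        } catch (PosixException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

The caller no longer has to test every return value; a failure cannot be silently ignored because the exception is checked.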
However, the resulting code did not fit the Java coding styles or standards. Programs using this had an awkward feel. It was discarded in favour of a more Java-like solution, reworking the POSIX API. The sections that follow discuss some of the possibilities and why this decision was made.
The POSIX specification chapter on ``Files and Directories'' contains function calls in a number of categories. There is a set of functions that merely take the name of a file: these include chmod() and access(). Another set refers to a file as an object by using a file descriptor, such as open(), creat(), read() and write(). The C system calls stat() and fstat() take a structure and fill values in. The functions link() and unlink() really refer to file names within directories, and as such refer to directories rather than files, even though programmers tend to think of them as file rather than directory operations. The function umask() sets file creation flags for the current process, and is unrelated to files except in their creation.
The other chapters in the POSIX 1003.1 specification contain an equal mixture of categories. In other words, this specification is not drawn up on lines that are suitable for OO languages. Categorisation of C functions into OO classes and methods is not clear-cut.
Stat
A class Stat can be formed around stat() and fstat(), as they both deal with the stat structure. The C stat structure has a number of fields and a set of C macros that act on one of these fields:
    struct stat {
        mode_t st_mode;
        ino_t  st_ino;
        ...
    };

    /* constants */
    #define S_IRUSR ...
    #define S_IWUSR ...

    /* macros */
    #define S_ISDIR(mode) ...
    #define S_ISCHR(mode) ...

This displays a number of characteristics of POSIX:
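Field names are prefixed with st_ to label them as belonging to structure stat; types are hidden behind typedef definitions, such as mode_t; constants are prefixed with S_; and macros are also prefixed with S_.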
The functions that manipulate this structure are stat() and fstat(). These fill in the fields with values that can be queried by a program. These system calls cannot be used to set values.

In Java terms, many of these devices for avoiding namespace problems can be dropped, since Java has its own mechanisms. The read-only nature of the fields can also be specified. The difference between the two system calls that fill in the stat structure corresponds to two different constructors. However, the type synonym problem is not handled well, and the binding may need to revert to base types.
A Java class for this structure and its functions could be

    public class Stat {
        // constants
        public static final int RWXU;
        public static final int RUSR;
        public static final int WUSR;
        // etc

        // constructors
        public Stat(String fileName) throws PosixException;    // stat()
        public Stat(posix.File file) throws PosixException;    // stat()
        public Stat(posix.OpenFile fd) throws PosixException;  // fstat()

        // public field access
        public int getMode();
        public int getIno();
        public int getDev();
        // etc

        public boolean isDir();
        public boolean isChr();
        // etc
    }

Note also that the constants such as RUSR are not assigned their ``well known'' values (such as 0400) in these definitions. The actual values are not part of POSIX and so should not be part of the Java class definitions. The technique of ``blank finals'' is used here, and is discussed later in implementation issues.
File
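POSIX file operations divide into those that only need a filename (such as unlink()), and those that require open files (such as read()). The operations that only use filenames can form a class of their own. The Java Core library has a class java.io.File that plays a similar role in the platform-independent world. The Java class shows a range of decisions that can be made in this mapping.

    // static method, error signalled by return value
    if (File.unlink("/etc/passwd") == -1) ...

    // static method, error signalled by exception
    try { File.unlink("/etc/passwd"); } catch(Exception e) { ... }

    // instance method, error signalled by return value
    if (new File("/etc/passwd").unlink() == -1) ...

    // instance method, error signalled by exception
    try { new File("/etc/passwd").unlink(); } catch(Exception e) { ... }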
OpenFile
The read() and write() functions act on open files. A file opened only for reading cannot be written to, and vice versa. Safety would divide open files into two classes, readable files and writable files. This is done in Core Java with the classes FileInputStream and FileOutputStream. However, in POSIX a file can be opened as both readable and writable. This is a strong candidate for a set of classes using multiple inheritance, which Java does not provide for classes.

Despite duplication of code, the class ReadWriteFile must also be a direct child of OpenFile. This common parent class is needed for a number of reasons: constants such as O_NONBLOCK should be defined in a parent class; methods common to all open files, such as close(), should be defined in a parent class; and some calls accept any open file, such as fstat(). Such calls (when turned into methods) will need a generic object of any of these types.
The class structure is thus

    public class OpenFile {
        public static final int NONBLOCK;
        // etc

        public void close();
        // etc
    }

    public class ReadFile extends OpenFile {
        public ReadFile(String path);
        public int read(byte buf[], int nbyte);
    }

    public class WriteFile extends OpenFile {
        public WriteFile(String path);
        public int write(byte buf[], int nbyte);
    }

    public class ReadWriteFile extends OpenFile {
        public ReadWriteFile(String path);
        public int read(byte buf[], int nbyte);
        public int write(byte buf[], int nbyte);
    }
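One way to recover some of the type information lost by single inheritance is to describe the read and write capabilities as interfaces. The sketch below is an alternative design rather than the structure used by this binding: the interface names `Readable` and `Writable`, the helper methods `doRead()` and `doWrite()`, and the in-memory buffer standing in for a real file are all assumptions.

```java
final class OpenFileSketch {
    interface Readable { int read(byte[] buf, int nbyte); }
    interface Writable { int write(byte[] buf, int nbyte); }

    // Common parent: shared state and close(), as in the class structure above.
    static class OpenFile {
        protected final byte[] data = new byte[64]; // in-memory stand-in for file contents
        protected int length = 0;                   // number of bytes stored so far
        protected int pos = 0;                      // next offset to read from
        private boolean open = true;

        public void close() { open = false; }
        public boolean isOpen() { return open; }

        // Shared helpers, so the subclasses do not duplicate the copying code.
        protected int doRead(byte[] buf, int nbyte) {
            int n = Math.min(nbyte, length - pos);
            System.arraycopy(data, pos, buf, 0, n);
            pos += n;
            return n;
        }

        protected int doWrite(byte[] buf, int nbyte) {
            int n = Math.min(nbyte, data.length - length);
            System.arraycopy(buf, 0, data, length, n);
            length += n;
            return n;
        }
    }

    static class ReadFile extends OpenFile implements Readable {
        public int read(byte[] buf, int nbyte) { return doRead(buf, nbyte); }
    }

    static class WriteFile extends OpenFile implements Writable {
        public int write(byte[] buf, int nbyte) { return doWrite(buf, nbyte); }
    }

    static class ReadWriteFile extends OpenFile implements Readable, Writable {
        public int read(byte[] buf, int nbyte) { return doRead(buf, nbyte); }
        public int write(byte[] buf, int nbyte) { return doWrite(buf, nbyte); }
    }

    public static void main(String[] args) {
        ReadWriteFile f = new ReadWriteFile();
        f.write(new byte[] {1, 2, 3}, 3);
        byte[] out = new byte[8];
        System.out.println(f.read(out, 8)); // number of bytes read back
        f.close();
    }
}
```

A method that only needs to read can then take a `Readable` parameter and accept both `ReadFile` and `ReadWriteFile`, while a method such as one wrapping fstat() can still take any `OpenFile`.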
Pipe
A Java version of pipe() can define a class that avoids the potential problem of poor programmer memory (forgetting which of the two descriptors is the read end) by making type-safe versions in a class Pipe:

    public class Pipe {
        public Pipe();
        public ReadFile in();
        public WriteFile out();
    }

The two public methods are both ``getter'' methods that return the appropriate OpenFile. However, by returning a subclass, it is only possible to use ``read'' methods on in() and ``write'' methods on out().
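The same type-safety idea can be tried out without any native code by using the Core classes `java.io.PipedInputStream` and `java.io.PipedOutputStream`. This is only an analogy: those classes connect threads inside one Java virtual machine and are not a kernel pipe, and the class name `PipeSketch` is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;

final class PipeSketch {
    private final PipedInputStream readEnd;
    private final PipedOutputStream writeEnd;

    PipeSketch() throws IOException {
        writeEnd = new PipedOutputStream();
        readEnd = new PipedInputStream(writeEnd); // connect the two ends
    }

    // The static types make it impossible to write to the read end or read from the write end.
    InputStream in() { return readEnd; }
    OutputStream out() { return writeEnd; }

    public static void main(String[] args) throws IOException {
        PipeSketch p = new PipeSketch();
        // A small write fits in the pipe's internal buffer, so one thread is enough here.
        p.out().write("hello".getBytes(StandardCharsets.US_ASCII));
        p.out().close();

        byte[] buf = new byte[16];
        int n = p.in().read(buf);
        System.out.println(new String(buf, 0, n, StandardCharsets.US_ASCII)); // prints hello
    }
}
```

An attempt to call `p.in().write(...)` is rejected by the compiler, which is the property the Pipe class above relies on.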
dup2()
Consider a shell pipeline such as

    ls | wc

In order to do this in C, it is necessary to create a pipe, perform a fork(), and then map the ends of the pipe onto the standard file descriptors using dup() or preferably dup2().

The dup2() C function call takes two parameters, the current file descriptor and the file descriptor it would ``like to be''. Typically one of the desired file descriptors is zero (for standard input) or one (for standard output).

The ``current file descriptor'' will be an OpenFile object. The desired file descriptor could be an integer as in the C call, or could be another OpenFile object, since this binding is deprecating the need for an integer file descriptor. After the dup2(), reads/writes to either object should have the same effect.

For this to function seamlessly, there will need to be existing objects corresponding to file descriptors 0, 1 and 2. These can be done in a similar manner to the Core streams System.in and System.out.
These all give further fields and methods of class OpenFile:

    public class OpenFile {
        protected native void dup2(int fd);
        public void dup2(OpenFile f);
        ....
    }

with the standard files in a general Posix class

    public class Posix {
        static public ReadFile stdin;
        static public WriteFile stdout;
        static public WriteFile stderr;
    }

This means that we can now perform operations such as

    byte buf[] = new byte[128];
    ReadFile f = new ReadFile("/etc/passwd");
    f.dup2(Posix.stdin);
    f.read(buf, 128);            // reads from /etc/passwd
    Posix.stdin.read(buf, 128);  // reads more from /etc/passwd
Process
The traditional multi-tasking model in Unix has been via processes, which are more heavyweight than threads and do not share memory (although they share some things, such as file descriptors). Processes are identified by a ``process id'' which is a positive integer (of the signed type pid_t). Each process has its own process id, and all but the initial process will have a parent process id.
There are a host of functions that work on the current process, such as getpid() and getuid(). Functions that work on other processes, such as kill(), can only use the process id since that is all the information that will typically be available.

In any running Java application there can only be one process (although there may be many threads). It hardly seems worthwhile creating an object of type Process if there can only be one of them. So instead of allowing objects of this class to be created, it is easier to define all its operations as static methods of the class.

This binding is not complete for this class. A partial definition is

    public class Process {
        public static int execvp(String file, String args[]);
        public static int fork();
    }
PosixException
A PosixException is raised when an error occurs. This contains an error message as produced by strerror().
    final int SIZE = 1024;
    byte buf[] = new byte[SIZE];
    int nread;
    try {
        while ((nread = Posix.stdin.read(buf, SIZE)) != 0)
            Posix.stdout.write(buf, nread);
    } catch(PosixException e) {
        System.err.println("Error in copy " + e.toString());
    }

The while loop terminates on end of file. Any error is caught by the exception handler. This separates the two possibilities which are often combined in the C code of while ((nread = read(...)) > 0).
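The Core stream classes make the same separation, and can be used to try the idea out without the native library. In the sketch below (the class and method names are illustrative), `InputStream.read()` reports end of file with -1 rather than the 0 used by this binding, and any I/O error arrives as an `IOException`:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class CopyLoopSketch {
    // Copies everything from in to out and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[1024];
        long total = 0;
        int nread;
        // -1 means end of file only; errors never show up as a return value.
        while ((nread = in.read(buf)) != -1) {
            out.write(buf, 0, nread);
            total += nread;
        }
        return total;
    }

    public static void main(String[] args) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            long n = copy(new ByteArrayInputStream(new byte[3000]), out);
            System.out.println("copied " + n + " bytes");
        } catch (IOException e) {
            System.err.println("Error in copy " + e);
        }
    }
}
```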
    Stat currDir = null;
    try {
        currDir = new Stat(".");
    } catch(PosixException e) {
        System.out.println("Stat failed on . " + e.toString());
        System.exit(1);
    }
    int mode = currDir.getMode();
    if ((mode & Stat.ROTH) != 0)
        System.out.println(". is world readable");
The pipeline ls | wc in C can be done by (with no error checking)

    int pfd[2];
    pipe(pfd);
    if (fork() == 0) {
        close(pfd[1]);
        dup2(pfd[0], 0);
        close(pfd[0]);
        char *args[] = {"wc", NULL};
        execvp("wc", args);
    } else {
        close(pfd[0]);
        dup2(pfd[1], 1);
        close(pfd[1]);
        char *args[] = {"ls", NULL};
        execvp("ls", args);
    }

The classes given in this binding allow this to be done in Java by

    Pipe p = new Pipe();
    if (Process.fork() == 0) {
        p.out().close();
        p.in().dup2(Posix.stdin);
        p.in().close();
        String args[] = new String[] {"wc"};
        Process.execvp("wc", args);
    } else {
        p.in().close();
        p.out().dup2(Posix.stdout);
        p.out().close();
        String args[] = new String[] {"ls"};
        Process.execvp("ls", args);
    }
POSIX does not specify the values of constants such as S_IRUSR. This is entirely correct: the actual value may depend on the platform and should not be pinned down in this standard. This means that constants should not be hard-coded in the Java binding, but instead picked up from the local environment. In Java 1.0 this would only be possible for variables, and allowing constants to be variable would not be a good idea. From Java 1.1, constants (declared as final) need not have their value statically defined, but can be assigned to once, in an initialisation section such as a constructor. This allows a native code call to assign a value.
Alas, things are not quite straightforward. The Java compiler must ensure that the constants are assigned a value, and this cannot be hidden in native code inaccessible to the compiler. A dummy value must be used:
    public static final int RUSR;
    private static int rusr;    // dummy variable
    protected static native void initialiseConstants();

    static {
        initialiseConstants();  // sets rusr, etc
        RUSR = rusr;            // yuk
        ...
    }
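A possible variation, sketched here under assumptions, avoids the dummy variable by having the lookup return the value, so that the compiler can see the assignment to the blank final. The `lookup()` method below is an ordinary Java stand-in for what would be a native method in the real library, and the octal values it returns are merely the common Unix ones, used so that the example runs:

```java
final class BlankFinalSketch {
    // Blank finals: declared without a value, assigned exactly once below.
    static final int RUSR;
    static final int WUSR;

    // Stand-in for a native lookup; a real binding would ask the C headers.
    private static int lookup(String name) {
        switch (name) {
            case "S_IRUSR": return 0400;
            case "S_IWUSR": return 0200;
            default: throw new IllegalArgumentException("unknown constant " + name);
        }
    }

    static {
        // The compiler accepts this because each final is definitely assigned here.
        RUSR = lookup("S_IRUSR");
        WUSR = lookup("S_IWUSR");
    }

    public static void main(String[] args) {
        System.out.println(Integer.toOctalString(RUSR | WUSR)); // prints 600
    }
}
```

The cost is one call per constant instead of a single bulk initialisation, which may or may not matter for a library with many constants.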
POSIX uses typedefs to hide native data types. For example, mode_t is usually a synonym for unsigned short. This could be a different type (although it is unlikely). Java does not have a mechanism for typedefs.

In version 1.0 of this binding, common assumptions were made about what the types resolve to, to give common code across all versions of Unix. However, in version 2 this is being rewritten to allow for different implementations and also to allow for different sizes of basic data types, such as 64-bit versus 32-bit integers. In this new version, a method such as getMode() is defined to return an interface value,

    interface ModeT {
    }

where the type is implemented in a variety of ways. For example,

    class ModeT_32bit_signed implements ModeT {
        int mode;
    }

The correct version is selected at compile time of the C library using a Perl script used by autoconf.
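How such an interface might hide the representation can be sketched as follows. The method names `value()` and `hasAll()` and the two implementation classes are assumptions, not part of the binding; the 16-bit variant shows one way of coping with Java's lack of unsigned types, by masking when the value is widened.

```java
final class ModeTypeSketch {
    interface ModeT {
        // The mode widened to a long, always non-negative.
        long value();

        // True if every bit of mask is set in this mode.
        default boolean hasAll(ModeT mask) {
            return (value() & mask.value()) == mask.value();
        }
    }

    // mode_t as an unsigned 16-bit quantity, stored in a (signed) Java short.
    static final class Mode16Unsigned implements ModeT {
        private final short bits;
        Mode16Unsigned(int raw) { bits = (short) raw; }
        public long value() { return bits & 0xFFFFL; } // mask off the sign extension
    }

    // mode_t as a signed 32-bit quantity.
    static final class Mode32Signed implements ModeT {
        private final int bits;
        Mode32Signed(int raw) { bits = raw; }
        public long value() { return bits; }
    }

    public static void main(String[] args) {
        ModeT mode = new Mode16Unsigned(0100644); // top bit set, so negative as a short
        ModeT worldRead = new Mode16Unsigned(04);
        System.out.println(mode.value());           // prints 33188
        System.out.println(mode.hasAll(worldRead)); // prints true
    }
}
```

Client code written against `ModeT` does not change when the build selects a different implementation class.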
C allows functions that take a variable number of arguments, such as printf(). This is not allowed in Java. This affects a few POSIX functions such as execl() which cannot be implemented in Java. Fortunately, there are alternatives such as execvp() which replace the varargs with an array of args.
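The restriction applies to the Java versions this binding targets. From Java 5 onwards the language accepts variable-arity parameters (`String...`), which the compiler turns into an array, so an execl()-style method could simply delegate to the array form. The sketch below only records its arguments instead of replacing the process image; both method bodies are stand-ins and not the binding's implementation.

```java
import java.util.Arrays;

final class VarargsSketch {
    // Array form, in the manner of execvp(); here it only describes the call.
    static String execvp(String file, String[] args) {
        return file + " " + Arrays.toString(args);
    }

    // Variable-arity form (Java 5 and later), in the manner of execl().
    static String execl(String path, String... args) {
        return execvp(path, args); // args is already a String[]
    }

    public static void main(String[] args) {
        System.out.println(execl("/bin/ls", "ls", "-l")); // prints /bin/ls [ls, -l]
    }
}
```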
Mapping POSIX constants into Java constants is messy but straightforward. However, mapping POSIX data types into Java causes problems that require build-time configuration of the native C libraries.
Moving this project forward will require a consensus effort, since there are many issues involved. It is currently being taken up by a project student, to flesh out the binding and to resolve issues. The library is available under an ``open source'' license from http://jan.newmarch.name/java/posix/.