A Service Architecture for Scalable Distributed Audio

Introduction

Over the years we have become accustomed to listening to music and general audio from an increasing variety of sources. From only hearing music in concert halls, we can now hear it from TVs and radios, piped throughout shopping malls and elevators, and blaring out from spruikers at individual shops. In addition to that, more and more people are carrying their own portable audio sources, culminating in the current generation of iPods which can store 10,000 music files. The variety of audio sources and possible audio sinks can only be expected to increase with more and more devices being able to generate and consume audio. In addition, we can expect that sources and sinks will become more volatile, with consumers moving within range and out of range of a multiplicity of sources.

Most architectures for home audio-visual systems, such as the Java Media Framework (JMF) or Microsoft DirectShow, are based on a local model in which all generators (e.g. a TV tuner card) and consumers (e.g. a soundcard) are on the same machine. Even though JMF supports remote audio by means of HTTP and RTP, it hides these under a local programming model.

Network architectures are either based on existing middleware such as CORBA, often extending it in some way, or build their own middleware structure oriented towards a particular view of A/V. In the first class are systems such as Multimedia System Services and ... In the second class are systems such as Network-Integrated Multimedia Middleware (NMM). There is work on distributed A/V systems using Java, such as HAVi, but this is quite specific to the FireWire networking protocol.

This paper is oriented towards providing a large-scale service-based architecture where the emphasis is on service advertisement and discovery, simplified as much as possible, with recovery under failure as services disappear. The framework acts at an abstract level of service description, but the implementation levels remain capable of accommodating many transport protocols, handling multiple presentation formats, managing issues such as quality of service, and even using multiple middleware systems.

The system uses Jini for service management. This is a middleware system built on Java that is able to fully exploit Java networking capabilities and object mobility.

The structure of this paper is as follows: the next section discusses Jini as a service management middleware. The following section discusses and defines the service interfaces for our system. After this, additional interfaces that give lower level information are discussed. Some implementation techniques follow this. The succeeding section looks at scalability issues, and finally the paper concludes with a summary and discussion of future work.

Java and Jini

Java is a platform-independent language in which programs are compiled to portable byte code. It has become widely accepted from the enterprise level down to embedded systems in small devices. While the scale of hardware variation has led to different levels of virtual machine and core libraries (CLDC, CDC, J2SE, J2EE), there is still a much higher degree of conformity than in languages compiled to the object code layer.

In addition, Java has well-defined introspection mechanisms, which leads to standard serialisation techniques. These can be used to separate object data from class code so that instance data can be moved across a network and combined with class definitions from a separate source. This can be used as the basis for mobile systems of various kinds, from RMI to Jini to mobile agent systems.
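The separation of instance data from class code can be illustrated with standard Java serialisation. The following is a minimal sketch, not code from the system described here: TrackInfo and SerialisationDemo are hypothetical names, and the "network" is simply a byte array. The serialised bytes carry the field values and the class name, but not the byte code, so the receiving side must obtain the class definition separately.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical value class, used only for this illustration.
class TrackInfo implements Serializable {
    private static final long serialVersionUID = 1L;
    final String title;
    final int sampleRateHz;

    TrackInfo(String title, int sampleRateHz) {
        this.title = title;
        this.sampleRateHz = sampleRateHz;
    }
}

class SerialisationDemo {
    // Writes the instance data (plus class name and version), not the byte code.
    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    // Rebuilds the object; the receiving JVM must locate the class definition itself.
    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] wire = toBytes(new TrackInfo("Track 1", 44100));
        TrackInfo copy = (TrackInfo) fromBytes(wire);
        System.out.println(copy.title + " at " + copy.sampleRateHz + " Hz");
    }
}
```

In RMI and Jini the class definition may be downloaded from a separate code server rather than found on the local class path; that step is not shown in this sketch.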

Jini exploits the mobility of Java code with a service management system tuned towards network realities. It gives service advertisement and discovery, with resilient recovery mechanisms in case of failure. It is interface based, with total flexibility in implementation.

The advantages of this are:

Jini supplies a service advertisement and lookup registry
It has inbuilt reflection
It has an event model
It supplies a resilient failure mechanism
It allows flexible proxies, from RPC-like stubs, to "fat" proxies that can use local resources and any appropriate middleware
It can distribute user interfaces as components of a service
It can bridge to other middleware systems
It can handle "legacy" devices through a surrogate model or through Java JNI

Service definitions

There are many variables that affect how A/V is sourced, moved around a network and delivered:

Transport: The transport layer may be reliable but slower (TCP), unreliable but faster (UDP), or HTTP (slower still); it may be a protocol with some QoS support such as RTP, or some other network technology such as Bluetooth or FireWire.
Format: There is an enormous number of formats: encumbered formats such as MP3 (for which you are supposed to pay licence fees for encoders and decoders) or unencumbered equivalents such as Ogg Vorbis; compressed (MP3 and Ogg Vorbis) or uncompressed (Sun AU or Microsoft WAV); lossy or lossless. In addition, there are many wrinkles in each format: little- or big-endian; 8, 16 or 32 bit; mono, stereo, 5.1, ...; sample rates such as 44.1 kHz, 8 kHz, etc.
Content description: Audio comes from many different sources: tracks off a CD, streaming audio from an FM station, speech off a telephone line. MPEG-7 concentrates on technical aspects of an audio signal in an attempt to classify it, while CD databases (CDDB) such as freedb classify CDs by artist and title, which breaks down with compilation CDs and most classical CDs (who is the artist: the composer, the conductor or the orchestra?).
Push/pull: An audio stream may be "pushed", such as an FM radio stream that is always playing, or it may be "pulled" by a client from a server, such as in fetching an MP3 file from an HTTP server.

Interfaces should contain all the information about how to access services. With audio, this information can be quite complex: for example, a service might offer a CD track encoded in 16-bit stereo, big-endian, 44.1 kHz sampling in WAV format from an HTTP server. This information may be needed by a consumer that wants to play the file.

But at the most abstract layer an A/V system consists of three players:

Sources of A/V data
Sinks for A/V data
Controller clients to link sources and sinks

From the controller viewpoint, most of this information is irrelevant: it will just want to link sources to sinks, and leave it to them to decide how and if they can communicate.

For simplicity we define two interfaces: Source and Sink. To avoid making implementation decisions about pull versus push, we have methods to tell a source about a sink, to tell a sink about a source, and to tell the source to play and the sink to record. Again, how they do this is up to the source and sink. Sometimes this won't work: an HTTP source may not be able to deliver to an RTP sink, or a WAV file may not be managed by an MP3 player. If they don't succeed in negotiating transport and content, then an exception should be thrown. This violates the principle that a service should be usable based on its interface alone, but considerably simplifies matters for controller clients.

A controller that wants to play a sequence of audio tracks to a sink will need to know when one track is finished in order to start the next. The play() and record() methods could block till finished, or return immediately and post an event on completion. The second method allows more flexibility, and so needs add/remove listener methods for the events.
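The event-based alternative can be sketched as follows. This is an illustration only: Track, StopListener and PlaylistController are hypothetical local classes with no RMI or Jini event types, and finish() stands in for whatever audio machinery detects the end of a track. The point is that play() returns at once and the controller starts the next track from the completion callback.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical listener type; a local stand-in for a remote event listener.
interface StopListener {
    void stopped(Track source);
}

// Hypothetical source that reports completion through listeners.
class Track {
    final String name;
    private final List<StopListener> listeners = new ArrayList<>();

    Track(String name) { this.name = name; }

    void addListener(StopListener l) { listeners.add(l); }
    void removeListener(StopListener l) { listeners.remove(l); }

    // Returns immediately; completion is signalled later through finish().
    void play() { System.out.println("playing " + name); }

    // Called when the track ends. Iterates over a copy because
    // listeners may remove themselves during the callback.
    void finish() {
        for (StopListener l : new ArrayList<>(listeners)) {
            l.stopped(this);
        }
    }
}

// Controller that starts the next track whenever the current one stops.
class PlaylistController implements StopListener {
    private final Queue<Track> pending = new ArrayDeque<>();
    final List<String> history = new ArrayList<>(); // names of tracks started so far

    PlaylistController(List<Track> tracks) { pending.addAll(tracks); }

    void start() { playNext(); }

    @Override
    public void stopped(Track source) {
        source.removeListener(this);
        playNext();
    }

    private void playNext() {
        Track next = pending.poll();
        if (next == null) {
            return; // playlist exhausted
        }
        history.add(next.name);
        next.addListener(this);
        next.play();
    }
}
```

A blocking play() would make the controller simpler still, but it would tie up a controller thread per track and give no natural place to react to other events.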

Finally, there are the exceptions that can be thrown by the methods. Attempting to add a source that a sink cannot handle should throw an exception such as IncompatableSourceException. A sink that can handle only a small number of sources (for example, only one) could throw an exception if too many sources are added. A source that is already playing may not be able to satisfy a new request to play.

These considerations lead to a pair of high-level interfaces which seem to be suitable for controllers to manage sources and sinks (other event constants may be added later):


import java.rmi.MarshalledObject;
import java.rmi.RemoteException;
import net.jini.core.event.EventRegistration;
import net.jini.core.event.RemoteEventListener;

public interface Source extends java.rmi.Remote {

    // Event constant: the source has stopped
    int STOP = 1;

    void play() throws
                RemoteException,
                AlreadyPlayingException;
    void stop() throws
                RemoteException,
                NotPlayingException;
    void addSink(Sink sink) throws
                RemoteException,
                TooManySinksException,
                IncompatableSinkException;
    void removeSink(Sink sink) throws
                RemoteException,
                NoSuchSinkException;
    EventRegistration addSourceListener(RemoteEventListener listener,
                                        MarshalledObject handback) throws
                RemoteException;
}// Source

and


import java.rmi.MarshalledObject;
import java.rmi.RemoteException;
import net.jini.core.event.EventRegistration;
import net.jini.core.event.RemoteEventListener;

public interface Sink extends java.rmi.Remote {

    // Event constant: the sink has stopped
    int STOP = 1;

    void record() throws
                RemoteException,
                AlreadyRecordingException;
    void stop() throws
                RemoteException,
                NotRecordingException;
    void addSource(Source src) throws
                RemoteException,
                TooManySourcesException,
                IncompatableSourceException;
    void removeSource(Source src) throws
                RemoteException,
                NoSuchSourceException;
    EventRegistration addSinkListener(RemoteEventListener listener,
                                      MarshalledObject handback) throws
                RemoteException;
    void removeSinkListener(RemoteEventListener listener) throws
                RemoteException,
                NoSuchListenerException;

}// Sink
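A controller written against these interfaces needs very little logic. The sketch below shows one plausible shape under simplifying assumptions: DemoSource, DemoSink and LinkFailedException are hypothetical local stand-ins that drop the RMI and Jini types, and the format() method exists only to give the two sides something to disagree about. The controller itself never inspects formats or transports; it only reacts to the exception.

```java
// Hypothetical stand-in for the various negotiation exceptions.
class LinkFailedException extends Exception {
    LinkFailedException(String message) { super(message); }
}

// Local stand-ins mirroring the shape of Source and Sink (no RMI, no events).
interface DemoSource {
    String format(); // hypothetical helper, used here only to drive negotiation
    void addSink(DemoSink sink) throws LinkFailedException;
    void play();
}

interface DemoSink {
    void addSource(DemoSource source) throws LinkFailedException;
    void record();
}

class FixedFormatSource implements DemoSource {
    private final String format;
    boolean playing;

    FixedFormatSource(String format) { this.format = format; }

    public String format() { return format; }
    public void addSink(DemoSink sink) { /* this source accepts any sink */ }
    public void play() { playing = true; }
}

class SingleFormatSink implements DemoSink {
    private final String accepted;
    boolean recording;

    SingleFormatSink(String accepted) { this.accepted = accepted; }

    // The sink, not the controller, decides whether it can handle the source.
    public void addSource(DemoSource source) throws LinkFailedException {
        if (!accepted.equals(source.format())) {
            throw new LinkFailedException("cannot handle " + source.format());
        }
    }

    public void record() { recording = true; }
}

class LinkingController {
    // Returns true if the pair was linked and started, false if negotiation failed.
    static boolean link(DemoSource source, DemoSink sink) {
        try {
            sink.addSource(source);
            source.addSink(sink);
        } catch (LinkFailedException e) {
            return false;
        }
        sink.record(); // start the sink first so it is ready when data arrives
        source.play();
        return true;
    }
}
```

A fuller version would also undo the first half of the link (removeSource) if the second half fails; that is omitted here to keep the sketch short.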

Additional interfaces

The two interfaces given above are enough to identify sources and sinks to a third party client (or to each other). Negotiating whether they can talk to each other may require more information, which can be supplied by further interfaces.

Content interfaces

The Java Media Framework (JMF) has methods such as getSupportedContentTypes() which returns an array of strings. Other media toolkits have similar mechanisms. This isn't type-safe: it relies on all parties having the same strings and attaching the same meaning to each. In addition to this, if a new type comes along, there isn't a reliable means of specifying this information to others. A type-safe system can at least specify this by class files.

Interfaces are more type-safe than strings: a WAV interface, an Ogg interface, etc. This doesn't easily allow extension to the multiplicity of content type variations (bit size, sampling rate, etc), but the current content handlers seem to be able to handle most of these variations anyway, so it seems feasible to ignore them at an application level.

The content interfaces are just place-holders:


package presentation;

public interface Ogg extends java.rmi.Remote {
}

A source that can make an audio stream available in Ogg Vorbis format signals this by implementing the Ogg interface. A sink that can manage Ogg Vorbis streams also implements this interface.
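With marker interfaces of this kind, checking whether a source and a sink share a content type reduces to a type test. The following is a minimal sketch under stated assumptions: the Ogg and Wav interfaces here are plain local markers (no RMI), and OggFileSource, MultiFormatPlayer, WavOnlyPlayer and ContentNegotiation are hypothetical names introduced for the example.

```java
// Local marker interfaces standing in for the content interfaces.
interface Ogg {}
interface Wav {}

// Hypothetical services that advertise their formats by implementing markers.
class OggFileSource implements Ogg {}
class MultiFormatPlayer implements Ogg, Wav {}
class WavOnlyPlayer implements Wav {}

class ContentNegotiation {
    // Formats this negotiator knows about, in order of preference.
    static final Class<?>[] KNOWN_FORMATS = { Ogg.class, Wav.class };

    // Returns the first known format implemented by both parties, or null if none.
    static Class<?> commonFormat(Object source, Object sink) {
        for (Class<?> format : KNOWN_FORMATS) {
            if (format.isInstance(source) && format.isInstance(sink)) {
                return format;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(commonFormat(new OggFileSource(), new MultiFormatPlayer()));
        System.out.println(commonFormat(new OggFileSource(), new WavOnlyPlayer()));
    }
}
```

Because the check is made on types rather than strings, a misspelt format name becomes a compile-time error instead of a silent mismatch.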

Transport interfaces

In a similar way, the transport mechanisms may be represented by interfaces. A transport sink will get the information from a source using some unspecified network transport mechanism. The audio stream can be made available to any other object by exposing an InputStream. This is a standard Java stream, not the special one used by JMF. Similarly, a transport source would make an output stream available for source-side objects to write data into.


import java.io.InputStream;

public interface TransportSink {

    // Stream from which the received audio data can be read
    public InputStream getInputStream();
}// TransportSink

and


import java.io.OutputStream;

public interface TransportSource {

    // Stream into which source-side objects write audio data
    public OutputStream getOutputStream();
}// TransportSource
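To show how the two stream-based interfaces fit together, the sketch below connects them with an in-memory pipe instead of a real network protocol. It is an illustration only: PipeTransport is a hypothetical class, and the two interfaces are redeclared under different names (PipeTransportSink, PipeTransportSource) so that the example compiles on its own.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

// Same shape as TransportSink and TransportSource, renamed for a standalone example.
interface PipeTransportSink {
    InputStream getInputStream();
}

interface PipeTransportSource {
    OutputStream getOutputStream();
}

// Hypothetical in-memory transport: one object plays both roles.
class PipeTransport implements PipeTransportSource, PipeTransportSink {
    private final PipedOutputStream out = new PipedOutputStream();
    private final PipedInputStream in;

    PipeTransport() throws IOException {
        in = new PipedInputStream(out); // connect the read end to the write end
    }

    public OutputStream getOutputStream() { return out; }
    public InputStream getInputStream() { return in; }
}

class PipeTransportDemo {
    public static void main(String[] args) throws IOException {
        PipeTransport transport = new PipeTransport();

        // Source side: write a few bytes. Piped streams are meant for two threads;
        // a single thread is only safe here because the data fits in the pipe buffer.
        try (OutputStream out = transport.getOutputStream()) {
            out.write(new byte[] {10, 20, 30});
        }

        // Sink side: read until end of stream.
        InputStream in = transport.getInputStream();
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        System.out.println("received " + count + " bytes");
    }
}
```

A content handler on the sink side sees only an InputStream, so the same handler could be reused over HTTP, RTP or any other transport that can present its data in this way.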

Linkages

By separating the transport and content layers, we have a model that follows part of the ISO seven-layer model: the transport and presentation layers. The communication paths for a "pull" sink are

[Figure: communication paths for a "pull" sink]

The classes involved in a "pull" sink could look like

[Figure: classes involved in a "pull" sink]

where the choice of transport and content implementation is based on the interfaces supported by the source.

Implementation

A variety of implementations have been built using these interfaces. The separation of transport and content (presentation) and the networking support built into Java mean that the implementations are very small: typically just a few dozen lines.

A number of clients to link sources to sinks have also been built. The simplest just links any source to any sink. More complex graphical user interfaces have also been built, and here the bulk of the code lies in the Swing objects.

Scalability

Objects

In a normal service architecture, creating 10,000 services will create at least 10,000 objects. In Jini 2.0 using Jeri, this number will be substantially larger: the programmer will need to create an exporter for each service, and generate a proxy for each service. Behind the scenes, many more objects may be created.

We tested the memory requirements for such a large number of objects by writing a server which just created an object 10,000 times, created an exporter and proxy and exported the proxy. The results are shown in table XXX. Using a "larger" object, we got table XXX.
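The general shape of such a measurement can be sketched with the standard Runtime API. This is not the test program used for the tables, and it makes two loud simplifications: DummyService is a hypothetical plain object, and no exporter or proxy is created, so the figures it prints understate the cost of a real exported service. Heap readings obtained this way are approximate, since the garbage collection request is only a hint.

```java
import java.util.ArrayList;
import java.util.List;

class MemoryProbe {
    // Hypothetical stand-in for a service object; a real test would also export it.
    static class DummyService {
        final byte[] payload;

        DummyService(int payloadBytes) {
            payload = new byte[payloadBytes];
        }
    }

    // Approximate heap in use, after asking (not forcing) the JVM to collect garbage.
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        rt.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Creates the objects and keeps them reachable so they are not collected.
    static List<DummyService> createServices(int count, int payloadBytes) {
        List<DummyService> services = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            services.add(new DummyService(payloadBytes));
        }
        return services;
    }

    public static void main(String[] args) {
        long before = usedHeap();
        List<DummyService> services = createServices(10_000, 1024);
        long after = usedHeap();
        System.out.println(services.size() + " objects, approx. "
                + (after - before) / services.size() + " bytes each");
    }
}
```

Varying the payload size gives a rough equivalent of the "larger" object case.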

We did these tests with normal services, and then with activatable services. Using activatable services requires the use of an activation server such as rmid. With activation, the memory load is moved to the activation server, which caches services on disk and reactivates them on demand. The figures for service memory use, activation server memory use and activation server disk use are given in table XXX.

Threads

Memory use

Registry size

Lookup service

Managing 10,000 services will put memory and processing load on the lookup service. These are given in table XXX.

Leasing

Each service will have an associated lease. Jini normally gives about 5 minutes per lease before it needs to be renewed, so 10,000 leases will need renewal at a rate of about 33 leases per second (10,000 leases every 300 seconds). The network traffic will be ... and the processor usage will be ... (There should be figures for lease renewal for DHCP for comparison: someone must have studied DHCP overheads for large networks.)
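The renewal rate follows from dividing the number of leases by the lease duration. The sketch below spells out that arithmetic; LeaseLoad is a hypothetical helper, and it assumes that every lease is renewed exactly once per lease period and that renewals are spread evenly over time.

```java
class LeaseLoad {
    // Average renewals per second, assuming one renewal per lease per lease period.
    static double renewalsPerSecond(int services, double leaseSeconds) {
        return services / leaseSeconds;
    }

    public static void main(String[] args) {
        double rate = renewalsPerSecond(10_000, 5 * 60);
        System.out.println("average renewals per second: " + rate);
    }
}
```

If services renew well before expiry, for example halfway through the lease, the rate rises in proportion, and if many services start at the same moment the renewals will arrive in bursts rather than evenly.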

User interfaces

Sources and sinks can attempt to link to each other directly or via a third party agent. The Source and Sink interfaces form a first step in this. They may need to negotiate based on further interfaces that each implements. A sink service that records to a file on disk presents an interesting case that can be handled within this framework, but which requires additional information.

A service is defined by its contract. A sink must be able to record, or throw a known exception. A file sink will need to have a file selected. If none is selected, it could throw a NoFileSelectedException, but this would break the contract, since a client may not know about this exception. So a file sink will need to be able to handle this case without complaint (say by discarding the data or saving it in a default file).
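The "default file" option can be sketched as follows. This is one possible way to honour the contract, not the implementation used in the system: DefaultingFileSink is a hypothetical local class, and its record method takes the audio bytes directly, which differs from the parameterless record() of the Sink interface.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical file sink that never fails just because no file was chosen.
class DefaultingFileSink {
    private final File defaultFile;
    private File selected; // null until setFile() succeeds

    DefaultingFileSink(File defaultFile) {
        this.defaultFile = defaultFile;
    }

    // Accepts the file only if its parent directory exists; reports the outcome.
    boolean setFile(File sinkFile) {
        File parent = sinkFile.getAbsoluteFile().getParentFile();
        if (parent == null || !parent.isDirectory()) {
            return false;
        }
        selected = sinkFile;
        return true;
    }

    // The file that record() will write to: the chosen one, else the default.
    File target() {
        return selected != null ? selected : defaultFile;
    }

    // Writes the audio data, overwriting any previous content of the target file.
    void record(byte[] audio) throws IOException {
        try (OutputStream out = new FileOutputStream(target())) {
            out.write(audio);
        }
    }
}
```

A client that knows only the Sink contract can therefore always record, while a client that knows about file selection can direct the output elsewhere.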

A file sink will expose an interface


import java.io.File;
import java.rmi.RemoteException;

public interface FileSink extends common.Sink {

    // Select the file to record into
    public boolean setFile(File sinkFile) throws RemoteException;

    /**
     * Methods to browse the file system.
     * Based on FileSystemView from JFileChooser.
     */

    public File[] getFiles(File dir, boolean useFileHiding) throws RemoteException;
    public File getHomeDirectory() throws RemoteException;
    public File getDefaultDirectory() throws RemoteException;
    public File createNewFolder(File dir) throws RemoteException, java.io.IOException;

}// FileSink

which will allow any third party to browse and choose a sink file.

A GUI client will not be expected to know this interface, though (or any interface apart from Source and Sink). So it will not be able to choose a file unless the sink itself can provide a UI.

The Jini community has standardised a UI mechanism. This allows a service to specify one or more user interface objects, for example based on an AWT Frame or Swing JDialog. A client may choose to use such a UI based on its own preferences. However, the standard Jini UI will not quite handle the "file sink" situation. The Jini UI assumes that a client knows all the interfaces of a service, and is just replacing its own UI with that supplied by the service. Roles such as "main UI" allow the service to specify non-modal UI objects such as Frame or non-modal JDialog.

The requirement to choose a file before recording means that the standard Jini UI roles are not adequate. We have therefore added "Setup" and "Supplementary" roles to cover the cases where a service has extra interfaces that the client does not know about, but which may be needed in a modal or non-modal manner (a non-modal additional interface may be a volume control, for example).

Conclusion

We have presented an architecture for A/V systems that will scale to large numbers of services. The system is targeted towards simplicity while still retaining the ability for detailed service negotiation using multiple transport and middleware systems.

There is much work to be done in exploiting this architecture by filling in the details of various content types. More important is to determine the limits of service architecture scalability and how to deal with highly dynamic situations.

Jan Newmarch (http://jan.newmarch.name)