HTTP and REST

Resources

thttpd - tiny/turbo/throttling HTTP server 50k, supports CGI

Overview

HTTP is the application protocol that carries the World Wide Web. It is not the protocol of the internet, although the two are often confused. HTTP sits above TCP and is logically a connectionless protocol: an HTTP request is made, a response is returned and the connection is dropped.

In early versions of HTTP, and in early clients and servers, this is indeed what happened. HTTP 1.1 made a large number of optimisations to this model, such as persistent (Keep-Alive) connections, which allow a single TCP connection to stay open and carry multiple HTTP requests.
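Connection reuse can be observed with a small experiment. The following is a minimal sketch, assuming Python's standard http.server and http.client modules; the handler name EchoPortHandler and the helper client_ports_seen_by_server are made up for this illustration. The server replies with the client's TCP port, so two requests that report the same port travelled over the same TCP connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoPortHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: replies with the TCP port the client used."""

    # Speaking HTTP/1.1 lets this server keep the TCP connection open.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = str(self.client_address[1]).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # Content-Length tells the client where the body ends, so the
        # server does not have to close the connection to mark the end.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demonstration quiet


def client_ports_seen_by_server(paths):
    """Send one GET per path over a single HTTPConnection and return the
    client port the server reported for each request."""
    server = HTTPServer(("127.0.0.1", 0), EchoPortHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        try:
            ports = []
            for path in paths:
                conn.request("GET", path)
                response = conn.getresponse()
                # Read the whole body before reusing the connection.
                ports.append(int(response.read()))
            return ports
        finally:
            conn.close()
    finally:
        server.shutdown()
        server.server_close()


ports = client_ports_seen_by_server(["/books", "/books/IoT"])
print("same TCP connection:", ports[0] == ports[1])
```

Setting protocol_version back to "HTTP/1.0" in the handler should make the server close the connection after each response, in which case the client has to reconnect for the second request.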

HTTP allows a client such as a browser to request something from an HTTP server. The servers are labelled by their IP address or DNS name while the "objects" on the server are identified by the path part of the URL.

There is considerable flexibility in the HTTP model. REST applies constraints to this flexibility in order to conform to its distributed programming model.

URLs

URLs are the Web form of URIs, identifying resources by scheme (HTTP or HTTPS), by host and by path on that host. Additional query parameters may be added after a '?' to refine the URL.
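As an illustration, the parts of a URL can be pulled apart with Python's standard urllib.parse module. The URL below is hypothetical and chosen only to show each component.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical URL, used only to show the parts named in the text.
url = "https://example.org/books/IoT?format=json&lang=en"
parts = urlsplit(url)

print(parts.scheme)           # https
print(parts.hostname)         # example.org
print(parts.path)             # /books/IoT
print(parse_qs(parts.query))  # {'format': ['json'], 'lang': ['en']}
```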

All REST URLs are HTTP URLs, but not vice-versa. 2PartsMagic in REST-ful URI design offers good advice on designing URIs which qualify as RESTful URIs.

HTTP verbs

The REST verbs GET, PUT, POST and DELETE are also HTTP verbs. These can be used directly in HTTP requests as in

      
GET /IoT/index HTTP/1.1
Host: jan.newmarch.name
...
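From a program, the verb is usually just a parameter of the request. The sketch below assumes Python's standard urllib.request module and a hypothetical server at http://localhost/books; it only builds the requests and sends nothing over the network.

```python
from urllib.request import Request

BASE = "http://localhost/books"  # hypothetical server, nothing is contacted

# One request object per verb; the JSON bodies are made-up examples.
requests = [
    Request(BASE, method="GET"),
    Request(BASE, data=b'{"title": "Internet of Things"}', method="POST"),
    Request(BASE + "/IoT", data=b'{"title": "Internet of Things"}', method="PUT"),
    Request(BASE + "/IoT", method="DELETE"),
]

for request in requests:
    # get_method() is the verb, selector is the path sent in the request line.
    print(request.get_method(), request.selector)
```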

HTTP message headers

REST systems are able to take full advantage of most of the HTTP message headers, and many of them were designed with REST in mind. These include content negotiation headers such as Accept, optimisation headers such as If-Modified-Since and authentication/authorisation headers such as Authorization.
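A client might set these headers as in the following sketch. It assumes Python's standard urllib.request and email.utils modules; the URL, the timestamp and the credentials are hypothetical, and no request is actually sent.

```python
import base64
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# Hypothetical values for illustration only.
last_seen = datetime(2024, 1, 1, tzinfo=timezone.utc)
credentials = base64.b64encode(b"user:password").decode("ascii")

request = Request(
    "http://localhost/books/IoT",
    headers={
        # Content negotiation: ask for a JSON representation.
        "Accept": "application/json",
        # Optimisation: only send the body if it changed since this date.
        "If-Modified-Since": format_datetime(last_seen, usegmt=True),
        # Authentication: HTTP Basic credentials.
        "Authorization": "Basic " + credentials,
    },
)

# urllib normalises the capitalisation of header names; HTTP itself
# treats header names as case-insensitive.
print(request.get_header("Accept"))
print(request.get_header("If-modified-since"))
```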

Status codes

Programming systems often have problems dealing with and signalling errors. The C language is a prime example: functions may return NULL, return a negative integer, signal an error through an output parameter and so on. Nothing is consistent. Java can raise exceptions, and exception-handling code really messes up the readability of a program. The more recent Go language allows functions to return multiple values, where the last value is conventionally used to report an error.

HTTP responses include a status code. These are good for almost every REST call. If a GET request works, return 200 OK. If it refers to a moved URL, return 301 Moved Permanently, and so on. What REST does not approve of is a response that claims success while its body reports an error:

      
HTTP/1.1 200 OK
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>Error!!!</title>
...

Wherever possible, use the Status codes in a meaningful way.
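On the server side this amounts to choosing the status from the outcome rather than always answering 200. The following is a minimal sketch using Python's standard http.HTTPStatus; the BOOKS and MOVED tables and the respond function are hypothetical.

```python
from http import HTTPStatus

# Hypothetical data: known books and one old name that has moved.
BOOKS = {"IoT": "Internet of Things"}
MOVED = {"iot": "/books/IoT"}


def respond(book_id):
    """Return (status, headers, body) for a GET of /books/<book_id>."""
    if book_id in BOOKS:
        return HTTPStatus.OK, {}, BOOKS[book_id]
    if book_id in MOVED:
        # The Location header tells the client where the resource now lives.
        return HTTPStatus.MOVED_PERMANENTLY, {"Location": MOVED[book_id]}, ""
    # The failure is reported in the status line, not hidden in a 200 body.
    return HTTPStatus.NOT_FOUND, {}, "No such book"


for book_id in ("IoT", "iot", "missing"):
    status, headers, body = respond(book_id)
    print(int(status), status.phrase, headers)
```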

Link relations

Links are not part of the HTTP model but are a critical part of REST. The representation of a resource should contain links to related resources. This will allow a client to navigate from one resource to related resources. For example, a request GET /books HTTP/1.1 should return a list of URLs for a set of books.

The syntax of links is highly dependent on the representation language used: links expressed in JSON will have a different format to links expressed in HTML. For each format, there would hopefully be some standard or at least well-known form for the syntax of links.
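JSON itself defines no link syntax, so any layout is a convention. The sketch below assumes one hypothetical convention, a "links" array of objects with "rel" and "href" members, simply to show a client selecting links by relation.

```python
import json

# Hypothetical representation of /books; the "links" layout is an
# assumption made for this example, not a standard.
representation = json.loads("""
{
  "title": "Books",
  "links": [
    {"rel": "self", "href": "/books"},
    {"rel": "item", "href": "/books/IoT"},
    {"rel": "item", "href": "/books/RPi"}
  ]
}
""")


def links_with_rel(document, rel):
    """Return the href of every link in the document with the given relation."""
    return [link["href"] for link in document.get("links", []) if link["rel"] == rel]


print(links_with_rel(representation, "item"))
```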

The semantics of links may be even more indeterminate. The syntax may be described by e.g. an XML schema, but eventually one comes down to the meaning of phrases such as "is sub-part of". These can be given a formal meaning using a logic framework such as OWL but generally the English meaning should be clear.

IANA maintains a registry of Link Relations. XML namespaces would be another solution.

HTML links

HTML as a representation language for URLs has a defined syntax for links e.g.:

      
<link rel="first" href="...">

The set of possible values is defined by HTML5 in section 6.12.3. IANA has a wider set of link relation values.

REST HTTP client-side programming APIs

REST doesn't add anything to HTTP programming. It just restricts the form that URLs should take and the intended semantics of the REST model. Any API that can make HTTP calls and handle responses is fine for REST. While there are some APIs that state they are intended for REST, they really just seem to be HTTP APIs anyway.
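To illustrate, a plain HTTP library is enough to make a REST-style call and act on the status code. The sketch below assumes Python's standard urllib.request; the BooksHandler server, the BOOKS list and the get_json helper are hypothetical, and the server exists only so the example can run on its own.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BOOKS = ["/books/IoT", "/books/RPi"]  # hypothetical resource list

# Ignore any proxy settings: this demonstration only talks to 127.0.0.1.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class BooksHandler(BaseHTTPRequestHandler):
    """Hypothetical server: /books returns the list, anything else is 404."""

    def do_GET(self):
        if self.path == "/books":
            status, payload = 200, BOOKS
        else:
            status, payload = 404, {"error": "not found"}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demonstration quiet


def get_json(url):
    """GET a URL and return (status code, decoded JSON body)."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    try:
        with opener.open(request) as response:
            return response.status, json.load(response)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx and 5xx statuses; the body is still readable.
        return err.code, json.load(err)


def demo():
    server = HTTPServer(("127.0.0.1", 0), BooksHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_address[1]
    try:
        return get_json(base + "/books"), get_json(base + "/missing")
    finally:
        server.shutdown()
        server.server_close()


found, missing = demo()
print(found)
print(missing)
```

Nothing in get_json is specific to REST: it is an ordinary HTTP call, and the REST conventions live in the URL and in how the status code is interpreted.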

These include

Java: Chapter 20 Building RESTful Web Services with JAX-RS

REST HTTP servers: Apache

Apache has a large number of configuration options. If you want Apache to do something apart from just deliver files, the biggest problem is finding out which options to use!

Suppose I want my web site to point to the books I have written. My current website does that the "old" way, with lots of HTML and PNG files, all pointing explicitly to each other.

From the REST viewpoint, that is bad. I should not be labelling files "IoT/index.html" for example. I should have a structure that reflects the collection, the nouns in those collections and so on. Something like

      
/books
/books/IoT
/books/IoT/parts/
/books/IoT/parts/part1
/books/IoT/parts/part1/chapter1

Apache makes directories really easy: given a directory, it can search for files such as index.html, index.cgi, index.pl, index.php, index.xhtml and index.htm, and return (or call) the first one it finds. The list and its order are controlled by the DirectoryIndex directive in the Apache configuration files.
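The lookup rule itself is simple. The following is a sketch of that rule in Python, not Apache's own code; the find_index function is hypothetical, and the candidate names are the ones listed above.

```python
import tempfile
from pathlib import Path

# Candidate names in priority order, as listed in the text. In Apache
# this list would come from the DirectoryIndex directive.
INDEX_NAMES = ["index.html", "index.cgi", "index.pl",
               "index.php", "index.xhtml", "index.htm"]


def find_index(directory, names=INDEX_NAMES):
    """Return the first index file that exists in directory, or None."""
    for name in names:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


# Build a throwaway directory holding two candidate files.
with tempfile.TemporaryDirectory() as root:
    books = Path(root) / "books"
    books.mkdir()
    (books / "index.htm").write_text("<p>older page</p>")
    (books / "index.php").write_text("<?php echo 'books'; ?>")
    chosen = find_index(books).name
    empty = find_index(root)

print(chosen)  # index.php, because it comes before index.htm in the list
```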

The /books URL should provide a list of books (it is plural). The file /books/index.html can do this by

      
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8" />
    <link rel="collection" href="http://localhost/books/" />
    <title> Books </title>

  </head>

  <body>
    <h1> List of books </h1>
    <ul>
      <li>
        <a href="http://localhost/books/IoT" rel="item">
          Internet of Things
        </a>
      </li>
      <li>
        <a href="http://localhost/books/RPi" rel="item">
          Programming the Raspberry Pi's GPU
        </a>
      </li>
    </ul>
  </body>
</html>

The document itself is given an IANA link relation of collection. The individual books are given the IANA relation of item within the collection.
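A client can use these relations to find the books without knowing anything else about the page layout. The sketch below assumes Python's standard html.parser module; the LinkCollector class is hypothetical and the page is an abbreviated copy of the one above.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (rel, href) pairs from <a> and <link> elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "link"):
            attributes = dict(attrs)
            if attributes.get("rel") and attributes.get("href"):
                self.links.append((attributes["rel"], attributes["href"]))


# Abbreviated copy of the /books page.
page = """
<html>
  <head>
    <link rel="collection" href="http://localhost/books/" />
  </head>
  <body>
    <ul>
      <li><a href="http://localhost/books/IoT" rel="item">Internet of Things</a></li>
      <li><a href="http://localhost/books/RPi" rel="item">Programming the Raspberry Pi's GPU</a></li>
    </ul>
  </body>
</html>
"""

collector = LinkCollector()
collector.feed(page)

# A rel attribute may hold several space-separated relations.
items = [href for rel, href in collector.links if "item" in rel.split()]
print(items)
```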

To follow the link /books/IoT we could again treat it as a directory, with included files such as index.html. Or it could be a standalone resource of its own, in which case we need to return a suitable representation.

If we are just building a static website, the representation might be stored in a file /books/IoT.html or /books/IoT.json, say. The Apache option MultiViews is a simple way to get Apache to look for files with the same basename but different extensions and use server-side negotiation to choose the most appropriate representation to return.
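The idea behind this negotiation can be sketched in a few lines. This is a deliberately simplified illustration in Python, not Apache's actual algorithm, which weighs more factors. The VARIANTS table and all function names are hypothetical. Each variant is scored with the q value of the most specific Accept pattern that matches its media type.

```python
# Hypothetical variants sharing the basename "IoT".
VARIANTS = {"IoT.html": "text/html", "IoT.json": "application/json"}


def parse_accept(header):
    """Return [(media_type, q)] from an Accept header; other parameters are ignored."""
    preferences = []
    for item in header.split(","):
        fields = [field.strip() for field in item.split(";")]
        if not fields[0]:
            continue
        q = 1.0  # default quality when no q parameter is given
        for parameter in fields[1:]:
            name, _, value = parameter.partition("=")
            if name.strip() == "q":
                q = float(value)
        preferences.append((fields[0].lower(), q))
    return preferences


def specificity(pattern, media_type):
    """2 for an exact match, 1 for type/*, 0 for */*, -1 for no match."""
    if pattern == media_type:
        return 2
    if pattern == "*/*":
        return 0
    if pattern.endswith("/*") and media_type.startswith(pattern[:-1]):
        return 1
    return -1


def quality(media_type, preferences):
    """q value of the most specific matching pattern, or 0.0 if none matches."""
    best_specificity, best_q = -1, 0.0
    for pattern, q in preferences:
        level = specificity(pattern, media_type)
        if level > best_specificity:
            best_specificity, best_q = level, q
    return best_q


def choose_variant(accept, variants=VARIANTS):
    """Return the variant name with the highest quality, or None if none is acceptable."""
    preferences = parse_accept(accept)
    best_name, best_q = None, 0.0
    for name, media_type in variants.items():
        q = quality(media_type, preferences)
        if q > best_q:
            best_name, best_q = name, q
    return best_name


print(choose_variant("application/json, text/html;q=0.8"))
print(choose_variant("text/html"))
print(choose_variant("image/png"))
```

A client that prefers JSON gets IoT.json, a browser asking only for text/html gets IoT.html, and a client that accepts neither type gets no variant at all.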