REST

Resources

DrupalCon Munich 2012: Designing HTTP Interfaces and RESTful Web Services by David Zuelke
IANA: Link Relations
Richardson Maturity Model by Martin Fowler
White House Web API Standards
RESTful API Modeling Language (RAML)

Introduction

Distributed programming refers to computing systems that run across two or more computers. There are many high-level models such as peer-to-peer and client-server. The majority of distributed systems are client-server and include the Web, email, remote file systems and many others.

Within each of these styles is considerable variation. In client-server systems a server will wait for messages to come from clients and will typically reply to these messages. However, the messages can vary considerably in structure, and can emulate systems such as procedure calls or request-response systems.

One of the most popular styles for many years was the remote procedure call, in which the client would issue a message that looked like a procedure call, and the server would respond with a message that looked like the return value from a procedure call. Such systems included Sun's RPC, CORBA and SOAP. These have fallen out of favour more recently with the realisation that distributed programming cannot be made to look like local procedure calls.

The current flavour of the month is REST, standing for Representational State Transfer. This was devised by Roy Fielding in his PhD thesis as an abstraction from HTTP, of which he was a principal architect.

Eight fallacies of distributed computing

Sun Microsystems was a company that performed much of the early work in distributed systems, and even had a mantra "The network is the computer." Based on their experience over many years a number of the scientists at Sun came up with the following list of fallacies commonly assumed:

The network is reliable.
Latency is zero.
Bandwidth is infinite.
The network is secure.
Topology doesn't change.
There is one administrator.
Transport cost is zero.
The network is homogeneous.

Many of these impact directly on network programming. For example, the design of most remote procedure call systems is based on the premise that the network is reliable, so that a remote procedure call will behave in the same way as a local call. The fallacies of zero latency and infinite bandwidth also lead to the assumption that an RPC call takes about as long as a local call, whereas it is orders of magnitude slower.
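
As a minimal sketch of what taking the first fallacy seriously looks like in code, the Python below wraps a remote call in a bounded number of retries with a growing delay. The names call_with_retry, NetworkError and make_flaky_sensor_read are hypothetical, and the "network" is simulated by a function that fails a set number of times before succeeding.

```python
import time


class NetworkError(Exception):
    """Stands in for a timeout or dropped connection (hypothetical)."""


def call_with_retry(remote_call, attempts=3, delay=0.01):
    """Call remote_call(), retrying on NetworkError up to `attempts` times."""
    last_error = None
    for attempt in range(attempts):
        try:
            return remote_call()
        except NetworkError as err:
            last_error = err
            if attempt < attempts - 1:
                # Wait a little longer after each failure before retrying.
                time.sleep(delay * (2 ** attempt))
    raise last_error


def make_flaky_sensor_read(failures):
    """Build a simulated remote read that fails `failures` times, then works."""
    state = {"remaining": failures}

    def read():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise NetworkError("simulated packet loss")
        return 21.5  # simulated temperature reading

    return read


if __name__ == "__main__":
    print(call_with_retry(make_flaky_sensor_read(2)))
```

A local procedure call needs none of this. Note also that blindly retrying is only reasonable when repeating the call does no harm, which is one reason the idempotency of requests matters later in this chapter.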

The internet is an unstable structure: nodes are always being added and sometimes removed; nodes in the network may disappear permanently or for short periods; new routes are constantly added and existing routes may be upgraded to carry more traffic or become overloaded and fail to carry any particular messages.

The internet is a mixture of different physical devices and connections: from T1 lines to low bandwidth IEEE 802.15 wireless, from mainframes to microcontrollers.

The problem is to build systems that can cope with such variety, unreliability and change. A system designed for such conditions was Jini from Sun Microsystems, but it suffered in the Enterprise Programming wars between Sun and Microsoft and was never properly supported by Sun.

URIs and resources

Resources are the "things" that we wish to interact with on a network or the internet. I like to think of them as objects, but there is no requirement that their implementation should be object-based - they should just "look like" a thing, possibly with components.

Each resource has one or more addresses known as URIs (uniform resource identifiers). These have the generic form

      
	scheme:[//[user:password@]host[:port]][/]path[?query][#fragment]

Typical examples are URLs (uniform resource locators), where the scheme is 'http' or 'https' and the host refers to a computer by its IP address or DNS name as in

      
	https://jan.newmarch.name/IoT/index.html

But there may be others, such as URNs (uniform resource names) for serial publications identified by their ISSN, such as

      
	urn:ISSN:1535-3613
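
To see how the generic form breaks down in practice, here is a small sketch using urlsplit from Python's standard urllib.parse module. The helper name describe is my own, and the third URI in the usage loop (with a user, password and port on example.org) is made up purely to exercise the optional parts.

```python
from urllib.parse import urlsplit


def describe(uri):
    """Split a URI into the components of the generic form."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


if __name__ == "__main__":
    for uri in (
        "https://jan.newmarch.name/IoT/index.html",
        "urn:ISSN:1535-3613",
        "http://user:secret@example.org:8080/a/b?x=1#top",  # made-up example
    ):
        print(describe(uri))
```

The URN has a scheme and a path but no host at all, which is a reminder that not every URI says where a resource lives.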

URIs and REST

In the early days of HTTP and HTML, a URL did not only denote a location, but also the type of a resource. For example, a resource ending in ".html" is obviously an HTML document, while one ending in ".png" is a graphics file in PNG format. REST would encourage you to think at a level above this: an HTML URL is really just a particular document, with particular content, that happens to be rendered in HTML. Similarly, a PNG URL is really just an image which happens to be in PNG format.

By abstracting like this, a format-specific URI such as https://jan.newmarch.name/IoT/index.html should instead be given by a URI which stands for just the document:

      
	https://jan.newmarch.name/IoT/index

This URI is the index of this book, without any regard to the form in which it is kept by the server or sent to a client.

How do we get an HTML document from this URI? An HTML document is just one possible representation of the index - it could be given as a Word document, a PDF file or even a JSON string, and still represent the index. The server will determine on some grounds (such as the Accept header in an HTTP request) which possible representation to send to the client.

This is one of the keys to REST: URIs identify resources, and requests for that resource return a representation of that resource - the resource itself remains on the server and is not sent to the client at all. In fact, the resource might not even exist at all in any concrete form: for example, a representation might be generated from the results of an SQL query which is triggered by making a request to that URI.
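
The idea of one resource with several representations can be sketched as follows. This is only an illustration under stated assumptions: the INDEX data, the renderer functions and the choice of HTML as the default are all made up, and the Accept parsing handles q-values and */* only, not partial wildcards such as text/*.

```python
import json

# One resource (a book index) held in some internal form; contents are made up.
INDEX = {"title": "Internet of Things", "chapters": ["Introduction", "REST"]}


def as_json(resource):
    return json.dumps(resource)


def as_html(resource):
    items = "".join(f"<li>{c}</li>" for c in resource["chapters"])
    return f"<h1>{resource['title']}</h1><ul>{items}</ul>"


RENDERERS = {"application/json": as_json, "text/html": as_html}


def parse_accept(header):
    """Return media types from an Accept header, highest q-value first."""
    prefs = []
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        media_type, q = fields[0], 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        prefs.append((q, media_type))
    # sorted() is stable, so ties keep the order the client listed them in.
    return [m for q, m in sorted(prefs, key=lambda p: -p[0]) if q > 0]


def represent(resource, accept="*/*"):
    """Pick a representation; None would correspond to 406 Not Acceptable."""
    for media_type in parse_accept(accept):
        if media_type == "*/*":
            media_type = "text/html"  # server default, an arbitrary choice
        if media_type in RENDERERS:
            return media_type, RENDERERS[media_type](resource)
    return None


if __name__ == "__main__":
    print(represent(INDEX, "text/html;q=0.5, application/json"))
```

The resource itself (the INDEX dictionary here) never leaves the server; only the string produced by one of the renderers does.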

The REST approach to designing URIs is still a bit of an art form. Legal (and perfectly legitimate) URIs are not necessarily "good" REST URIs, and many examples of so-called RESTful APIs have URIs that are not very RESTful at all. 2PartsMagic in REST-ful URI design offers good advice on designing appropriate URIs.

REST verbs

You can make certain requests to a URI. If you are making an HTTP request to a URL, HTTP defines the requests that can be made: GET, PUT, POST, DELETE, HEAD, OPTIONS, TRACE, CONNECT and possible extensions. There is only a limited number of these! This is very different to what we have come to expect from O/O programming. For example, the Java JLabel has about 250 methods, such as getText, setHorizontalAlignment, etc.

REST is now commonly interpreted as taking just four verbs from HTTP: GET, PUT, POST, DELETE. GET roughly corresponds to the getter-methods of O/O languages while PUT roughly corresponds to the setter-methods of O/O languages. If a JLabel were a REST resource (which it isn't), how would one single GET verb make up for the hundred or so getter-methods of JLabel?

The answer lies in the URIs. A label has the properties of text, alignment and so on. These are really sub-resources of the label and should be written as sub-URIs of the label. So if the label had a URI of http://jan.newmarch.name/my_label, then the subresources would have URIs

      
	http://jan.newmarch.name/my_label/text
	http://jan.newmarch.name/my_label/horizontalAlignment

and so on. If you want to manipulate the text of the label, then you use the URI of the text resource, not a getter-method on the label itself.
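
A minimal sketch of this idea, using an in-memory dictionary in place of a real server: one generic get function serves the label and each of its sub-resources, selected purely by path. The RESOURCES table and its values are made up for illustration.

```python
# A label modelled as a resource with sub-resources (values are made up).
RESOURCES = {
    "/my_label": {"text": "Hello", "horizontalAlignment": "LEFT"},
}


def get(path):
    """Return the state of a resource or sub-resource, or None if absent."""
    if path in RESOURCES:
        return dict(RESOURCES[path])  # copy, so callers cannot change state
    parent, _, name = path.rpartition("/")
    return RESOURCES.get(parent, {}).get(name)


if __name__ == "__main__":
    print(get("/my_label/text"))
    print(get("/my_label/horizontalAlignment"))
    print(get("/my_label"))
```

One verb and many URIs replace the many getter-methods and single object of the O/O design.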

The GET verb

To retrieve a representation of a resource, you GET the resource. This will return some representation of the resource. There may be innumerable possibilities to this choice: for example, a request for this book's index might return a representation of the index in French, using the UTF-8 character set, as an XML document, or many other possibilities.

REST does not particularly talk about possibilities for negotiating the representation returned. HTTP 1.1 has an extensive section on content negotiation, considering server-driven, agent-driven and transparent negotiation. The Accept headers can be used by the client to specify, for example

      
	Accept: application/xml
	Accept-Language: fr
	Accept-Charset: utf-8

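The GET verb is required to be safe and idempotent. Safe means that the request does not change the state of the resource; idempotent means that repeating the request has the same effect as making it once. Repeated requests should therefore return the same results (to within representation type). For example, multiple requests for the temperature of a sensor should return the same result (unless of course the temperature has changed).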

Because GET does not change the resource, its results can be cached by default. This is useful for reducing traffic on the web, and may save battery power for sensors. Caching cannot always be guaranteed: a resource that returns the number of times it has been accessed will give a different result each time it is accessed. This is unusual behaviour, and would be signalled by the HTTP Cache-Control response header (for example no-cache or no-store; HTTP 1.0 used Pragma: no-cache for a similar purpose).
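
The effect of caching can be sketched with a small wrapper around a fetch function. Everything here is an assumption made for illustration: fake_server stands in for the network, its two URIs are invented, and the cache never expires entries, which a real one would have to do.

```python
def make_cached_get(fetch):
    """Wrap `fetch(uri) -> (body, cacheable)` with a simple in-memory cache."""
    cache = {}

    def cached_get(uri):
        if uri in cache:
            return cache[uri]  # served locally, no network traffic
        body, cacheable = fetch(uri)
        if cacheable:
            cache[uri] = body
        return body

    return cached_get


calls = []  # records every request that really reaches the "server"
hits = {"count": 0}


def fake_server(uri):
    """Hypothetical server: a temperature sensor and an access counter."""
    calls.append(uri)
    if uri == "/sensor/temperature":
        return "21.5", True
    hits["count"] += 1
    return str(hits["count"]), False  # marked as not cacheable


if __name__ == "__main__":
    get = make_cached_get(fake_server)
    print(get("/sensor/temperature"), get("/sensor/temperature"))
    print(get("/counter"), get("/counter"))
    print(calls)
```

The second request for the temperature never reaches the sensor, while every request for the counter does.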

The PUT verb

If you want to change the state of a resource, you can PUT new values. There are two principal limitations to PUT:

You can only change the state of a resource whose URI you know
The representation you send must cover all components of the resource

For example, if you only want to change the text in a label, you send the PUT message to the URL http://jan.newmarch.name/my_label/text, not to http://jan.newmarch.name/my_label. Sending to the label would require all of the hundred or so fields to be sent.

PUT is idempotent, but is not safe. That is, it changes the state of the resource, but repeated calls change it to the same state.
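
The difference between safe, idempotent and neither can be shown with a toy in-memory store. This is a sketch only: the store, its initial value and the append operation (included as a counter-example) are my own inventions.

```python
store = {"/my_label/text": "Hello"}


def get(uri):
    """Safe: reads the state and never changes it."""
    return store[uri]


def put(uri, representation):
    """Idempotent but not safe: replaces the state with the one supplied."""
    store[uri] = representation


def append(uri, extra):
    """Neither safe nor idempotent: every call changes the state again."""
    store[uri] = store[uri] + extra


if __name__ == "__main__":
    put("/my_label/text", "Goodbye")
    put("/my_label/text", "Goodbye")  # repeating the PUT changes nothing more
    print(get("/my_label/text"))
    append("/my_label/text", "!")
    append("/my_label/text", "!")     # repeating the append does
    print(get("/my_label/text"))
```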

The DELETE verb

This deletes the resource. It is idempotent but not safe.

The POST verb

POST is the do-everything-else verb to deal with situations not covered by the other verbs. There is agreement about two uses of POST:

If you want to create a brand new resource and you don't know its URI, then POST a representation of the resource to a URI that knows how to create the resource. The returned representation should contain the URI of the new resource.
If a resource has many attributes, and you only want to change one or a few of them, then POST a representation with the changed values only
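
Both uses can be sketched against an in-memory store. The collection URI /labels, the numbering scheme for new URIs and the function names are hypothetical choices, not anything REST or HTTP prescribes.

```python
import itertools

store = {}
_ids = itertools.count(1)


def post(collection, representation):
    """Create a new resource under `collection`; the server picks its URI."""
    uri = f"{collection}/{next(_ids)}"
    store[uri] = dict(representation)
    return uri


def post_partial(uri, changes):
    """Apply only the changed fields to an existing resource."""
    store[uri].update(changes)
    return store[uri]


if __name__ == "__main__":
    label = {"text": "Hello", "horizontalAlignment": "LEFT"}
    first = post("/labels", label)
    second = post("/labels", label)  # same request, but a second resource
    print(first, second)
    print(post_partial(first, {"text": "Goodbye"}))
```

Posting the same representation twice creates two resources, which is why this kind of POST is not idempotent.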

There is intense argument about the respective roles of PUT and POST in edge cases. If you want to create a new resource and do know the URI it will have, then you could use either PUT or POST. Which one you choose seems to depend on other factors...

I've seen the mockery made of the HTTP philosophy by SOAP, an abomination that was carried over to UPnP, by using POST for everything. HTML continues to use POST in Forms when it should have the option of using PUT. For these reasons I do not use POST unless I absolutely have to. I suppose others have their own principled reasons for using POST instead of PUT, but I have no idea what they might be :-).

Due to its open-ended scope, POST could be used for almost anything. Many of these uses could be against the REST model, as is amply illustrated by SOAP. But some of these uses could be legitimate. POST is usually non-idempotent and not safe, although particular cases could be either.

HATEOAS

HATEOAS stands for "Hypermedia as the Engine of Application State". It is generally recognised as an awful acronym, but it has stuck. The basic principle is that navigating from one URI to another which is related in some way, should not be done by any out-of-band mechanism but that the new link must be embedded in some way as a hyperlink within the representation of the first URI.

REST does not state the format of the links. They could be given using the HTML link tag, by URLs embedded in a JSON document or by links given in an XML document.

Nor does REST explicitly state the meanings of the links or how to extract the appropriate links. Fielding states in his blog post REST APIs must be hypertext-driven:

A REST API should be entered with no prior knowledge beyond the initial URI (bookmark) and set of standardized media types that are appropriate for the intended audience (i.e., expected to be understood by any client that might use the API). From that point on, all application state transitions must be driven by client selection of server-provided choices that are present in the received representations or implied by the user’s manipulation of those representations.

IANA maintains a registry of relation types, IANA: Link Relations, which can be used. The Web Linking specification, RFC 5988, describes the web linking registry. The HTML 5 specification has a small number of defined relations, and points to Microformats rel values for a larger list.

Mechanisms such as cookies, or external API specifications such as WSDL for SOAP are explicitly excluded by REST: they are not hyperlinks contained in the representation of a resource.

Representing links

Links are standardised in HTML documents. The link tag defines an HTML element that can only appear in the head section of an HTML document. For example, a book with chapters, etc. might look like this if the links were given as HTML link elements:

	
<html>
  <head>
    <link rel= "author" title="Jan Newmarch" href="https://jan.newmarch.name">
    <link rel="chapter" title="Introduction" href="Introduction/">
    ...
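
A client can pull such links out of the representation with nothing more than a standard HTML parser. The sketch below uses Python's html.parser; the class and function names are my own, the sample document is a completed version of the fragment above, and the base URI used to resolve the relative link is assumed to be that of this book.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (rel, title, href) from every <link> element."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            a = dict(attrs)
            self.links.append((a.get("rel"), a.get("title"), a.get("href")))


DOC = """<html>
  <head>
    <link rel="author" title="Jan Newmarch" href="https://jan.newmarch.name">
    <link rel="chapter" title="Introduction" href="Introduction/">
  </head>
</html>"""


def links_by_rel(html, rel, base):
    """Return absolute URIs of all links with the given relation type."""
    parser = LinkCollector()
    parser.feed(html)
    return [urljoin(base, href) for r, _title, href in parser.links if r == rel]


if __name__ == "__main__":
    print(links_by_rel(DOC, "chapter", base="https://jan.newmarch.name/IoT/"))
```

The client chooses where to go next by relation type ("chapter", "author"), not by knowing in advance how the server lays out its URIs.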

For JSON, the format is not standardised. The REST cookbook notes the lack of standardisation and points to the W3C specification JSON-LD 1.0 "A JSON-based Serialization for Linked Data" and to HAL - Hypertext Application Language. Bodies such as the Open Connectivity Foundation (see a later chapter) seem to use their own home-grown format.

It is worth noting in this regard that there is also a specification of an HTTP Link header (RFC 5988, from the IETF) which may be returned by a server to a client. This is used by JSON-LD, for example, to point to a JSON-LD context for interpreting the JSON document contained in the body of an HTTP response.

Transactions with REST

How does REST handle transactions? They were not discussed in the original thesis by Fielding.

The Wikipedia entry for HATEOAS gives a poor example of managing transactions. It starts from an HTTP request of

      
GET /account/12345 HTTP/1.1
Host: somebank.org
Accept: application/xml
 ...

which returns an XML document as representation of the account

      
HTTP/1.1 200 OK
Content-Type: application/xml
Content-Length: ...

<?xml version="1.0"?>
<account>
   <account_number>12345</account_number>
   <balance currency="usd">100.00</balance>
   <link rel="deposit" href="http://somebank.org/account/12345/deposit" />
   <link rel="withdraw" href="http://somebank.org/account/12345/withdraw" />
   <link rel="transfer" href="http://somebank.org/account/12345/transfer" />
   <link rel="close" href="http://somebank.org/account/12345/close" />
</account>

This gives the URIs of the related resources deposit, withdraw, transfer and close. However, the resources are verbs not nouns, and that is not good at all. How do they interact with the HTTP verbs? Do you GET a deposit? POST it? PUT it? What happens if you DELETE a deposit - is that supposed to roll back a transaction or what?
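
Whatever one thinks of the link targets, the mechanism is the HATEOAS one: the client finds the next URI by its rel value in the representation rather than constructing it from some out-of-band rule. A minimal sketch with Python's xml.etree.ElementTree, where the function name links is my own and the XML is the response body shown above:

```python
import xml.etree.ElementTree as ET

ACCOUNT_XML = """<?xml version="1.0"?>
<account>
   <account_number>12345</account_number>
   <balance currency="usd">100.00</balance>
   <link rel="deposit" href="http://somebank.org/account/12345/deposit" />
   <link rel="withdraw" href="http://somebank.org/account/12345/withdraw" />
   <link rel="transfer" href="http://somebank.org/account/12345/transfer" />
   <link rel="close" href="http://somebank.org/account/12345/close" />
</account>"""


def links(xml_text):
    """Map each link relation in the representation to its target URI."""
    root = ET.fromstring(xml_text)
    return {link.get("rel"): link.get("href") for link in root.iter("link")}


if __name__ == "__main__":
    available = links(ACCOUNT_XML)
    print(sorted(available))
    print(available["deposit"])
```

If the server stops offering a link (say, "withdraw" on an overdrawn account), a client written this way simply does not find it.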

The better way, as discussed in e.g. the Slashdot posting Transactions in REST? is to POST to the account asking for a new transaction to be created:

      
POST /account/12345/transaction HTTP/1.1

This will return the URL of a new transaction

      
http://somebank.org/account/12345/txn123

Interactions are now carried out with this transaction URL, such as by PUT-ting a new value which performs and commits the transaction.

      
PUT /account/12345/txn123 HTTP/1.1
Host: somebank.org
Content-Type: application/xml

<transaction>
  <from>/account/56789</from>
  <amount>100</amount>
</transaction>
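
The two-step flow can be sketched with an in-memory bank. This is a toy under stated assumptions: the function names, the starting balance of account 56789 and the rule that a repeated PUT is simply ignored once the transaction is committed are all my own choices, and real transaction handling is considerably harder.

```python
import itertools

# Balances are illustrative; only the 100.00 for account 12345 is from the text.
accounts = {"/account/12345": 100.00, "/account/56789": 250.00}
transactions = {}
_txn_ids = itertools.count(123)


def post_transaction(account_uri):
    """POST to the account: create an empty, uncommitted transaction."""
    uri = f"{account_uri}/txn{next(_txn_ids)}"
    transactions[uri] = {"to": account_uri, "committed": False}
    return uri


def put_transaction(txn_uri, source_uri, amount):
    """PUT the details: performs and commits the transfer, at most once."""
    txn = transactions[txn_uri]
    if txn["committed"]:
        return txn  # repeating the PUT leaves all balances unchanged
    if accounts[source_uri] < amount:
        raise ValueError("insufficient funds")
    accounts[source_uri] -= amount
    accounts[txn["to"]] += amount
    txn.update(source=source_uri, amount=amount, committed=True)
    return txn


if __name__ == "__main__":
    txn = post_transaction("/account/12345")
    put_transaction(txn, "/account/56789", 100)
    put_transaction(txn, "/account/56789", 100)  # e.g. a retry after a timeout
    print(txn, accounts)
```

Because the transaction has its own URI, a client that is unsure whether its PUT arrived can safely send it again.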

A more detailed discussion of transactions and REST is given by Mihindukulasooriya et al. in Seven Challenges for RESTful Transaction Models.

Richardson maturity model

Many systems claim to be RESTful. Most are not. I even came across one that claimed that SOAP was RESTful, a clear case of a warped mental state. Martin Fowler discusses the Richardson Maturity Model which classifies systems according to their conformance to REST.

RAML

No content yet.

Conclusion

REST is the architectural model of the Web. It can be applied in many different ways, particularly through HTTP and CoAP, which are discussed in later chapters.

Copyright © Jan Newmarch, jan@newmarch.name

"The Internet of Things - a techie's viewpoint" by Jan Newmarch is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Based on a work at https://jan.newmarch.name/IoT/.