HTTP

HTTP is the transfer protocol underlying the Web. It is the means whereby user agents make requests to Web servers and get replies.

HTTP stands for HyperText Transfer Protocol. It was designed alongside HTML and is the means whereby user agents (such as browsers) communicate with Web servers. Most users will be almost totally unaware of HTTP: they just enter URLs into the address bar of a browser or click on a link and something appears. Most of the time it will be what you expected, but sometimes you will get "404 Not Found". This is an HTTP-level message saying that the server could not find what you asked for, and usually you can't do much about it. The other way in which you might interact with HTTP is when you have to set a "proxy": this is usually an HTTP server which acts as a "gateway" between you and the Web servers you want to look at.

Apart from that, you don't really need to know much about HTTP.
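For the curious, the exchange described above can be sketched in a few lines of Python. This is only an illustration: the host www.example.org, the path, and the helper names build_get_request and parse_status_line are assumptions made up for the example, and nothing is actually sent over the network.

```python
def build_get_request(host: str, path: str) -> str:
    """Return the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, protocol version
        f"Host: {host}",         # which Web server the request is meant for
        "Connection: close",
        "",                      # an empty line ends the headers
        "",
    ]
    return "\r\n".join(lines)


def parse_status_line(status_line: str) -> tuple[int, str]:
    """Split a status line such as 'HTTP/1.1 404 Not Found' into code and reason."""
    parts = status_line.strip().split(" ", 2)
    code = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""  # the reason text may be absent
    return code, reason


# Hypothetical host and path, used only to show the shape of the messages.
request = build_get_request("www.example.org", "/no-such-page.html")
code, reason = parse_status_line("HTTP/1.1 404 Not Found")
print(request)
print(code, reason)
```

The number 404 is what the software acts on; the words "Not Found" are there for human readers.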

Resources and Representations

But you do need to know the difference between what a Web server sends you and what is actually stored on the server.

In the earliest days, Web servers just stored HTML documents as files. When you requested the document by its URI (or URL in this case) you got a copy of the document. You didn't get the document itself: that stayed on the Web server, ready for the next request which would return another copy. It would be like a library where you don't borrow books, but just borrow photocopies of the book made on demand: the original stays on the shelf.

Since then things have become more complex: there may not even be an original document stored, just one which is generated on the fly out of a database. As a user you don't care, and probably don't want to know anyway.

But the computer scientists behind this do want to ensure that you don't need to know how or in what form the 'thing' that you get is stored. To that end, it is not referred to as a document or a database but as a resource. That is, a URI (or URL here) points to a resource and how the resource is stored isn't relevant.

Then, when you ask for a resource, what you get is a representation of that resource. This is often an HTML document, but may be an image, a PDF document, a Word document or any other digital object.
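The distinction can be made concrete with a small sketch. Here the resource is held as a record rather than as a document, and each request produces a fresh representation of it. The record contents and the names BOOK_RECORD, as_html, as_text and get_representation are all hypothetical, chosen only for illustration.

```python
from html import escape

# Hypothetical stored form of one resource: a record, not a document.
BOOK_RECORD = {"title": "An Example Book", "shelf": "A3"}


def as_html(record: dict) -> str:
    """Render the record as an HTML representation."""
    return (
        "<html><body>"
        f"<h1>{escape(record['title'])}</h1>"
        f"<p>Shelf: {escape(record['shelf'])}</p>"
        "</body></html>"
    )


def as_text(record: dict) -> str:
    """Render the same record as a plain-text representation."""
    return f"{record['title']} (shelf {record['shelf']})"


# One resource, two possible representations, keyed by media type.
REPRESENTATIONS = {"text/html": as_html, "text/plain": as_text}


def get_representation(media_type: str) -> str:
    """Generate a representation on demand; the stored record is never handed out."""
    return REPRESENTATIONS[media_type](BOOK_RECORD)


print(get_representation("text/html"))
print(get_representation("text/plain"))
```

Both outputs describe the same resource, and neither of them is the stored record itself.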

Content Negotiation

Why are computer scientists increasingly pedantic about saying: when you use a URI to access a resource, you get a representation of that resource? Well, a lot of that has to do with what the browser can (or wants to) handle and what the server is prepared to give it.

You may know that digital images come in many formats: JPEG files, PNG files, GIF files and so on. When a browser requests an image, it might say "I like PNG best, but will cope with JPEG but nothing else." The Web server will have the image stored in some format, maybe even as a raw image. On receiving the request, it will copy this image and transform it to PNG format before sending it, or maybe convert and send it as JPEG.
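In HTTP the browser states such preferences in an Accept header, where each format may carry a quality value q between 0 and 1. A rough sketch of how a server might choose between formats follows. The function names parse_accept and choose_format are hypothetical, the header value is just one plausible way to say "PNG best, JPEG if necessary", and wildcards such as image/* are deliberately ignored to keep the example short.

```python
from typing import Optional


def parse_accept(header: str) -> dict[str, float]:
    """Map each media type in an Accept header to its quality value (default 1.0)."""
    preferences = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        quality = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat an unreadable q value as "not wanted"
        preferences[media_type] = quality
    return preferences


def choose_format(accept_header: str, available: list[str]) -> Optional[str]:
    """Return the available media type the client rates highest, or None."""
    preferences = parse_accept(accept_header)
    acceptable = [t for t in available if preferences.get(t, 0.0) > 0.0]
    if not acceptable:
        return None  # a real server might answer "406 Not Acceptable" here
    return max(acceptable, key=lambda t: preferences[t])


# "I like PNG best, but will cope with JPEG but nothing else."
header = "image/png, image/jpeg;q=0.5"
print(choose_format(header, ["image/jpeg", "image/png"]))
print(choose_format(header, ["image/jpeg", "image/gif"]))
print(choose_format(header, ["image/gif"]))
```

The list passed as available stands for the formats the server can supply, whether stored or converted on request.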

Well, you may not notice this: PNG and JPEG representations of an image look much the same. But your browser could also say about a text resource: "I like German best, but can handle English and even French if I have to." The Web server (if it is nice) will give you the resource in German if it has it, or translate it if possible, or fall back to one of the other versions.

This is really good - most of the time. Sometimes things can go wrong, such as when I am travelling in China and the server delivers up the Chinese version of a page when I really want the English version. Then the Content Negotiation has gone wrong, as the server has treated my location as more important than my language preference.

This mainly happens under the hood, where you don't see it. You can influence some settings such as language by (in Firefox) going to Edit -> Preferences -> Content and then setting the Language choices.

However, Content Negotiation ultimately depends on messages sent over HTTP by the user agent, and on co-operation from whoever configured the Web server to deal with these messages appropriately.
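For language, the message in question is the Accept-Language request header. The sketch below shows one way a user agent might build that header from an ordered list of languages, and one way a server might act on it. The names build_accept_language and pick_language are hypothetical, the step of 0.2 between quality values is an arbitrary choice, and region subtags (such as de-AT) and the "*" wildcard are ignored for brevity.

```python
def build_accept_language(ordered_languages: list[str]) -> str:
    """Turn an ordered preference list into an Accept-Language header value."""
    parts = []
    for position, language in enumerate(ordered_languages):
        if position == 0:
            parts.append(language)  # first choice gets the default quality of 1.0
        else:
            quality = max(1.0 - 0.2 * position, 0.1)  # arbitrary step size
            parts.append(f"{language};q={quality:.1f}")
    return ", ".join(parts)


def pick_language(header: str, available: list[str], default: str) -> str:
    """Return the available language the client rates highest, else the default."""
    best, best_quality = default, 0.0
    for item in header.split(","):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip().lower()
        quality = 1.0
        name, _, value = params.partition("=")
        if name.strip().lower() == "q":
            try:
                quality = float(value)
            except ValueError:
                quality = 0.0  # treat an unreadable q value as "not wanted"
        if tag in available and quality > best_quality:
            best, best_quality = tag, quality
    return best


# "I like German best, but can handle English and even French if I have to."
header = build_accept_language(["de", "en", "fr"])
print("Accept-Language:", header)
print(pick_language(header, ["en", "fr"], default="en"))
print(pick_language(header, ["zh", "fr"], default="zh"))
```

A server configured along these lines honours the stated preference whenever it has a matching version, and only falls back to its own default when nothing matches.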