The World Wide Web is pervasive throughout all computer-literate societies nowadays. It began in an era when people were experimenting with ways of sharing and finding information on computer networks. There were Archie, WAIS, Gopher and many other systems. The World Wide Web hit the "sweet spot" in many ways, which meant that it succeeded while the others disappeared.

The reasons why it was so successful can be debated. For me, it was a means back in 1995 whereby I could write my lecture notes in a simple way, using non-proprietary software on a Linux computer and my students could see them using non-proprietary software on an otherwise locked down system, Windows. The ability to include graphics was the turning point for me, something not offered by the other systems at the time. The wake-up call that I was on the right path was when one of my students rang me from work saying that he couldn't make the class but that was okay since he was reading my notes from his office.

Nowadays that is a mundane activity. There are billions of pages online and many people just use the Web as second nature to retrieve knowledge in a way that would have been simply inconceivable even twenty years ago.

But we all know that the Web isn't perfect. Even using a search engine such as Google, Bing or Baidu can be tedious: you read and discard many items before you (hopefully) find the right one. Nowadays there are even training courses in how to use search engines properly.

Librarians for years have been busy cataloguing information to make it available to their customers. Their work has moved from catalogue cards to online systems using records such as MARC and there is continuing interest in how to exploit the new technologies. Is the Web itself that technology, or is there something beyond that which would be better? Well, the protagonists of the Semantic Web claim that indeed, they have the right technology and yes, it does have what it takes to produce the library catalogues for the twenty-first century. So what is it?

The answer lies back in the computer science discipline of Artificial Intelligence. This had its origins early in the life of computers, with programs such as the Logic Theorist showing remarkable results. But AI has been on a boom/bust cycle from the beginning with extravagant claims shown to be vacuous, which were just followed by more extravagant claims. See the Wikipedia: History of artificial intelligence article for more information.

I published a few papers in the AI field back in the '80s. Nothing special and certainly not really contributing much to the relationship between mind and the computer. I attended the first AI conference to be held in Australia in Sydney in 1987 and was disgusted by Marvin Minsky's claim that his outrageous predictions had been let down by the failure to deliver by the 1,200 or so computer scientists in front of him.

I later investigated some problems discarded as trivial by this community, only to discover that they had a staggering number of complexities of their own, and I never came back to the field.

A keystone of much of the work of the AI community was the use of formal theories based on logic. There are serious mathematical and computational issues in this work, and progress has been slow - but steady. However, it has failed to break out of the straitjacket of absolutism, and cannot yet reliably deal with imprecise reasoning (Zadeh's fuzzy logic notwithstanding).

With the success of the Web, the AI community took to the boards again, with the support of the inventor of the World Wide Web, Tim Berners-Lee. The core of the idea now is that since you have a constrained environment, the Web, it should be possible to apply the constrained reasoning techniques from AI to this. However, this cannot be done from within the Web itself - any attempt to tamper with it and force it into the mould of AI reasoning will either destroy it, or more likely, just be ignored.

The Semantic Web, then, has to be a parallel structure to the Web itself. It has to be constructed all by itself - just extracting all the "useful" information out of the Web itself would amount to solving the AI problem, and that is still a long way off! The idea is that it will provide a rigorous framework in which information can be linked and found in complex ways, giving a rich environment without the vagaries of the current Web.

Annotating every Web resource with the information needed to make it a component of the Semantic Web will require an enormous additional effort. The average user who creates a Web page, who uploads a movie to YouTube, who creates a blog on BlogSpot, or who Twitters a message just isn't going to do it.

There is an opposing view to the Semantic Web. That is, the ongoing improvements in search engine technology and the breakthroughs in the "Social Web" that continually occur (Facebook, Twitter, LinkedIn and many others) are developing a "soft" semantic web which is at least increasing in richness - but maybe not reliability!

I confess to supporting the views of what I just called the soft semantic web. The idea that there would be a group of people busy doing formal document markup of their own accord just seemed silly to me. But then I realised that there was a very busy and dedicated group of paid people, called library cataloguers! This book is for you: it shows that there is a way of taking your work beyond the traditional constraints of cataloguing and extending its scope and reach.

This book attempts to explain the technologies behind Linked Data and the Semantic Web. As you hopefully have realised, I'm not going to hype the technology and at times might even appear cynical. But I intend to show you what is going on, and it is up to you to decide on its value for your profession. Just as long as you don't fall into the AI hype and believe that you are solving the problems of the mind!


Copyright © Jan Newmarch, jan@newmarch.name

