The World Wide Web

During 1989 and 1990, Tim Berners-Lee, at CERN (the European centre for particle physics) in Geneva, developed a proposal for the internal development of a general mechanism for accessing all kinds of computer-based information. That proposal was accepted and work commenced in October 1990 with a commitment of $50,000 worth of computing equipment and two man-years of effort. The internal justification for such an investment was summarised in the brief “introduction” to the proposal:

The current incompatibilities of the platforms and tools make it impossible to access existing information through a common interface, leading to waste of time, frustration and obsolete answers to simple data lookup. There is a potential large benefit from the integration of a variety of systems in a way which allows a user to follow links pointing from one piece of information to another one. This forming of a web of information nodes rather than a hierarchical tree or an ordered list is the basic concept behind HyperText.

At CERN, a variety of data is already available: reports, experiment data, personnel data, electronic mail address lists, computer documentation, experiment documentation, and many other sets of data are spinning around on computer discs continuously. It is however impossible to “jump” from one set to another in an automatic way: once you found out that the name of Joe Bloggs is listed in an incomplete description of some on-line software, it is not straightforward to find his current electronic mail address. Usually, you will have to use a different lookup-method on a different computer with a different user interface. Once you have located information, it is hard to keep a link to it or to make a private note about it that you will later be able to find quickly.

During 1991, software produced by the development team was progressively released within CERN, and that August a version was made available on the Internet. The fledgling World Wide Web was the subject of a poster presentation at Hypertext ’91, and the contacts made, coupled with CERN’s familiarity with large-scale collaboration, quickly gave the project a much more ambitious flavour. As one of the biggest of the world’s ‘big science’ projects, and as a joint venture of its European member states, CERN’s everyday business is dominated by collaborative activities.

By the time of the February 1993 ‘alpha’ release of the first graphical browser for the Web (see following section) Berners-Lee was happy to promulgate a much more ambitious view of the Web’s future:

The link between the internal justification for the development and its much wider potential deployment is recorded clearly in CERN’s policy statement which is, like other documents quoted in this section, readily available on the Web itself:

The basic aim of the project is to promote communication and information availability for the High Energy Physics (HEP) community.

… It is in the interests of HEP, CERN, and the project itself that it should interwork with systems and information in many other fields, and so active collaboration with other groups is essential.

The fundamental functionality of the Web is provided through the combination of server software, which typically runs on a ‘back end’ computer where the information to be served also resides, and client software on the personal computer or workstation where the information is viewed. The Web introduced a small set of interrelated practices which enable the clients and servers to communicate and which have become enshrined as RFCs (Requests for Comments, the documents in which Internet standards are drafted) of the Internet Engineering Task Force, a system for maintaining and developing Internet facilities which is open to the voluntary participation of all comers. The key Web standards are HyperText Transfer Protocol (HTTP) for communication between clients and servers, Uniform Resource Locators (URLs) which provide a unique address for each piece of information on the Web, and HyperText Markup Language (HTML) which provides a platform-independent means of marking up text with structural information and of embedding URL links. Beyond HTTP, other Internet protocols which Web clients can ‘speak’ include FTP, WAIS, Gopher, and NNTP (the network news protocol), while Web servers respond with messages containing the requested information in the MIME format first defined for multimedia mail extensions. It is the responsibility of the client software to display the requested information in accordance with the hardware and software capabilities of the client, including ‘helper’ applications which may be called up to display images or to play sound or movie clips.
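The client's side of this arrangement can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the canned reply, the `HELPERS` table and the helper descriptions are hypothetical and are not drawn from any actual browser. The sketch splits a server reply into its status line, MIME-style headers and body, then uses the declared content type to decide how the information might be displayed.

```python
from email.parser import Parser

# Hypothetical table: how a client might handle each MIME type.
# The descriptions are placeholders, not the names of real programs.
HELPERS = {
    "text/html": "render inline",
    "text/plain": "render inline",
    "image/gif": "launch image viewer helper",
    "audio/basic": "launch sound player helper",
}


def split_response(raw):
    """Split a raw HTTP-style reply into status line, headers and body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, _, header_block = head.partition("\r\n")
    # The header block follows mail-style (MIME) conventions, so the
    # standard library's mail parser can read it.
    headers = Parser().parsestr(header_block, headersonly=True)
    return status_line, headers, body


def choose_display(raw):
    """Pick a display strategy from the reply's Content-Type header."""
    _, headers, _ = split_response(raw)
    mime_type = headers.get_content_type()
    # Unknown types fall back to saving the data rather than guessing.
    return HELPERS.get(mime_type, "offer to save to disk")
```

Keeping the type-to-helper mapping in a table, rather than in the parsing code, reflects the point made above: the server only labels the data, and each client decides what its own hardware and software can do with it.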

Depending on your perspective, the Web may be seen as the practical embodiment of Ted Nelson’s vision of a global hypertext, a vision which provided important inspiration to Berners-Lee’s original proposal. It may be seen as a rapidly growing collection of information contributed by countless organisations and individuals around the planet. Alternatively, it may be seen as the server, client and support software which was first developed at CERN and which is being continually improved by the developer community: servers such as HTTPD for Unix or MacHTTP; clients such as Lynx for the many people who are still confined to text-only access, or Mosaic or Netscape for those with graphical display capabilities; and HTML authoring software such as HoTMetaL from leading SGML software developer SoftQuad. Or the Web may be seen as a set of defining standards, particularly HTTP, HTML and URL, which were invented by Berners-Lee to make the whole thing possible. We will look in some detail at the vital role of the graphical clients in the next section, after concluding this section by quoting some descriptions of those key standards on which the Web was built, and a note about some newer standards being built in turn on top of the Web.

HTTP is an application-level protocol with the lightness and speed necessary for distributed, collaborative, hypermedia information systems. It is a generic, stateless, object-oriented protocol which can be used for many tasks, such as name servers and distributed object management systems, through extension of its request methods (commands). A feature of HTTP is the typing and negotiation of data representation, allowing systems to be built independently of the data being transferred.
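The ‘typing and negotiation of data representation’ mentioned in this description can be illustrated with a simplified Python sketch. It assumes the client states its preferences as a comma-separated list of media types with optional `q` quality values, in the style of an HTTP `Accept` header. The function names are hypothetical, and the matching rules are deliberately cruder than those of the HTTP specifications (for instance, no precedence is given to more specific types).

```python
def parse_accept(header):
    """Return (media_type, quality) pairs from an Accept-style header value."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        quality = 1.0  # a type listed without q= is fully acceptable
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        prefs.append((media_type, quality))
    return prefs


def negotiate(accept_header, available):
    """Pick the available representation the client rates highest, or None."""
    best, best_q = None, 0.0
    for media_type, quality in parse_accept(accept_header):
        for candidate in available:
            matches = (
                media_type == candidate
                or media_type == "*/*"
                # "image/*" matches any candidate starting with "image/"
                or (media_type.endswith("/*")
                    and candidate.startswith(media_type[:-1]))
            )
            if matches and quality > best_q:
                best, best_q = candidate, quality
    return best
```

For example, `negotiate("text/html, text/plain;q=0.5", ["text/plain", "text/html"])` selects `"text/html"`, while a client that accepts only a type the server cannot supply receives `None`, leaving the server free to report that no acceptable representation exists.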

Although W3 uses many different formats, (HTML) is one basic format which every W3 client understands. It is a simple SGML document type allowing structured text with links. The fact that HTML is valid SGML opens the door to interchange with other systems, but SGML was not chosen for any particular technical merit. HTML describes the logical structure of the document instead of its formatting. This allows it to be displayed optimally on different platforms using different fonts and conventions.

(HTML) is a simple markup language used to create hypertext documents that are portable from one platform to another. HTML documents are SGML documents with generic semantics that are appropriate for representing information from a wide range of applications. HTML markup can represent hypertext news, mail, documentation, and hypermedia; menus of options; database query results; simple structured documents with in-lined graphics; and hypertext views of existing bodies of information.
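The two roles these descriptions give to HTML, recording logical structure and embedding links, can be seen by reading a small document mechanically. The following Python sketch is an illustration only: the sample document and its `example.org` addresses are invented placeholders, and the class and function names are hypothetical. It collects the headings and the link targets without paying any attention to fonts or layout, which is exactly the information a client on any platform needs before deciding how to display the page.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class StructureCollector(HTMLParser):
    """Collect headings (logical structure) and link targets (embedded URLs)."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self.links = []
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._in_heading = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_heading and text:
            self.headings.append(text)


# Hypothetical document; the example.org addresses are placeholders.
SAMPLE = """<title>Sample page</title>
<h1>Experiment notes</h1>
<p>See the <a href="http://example.org/docs/manual.html">manual</a>
or the <a href="ftp://example.org/pub/data.txt">raw data</a>.</p>"""


def outline(html_text):
    """Return (headings, links) found in an HTML document."""
    collector = StructureCollector()
    collector.feed(html_text)
    collector.close()
    return collector.headings, collector.links
```

Calling `outline(SAMPLE)` yields the single heading and the two link targets, one reached by HTTP and one by FTP.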

When you are reading a document, behind every link there is the network-wide address of the document to which it refers. The design of these addresses (URLs) is as fundamental to W3 as hypertext itself. The addresses allow any object anywhere on the Internet to be described, even though these objects are accessed using a variety of different protocols. This flexibility allows the web to envelop all the existing data in FTP archives, news articles, and WAIS and Gopher servers.

(URLs are) basically physical addresses of objects which are retrievable using protocols already deployed on the net. The generic syntax provides a framework for new schemes for names to be resolved using as yet undefined protocols. …

A complete URL consists of a naming scheme specifier followed by a string whose format is a function of the naming scheme. For locators of information on the internet, a common syntax is used for the IP address part.
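The structure just described, a naming scheme followed by a scheme-specific string, can be inspected with Python's standard `urllib.parse` module. The sketch below is illustrative only: the `describe` helper is a hypothetical name and the `example.org` addresses are placeholders. It shows that schemes locating information on an Internet host share a common host-and-port syntax, while a scheme such as `news` carries no host part at all.

```python
from urllib.parse import urlsplit


def describe(url):
    """Break a URL into its scheme and the common Internet address parts."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,   # the naming scheme specifier
        "host": parts.hostname,   # None when the scheme names no host
        "port": parts.port,       # None when no explicit port is given
        "path": parts.path,       # the remainder, interpreted per scheme
    }


# Placeholder addresses covering several of the protocols named above.
EXAMPLES = [
    "http://example.org/hypertext/overview.html",
    "ftp://example.org/pub/data.txt",
    "gopher://example.org:70/1/docs",
    "news:comp.infosystems.www",
]

if __name__ == "__main__":
    for url in EXAMPLES:
        print(describe(url))
```

Because the scheme is separated out first, a client can decide which protocol to ‘speak’ before it interprets the rest of the address.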

Going far beyond those original central standards, yet made possible by them, the open-ended design of the Web has facilitated the development of high-level formats for specific data:

In certain fields, special data formats have been designed for handling for example DNA codes, the spectra of stars, classical Greek, or the design of bridges. Those working in the field have software allowing them not only to view this data, but to manipulate it, analyse it, and modify it. When the server and the client both understand such a high-level format, then they can take advantage of it, and the data is transferred in that way. At the same time, other people (for example high school students) without the special software can still view the data, if the server can convert it into an inferior but still useful form. We keep the W3 goal of “universal readership” without compromising total functionality at the high level.

However, Berners-Lee’s goals, designs and implementations were not sufficient on their own to establish the success of the Web. That also took some similar, but even less planned and justified, development work by a somewhat less credentialed team located at another major research facility in the heart of another continent. That work, at the National Center for Supercomputing Applications (NCSA), was led by a then 21-year-old computer science undergraduate student, and made the Web easy to access for a broader community at universities and similarly equipped institutions around the world.