The Backbone of the Web


October 1996 / Core Technologies / The Backbone of the Web

A look at HTTP version 1.1, a necessary new Internet standard.

William Stallings

The Hypertext Transfer Protocol is the foundation protocol of the World Wide Web. The name is somewhat misleading. HTTP is not a protocol for transferring hypertext; it is a protocol for transmitting information with the efficiency necessary to make hypertext jumps. The data transferred by the protocol can be plain text, hypertext, audio, images, or any Internet-accessible information. Information in this article is based on the most recent (June 7, 1996) specification -- HTTP 1.1, draft 05 -- which has been forwarded to the Internet Engineering Steering Group as a proposed standard.

HTTP is a transaction-oriented client/server protocol. To ensure reliability, HTTP uses TCP. Nevertheless, HTTP is a "stateless" protocol: It treats each transaction independently. A typical implementation will create a new TCP connection between client and server for each transaction, then terminate the connection as soon as the transaction completes. However, the specification does not require this one-to-one relationship between transaction and connection lifetimes; that is, a connection can stay open so that further transactions can be carried over it.

The stateless nature of HTTP is well-suited to its typical application. A normal Web session involves retrieving a sequence of pages and documents. The sequence is, ideally, performed rapidly, and the locations of the various pages and documents may be widely distributed among a number of servers, located across the country or around the globe.

The figure "Types of HTTP Transfers" illustrates three examples of HTTP operations. The user agent is the client, such as a Web browser, that initiates the request. The origin server is the server on which a resource resides; an example is a Web server where a desired home page is located. The simplest case is one in which a user agent establishes a direct connection with an origin server. The client opens a TCP connection that is end-to-end between the client and the server. The client then issues an HTTP request. The request consists of a specific command (referred to as a method), a URL, and a message containing request parameters, information about the client, and perhaps additional content information.
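
As a rough illustration of the request structure just described, the following Python sketch assembles a request from a method, a URL, header fields, and an optional body. The function name build_request, the host www.example.com, and the User-Agent value are hypothetical; the layout (request line, header lines, a blank line, then the body) follows the HTTP/1.1 message format.

```python
def build_request(method, url, headers=None, body=b""):
    """Serialize a request: request line, header lines, blank line, body."""
    header_fields = dict(headers or {})
    if body:
        # Tell the receiver how many body bytes follow the blank line.
        header_fields.setdefault("Content-Length", str(len(body)))
    lines = [f"{method} {url} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in header_fields.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


# Hypothetical request for a home page; host and agent names are made up.
message = build_request(
    "GET",
    "/index.html",
    {"Host": "www.example.com", "User-Agent": "sketch/0.1"},
)
print(message.decode("ascii"))
```

A user agent would write these bytes to the TCP connection it has opened to the origin server or to the first intermediary.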

When the server receives the request, it attempts to perform the requested action and then returns an HTTP response. The response includes status information, a success/error code, and a message containing information about the server, information about the response itself, and possible body content. The TCP connection is then closed.
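
The response can be taken apart in the same way. The sketch below is a minimal parser under simplifying assumptions (a complete response already held in memory, no chunked encoding); parse_response and the sample bytes are hypothetical, not taken from any particular server.

```python
def parse_response(raw):
    """Split a raw response into status line fields, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    parts = lines[0].split(" ", 2)  # version, status code, reason phrase
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {
        "version": parts[0],
        "status": int(parts[1]),
        "reason": parts[2] if len(parts) > 2 else "",
        "headers": headers,
        "body": body,
    }


# Made-up response: status line, server information, entity headers, body.
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: sketch/0.1\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<p>Hello</p>\n"
)
response = parse_response(raw)
print(response["status"], response["reason"], response["headers"])
```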

The middle section of the figure shows a case in which there is not an end-to-end TCP connection between the user agent and the origin server. Instead, there are one or more intermediary systems with TCP connections between logically adjacent systems. Each intermediary system acts as a relay, so that a request initiated by the client is relayed through the intermediary systems to the server, and the response from the server is relayed back to the client.

The Machine in the Middle

The HTTP spec defines three forms of intermediary systems: proxy, gateway, and tunnel (see the figure "Intermediary HTTP Systems"). A proxy acts on behalf of other clients and presents requests from other clients to a server. There are several scenarios that call for the use of a proxy. In one scenario, the proxy acts as an intermediary through a firewall. In this case, the server must authenticate itself to the firewall to set up a connection with the proxy. The proxy accepts responses after they have passed through the firewall. Another scenario involves handling different versions of HTTP. If the client and the server are running different versions of HTTP, then the proxy can implement both versions and perform the required mapping.

A gateway is a server that appears to the client as if it were an origin server. It acts on behalf of other servers that may not be able to communicate directly with a client. There are several scenarios in which gateways can be used. As with the proxy, a gateway can manage transfers through a firewall. In this case the client must authenticate itself to the gateway, which can then pass the request on to the server.

Another common scenario involves working with a non-HTTP server. Browsers have built into them the capability to contact servers that use protocols other than HTTP, such as FTP and Gopher servers. This multiprotocol capability can also be provided by a gateway.

A tunnel is simply a relay point between two TCP connections. HTTP messages are passed unchanged as if there were a single HTTP connection between user agent and origin server. Tunnels are used when there is an intermediary system between client and server, but it is not necessary for that system to understand the contents of messages.

Now let's take a look at another type of HTTP operation. A cache is a facility that stores previous requests and responses for handling new requests. If a new request arrives that matches a stored request, then the cache can supply the stored response rather than access the resource indicated in the URL. The cache can operate on a client or on a server or on an intermediary system other than a tunnel. In the figure "Types of HTTP Transfers," a server has cached a request/response transaction, so a corresponding new request from the client need not travel the entire chain to the origin server; instead, the cache server handles it. Not all transactions can be cached, and a client or a server can dictate that a certain transaction may be cached only for a given amount of time.
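
The caching behavior can be sketched in a few lines of Python. This is an illustration only: the ResponseCache class, the fetch helper, and the rule that only GET transactions are cacheable are simplifying assumptions, and the cache-control rules of the real protocol are considerably richer. The origin argument stands for whatever lies further along the chain and is assumed to return a response together with the number of seconds it may be cached.

```python
import time


class ResponseCache:
    """Store responses keyed by (method, URL), each with an expiry time."""

    def __init__(self, clock=time.monotonic):
        self._entries = {}
        self._clock = clock

    def store(self, method, url, response, max_age):
        # Assumption for this sketch: only GET responses with a positive
        # lifetime are cacheable.
        if method != "GET" or max_age <= 0:
            return
        self._entries[(method, url)] = (response, self._clock() + max_age)

    def lookup(self, method, url):
        entry = self._entries.get((method, url))
        if entry is None:
            return None
        response, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[(method, url)]  # entry is stale; drop it
            return None
        return response


def fetch(cache, method, url, origin):
    """Answer from the cache when possible, otherwise ask the origin."""
    cached = cache.lookup(method, url)
    if cached is not None:
        return cached, "cache"
    response, max_age = origin(method, url)
    cache.store(method, url, response, max_age)
    return response, "origin"
```

A first call to fetch for a given URL reaches the origin; a repeat of the same GET within the allowed lifetime is answered from the cache without travelling further along the chain.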

HTTP Messages

HTTP messages are of two types: request and response. A request message is sent by an agent to a server to initiate some action. A response message is returned by a server to an agent in response to a request. Some of the possible request methods are:

  • GET: A request to retrieve information.
  • POST: A request to accept the attached entity as a new subordinate to the identified URL.
  • PUT: A request to accept the attached entity and store it under the supplied URL. This may be a new resource with a new URL, or it may be a replacement of the contents of an existing resource with an existing URL.
  • DELETE: Requests that the origin server delete a resource.
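
To make the difference between these methods concrete, here is a toy in-memory server sketch. The ResourceStore class and its numbering scheme for POSTed children are hypothetical; the status codes used (200, 201, 204, 404, 501) are standard HTTP codes, but a real server may reasonably choose differently in some cases.

```python
class ResourceStore:
    """Toy origin server holding resources in a dictionary keyed by URL."""

    def __init__(self):
        self.resources = {}
        self._next_id = 1

    def handle(self, method, url, entity=None):
        """Return (status code, value) for one request."""
        if method == "GET":
            if url in self.resources:
                return 200, self.resources[url]
            return 404, None
        if method == "PUT":
            # Store the entity under exactly the supplied URL.
            created = url not in self.resources
            self.resources[url] = entity
            return (201 if created else 200), None
        if method == "POST":
            # Accept the entity as a new subordinate of the identified URL;
            # the server, not the client, chooses the child's URL.
            child = f"{url.rstrip('/')}/{self._next_id}"
            self._next_id += 1
            self.resources[child] = entity
            return 201, child
        if method == "DELETE":
            if url not in self.resources:
                return 404, None
            del self.resources[url]
            return 204, None
        return 501, None  # method not implemented in this sketch


store = ResourceStore()
print(store.handle("PUT", "/notes", b"first draft"))
print(store.handle("POST", "/notes", b"a comment"))
print(store.handle("GET", "/notes"))
print(store.handle("DELETE", "/notes"))
```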

A response message may include an entity body containing hypertext-based information. In addition, the response message must specify a status code, which indicates the action taken on the corresponding request. Status codes are organized into the following categories:

  • Informational: The request has been received and processing continues. No entity body accompanies this response.
  • Successful: The request was successfully received, understood, and accepted.
  • Redirection: Further action is required to complete the request.
  • Client error: Request contains a syntax error or request cannot be fulfilled.
  • Server error: The server failed to fulfill an apparently valid request.
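
In the HTTP specification these five categories correspond to the first digit of the three-digit status code (1xx through 5xx), a detail the list above does not spell out. A client can therefore classify any code, including one it does not recognize, with a small helper such as this hypothetical status_class function:

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_class(code):
    """Map a three-digit status code to its category by its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


for code in (100, 200, 301, 404, 500):
    print(code, status_class(code))
```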

We Need This Standard

HTTP is the foundation of the World Wide Web. This request/response protocol used on top of TCP carries commands from browsers to servers and responses from servers back to browsers. As the explosive growth of the Web continues, and as new features are added to both browsers and servers, a standardized transfer protocol is essential to maintain the Web's growing functions and interoperability. HTTP provides the standardized definition required to meet these needs.


Types of HTTP Transfers

HTTP supports operations via direct connection, intermediary systems, or a cache.


Intermediary HTTP Systems

HTTP allows requests and responses to pass through disparate systems that build the network.


William Stallings is a consultant and author of over a dozen books on data communications and networking. This article is based on material from his most recent book, Data and Computer Communications (Prentice-Hall, 1996). You can reach him at ws@shore.net.
