Ted received his PhD in mathematics in 1996, answering open problems in complexity theory and infinite colorings for ordered sets, and proceeded with post-doctoral research in component and Web-based collaborative technologies. Following work at Java Software, Sun Microsystems, he was a device management and XML architect at Wind River, participating in the IETF NETCONF design team. Ted currently participates in the JavaServer Faces and Servlet expert groups and is a senior software architect at ICEsoft Technologies developing ICEfaces, an Ajax framework for JavaServer Faces.

WebSocket is neither Web nor Socket

11.04.2008
The HTML 5 proposal contains many new and interesting ideas. To provide some background, let's take a closer look at WebSocket and consider it against potential small modifications to HTTP usage (such as keeping TCP connections distinct when "window" objects are distinct).

WebSocket essentially functions as follows: after initiating an HTTP connection, the client requests an HTTP "Upgrade: WebSocket", whereupon the underlying TCP connection is used to bidirectionally stream 0xFF-separated UTF-8 messages. We can now look at the specifics in more detail:

Does WebSocket use TCP ports 81 and 815?
The use of new ports requires firewall and proxy configuration, which many IT administrators will resist. Moreover, these ports appear to have been suggested without consulting IANA. The ports appear to be available, but it is a matter of due process to consult IANA before appropriating well-known ports. The most likely outcome for widespread use is that WebSocket would "upgrade" port 80 as below.

How does WebSocket make use of an HTTP connection on port 80?

20 48 54 54 50 2f 31 2e  31 0d 0a 55 70 67 72 61
64 65 3a 20 57 65 62 53  6f 63 6b 65 74 0d 0a 43
6f 6e 6e 65 63 74 69 6f  6e 3a 20 55 70 67 72 61
64 65 0d 0a
For documenting the protocol, perhaps it would make sense to simply give the client/server interaction in ASCII, rather than specifying the exact sequence of bytes used to interact with the remote HTTP server for "Upgrade: WebSocket". Note here that the flexibility of HTTP is being used effectively.
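As a small illustration of the ASCII form, the bytes listed above can be decoded directly. This is only a sketch: the helper name `decode_dump` is made up for the example. Note that the fragment as shown begins with a space followed by `HTTP/1.1`, so the request method and path are not part of these bytes.

```python
# Hex bytes copied from the dump above (whitespace is stripped before decoding).
HEX_DUMP = """
20 48 54 54 50 2f 31 2e  31 0d 0a 55 70 67 72 61
64 65 3a 20 57 65 62 53  6f 63 6b 65 74 0d 0a 43
6f 6e 6e 65 63 74 69 6f  6e 3a 20 55 70 67 72 61
64 65 0d 0a
"""


def decode_dump(dump: str) -> str:
    """Turn a whitespace-separated hex dump into the ASCII text it encodes."""
    compact = "".join(dump.split())  # drop spaces and newlines
    return bytes.fromhex(compact).decode("ascii")


if __name__ == "__main__":
    # Each 0d 0a pair is a CRLF, so splitting on it yields one header per line.
    for line in decode_dump(HEX_DUMP).split("\r\n"):
        print(repr(line))
```

Read this way, the fragment is simply the tail of a request line followed by the `Upgrade: WebSocket` and `Connection: Upgrade` headers.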

Does WebSocket obey the same origin policy?
The "same origin policy" is one of the cornerstones of web security. Essentially, executable page content can only establish a connection to the server that the user has loaded the page from. Many of the recent security exploits on the web (such as the Gmail address book exploit and clickjacking) arise because of subtle breakdowns in same-origin enforcement. It is not clear whether WebSocket is intended to follow the same-origin policy (a failure condition when the URL does not refer to the originating host is not documented), but for the safety of the web, we should insist that this policy remain in place.

Is WebSocket restricted to the two-connection limit of HTTP?
This does not appear to be specified. However, since the WebSocket protocol makes no use of metadata, chaos would ensue if a single connection were used to multiplex the traffic of different WebSocket instances. The most natural interpretation is that a new TCP socket is created for each JavaScript construction of a WebSocket object. Typical usage, such as for standalone Ajax components, would have a WebSocket created for each component on the page, potentially resulting in hundreds of connections to the server. Strangely enough, the two-connection limit is the only fundamental aspect that makes using HTTP for Ajax Push difficult, and if we had control over how XMLHttpRequest used the underlying TCP connections, we would be in much better shape. The most dramatic benefit (and greatest risk to scalability) of WebSocket must not be left unspecified. Note that socket establishment is expensive, so providing a way to multiplex different endpoints of a protocol over a single connection (as HTTP can) is a useful optimization.

Can WebSocket read and write arbitrarily as with low-level socket APIs?
WebSocket communication is restricted to the WebSocket protocol (which includes the connection setup and the 0xFF-delineated UTF-8 messages). It is argued that this improves security because WebSocket clients are unlikely to be able to attack existing network services. However, if WebSocket becomes popular, the majority of internet-facing systems will have applications that are vulnerable to attack through their WebSocket interface. Is the short-term benefit worth the long-term loss in flexibility (especially considering that a variety of existing plugins allow low-level socket interaction with the originating host)?

How does WebSocket delineate messages?
WebSocket framing terminates messages with 0xFF. This is efficient in terms of byte usage, but framing errors could easily occur due to stray binary data (and keep in mind that a framing failure is a critical failure in a protocol). Further, detecting such framing errors would not be obvious from inspecting the TCP stream. (In contrast, MIME framing is unambiguous and requires no internal escaping of binary messages.)
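The framing hazard can be sketched in a few lines. The helpers `frame` and `deframe` are hypothetical names for this example, and they model only the terminator-based framing described here, not a complete protocol implementation. Valid UTF-8 never contains the byte 0xFF, so text survives intact; the trouble starts when a payload is not clean UTF-8.

```python
TERMINATOR = b"\xff"


def frame(messages):
    """Concatenate payloads into one stream, ending each with 0xFF (no escaping)."""
    return b"".join(payload + TERMINATOR for payload in messages)


def deframe(stream):
    """Split a stream back into payloads at every 0xFF byte."""
    parts = stream.split(TERMINATOR)
    return parts[:-1]  # the final element is whatever follows the last terminator


if __name__ == "__main__":
    text = ["héllo".encode("utf-8"), b"ok"]
    print(deframe(frame(text)) == text)  # UTF-8 text round-trips

    binary = [b"\x01\xff\x02"]           # one payload containing a stray 0xFF
    print(deframe(frame(binary)))        # comes back as two messages
```

The receiver has no way to tell that the second case went wrong: the stream still parses, it just parses into the wrong messages.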

How are function call semantics implemented over WebSocket?
WebSocket enforces no relationship between messages sent and received; multiple messages may be received from the server subsequent to a client message being sent to the server. This is not necessarily a drawback of the protocol; it is simply important to keep in mind that the request/response structure familiar on the web with HTTP is not enforced by WebSocket.

Is WebSocket easy to implement?
On the surface, implementation is straightforward; however, it is important to note that writing can occur simultaneously at both ends of the connection. If both ends attempt to write an amount larger than their TCP output buffers, deadlock can occur. The point here is not that the protocol should be designed to avoid simultaneous writing (as with HTTP 1.0) -- this is necessary to obtain the event-based interactivity we are after. The point is that WebSocket implementations added ad-hoc to many different applications would lead to problems; in other words, ease of implementation is not as important as correctness in the protocol.
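The simultaneous-write stall can be observed locally. The sketch below makes two assumptions: a `socket.socketpair()` stands in for the TCP connection, and the sockets are non-blocking so that the demonstration reports the stall instead of hanging. With blocking sockets, both `send` calls would simply wait on each other forever. The function names are invented for this example.

```python
import socket


def fill_send_buffer(sock, chunk=b"x" * 65536):
    """Write to a non-blocking socket until the kernel refuses more data."""
    sent = 0
    while True:
        try:
            sent += sock.send(chunk)
        except BlockingIOError:
            return sent  # buffers are full; a blocking send would wait here


def drain(sock):
    """Read everything currently queued on a non-blocking socket."""
    received = 0
    while True:
        try:
            received += len(sock.recv(65536))
        except BlockingIOError:
            return received


def simultaneous_write_demo():
    a, b = socket.socketpair()
    a.setblocking(False)
    b.setblocking(False)
    try:
        sent_a = fill_send_buffer(a)  # a writes and never reads
        sent_b = fill_send_buffer(b)  # b does the same in the other direction
        # At this point neither side can write; only reading breaks the tie.
        drain(b)
        resumed = a.send(b"x")        # a can write again once b has read
        return sent_a, sent_b, resumed
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print(simultaneous_write_demo())
```

The exact byte counts depend on the operating system's buffer sizes. The point is only that each side must keep reading while it writes, which is easy to get wrong in an ad-hoc implementation.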

Can we just upgrade HTTP?
So, by one interpretation, the greatest benefit of WebSocket is its unspecified behavior in terms of TCP connections. Are there simple things that we can do to improve HTTP for use with Ajax Push and Comet? After all, we want to make use of the framing and metadata features of HTTP, as well as benefit from its many standard and widely deployed implementations.

The first step is to allow HTTP to benefit (in a reasonable way) from what is unspecified connection behavior with WebSocket: if two JavaScript object contexts do not share a "window" object, they should not (by default) share TCP connections. This would allow multiple browser windows/tabs to open notification connections to the server without interference and without complex inter-window coordination for the purpose of sharing a single connection. This step requires no API or protocol changes.

The next step is to fully support HTTP 1.1 from the browser (specifically, pipelining). By calling enablePipelining(true) on an XMLHttpRequest object, multiple push notification requests could be sent to the server without waiting for one of the two TCP connections to be freed. When a notification was available for one of the requests, all intermediate requests would be unblocked with no-op responses. Again, this would allow more straightforward multi-window push implementations.
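The no-op behavior follows from pipelined responses having to be returned in request order. A rough server-side model, with hypothetical request identifiers and a made-up `flush_until` helper, might look like this:

```python
from collections import deque


def flush_until(pending, target_id, payload):
    """Answer queued requests in order until target_id receives the payload.

    Pipelined responses must come back in request order, so every request
    queued ahead of the target is released with a no-op response.
    """
    responses = []
    while pending:
        request_id = pending.popleft()
        if request_id == target_id:
            responses.append((request_id, payload))
            break
        responses.append((request_id, "no-op"))
    return responses


if __name__ == "__main__":
    # Three windows have each pipelined one notification request.
    pending = deque(["window-1", "window-2", "window-3"])
    for response in flush_until(pending, "window-3", "new mail"):
        print(response)
```

Requests queued behind the target stay pending, while those released with a no-op would have to be reissued by their windows.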

Finally, we should consider extensions to the HTTP protocol itself, since a flurry of no-op responses when many windows are open is not efficient. With the introduction of a RequestTag HTTP header, an HTTP response could be uniquely associated with a request (other than by virtue of its order in the queue). This would allow out-of-order responses to pipelined requests, and would make it possible to use HTTP in an event-driven fashion. Note that this is not just useful for notification-style applications; control over response ordering can reduce latency and server buffering requirements. With support for out-of-order responses, it would be desirable to have control over which TCP connection is used for a given request. This could be controlled through an optionally specified connection "name".
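To make the RequestTag idea concrete, here is a minimal client-side sketch. RequestTag is the header proposed in this article, not an existing HTTP header, and the class name, the counter-based tag format, and the dictionary standing in for parsed headers are all assumptions made for illustration.

```python
class TaggedPipeline:
    """Match out-of-order responses to pipelined requests by RequestTag."""

    def __init__(self):
        self._next_tag = 0
        self._pending = {}  # tag -> description of the waiting request

    def send(self, description):
        """Register a request and return the headers it would carry."""
        tag = str(self._next_tag)
        self._next_tag += 1
        self._pending[tag] = description
        return {"RequestTag": tag}

    def receive(self, headers, body):
        """Complete only the request named by the response's RequestTag."""
        description = self._pending.pop(headers["RequestTag"])
        return description, body

    def outstanding(self):
        """Tags of requests still waiting, oldest first."""
        return list(self._pending)


if __name__ == "__main__":
    pipeline = TaggedPipeline()
    pipeline.send("window 1")
    pipeline.send("window 2")
    third = pipeline.send("window 3")
    # The server answers the third request first; the others stay open.
    print(pipeline.receive(third, "new mail"))
    print(pipeline.outstanding())
```

Because the response names its request, the earlier requests need no no-op responses and can remain open until they have something to deliver.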

From Ted's Ajax Adventure

Published at DZone with permission of Ted Goddard, author and DZone MVB.


Comments

Andres Almiray replied on Thu, 2008/11/06 - 8:59pm

An excellent companion to the panel about Comet/WebSockets that was held at the Googleplex earlier this month. Wait, Ted, you were on that panel?! Wish you had brought these issues to the table; would love to hear what James (Kaazing) and Mike (Orbited) think.

Cheers,
Andres
