Making HTML5 WebSockets Work
(Making it work) takes a little time - Doug & the Slugs
Back in March of 2009, I wrote an article entitled Comet Programming: Using Ajax to Simulate Server Push. Comet is a Web application model that enables web servers to send data to the client without the client having to explicitly request it. Both polling and long polling Comet techniques can be emulated using Ajax. In polling, the client asks for updates at a set interval; in long polling, a client-initiated connection is held open by the server until it has data to send over. In both cases, these hacks are detrimental to server bandwidth and performance.
Only WebSockets provide truly bi-directional and simultaneous data flow, allowing for server push and pulls over a single TCP socket. It is the most technically challenging HTML5 feature to implement, but for truly interactive websites, it's a technology well worth learning. In today's article I'm going to point out some of the most notable caveats that may hamper you in your quest to master WebSockets.
Where We're At With WebSockets
Most barriers to implementing WebSockets can be attributed to the early stage and broad scope of this particular technology. Unlike simple HTML5 features such as the new form controls, this is a technology that spans browsers, servers, and the specific socket protocols they use. As all of these elements evolve, keeping them in any kind of stable state has been a challenge thus far, to say the least.
According to the caniuse.com website, only two out of the five A-grade browsers currently support the latest WebSocket protocol, while two are still only offering partial support (see the explanation below the table). Finally, Internet Explorer is "experimenting" with WebSockets with the possibility of offering support in IE 10:
|Internet Explorer|Firefox|Chrome|Safari|Opera|
|10~|6.0|14.0|5.0 (partial support*)|11.0 (partial support*)|
* Partial support refers to the WebSockets implementation using an older version of the protocol and/or the implementation being disabled by default due to security issues with the older protocol.
~ There is a WebSockets page on Microsoft's HTML5 Labs site, but there is no clear confirmation that they will adopt the full WebSockets functionality.
Moreover, caniuse.com puts the total WebSocket support at a mere 21.83%. That figure rises to 45.44% when you include partial support.
One of the first protocols used by HTML5 WebSockets was draft-ietf-hybi-thewebsocketprotocol-00 (HyBi 00). To remedy security issues in the earlier drafts, it added the Sec-WebSocket-Key1 and Sec-WebSocket-Key2 fields to the client header. Each key is a string of digits interspersed with spaces and random characters, and the client also sends 8 random bytes after the headers. The server uses all three to construct a 16-byte token at the end of its handshake to prove that it has read the client's handshake. To build the token, the server concatenates the digits in the first key and divides the resulting number by the number of spaces in that key. This is then repeated for the second key. The two results, each written as a 32-bit big-endian integer, are concatenated with the 8 bytes that follow the headers, and the MD5 sum of that 16-byte string is the final token:
GET /demo HTTP/1.1
Upgrade: WebSocket
Connection: Upgrade
Host: gravelleconsulting.com
Origin: http://gravelleconsulting.com
Sec-WebSocket-Key1: 4 @1 46546xW%0l 1 5
Sec-WebSocket-Key2: 12998 5 Y3 1 .P00

^n:ds[4U
...to which the server responds:
HTTP/1.1 101 WebSocket Protocol Handshake
Upgrade: WebSocket
Connection: Upgrade
Sec-WebSocket-Origin: http://gravelleconsulting.com
Sec-WebSocket-Location: ws://gravelleconsulting.com/socket/server/socketDaemon.php
Sec-WebSocket-Protocol: forum

8jKS'y:G*Co,Wxa-
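The HyBi 00 token calculation can be sketched in Python roughly as follows. This is a minimal illustration, not a full handshake implementation: the function names and the sample keys in the usage note are hypothetical, and the sketch assumes the draft's rule that each key number is packed as a 32-bit big-endian integer and followed by the 8 bytes the client sends after its headers.

```python
import hashlib
import struct


def key_number(key: str) -> int:
    """Concatenate the digits in a key and divide by its number of spaces."""
    digits = "".join(ch for ch in key if ch in "0123456789")
    spaces = key.count(" ")
    if not digits or spaces == 0:
        raise ValueError("key must contain at least one digit and one space")
    number = int(digits)
    if number % spaces != 0:
        raise ValueError("digits are not an integral multiple of the space count")
    return number // spaces


def hybi00_response(key1: str, key2: str, body: bytes) -> bytes:
    """Return the 16-byte token the server sends after its response headers."""
    if len(body) != 8:
        raise ValueError("the client must send exactly 8 bytes after its headers")
    # Two 32-bit big-endian integers followed by the 8 body bytes: 16 bytes in all.
    challenge = struct.pack(">II", key_number(key1), key_number(key2)) + body
    return hashlib.md5(challenge).digest()
```

For example, with the made-up key "12 34 5x6", the digits form 123456 and there are two spaces, so key_number returns 61728. Note that the spacing inside a real key is significant, so a key must be read from the wire exactly as sent.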
As of HyBi 06, the client sends a Sec-WebSocket-Key header containing a base64-encoded random value. The server appends the magic string "258EAFA5-E914-47DA-95CA-C5AB0DC85B11" to this key, hashes the result with SHA-1, and base64 encodes the digest. The result is returned in the "Sec-WebSocket-Accept" header. For instance, the string "x3JJHMbDL1EzLkh9GBhXDw==258EAFA5-E914-47DA-95CA-C5AB0DC85B11" hashed with SHA-1 yields a hexadecimal value of "1d29ab734b0c9585240069a6e4e3e91b61da1969", which base64 encodes to "HSmrc0sMlYUkAGmm5OPpG2HaGWk=". Here's what the version 6 handshake looks like:
GET /ws HTTP/1.1
Host: gravelleconsulting.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Version: 6
Sec-WebSocket-Origin: http://gravelleconsulting.com
Sec-WebSocket-Extensions: deflate-stream
Sec-WebSocket-Key: x3JJHMbDL1EzLkh9GBhXDw==
...to which the server responds:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: HSmrc0sMlYUkAGmm5OPpG2HaGWk=
At the time of this writing, HyBi 10, the latest version, still uses this header definition, which is considered to be stable moving forward.
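The Sec-WebSocket-Accept calculation described above can be sketched in a few lines of Python using only the standard library. The function name accept_value is an assumption made for illustration; the key and the expected results are the ones quoted in the handshake example.

```python
import base64
import hashlib

# The fixed "magic string" that the server appends to the client's key.
MAGIC = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_value(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + MAGIC).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


print(accept_value("x3JJHMbDL1EzLkh9GBhXDw=="))  # HSmrc0sMlYUkAGmm5OPpG2HaGWk=
```

Note that the server never decodes the key: the base64 text is treated as an opaque string, and only the SHA-1 digest is base64 encoded.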
WebSocket Frame Masking
Frame masking is now required on every frame that the client sends to the server. This prevents cache poisoning on intermediary proxies. Sec-WebSocket-Origin is added in place of HyBi 00's Origin key to prevent access from scripts that the service provider isn't aware of. It is to be shortened to just "Origin" for HyBi 11. The mask is a 4-byte key that immediately precedes the payload data, and the payload is decoded in place using the following algorithm:
data[i] = data[i] XOR mask[i MOD 4]
The mask key is different for every frame so that identical data produces a different payload every time.
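As a rough Python sketch of the masking algorithm, the function below XORs each payload byte with the mask byte at position i MOD 4. The name apply_mask and the sample mask bytes are illustrative assumptions; because XOR is its own inverse, the same function both masks and unmasks.

```python
def apply_mask(payload: bytes, mask: bytes) -> bytes:
    """XOR every payload byte with the corresponding byte of the 4-byte mask."""
    if len(mask) != 4:
        raise ValueError("the masking key must be exactly 4 bytes long")
    return bytes(byte ^ mask[i % 4] for i, byte in enumerate(payload))


mask = bytes([0x37, 0xFA, 0x21, 0x3D])  # sample key; a real client picks a fresh random one per frame
masked = apply_mask(b"Hello", mask)
print(masked.hex())               # 7f9f4d5158
print(apply_mask(masked, mask))   # b'Hello'
```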
The base framing protocol defines bits 4 to 7 of the first byte as the 4-bit opcode, which defines how the payload data is to be interpreted. The following values are defined:
- %x0 denotes a continuation frame
- %x1 denotes a UTF-8 text frame
- %x2 denotes a binary frame
- %x3-7 are reserved for further non-control frames
- %x8 denotes a connection close
- %x9 denotes a ping
- %xA denotes a pong
- %xB-F are reserved for further control frames
The important thing to take away from the above list is that Chrome 14 and Firefox 6/7 do not yet support binary data, so the opcode needs to be %x1 to denote a UTF-8 text frame.
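To make the opcode list concrete, here is a small Python sketch that extracts the opcode from the first byte of a frame and names it. It assumes the bits are numbered from the most significant bit, so bits 4 to 7 are the low four bits of the byte; the names OPCODES and describe_opcode are hypothetical.

```python
# Opcode values from the list above.
OPCODES = {
    0x0: "continuation",
    0x1: "text",
    0x2: "binary",
    0x8: "close",
    0x9: "ping",
    0xA: "pong",
}


def describe_opcode(first_byte: int) -> str:
    """Name the opcode held in the low four bits of a frame's first byte."""
    opcode = first_byte & 0x0F
    if opcode in OPCODES:
        return OPCODES[opcode]
    # Control opcodes have the high bit of the nibble set (%x8 to %xF).
    return "reserved control" if opcode & 0x8 else "reserved non-control"


print(describe_opcode(0x81))  # text
print(describe_opcode(0x89))  # ping
```

A server targeting the browsers mentioned above would, under this sketch, send data frames whose opcode is 0x1.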
Web Server Compatibility Issues
Just because you've got a browser and server-side code that speak the same protocol doesn't mean that you're good to go. The challenge now is to find an HTML5 WebSockets-compatible server to host the Web page. For a small development environment, node.js in conjunction with the Socket.IO library is a popular platform for WebSocket experiments. It's not a traditional web server like Apache or nginx, but it can host server-side JavaScript code and is relatively easy to set up and manage.
For more robust solutions, there are a few products that support WebSockets:
- For Java developers there is Jetty, a small, fast, embeddable web server and servlet container. Jetty is used by numerous commercial and open source products.
- Wamp and XAMPP both support sockets for PHP, but you have to turn the feature on in order to use it.
- Tornado runs on Python 2.5, 2.6, 2.7 and 3.2 for any Unix-like platform.
- Microsoft has made the WebSockets Prototype available from the HTML5 Labs site. It can be added to your IIS server so that you can write your server in C# or Visual Basic .NET.
Today we looked at the general state of the HTML5 WebSockets protocol as well as the required components to make successful WebSocket communications take place, whether in a small development environment or on a larger enterprise-sized platform. In the next few articles, I'm going to break down each step of the process to develop a simple interface using the Chrome 14 browser, the Wamp server, and the php-websocket library.