Binary Data Reading Essentials
Working with binary data is trickier than working with text because binary files tend to be much larger and can't be broken down by line. What you need is a buffer to hold ranges of bytes (or chunks) of raw data. The File API's container for such raw data is the Blob (a File is a specialized Blob). It also provides a size attribute that returns the length of the data in bytes.
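To make the idea of chunks concrete, here is a minimal sketch that splits a Blob (or File) into fixed-size pieces using its size attribute and the standard slice() method. The helper name sliceIntoChunks and the chunk size are illustrative assumptions, not part of the FileReader API.

```javascript
// Hypothetical helper: split a Blob/File into chunks of at most chunkSize bytes.
// Relies only on the standard Blob members `size` and `slice(start, end)`.
function sliceIntoChunks(blob, chunkSize) {
  if (!(chunkSize > 0)) {
    throw new RangeError('chunkSize must be a positive number');
  }
  var chunks = [];
  for (var start = 0; start < blob.size; start += chunkSize) {
    var end = Math.min(start + chunkSize, blob.size);
    chunks.push(blob.slice(start, end)); // each chunk is itself a Blob
  }
  return chunks;
}
```

For example, sliceIntoChunks(file, 1024 * 1024) would yield pieces of up to 1 MB, each of which can be handed to one of the FileReader read methods.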
The FileReader provides three different ways to load binary data into memory:
- FileReader.readAsBinaryString(Blob|File): The result property will contain the file/blob's data as a binary string. Every byte is represented by a character whose code is in the range of 0 to 255.
- FileReader.readAsDataURL(Blob|File): The result property will contain the file/blob's data encoded as a data URL.
- FileReader.readAsArrayBuffer(Blob|File): The result property will contain the file/blob's data as an ArrayBuffer object.
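As a small illustration of the binary string format, the sketch below converts such a string into an array of byte values. The function name binaryStringToBytes is hypothetical; with readAsArrayBuffer() the same bytes are available directly through new Uint8Array(reader.result).

```javascript
// Hypothetical helper: turn a binary string (one character per byte)
// into a Uint8Array of the underlying byte values.
function binaryStringToBytes(binaryString) {
  var bytes = new Uint8Array(binaryString.length);
  for (var i = 0; i < binaryString.length; i++) {
    // Each character code is expected to be 0-255; mask defensively.
    bytes[i] = binaryString.charCodeAt(i) & 0xff;
  }
  return bytes;
}
```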
All of the above methods work asynchronously. Once the data has finished loading, the load event is fired, and the handler assigned to onload can access the file data through the reader's result attribute. If you are familiar with Ajax's XMLHttpRequest, you'll notice that this employs an almost identical mechanism. That is no accident. The reason is the same: so that your browser won't freeze up while it's processing the data.
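The asynchronous pattern can be sketched as a small wrapper that starts a read and passes the result to a callback when loading finishes. The name readBlob and its optional reader parameter are assumptions made for illustration; only FileReader, its read methods, onload, and result come from the API itself.

```javascript
// Hypothetical helper: read a Blob/File with the named FileReader method and
// hand the result to a callback once the load event fires.
// `reader` is optional; when omitted, a new FileReader is created (browser only).
function readBlob(blob, methodName, onDone, reader) {
  reader = reader || new FileReader();
  reader.onload = function () {
    onDone(reader.result); // result type depends on the method that was used
  };
  reader[methodName](blob); // returns immediately; the read runs in the background
  return reader;
}
```

In a browser, a call such as readBlob(file, 'readAsArrayBuffer', function (buffer) { console.log(buffer.byteLength); }) would log the file's size once the read completes, without blocking the page in the meantime.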
One task that browsers excel at is displaying images. Thus, it seems only natural that there would be a big demand for client-side image processing functionality, something for which FileReader is perfectly suited. The following complete page example shows how event-driven programming helps to add new function calls to events without clobbering existing ones. It also keeps related functionality together. For those of you who are unfamiliar with event-driven programming, it often involves passing event handlers as anonymous functions to event binders (like addEventListener()). In fact, any function that is only called by the browser and not explicitly in code makes a good case for anonymity.
In this case a new event handler is added to the window's onload event that in turn adds another event handler to the fileinput control's onchange(). Finally, within that event, a function is added to the FileReader's onload() event in order to process the image file:
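A minimal sketch of this nesting of handlers follows. It is not the original page: the element ids fileinput and preview and the function name previewImage are assumptions, and the three reading steps are factored into a named function so that they can be exercised outside a browser.

```javascript
// The three-step read: instantiate a reader, attach onload, start the read.
// `reader` is optional; when omitted, a new FileReader is created (browser only).
function previewImage(file, img, reader) {
  reader = reader || new FileReader();   // 1. instantiate the FileReader
  reader.onload = function (event) {     // 2. add the onload handler
    img.src = event.target.result;       //    result holds the data URL
  };
  reader.readAsDataURL(file);            // 3. read the file
}

// Browser-only wiring; skipped in environments without a DOM.
if (typeof window !== 'undefined' && typeof document !== 'undefined') {
  window.addEventListener('load', function () {
    var input = document.getElementById('fileinput'); // assumed <input type="file"> id
    var img = document.getElementById('preview');     // assumed <img> id
    if (!input || !img) { return; }
    input.addEventListener('change', function () {
      var file = input.files[0];
      // Only attempt a preview when an image file was selected.
      if (file && /^image\//.test(file.type)) {
        previewImage(file, img);
      }
    }, false);
  }, false);
}
```

Because each handler is registered with addEventListener(), any handlers already bound to the same events keep working.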
Following the code into the fileinput's onchange() event handler, there is a three-step reading process: the FileReader is instantiated, the onload() event handler is added, and the file is read using the appropriate method. In this case, the readAsDataURL() method is used. It's generally a good choice for images because it sets the result property to a DOMString containing a data URL encoding of the File or Blob's data, which can be assigned directly to an image's src. While this bypasses the use of a buffer, the browser could throw an ENCODING_ERR for data URLs that exceed that browser's URL length limit. That could be a problem when dealing with huge files such as video or an Access database, but it's not usually a factor for images. I tried the above code with the largest image files I could find and never received an ENCODING_ERR.
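To show what the result of readAsDataURL() looks like, the sketch below splits a data URL into its parts, following the general form data:[<mediatype>][;base64],<data>. The helper name parseDataUrl is hypothetical. Note that base64 encoding makes the payload roughly a third larger than the file itself, which is why very large files can run into length limits.

```javascript
// Hypothetical helper: split a data URL into its media type and payload.
// General form: data:[<mediatype>][;base64],<data>
function parseDataUrl(dataUrl) {
  var comma = dataUrl.indexOf(',');
  if (dataUrl.slice(0, 5) !== 'data:' || comma === -1) {
    return null; // not a data URL
  }
  var header = dataUrl.slice(5, comma);           // e.g. "image/png;base64"
  var isBase64 = /;base64$/i.test(header);
  // An omitted media type defaults to text/plain;charset=US-ASCII.
  var mediaType = header.replace(/;base64$/i, '') || 'text/plain;charset=US-ASCII';
  return {
    mediaType: mediaType,
    isBase64: isBase64,
    data: dataUrl.slice(comma + 1)                // the encoded payload
  };
}
```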
Other FileReader Events
The onload event is only one of several FileReader events. Others include onloadstart, onprogress, onabort, onerror, and onloadend. These additional events give us the ability to monitor the progress of a file read, which is useful for processing large files, catching errors, and figuring out when a read is complete.
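One way to get a feel for these events is to attach a logging handler to each of them. The sketch below does this with a hypothetical helper named traceReaderEvents; note that it assigns the on... properties directly, so it would replace any handlers already set that way.

```javascript
// The six FileReader events, without the "on" prefix.
var READER_EVENTS = ['loadstart', 'progress', 'load', 'abort', 'error', 'loadend'];

// Hypothetical helper: report every FileReader event to a log callback.
function traceReaderEvents(reader, log) {
  READER_EVENTS.forEach(function (name) {
    reader['on' + name] = function (event) {
      log(name, event);
    };
  });
  return reader;
}
```

In a browser, calling traceReaderEvents(new FileReader(), function (name) { console.log(name); }) and then starting a read on the returned reader should, for a successful read, typically log loadstart, one or more progress events, load, and finally loadend.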
The example below demonstrates displaying a progress bar to monitor the status of a read using the onloadstart() and onprogress() events. Unlike the previous example, this one includes a function called addEventHandler() to make the binding of handlers to events work for both DOM-compliant browsers and Internet Explorer (as of version 10, Preview 2).
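The core of such a progress display can be sketched as follows. This is not the original listing: the body of addEventHandler() is one plausible way to write a cross-browser binder, and percentLoaded, watchProgress, and the bar object (assumed to be something like a progress element with max set to 100) are hypothetical names introduced for illustration.

```javascript
// One plausible cross-browser binder in the spirit of addEventHandler().
function addEventHandler(target, eventName, handler) {
  if (target.addEventListener) {
    target.addEventListener(eventName, handler, false); // DOM-compliant browsers
  } else if (target.attachEvent) {
    target.attachEvent('on' + eventName, handler);      // legacy Internet Explorer
  } else {
    target['on' + eventName] = handler;                 // last-resort fallback
  }
}

// Convert a progress event into a whole-number percentage,
// or null when the total size is unknown.
function percentLoaded(event) {
  if (!event.lengthComputable || event.total === 0) {
    return null;
  }
  return Math.round((event.loaded / event.total) * 100);
}

// Reset the bar when the read starts and update it as data arrives.
function watchProgress(reader, bar) {
  addEventHandler(reader, 'loadstart', function () {
    bar.value = 0;
  });
  addEventHandler(reader, 'progress', function (event) {
    var pct = percentLoaded(event);
    if (pct !== null) {
      bar.value = pct;
    }
  });
}
```

Calling watchProgress(reader, bar) before starting the read keeps the bar in step with the loaded and total properties that each progress event carries.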
If you want to try out the above code, select the largest file possible for greatest effect!
As always, make sure that you test browser support for the FileReader object, either using your own tests or via a library such as Modernizr.
We've seen a few interesting uses for the FileReader so far, but there's still more that you can do with it. We'll be getting into ways that you can manipulate binary data shortly.