Read Text Files Using the JavaScript FileReader

By Robert Gravelle

HTML5-based local storage is a hot topic these days, but JavaScript can also work with the local file system. In fact, things are definitely looking up for the W3C's File API, a new JavaScript API that provides limited access to the local file system in Web applications. Its main functionality is already largely supported in the latest Firefox (version 5+). If the other major browsers follow suit--and they probably will--we'll be able to perform fairly sophisticated client-side processing on the content of local files without having to upload them to a server.

In today's article, we'll learn how to use the FileReader to retrieve file properties and the contents of text files.

Gaining Access to Files using the File Input Control

One simple way to access local files is via the <input type="file"/> HTML form element. That will give you access to read-only information for an individual file such as its name, size, MIME type, and a reference to the file handle. Adding the "multiple" attribute to the file input element (as in <input type="file" multiple/>) will get you a list of files to work with.

The Ubiquitous Browser Support Tests

When it comes to HTML5 functionality, you can never assume that the user's browser will support it. As such, it's imperative that you perform some type of cursory check before attempting any processing that may fail. This is also true of file reading.

Whatever you do, don't try this:

if (FileReader)
{
	//do your stuff!
}

That can throw a ReferenceError in browsers that lack the File API, because the JavaScript interpreter treats FileReader as an undeclared variable. Since all global objects are properties of the window object, you CAN perform this more thorough check without fear:

// Check for the various File API support.
if (window.File && window.FileReader && window.FileList && window.Blob) {
  //do your stuff!
} else {
  alert('The File APIs are not fully supported by your browser.');
}

Of course, you can always use Modernizr too.
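
If the same check is needed in several places, it can be wrapped in a small function. What follows is only a sketch: the name hasFileApi and its optional scope parameter are illustrative assumptions rather than part of the File API, and the parameter exists mainly so the function can be tried out with a stand-in object outside a browser.

```javascript
// Hypothetical helper: reports whether the four File API constructors exist.
// "scope" is an optional stand-in for window (an assumption for testing).
function hasFileApi(scope) {
  if (!scope) {
    // typeof is safe on undeclared identifiers, unlike a bare reference.
    scope = (typeof window !== 'undefined') ? window : {};
  }
  return !!(scope.File && scope.FileReader && scope.FileList && scope.Blob);
}
```

In a page, the call would simply be if (hasFileApi()) { ... }, with the fallback message in the else branch.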

The JavaScript FileList and File Objects

Whether you include the multiple attribute or not, the file input always returns a FileList object, which is a simple array-like list of the individually selected files from the underlying system. Like an array, it has a length and is zero based, so files[0] gets the first one.

The File object has some properties all on its own which have nothing to do with the FileReader, such as its name, type, size, and last modified date. Interestingly, the latter was not recognized by my Firefox 5 browser (remember to perform browser support tests! [see above]). These properties are available any time after the file has been loaded into the input control.
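
To make these properties concrete, here is a minimal sketch that walks a FileList with an index loop and builds one summary line per file. The name describeFiles is made up for illustration; the function relies only on the length, name, type, and size properties described above, so any array-like object with the same shape will do for experimenting.

```javascript
// Hypothetical helper: returns an array of summary strings, one per file.
// Works on a real FileList or any array-like object with the same fields.
function describeFiles(fileList) {
  var lines = [];
  for (var i = 0; i < fileList.length; i++) {
    var f = fileList[i];
    // type is an empty string when the browser cannot determine it.
    var type = f.type || 'unknown';
    lines.push(f.name + ' (' + type + ', ' + f.size + ' bytes)');
  }
  return lines;
}
```

In a page, it could be called as describeFiles(document.getElementById('fileinput').files) once the user has picked something.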

The FileList is exposed via the input's files property, and there are two ways to get at it. The first is to reference the control directly using document.getElementById(ID). The other is through the event's target property (event.target.files). I employed the latter in the code snippet below because the readSingleFile() function receives the event object as an argument.

The quickest way to gain access to the File's contents is to add a listener for the input's change event. Of course, there's no reason why you can't retrieve them much later from a button click or other event.

The following code reads one file using a file input control and displays the first line along with some of the file's properties:

<input type="file" id="fileinput" />
<script type="text/javascript">
  function readSingleFile(evt) {
    //Retrieve the first (and only!) File from the FileList object
    var f = evt.target.files[0]; 

    if (f) {
      var r = new FileReader();
      r.onload = function(e) {
        var contents = e.target.result;
        //substring(0, ...) keeps everything before the first line break
        alert( "Got the file.\n"
              +"name: " + f.name + "\n"
              +"type: " + f.type + "\n"
              +"size: " + f.size + " bytes\n"
              + "starts with: " + contents.substring(0, contents.indexOf("\n"))
        );
      };
      r.readAsText(f);
    } else { 
      alert("Failed to load file");
    }
  }

  document.getElementById('fileinput').addEventListener('change', readSingleFile, false);
</script>

Here is the resulting alert box in Firefox 5. Note that in a real Web app you would most likely use dynamic HTML to write the data to the page.

Image 1

 

File Type Validation

The file's properties have many uses, including validating the file type, which is important considering that the file input control doesn't restrict the file type at all. Here's a modified readSingleFile() function that demonstrates a simple way to limit the selection to text files:

  function readSingleFile(evt) {
    //Retrieve the first (and only!) File from the FileList object
    var f = evt.target.files[0]; 

    if (!f) {
        alert("Failed to load file");
    } else if (!f.type.match('text.*')) {
        alert(f.name + " is not a valid text file.");
    } else {
      var r = new FileReader();
      //proceed with read…
    }
  }

The readAsText() method used above is asynchronous, so you can't refer to the file contents immediately after calling it. Like Ajax calls, which are also asynchronous, the FileReader has loading states and events that help to ascertain the progress of the read. The one you'll use most often is load, which signifies that the read has completed successfully. Its associated onload handler is where you should attach your processing logic.
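
The asynchronous flow can be sketched as a small wrapper that takes one callback for success and one for failure. This is an illustration under stated assumptions: readText and its parameters are hypothetical names, and the optional ReaderCtor argument is there only so a stand-in reader can be substituted when experimenting outside a browser. The onload and onerror handlers and the error property are part of the FileReader itself.

```javascript
// Hypothetical wrapper around readAsText(). ReaderCtor is an optional
// stand-in constructor (an assumption for testing); browsers use FileReader.
function readText(file, onDone, onFail, ReaderCtor) {
  var reader = new (ReaderCtor || FileReader)();
  reader.onload = function (e) {
    // Fires only after the whole file has been read successfully.
    onDone(e.target.result, file);
  };
  reader.onerror = function (e) {
    // Fires instead of load when the read fails.
    onFail(e.target.error, file);
  };
  reader.readAsText(file);
  // Execution continues here before either handler has run.
  return reader;
}
```

Any code that needs the contents belongs inside the onDone callback; a statement placed right after the readText() call will run before the file has been read.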

The file contents are stored in the FileReader's result string property. Being a string, it allows you to manipulate it just as you would any string. For instance, you can see it in the code above where I displayed the first line.
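
As a small illustration of working with the result string, the two helpers below pull out the first line and split the contents into lines. Both names are made up for this sketch, and the regular expression is an assumption about the input: it accepts Unix (\n) and Windows (\r\n) line endings only.

```javascript
// Hypothetical helpers operating on the string held in FileReader.result.

// Returns the text up to the first line break, or the whole string
// when the file contains a single line.
function firstLine(contents) {
  var end = contents.search(/\r?\n/);
  return end === -1 ? contents : contents.substring(0, end);
}

// Splits the contents into lines, accepting both Unix (\n) and
// Windows (\r\n) line endings.
function splitLines(contents) {
  return contents.split(/\r?\n/);
}
```

Inside an onload handler, these would be called as firstLine(e.target.result) or splitLines(e.target.result).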

Reading Multiple Files and Properties Using a Closure

Reading multiple files can be a little trickier, depending on how you do it. One of the main difficulties is accessing a file's properties and its contents at the same time, within the FileReader's onload event handler. The problem, of course, is that the handler executes later, after the loop has already finished, so the loop variable no longer refers to the file that was being read. One way to avoid a "<file variable> is undefined" error is to use a closure. It's created by including a set of parentheses around a function and calling it right away ( (function x(args) {})(args); ), which causes it to execute as inline code. By passing in the File object, we can bind it to the real handler function, which is returned and assigned to the onload event property:

<input type="file" id="fileinput" multiple />
<script type="text/javascript">
  function readMultipleFiles(evt) {
    //Retrieve all the files from the FileList object
    var files = evt.target.files; 
    		
    if (files) {
        for (var i=0, f; f=files[i]; i++) {
	          var r = new FileReader();
            r.onload = (function(f) {
                return function(e) {
                    var contents = e.target.result;
                    alert( "Got the file.\n"
                          +"name: " + f.name + "\n"
                          +"type: " + f.type + "\n"
                          +"size: " + f.size + " bytes\n"
                          + "starts with: " + contents.substring(0, contents.indexOf("\n"))
                    ); 
                };
            })(f);

            r.readAsText(f);
        }   
    } else {
	      alert("Failed to load files"); 
    }
  }
  
  document.getElementById('fileinput').addEventListener('change', readMultipleFiles, false);
</script>
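
The inline closure is not the only way to bind each file to its handler. As one possible alternative, a named helper function achieves the same effect, because every call gets its own local variables. The names readOne and readAll are hypothetical, and the optional ReaderCtor parameter is an assumption added so the sketch can be run with a stand-in reader outside a browser.

```javascript
// Hypothetical alternative to the inline closure: a named function gets its
// own "file" and "reader" variables on every call. ReaderCtor is an optional
// stand-in (an assumption for testing); browsers use FileReader.
function readOne(file, onText, ReaderCtor) {
  var reader = new (ReaderCtor || FileReader)();
  reader.onload = function (e) {
    // "file" here is always the one passed to this particular call.
    onText(file, e.target.result);
  };
  reader.readAsText(file);
}

// Starts a read for every entry in a FileList (or array-like object).
function readAll(files, onText, ReaderCtor) {
  for (var i = 0; i < files.length; i++) {
    readOne(files[i], onText, ReaderCtor);
  }
}
```

Inside readMultipleFiles(), the loop body could then be replaced by a single call such as readAll(evt.target.files, showFile), where showFile is whatever function displays the name and contents. Note that the reads may complete in any order.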

We'll be taking a look at the FileReader's facilities for reading and manipulating binary data shortly.

 


