8 Rules to Follow When Using TWAIN for Your Image Capture Web App
By Catherine Sea
If you’ve used an imaging device to scan something, chances are you’ve come across TWAIN somewhere in the scanning software. Using the TWAIN standard to scan something has been made quite simple over the years. But, developing web applications to leverage the standard is not so easily done. Web application developers must consider several factors before embarking on such a task.
TWAIN has been around since 1992. The TWAIN Working Group, which represents the imaging industry, put together the TWAIN standard. Many technology vendors in the imaging space make up the group. As the working group puts it, the purpose of TWAIN is “to provide and foster a universal public standard which links applications and image acquisition devices.” As a result, today we have the TWAIN standard. It is a software protocol and application programming interface (API) that regulates communication between software applications and imaging devices.
Software and web app developers have long used TWAIN to link their applications with imaging devices. Today, this link is more critical than ever. Industries such as healthcare, finance and government rely heavily on document management solutions for their paper-heavy processes. As more organizations move from a paper-based process to digital document management, the importance and use of TWAIN will only grow. In fact, many companies, from small businesses to large publicly traded technology vendors, use TWAIN in their document management solutions. And, vendors sell these solutions into healthcare, finance, government and other such industries.
The use of image capturing within document management applications has many advantages. There’s a reduction in paper costs, a streamlining of workflow processes, simpler collaboration, enhanced security possibilities, and more. So, as web application developers continue to leverage TWAIN more and more, what are the critical elements to consider?
The Typical TWAIN Workflow Process
Rules for Developers to Follow
The first thought is whether the developer wants to embark on an approach to completely develop for TWAIN from scratch. As an alternative, a developer can leverage off-the-shelf software development kits (SDK) to expedite development. There are tradeoffs for each. On the one hand, development from scratch might offer levels of customization one cannot get from an off-the-shelf product. On the other hand, it significantly increases development complexity, time and costs.
Developers cannot skirt the fact that the current TWAIN 2.3 specification is an intense 600+ pages long. It can take weeks just to read through it. It can take months to years to get a full understanding of the specification. Additionally, once you've grasped TWAIN, there is a lot of work to support image formats and compression codecs, such as TIFF, JPEG or PNG. Then you have to consider image upload / download features. It can all get overwhelming very quickly. So, the first rule a developer must follow is to clearly determine if they want to and / or can develop for TWAIN from scratch.
Regardless of the choice to develop from scratch or not, the most important experience to develop for is speed. In other words, how fast can you enable a user to be up and scanning? A recent New York Times article states that after clicking through to a website, a user becomes frustrated after waiting just 400 milliseconds, which is less than half a second. In 2012, research at UMass Amherst found that people begin abandoning online videos if they don’t load within two seconds. When users want to perform a task, asking them to download something beforehand can become a nuisance to them. So, when a user visits a web page requiring an ActiveX / plug-in component for the first time, you have to make the process as fast as possible. The component must be downloaded from the web server to the user's browser and then installed. The larger the component, the longer it takes to download. Every second counts if you don’t want to frustrate users. So, as a rule, make sure your component downloads and installs fast.
So, once you’ve allowed a user to speedily download your component, user interaction, or the user interface (UI), becomes the next important part of your application. A good UI can be the difference between widespread adoption of your application and having to take it back to the drawing board. To this end, you will have to decide whether to use the built-in UI of the scanner (or other imaging device) or to construct your own interface. Choose wisely. Ensuring a rich experience will also be paramount. For example, users may need to preview scanned images or edit them before uploading them to a web server or saving them elsewhere. So, here’s the next rule: make sure your UI is speedy, easy to use, and lets users preview and edit their scans.
Speaking of uploading, it will be important that your application support multiple compression formats. Naturally, the larger the scanned image, the longer the upload time. Longer upload times also increase the chance of upload failure. For example, the image size of a color A4 document scanned at 200 DPI can easily be more than 10 megabytes. For an ADSL connection, this might take more than 10 minutes to upload. A lot can happen to increase upload failure possibilities during this time. But, image compression is not just critical to performance, it also helps save on storage space and costs. Compression formats include JPEG and PNG. They significantly reduce the size of an image and, in turn, the time to upload them.
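To make the size and upload-time figures above concrete, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes an uncompressed 24-bit color scan, A4 dimensions of 8.27 x 11.69 inches, and an ADSL upstream rate of 128 kbit/s. The uplink speed is an illustrative assumption, not a figure from any particular provider, and the function names are made up for this example.

```python
# Rough estimate of raw scan size and upload time (illustrative assumptions only).

A4_WIDTH_IN = 8.27    # A4 width in inches
A4_HEIGHT_IN = 11.69  # A4 height in inches


def uncompressed_size_bytes(width_in: float, height_in: float, dpi: int,
                            bits_per_pixel: int = 24) -> int:
    """Size of an uncompressed raster scan in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


def upload_seconds(size_bytes: int, uplink_kbps: float) -> float:
    """Ideal transfer time, ignoring protocol overhead and retries."""
    return size_bytes * 8 / (uplink_kbps * 1000)


size = uncompressed_size_bytes(A4_WIDTH_IN, A4_HEIGHT_IN, dpi=200)
seconds = upload_seconds(size, uplink_kbps=128)  # assumed ADSL upstream rate

print(f"Raw size: {size / 1_000_000:.1f} MB")
print(f"Upload time at 128 kbit/s: {seconds / 60:.1f} minutes")
```

Under these assumptions the raw scan is about 11.6 MB and needs roughly 12 minutes to upload, which is in line with the figures quoted above. A faster uplink or a compressed format shrinks that time proportionally.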
It’s important to remember that different compression methods have distinct features. For example, JPEG has a high compression rate but is lossy. This can make the JPEG format unsuitable for document images that require high precision. On the other hand, the PNG format is lossless. So, it retains all the information during the compression process, but files are larger than JPEG files. The BMP format is another to consider. The point is, as a rule, you’ll want to provide users with multiple options for compression.
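The difference between lossless and lossy compression can be shown with a small experiment. The sketch below is not a PNG or JPEG encoder. It uses Python's zlib module, which implements DEFLATE, the same lossless algorithm PNG uses internally, and it contrasts that with a toy lossy step that simply throws away the low bits of each pixel. The synthetic "page" and the function names are assumptions made up for illustration.

```python
import zlib


def lossless_roundtrip(raw: bytes, level: int = 9) -> tuple[bytes, bytes]:
    """Compress with DEFLATE and decompress again; returns (compressed, restored)."""
    compressed = zlib.compress(raw, level)
    restored = zlib.decompress(compressed)
    return compressed, restored


def toy_lossy_quantize(raw: bytes) -> bytes:
    """Toy lossy step: keep only the top 4 bits of every byte (not real JPEG)."""
    return bytes(b & 0xF0 for b in raw)


# Synthetic grayscale "document page": mostly white with one dark band.
page = bytearray(b"\xff" * 100_000)
page[5000:5200] = b"\x13" * 200
page = bytes(page)

compressed, restored = lossless_roundtrip(page)
quantized = toy_lossy_quantize(page)

print("lossless restores every byte:", restored == page)
print("compressed is smaller:", len(compressed) < len(page))
print("lossy step changed the data:", quantized != page)
```

The lossless round trip gives back exactly the bytes that went in, while the quantized version can never be turned back into the original. That is the trade-off to weigh when a document image needs to stay precise.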
Many documents have multiple pages. So, it will be important that your TWAIN features support multi-page document saves. If each page is stored as a separate scanned image, retrieving and viewing these documents will involve handling multiple images by users. It’s just plain inconvenient and not ideal for any workflow process. Being able to store all pages of a document in a single file makes it much easier to manage multiple-page documents. This is an absolute necessity in any document management application. Follow the rule to support multi-page documents, for example, to let users save in multi-page TIFF or PDF formats.
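To illustrate what "all pages in a single file" means in practice, here is a minimal sketch of how a multi-page TIFF is laid out: each page gets its own image file directory (IFD), and each IFD points to the next one. The writer below handles only uncompressed 8-bit grayscale pages and includes only the basic tags, so treat it as a teaching aid and not a production encoder. The function names are hypothetical.

```python
import struct

SHORT, LONG = 3, 4  # TIFF field types


def write_multipage_tiff(pages: list[tuple[int, int, bytes]]) -> bytes:
    """Build a little-endian TIFF from (width, height, 8-bit gray pixels) pages."""
    if not pages:
        raise ValueError("at least one page is required")
    # Header: byte order "II", magic number 42, offset of first IFD (patched later).
    out = bytearray(b"II" + struct.pack("<HI", 42, 0))
    link_pos = 4  # position where the offset of the next IFD must be written
    for width, height, pixels in pages:
        if len(pixels) != width * height:
            raise ValueError("pixel data does not match page dimensions")
        data_offset = len(out)
        out += pixels
        if len(out) % 2:
            out += b"\x00"  # IFDs must start on an even offset
        struct.pack_into("<I", out, link_pos, len(out))  # link previous IFD to this one
        entries = [  # (tag, type, value), sorted by tag number
            (256, LONG, width),         # ImageWidth
            (257, LONG, height),        # ImageLength
            (258, SHORT, 8),            # BitsPerSample
            (259, SHORT, 1),            # Compression: none
            (262, SHORT, 1),            # PhotometricInterpretation: BlackIsZero
            (273, LONG, data_offset),   # StripOffsets
            (277, SHORT, 1),            # SamplesPerPixel
            (278, LONG, height),        # RowsPerStrip: whole page in one strip
            (279, LONG, len(pixels)),   # StripByteCounts
        ]
        out += struct.pack("<H", len(entries))
        for tag, field_type, value in entries:
            out += struct.pack("<HHII", tag, field_type, 1, value)
        link_pos = len(out)
        out += struct.pack("<I", 0)  # next-IFD offset; 0 marks the last page
    return bytes(out)


def count_pages(tiff: bytes) -> int:
    """Walk the IFD chain and count the pages."""
    offset = struct.unpack_from("<I", tiff, 4)[0]
    pages = 0
    while offset:
        entry_count = struct.unpack_from("<H", tiff, offset)[0]
        pages += 1
        offset = struct.unpack_from("<I", tiff, offset + 2 + 12 * entry_count)[0]
    return pages


document = write_multipage_tiff([
    (2, 2, b"\x00\xff\xff\x00"),  # page 1: tiny 2x2 checker pattern
    (3, 1, b"\x00\x80\xff"),      # page 2: 3x1 gradient
])
print("pages in file:", count_pages(document))
```

A real application would normally rely on an imaging library or SDK for this, but the chain of linked page directories is the reason one TIFF file can hold an entire scanned document.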
Security for the data you are scanning will also be important. Users in healthcare, financial and government organizations will likely scan sensitive information. This can include people’s social security numbers, driver’s license numbers, personal addresses, health records, and more. So, the ActiveX / plug-in component you want users to install must be secure. If a user downloads and installs an insecure component on their computer, hackers may be able to exploit it to gain control of that computer.
When a publisher of such a component marks it as “safe for scripting,” they are promising that the component will not intentionally harm the end users’ computer system. If it does intentionally damage the system, the publisher assumes legal responsibility. If the component is digitally signed, a dialog box with the publisher's legal name will appear when a person uses the scanning component for the first time. If the component is ever altered after the publisher has signed it, the digital signature will be broken and the user will be informed. This makes it practically impossible for a signed component to be infected by a virus or maliciously changed by hackers without the change being detected.
At the same time, if a component is not marked as safe for scripting or is not digitally signed, the default setting of popular browsers – like Internet Explorer and Firefox – will simply prevent the component from downloading or initializing. So, your users will not be able to use it. Follow the rule to ensure your component is digitally signed and marked safe for scripting.
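The tamper detection described above can be sketched in a few lines. Real code signing uses the publisher's private key and a certificate, which the browser or operating system checks. The example below is a deliberately simplified stand-in that uses an HMAC with a shared secret, purely to show why any change to a signed file breaks its signature. The key, the component bytes, and the function names are all made up for illustration.

```python
import hashlib
import hmac


def sign_component(component: bytes, key: bytes) -> str:
    """Return a hex signature over the component bytes (HMAC-SHA256 stand-in)."""
    return hmac.new(key, component, hashlib.sha256).hexdigest()


def verify_component(component: bytes, key: bytes, signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    return hmac.compare_digest(sign_component(component, key), signature)


publisher_key = b"hypothetical-publisher-secret"    # assumption for the sketch
original = b"\x4d\x5a...scanner component bytes..."  # placeholder content
signature = sign_component(original, publisher_key)

tampered = original + b"\x90"  # a single extra byte is enough

print("original verifies:", verify_component(original, publisher_key, signature))
print("tampered verifies:", verify_component(tampered, publisher_key, signature))
```

The original passes verification and the altered copy does not. Certificate-based signing adds one thing this sketch lacks: anyone can verify the signature with the publisher's public key, without ever holding a secret.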
Speaking of browsers, in many cases, you will not know which browser and operating system your end users will use. It could be Microsoft Internet Explorer 32-bit or 64-bit, Firefox, Chrome, or others. And it could be on a Windows or Mac machine. If you only support one or two browsers, or one operating system, you significantly limit your ability to reach a broad range of users. Today, you may know which web browser or operating system your users will leverage. However, it's highly likely you will need to expand your web scanning application to other browsers and operating systems in the future. So, the next rule to follow is to ensure your image capture web application supports all popular browsers and operating systems.
Your next consideration will be to ensure your scanning component uses standard technology protocols and supports all major web servers. HTTP and FTP are well-known standard Internet protocols for communication between machines. A component that uses any nonstandard protocol seriously increases development time and deployment costs. It can also add unnecessary complexity. It’s possible your existing infrastructure requires you to use a specific web server. Verify this. A component that uses an upload and download protocol that's not compatible with your chosen web server can cause serious headaches. As a general rule, you want to ensure you use standardized protocols in your application.
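As an illustration of sticking to standard protocols, a scanned file can be sent with an ordinary HTTP POST using the multipart/form-data encoding that web servers and frameworks already understand. The sketch below only builds the request and does not send it. The URL, the form field name "scan", and the function names are assumptions for the example.

```python
import urllib.request
import uuid


def build_multipart(field_name: str, filename: str, payload: bytes,
                    content_type: str = "application/octet-stream") -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; returns (body, Content-Type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def make_upload_request(url: str, filename: str, payload: bytes) -> urllib.request.Request:
    """Prepare (but do not send) an HTTP POST carrying the scanned file."""
    body, content_type = build_multipart("scan", filename, payload, "image/tiff")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": content_type, "Content-Length": str(len(body))},
    )


# Hypothetical endpoint; replace with your own server's upload URL.
request = make_upload_request("https://example.com/upload", "invoice.tif", b"II*\x00demo")
print(request.get_method(), request.full_url, len(request.data), "bytes")
```

Passing the prepared request to urllib.request.urlopen would perform the upload. Because the body is plain multipart/form-data, the receiving side can be any mainstream web server or framework without custom protocol handling.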
The TWAIN standard has two decades of maturation behind it. This makes it a solid bet as the foundation for your document management web application that needs good image capture features. But, there are many factors to consider before diving in. Developers must first decide if they want to embark on a long and costly journey to develop from scratch so they can customize the heck out of things. Or, should they significantly expedite development time and reduce costs with an off-the-shelf image capture software development kit?
Whatever they choose, the finished application had better be fast, from the downloading of required components to the user interface. The user interface must also allow advanced features, like previewing, editing and more. Next, make sure to develop the application to support popular compression and multi-page formats. These are critical to the application’s performance and to the benefits it fosters in streamlining workflow processes. Security and flexibility are also vital. As a rule, ensure your components are digitally signed and marked safe for scripting. Also, as a rule, your application should support all popular browsers and the Windows and Mac operating systems. While you may only have users on one browser or another, or on one operating system, things can and do change in the blink of an eye.
Speaking of change, the TWAIN Working Group announced TWAIN version 2.3 in December 2013. Regardless of whether you’re going to develop from scratch or not, you’ll want to have at least a high-level overview of the many standards related to image capture. Here are some links to get you started:
TWAIN Working Group
Joint Photographic Experts Group (JPEG)
Portable Document Format (PDF)
Tagged Image File Format (TIFF)
Portable Network Graphics (PNG)
Author bio: Catherine Sea is the customer service manager for Dynamsoft. She regularly blogs to provide tutorials on TWAIN, image capture applications, and version control best practices. She has also worked as a consultant to programmers developing document management applications. http://www.dynamsoft.com/.