In my last article I spoke with Shawn DeWolfe, a web developer in Victoria, B.C., Canada. Since our discussion went so well, I decided to contact him again about this story. Here’s what he had to say:
Shawn: “When you’re calling a remote source, you don’t want it to see one call per web page load. Here’s what happens on the server side when you ask for a web page: the web page goes to the API, the API works from the originating source and sends its response, and your code hooks that response, gives it to the web page for the build, and pushes it out. You’re adding 1 to 30 seconds onto a web page request. To solve the problem, don’t make calls at a one-to-one ratio. Ideally, how often you call should depend on how often the data you are requesting changes.”
Nathan: “Like a cache.”
Shawn: “Yes, you should build something like a cache and make sure it doesn’t run afoul of their API terms of use. A lot of them will say you can’t take their data; you can only use their data. There will be a loophole in the middle. You will be able to pull stuff from a cache, but only if you keep it for a short lifespan, so you don’t pound them with API requests. You can store it for a bit, but almost all of them will disallow you from keeping what you pull from an API. If you were to have the data for a week, that would be thought of as keeping it, but if you were to have it in a cache for half an hour, that’s something probably no one would have an issue with.”
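As a rough illustration of the short-lived cache Shawn describes, here is a minimal Python sketch. The `TTLCache` class, the `fetch_listings` function, and the `"listings"` key are hypothetical names invented for this example, and the half-hour lifespan is simply the figure from the interview; the API’s actual terms of use decide what is allowed.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Keep API responses for a short, fixed lifespan (in seconds)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so the cache can be tested without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # still fresh: no remote call
        value = fetch()  # missing or expired: exactly one remote call
        self._store[key] = (now, value)
        return value


# Hypothetical stand-in for a slow remote API call; it records each call.
calls = []


def fetch_listings():
    calls.append(1)
    return {"items": ["a", "b"]}


cache = TTLCache(ttl_seconds=30 * 60)  # half an hour, as in the interview
for _ in range(100):  # simulate 100 page loads
    cache.get_or_fetch("listings", fetch_listings)

print(len(calls))  # 1 remote call instead of 100
```

The point of the design is that the number of remote calls follows the cache lifespan rather than the number of page loads.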
“The other thing about APIs is whether you have anything on the client side, such as a JavaScript request for a Facebook, Twitter, or eBay widget. If they’re JavaScript, you may wind up with an accidental white screen. If the client-side API is exceptionally poor, you’ll make your call and load your page, and what happens with the remote API calls is that they call for a widget and hang on for a second. Before the browser builds the page, it wants to make sure it has everything needed to build the page. It will stop the build of the page, request the API, get the widget or whatever is meant to load in that space, bring it in, and only then build the rest of the page.”
“In one case, a web page had a JavaScript API piece at the bottom that by itself was adding seven seconds to the page load. So you’re doing everything you can to load a page quickly, and you may get it down to 5 seconds; then this is added to the mix and the page load time goes from 5 to 12 seconds. Make sure your client-side stuff is fast, like Google.”
“Another thing: if you have any pull with the people providing the API, ask them to push out their output as soon as possible. The trip across the wire takes some time and processing takes some time, and some places build their response so that it gets the input, works on the input, gets everything ready to go, and only then pushes out the output.”
“While you’re waiting for the output, it will not give you any response. If you look at a web developer tool, it will show you different bars: when the request was made, how long the response took, how much data came back. Occasionally you will encounter a massive waiting bar between the request and the response. That’s the hang time.”
“The solution in PHP is the Object Flush, where you flush out your data even before you have the full response. It makes a handshake and says, ‘Hi, I think I’ve got stuff for you,’ and it does this little dance. That little dance is where the data begins, but the data begins much sooner and the transport of the data still happens. So if the transport of the data takes 2 seconds and there are 2 seconds of hang time, and you start the transport early while still allowing a second of padding, it’s a 3-second trip instead of a 4-second trip. This is where you push out the output as fast as possible. What you’re improving is called ‘time to first byte.’”
“Time to first byte is how long it takes from the request until the first byte of the response comes back. Drupal is notoriously bad with its time to first byte, so whenever I do work I knock that down to as little as possible.”
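Shawn’s “Object Flush” presumably refers to PHP’s output flushing (`ob_flush()` and `flush()`). The same idea can be sketched in Python with generators, which is an assumption of this example rather than anything from the interview: send the part of the response that is already known, then do the slow work. The function names, the `FakeClock` helper, and the 2-second delay are all hypothetical, and the clock is simulated so the example runs instantly.

```python
from typing import Callable, Iterator


def buffered_response(slow_work: Callable[[], str]) -> Iterator[str]:
    """Do all the work first, then send everything in one piece."""
    body = slow_work()
    yield "<html><body>" + body + "</body></html>"


def flushed_response(slow_work: Callable[[], str]) -> Iterator[str]:
    """Send what is already known, then do the slow work."""
    yield "<html><body>"  # goes out before the slow work starts
    yield slow_work()
    yield "</body></html>"


def time_to_first_chunk(response: Iterator[str],
                        clock: Callable[[], float]) -> float:
    """Elapsed time until the first piece of the response is available."""
    start = clock()
    next(response)
    return clock() - start


class FakeClock:
    """Simulated clock so the example needs no real waiting."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


clock = FakeClock()


def slow_work() -> str:
    clock.advance(2.0)  # pretend the processing takes 2 seconds
    return "<p>42 results</p>"


print(time_to_first_chunk(buffered_response(slow_work), clock))  # 2.0
print(time_to_first_chunk(flushed_response(slow_work), clock))   # 0.0
```

Both versions produce the same page in the end; only the moment at which the first bytes can leave the server differs.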
Nathan: “Some of what you’re talking about reminds me of progressive JPEG images where the image loads a bit at a time, letting you know the rest is coming.”
Shawn: “That’s what the Object Flush would do. If you have any pull with API developers, you can suggest that. Another thing is to ask the API developers if they have a CDN (Content Distribution Network), because all those seconds count. See if you can find a source to call that has fewer Internet hops between your server and their server. If it’s a really important API source, you may want to relocate your data and put it really close to the API, so the calls have as little hang time as possible. If you’re talking about 30 hops, you can be dealing with seconds just making the request. Moving closer would fix that.”
Nathan: “It’s reminiscent of interviewing someone on the other side of the world. You’re dealing with the speed of light calculation, the time up/down from the satellite to another part of the Earth and it’s literally 3-4 seconds.”
Shawn: “There’s a lot of hang time.”
Nathan: “These days you don’t see too much of that anymore.”
Shawn: “I think that’s because they’re smarter with the way they move the packets around.”
Nathan: “It seems that with the API the challenge is – how do we close the gap?”
Shawn: “Well, one way to do it is with the Object Flush, which pushes out data sooner than otherwise. Another is proximity: if you have an API coming from Amazon and you’re hosting on Amazon, you’re almost in the same building. You’re in the same cloud. Another thing is looking at an easier-to-digest format, like JSON (JavaScript Object Notation), as opposed to XML.”
“I find that JSON is a lot more forgiving and sturdier than XML. With JSON you can push raw API results through your response. So if you have data from your server talking to JavaScript on your page, you can push JSON data through without processing it. You don’t have to go through XML parsing, and you don’t have to make sure the data comes through without any problems.”
“In contrast, XML is a harder format to work with, and things get lost with the formatting. With JSON, every object and every array is very flexible. Other formats include CSV: you can bring back CSV values, and in some places you can bring back finished HTML, which in some cases is ideal.”
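To make the JSON versus XML comparison concrete, here is a small Python sketch using only the standard library. The two payloads are made-up examples carrying the same data; they do not come from any real API.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical payloads carrying the same two records.
json_payload = '{"items": [{"id": 1, "title": "Lamp"}, {"id": 2, "title": "Desk"}]}'
xml_payload = (
    "<items>"
    "<item><id>1</id><title>Lamp</title></item>"
    "<item><id>2</id><title>Desk</title></item>"
    "</items>"
)

# JSON: one call yields native dicts, lists, ints, and strings.
from_json = json.loads(json_payload)["items"]

# XML: walk the tree and convert types by hand (everything arrives as text).
from_xml = [
    {"id": int(item.findtext("id")), "title": item.findtext("title")}
    for item in ET.fromstring(xml_payload).findall("item")
]

print(from_json == from_xml)  # True
```

The XML version works, but the structure and the types have to be rebuilt field by field, which is the extra processing Shawn is referring to.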
Conclusion
In sum, here are the seven mistakes developers make when working with third-party APIs:
1. Not watching for client-side problems
2. Not caching your copies (or caching them for too long)
3. Not watching for hang time
4. Not being physically close
5. Not using JSON over XML
6. Not checking the legalities. Make sure the service agreement for the API you’re pulling from allows you to legally use the data the way you want to use it. If you have to bring it in and use it in an unaltered fashion, work out how that works for you, because you will be creating traffic and momentum for whoever runs the API when what you really want is to satisfy your own goal.
7. Not testing your code, a mistake so many developers make. In this case, not only is testing important, it should be done under artificially bad conditions.
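The point about testing under artificially bad conditions can be sketched as follows. This is a minimal Python example built on assumptions: `load_widget_data` and the four fake API functions are hypothetical names, and the expected response shape (a dict with an `"items"` key) is invented for illustration.

```python
from typing import Any, Callable, Dict, Optional


def load_widget_data(fetch: Callable[[], Any],
                     last_good: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return fresh API data, or fall back so the page still renders."""
    try:
        data = fetch()
        if not isinstance(data, dict) or "items" not in data:
            raise ValueError("malformed API response")
        return {"items": data["items"], "stale": False}
    except (TimeoutError, ConnectionError, ValueError):
        fallback = last_good or {"items": []}
        return {"items": fallback["items"], "stale": True}


# Hypothetical fakes that simulate a healthy API and three kinds of failure.
def api_is_healthy():
    return {"items": ["a", "b"]}


def api_times_out():
    raise TimeoutError("simulated 30-second hang")


def api_is_down():
    raise ConnectionError("simulated outage")


def api_returns_garbage():
    return {"error": "rate limit exceeded"}


for fake in (api_is_healthy, api_times_out, api_is_down, api_returns_garbage):
    print(fake.__name__, load_widget_data(fake, last_good={"items": ["cached"]}))
```

Swapping the real API call for fakes like these makes it possible to check, before launch, that a slow or broken third party leaves you with a stale widget rather than a white screen.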