For a couple of projects recently, I've ended up reverse-engineering website APIs, and then writing scripts to control sites using those APIs.
This is a black art that we can only hope will soon fade away as the web's new architecture of communicating services takes hold. But Internet time ain't what it used to be, so I expect I'll be doing this kind of thing for a while yet. Here are some examples of what I mean:
Automating common tasks in a Web-based issue-tracker.
A couple of weeks ago I mentioned RequestTracker. It's really handy, but the novelty of pointing and clicking wears thin when you're trying to process dozens or hundreds of similar items. So, I've written a script to power through these chores.
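To give a flavor of what such a script does, here is a minimal Python sketch that builds the same form submission a browser would send, once per ticket. The URL and the form field names (`id`, `status`) are made up for illustration; they are not RequestTracker's actual form, which you would have to discover by inspecting the site's pages.

```python
import urllib.request
from urllib.parse import urlencode


def build_update_request(form_url, ticket_id, fields):
    """Build (but do not send) the POST a browser would submit for one ticket.

    form_url and the field names are hypothetical placeholders.
    """
    form = {"id": ticket_id, **fields}
    body = urlencode(form).encode("ascii")
    return urllib.request.Request(form_url, data=body, method="POST")


# One request per ticket replaces one round of pointing and clicking.
pending = [
    build_update_request(
        "http://tracker.example/update",  # hypothetical URL
        ticket_id,
        {"status": "resolved"},           # hypothetical field
    )
    for ticket_id in (101, 102, 103)
]
```

Each request could then be passed to `urllib.request.urlopen` to submit it. Building the requests separately from sending them makes it easy to inspect what the script is about to do before it touches a live tracker.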
Verifying website security.
One of my projects is a site that reacts to certain kinds of spider activity. The best way to test these defenses is to probe with a spider that impersonates an authenticated user.
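A test spider of this kind can be sketched in a few lines of Python. This is an illustration under assumptions, not the script I actually use: it assumes the site tracks logins with a session cookie, and the cookie value, user-agent string, and URLs are placeholders.

```python
import urllib.error
import urllib.request


def build_authenticated_request(url, session_cookie, user_agent="test-spider/0.1"):
    """Build a GET request that presents a logged-in user's session cookie."""
    headers = {
        "Cookie": session_cookie,   # assumed: the site uses cookie-based sessions
        "User-Agent": user_agent,   # placeholder spider name
    }
    return urllib.request.Request(url, headers=headers)


def crawl(urls, session_cookie, opener=urllib.request.urlopen):
    """Fetch each URL in turn and record the HTTP status the site returns."""
    results = []
    for url in urls:
        request = build_authenticated_request(url, session_cookie)
        try:
            with opener(request) as response:
                results.append((url, response.status))
        except urllib.error.HTTPError as err:
            # A 403 or similar here may be the site's defenses reacting.
            results.append((url, err.code))
    return results
```

Run against a list of pages, `crawl` returns a status code for each one, so you can see at what point the site starts refusing the spider. The `opener` parameter exists only so the function can be exercised without a network.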
Reformulating Web statistics.
For another project, I'm reformulating Web statistics. This, by the way, is a perfect example of the kind of problem that SOAP-style interfaces will solve. DON'T lock users into a specific HTML presentation. DO offer interfaces, use them yourself to create a default presentation, but let others use them directly to create alternate presentations. That's the vision, anyway. In reality, I'll bet the Web stats reprocessor I wrote yesterday won't be my last.
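The DO and DON'T above can be illustrated with a small Python sketch. The log format (a path followed by a status code) and the function names are invented for the example. The point is only the shape: one function exposes the data, and any number of presentations sit on top of it.

```python
from html import escape


def page_hits(log_lines):
    """The interface: return hit counts per path as plain data."""
    counts = {}
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        path = fields[0]
        counts[path] = counts.get(path, 0) + 1
    return counts


def render_html(counts):
    """The default presentation, built on the same interface anyone can call."""
    rows = "".join(
        f"<tr><td>{escape(path)}</td><td>{n}</td></tr>"
        for path, n in sorted(counts.items())
    )
    return f"<table>{rows}</table>"


def render_csv(counts):
    """An alternate presentation that never has to parse the HTML."""
    return "\n".join(f"{path},{n}" for path, n in sorted(counts.items()))
```

When a site offers only the equivalent of `render_html`, anyone who wants the CSV view has to scrape the table back apart, which is exactly what a stats reprocessor ends up doing.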
In cases like these, the name of the game is to first discover, and then use, the website's API. The fact that websites have APIs, even when they don't intend to, is one of the most remarkable aspects of the first-generation Web. I've shown elsewhere how it's possible to build novel Web services using existing sites (AltaVista, Yahoo) as components. So what comprises a website's implicit API? Basically, just these things: