
In Search of SSL Spidering


February 1998 / Web Project / In Search of SSL Spidering

Building tools that fetch and process secure pages, I learn some painful lessons about WinInet and SSLeay.

Jon Udell

I was amazed to discover that peer-to-peer networking of Web sites was not only possible, but downright simple. This month, I'll show how we use this technology to monitor our own site and to extract information from a secure server at a partner's site.

Dave's Site Monitor

The BYTE Site is a cluster of Windows NT and Unix machines that provides a growing number of public and private services. A while ago, I realized that we needed a unified way to monitor the health of these services. I also realized that the many Web servers, FTP servers, NNTP servers, and mail servers that we operate do not in themselves define the list of services that users expect us to support.

For example, our search server runs two search engines in parallel -- SWISH and Excite. It's not enough to know that the NT4.0 machine that hosts these engines is healthy, or even that the Internet Information Server (IIS) 3.0 Web server on that machine is alive and kicking. What matters to users, ultimately, is whether SWISH and Excite are working as advertised.

By a happy coincidence, the two properties that make each of our services available to users -- a URL and a corresponding Web page -- are just the things you need to build a robotic tester. I put the challenge to my associate Dave Rowell. The screen shows the incredibly useful solution Dave came up with.

Our site monitor is a Perl script, scheduled to run every half hour, that's driven by a table of address/result string pairs. One of the addresses is our home page, http://www.byte.com/ , and the corresponding result string is a piece of text that appears on that page.

The monitor fetches the page, and if it contains the expected result string, that test produces a green-light icon. If the home page takes longer than expected to arrive, the icon will instead be yellow, and the monitor may or may not issue an alert, depending on how we have configured the alarm threshold. If the home page never arrives, the icon will be red -- and in that case, the monitor will send e-mail to the new media team and to my pager.
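The fetch-and-classify loop described above can be sketched roughly as follows. This is a Python illustration, not Dave's Perl script. The table contents, the 10-second threshold, and every function name are assumptions made for the example. The sketch also assumes that a page arriving without the expected string counts as red, a case the description above leaves open.

```python
import http.client
import time
import urllib.request
from typing import Callable, List, Optional, Tuple

# Hypothetical table of address/result-string pairs.
CHECKS = [
    ("http://www.byte.com/", "BYTE"),
]

SLOW_AFTER_SECONDS = 10.0  # assumed alarm threshold, not from the article


def fetch(url: str, timeout: float = 30.0) -> Optional[str]:
    """Return the page body, or None if the page never arrives."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("latin-1")
    except (OSError, http.client.HTTPException):
        return None


def classify(body: Optional[str], elapsed: float, expected: str,
             slow_after: float = SLOW_AFTER_SECONDS) -> str:
    """Map one test result onto a green, yellow, or red light."""
    if body is None or expected not in body:
        return "red"      # missing page, or (assumed) wrong content
    if elapsed > slow_after:
        return "yellow"   # correct page, but slower than the threshold
    return "green"


def run_checks(checks: List[Tuple[str, str]],
               fetcher: Callable[[str], Optional[str]] = fetch,
               clock: Callable[[], float] = time.monotonic):
    """Fetch every address in the table and time each response."""
    results = []
    for url, expected in checks:
        start = clock()
        body = fetcher(url)
        elapsed = clock() - start
        results.append((url, classify(body, elapsed, expected), elapsed))
    return results
```

Calling `run_checks(CHECKS)` from a scheduler every half hour would yield one status per table row. Sending e-mail or paging on a red result is left out of the sketch.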

There's more to the monitor than Web-page fetching. It watches the free space on various drives. It checks to see that log files expected to have been written by scheduled backup processes were, in fact, written. We'd like to add a module that pulls log entries from key systems -- both NT event logs and Unix syslog files -- and reacts to anomalies that would be defined in the monitor's configuration file. But if you strip away all the bells and whistles, what's left is a table-driven automatic URL fetcher that alerts you when static or dynamic pages arrive slowly, incorrectly, or not at all.
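The two non-Web checks mentioned here, free drive space and backup logs that should have been written, might look something like this in Python. The function names and thresholds are hypothetical. The sketch treats "was written" as "was modified recently", which is one plausible reading rather than a description of the actual monitor.

```python
import os
import shutil
import time
from typing import Optional


def free_space_ok(path: str, min_free_bytes: int) -> bool:
    """True if the drive holding `path` has at least min_free_bytes free."""
    return shutil.disk_usage(path).free >= min_free_bytes


def log_is_fresh(path: str, max_age_seconds: float,
                 now: Optional[float] = None) -> bool:
    """True if `path` exists and was modified within max_age_seconds."""
    now = time.time() if now is None else now
    try:
        modified = os.path.getmtime(path)
    except OSError:
        return False  # a missing log means the backup did not write it
    return (now - modified) <= max_age_seconds
```

A nightly backup log, for example, could be tested with `log_is_fresh(path, 24 * 3600)`.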

This tool has revolutionized our ability to monitor the health of our site, and we've come to depend on it. Therefore, it was a matter of some concern when, a few weeks ago, the monitor began issuing a stream of spurious alerts.

Why the Site Monitor Failed

The service that the monitor thought had failed was the secure ordering system on our primary Web server. The https:// URL that defines this part of the test was triggering a red-light response. And yet, whenever we fetched that URL interactively, all was OK. Neither Dave nor I could remember installing any software or making any configuration changes on the NT4.0 machine that hosts the monitor. What could have gone wrong?

Finally, Dave noticed that the target URL was expressed in the monitor's configuration file as an IP address rather than a Domain Name System (DNS) name. We switched to the DNS name, and that restored the errant test to green-light status. But why? That remained a mystery.

I should explain that when I proposed this project to Dave, I forgot to mention that he needn't bother trying to test the parts of our site accessed by way of Secure Sockets Layer (SSL). Not knowing that an SSL page wouldn't work, Dave included one in the tests -- and to my surprise, it did work. It worked because the URL-fetching tool I recommended to him was Win32::Internet. This is a Perl module that talks to WININET.DLL.

The WinInet library comes with Microsoft's Internet Explorer (MSIE) and is also available separately as part of the ActiveX Software Development Kit (SDK). Among WinInet's functions are those that MSIE itself used to access FTP, HTTP, and HTTPS (secure) URLs. This is powerful stuff! When Microsoft introduced WinInet a few years ago, it was billed as a generic Win32 component -- like ODBC, but for Internet rather than SQL data sources.

True, it came with MSIE, but many important Microsoft components have debuted in an application context and gone on to become integral parts of Windows. That Dave's monitor could quite unexpectedly fetch a secure page convinced me that WinInet was indeed, like ODBC, the kind of subsystem that makes Windows increasingly useful.

A few days later, while working on a different tool, I stumbled onto the real reason the monitor had failed. The new tool's job was to fetch reports from a partner site, over an SSL connection, and consolidate the data (see the figure "Using SSL-Enabled Web Client Technology"). It worked on one Win32 system, but not on another. One difference between the two, I noticed, was MSIE. The tool worked on the system that had MSIE installed, but not on the one to which I had copied only WININET.DLL. Was the tool depending not only on WinInet, but also on MSIE? It was. Moreover, I found I could break the working version of this tool by twiddling settings in MSIE's Advanced Internet Options dialog box!

Armed with this knowledge, we went back to have another look at the machine that hosts the site monitor. Was MSIE installed there? Yes. Then, suddenly, I saw what had happened. Dave and I had not installed any software on the machine the day we broke the monitor, but we had used MSIE along with Navigator to try out an experimental new feature of our site.

In the process, we had visited MSIE's Advanced Internet Options screen and changed some settings. One of these was "Warn about invalid site certificates" -- the setting that causes MSIE to complain if the X.509 common name in a server certificate doesn't match the host name in a request for an SSL page. What looked to us like a small reconfiguration of MSIE was, in fact, a broader reconfiguration of WinInet -- and therefore of Win32::Internet and our site monitor. Call me old-fashioned, but I don't think system components ought to behave this way.
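A simplified Python sketch of that certificate check suggests why the IP-address form of the URL would trip the warning while the DNS name would not. The function name, the path, and the sample IP address are made up for illustration. Real SSL clients apply more rules than this, such as wildcard names and subjectAltName entries.

```python
from urllib.parse import urlsplit


def common_name_matches(url: str, common_name: str) -> bool:
    """Compare the host in the requested URL with a certificate's common name.

    Simplified: exact, case-insensitive comparison only.
    """
    host = urlsplit(url).hostname or ""
    return host.lower() == common_name.lower()


# A certificate issued to a DNS name matches a request that uses that name...
print(common_name_matches("https://www.byte.com/orders", "www.byte.com"))
# ...but not a request that addresses the same server by IP address.
print(common_name_matches("https://192.0.2.10/orders", "www.byte.com"))
```

With the warning enabled, a mismatch like the second case would presumably surface as an error to an unattended program, since nobody is there to dismiss the dialog box.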

Maybe I shouldn't have been surprised. If MSIE is part of the OS, perhaps it makes sense that you reconfigure the OS when you reconfigure MSIE. There is precedent for this in Windows. In dial-up networking, for example, I can start by tweaking a particular dial-up connection and end up tweaking systemwide modem properties. Not a great idea to have set this precedent, perhaps, but there it is. What's more, MSIE's Advanced Internet Options dialog box does double duty as Settings->Control Panel->Internet, a systemwide configuration tool.

And yet, I can't help but believe there's something not quite right here. Suppose some database application came with a Settings dialog box that let you tweak the dialect of SQL spoken by ODBC -- and thereby affect all other ODBC-dependent applications? Would anyone agree this is right and proper?

Untangling the WinInet/MSIE Dependencies

In theory, I should be able to build a version of the report consolidator that runs on any vanilla Windows 95 or NT machine (i.e., a machine with no copy of MSIE installed). In practice, I never did figure out how to do this. My WinInet-based software never seemed to behave predictably when I moved it between MSIE and non-MSIE machines.

Here's an example. At one point, I noticed that my program, if given no proxy or basic authorization credentials, would pop up a dialog box soliciting this information. Further investigation showed that only the new (MSIE4) version of WinInet does this. Maybe I just needed to distribute the newer WININET.DLL with my stand-alone program?

I tried that. However, WinInet griped that it couldn't find a routine in another DLL, SHLWAPI.DLL. Should I distribute that one, too?

Clearly, I was on the slippery slope at this point. A quick Dejanews search turned up a litany of woes related to SHLWAPI, WININET, and MSIE. And, of course, since Windows locks down SHLWAPI.DLL at start-up, this wasn't going to be a smooth installation in the best of circumstances. A prospective user of my program would need instructions such as:

1. Reboot.
2. Hit F8.
3. Pick item 6 (Command Prompt) from the menu.
4. Copy a:shlwapi.dll c:\windows\system.
5. Reboot again.

Not too nice. But in the name of science I performed the experiment, and the results looked hopeful. Now WinInet had whatever it wanted in SHLWAPI.DLL. And I got a dialog box for stage 1 of authentication: the proxy server's challenge. However, at stage 2 of authentication, the final destination SSL server, oops...error in KERNEL32.DLL.

I tried a lot of things. In addition to Perl's Win32::Internet, which apparently does not handle proxies or basic authentication in an SSL context, I experimented with a couple of C++ programs that use WinInet. Both worked on MSIE machines. Neither one worked on non-MSIE machines. Eventually, I gave up and went looking for some other solutions.

Alternatives: SSLeay and IAIK-SSL

My report consolidator is a two-stroke engine. First, it fetches secure pages, and then it launches Perl to consolidate the data. What's another way to do the first stroke? Browsers can fetch pages and can also behave like components. For example, you can drive MSIE or Navigator, on Windows, using OLE Automation.

However, this didn't look like an attractive option. I wasn't sure that OLE Automation could handle the authentication dialog boxes. Moreover, the process actually needs to fetch two secure pages, each requiring a different set of credentials. When you do this manually, you have to quit the browser after the first fetch, in order to reset the credentials for the second, and this stop-and-restart scenario would have to be replayed under OLE Automation.

Next I tried SSLeay, Eric Young's freeware implementation of SSL. It was a snap to build SSLeay on a Silicon Graphics Irix system, and equally straightforward to adapt sconnect, a sample page fetcher included with the kit, to handle proxies and basic authentication.
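For readers wondering what "handling proxies and basic authentication" involves at the protocol level, here is a rough Python sketch of the two requests such a fetcher sends. It is not sconnect's code. The function names are hypothetical, and the SSL handshake that takes place between the two requests is omitted entirely.

```python
import base64
from typing import Optional


def basic_credentials(user: str, password: str) -> str:
    """Build the value of a Basic authentication header."""
    token = base64.b64encode(f"{user}:{password}".encode("latin-1"))
    return "Basic " + token.decode("ascii")


def connect_request(host: str, port: int,
                    proxy_user: Optional[str] = None,
                    proxy_password: Optional[str] = None) -> bytes:
    """Stage 1: ask the proxy to open a tunnel to the secure server."""
    lines = [f"CONNECT {host}:{port} HTTP/1.0"]
    if proxy_user is not None:
        lines.append("Proxy-Authorization: "
                     + basic_credentials(proxy_user, proxy_password or ""))
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def get_request(host: str, path: str,
                user: Optional[str] = None,
                password: Optional[str] = None) -> bytes:
    """Stage 2: the page request sent through the tunnel, after the SSL handshake."""
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}"]
    if user is not None:
        lines.append("Authorization: " + basic_credentials(user, password or ""))
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```

Because the credentials are ordinary arguments, fetching two pages that need two different sets of credentials is just two calls, with no browser to quit and restart in between.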

However, my colleague runs Windows 95, not Irix, so I next tried the Win32 make file included with SSLeay. It has been recently upgraded and is now as competent as the Unix version. In short order, I had a redistributable package that I could send my colleague -- the SSLeay Win32 libraries, a version of sconnect, and a Perl script to consolidate the pages that were brought back by sconnect. Everything worked like a charm on a vanilla Windows 95 machine.

Unfortunately, I soon learned I could not send that package to anyone. When I went back and read the fine print, I realized I'd probably broken the law. Inside the U.S., use of SSLeay infringes patents held by RSA Data Security.

The right thing to do, according to Eric Young and RSA, is to link SSLeay with RSAREF, a reference implementation of the patented algorithms that RSA makes freely available. That's easier said than done, however.

Another alternative, which was mentioned in last month's Toolwatch, is IAIK-SSL. It's a snap to build an automatic SSL client with IAIK-SSL. But like SSLeay, this toolkit also does not use code licensed from RSA, and thus constitutes patent infringement if used in the U.S. What's more, there's no available Java counterpart to RSAREF. RSA offers JSAFE, but it's pricey.

Sun's Java Electronic Commerce Framework, which is now in alpha release, is another possibility. However, it's probably overkill for simple chores such as my report consolidator. Web-spidering technology is, from a certain perspective, the ultimate middleware. Tools that enable Web spidering are simple, universal, and powerful.

Unfortunately, and for a variety of reasons, it's not straightforward to use these tools in SSL environments. I hope that this situation improves. Many of the most interesting applications on my to-do list will benefit from, if not require, secure channels.

So how did I deliver my report consolidator? Well, actually I didn't. For now, I've delivered only the Perl postprocessor; my colleague will have to fetch the pages interactively. I make the same application available to our sales staff, but without a client component. It runs on an NT server, where, provided I'm careful about how I use MSIE, I can leverage WinInet. This certainly isn't a very pretty picture. If there are better solutions, I'd like to hear about them.


TOOLWATCH

Hamilton C Shell.........................$350
Hamilton Laboratories
Internet: http://www.hamiltonlabs.com/

All your Unix friends -- csh, cron, du, ls, grep, and many more -- are reimplemented in the modern, threaded style of Windows NT. These are high-quality tools for serious professionals.


BOOKNOTE

Web Security and Commerce................$32.95
by Simson Garfinkel and Gene Spafford
O'Reilly and Associates
ISBN 1-565-92269-7
Internet: http://www.ora.com/

A helpful discussion of a wide range of issues, including encryption, SSL, certificates, Authenticode, access control, and legal aspects of cryptography.


Using SSL-Enabled Web Client Technology

When business-to-business networking relies on secure channels, integrators need SSL-aware Web-spidering tools.


Dave's Goldmine

Dave Rowell's invaluable site monitor polls a list of URLs, times the responses, and issues alerts if pages don't arrive promptly.


Jon Udell is BYTE's executive editor for new media. You can reach him at jon@byte.com.
