
Web Surveys


October 1996 / Web Project / Web Surveys

Helpful techniques for Web-based data collection and analysis.

Jon Udell

The BYTE Site is, among other things, a giant survey application. Each of its 6000+ archive pages presents a link to a feedback form. Or rather, as I explained last October ( http://www.byte.com/art/9510/sec9/art1.htm ), to a script that generates a form that's customized for each article. Recently, I began harvesting this data to answer questions like "How highly do site visitors rate State of the Art articles?" and "How often do site visitors say they read BYTE magazine?" We've also run Web surveys to ask visitors about their experiences with ISDN, assessment of uninterruptible power supplies (UPSes), and OS preferences.

Along the way, I've refined the techniques and tools I use to capture, store, and analyze survey data. I hope you'll find them useful for surveys and other kinds of Web-based data collection and analysis.

The Form

I tend to bail out when confronted by a sprawling multipart Web form, and I assume you do, too. So I try to make my own forms as concise as possible -- ideally, just a single screen. The most effective space-saving device is the Hypertext Markup Language (HTML) <select> tag, which renders a list of choices as a drop-down list. The <select> tag does, of course, support multiple selection. You simply add the attribute multiple="yes". However, a multiselect drop-down list looks the same as a single-select one. No visual cues tell you that multiple selection is available or that Ctrl-click is the way to operate it. Explaining these things chews up screen space. If the number of choices is not too large, you may be better off with a self-explanatory set of checkboxes.

Without a sensible default choice, there's a similar trade-off between a drop-down list and checkboxes. A drop-down list always returns a value; a checkbox set may not. To differentiate a passive "no reply" choice from a drop-down list's active choices, you have to encode it explicitly as the default choice. You can use that item to document the drop-down list (e.g., "multiselect from below"). Still, I'm never quite comfortable with the semantic inconsistency of this approach. Again, you need to weigh the compactness of the drop-down list against the elegance of a checkbox set.
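On the receiving side, an explicit default item can be treated as "no answer" before the record is stored. The following is only a sketch: the sentinel value `none` and the helper name `answered` are assumptions for illustration, not part of the BYTE scripts.

```perl
use strict;
use warnings;

# Hypothetical sentinel: the value given to the drop-down list's
# default, documentation-only item.
my $NO_REPLY = 'none';

# Return only the fields the visitor actively answered.
sub answered {
    my (%form) = @_;
    my %out;
    foreach my $name (keys %form) {
        my $value = $form{$name};
        next if !defined $value || $value eq '' || $value eq $NO_REPLY;
        $out{$name} = $value;
    }
    return %out;
}

my %reply = answered(os => 'os2', isdn => 'none', comments => '');
print join(',', sort keys %reply), "\n";   # prints "os"
```

With this convention a drop-down list and a checkbox set end up looking the same downstream: an unanswered question is simply absent from the record.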

To lay out your forms, you can use an HTML editor, or you can just clone an existing form and tweak it with a text editor. I do the latter; HTML widgets are really quite simple. To simplify layout, I enclose elements between a <pre>...</pre> tag pair that specifies monospacing and hard newlines. Within this preformatted region, you can still use <strong> (bold), <em> (italic), or even Netscape <font size> tags to clarify the organization of the form.

The Database

For surveys and other data-collection applications, there's no reason to wire the form directly to a database unless you need the information available in real time. Dumping records into structured text files is a simple, low-overhead, and highly effective solution. For example, the BYTE on-line archive logs feedback as a set of files, each containing a single ASCII-delimited record. The concatenation of this set of files forms a database import file.

When I began harvesting this data, though, I ran into a few limitations. The import records weren't self-descriptive -- that is, they didn't carry field names along with values. Lacking names, an import tool must rely on an ordering of fields defined elsewhere. Many import tools allow a special first row of field names. But including such a row in each record wouldn't solve another problem -- poor handling of multiline text data.

The surveys I run usually ask both quantitative questions ("How much did your ISDN installation cost?") and qualitative ones ("What was your ISDN experience like?"). No one tool is best suited to analysis of these two very different types of data. You want a relational database or statistics package for the quantitative data and a text database (possibly a searchable HTML archive) for the qualitative data. How can you store information destined for either or both of these repositories? I've settled on a text representation of a Perl associative array:

%record = ('name1',"val1",'name2',"val2");

If you process the form data with a Perl Common Gateway Interface (CGI) script that uses cgi-lib ( http://www.bio.cam.ac.uk/cgi-lib/ ) or an equivalent library, you're given an in-memory structure of this type. Unfortunately, Perl lacks a primitive function to ASCII-ize such structures, but it's not difficult:

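# %in holds the form fields as parsed by cgi-lib
print RECORD "%record = (\n";
foreach $f (keys %in) {
  ($v = $in{$f}) =~ s/"/\\"/g;   # copy the value, then escape its double quotes
  print RECORD "'" . $f . "',\"" . $v . "\",\n";
}
print RECORD ");\n";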

As long as you take care to convert double-quote to backslash-double-quote, as shown here, this technique handles multiline text fields nicely.
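Double-quoted Perl strings also interpolate `$` and `@` and interpret backslashes when the file is read back, so a value such as an e-mail address can come back altered. One variation, sketched here as an assumption rather than as the article's method, writes every name and value as a single-quoted literal, where only backslash and single quote need escaping. The helper names `quote_literal` and `serialize_record` are hypothetical.

```perl
use strict;
use warnings;

# Quote a string as a Perl single-quoted literal. Only backslash and
# single quote need escaping; newlines, ", $ and @ pass through as-is.
sub quote_literal {
    my ($text) = @_;
    $text =~ s/([\\'])/\\$1/g;
    return "'$text'";
}

# Render a hash as the text of a "%record = (...);" assignment.
sub serialize_record {
    my (%fields) = @_;
    my $out = "%record = (\n";
    foreach my $name (sort keys %fields) {
        $out .= quote_literal($name) . ',' . quote_literal($fields{$name}) . ",\n";
    }
    return $out . ");\n";
}

print serialize_record(email => 'jon_u@dev5.byte.com', cost => '$300');
```

Multiline text still works, because a single-quoted literal may span lines.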

If the file handle RECORD maps to the file 0001.REC, a subsequent Perl script can parse and reconstitute the %record array with the single statement

do '0001.REC';
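Extending that single statement to a whole directory of records might look like the sketch below. The function name `load_records` and the `.REC` suffix filter are assumptions. The path is made absolute because recent Perl versions no longer search the current directory when `do` is given a bare file name.

```perl
use strict;
use warnings;
use File::Spec;

our %record;   # each record file assigns to this package variable

# Load every *.REC file in a directory and return a list of hash references.
sub load_records {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "cannot open $dir: $!";
    my @files = sort grep { /\.REC$/ } readdir $dh;
    closedir $dh;

    my @rows;
    foreach my $file (@files) {
        # An absolute path keeps do() from searching @INC for the file.
        my $path = File::Spec->rel2abs(File::Spec->catfile($dir, $file));
        %record = ();
        defined(do $path) or die "cannot load $path: " . ($@ || $!);
        push @rows, { %record };
    }
    return @rows;
}
```

Because `do` executes whatever Perl the file contains, this approach only suits files that your own script wrote and escaped.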

What's the point of all this? It's now trivial to write Perl scripts that transform collections of such files into a variety of database import formats or directly to an HTML textbase. For relational analysis, sqlload.pl ( http://www.byte.com/art/download/textbase.zip ) produces a SQL load file containing a bunch of INSERT INTO statements. For textual analysis, I use variants of another script that builds a simple, navigable Web archive.
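As a rough illustration of the relational half of that transformation, the sketch below turns one record into one INSERT INTO statement. It is not sqlload.pl itself. The helper names are hypothetical, every value is quoted as a string for simplicity, and the trailing semicolon is an assumption about what the SQL tool expects.

```perl
use strict;
use warnings;

# Quote a value as a SQL string literal by doubling single quotes.
sub sql_quote {
    my ($value) = @_;
    $value = '' unless defined $value;
    $value =~ s/'/''/g;
    return "'$value'";
}

# Build one INSERT INTO statement from a record hash.
sub insert_statement {
    my ($table, %record) = @_;
    my @names = sort keys %record;
    return sprintf("INSERT INTO %s (%s) VALUES (%s);",
        $table,
        join(', ', @names),
        join(', ', map { sql_quote($record{$_}) } @names));
}

print insert_statement('os', use_cli_os2 => 'yes', comments => "It's fine"), "\n";
# prints: INSERT INTO os (comments, use_cli_os2) VALUES ('It''s fine', 'yes');
```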

The Tool

Tools that marry Web forms to databases tend to assume, reasonably enough, that you're acquiring data directly into a database. They typically use templatized HTML forms containing triggers that read or write database fields. For a survey, however, it's convenient to separate data collection from database import -- particularly when you're building both relational and textual databases from a common data set. Thus, there's no reason to use a template-oriented tool such as Cold Fusion or the Microsoft Internet Information Server (IIS) Internet Database Connector (IDC).

What's more, I've found that these products don't simplify database bookkeeping to the degree I'd like. They assume that you'll build both an HTML template and a corresponding database schema. You end up with two sets of field names that you have to maintain in sync. It's not a big deal, but life's short and I'm lazy, so I wrote form2db.pl ( http://www.byte.com/art/download/textbase.zip ) to generate a database schema automatically from an HTML form. This Perl script relies on the fact that browsers will quite happily ignore user-defined HTML attributes. For example, browsers render the input text box described by the following code:

<input type="text" name="email" dbtype="char (50)">

exactly as they render this one:

<input type="text" name="email">

The dbtype attribute, which I simply invented, means nothing to browsers. However, when I write forms using this attribute, they can double as database schemata. Form2db.pl parses these enhanced forms and emits a SQL CREATE statement. Do you have to abandon your HTML editor if you go this route? Not if it's a smart one that knows how to preserve user-defined HTML. Adobe, for example, says PageMill 2.0 will do this.
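The idea can be sketched in a few lines of Perl. This is not form2db.pl. It is a minimal regular-expression version that assumes double-quoted attribute values and looks only at input, select, and textarea tags, and the function name `create_statement` is hypothetical.

```perl
use strict;
use warnings;

# Collect (name, dbtype) pairs from tags that carry both attributes,
# then emit a CREATE TABLE statement. Attribute values must be
# double-quoted, as in the example above.
sub create_statement {
    my ($table, $html) = @_;
    my @columns;
    while ($html =~ /<(?:input|select|textarea)\b([^>]*)>/gi) {
        my $attrs = $1;
        my ($name) = $attrs =~ /\bname="([^"]*)"/i;
        my ($type) = $attrs =~ /\bdbtype="([^"]*)"/i;
        push @columns, "$name $type" if defined $name && defined $type;
    }
    return "CREATE TABLE $table (" . join(', ', @columns) . ");";
}

my $form = <<'HTML';
<form action="os.pl">
<input type="text" name="email" dbtype="char (50)">
<input type="submit" value="Send">
</form>
HTML

print create_statement('os', $form), "\n";   # prints "CREATE TABLE os (email char (50));"
```

Widgets without a dbtype attribute, such as the submit button, are skipped, so only fields meant for the database become columns.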

There's still more mileage to be gotten out of form2db.pl. I've said that I store each record initially as an ASCII-ized Perl associative array. That implies a CGI script, wired to the form, that writes the ASCII file. It's a simple CGI script -- so simple, in fact, that form2db.pl can create it automatically. When I wrote the form for our OS survey, for example, I pretended that the script os.pl already existed:

<form action="os.pl">

When form2db.pl reads this form, it makes the imaginary script real. The single occurrence of the name os in the form tag drives several related processes. It becomes the name of the table created by form2db.pl's SQL CREATE output, the name of the file containing that output, and the name of the subdirectory in which os.pl deposits records. The figure "One Form, Many Uses" summarizes these interactions.
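How one name can drive several outputs might be sketched as follows. The function `names_from_form` and its hash keys are assumptions for illustration. The `.sql` suffix follows the os.sql file used later in the article.

```perl
use strict;
use warnings;

# Derive the artifact names from the action attribute of the form tag.
sub names_from_form {
    my ($html) = @_;
    my ($base) = $html =~ /<form\b[^>]*\baction="(?:[^"]*\/)?([^"\/.]+)\.pl"/i
        or die "no action=\"NAME.pl\" found in form";
    return (
        table   => $base,          # table named in the CREATE statement
        sqlfile => "$base.sql",    # file holding the CREATE statement
        script  => "$base.pl",     # generated CGI script
        datadir => $base,          # subdirectory that receives records
    );
}

my %n = names_from_form('<form action="os.pl">');
print "$n{table} $n{sqlfile} $n{script} $n{datadir}\n";   # prints "os os.sql os.pl os"
```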

The Analysis

In July's ToolWatch, I mentioned iodbc ( ftp://ftp.digex.net/pub/access/psii/iodbc.zip ), a command-line interface to the Open Database Connectivity (ODBC) subsystem on Windows 95 and NT. It's the thinnest-imaginable ODBC wrapper, and therein lies its strength. To load the OS survey data into a database, I used ODBC Administrator to create a new data source called OS. (The driver was MS Jet 3.0 and the format was .mdb, but I could have used any database supported by ODBC.) I then issued one command to execute the SQL CREATE code written by form2db.pl:

iodbc -S OS < os.sql

and another to execute the SQL INSERT INTO code written by sqlload.pl:

iodbc -S OS < os.lod

Then I launched iodbc in interactive mode to begin exploring the data set:

iodbc -S os
1> select count(*) from os

This approach makes me a knuckle-scraping Neanderthal or an avant-garde minimalist, depending on your perspective. I see it the latter way, because I find that operating a full-blown wizard-equipped GUI database can take more time and effort than just typing the small bits of SQL you need. What's more, as you write those SQL statements, you discover patterns -- that is, opportunities to parameterize and automate SQL queries.

Perl coupled with iodbc is one way to exploit those opportunities, but it's awkward. Perl has to write SQL statements to a file, invoke iodbc on that file, and then parse iodbc's output. A better solution is odbc.pm, a Perl 5 module that makes ODBC SqlExecute and SqlFetch calls directly available to Perl programs. This has two major advantages: The SQL code doesn't have to take a trip through the file system, and its output comes back neatly chunked by row and column.
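Whichever route carries the SQL to the database, the parameterization itself is plain string work. A minimal sketch, with a hypothetical helper name and column names borrowed from the survey listing, generates one count-by-value query per column:

```perl
use strict;
use warnings;

# Build one "count by value" query per column name.
sub count_queries {
    my ($table, @columns) = @_;
    return map { "select $_, count(*) from $table group by $_" } @columns;
}

print "$_\n" for count_queries('os', 'use_cli_os2', 'use_cli_nt');
```

The resulting strings could be written to a file for iodbc or handed one at a time to an ODBC module.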

These two methods are complementary. I use iodbc when first exploring a data set and odbc.pm to codify the repeatable patterns that emerge from that exploration. Today, all this happens only on my NT systems because, while ODBC itself is available for Unix, iodbc and odbc.pm are not yet available there. But the Perl scripts I'm distributing with this article will work fine on Unix, as will the generic SQL code they produce. If you're in need of a lightweight Unix SQL engine to use in conjunction with these, try msql ( http://www.bunyip.com/ ).

The Methodology

What did our OS survey reveal? Nothing of value, I'm afraid. I got so absorbed in the mechanics of Web-based data collection and analysis that I ignored the most fundamental survey precept. BYTE senior editor Tom R. Halfhill puts it succinctly: "You can't let the studied population select itself." Team OS/2, an international band of OS/2 enthusiasts, drove that point home with a vengeance.

Two days into the survey, analysis showed that usage of the Mac OS, OS/2, Unix, and various Windows flavors -- on desktops and servers -- was comparable to what many other sources have reported. A few days later, Team OS/2 struck, and OS/2's numbers soared. An Alta Vista search of the Usenet uncovered one cause of the surge -- a posting to comp.os.os2.advocacy, which contained the uniform resource locator (URL) of the survey page.

We reported the results in last month's Bits section on page 32. However, please don't quote the three pie charts, which show OS/2's dominance, without also quoting the portion of text showing that an OS survey response was 12 times more likely to come from the Internet domains ibm.com and ibm.net than was a typical BYTE Site visit.
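A figure like "12 times more likely" is a ratio of two shares: a domain's share of survey responses divided by its share of ordinary site visits. The sketch below shows the arithmetic only. The function name is hypothetical and the counts are made up for illustration. They are not the survey's data.

```perl
use strict;
use warnings;

# Ratio of a group's share of survey responses to its share of site visits.
sub overrepresentation {
    my ($survey_hits, $survey_total, $site_hits, $site_total) = @_;
    die "totals and site hits must be positive"
        unless $survey_total > 0 && $site_total > 0 && $site_hits > 0;
    return ($survey_hits / $survey_total) / ($site_hits / $site_total);
}

# Made-up counts for illustration only; these are not the survey's figures.
printf "%.1f\n", overrepresentation(240, 1000, 2000, 100000);   # prints "12.0"
```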

We'll continue to run surveys on The BYTE Site. The anecdotal information we've gathered, for example from the ISDN survey, seems valuable as an indicator of trends and opinions. Quantitative data may have some limited value as well, on subjects charged with less religious fervor. However, we won't put much stock in the numbers until we can invite true random samples of participants, probably from a (yet-to-be-developed) site-registration database. For that valuable lesson learned, we have Team OS/2 to thank.


TOOLWATCH

gnu-win32 (free at ftp://ftp.cygnus.com/pub/gnu-win32/latest)
Cygnus Support
Mountain View, CA
Internet: http://www.cygnus.com

The GNU tools and libraries -- gcc, bash, grep, bison, the lot -- for NT (x86 and PowerPC).


BOOKNOTE

World Wide Web Database Programming for Windows NT by Brian Jepson

John Wiley and Sons
Internet: http://www.wiley.com/compbooks/
Price: $39.95

A great Perl-oriented tutorial for NT-based Web developers.


Using the Perl 5 ODBC Module

use NT::ODBC;
$o = new NT::ODBC("DSN=os");
@use_cli = ('use_cli_mac','use_cli_os2','use_cli_w95',
            'use_cli_nt','use_cli_unix','use_cli_w3x');
# @use_srv and $view are not defined in this excerpt
foreach $s (@use_cli, @use_srv) {
  # a nonzero return from sql() is treated as failure
  if ($o->sql("select $s, count(*) from $view group by $s") != 0) { die; }
  print "\n"; &printData;
}
sub printData {
  local ($data);
  while ($o->fetchrow) {
    foreach $f ($o->fieldnames) {
      $data = $o->data($f);
      print $data . "\t";
    }
    print "\n";
  }
}





[Figure: One Form, Many Uses]

[Photo: World Wide Web Database Programming for Windows NT]


Jon Udell is BYTE's executive editor for new media. You can contact him at jon_u@dev5.byte.com.
