A servlet-based group calendar becomes a surprising success and prompts an exploration of ways to bind Java programs to persistent storage.
Jon Udell
Group scheduling tends to generate fairly small amounts of complex object data. With nothing more than a servlet engine, the JDK 1.1, and a bit of ingenuity, you can create useful applications in this domain very quickly. -- from the June "Java Servlets" column
Boy, was I right about that! As I predicted, a modest effort--about three days' worth of work and 500 lines of Java code--yielded the simple Web-based group calendar that BYTE staffers had been clamoring for.
We're serious about Web-based collaboration lately. In March, BYTE's headquarters relocated from Peterborough, New Hampshire, to Lexington, Massachusetts. We retain satellite offices in Peterborough, San Mateo, and Frankfurt--and, of course, staffers are as likely to log in from their homes, or from hotel rooms, as they are from any of these official locations. The Web is fast becoming the glue that holds our company together.
Two applications in particular help us collaborate: private news servers that we use for free-form document exchange and discussion, and now the Java-based calendar that enables us to share structured, time-based information. I'll say more about how we use news servers in another column. This time I'll focus on some lessons about Java persistent storage that I learned while building and using the BYTE calendar.
About the BYTE Calendar
This simple Web application (see "ByteCal, the BYTE Calendar, in Action") aims to do nothing more than provide an electronic bulletin board, in the form of a calendar, that's universally available to BYTE staffers worldwide. Like the Polls servlet that I described in the June column, ByteCal just manages a simple namespace. In fact, the two servlets share the same data structure--a hashtable of hashtables. In ByteCal's case, the keys of the top-level hashtable are a set of user names, and the values are secondary hashtables. The keys of each secondary hashtable--one per user--are date strings, such as "Mon May 19 1997"; the values are user-supplied strings, such as "Dentist appointment 8 AM."
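A minimal sketch of that hashtable of hashtables may make the shape concrete. The class and method names here are illustrative assumptions, not ByteCal's actual source, and the generic type parameters are a modern convenience that JDK 1.1 did not offer.

```java
import java.util.Hashtable;

// Hypothetical sketch of the structure described above; not ByteCal's real code.
class CalendarTable {
    // Top level: user name -> secondary hashtable (date string -> entry text).
    private final Hashtable<String, Hashtable<String, String>> users = new Hashtable<>();

    // Record an entry, creating the user's secondary hashtable if it is missing.
    void put(String user, String date, String entry) {
        Hashtable<String, String> days = users.get(user);
        if (days == null) {
            days = new Hashtable<>();
            users.put(user, days);
        }
        days.put(date, entry);
    }

    // Return the entry for a user and date, or null if nothing is recorded.
    String get(String user, String date) {
        Hashtable<String, String> days = users.get(user);
        return days == null ? null : days.get(date);
    }

    // Number of users that have at least one secondary hashtable.
    int userCount() {
        return users.size();
    }
}
```

With this in place, a call such as put("Jon Udell", "Mon May 19 1997", "Dentist appointment 8 AM") stores one entry under one user, and nothing is allocated for users or days that have no activity.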
One user name, _Global, is special: All other user names inherit from it. For example, the Edit screen for user "Jon Udell" and week "Mon May 26 1997" contains no data for Monday, but the View screen for me (or any other user) reports that Monday is Memorial Day. Why? There's an entry for Memorial Day on the global calendar. This inheritance helps keep ByteCal's data structure lean and sparse.
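The inheritance from _Global can be sketched as a lookup with a fallback. This is a guess at the mechanism, not ByteCal's code: the names are hypothetical, and the sketch assumes that a user's own entry takes precedence over a global one, which the column does not specify.

```java
import java.util.Hashtable;

// Hypothetical sketch of the _Global fallback; names and precedence are assumptions.
class InheritedLookup {
    static final String GLOBAL = "_Global";

    // What a View screen might show: the user's own entry, else the global one.
    static String view(Hashtable<String, Hashtable<String, String>> calendars,
                       String user, String date) {
        String own = lookup(calendars, user, date);
        return own != null ? own : lookup(calendars, GLOBAL, date);
    }

    // What an Edit screen might show: only the user's own entry.
    static String edit(Hashtable<String, Hashtable<String, String>> calendars,
                       String user, String date) {
        return lookup(calendars, user, date);
    }

    private static String lookup(Hashtable<String, Hashtable<String, String>> calendars,
                                 String user, String date) {
        Hashtable<String, String> days = calendars.get(user);
        return days == null ? null : days.get(date);
    }
}
```

Because the holiday lives once under _Global, no per-user hashtable needs its own copy, which is the space saving the paragraph above describes.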
Data grows slowly for other reasons, too. Secondary hashtables spring into existence only when first referenced. They add new entries only for days that record activities. And an entry does not consume many more bytes than the combined lengths of "Mon May 19 1997" and "Dentist appointment 8 AM."
So what? Well, consider that, after a month of use, the disk file to which ByteCal serializes the calendars of two dozen staffers is still under 40 KB--the size of an average Web page. Not everyone on staff uses ByteCal yet, so let's assume a doubling or quadrupling of users and entries in the coming months. Still, a year's worth of calendar data uses up just a megabyte or two.
Where's a good place to manage a megabyte or two of data? How about in RAM? In fact, that's just where ByteCal keeps the data. Updates flush to disk for safekeeping and so that ByteCal can restore state when the server restarts. But when you fetch eight weeks' worth of calendar data for viewing, it comes straight from memory. I can't think of a better use for 2 of the 64 MB of RAM in the server that runs ByteCal. When you're dialing into the Internet from a notebook PC over a crummy hotel phone line, you don't need any unnecessary delays.
Synchronization + Serialization = Persistence
As I explained in the June column, you can use a Java servlet to solve a difficult problem--safe multithreaded use of complex data--in a simple way. Just add the "synchronized" keyword to the methods that touch in-memory objects. The Java virtual machine (VM) ensures orderly thread-at-a-time access to those objects.
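As a rough illustration of the idea, the sketch below marks every method that touches the shared tables as synchronized. The class and method names are hypothetical. Note that Hashtable already synchronizes its individual calls; the method-level lock matters for compound steps such as "look up the user's table, create it if missing, then add the entry."

```java
import java.util.Hashtable;

// Illustrative only: names are hypothetical, not ByteCal's actual code.
class SynchronizedCalendar {
    private final Hashtable<String, Hashtable<String, String>> users = new Hashtable<>();

    // Only one thread at a time may run any synchronized method on this object,
    // so the check-then-create sequence below cannot interleave with another update.
    synchronized void update(String user, String date, String entry) {
        Hashtable<String, String> days = users.get(user);
        if (days == null) {
            days = new Hashtable<>();
            users.put(user, days);
        }
        days.put(date, entry);
    }

    synchronized String read(String user, String date) {
        Hashtable<String, String> days = users.get(user);
        return days == null ? null : days.get(date);
    }

    synchronized int entryCount(String user) {
        Hashtable<String, String> days = users.get(user);
        return days == null ? 0 : days.size();
    }
}
```

The trade-off is coarse locking: every request waits its turn on the one object, which is acceptable for a small in-memory data set with brief operations.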
If servers never had to restart, synchronization alone would solve the entire problem for data sets small enough to fit conveniently in memory. In the real world, of course, power occasionally fails and servers sometimes crash, so ByteCal serializes its data to disk. This technique, new with JDK 1.1, primarily serves the needs of Java's Remote Method Invocation (RMI) facility. RMI needs to flatten Java objects into bit streams in order to pass them over networks. But you can also easily redirect these bit streams to disk files, which become a primitive but surprisingly handy form of persistent storage.
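Redirecting a serialized object to a disk file, and reading it back at startup, might look roughly like the sketch below. It uses the standard ObjectOutputStream and ObjectInputStream classes; the CalendarSnapshot class, its method names, and the choice to return an empty table when no file exists are assumptions made for illustration.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Hashtable;

// Hypothetical helper; not taken from ByteCal's source.
class CalendarSnapshot {
    // Flatten the whole hashtable of hashtables into one file.
    static void save(Hashtable<String, Hashtable<String, String>> root, File file)
            throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(root);
        }
    }

    // Restore state after a restart; start empty if nothing was ever saved.
    @SuppressWarnings("unchecked")
    static Hashtable<String, Hashtable<String, String>> load(File file)
            throws IOException, ClassNotFoundException {
        if (!file.exists()) {
            return new Hashtable<>();
        }
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (Hashtable<String, Hashtable<String, String>>) in.readObject();
        }
    }
}
```

Hashtable and String both implement Serializable, so no extra code is needed to write the nested structure. The try-with-resources form is newer than JDK 1.1; there the streams would be closed by hand.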
As does the Polls servlet, ByteCal takes the path of least resistance. On every update it passes the root hashtable to the writeObject method of an ObjectOutputStream, thus serializing the calendars of all users at once. With a database that's still tiny, there's currently no reason not to do it this way. Clearly, as the database grows, so will the time required to complete this write operation. I can think of three ways to combat this problem:
1) Serialize in a background thread. Users now wait for the write to complete, but they don't really need to.
2) Serialize on a scheduled basis and supplement with a transaction log.
3) Subdivide the data. Currently there's just a single disk file, called bytecal.obj, containing the whole set of calendars. But an update actually involves only one user's calendar. Saving the per-user hashtables in per-user files would yield a much more granular process of serialization.
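The third option, per-user files, could be sketched as follows. Everything here is hypothetical: the PerUserStore class, the ".obj" suffix for each user's file, and the use of URL encoding to turn a user name such as "Jon Udell" into a safe file name are all assumptions, not a description of anything ByteCal does.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.net.URLEncoder;
import java.util.Hashtable;

// Hypothetical sketch of option 3: one serialized file per user.
class PerUserStore {
    private final File dir;

    PerUserStore(File dir) {
        this.dir = dir;
    }

    // Map a user name to a file name that is safe on common file systems.
    File fileFor(String user) throws IOException {
        return new File(dir, URLEncoder.encode(user, "UTF-8") + ".obj");
    }

    // Rewrite only the calendar that changed.
    synchronized void saveUser(String user, Hashtable<String, String> days)
            throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream(fileFor(user)))) {
            out.writeObject(days);
        }
    }

    // Load one user's calendar, or an empty one if no file exists yet.
    @SuppressWarnings("unchecked")
    synchronized Hashtable<String, String> loadUser(String user)
            throws IOException, ClassNotFoundException {
        File file = fileFor(user);
        if (!file.exists()) {
            return new Hashtable<>();
        }
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (Hashtable<String, String>) in.readObject();
        }
    }
}
```

The cost of each update then scales with one user's entries rather than with the whole database, at the price of managing many small files.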
All three approaches would make the application more complex. I prefer the last one because it's a minimal solution that rewrites only what needs rewriting. However, I don't think I'll ever implement any of these schemes. Why not? Object databases are a better way to make nontrivial Java data sets persistent.
Java-Aware Object Databases
Java and object databases are a marriage made in heaven. If you prowl around in comp.databases.object, you will sense a ground swell of interest in the subject. Why? Java's immature SQL foundation worsens the impedance mismatch that always plagues object applications wired to relational data stores. So, developers are looking for ways to connect those apps to persistent object storage. Of course, Java's ODBMS foundation is no Rock of Gibraltar yet, either. But my experiments with ObjectStore 5.0 (see the Eval "What's in Store for the Web") convinced me that persistent Java is a reality now--and a promising future direction.
I should mention that Object Design (http://www.odi.com/) isn't the only provider of persistent Java. Poet Software (http://www.poet.com/) offers a solution I haven't yet tried, and I bet there will be others by the time you read this.
At the moment, though, Object Design's low-end PSE (Persistent Storage Engine) for Java, which both Netscape and Microsoft are bundling with their next-generation browsers, seems the most convenient way to get started. It's a self-contained, pure-Java implementation. PSE for Java, which is freely downloadable, delivers simple persistence. PSE Pro, which currently costs $250, adds a database-recovery tool and the ability to open multiple databases at once.
The PSE products share a common Java API with the flagship product, ObjectStore. Object Design hopes you'll like the rowboat and trade up to the ocean liner. Note, though, that while PSE was conceived for browser-based local storage, it's not restricted to that use. In my case, although I still have little use for client-side Java, I'm forging ahead with server-side Java. PSE is nominally a single-user product, but that's not necessarily so if you bind it to a servlet. Use Java synchronization to isolate servlet invocations from one another, and you can actually deploy PSE in a multiuser application.
Making ByteCal Persistent
ByteCal serializes a hashtable of hashtables, plus several vectors. ObjectStore can't store objects of the native Java types Hashtable and Vector. But it does provide persistence-capable equivalents to these classes: OSHashtable and OSVector. Converting to these types was a simple search-and-replace operation. Since OSHashtable mirrors the interface of Hashtable, none of the code that does Get and Put operations had to change.
Simple? So far, but things got trickier. Matching the thread model of the servlet engine, Acme.Serve, to the thread model of ObjectStore's Java interface was a puzzle. I was glad I had an Object Design engineer on hand to help--the company says this is standard practice for all customers, not just BYTE reviewers--and even he had to call the home office for help.
What finally worked was to record the servlet engine's thread ID in a class variable and then refer to it from a database-initialization call in each invocation of ByteCal. With this arrangement, the servlet engine owned a pipe to the database that many invocations of ByteCal (or, for that matter, other servlets running in the same Java VM) could share.
Next came transactions. You can't read or write persistent data outside transaction boundaries. I fiddled with different schemes for a while and finally settled on a single pair of transaction calls bracketing ByteCal's main service routine. There is a trade-off here between transaction granularity and simplicity. I took the easy route, but if I deploy ByteCal using ObjectStore, I'll need to revisit this issue.
Note that neither version of ByteCal currently does pessimistic locking. So, if I'm editing my calendar for the week of June 2, and you are, too, the last writer wins. In ByteCal's case, the probability of such a collision is small. But the transactional semantics of ObjectStore don't help here. Synchronizing multiple live copies of a record in multiple workstations is a classic problem. A Web application, like any application, must deal with it (by providing users with abort/retry options) or accept the consequences.
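One common way to detect such a collision at save time, rather than prevent it with locks, is a version check. The sketch below is purely hypothetical: ByteCal does nothing of the kind, and the class, its methods, and the integer version stamp are all assumptions introduced for illustration.

```java
import java.util.Hashtable;

// Hypothetical sketch: detect a lost update so the user can be offered abort/retry.
class VersionedEntries {
    private final Hashtable<String, String> text = new Hashtable<>();
    private final Hashtable<String, Integer> version = new Hashtable<>();

    // Version an editor should remember when it loads an entry (0 if never saved).
    synchronized int versionOf(String key) {
        Integer v = version.get(key);
        return v == null ? 0 : v;
    }

    synchronized String read(String key) {
        return text.get(key);
    }

    // Save only if nobody else has saved since the caller loaded the entry.
    // Returns false on a collision; the caller decides whether to abort or retry.
    synchronized boolean save(String key, int versionSeen, String newText) {
        if (versionOf(key) != versionSeen) {
            return false;
        }
        text.put(key, newText);
        version.put(key, versionSeen + 1);
        return true;
    }
}
```

If two editors both load version 0 of the same week, the first save succeeds and bumps the version to 1; the second save reports a collision instead of silently overwriting the first.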
Finally, I wanted to create a reusable reference to my top-level hashtable. Persistent Java programs begin by associating transient objects with database roots. In the case of ByteCal, a reentrant servlet, you have to re-create that association each time. Isn't there some way to remember, across invocations, that an OSHashtable object called hByteCal represented the database root ByteCal? Yes, there is. If you call the transaction-commit routine with the flag RETAIN_HOLLOW, ObjectStore remembers the association.
A Smooth Migration Path
I'm still running ByteCal in serialization mode. But I've got an ODBMS-aware version waiting. For our own use, PSE Pro will likely suffice. Its pure-Java implementation of persistent storage can't deal with thousands of users or gigabytes of data, but it ought to handle our calendar just fine.
Would ByteCal ever need to scale massively? It's conceivable. A future version of The BYTE Site might offer calendar services as a subscriber benefit. I don't know if that will ever happen, but if it does, a ByteCal/ObjectStore capable of handling 10,000 users is ready to go.
NOTE: I'll be speaking at the O'Reilly Perl Conference, August 19-21, at the Fairmont Hotel in San Jose, California. See http://www.ora.com/info/perl/conference. Hope to meet some of you there.
TOOLWATCH
pat 1.0...........................$10 (shareware)
Steven R. Brant
Internet: http://www.win.net/~stevesoft/pat
This Java library does regular expression matching à la Perl 5. Even better, you can extend the pattern matcher so that it recognizes user-defined classes of strings--for example, valid dates.
BOOKNOTE
Java Threads......................$29.95
by Scott Oaks and Henry Wong
O'Reilly and Associates
Internet: http://www.ora.com/
Java makes thread synchronization seem easy, but under the hood it's still a scary subject. This excellent guide delves deeply into scheduling, synchronization, and deadlock avoidance. Halfway through I jumped up to rewrite a servlet that, I then realized, was unnecessarily calling one synchronized method from another, risking possible deadlock.
The calendar presents three screens to users: Main, View, and Edit.
Jon Udell is BYTE's executive editor for new media. You can reach him by sending e-mail to jon_u@dev5.byte.com.