
Persistent Java


August 1997 / Web Project / Persistent Java

A servlet-based group calendar becomes a surprising success and prompts an exploration of ways to bind Java programs to persistent storage.

Jon Udell

Group scheduling tends to generate fairly small amounts of complex object data. With nothing more than a servlet engine, the JDK 1.1, and a bit of ingenuity, you can create useful applications in this domain very quickly. -- from the June "Java Servlets" column

Boy, was I right about that! As I predicted, a modest effort--about three days' worth of work and 500 lines of Java code--yielded the simple Web-based group calendar that BYTE staffers had been clamoring for.

We're serious about Web-based collaboration lately. In March, BYTE's headquarters relocated from Peterborough, New Hampshire, to Lexington, Massachusetts. We retain satellite offices in Peterborough, San Mateo, and Frankfurt--and, of course, staffers are as likely to log in from their homes, or from hotel rooms, as they are from any of these official locations. The Web is fast becoming the glue that holds our company together.

Two applications in particular help us collaborate: private news servers that we use for free-form document exchange and discussion, and now the Java-based calendar that enables us to share structured, time-based information. I'll say more about how we use news servers in another column. This time I'll focus on some lessons about Java persistent storage that I learned while building and using the BYTE calendar.

About the BYTE Calendar

This simple Web application (see "ByteCal, the BYTE Calendar, in Action") aims to do nothing more than provide an electronic bulletin board, in the form of a calendar, that's universally available to BYTE staffers worldwide. Like the Polls servlet that I described in the June column, ByteCal just manages a simple namespace. In fact, the two servlets share the same data structure--a hashtable of hashtables. In ByteCal's case, the keys of the top-level hashtable are a set of user names, and the values are secondary hashtables. The keys of each secondary hashtable--one per user--are date strings, such as "Mon May 19 1997"; the values are user-supplied strings, such as "Dentist appointment 8 AM."

One user name, _Global, is special: All other user names inherit from it. For example, the Edit screen for user "Jon Udell" and week "Mon May 26 1997" contains no data for Monday, but the View screen for me (or any other user) reports that Monday is Memorial Day. Why? There's an entry for Memorial Day on the global calendar. This inheritance helps keep ByteCal's data structure lean and sparse.

Data grows slowly for other reasons, too. Secondary hashtables spring into existence only when first referenced. They add new entries only for days that record activities. And an entry does not consume many more bytes than the combined lengths of "Mon May 19 1997" and "Dentist appointment 8 AM."
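To make the shape of that data concrete, here is a minimal sketch of a hashtable of hashtables with inheritance from _Global. It is not ByteCal's source. The class name CalendarStore and its methods are invented for illustration, and the rule that a user's own entry shadows a global entry for the same date is an assumption; the column does not say how ByteCal combines the two.

```java
import java.util.Hashtable;

// Illustrative sketch only: user name -> (date string -> entry text).
class CalendarStore {
    static final String GLOBAL = "_Global";

    private final Hashtable<String, Hashtable<String, String>> users = new Hashtable<>();

    // Store an entry. The user's secondary hashtable is created on first use,
    // so users with no entries cost nothing.
    synchronized void put(String user, String date, String text) {
        Hashtable<String, String> days = users.get(user);
        if (days == null) {
            days = new Hashtable<>();
            users.put(user, days);
        }
        days.put(date, text);
    }

    // The user's own entry for a date, or null. Never creates a table.
    synchronized String own(String user, String date) {
        Hashtable<String, String> days = users.get(user);
        return days == null ? null : days.get(date);
    }

    // What a viewer would see: the user's own entry if present,
    // otherwise the entry inherited from _Global (assumed shadowing rule).
    synchronized String view(String user, String date) {
        String entry = own(user, date);
        return entry != null ? entry : own(GLOBAL, date);
    }

    synchronized int userCount() {
        return users.size();
    }
}
```

With "Memorial Day" stored once under _Global, own("Jon Udell", "Mon May 26 1997") returns null while view(...) for the same arguments returns the global entry, and no per-user table is created by either lookup.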

So what? Well, consider that, after a month of use, the disk file to which ByteCal serializes the calendars of two dozen staffers is still under 40 KB--the size of an average Web page. Not everyone on staff uses ByteCal yet, so let's assume a doubling or quadrupling of users and entries in the coming months. Still, a year's worth of calendar data uses up just a megabyte or two.

Where's a good place to manage a megabyte or two of data? How about in RAM? In fact, that's just where ByteCal keeps the data. Updates flush to disk for safekeeping and so that ByteCal can restore state when the server restarts. But when you fetch eight weeks' worth of calendar data for viewing, it comes straight from memory. I can't think of a better use for 2 of the 64 MB of RAM in the server that runs ByteCal. When you're dialing into the Internet from a notebook PC over a crummy hotel phone line, you don't need any unnecessary delays.

Synchronization + Serialization = Persistence

As I explained in the June column, you can use a Java servlet to solve a difficult problem--safe multithreaded use of complex data--in a simple way. Just add the "synchronized" keyword to the methods that touch in-memory objects. The Java virtual machine (VM) ensures orderly thread-at-a-time access to those objects.
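A toy example shows what the keyword buys you. The class below is a hypothetical counter table, not code from ByteCal or Polls. Its increment method is a read-modify-write sequence, which is exactly the kind of compound operation that needs to run one thread at a time; the hammer helper exists only to exercise it from several threads.

```java
import java.util.Hashtable;

// Illustrative sketch only: a shared in-memory table guarded by synchronized methods.
class SafeCounterTable {
    private final Hashtable<String, Integer> counts = new Hashtable<>();

    // Read, add one, write back. Because the method is synchronized,
    // no other thread can slip in between the get and the put.
    synchronized void increment(String key) {
        Integer old = counts.get(key);
        counts.put(key, old == null ? 1 : old + 1);
    }

    synchronized int get(String key) {
        Integer value = counts.get(key);
        return value == null ? 0 : value;
    }

    // Demo helper: several threads increment the same key, then we read the total.
    static int hammer(int threads, int perThread) {
        SafeCounterTable table = new SafeCounterTable();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    table.increment("votes");
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return table.get("votes");
    }
}
```

Note that Hashtable's individual methods are already synchronized, but that alone would not protect the get-then-put pair; the method-level lock is what makes the whole update atomic.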

If servers never had to restart, synchronization alone would solve the entire problem for data sets small enough to fit conveniently in memory. In the real world, of course, power occasionally fails and servers sometimes crash, so ByteCal serializes its data to disk. Object serialization, new with JDK 1.1, primarily serves the needs of Java's Remote Method Invocation (RMI) facility. RMI needs to flatten Java objects into bit streams in order to pass them over networks. But you can also easily redirect these bit streams to disk files, which become a primitive but surprisingly handy form of persistent storage.
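Redirecting a serialization stream to a file takes only a few lines. The sketch below uses the standard ObjectOutputStream and ObjectInputStream classes; the class name CalendarSnapshot, its method names, and the temporary-file round trip are assumptions made for illustration, not ByteCal's actual code.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Hashtable;

// Illustrative sketch only: save and restore a whole object graph in one call each.
class CalendarSnapshot {
    // Flatten the root hashtable, and everything reachable from it, into a file.
    static void save(Hashtable<String, Hashtable<String, String>> root, File file) {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Rebuild the object graph from the file, as a server would on restart.
    @SuppressWarnings("unchecked")
    static Hashtable<String, Hashtable<String, String>> load(File file) {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (Hashtable<String, Hashtable<String, String>>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Demo helper: write to a temporary file, read it back, clean up.
    static Hashtable<String, Hashtable<String, String>> roundTrip(
            Hashtable<String, Hashtable<String, String>> root) {
        File file;
        try {
            file = File.createTempFile("bytecal", ".obj");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            save(root, file);
            return load(file);
        } finally {
            file.delete();
        }
    }
}
```

The restored table is a distinct object that compares equal to the original, which is all a restart needs.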

As does the Polls servlet, ByteCal takes the path of least resistance. On every update it passes the root hashtable to the writeObject method of an ObjectOutputStream, thus serializing the calendars of all users at once. With a database that's still tiny, there's currently no reason not to do it this way. Clearly, as the database grows, so will the time required to complete this write operation. I can think of three ways to combat this problem:

  1)  Serialize in a background thread. Users now wait for the write to
        complete, but they don't really need to.
  2)  Serialize on a scheduled basis and supplement with a transaction log.
  3)  Subdivide the data. Currently there's just a single disk file,
        called bytecal.obj, containing the whole set of calendars. But an
        update actually involves only one user's calendar. Saving the
        per-user hashtables in per-user files would yield a much more
        granular process of serialization.
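The third option might look something like the following sketch, in which each user's secondary hashtable goes to its own file. Everything here is hypothetical: the class name PerUserStore, the file-naming rule, and the choice to treat a missing file as an empty calendar are assumptions, not a description of any ByteCal code.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Hashtable;

// Illustrative sketch only: one serialized file per user instead of one big file.
class PerUserStore {
    private final File dir;

    PerUserStore(File dir) {
        this.dir = dir;
    }

    // Assumed naming rule: replace anything that is not a letter or digit.
    // Two different user names could map to the same file under this rule,
    // so a real version would need a collision-free encoding.
    File fileFor(String user) {
        return new File(dir, user.replaceAll("[^A-Za-z0-9]", "_") + ".obj");
    }

    // Rewrite only the calendar that changed.
    synchronized void save(String user, Hashtable<String, String> days) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream(fileFor(user)))) {
            out.writeObject(days);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Load one user's calendar; a user with no file yet has an empty calendar.
    @SuppressWarnings("unchecked")
    synchronized Hashtable<String, String> load(String user) {
        File file = fileFor(user);
        if (!file.exists()) {
            return new Hashtable<>();
        }
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (Hashtable<String, String>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The cost of an update now tracks the size of one user's calendar rather than the size of the whole database, at the price of managing many small files.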

All three approaches would make the application more complex. I prefer the last one because it's a minimal solution that rewrites only what needs rewriting. However, I don't think I'll ever implement any of these schemes. Why not? Object databases are a better way to make nontrivial Java data sets persistent.

Java-Aware Object Databases

Java and object databases are a marriage made in heaven. If you prowl around in comp.databases.object, you will sense a ground swell of interest in the subject. Why? Java's immature SQL foundation worsens the impedance mismatch that always plagues object applications wired to relational data stores. So, developers are looking for ways to connect those apps to persistent object storage. Of course, Java's ODBMS foundation is no Rock of Gibraltar yet, either. But my experiments with ObjectStore 5.0 (see the Eval "What's in Store for the Web") convinced me that persistent Java is a reality now--and a promising future direction.

I should mention that Object Design (http://www.odi.com/) isn't the only provider of persistent Java. Poet Software (http://www.poet.com/) offers a solution I haven't yet tried, and I bet there will be others by the time you read this.

At the moment, though, Object Design's low-end PSE (Persistent Storage Engine) for Java, which both Netscape and Microsoft are bundling with their next-generation browsers, seems the most convenient way to get started. It's a self-contained, pure-Java implementation. PSE for Java, which is freely downloadable, delivers simple persistence. PSE Pro, which currently costs $250, adds a database-recovery tool and the ability to open multiple databases at once.

The PSE products share a common Java API with the flagship product, ObjectStore. Object Design hopes you'll like the rowboat and trade up to the ocean liner. Note, though, that while PSE was conceived for browser-based local storage, it's not restricted to that use. In my case, although I still have little use for client-side Java, I'm forging ahead with server-side Java. PSE is nominally a single-user product, but that's not necessarily so if you bind it to a servlet. Use Java synchronization to isolate servlet invocations from one another, and you can actually deploy PSE in a multiuser application.

Making ByteCal Persistent

ByteCal serializes a hashtable of hashtables, plus several vectors. ObjectStore can't store objects of the native Java types Hashtable and Vector. But it does provide persistence-capable equivalents to these classes: OSHashtable and OSVector. Converting to these types was a simple search-and-replace operation. Since OSHashtable mirrors the interface of Hashtable, none of the code that does get and put operations had to change.

Simple? So far, but things got trickier. Matching the thread model of the servlet engine, Acme.Serve, to the thread model of ObjectStore's Java interface was a puzzle. I was glad I had an Object Design engineer on hand to help--the company says this is standard practice for all customers, not just BYTE reviewers--and even he had to call the home office for help.

What finally worked was to record the servlet engine's thread ID in a class variable and then refer to it from a database-initialization call in each invocation of ByteCal. With this arrangement, the servlet engine owned a pipe to the database that many invocations of ByteCal (or, for that matter, other servlets running in the same Java VM) could share.

Next came transactions. You can't read or write persistent data outside transaction boundaries. I fiddled with different schemes for a while and finally settled on a single pair of transaction calls bracketing ByteCal's main service routine. There is a trade-off here between transaction granularity and simplicity. I took the easy route, but if I deploy ByteCal using ObjectStore, I'll need to revisit this issue.
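The bracketing pattern itself is easy to show without reference to any particular database. In the sketch below, the Store interface is a hypothetical stand-in with begin, commit, and abort methods; it is not ObjectStore's API, whose actual calls are not reproduced here. The point is only the shape: one begin before the service routine, one commit after it, and an abort if the routine fails.

```java
import java.util.function.Supplier;

// Illustrative sketch only: coarse-grained transaction bracketing.
class TxBracket {
    // Hypothetical stand-in for a database's transaction calls.
    interface Store {
        void begin();
        void commit();
        void abort();
    }

    // Run the whole service routine inside a single transaction.
    static <T> T inTransaction(Store store, Supplier<T> serviceRoutine) {
        store.begin();
        boolean committed = false;
        try {
            T result = serviceRoutine.get();
            store.commit();
            committed = true;
            return result;
        } finally {
            // If the routine (or the commit) threw, roll back instead.
            if (!committed) {
                store.abort();
            }
        }
    }
}
```

Finer granularity would mean calling inTransaction around smaller pieces of work, which shortens each transaction but scatters the bracketing through the code.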

Note that neither version of ByteCal currently does pessimistic locking. So, if I'm editing my calendar for the week of June 2, and you are, too, the last writer wins. In ByteCal's case, the probability of such a collision is small. But the transactional semantics of ObjectStore don't help here. Synchronizing multiple live copies of a record in multiple workstations is a classic problem. A Web application, like any application, must deal with it (by providing users with abort/retry options) or accept the consequences.
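One common way to detect such a collision, rather than prevent it with locks, is to stamp each record with a version number and refuse a save that was based on a stale version. The sketch below illustrates that optimistic check. It is not something either version of ByteCal does, and the class and method names are invented for the example.

```java
import java.util.Hashtable;

// Illustrative sketch only: detect "last writer wins" collisions with version stamps.
class VersionedCalendar {
    private final Hashtable<String, String> text = new Hashtable<>();
    private final Hashtable<String, Integer> version = new Hashtable<>();

    // Version an editor should remember when it loads a record (0 if never saved).
    synchronized int versionOf(String key) {
        Integer v = version.get(key);
        return v == null ? 0 : v;
    }

    synchronized String read(String key) {
        return text.get(key);
    }

    // Save only if nobody else has saved since the caller read seenVersion.
    // Returns false on a collision so the caller can offer abort/retry.
    synchronized boolean save(String key, int seenVersion, String newText) {
        if (versionOf(key) != seenVersion) {
            return false;
        }
        text.put(key, newText);
        version.put(key, seenVersion + 1);
        return true;
    }
}
```

If two editors both load a record at version 0, the first save succeeds and bumps the version to 1; the second save is rejected until that editor reloads and tries again.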

Finally, I wanted to create a reusable reference to my top-level hashtable. Persistent Java programs begin by associating transient objects with database roots. In the case of ByteCal, a reentrant servlet, you have to re-create that association each time. Isn't there some way to remember, across invocations, that an OSHashtable object called hByteCal represented the database root ByteCal? Yes, there is. If you call the transaction-commit routine with the flag RETAIN_HOLLOW, ObjectStore remembers the association.

A Smooth Migration Path

I'm still running ByteCal in serialization mode. But I've got an ODBMS-aware version waiting. For our own use, PSE Pro will likely suffice. Its pure-Java implementation of persistent storage can't deal with thousands of users or gigabytes of data, but it ought to handle our calendar just fine.

Would ByteCal ever need to scale massively? It's conceivable. A future version of The BYTE Site might offer calendar services as a subscriber benefit. I don't know if that will ever happen, but if it does, a ByteCal/ObjectStore capable of handling 10,000 users is ready to go.


NOTE: I'll be speaking at the O'Reilly Perl Conference, August 19-21, at the Fairmont Hotel in San Jose, California. See http://www.ora.com/info/perl/conference. Hope to meet some of you there.


TOOLWATCH

pat 1.0...........................$10 (shareware)
Steven R. Brant
Internet: http://www.win.net/~stevesoft/pat

This Java library does regular expression matching à la Perl 5. Even better, you can extend the pattern matcher so that it recognizes user-defined classes of strings--for example, valid dates.


BOOKNOTE

Java Threads......................$29.95
by Scott Oaks and Henry Wong
O'Reilly and Associates
Internet: http://www.ora.com/

Java makes thread synchronization seem easy, but under the hood it's still a scary subject. This excellent guide delves deeply into scheduling, synchronization, and deadlock avoidance. Halfway through I jumped up to rewrite a servlet that, I then realized, was unnecessarily calling one synchronized method from another, risking possible deadlock.


ByteCal, the BYTE Calendar, in Action

The calendar presents three screens to users: Main, View, and Edit.


Jon Udell is BYTE's executive editor for new media. You can reach him by sending e-mail to jon_u@dev5.byte.com.

Copyright © 2005 CMP Media LLC