Archives
 
 
 
  Special
 
 
 
  About Us
 
 
 

Newsletter
Free E-mail Newsletter from BYTE.com

 
    
           
Visit the home page Browse the four-year online archive Download platform-neutral CPU/FPU benchmarks Find information for advertisers, authors, vendors, subscribers Request free information on products written about or advertised in BYTE Submit a press release, or scan recent announcements Talk with BYTE's staff and readers about products and technologies

ArticlesClean Up: C++ Garbage Collection


January 1996 / Core Technologies / Clean Up: C++ Garbage Collection

Here's a tool that helps you keep your code clean by taking out the trash

Justin Miller

Every C++ programmer at one time or another has had the following problem: Somewhere, somehow, your program calls new more times than it calls delete. Or perhaps it's the inverse: Your program tries to use an object that's been deleted. Maybe it deletes an object more than once -- perhaps two, even three times.

These related problems can manifest themselves in mysterious ways. A temporary string object doesn't get deleted; your program continues to run but inexplicably exhausts memory. Other times, the manifestations are not so mysterious. A rout ine attempts to dereference a pointer that is pointing to a deleted object and the whole program crashes spectacularly.

It's difficult to keep track of all the objects you have to delete. Consequently, during design time, you're forced to repeatedly ask: "When is it safe to delete this object?" Because this is often a tough question to answer, this ongoing wrestling match with memory almost always complicates your design. And even the most meticulous programmer makes an occasional error, leading either to a global protection fault (or its equivalent) or memory that's never freed.

Pandora's Box Closed

The solution to all this is a garbage collector. With a garbage collector, you never have to delete anything. The collector automatically deletes an object when your program no longer references that object. If you've seen garbage collectors, you're probably concerned about their performance. Some work by "waking up" intermittently and searching through memory for objects that can be deleted. This can bog down the application.

The speed of a garbage collector depends on its type. The most general types of garbage collectors -- which do, in fact, search through memory -- are slow if you compare them to "manual" memory management (i.e., explicitly using delete). However, most of the time your program won't need such a complicated collector. A reference-counting garbage collector is sufficient.

A reference-counting garbage collector is fast because it doesn't have to search through memory. Instead, for every object, it keeps a running count of the number of pointers that point to that object (this running count is the object's "reference count"). Every time a pointer is assigned to point to an object, that object's count is incremented. Every time a pointer is reassigned to point to a different object, the original object's count is decremented, and the count on the object to which the pointer is newly assigned is incremented. Also, every time a pointer goes out of scope, the count for the object to which it pointed is decremented.

Because the garbage collector keeps track of the number of pointers referencing the object, it knows that when the reference count for an object reaches zero, the object can be safely deleted (if there are no pointers to the object, your program can't access the object).

Limitation

The only drawback to such a garbage collector is that it can't "collect" cyclical structures. For example, suppose you have three objects, A, B, and C. They are tied to one another in the form of a circular list: A contains a pointer to B, B contains a pointer to C, and C contains a pointer to A. Initially, some other pointer may have pointed to A, but that pointer has gone out of scope and, consequently, this circular structure is no longer accessible from your program. It should be put in the garbage.

However, since each object points to the next member of the loop, each of these objects still has one outstanding reference. As a result, a reference-counting garbage collector won't collect the members of this structure; they will be "leaked."

If your program does include a circularly linked list, that doesn't mean you cannot use a reference-counting garbage collector. You can arrange to have the garbage collector properly delete the structure, using what I call the "linchpin" method. With this technique, you pick one object in the cycle to be the linchpin object. When your program no longer needs to use a circular structure, it pulls the linchpin; that is, your program sets all pointers inside the linchpin object to null, thereby breaking the loop. Since the loop has now become a linear list, the garbage collector can delete the remaining objects appropriately.

C++ to the Rescue

At this point, though you may be inclined to agree that a reference-counting garbage collector would be a fine programming tool, you're probably concerned about the time you'd have to spend to build one. And if you do decide to build one, you must be exc eedingly careful in its construction: Any errors in the reference-counting code of the collector would produce an unacceptably leaky garbage collector.

Happily, C++ operator overloading permits you to implement a completely transparent and foolproof means of reference counting. I've used this concept to create the reference-counting garbage collection system accompanying this article.

RGC, my reference-counting garbage collector, defines two template classes. These template classes emulate pointers by overloading the =, ==, *, and -> operators. Behind the scene these operators do the reference counting for you. Whenever the reference count of a particular object falls to zero, these operators delete the object.

You needn't be aware of the reference counting at all -- it's entirely transparent. And because you're unaware of the ongoing reference counting, you can't make any mistakes regarding it. This ensures the accuracy of the reference counts and hence the reliability of the garbage col lector.

Using RGC

Incorporating RGC in your next C++ development project is simple. First, you'll need to include RGC.h in your source files. This file defines three classes: TRGCObj, TRGCRef, and TRGCRefObj. The first, TRGCObj, should be declared as the virtual public superclass of any object to be managed by the garbage collector. TRGCRef is a template class. Its parameter is the type for which it will act as a pointer. This type must inherit from TRGCObj. The third class -- TRGCRefObj -- is also a template class. TRGCRefObj is just like TRGCRef, but it inherits from TRGCObj so that it can be tracked and deleted by the collector. TRGCRefObj exists so that the system can support the equivalent of pointers to pointers.

Inside RGC

TRGCObj includes two member functions, RegisterReference and DeregisterReference. These functions respectively increment and decrement an object's reference count. Additionally, they return the updated number of outstanding refe rences.

Garbage collection takes place inside the TRGCRef template class. At any given time, one of its instances (which I'll call a reference from now on) either points to a valid object in memory or is null. TGRCRef overloads the assignment operator so that every time a reference is assigned a value, two things happen. If the reference had previously been non-null, DeregisterReference is called on the object to which it had been pointing. Then, if the reference is being assigned to a non-null value, RegisterReference is called on the object to which it is to point. This keeps an object's reference count accurate as assignment operations occur.

Additionally, the TRGCRef destructor handles the situation of a reference's pointing to an object when that reference is destroyed. The destructor calls DeregisterReference on the object to which the reference points. This keeps an object's reference count accurate when references go out of scope or are deleted.

Finally, if DeregisterReference ever r eturns a 0, indicating that there are no outstanding references to a particular object, the reference will delete the object.

An Example

Examine the following listing fragment:

#include "rgc.h"
class myclass : public virtual TRGCobj
{ };
typedef TRGCRef
 Pmyclass;
void main()
{
Pmyclass A = new myclass;
Pmyclass B = new myclass;
A=B;
}

When A is assigned to the pointer returned by new, the TRGCRef::setpointer function is called. This stores the pointer to which A is being assigned and calls RegisterReference on the object to which it points. The second assignment (of B) does the same thing.

But when A is assigned to B (the third assignment), a series of calls within the overloaded assignment operator cause DeregisterReference to be called. DeregisterReference will return a 0, since A had been the last reference to the object it had been pointing to. The garbage collector will delete this object (the result of the first new).

This as signment also causes RegisterReference to be called on the object to which A is being assigned. This increments the reference count in the object.

When the main function concludes, the destructors for the local variables (A and B) are called. The destructor for B is called first, which causes DeregisterReference to be invoked on the object to which B refers. DeregisterReference returns a 1, since A still points to the object. But when the destructor for A is called, DeregisterReference will be invoked on that same object and this time will return a 0, causing the object to be deleted.

Note that it is important not to use "regular" pointers to an object that is to be garbage-collected. A regular pointer will not be counted in the reference count, so it's possible that the object will be deleted while your program still holds an outstanding pointer to it.

Keep It Clean

RGC also handles pointers to pointers. For a reference that is a "second level" pointer -- i.e., a refer ence that points to another object and that is itself pointed to -- you must use TRGCRefObj. This allows the garbage collector to work on references just as it works on objects; when there are no references to the reference, the reference will be garbage-collected.

The above applies to arbitrary depths of pointers. In general, the outermost layers of pointers may be implemented as either TRGCRef or TRGCRefObj types; inner layers must be implemented as TRGCRefObj types.

RGC is a simple, efficient, and robust reference-counting garbage collector that releases you from having to keep track of when to delete an object. It thereby ends memory leaks, ensures that you never free an object too soon, and simplifies your code. In short, it will make you more productive and make your C++ programs more reliable.


Circular References Remain

illustration_link (3 Kbytes)

A reference-counting garbage collector's weakness is a circular structure. Even though all outside references to the structure are gone, each member still has an outstanding reference pointing to it.


Take RCG for a Test Drive

screen_link (26 Kbytes)

RGC is available from the BYTE Web Site at http://www.byte.com . RGC includes RGC.h, a test-driver program, and a self-tracing program that prints messages wheneve r an object is created or destroyed or its reference count changes.


Justin Miller is studying electrical engineering and computer science at the Massachusetts Institute of Technology. You can reach him on the Internet at jcmiller@mit.edu .

Up to the Core Technologies section contentsGo to previous article: Go to next article: More Control, Fewer HeadachesSearchSend a comment on this articleSubscribe to BYTE or BYTE on CD-ROM  
Flexible C++
Matthew Wilson
My approach to software engineering is far more pragmatic than it is theoretical--and no language better exemplifies this than C++.

more...

BYTE Digest

BYTE Digest editors every month analyze and evaluate the best articles from Information Week, EE Times, Dr. Dobb's Journal, Network Computing, Sys Admin, and dozens of other CMP publications—bringing you critical news and information about wireless communication, computer security, software development, embedded systems, and more!

Find out more

BYTE.com Store

BYTE CD-ROM
NOW, on one CD-ROM, you can instantly access more than 8 years of BYTE.
 
The Best of BYTE Volume 1: Programming Languages
The Best of BYTE
Volume 1: Programming Languages
In this issue of Best of BYTE, we bring together some of the leading programming language designers and implementors...

Copyright © 2005 CMP Media LLC, Privacy Policy, Your California Privacy rights, Terms of Service
Site comments: webmaster@byte.com
SDMG Web Sites: BYTE.com, C/C++ Users Journal, Dr. Dobb's Journal, MSDN Magazine, New Architect, SD Expo, SD Magazine, Sys Admin, The Perl Journal, UnixReview.com, Windows Developer Network