
Face Value


February 1995 / State Of The Art / Face Value

Faster and more sophisticated algorithms are helping computerized facial-recognition systems come of age

Edmund X. DeJesus

Most pictures on driver's licenses challenge people's facial-recognition abilities. Until recently, real-time facial recognition has been impossible for computers. Now, however, the MRMV (Massachusetts Registry of Motor Vehicles) is betting that the algorithms to control a CFR (computerized facial recognition) system are sophisticated enough to quickly analyze its entire database of driver's licenses and help eliminate false IDs.

The Commonwealth of Massachusetts is implementing a CFR application built on Photobook, a research project at MIT. David Lewis, senior deputy registrar for the MRMV, expects the system will store digital images of 4.2 million Massachusetts drivers and will be operational at the central Boston headquarters and at over 30 branch offices by this summer. The system will use a central DEC Alpha-based server to hold the digitized facial images, and an existing IBM mainframe will handle the names, addresses, and other demographic data of licensees. Branch offices will use DEC PCs as local servers and clerk terminals.

According to Lewis, the facial-recognition capability will be added once the hardware is in place. The opportunity to compare a picture from a driver's license with millions of digital facial images is only one reason that motor vehicle registries are storing digital images of licensed drivers. Another use is to accommodate people who have lost their license. With no identification other than their face, a Massachusetts driver may soon be able to apply for a duplicate license.

In the future, CFR might help thwart crime. For example, although they're convenient, ATMs (automatic teller machines) are a source of fraud that by some estimates totals millions of dollars per year in the U.S. Fraud in government-benefits payments is estimated at tens of billions of dollars per year. CFR systems promise immediate verification of ATM cardholders or benefits recipients.

The facial-image database at the MRMV might be made available to law enforcement officers searching for criminals. However, Lewis says that photo images would not be considered public record and that their distribution would be limited to the police. The restriction is meant to avoid the specter of Big Brother and to address the fear that a central facial database might lead to civil rights abuses.

Better Algorithms

What makes implementing CFR possible is recent research that is beginning to yield fast, accurate, and commercially viable algorithms for a variety of facial-recognition applications. Previous CFR systems required powerful and expensive computers, yet were often slow and produced inaccurate results. A person's new hairstyle or eyeglasses could confuse and defeat many systems.

Photobook is a set of interactive computer tools for browsing and searching images. It can recognize various types of images, including shapes, textures, and decorative patterns, but its facial-recognition capabilities are perhaps its most intriguing feature. For example, Photobook lets you find all the faces that most closely match a target face. An entire search through a database of thousands of faces takes less than a second.

According to professor Alex Pentland, a Photobook developer at MIT's Media Laboratory, facial recognition is also a convenient means of identification because you don't have to worry about losing your ATM card or forgetting your PIN (personal identification number). ``You always have your face with you,'' he quips.

Pentland regards the explosion of multimedia applications, accompanied by the growing use of computers to create visual images and store digital images in databases, as a vast potential market for CFR. Currently, it's difficult to automatically search stored digital images for content. Typically, you must create text descriptions of each image and then search the text descriptions for keywords. Manual searches for images are tedious, slow, and expensive. However, programs like Photobook create and search for compressed versions of images. Editors could use this content-based database to rapidly search for, say, all photographs showing both the president of the United States and the prime minister of Japan.

Similarly, in film and video productions where postproduction costs can eat up large portions of the budget, the ability to search for particular actors in certain scenes and simplify editing makes CFR economically attractive. In offices, CFR-savvy computers may also be able to recognize their own users. Some computers now come with video cameras mounted in the monitor. You can use these cameras with CFR to recognize users, log users onto the computer or network, and configure the computer with the user's known preferences. Pentland is further researching ways for computers to interpret the human emotions behind facial expressions.

Another CFR application that takes a different approach from that of Photobook is TrueFace, developed by Miros (Wellesley, MA) (see ``A Neural Net that Knows Faces''). According to Dr. Michael Kuperstein, a neural-network researcher formerly with MIT and currently the president of Miros, TrueFace is a better biometrics security solution than fingerprints, retinal scans, voiceprints, or hand-geometry systems. Besides beating most of these other biometrics systems in verification accuracy (with rates often over 98 percent), TrueFace and other CFR applications are passive and nonintrusive.

About Faces

In CFR, computers perform three distinct but related tasks: verification, recognition, and locating the face within the image. With verification, the system attempts to match a live face with a specific reference digital image. Recognition (or identification) lets the system try to match a live face with any saved faces in a central computer database. The location task lets the system ask the question, where is the face in this picture? This task is also necessary to perform verification and recognition, because the face must first be located within the digital image before any verification or recognition can take place. The location task can also be an independent application.

Verification is considered a much simpler task than recognition, because only a single comparison is necessary. System developers can adapt verification algorithms to perform recognition by making one-to-one comparisons of the target face with each image in the database and then retaining all those images that match. They can also adapt recognition algorithms to perform verification tasks by limiting the database to the single reference face and testing to see if the computer adequately recognizes that face.
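
The two adaptations just described can be sketched in a few lines of Python. This is only an illustration, not code from Photobook or TrueFace: `distance` is a hypothetical stand-in for whatever comparison a real system uses, the faces are toy lists of numbers, and the threshold is arbitrary.

```python
from typing import Dict, List, Sequence

Face = Sequence[float]  # toy stand-in for an encoded face


def distance(a: Face, b: Face) -> float:
    """Hypothetical comparison: sum of squared differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def verify(live: Face, reference: Face, threshold: float) -> bool:
    """Verification: a single comparison against one reference image."""
    return distance(live, reference) <= threshold


def recognize_by_verifying(live: Face, database: Dict[str, Face],
                           threshold: float) -> List[str]:
    """Recognition built from verification: compare the live face with
    every stored face, one at a time, and keep all those that match."""
    return [name for name, face in database.items()
            if verify(live, face, threshold)]


def verify_by_recognizing(live: Face, name: str, reference: Face,
                          threshold: float) -> bool:
    """Verification built from recognition: limit the database to the
    single reference face and see whether that face is recognized."""
    return recognize_by_verifying(live, {name: reference}, threshold) == [name]


# Made-up data for illustration only.
database = {"ann": [0.0, 1.0], "bob": [5.0, 5.0], "cal": [0.2, 1.1]}
live = [0.1, 1.0]
matches = recognize_by_verifying(live, database, threshold=0.1)
print(matches)
```

The recognition routine does one comparison per stored face, which is why the speed of each comparison matters far more for recognition than for verification.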

Naturally, because recognition requires many more comparisons, recognition algorithms must be quick to be practical. By contrast, verification algorithms need not be nearly as fast, because only one comparison is necessary.

Location identification can be relatively simple (e.g., finding two circles that are assumed to be eyes), or it can consist of complex minirecognition algorithms that divide the entire image into smaller subimages and attempt to recognize a face in each subimage.

Real World

The goal of Photobook and other CFR systems is not only to perform these functions but to do so in real time or near real time. Photobook runs on Unix platforms, and a commercially available version of the recognition algorithm software, which is written in C and called Sherlock, supports DOS, Windows, and OS/2 platforms. Pentland serves as an adviser to Facia Reco Associates (Waltham, MA), a company set up to distribute the recognition software. Victor Colantonio, principal of Facia Reco, points out that Sherlock can identify other images besides faces. For example, in a medical application the system could recognize specific patterns in microscope slides. Facia Reco licenses Sherlock to customers seeking to add its recognition capabilities to their own products and systems.

While people might remember a person's face by the size of their nose, the shape of their eyes, and the curve of their mouth, Photobook eschews such obvious features. Instead, its algorithm uses basic concepts from information theory. First, the program separates each face into a 2-D arrangement of light and dark areas (see ``How Photobook Recognizes Faces''). Then the algorithm determines the best facial features to discriminate the features of one face from those of another. Researchers call these discriminating features eigenfaces. The algorithm then represents each facial image as a combination of the eigenfaces. Photobook stores an eigenface representation of each face in the database.

To identify a target facial image, the program compares its eigenface characteristics with all those in the database. The algorithm selects those faces whose representations most closely match the target face. If a recognition threshold has been defined and any of the matches satisfy the threshold, then the target face is recognized. Alternatively, the program can display any matching faces for you, in order of matching, and you can manually recognize the target face.
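
The two outcomes described above, automatic recognition against a threshold or a ranked list for a person to inspect, might look like this in outline. The distances, face names, and threshold values are invented for the example, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def rank_matches(distances: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order database faces from closest match to farthest."""
    return sorted(distances.items(), key=lambda item: item[1])


def recognize(distances: Dict[str, float],
              threshold: Optional[float] = None) -> Optional[str]:
    """Return the best match if it satisfies the threshold, else None.

    With no threshold defined, nothing is recognized automatically;
    the caller shows the ranked list to a person instead.
    """
    if threshold is None or not distances:
        return None
    name, dist = rank_matches(distances)[0]
    return name if dist <= threshold else None


# Hypothetical distances between a target and three stored faces.
distances = {"face_017": 4.2, "face_102": 0.3, "face_230": 1.9}
ranked = rank_matches(distances)
print(ranked)
print(recognize(distances, threshold=0.5))
```

A looser threshold recognizes more faces automatically but accepts more wrong matches, so the right value depends on the application.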

The eigenface algorithm is attractive for several reasons. Typically, a sample of only 40 eigenfaces gives excellent recognition results. This amount of data is far smaller than the number of features (i.e., pixels) in the actual face image (16,384 pixels for a 128-by-128 black-and-white image, and three times that number for a color image). Each face can be represented by a small number of bytes. If a 2-byte floating-point number is used for each eigenface value, only 80 bytes are required to represent each face. This is far less than the original image (which may be 250 KB before compression) or the 128-by-128 facial image of 16,384 bytes (before compression). The original image can be recovered quite faithfully from this small number of bytes (as a linear combination of the eigenfaces). Clearly, this property can be useful in itself, as it offers a way to compress facial images in otherwise unmanageably large digital databases, while allowing extraction of recognizable faces.
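
The storage arithmetic above is easy to check. The sketch below is only an illustration: the coefficient values are made up, one byte per pixel is assumed for the black-and-white image, and the half-precision format (`e`) of Python's `struct` module stands in for whatever 2-byte number format a real system might use.

```python
import struct

SIDE = 128                      # image is 128 by 128 pixels
PIXELS = SIDE * SIDE            # 16,384 pixels, assumed one byte each
NUM_EIGENFACES = 40             # coefficients kept per face
BYTES_PER_COEFF = 2             # 2-byte floating-point number

encoded_size = NUM_EIGENFACES * BYTES_PER_COEFF   # bytes per encoded face
ratio = PIXELS / encoded_size                     # size reduction factor

# Pack 40 made-up coefficients as little-endian half-precision floats
# and read them back to confirm the encoded record length.
coefficients = [0.5 * i for i in range(NUM_EIGENFACES)]
packed = struct.pack(f"<{NUM_EIGENFACES}e", *coefficients)
unpacked = list(struct.unpack(f"<{NUM_EIGENFACES}e", packed))

print(PIXELS, encoded_size, len(packed), ratio)
```

The 80 bytes cover only the per-face coefficients. The average face and the eigenfaces themselves are stored once and shared by every face in the database.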

The representation of a face using eigenfaces is simple and fast. A face can be evaluated in as little as 1 second, according to Pentland. In addition, the comparison of one face to other faces is simple and fast. Comparisons can be done at the rate of millions per second. From a hardware perspective, the comparison process is memory-intensive: The more memory that's available, the better for recognition performance.

Saving Faces

Depicting faces as 2-D images and then encoding those images to preserve the most important discriminating characteristics involves two related processes: initialization (or training) and recognition. The initialization process uses a set of digital facial images to produce an average face and eigenfaces.

The more controlled the circumstances of image acquisition, the simpler subsequent steps will be. Eliminating background clutter, using consistent and simple lighting, and limiting orientation of faces are all important. The creators of one database that Pentland used captured images at a booth during a Boston photography show. The booth's controlled environment allowed photographers to consistently set lighting and the background. Participants snapped their own picture when they could see two LED lights simultaneously, which ensured that their faces were uniformly oriented.

The size of the facial image also strongly affects algorithm performance, so each image should be scaled to approximately the same size. This can be as simple as expanding or contracting the image so that the eyes always appear in the same position, or, if conditions vary, it can become more complex. Orientation of the face is also important. You can rotate images clockwise or counterclockwise to ensure that the eyes are on a horizontal line or to satisfy symmetry or some more complex criterion. In addition, you can adjust brightness and contrast of the digital image to produce a standard image. Using a 2-D Gaussian window, you can clip the face. Besides simplifying the image, this also eliminates some possibly confusing hairstyle effects.
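
Two of the standardization steps above, adjusting brightness and contrast and applying a 2-D Gaussian window, can be sketched as follows. This is one plausible reading rather than Photobook's actual procedure: zero mean and unit standard deviation are assumed as the "standard" brightness and contrast, the window is assumed to be centered on the image, and scaling and rotation are left out. The function names and the tiny 3-by-3 image are invented for the example.

```python
import math
from typing import List

Image = List[List[float]]  # rows of gray levels


def normalize_brightness_contrast(image: Image) -> Image:
    """Shift to zero mean (brightness) and scale to unit standard
    deviation (contrast)."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    std = math.sqrt(var) or 1.0  # avoid dividing by zero on flat images
    return [[(p - mean) / std for p in row] for row in image]


def gaussian_window(image: Image, sigma: float) -> Image:
    """Weight each pixel by a 2-D Gaussian centered on the image, fading
    out the borders where hair and background tend to appear."""
    rows, cols = len(image), len(image[0])
    cy, cx = (rows - 1) / 2, (cols - 1) / 2
    return [[image[y][x] * math.exp(-((x - cx) ** 2 + (y - cy) ** 2)
                                    / (2 * sigma ** 2))
             for x in range(cols)]
            for y in range(rows)]


face = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]  # toy 3-by-3 "image"
standard = normalize_brightness_contrast(face)
windowed = gaussian_window(standard, sigma=1.0)
```

A smaller `sigma` clips the face more tightly. A value that suits one image size will not suit another, so in practice it would be tied to the standardized image dimensions.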

At this point, Photobook is ready to calculate an average face. To do this, the system averages (using the simple arithmetic mean) the brightness values at each pixel of the set of standardized digital facial images. These averaged values form the average face. The system then subtracts the average face from each individual digital face, and the result of this step is a set of differences from the average face. These differences are the basis for the next series of calculations.
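
The averaging and subtraction steps are simple enough to show directly. In this sketch each image is assumed to be already standardized and flattened into a list of pixel values; the three four-pixel "faces" are toy data, and the function names are hypothetical.

```python
from typing import List

Vector = List[float]  # one image flattened to a list of pixel values


def average_face(faces: List[Vector]) -> Vector:
    """Arithmetic mean of the brightness at each pixel position."""
    count = len(faces)
    return [sum(pixels) / count for pixels in zip(*faces)]


def differences_from_average(faces: List[Vector], mean: Vector) -> List[Vector]:
    """Subtract the average face from every individual face."""
    return [[p - m for p, m in zip(face, mean)] for face in faces]


faces = [[1.0, 2.0, 3.0, 4.0],
         [3.0, 2.0, 1.0, 0.0],
         [2.0, 5.0, 2.0, 5.0]]
mean = average_face(faces)
diffs = differences_from_average(faces, mean)
print(mean)
print(diffs)
```

Because the average has been removed, the differences at each pixel position sum to zero across the set.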

Photobook performs a principal components analysis (or Karhunen-Loeve expansion) on these facial differences. This analysis finds the eigenvectors and eigenvalues of the covariance matrix of the difference images, with each image treated as a single column of pixel values. To perform this directly on, say, a 128-by-128-pixel image (N=128) involves finding the eigenvectors of a 16,384-by-16,384 matrix (an N-squared-by-N-squared matrix), an intractable computational problem. Instead, Photobook users decide beforehand how many eigenfaces they want to analyze. In practice, M=40 eigenfaces have proven adequate. Users thus seek the M orthonormal eigenvectors that best discriminate one face from another. These are the M eigenvectors with the M largest eigenvalues, and they become the eigenfaces. In effect, this reduces the dimension of the image space from N-squared dimensions to M dimensions (from 16,384 to 40, in the example). This smaller M-dimensional subspace of the original image space is called face space. The M eigenfaces span face space (i.e., any face in face space can be represented as a linear combination of the M eigenfaces). Although eigenfaces represent the most discriminating features of the set of digital face images, they do not represent any particular recognizable features that people would use to identify a face.
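
To make the principal components step concrete, here is a small pure-Python sketch. It relies on a standard linear-algebra shortcut: with K training images, the leading eigenvectors of the large pixel-by-pixel matrix can be obtained from the much smaller K-by-K matrix of inner products between difference images. The article does not say whether Photobook does this, so treat it as an assumption. Power iteration with deflation is used only to keep the example dependency-free, and the function names and toy data are invented.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def top_eigenpairs(matrix: List[Vector], count: int,
                   iterations: int = 500) -> List[Tuple[float, Vector]]:
    """Largest eigenvalues and eigenvectors of a small symmetric matrix,
    found by power iteration with deflation."""
    size = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _pair in range(count):
        vec = [1.0 + 0.1 * i for i in range(size)]  # arbitrary start
        for _step in range(iterations):
            nxt = [dot(row, vec) for row in work]
            norm = math.sqrt(dot(nxt, nxt))
            if norm == 0.0:
                break
            vec = [x / norm for x in nxt]
        value = dot(vec, [dot(row, vec) for row in work])
        pairs.append((value, vec))
        # Deflate: remove the component just found before the next pass.
        for i in range(size):
            for j in range(size):
                work[i][j] -= value * vec[i] * vec[j]
    return pairs


def eigenfaces(diffs: List[Vector], count: int) -> List[Vector]:
    """Top `count` orthonormal eigenfaces of mean-subtracted images."""
    gram = [[dot(a, b) for b in diffs] for a in diffs]  # K by K matrix
    basis = []
    for _value, weights in top_eigenpairs(gram, count):
        # Map the small eigenvector back to image space as a weighted
        # sum of the difference images, then scale to unit length.
        face = [sum(w * d[p] for w, d in zip(weights, diffs))
                for p in range(len(diffs[0]))]
        norm = math.sqrt(dot(face, face))
        basis.append([x / norm for x in face])
    return basis


# Toy data: three mean-subtracted four-pixel "images".
diffs = [[-1.0, -1.0, 1.0, 1.0],
         [1.0, -1.0, -1.0, -3.0],
         [0.0, 2.0, 0.0, 2.0]]
basis = eigenfaces(diffs, count=2)
```

Three mean-subtracted images can yield at most two meaningful eigenfaces, which is why the example asks for only two. A real training set of thousands of faces would comfortably support the 40 that the article mentions.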

The results of this initialization process are threefold: the average face for this set of digital facial images, the M eigenfaces for this set of digital facial images, and a database of known faces encoded in terms of the eigenfaces.

Face to Face

With this work completed, it's now possible for Photobook to perform the recognition process. First, it locates and standardizes the target face image, as described in the preceding section on initialization. Photobook then subtracts the average face from the target face. The system decomposes the difference in terms of the eigenfaces. In matrix terms, this is the product of the difference with the transpose of the matrix of eigenfaces. The result is a set of M coefficients (or M weights) of the eigenfaces that characterizes the target face. This set of M coefficients can also be regarded as the M coordinates of a single point in face space or as the M components of a vector in face space. These coefficients are like a recipe for constructing the target face out of the eigenfaces: so much of this eigenface plus so much of that eigenface.
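
The "recipe" idea can be shown with a toy example. The sketch below assumes the eigenfaces are orthonormal and stored as flat lists of pixel values; the two four-pixel eigenfaces, the average face, and the target are all made up, and `encode` and `reconstruct` are hypothetical names.

```python
from typing import List

Vector = List[float]


def encode(face: Vector, mean: Vector, eigenfaces: List[Vector]) -> Vector:
    """Subtract the average face, then take the inner product of the
    difference with each eigenface to get one coefficient per eigenface."""
    diff = [p - m for p, m in zip(face, mean)]
    return [sum(e * d for e, d in zip(eigenface, diff))
            for eigenface in eigenfaces]


def reconstruct(coeffs: Vector, mean: Vector,
                eigenfaces: List[Vector]) -> Vector:
    """Follow the recipe: the average face plus so much of each eigenface."""
    return [m + sum(c * eigenface[p] for c, eigenface in zip(coeffs, eigenfaces))
            for p, m in enumerate(mean)]


# Two orthonormal toy eigenfaces for four-pixel images (M = 2).
eigenfaces = [[0.5, 0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5, -0.5]]
mean = [2.0, 3.0, 2.0, 3.0]
target = [4.0, 3.0, 4.0, 3.0]

coeffs = encode(target, mean, eigenfaces)
rebuilt = reconstruct(coeffs, mean, eigenfaces)
print(coeffs, rebuilt)
```

Here the target happens to lie exactly in the space spanned by the two eigenfaces, so the reconstruction is exact. A real face generally lies slightly outside face space, and the reconstruction is then a close approximation.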

Photobook can compare the M coefficients of the target face with those of each encoded face in the database. The simplest way to do this is to regard each face (including the target) as a point in face space, and to calculate the Euclidean distance between the target face point and each other face point in the database. (Actually, using the square of the distance avoids performing a time-consuming square root for each point in the database.) Computationally, this involves M subtractions, M multiplications (squaring), and M-1 additions. The smallest calculated distance is the closest match, the next-smallest distance is the next-closest, and so on. Alternatively, you can perform the search as a database lookup (assuming that the faces are sorted by their coefficients).
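
A minimal sketch of this comparison, using invented two-coefficient faces (a real system would use M=40 or so), shows why the square root can be skipped: squaring preserves the order of nonnegative distances, so the ranking is the same either way. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One subtraction and one multiplication per coefficient, summed.
    No square root is taken."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_face(target: Sequence[float],
                 database: Dict[str, Sequence[float]]) -> str:
    """Name of the stored face nearest the target in face space."""
    return min(database,
               key=lambda name: squared_distance(target, database[name]))


def ranking(target: Sequence[float], database: Dict[str, Sequence[float]],
            use_sqrt: bool) -> List[str]:
    """Stored faces ordered from nearest to farthest."""
    def key(name: str) -> float:
        d2 = squared_distance(target, database[name])
        return math.sqrt(d2) if use_sqrt else d2
    return sorted(database, key=key)


target = [2.0, 2.0]
database = {"face_a": [2.0, 1.0], "face_b": [0.0, 0.0], "face_c": [5.0, 6.0]}

best = closest_face(target, database)
same_order = ranking(target, database, True) == ranking(target, database, False)
print(best, same_order)
```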

At this point, the system is ready to order the faces by distance and present the results to the user. The result is a list or display of the closest matching faces. Notice that it is the simple nature of the comparison step described above that makes this algorithm so fast. You don't need a supercomputer to use Photobook; a high-end PC or Unix workstation is adequate. If the target face image is not an actual face, the distance from the database faces will be huge. This is one way to test if an image is actually a face. Strictly speaking, when faces are added to the database, the average face, the eigenfaces, and all the coefficients of each saved face must be recomputed. However, if a new face is closer to the average face than one of the existing database faces, then recomputation isn't essential. In any event, the recomputation can be done off-line.

The Eyes Have It

Pentland and Baback Moghaddam, an MIT graduate student, have recently added a new layer of discrimination to the eigenface algorithm. Called eigenfeatures, this layer can locate and compare specific facial features, such as eyes, noses, and mouths. The eigenfeatures algorithms are similar to the eigenface algorithm and use discriminating characteristics (e.g., eigeneyes, eigennoses, and eigenmouths) to help distinguish similar faces from each other. Using eigenfeatures boosts the accuracy of recognition by several percentage points, Pentland says.

Photobook usually isn't fooled by complications such as hats, eyeglasses, and changed hairstyles. In addition, it can handle different facial expressions, changes in lighting, inclination of the head, and changes in facial hair. Of course, extreme efforts at disguising a face can fool the algorithm (as they fool humans). However, for most commercial CFR applications, a person wants to be recognized, for example, to use an ATM, gain entrance to a building, or receive benefits payments. As a result, getting them to pose correctly or remove eyeglasses or headgear usually isn't a problem.

Test Drive

In a typical session with Photobook, you select a face from a random sample of faces displayed. Practically instantaneously, Photobook finds all those faces that most closely match the selected face, sorts those faces, and displays them on the screen for further use.

Despite its simplicity and speed, the Photobook algorithm appears to be accurate. In one test, Pentland used a database of 7562 facial images of nearly 3000 different people. These images included a number of participants with different facial expressions, eyewear, hairstyles, and headgear--all factors that you would expect to complicate the task of recognition. The test used 200 faces chosen randomly from this database, and Photobook selected the most closely matching face. If Photobook's selection was in fact the same person, it was scored as correct. If Photobook's selection was not the same person, it was scored as incorrect. According to Pentland, even with the complicating factors mentioned above, Photobook achieved a 95 percent recognition rate.
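
The scoring rule in this test can be written down in a few lines. The sketch makes one assumption the article does not state: the probe image itself is excluded from the candidates, since otherwise every probe would trivially match its own copy. The six records are invented, and the names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Record = Tuple[str, Sequence[float]]  # (person id, encoded face)


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_match(index: int, records: List[Record]) -> int:
    """Index of the closest record other than the probe itself."""
    probe = records[index][1]
    others = [i for i in range(len(records)) if i != index]
    return min(others, key=lambda i: squared_distance(probe, records[i][1]))


def recognition_rate(records: List[Record],
                     probe_indices: Sequence[int]) -> float:
    """Fraction of probes whose closest match is the same person."""
    correct = sum(records[best_match(i, records)][0] == records[i][0]
                  for i in probe_indices)
    return correct / len(probe_indices)


# Toy database: three people, two images each. The last image of p3
# happens to resemble p2, so some probes are scored as incorrect.
records = [("p1", [0.0, 0.0]), ("p1", [0.1, 0.0]),
           ("p2", [5.0, 5.0]), ("p2", [5.0, 5.2]),
           ("p3", [9.0, 0.0]), ("p3", [4.9, 5.0])]

rng = random.Random(0)                      # fixed seed for repeatability
probes = rng.sample(range(len(records)), k=4)
rate = recognition_rate(records, probes)
print(f"recognition rate: {rate:.0%}")
```

In the test described above, the same calculation would run over 200 randomly chosen probes from the 7562-image database.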

In a similar test emphasizing verification over matching, Photobook scored 99.9 percent accuracy using the same database, Pentland says. For comparison purposes, this level of verification is at least as good as that provided by a single fingerprint, although CFR is far simpler and less intrusive than fingerprinting.

The U.S. Army recently conducted tests of several different algorithms and approaches to CFR to verify the sometimes inflated claims of researchers. Preliminary results from these tests indicate that the Photobook algorithm had the best overall performance with scores of over 90 percent in recognition and nearly 100 percent in verification.

``The positive aspect of face recognition is that it's a little bit like living in a small town,'' Pentland observes. ``You walk up to the cash machine, and it knows who you are.''

The widespread use of CFR may turn the world into a small town: Wherever you go, your face will be recognized, and you will be trusted. This may have a distinctly humanizing effect on the world.


Edmund X. DeJesus is a BYTE senior editor. He has a Ph.D. in physics and has been a professional programmer for over 15 years. You can reach him on the Internet or BIX at edejesus@bix.com.
