DNA by the Numbers


December 1995 / Features / DragNET / DNA by the Numbers

The database for storing the DNA profiles is surprisingly easy to build. Although humans have an immense amount of genetic information encoded in their DNA (about 3 billion base pairs), the FBI's databases need to store only a handful of integers to identify each person. The integers roughly measure the length of a DNA strand containing a particular gene after a special set of enzymes slices up the DNA.

These enzymes cut only the genes where specific genetic patterns occur, and the location of these patterns varies widely from person to person. The result is that the lengths of the strands of DNA left after the enzymatic cutting vary widely from person to person. The lengths of these strands are unique and as personal as fingerprints.

The FBI's DNA database looks for a match between two subjects by comparing the lengths of the strands that contain a particular gene location. If the lengths fall within a fixed percentage of each other -- 2.5 percent to 6 percent -- investigators consider it a match. The locations where the enzymes do their cutting are so variable that the distribution of the lengths of the DNA strands is broad, and the probability of two people matching is extremely low.
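The window comparison can be sketched in a few lines of Python. This is only an illustration, not the FBI's code: the function names, the choice to measure the difference against the mean of the two lengths, and the sample numbers are all assumptions. The article says only that the lengths must fall within a fixed percentage of each other.

```python
def lengths_match(len_a: float, len_b: float, window_pct: float = 2.5) -> bool:
    """Return True if two strand lengths fall within window_pct percent of each other.

    Assumption: the difference is measured relative to the mean of the two lengths.
    """
    mean = (len_a + len_b) / 2.0
    return abs(len_a - len_b) <= mean * window_pct / 100.0


def profiles_match(profile_a: dict, profile_b: dict, window_pct: float = 2.5) -> bool:
    """Compare two profiles (locus name -> strand length) on the loci they share.

    Two profiles with no locus in common cannot be compared, so they do not match.
    """
    shared = set(profile_a) & set(profile_b)
    if not shared:
        return False
    return all(
        lengths_match(profile_a[locus], profile_b[locus], window_pct)
        for locus in shared
    )


# Hypothetical readings; LOCUS_X is a made-up locus name.
suspect = {"D17S79": 1500, "LOCUS_X": 3200}
evidence = {"D17S79": 1510, "LOCUS_X": 3230}
print(profiles_match(suspect, evidence))  # True
```

Widening window_pct toward 6 percent accepts more measurement error at the cost of more coincidental matches.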

For instance, one lab might choose to slice up the DNA and test the lengths of the strands that contain four common genes (D2S44, D1S7, D1S80, and D17S79 are some popular versions). If the probability that two people produce strands with the same rough length at any one gene is about 1 in 40, the odds of all four strands having the same length are roughly 1 in 2.5 million.
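The arithmetic behind that figure is a simple product, sketched here in Python. It assumes the four readings are statistically independent, which the article does not state explicitly; the function name is illustrative.

```python
from fractions import Fraction


def random_match_probability(per_locus: Fraction, n_loci: int) -> Fraction:
    """Chance that every tested locus matches by coincidence.

    Assumption: each locus matches independently with the same probability.
    """
    return per_locus ** n_loci


p = random_match_probability(Fraction(1, 40), 4)
print(p)  # 1/2560000
```

The exact value, 1 in 2,560,000, rounds to the "1 in 2.5 million" quoted above.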

The FBI's database also judges the "strength" of a match, estimating how likely it is that two different people would have a mix of genes that looks identical to the test. To do this, the FBI's system uses a collection of tables that were developed by genetic scientists. These tables describe the distribution of DNA readings throughout the population.

The database must hold the results from a variety of different genes because many labs use different selections of genes. At least 12 genes are common throughout the country. In general, though, these genes were selected from regions of the DNA that don't seem to have any relationship to physical characteristics. Forensic scientists use these to avoid any future temptation to use the database for purposes beyond identification.

Technicians match DNA samples in the SQL database via a custom front end. After choosing the loci (or markers) used in the DNA sample, technicians input the corresponding data, and any matches appear on the screen.
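The kind of query such a front end might issue can be sketched with Python's built-in sqlite3 module. Everything about the schema here is an assumption made for illustration: the table name fragments, its columns, the sample IDs, the locus name LOCUS_X, and the simplification of one stored length per locus. The window is centered on the query value for simplicity.

```python
import sqlite3


def find_matches(conn, query, window_pct=2.5):
    """Return IDs of samples that match the query at every chosen locus.

    query maps locus name -> measured strand length. A stored length matches
    when it lies within window_pct percent of the query length.
    """
    if not query:
        return []
    clauses, params = [], []
    for locus, length in query.items():
        tol = length * window_pct / 100.0
        clauses.append("(locus = ? AND length BETWEEN ? AND ?)")
        params += [locus, length - tol, length + tol]
    sql = (
        "SELECT sample_id FROM fragments WHERE "
        + " OR ".join(clauses)
        + " GROUP BY sample_id HAVING COUNT(DISTINCT locus) = ?"
        + " ORDER BY sample_id"
    )
    params.append(len(query))
    return [row[0] for row in conn.execute(sql, params)]


# Hypothetical in-memory database with two samples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fragments (sample_id TEXT, locus TEXT, length REAL)")
conn.executemany(
    "INSERT INTO fragments VALUES (?, ?, ?)",
    [
        ("A", "D17S79", 1500.0), ("A", "LOCUS_X", 3200.0),
        ("B", "D17S79", 1505.0), ("B", "LOCUS_X", 4100.0),
    ],
)
print(find_matches(conn, {"D17S79": 1500.0, "LOCUS_X": 3210.0}))  # ['A']
```

Sample B agrees at D17S79 but not at LOCUS_X, so only sample A is reported.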

Each local lab maintains a database of the samples it processes. If it can't find a match in this collection, it forwards the query to the state database, which contains a copy of records from all other labs in the state. If there is still no match, the query is passed to Washington, D.C., where the FBI maintains DNA records for the entire country.
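The escalation from local to state to national can be sketched as a loop over tiers. In this Python illustration, plain in-memory lists stand in for the three databases, and the tier names, sample IDs, record layout, and matching rule (difference measured against the mean of the two lengths) are all assumptions.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Profile = Dict[str, float]          # locus name -> strand length
Record = Tuple[str, Profile]        # (sample ID, profile)


def within_window(a: float, b: float, pct: float) -> bool:
    """True if two lengths differ by no more than pct percent of their mean."""
    return abs(a - b) <= (a + b) / 2.0 * pct / 100.0


def search_tier(records: Sequence[Record], query: Profile, pct: float) -> List[str]:
    """Return IDs of records that match the query at every queried locus."""
    return [
        sample_id
        for sample_id, profile in records
        if all(
            locus in profile and within_window(profile[locus], length, pct)
            for locus, length in query.items()
        )
    ]


def tiered_search(
    query: Profile,
    tiers: Sequence[Tuple[str, Sequence[Record]]],
    pct: float = 2.5,
) -> Tuple[Optional[str], List[str]]:
    """Try each tier in order and stop at the first one that yields a match."""
    for name, records in tiers:
        hits = search_tier(records, query, pct)
        if hits:
            return name, hits
    return None, []


# Hypothetical data for the three levels.
tiers = [
    ("local", [("L-1", {"D17S79": 2000.0})]),
    ("state", [("S-7", {"D17S79": 1510.0})]),
    ("national", [("N-3", {"D17S79": 1500.0})]),
]
print(tiered_search({"D17S79": 1500.0}, tiers))  # ('state', ['S-7'])
```

Because the search stops at the first tier with a hit, the national record N-3 is never consulted in this example.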


Customized DNA Database

The custom front end to the FBI's DNA database lets technicians choose the loci used in the DNA sample, input the corresponding data, and search the SQL database for a match. In this case, the system found only one exact match.


Copyright © 2005 CMP Media LLC