The database for storing the DNA profiles is surprisingly easy to build. Although humans have an immense amount of genetic information encoded in their DNA (about 3 billion base pairs), the FBI's databases need to store only a handful of integers to identify each person. The integers roughly measure the length of a DNA strand containing a particular gene after a special set of enzymes slices up the DNA.
These enzymes cut only the genes where specific genetic patterns occur, and the location of these patterns varies widely from person to person. The result is that the lengths of the strands of DNA left after the enzymatic cutting vary widely from person to person. The lengths of these strands are unique and as personal as fingerprints.
The FBI's DNA database looks for a match between two subjects by comparing the lengths of the strands that contain a particular gene location. If the lengths fall within a fixed percentage of each other, from 2.5 percent to 6 percent, investigators consider it a match. The locations where the enzymes do their cutting are so variable that the distribution of the lengths of the DNA strands is broad, and the probability of two people matching is extremely low.
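The comparison rule described above can be sketched in a few lines of Python. This is only an illustration, not the FBI's code: the function names and the locus labels `locus_A` and `locus_B` are hypothetical, and the choice to measure the percentage window against the longer of the two fragments is an assumption, since the text says only that the lengths must fall "within a fixed percentage of each other."

```python
def lengths_match(a, b, tolerance=0.025):
    """Return True if two fragment lengths lie within `tolerance` of each other.

    Assumption: the window is measured against the longer fragment.
    The default of 2.5 percent is the low end of the range quoted in the text.
    """
    return abs(a - b) <= tolerance * max(a, b)


def profiles_match(p, q, tolerance=0.025):
    """Compare two profiles (dicts of locus name -> fragment length).

    The profiles match only if they share at least one locus and every
    shared locus passes the length test.
    """
    shared = p.keys() & q.keys()
    return bool(shared) and all(
        lengths_match(p[locus], q[locus], tolerance) for locus in shared
    )


# Hypothetical readings, invented for illustration only.
suspect = {"locus_A": 1000, "locus_B": 2500}
evidence = {"locus_A": 1020, "locus_B": 2540}
print(profiles_match(suspect, evidence))
```

Widening `tolerance` toward 6 percent accepts more measurement error at the cost of more coincidental matches, which is presumably why the window is a tunable percentage rather than an exact comparison.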
For instance, one lab might choose to slice up the DNA and test the lengths of the strands that contain four common genes (D2S44, D1S7, D1S80, and D17S79 are some popular versions). If the probability that two people produce strands of roughly the same length at any one of these locations is about 1 in 40, and the locations vary independently of one another, the odds that all four strands match are 1 in 40 raised to the fourth power, or roughly 1 in 2.5 million.
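The arithmetic behind that figure is easy to check. The short sketch below multiplies the per-location probability across four locations, which is valid only under the assumption that the locations vary independently of one another. The function name is hypothetical.

```python
from fractions import Fraction


def random_match_probability(per_locus, n_loci):
    """Probability that every locus matches by chance.

    Assumption: the loci are statistically independent, so the
    per-locus probabilities simply multiply.
    """
    return Fraction(per_locus) ** n_loci


p = random_match_probability(Fraction(1, 40), 4)
print(p)  # 1/2560000
```

The exact result is 1 in 2,560,000, which the text rounds to roughly 1 in 2.5 million.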
The FBI's database also judges the "strength" of a match: an estimate of how likely it is that two different people would carry a mix of genes that looks identical to the test. To make this estimate, the FBI's system uses a collection of tables that were developed by genetic scientists. These tables describe the distribution of DNA readings throughout the population.
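One plausible way to use such tables is sketched below. It is an assumption about the general approach, not a description of the FBI's actual method: each table is modeled as a set of length bins with the share of the population falling in each bin, and the chance of a coincidental match is the product of the bin frequencies across locations. The locus labels, bin edges, and frequencies are all invented for illustration.

```python
from bisect import bisect_right
from math import prod

# Hypothetical tables: for each locus, the lower edges of the length bins
# and the fraction of the population in each bin. All values are invented.
TABLES = {
    "locus_A": ([0, 1000, 2000, 3000], [0.10, 0.40, 0.30, 0.20]),
    "locus_B": ([0, 1500, 3000], [0.25, 0.50, 0.25]),
}


def bin_frequency(locus, length, tables=TABLES):
    """Look up the population frequency of the bin containing `length`."""
    if length < 0:
        raise ValueError("fragment length cannot be negative")
    edges, freqs = tables[locus]
    # bisect_right finds the first edge greater than `length`;
    # the bin we want starts at the edge just before it.
    return freqs[bisect_right(edges, length) - 1]


def coincidence_probability(readings, tables=TABLES):
    """Multiply bin frequencies across loci (assumes independent loci).

    A smaller result means a rarer profile and therefore a stronger match.
    """
    return prod(bin_frequency(locus, n, tables) for locus, n in readings.items())


print(coincidence_probability({"locus_A": 1500, "locus_B": 3200}))
```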
The database must hold the results from a variety of different genes because many labs use different selections of genes. At least 12 genes are common throughout the country. In general, though, these genes were selected from regions of the DNA that don't seem to have any relationship to physical characteristics. Forensic scientists use these to avoid any future temptation to use the database for purposes beyond identification.
Technicians match DNA samples in the SQL database via a custom front end. After choosing the loci (or markers) used in the DNA sample, technicians input the corresponding data, and any matches appear on the screen.
Each local lab maintains a database of samples it processes. If it can't find a match in this collection, it forwards the query to the state database, which contains a copy of records from all other labs in the state. If there is still no match, the query is passed to Washington, D.C., where the FBI maintains DNA records for the entire country.
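The escalation from local lab to state to national database amounts to searching a list of tiers in order and stopping at the first one that produces a hit. The sketch below illustrates that control flow only. The tier names, record IDs, and the `is_match` callback are hypothetical, and the in-memory dictionaries stand in for what the article describes as separate databases.

```python
def tiered_search(query, tiers, is_match):
    """Search each tier in order and stop at the first one with a match.

    `tiers` is a list of (name, records) pairs, ordered from the most local
    database to the national one. `records` maps a record ID to a profile.
    Returns (tier_name, matching_ids), or (None, []) if nothing matches.
    """
    for name, records in tiers:
        hits = [rec_id for rec_id, profile in records.items()
                if is_match(query, profile)]
        if hits:
            return name, hits
    return None, []


# Hypothetical data, invented for illustration only.
tiers = [
    ("local", {"L-1": {"locus_A": 900}}),
    ("state", {"S-7": {"locus_A": 1000}}),
    ("national", {"N-3": {"locus_A": 1000}}),
]

# Exact equality keeps the example short; a real comparison would use
# a percentage window on the fragment lengths.
print(tiered_search({"locus_A": 1000}, tiers, lambda q, p: q == p))
```

Because the search stops at the first tier with a hit, the state record is returned here and the national database is never consulted.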

The custom front end to the FBI's DNA database lets technicians choose the loci used in the DNA sample, input the corresponding data, and search the SQL database for a match. In this case, the system found only one exact match.