A system for matching static or distortable fonts
Scott Boggan and Michael De Laurentis
As the popularity of the dozen or so inexpensive font libraries on the market attests, computer users cannot resist fonts. Perhaps because fonts deliver on the promise of personalizing the PC, documents are dressed in everything from Aachen to Zapf Dingbats.
This creative freedom comes at a price, however, as fonts present a significant barrier to document portability. When documents containing fonts are exchanged between platforms and among networked workgroup users, problems inevitably occur. For example, a proposal formatted in Century Old Style that's sent to a Windows system that doesn't have Century Old Style installed will likely be displayed and printed on the Windows system in Times New Roman, destroying the document's line endings and page breaks in the process. Or a PC document containing CG Omega might be sent to a Macintosh counterpart that doesn't recognize that the font is identical to Optima and instead displays the document in Courier.
The Panose typeface-matching system from ElseWare (Seattle, WA) attempts to solve these font problems. Panose has been adopted by a variety of software and hardware vendors, including Aldus, Go Corp., Hewlett-Packard, Lotus, Microsoft, and No Hands Software. Many type vendors, including Agfa, Bitstream, and Microsoft, have also licensed Panose for use in their retail font products. By objectively classifying fonts according to their visual characteristics, the Panose system selects and replaces fonts in documents on a variety of platforms, including Windows, Macintosh, DOS, Unix, and PenPoint systems.
The Trouble with Fonts
Font-portability problems are not surprising, given the widespread popularity of font libraries from many vendors. Since font packages provide users with lots of fonts at a cost of just pennies each, most users have one or two of these packages installed on their systems.
The font vendors themselves are partially to blame for the font-portability problem. Over the past 20 years, many font vendors have licensed or recut the popular type designs, marketing the fonts under new names. Name variations between vendors are the most common. Linotype-Hell owns a typeface that it calls Optima; Agfa calls its version of the same design CG Omega. Interplatform font-name variations are also common, even within a single vendor's product line. Adobe uses the name GoudyOldStyle for one of its fonts on the Macintosh but calls its PC version Goudy Old Style (with spaces). Such confusing name variations alone are enough to throw a monkey wrench into the process of sharing documents.
The growing popularity of workgroup products such as Lotus Notes and Microsoft Windows for Workgroups aggravates font-portability problems by increasing the likelihood that documents will be shared between computers. Portable computing also adds to the document shuffle between computers with different font configurations. Since fonts don't travel with documents, formatting is lost unless every computer that opens a document has the fonts the document requests.
Current Approaches to Font Substitution
Windows and the Macintosh address the font-substitution problem in very different ways, as do different applications vendors. Even without knowing it, most Windows users have experienced the Windows solution to missing fonts: When you open a document containing a missing font, the Windows font mapper finds a substitute and supplies it to the application. Since the Windows font mapper provides no notification of this process, it's often difficult to tell when a font is missing. Furthermore, there is no easy way to customize Windows font replacements, since Microsoft chose not to provide an interface to Windows font mapping.
How does the Windows font mapper work? The Windows core mapper uses weighted penalties to identify the closest font replacement and then provides the application with the font having the smallest penalty. The mapper assigns large penalties to font attributes such as character set, output precision, variable or fixed pitch, face name, family type, and height. Although these attributes preserve the overall feeling of the font, they ignore such critical visual characteristics as serif style, weight, proportion, and contrast. In addition, the Windows mapper doesn't handle font-name variations very well. In practice, Windows usually replaces missing fonts with either Arial or Times New Roman.
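To make the weighted-penalty idea concrete, here is a minimal Python sketch of a penalty-based mapper. The attributes follow the paragraph above (output precision is left out). The penalty magnitudes, the `FontDesc` type and the per-point height penalty are hypothetical, because Microsoft's actual table isn't given here. Nothing in the sketch looks at serif style, weight or contrast, which is exactly the weakness described above.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical penalty magnitudes, chosen only to show the ordering idea:
# a character-set mismatch outweighs everything else.
PENALTIES = {
    "charset": 65000,
    "pitch": 15000,
    "face_name": 10000,
    "family": 9000,
    "height_per_point": 600,
}


@dataclass
class FontDesc:
    face_name: str
    charset: int
    pitch: str    # "fixed" or "variable"
    family: str   # e.g. "roman", "swiss", "modern"
    height: int   # in points


def penalty(requested: FontDesc, candidate: FontDesc) -> int:
    """Sum the penalties for every attribute on which the candidate differs."""
    total = 0
    if candidate.charset != requested.charset:
        total += PENALTIES["charset"]
    if candidate.pitch != requested.pitch:
        total += PENALTIES["pitch"]
    if candidate.face_name != requested.face_name:
        total += PENALTIES["face_name"]
    if candidate.family != requested.family:
        total += PENALTIES["family"]
    total += PENALTIES["height_per_point"] * abs(candidate.height - requested.height)
    return total


def map_font(requested: FontDesc, installed: List[FontDesc]) -> FontDesc:
    """Return the installed font with the smallest total penalty."""
    return min(installed, key=lambda c: penalty(requested, c))


wanted = FontDesc("Century Old Style", 0, "variable", "roman", 12)
installed = [
    FontDesc("Arial", 0, "variable", "swiss", 12),
    FontDesc("Times New Roman", 0, "variable", "roman", 12),
    FontDesc("Courier New", 0, "fixed", "modern", 12),
]
print(map_font(wanted, installed).face_name)  # Times New Roman
```

Century Old Style maps to Times New Roman here only because its coarse "roman" family attribute matches. No comparison of actual letterforms takes place.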
The Mac OS takes a different approach: If the requested font is not available, it notifies the user and displays the missing font in Courier. Although it effectively highlights the problem, this approach forces the user to either install the missing font or reformat the document.
Windows 3.1 provides vendors of applications and fonts with a solution to font portability: font embedding. Embedding lets you include TrueType fonts in a document file. There are two types of font embedding: read-only and read-write. Read-only fonts allow a recipient of a shared document to use the embedded font for viewing and printing only; when the document is closed, the fonts are deleted. Read-write embeddable fonts are permanently installed on the system, allowing the user full access to the font for editing, viewing, and printing documents.
However, font embedding is not a practical solution to the font problem. Few applications vendors support it, and the idea has been coolly received by font vendors, most of whom limit their support to read-only access. Most important, embedding fonts increases a document's file size by the size of each embedded font file. For instance, because TrueType fonts range from 35 KB to 70 KB, embedding four fonts in a document increases its size by approximately 200 KB.
Panose Font Mapping
In an effort to resolve font problems in portable-document software, the Panose font mapper uses techniques that differ from those of font embedding. The key to the Panose font classification and matching system is the Panose typeface classification number, a 10-byte description of a font's visual characteristics.
There are two other components to the Panose system: classification procedures and the Panose mapper. The classification procedures are used to assign a Panose number to a font. The Panose mapper accepts the number of each missing font, compares it against the fonts on the system, and then selects the closest match. The mapper also provides an interface for the user to adjust the mapper tolerances and override the replacements it provides.
The Panose number is an array of 10 digits. The first digit identifies the font family and determines the meaning of the remaining nine digits. The standard fonts used for European languages belong to the Latin Text and Display family, and the remaining nine digits describe serif style, weight, proportion, contrast, stroke variation, arm style, letterform, midline, and x-height. Script fonts belong to the Latin Script family, and the remaining nine digits describe tool kind, weight, monospace, aspect ratio, contrast, topology, form, finials, and x-ascent (see the figure "A Sample Panose Number").
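Here is a minimal Python sketch of how the first digit selects the meaning of the other nine. The field names come from the paragraph above. The family codes (2 for Latin Text and Display, 3 for Latin Script) are assumed from the published Panose 1.0 numbering. The example digits are illustrative and are not a registered classification.

```python
from typing import Dict, Sequence, Tuple

# Meaning of digits 2-10, keyed by the family digit (digit 1).
LATIN_TEXT_DIGITS = ("serif_style", "weight", "proportion", "contrast",
                     "stroke_variation", "arm_style", "letterform",
                     "midline", "x_height")
LATIN_SCRIPT_DIGITS = ("tool_kind", "weight", "monospace", "aspect_ratio",
                       "contrast", "topology", "form", "finials", "x_ascent")
FAMILIES = {
    2: ("Latin Text and Display", LATIN_TEXT_DIGITS),   # assumed family code
    3: ("Latin Script", LATIN_SCRIPT_DIGITS),           # assumed family code
}


def decode_panose(digits: Sequence[int]) -> Tuple[str, Dict[str, int]]:
    """Split a 10-digit Panose 1.0 number into its family and named traits."""
    if len(digits) != 10:
        raise ValueError("a Panose 1.0 number has exactly 10 digits")
    family = digits[0]
    if family not in FAMILIES:
        raise ValueError(f"family {family} is not handled in this sketch")
    name, fields = FAMILIES[family]
    return name, dict(zip(fields, digits[1:]))


family, traits = decode_panose([2, 2, 6, 3, 5, 4, 5, 2, 3, 4])
print(family, traits["weight"], traits["contrast"])  # Latin Text and Display 6 5
```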
Because each digit in the Panose number is an integer, it can express only a few discrete values. Therefore, a font's weight (which can range from light to extra black) is measured and categorized in one of 11 buckets. Similarly, serif style is placed in one of 14 buckets according to its shape. This approach keeps Panose numbers very compact while providing enough information to find the closest match to a given font.
Panose classification procedures are the rules and equations used for determining a 10-digit Panose classification number. Font classification begins by printing selected characters from a font and measuring various attributes. For example, the widest and narrowest stems on the uppercase O are measured, and the ratio between the two is used to determine the value of the contrast digit. There are a total of 65 Panose measurements, but through a process of elimination most fonts can be accurately classified in five steps (for sans serif) or nine steps (for serif), depending on shape complexity.
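The sketch below shows how a measurement becomes a digit. It turns the two stem widths of the uppercase O into a contrast value. The choice of ratio (narrowest over widest) and the bucket edges are both hypothetical, because ElseWare's actual classification rules and equations are not reproduced in this article.

```python
# Hypothetical bucket edges: (minimum ratio of narrowest to widest stem, digit).
# Digits run from 2 (little contrast) to 9 (very high contrast) to mirror
# ordered buckets; the edges themselves are invented for illustration.
CONTRAST_BUCKETS = [
    (0.80, 2),
    (0.65, 3),
    (0.48, 4),
    (0.35, 5),
    (0.25, 6),
    (0.15, 7),
    (0.08, 8),
    (0.00, 9),
]


def contrast_digit(narrowest_stem: float, widest_stem: float) -> int:
    """Bucket the stem-width ratio of an uppercase O into a contrast digit."""
    if not 0 < narrowest_stem <= widest_stem:
        raise ValueError("need 0 < narrowest_stem <= widest_stem")
    ratio = narrowest_stem / widest_stem
    for min_ratio, digit in CONTRAST_BUCKETS:
        if ratio >= min_ratio:
            return digit
    raise AssertionError("unreachable: the last bucket accepts any ratio")


print(contrast_digit(90, 100))  # 2: nearly uniform stems, little contrast
print(contrast_digit(20, 100))  # 7: thin hairlines against thick stems
```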
Once a font has been classified, the Panose classification number is registered and stored in a database that is distributed with the Panose mapper. Panose numbers are also embedded in documents created by applications that are Panose-aware. When documents are shared between Panose-aware applications, those applications reference the Panose database before referencing the Panose numbers that are already embedded in the documents.
The Panose Mapper
When a document is brought into a Panose-aware application, the application or operating system requests a font by name from the MAI (Mapper Application Interface), which in turn queries the Core Mapper Services. The core mapper returns a Panose number to the MAI, which consults the Panose exceptions database for a custom mapping. If there is no exception in the database, the mapper displays a Results dialog box that tells the user what was found and offers the option to override the mapping. Finally, the mapper supplies the replacement font to the application or operating system (see the figure "The Panose Architecture").
The Panose mapper software determines the closest possible font match on any given system by comparing the Panose numbers of the requested and available fonts. The individual Panose digits are compared, weighted by their typographic importance (e.g., weight carries more importance than contrast), and summed to provide a numerical visual distance.
The components of the Panose mapper include the Core Mapper Services and the MAI. The Panose mapper also includes a database of registered Panose numbers for most common font names. This allows accurate replacements should a document provide only the missing font's name or request a font without an embedded Panose number.
The Core Mapper Services represent the basic Panose services for selecting the closest visual match or enumerating fonts by visual distance from a target font. The mapper looks at several factors when mapping fonts. These include the following:
-- The match value is the number returned by the font mapper to assess the visual similarity of two fonts. It is obtained by comparing each of the digits of the Panose number, multiplying each comparison by a weight, and adding them together. A small match value indicates a good match.
-- The threshold is a number that indicates the highest acceptable match value. The font mapper uses it as an optimization, aborting the match process once it has determined that the match value will exceed the threshold. If no font has a match value below the threshold, the default font is used. The threshold can be relaxed so that the mapper computes the match value regardless of its size.
-- Penalty tables contain the numbers that evaluate the closeness of two Panose digits. The tables can be thought of as 2-D arrays where the value from one digit indexes the row and the value from the other digit indexes the column. In reality, the mapper stores the tables in a compact form since, depending on the digit, there may be a great deal of repetition or a clear pattern in the penalty values. Each digit in a Panose number has one penalty table.
-- Mapper weights are numbers that control the impact each penalty table value has on the match value. There is one weight for each digit. After the mapper computes the penalty value, it multiplies the result by the weight. All the weighted penalties are added together to yield the match value.
-- Cross-class mapping makes it possible to use a Panose number from one classification system to select a font in a different classification. The current mapper supports cross-class mapping for Latin text to kanji text and vice versa.
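The pieces in the list above fit together roughly as shown in this Python sketch, which is not ElseWare's mapper. It makes several assumptions:

- A single `ordinal_penalty` formula stands in for the nine per-digit penalty tables. It is a compact, pattern-based table of the kind described.
- The weights are invented.
- The digit 0 is treated as "Any".
- Fonts from different families are simply rejected, because cross-class mapping is left out.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

PenaltyFn = Callable[[int, int], int]


def ordinal_penalty(a: int, b: int) -> int:
    """Compact stand-in for a penalty table: 0 ("Any") matches anything;
    otherwise the penalty grows with the distance between buckets."""
    if a == 0 or b == 0:
        return 0
    return abs(a - b)


# One penalty function and one weight per digit 2-10 (hypothetical values;
# the weight digit outranks the contrast digit, as in the article's example).
PENALTIES: List[PenaltyFn] = [ordinal_penalty] * 9
WEIGHTS = [4, 5, 3, 2, 1, 1, 2, 1, 1]


def match_value(req: Sequence[int], cand: Sequence[int],
                threshold: Optional[int] = None) -> Optional[int]:
    """Weighted sum of per-digit penalties, or None if the candidate is
    rejected (different family, or already past the threshold)."""
    if req[0] != cand[0]:
        return None            # no cross-class mapping in this sketch
    total = 0
    for i in range(1, 10):
        total += WEIGHTS[i - 1] * PENALTIES[i - 1](req[i], cand[i])
        # Penalties and weights are non-negative, so the total can only grow:
        # once it passes the threshold, stop early.
        if threshold is not None and total > threshold:
            return None
    return total


def enumerate_by_distance(req: Sequence[int],
                          fonts: Dict[str, Sequence[int]],
                          threshold: Optional[int] = None) -> List[Tuple[int, str]]:
    """List (match value, font name) pairs, closest match first."""
    scored = []
    for name, digits in fonts.items():
        mv = match_value(req, digits, threshold)
        if mv is not None:
            scored.append((mv, name))
    return sorted(scored)


requested = (2, 2, 6, 3, 5, 4, 5, 2, 3, 4)
fonts = {
    "A": (2, 2, 6, 3, 5, 4, 5, 2, 3, 4),
    "B": (2, 11, 6, 2, 5, 4, 5, 2, 3, 4),
    "C": (2, 2, 8, 3, 5, 4, 5, 2, 3, 4),
    "S": (3, 2, 6, 3, 5, 4, 5, 2, 3, 4),
}
print(enumerate_by_distance(requested, fonts))      # [(0, 'A'), (10, 'C'), (39, 'B')]
print(enumerate_by_distance(requested, fonts, 20))  # [(0, 'A'), (10, 'C')]
```

With a threshold of 20, font B is abandoned as soon as its serif-style penalty (36) is added. The remaining eight digits are never examined, which is the optimization the threshold provides.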
The MAI provides additional services for Windows and Macintosh applications developers. The MAI includes sample dialog boxes for both platforms, database services for retrieving Panose numbers, and an API that's designed for simple integration of Panose mapping into the application's existing font-selection mechanism.
The mapper algorithm starts with two lists: fonts requested by a document, and fonts available on the system. The mapper considers each font in the requested list independently and looks for the best choice among the available fonts. The result maps each requested font to exactly one available font, although several requested fonts may share the same replacement.
The mapper first checks whether the requested font is among the available fonts; if it is not, the mapper looks at the exceptions list. Finally, it compares the requested font against the entire list of available fonts and maps it to the first (closest) font returned. If the mapper can't find a match within the specified tolerance, it returns the default font.
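The lookup order just described can be sketched in Python as follows. The function and parameter names are hypothetical. A simple name-to-digits dictionary stands in for the Panose database. The `distance` argument can be any match-value function that returns a number, or None when it rejects a candidate. The example uses toy five-digit numbers for brevity.

```python
from typing import Callable, Dict, List, Optional, Sequence

Distance = Callable[[Sequence[int], Sequence[int]], Optional[float]]


def map_fonts(requested: List[str], available: List[str],
              panose_db: Dict[str, Sequence[int]],
              exceptions: Dict[str, str],
              distance: Distance, tolerance: float,
              default_font: str) -> Dict[str, str]:
    """Map every requested font name to one available font name."""
    result: Dict[str, str] = {}
    for name in requested:
        # 1. The requested font is installed: use it as is.
        if name in available:
            result[name] = name
            continue
        # 2. A user-defined exception applies and its target is installed.
        target = exceptions.get(name)
        if target is not None and target in available:
            result[name] = target
            continue
        # 3. Rank the installed fonts by visual distance; keep the closest
        #    one that falls within the tolerance.
        best: Optional[str] = None
        req_digits = panose_db.get(name)
        if req_digits is not None:
            ranked = []
            for cand in available:
                if cand not in panose_db:
                    continue
                d = distance(req_digits, panose_db[cand])
                if d is not None and d <= tolerance:
                    ranked.append((d, cand))
            if ranked:
                best = min(ranked)[1]
        # 4. Nothing close enough (or no Panose number at all): default font.
        result[name] = best if best is not None else default_font
    return result


def sum_of_differences(a: Sequence[int], b: Sequence[int]) -> int:
    return sum(abs(x - y) for x, y in zip(a, b))


available = ["Times New Roman", "Arial", "Dutch Roman"]
panose_db = {  # toy numbers, not registered classifications
    "Times New Roman": (2, 2, 6, 3, 5),
    "Arial": (2, 11, 6, 4, 2),
    "Dutch Roman": (2, 2, 6, 3, 6),
    "Times": (2, 2, 6, 3, 5),
    "Century Old Style": (2, 2, 5, 3, 5),
    "Zapf Dingbats": (5, 0, 0, 0, 0),
}
print(map_fonts(["Times", "Century Old Style", "Zapf Dingbats"], available,
                panose_db, {"Times": "Dutch Roman"}, sum_of_differences,
                5, "Arial"))
# {'Times': 'Dutch Roman', 'Century Old Style': 'Times New Roman', 'Zapf Dingbats': 'Arial'}
```

In step 3 this sketch breaks ties by font name, which is arbitrary. That kind of tie is the situation the Matching Exceptions feature is meant to resolve.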
The MAI also looks at other factors when matching fonts. These include the following:
-- Substitution tolerance sets the Panose mapping tolerance, which determines how poor a match can be before the mapper gives up and substitutes the system default.
-- The Alternate Spellings feature enables the user to modify the spelling list that comes with the MAI. This list captures cross-platform naming variations, such as Avant Garde (Macintosh) and AvantGarde (Windows).
-- The Matching Exceptions feature lets the user customize the behavior of the mapper. Exceptions are typically used to break ties between two otherwise identical matches. For example, Times on the Macintosh would typically map to Times New Roman in Windows, but an equally valid match may be Dutch Roman (a Bitstream font). Exceptions can also be used to create special mappings should the user want to do so.
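Here is a minimal sketch of how an editable alternate-spellings list might be consulted before any Panose matching. Representing the list as groups of equivalent names is an assumption, and so are the function names. The two groups shown come from examples in this article.

```python
from typing import Dict, Iterable, Optional, Set


def build_spelling_index(groups: Iterable[Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each spelling to every spelling in its group (itself included)."""
    index: Dict[str, Set[str]] = {}
    for group in groups:
        members = set(group)
        for name in members:
            index.setdefault(name, set()).update(members)
    return index


def find_installed(requested: str, installed: Set[str],
                   index: Dict[str, Set[str]]) -> Optional[str]:
    """Return the installed spelling of the requested font, or None."""
    if requested in installed:
        return requested
    for variant in sorted(index.get(requested, set())):
        if variant in installed:
            return variant
    return None


# In this sketch, editing the spelling list means adding or changing a group.
SPELLINGS = build_spelling_index([
    ("Avant Garde", "AvantGarde"),
    ("Goudy Old Style", "GoudyOldStyle"),
])
print(find_installed("AvantGarde", {"Avant Garde", "Arial"}, SPELLINGS))  # Avant Garde
```

A name that resolves this way needs no substitution at all. Only names that fail here (returning None) would go on to Panose matching.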
The Panose number database contains over 2500 name-to-Panose number records for common TrueType, Type 1, Unix, and printer fonts. The database is included with the rest of the Panose mapper components and is redistributed to the end user. All the Panose mapper components contribute very little overhead to a Windows system, consuming only 238 KB of file space.
Overcoming the Limitations
Panose 1.0's bucketizing scheme keeps Panose numbers compact while providing enough information to find the closest match to a given font, but it does have some limitations. For instance, since the font mapper uses lookup tables to calculate the differences between two fonts, classifying a new family (e.g., kanji) under Panose 1.0 requires an updated table for all existing systems. In addition, a large number of font attributes don't fit neatly into buckets, especially with distortable font technologies such as Apple's TrueType GX and Adobe Multiple Masters. Distortable type allows the user to modify a font's attributes (e.g., weight or width) to generate thousands of variations from a single master outline.
The revised Panose system, Panose 2.0, expands the 10 digits used in Panose 1.0 to define 36 font traits in 100 bytes of data. In addition, the classification scheme is more quantitative. For example, Panose 1.0 uses a single value for serif style; Panose 2.0 has individual measurements for serif width, height, tip size, tip roundness, angle, balance, foot pitch, and more.
Panose 2.0 numbers have an arithmetic relationship and can be viewed as axes of a coordinate system. In Panose 2.0 terminology, each Panose 2.0 digit is represented by an axis in Panose space. The value of a single Panose digit is represented by a point along the axis. The Panose match value is simply a weighted distance between two points (here, weighted means each axis can be scaled to place greater emphasis on the distance for that digit). In simple terms, the Panose match value (or visual distance) can be expressed using the standard Cartesian distance formula.
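Written out, one way to express this weighted Cartesian distance is shown below. Fonts a and b have coordinates a_i and b_i on m axes, and each axis has a scale factor w_i. These symbols are introduced here for illustration and are not taken from the Panose 2.0 specification.

```latex
D(a, b) = \sqrt{\sum_{i=1}^{m} \bigl( w_i \, (a_i - b_i) \bigr)^{2}}
```

With every w_i = 1, this reduces to the ordinary Euclidean distance. A larger w_i stretches axis i, so a difference along that digit counts for more.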
Given the Panose-space base properties, any font can be defined in terms of Panose space. This provides a comprehensive system for describing and comparing fonts. In the figure "A 2-D Panose 2.0 Space," the font that is closest to font A is font G, a single-axis distortable font (increasing the axis value for the font increases its contrast and weight). To find the instance of a distortable font that most closely matches a requested font, Panose first locates the normal to the distortable font (line G in the figure) that passes through the requested font (point A in the figure). The length of that normal is the match value between the two fonts, and the point where it meets line G is the instance of the distortable font that best matches font A.
A Panose 2.0 number contains sufficient information for converting from Panose space to the distortable font's space. Thus, once it finds the point nearest to the requested font, Panose can derive the appropriate settings for the distortable font technology to construct the font.
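The geometry above reduces to projecting a point onto a line segment. This Python sketch rests on several assumptions, and all names in it are hypothetical:

- A single-axis distortable font is modeled as a segment between its two extreme designs.
- Coordinates are scaled by per-axis weights, so the weighted distance becomes ordinary Euclidean distance.
- The result is clamped to the ends of the segment.
- The mapping from a position along the segment to an actual GX or Multiple Masters axis value is linear.

```python
import math
from typing import Sequence, Tuple


def nearest_instance(target: Sequence[float], start: Sequence[float],
                     end: Sequence[float], w: Sequence[float]) -> Tuple[float, float]:
    """Closest point to `target` on the segment start->end (a single-axis
    distortable font). Returns (match value, t), with t in [0, 1]."""
    # Scale every coordinate so the weighted metric becomes plain Euclidean.
    p = [wi * x for wi, x in zip(w, target)]
    s = [wi * x for wi, x in zip(w, start)]
    d = [wi * (e - x) for wi, e, x in zip(w, end, start)]
    length_sq = sum(di * di for di in d)
    if length_sq == 0:
        t = 0.0                          # degenerate "segment": a static font
    else:
        # Foot of the normal from p onto the line, clamped to the segment.
        t = sum((pi - si) * di for pi, si, di in zip(p, s, d)) / length_sq
        t = max(0.0, min(1.0, t))
    foot = [si + t * di for si, di in zip(s, d)]
    return math.dist(p, foot), t


def axis_setting(t: float, axis_min: float, axis_max: float) -> float:
    """Convert a position along the segment into a design-axis value
    (assumes the axis varies linearly along the segment)."""
    return axis_min + t * (axis_max - axis_min)


dist, t = nearest_instance((2, 3), (0, 0), (4, 4), (1, 1))
print(round(dist, 4), t, axis_setting(t, 100, 900))  # 0.7071 0.625 600.0
```

Take a requested font at (2, 3) and a distortable font running from (0, 0) to (4, 4). The best instance lies 62.5 percent of the way along the segment, about 0.71 units from the target. On a hypothetical axis running from 100 to 900, that instance corresponds to a setting of 600.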
This highlights a fundamental difference between Panose 1.0 and 2.0. Panose 1.0 digits describe a font, but the logic for assessing the visual distance between two fonts resides in the mapping software. Panose 2.0 digits represent a font's position in a Panose space where, by definition, the distance between two fonts is their visual distance. Thus, the logic for assessing visual distance actually happens when the font is classified (i.e., when its position in Panose space is determined).
This means the Panose 2.0 mapping algorithm is very simple: Each digit is stored with an ID number, making it a tagged digit. The mapper lines up digits that have the same ID and executes the distance algorithm. The result is a small, fast, scalable algorithm that never needs to be modified.
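Here is a sketch of the tagged-digit idea. Each font carries a dictionary from digit ID to value, and the distance is computed over whichever IDs both fonts share. Ignoring IDs that only one font carries is an assumption made for this example, because the article does not say how Panose 2.0 treats them. The function name is hypothetical. The key property is that adding a new tagged digit requires no change to the function.

```python
import math
from typing import Dict, Optional


def tagged_distance(a: Dict[int, float], b: Dict[int, float],
                    weights: Optional[Dict[int, float]] = None) -> float:
    """Line up tagged digits by ID and return the weighted Cartesian distance.
    IDs carried by only one of the fonts are ignored (an assumption)."""
    weights = weights or {}
    total = 0.0
    for tag in a.keys() & b.keys():
        w = weights.get(tag, 1.0)      # axes without a weight count fully
        total += (w * (a[tag] - b[tag])) ** 2
    return math.sqrt(total)


font_a = {1: 2.0, 2: 5.0, 7: 1.0}
font_b = {1: 5.0, 2: 1.0, 9: 3.0}       # tag 9: a digit font_a predates
print(tagged_distance(font_a, font_b))  # 5.0
```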
The Next Step
Since Panose can provide so much detail about a font, the next logical step is to use Panose numbers to synthesize fonts. ElseWare's Infinifont system, an extension of the Panose system, can do just that. For distortable font technologies such as TrueType GX and Multiple Masters, Panose can create many fonts from one master font. As with Panose 2.0, each of these master fonts can be represented by a shape (e.g., a line, square, or cube) in Panose 2.0 space.
Using the 36 Panose 2.0 digits, Infinifont can synthesize a font that captures the basic serif shape, stem shape, weight, contrast, and width. This provides enough data for Infinifont to re-create the approximate shape of the font, but the font would be somewhat homogenized and would lack the subtle intricacies that distinguish the best type designs.
To capture these intricacies, the Infinifont system accepts input from detail strings, which provide additional data for adjusting specific aspects of a font design. Infinifont supports global detail strings that adjust a characteristic across the whole font (e.g., the thickness of all uppercase diagonal stems) and local detail strings that adjust individual aspects of a particular character (e.g., the distance by which a lowercase j extends below the baseline).
The small size of these descriptor files makes Infinifont very efficient. Because most TrueType fonts consume 30 to 70 KB of disk space, a library of 150 fonts could easily consume 7.5 MB; Infinifont can store the same library in roughly 500 KB. This makes Infinifont attractive for such system components as printers, personal digital assistants, operating systems, and software applications.
Panose Space Properties
1. Each digit represents an axis; thus, Panose space can have up to m dimensions, where m is the maximum number of Panose digits.
2. A single static font is represented as a point in Panose space.
3. A distortable font is represented as a higher-order object, such as a line, polygon, or cube.
4. The distance between two fonts in Panose space measures the visual similarity between the fonts: The shorter the distance, the greater the similarity.
5. Panose space is extensible. In the rare circumstance that a font is created that does not exist in Panose space, new digits are used to account for it, thus widening the scope of Panose space to include the font.
Figure: A Sample Panose Number
The Panose typeface classification number is a 10-digit description of a font's critical visual characteristics.
Figure: The Panose Architecture
A depiction of the Panose mapper's execution flow when a font is requested by an application and/or the operating system. A font is requested when a document is brought into a Panose-aware application.
Figure: A 2-D Panose 2.0 Space
A 2-D Panose 2.0 space containing distortable fonts. Known Panose-classified fonts are represented as a single point in Panose 2.0 space (e.g., points A-F). Distortable fonts can be represented by a shape such as a line or a square (e.g., line G and square H).
Scott Boggan is the technical marketing manager at ElseWare Corp. (Seattle, WA) and is coauthor of Developing Online Help for Windows (Sams, 1993). You can reach him on the Internet at scott@elseware.com or on BIX c/o "editors." Michael De Laurentis is a senior developer at ElseWare. He is author of the Panose 1.0 font-mapping software and the "Panose 2.0 White Paper," the vision document for Panose. You can reach him on the Internet at mike@elseware.com or on BIX c/o "editors."