Archives
 
 
 
  Special
 
 
 
  About Us
 
 
 

Newsletter
Free E-mail Newsletter from BYTE.com

 
    
           
Visit the home page Browse the four-year online archive Download platform-neutral CPU/FPU benchmarks Find information for advertisers, authors, vendors, subscribers Request free information on products written about or advertised in BYTE Submit a press release, or scan recent announcements Talk with BYTE's staff and readers about products and technologies

ArticlesThe Icon Programming Language


May 1994 / Hands On / The Icon Programming Language

A new way to deal with strings and structures

Ralph E. Griswold

Icon is a very high-level, general-purpose programming language with a strong emphasis on processing strings of characters and complicated structures. It was developed at the University of Arizona under the support of the National Science Foundation as a byproduct of research on high-level facilities for nonnumeric computation.

That description is accurate, but it doesn't really tell you what the language is like or why so many programmers love it. I'll explain these things and give some examples that convey the nature of the language.

An important issue in designing Icon was making programming easy, quick, and, we hoped, fun. The success of this design philosophy is illustrated by the fact that Icon programs are typ ically one-tenth to one-third the size of equivalent C programs and can be written correspondingly faster.

Programming tasks that require extensive manipulations of strings and structures are surprisingly common. Compilers, word processors, and databases are examples. Icon has been used for many things, including text formatting, natural-language processing, program generation, rapid prototyping, and AI. Because it's easy to program in Icon, programmers often use it for one-shot, throwaway applications. But this language is also popular for the most complex applications, including those of a speculative nature, where quick results and ease of modification are vital.

Icon started on Unix but is now available for many platforms, ranging from PCs to mainframes: the Amiga, the Atari ST, the Macintosh, MS-DOS, MVS, OS/2, many Unix systems, VAX, and VM/CMS. Implementations for Win32 and Windows NT are in progress. All implementations, including the source code, are in the public domain.

Major Features

Icon has many types of data to support a wide range of computational tasks. Integers, real (i.e., floating-point) numbers, and strings are familiar. In Icon, structures such as lists and tables with associative lookup are also data types. More on this later.

To make programming easier, Icon does not have type declarations. Type declarations make it easier to implement a language, but they often require tedious, verbose, and error-prone coding.

Although Icon lacks type declarations, it is a strongly typed language. During program execution, it checks types when necessary to ensure that they are correct. It also converts types automatically when necessary; for example, it converts numbers to strings for writing without any explicit action in the program. Similarly, in operations that expect a number, a string that looks like a number is automatically converted to a number. In other words, the implementation handles many matters that you have to explicitly program in most programming la nguages.

Like other powerful programming languages that date back to Lisp, Icon manages storage automatically. It creates objects as needed during program execution. Space for them is provided automatically, and unused space is garbage-collected when necessary. There also is no limit on the size of objects other than the amount of available memory. Some kinds of objects even grow and shrink automatically.

With an emphasis on processing strings and structures, you'd expect Icon to have a large repertoire of operations for dealing with such data. It does, but more important, it provides new ways of thinking about strings and structures that make programming easy and natural. For example, strings in Icon are true first-class values, not arrays of characters. A string-scanning facility supports sophisticated pattern matching without tedious bookkeeping. At the heart of Icon is goal-directed evaluation, which automatically searches among alternative results.

On the face of it, an Icon program looks a lot like a C or Pascal program. A program consists of a collection of procedures, and each procedure contains expressions that perform computations. Syntax clearly isn't everything, and as you get into programming in Icon, you'll see how its powerful semantics and automatic handling of details stand behind a familiar appearance.

It's time for a few simple examples. Here's about as simple a program as you'll find--a main procedure that writes out a greeting:

procedure main()

write("Hello world")

end

Now suppose you have a file that contains a lot of numbers, one per line, and you want to know their sum:

procedure main()

count := 0

while count +:= read()

write(count)

end

Each number that is read increments the count. Note that Icon automatically converts the values read from strings to numbers for addition, and it automatically converts the final number to a string for writing. The loop t erminates when there is no more data to read. No test for an end-of-file is needed--read() simply fails at the end of data.

Expression Evaluation

Much lies behind expression evaluation. In the real world, we constantly try to do things that may or may not be possible, and sometimes our attempts fail. That's fundamental to expression evaluation in Icon. If a computation cannot be performed (which is different from an erroneous computation), the expression fails. Control structures use the success or failure of a computation initiated by an Icon expression instead of the somewhat formal notions of true and false. This idea alone makes Icon programs shorter and easier to write than those in most programming languages.

In the real world, there are often many ways of doing things--alternative courses of action. For example, you may have a choice of doors by which to leave a large store. The same situation occurs in many programming tasks. Suppose, for example, that you want to locate equal signs i n a line of a program. There may be many equal signs. You may or may not want to know where the first one is. You may want to know where all of them are. In most programming languages, you have to pick your way through the line, keeping track of where you are, doing index arithmetic, and so forth.

In Icon, the computation is handled differently. A computation that has many alternatives generates those alternatives as needed. For example, upto(s1, s2) generates all the locations, from left to right, at which characters of s1 occur in s2. If you only ask for one, you get the first, as in

first := upto("=", line)

which produces the location of the first equal sign in the line. If there is no equal sign, upto() fails, and no assignment is made. It's a good idea to check for this. If you want all locations, there is a control structure to do that:

every write(upto("=", line))

writes all the locations.

Sometimes a successful computation depend s on a combination of things. For example, to find out if an equal sign occurs in a line at a location greater than 10, all that's needed is

if upto("=", line) > 10 then write("yes")

else write("no")

Here, the comparison operation keeps asking for locations until one is greater than 10. If there is one, yes is written. If there isn't, no is written.

The idea of generators opens up all kinds of possibilities. One useful generator is i to j, which generates the integers from i to j in sequence. With this, Icon doesn't need a for control structure. For example,

every step := 1 to 10 do

p(step)

calls p(1), p(2),...p(10). This can also be done more compactly with just

every p(1 to 10)

Alternation, denoted by a vertical bar, generates its arguments. For example, in

upto("=", line1 | line2)

the second argument of upto() is a generator, so the locations of equal signs first in line1 a nd then in line2 are generated. You can even write your own generators using procedures, so the possibilities are endless.

String Scanning

I'll shift gears now and look at string analysis. Scanning is based on the idea of a subject string that is the focus of the analysis. A cursor keeps track of the location of interest in the subject.

String scanning has the form s ? expr, where s is the subject of scanning and expr performs the scanning. The cursor starts at the beginning of the subject and can be moved by move(i), which advances it i characters, and tab(i), which sets it to the ith character. These functions fail and don't move the cursor if it is outside the range of the subject. As an example,

text ? while write(move(1))

writes the characters of text, one per line.

Much of the power of string scanning comes from using analysis functions like upto() to provide the argument to tab() and to move the cursor accordingly. Suppose, for example, that you are interest ed in the op codes that are used in an assembly language program. I'll assume a syntax in which op codes follow the first blank field of a line and are themselves followed by a blank field before their operands. Op codes can be found as follows:

ws := "\t"

line ? {

tab(upto(ws))

tab(many(ws))

opcode := tab(upto(ws))

}

The string ws defines white space--blanks or tabs. Several expressions are needed for scanning, so they are enclosed in braces. The first expression locates the beginning of the first blank field. (In string scanning, analysis functions are applied to the subject and need no second argument.) The second expression skips over this field, using many(), which produces the location at the end of a sequence of characters. The op code is whatever follows until the next white-space character.

In practice, a little bit more is needed to skip comment lines, handle op codes without operands, and perhaps check for correct syntax. I've omitted these niceties here to avoid complicating the example, but you'll find them in the listing "Tabulating Op Codes."

You can use the scanning expression above in a number of ways, which suggests that you should encapsulate the code in a procedure:

procedure opcode(line)

ws := "\t"

line ? {

tab(upto(ws))

tab(many(ws))

return tab(upto(ws))

}

end

The argument is a line of code, and the value returned is the op code. For example, you could use this procedure to write out all the op codes:

while line := read() do

write(opcode(line))

This is just one example of the endless possibilities of string scanning.

Structures

Except for the simplest file analysis and generation, string-processing tasks require structures to organize the data--lists of strings, symbol tables, and so on. Icon provides four kinds of built-in structures: records, lists, sets, and tables.

Records are similar to those in other programming languages, and I won't bother to describe them here. Lists in Icon can be used in two ways: as one-dimensional arrays subscripted by position, and as stacks and queues in which elements are added and removed at the ends.

Sets are collections of distinct values. You can add and remove members to and from sets and compute the union, intersection, and differences of sets.

Tables provide associative lookup; they are like lists, but they can be subscripted with any kind of value, not just integers. Table subscripts are called keys, and each key has a value associated with it. If a table is subscripted with a key that is not already in the table, a new table entry is added.

Structures themselves are data values. You can assign them to variables, pass them as arguments to procedures, and so on. All structures in Icon can be heterogeneous; that is, they can contain values of any type, and the same structure can contain values of different types. For example, a list can contain integers, strings, and even lists.

I'll continue with the example of extracting op codes from an assembly language file to illustrate how you can use some of Icon's structures. Writing out all the op codes as illustrated above might be useful in some situations, but it's usually more helpful to accumulate all the op codes and process them in some way. A list of op codes is a good place to start. It can be constructed as follows:

oplist := list(0)

while line := read() do

put(oplist, opcode(line))

The first line assigns an empty list--a list with no elements--to oplist. In the loop, instead of writing out the op codes produced by opcode(), you can append them to oplist using one of Icon's functions that treat a list as a queue. The final result is a list of all op codes in order of appearance. Note that it is not necessary to know in advance how many op codes there are: Lists grow in size automatically, and there's no li mit to their size except the amount of available memory.

You could use the list of op codes in many ways. To get a listing of the op codes, you could index through the list, writing each element:

every i := 1 to *oplist do

write(oplist[i])

The expression *oplist gives the number of elements in the list.

There's a better way to do this. The expression !X generates all the elements of the structure X. For lists, it generates them from beginning to end. To write all the elements of oplist, all you need is the following:

every write(!oplist)

Suppose, now, that you want to know which op codes are used in the program. The complete list almost certainly contains duplicates--probably many. Finding the distinct op codes is a job for a computer, not a person.

Icon's sets make this easy. By definition, a value can be a member of a set only once. A simple change to the code above is all that's needed for writing the distinct op codes:

opset := set()

while line := read() do

insert(opset, opcode(line))

every write(!opset)

The function set() creates an empty set. Insertion automatically checks for values already in the set and doesn't add them--something you'd have to program in most languages. Incidentally, Icon does this efficiently with dynamic hashing--not something you'd be likely to implement yourself.

Another thing needed to make the code really useful is sorting. All you need is an additional function in the last line:

every write(!sort(opset))

Having gotten this far, you can do one more thing that might be useful in the analysis of op-code usage: Count the number of occurrences of each one. Here is where Icon's table data type comes in:

optab := table(0)

while line := read() do

optab[opcode(line)] +:= 1

oplist := sort(optab, 3)

while write(get(oplist), " : ", get(oplist))

The first line assigns an empty table to optab. The 0 is not the size of the table, but rather the initial value used for all entries in it.

Each value that opcode() produces is used as a key to subscript the table. For example, if opcode() returns "mov", the subscripting expression is equivalent to

optab["mov"] +:= 1

which increments the entry for "mov" by 1. The first time optab is subscripted with this key, a new entry is created with an initial value of 0, increasing the table size. This new value is then incremented to 1.

The expression sort(optab, 3) sorts the table, producing a list in which there are two elements for each table entry: one for the key, and another for the value associated with the key. The value 3 sorts the table according to the key so that the op codes are in alphabetical order. The last line writes out the keys and their associated values with a separating colon. The function get() removes the first (i.e., left-most) element of the list--first a key and then its as sociated value. Note that write() has three arguments, which are written in order on a line.

A complete program to tabulate op codes is shown in the listing "Tabulating Op Codes." In addition to the improvements on string scanning mentioned earlier, the output is formatted in columns, as shown in the sample output.

Other Features

Icon has a large repertoire of functions. It's not just a language for processing strings and structures. You can do numerical computation if you want. It also has a number of sophisticated features, including an expression-level coroutine facility, that I don't have room to describe here.

Recently, contributors at the University of Arizona have added high-level facilities for graphics and window operations to Icon; these facilities are comparable in power to Icon's repertoire for processing strings and structures. But that's a whole other story.

Editor's note: The source code for OPCODES is available electronically. See page 5 for details.


Listing: Tabulating Opcodes



procedure main()
  # Table for the opcodes
  optab := table(0)
  # Process the input, counting the opcodes
  while line := read() do
    optab[opcode(line)] +:= 1
  oplist := sort(optab, 3)


  # Write out the results in columns
  write("Opcode tabulation:")
  write()
  while write(left(get(oplist), 6), right(get(oplist), 6))
end


procedure opcode(line)
  ws := "\t"                    # White-space characters
# Analyze the line
  line ? {
    if any(";") then fail        # Skip comment lines
    if tab(upto(ws)) &           # Find blank field
      tab(many(ws)) &            # Skip blank field
      code := tab(upto(ws) | 0)  # To blank or end of line
    then return code
    else fail
    }
end




Listing: Program Output



Opcode tabulation:


call        2
cmp         1
db          1
dd          3
dw          8
jnz         1
les         3
mov        22
pop         1
push        6

ret         2
xor         3




What's Available and Where

There is both an interpreter and an optimizing compiler for Icon. The interpreter gets into execution quickly and is best for program development. Interpreted code runs fast enough for most applications. A 32-bit C compiler can be used for applications where you need the fastest possible execution time. There is also a large library of programs and procedures that is an excellent resource for persons new to Icon as well as for experienced Icon programmers. The main documentation for Icon is The Icon Programming Language (Griswold and Griswold, Prentice-Hall, 1990, ISBN 0-13-447889-4). There is also a book on the implementation: The Implementation of the Icon Programming Language (Griswold and Griswold, Princeton University Press, 1986, ISBN 0-691-08431-9). Technical reports provide supplementary documentation. A free newsletter about Icon is published three times a year and is available from the Icon project. Implementations, the program library, and supplementary documentation are available by anonymous FTP to cs.arizona.edu; cd/icon and get READ.ME for navigation instructions. The Internet newsgroup comp.lang.icon provides a forum for discussion about Icon. BIX also has a conference on Icon, named icon. It maintains the current version of the most popular PC implementations of Icon and provides gateways to comp.lang.icon and the Arizona FTP site. To subscribe to the newsletter, order books, get program material on magnetic media, or just find out more about Icon, contact Icon Project Department of Computer Science The University of Arizona Tucson, AZ 85721 (602) 621-8448 fax: (602) 621-4246 icon-project@cs.arizona.edu
Ralph E. Griswold is a Regents' Professor in the computer science department at the University of Arizona. He specializes in programming language design and implementations. You can reach him on the Internet at ralph@cs.arizona .edu or on BIX c/o "editors."
Up to the Hands On section contentsGo to previous article: The Panose Typeface-Matching SystemGo to next article: IPX and NetBIOS for OS/2SearchSend a comment on this articleSubscribe to BYTE or BYTE on CD-ROM  
Flexible C++
Matthew Wilson
My approach to software engineering is far more pragmatic than it is theoretical--and no language better exemplifies this than C++.

more...

BYTE Digest

BYTE Digest editors every month analyze and evaluate the best articles from Information Week, EE Times, Dr. Dobb's Journal, Network Computing, Sys Admin, and dozens of other CMP publications—bringing you critical news and information about wireless communication, computer security, software development, embedded systems, and more!

Find out more

BYTE.com Store

BYTE CD-ROM
NOW, on one CD-ROM, you can instantly access more than 8 years of BYTE.
 
The Best of BYTE Volume 1: Programming Languages
The Best of BYTE
Volume 1: Programming Languages
In this issue of Best of BYTE, we bring together some of the leading programming language designers and implementors...

Copyright © 2005 CMP Media LLC, Privacy Policy, Your California Privacy rights, Terms of Service
Site comments: webmaster@byte.com
SDMG Web Sites: BYTE.com, C/C++ Users Journal, Dr. Dobb's Journal, MSDN Magazine, New Architect, SD Expo, SD Magazine, Sys Admin, The Perl Journal, UnixReview.com, Windows Developer Network