Princeton tool tops dictionary


Steven Schultz

Princeton, N.J. -- For anyone who can't use a dictionary without being distracted by random words along the way, an encounter with WordNet could be a real problem.

 

WordNet, a database of 138,000 words developed at Princeton, is unlike any dictionary or even thesaurus. It gives definitions and exhaustive lists of synonyms, but it also links words through a web of semantic relations.

Type in "car," for example, and WordNet can display all its "hypernyms" (a car is a kind of what?) -- motor vehicle, vehicle, conveyance, instrument, artifact, physical object, thing. Then ask for its "hyponyms" and find 22 kinds of car, from ambulance to hot rod to sedan. Looking for parts of a car? Ask WordNet for "meronyms" and it gives 140 parts and subparts from accelerator pedal to windshield wiper. And in a boon to Scrabble players, WordNet will find all 303 words containing the letters "car," from carabao to vicar.
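The relations described above can be pictured as simple lookups over links between words. The following is a minimal sketch in Python, not WordNet's actual data format or interface: the dictionaries hold only the handful of entries quoted in this article, and the function names (hypernym_chain, hyponyms, meronyms, containing) are invented for illustration.

```python
# Toy data only: a few links taken from the examples in the article.
# Each key points to its hypernym (the thing it is "a kind of").
HYPERNYM = {
    "car": "motor vehicle",
    "motor vehicle": "vehicle",
    "vehicle": "conveyance",
    "conveyance": "instrument",
    "instrument": "artifact",
    "artifact": "physical object",
    "physical object": "thing",
    "ambulance": "car",
    "hot rod": "car",
    "sedan": "car",
}

# Each key is a part of the whole it points to.
PART_OF = {
    "accelerator pedal": "car",
    "windshield wiper": "car",
}


def hypernym_chain(word):
    """Follow 'is a kind of' links upward until no link remains."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def hyponyms(word):
    """Return the words whose direct hypernym is `word`."""
    return sorted(w for w, parent in HYPERNYM.items() if parent == word)


def meronyms(word):
    """Return the words recorded as parts of `word`."""
    return sorted(p for p, whole in PART_OF.items() if whole == word)


def containing(letters, vocabulary):
    """Return every word in `vocabulary` that contains `letters`."""
    return sorted(w for w in vocabulary if letters in w)


print(hypernym_chain("car"))
print(hyponyms("car"))
print(meronyms("car"))
print(containing("car", ["carabao", "sedan", "vicar"]))
```

The real database stores far more entries and relation types; the sketch only shows why a computer can traverse such links mechanically.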

The possibilities for distraction are endless, said George Miller, a professor emeritus of psychology who started WordNet 16 years ago. But far more than a toy, the project has become a valuable tool for researchers in computer science and information processing, fields in which the challenge is to allow computers to "understand" and analyze human language.

"WordNet does not solve that problem, but it presents language in a way the computer can deal with," said Miller.

Computer programmers are often faced with the problem that many words have more than one sense, he noted. The word "bank," for example, can mean a place to store money, a sloping piece of ground or a group of objects arranged in a row, such as a bank of telephones. WordNet places each of those senses in a distinct hierarchy of related concepts, making it easier to compare the context of a sentence to possible interpretations of the words.
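One way to picture that comparison is a simple word-overlap heuristic: score each sense of "bank" by how many words in the sentence also appear among concepts related to that sense. The sketch below is an assumption made for illustration, not the method WordNet or its users necessarily employ. The sense labels, the related-word sets and the function name pick_sense are all hypothetical.

```python
import re

# Hypothetical related-word sets for the three senses of "bank"
# mentioned in the article. Real systems would derive such sets
# from the concept hierarchy rather than list them by hand.
SENSES = {
    "place to store money": {"money", "deposit", "loan", "account", "teller"},
    "sloping piece of ground": {"river", "slope", "water", "shore", "grass"},
    "objects arranged in a row": {"row", "telephones", "switches", "panel"},
}


def pick_sense(sentence):
    """Return the sense whose related words overlap most with the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(words & related) for sense, related in SENSES.items()}
    # On a tie, max() keeps the first sense listed in SENSES.
    return max(scores, key=scores.get)


print(pick_sense("She opened an account at the bank to deposit money."))
print(pick_sense("We sat on the bank and watched the river."))
```

A heuristic this crude fails whenever the sentence shares no words with any set, which is part of why the underlying problem remains unsolved.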

"If you want to do anything with the Web where you need to have (the computer respond to) a large vocabulary, you need a lexicon," said Martha Palmer, an associate professor of computer science at the University of Pennsylvania. "Basically, WordNet is the most widely used."

While commercial dictionaries are available in database form for large licensing fees, WordNet is free and provides its unique web of relationships.

"The information processing community is very grateful for WordNet," said Palmer, who frequently uses it in her own work and has a current graduate student using it in a dissertation project.

Among many other areas, the interaction of computers and language is important in government intelligence gathering. Miller recently received a grant from a U.S. government organization called Advanced Research and Development Activity, which has a program in "information exploitation" that involves finding ways to retrieve information from large amounts of text. Miller is scheduled to give a presentation on WordNet this week at an ARDA workshop on "question answering" technologies.

WordNet also has begun to attract the attention of linguists from other countries, who will gather in January for the first international conference of the Global WordNet Association. Miller's collaborator, staff scientist Christiane Fellbaum, formed the association along with Dutch linguist Piek Vossen. The conference will take place in Mysore, India, where linguists are building WordNets for all 17 official languages of that country.

When Miller started WordNet in 1985, he saw it as an experimental device for answering questions in psychology about how the brain retains and organizes the human "dictionary in your head."

To demonstrate the problem, Miller cites a fragment from a Robert Frost poem: "but I have promises to keep and miles to go before I sleep." He found these 13 words have an average of nine senses each, so there are 3 trillion possible combinations of meanings. Interpreting the words is like entering a maze with 13 intersections, each with nine possible turns.

"Yet we run that maze without even realizing there are alternatives, and get it right -- every time," he said. "How do we do it?"
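The size of that maze can be checked with a few lines of arithmetic. The sketch below assumes every one of the 13 words has exactly nine senses, which is a simplification of the "average of nine" reported above. Under that assumption the count comes to roughly 2.5 trillion, the same order of magnitude as the figure quoted; the quoted figure may reflect the actual sense count of each word rather than a uniform average.

```python
# Assumption: each word has exactly nine senses (the article gives
# nine only as an average across the 13 words).
fragment = "but I have promises to keep and miles to go before I sleep"
words = fragment.split()
senses_per_word = 9

# One independent choice of sense per word, so the counts multiply.
combinations = senses_per_word ** len(words)

print(len(words))     # 13
print(combinations)   # 2541865828329
```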

Eager to tackle such problems, Miller and a few others began typing. "I took the nouns, Christiane took the verbs and my wife took the adjectives," said Miller. With initial funding from the Office of Naval Research, the National Science Foundation and other sources, the project grew rapidly, doubling in size from 60,000 entries in 1992 to 120,000 in 1995. For the last 10 years, the researchers have received computer-programming help from technical staff member Randee Tengi.

The project, however, has shifted from its beginnings in psychological research. Miller noted that many of the theories about human language processing that spurred the start of WordNet have now been discarded.

"There is very little use (of WordNet) by psychologists," Miller said, noting that interest among linguists and computer scientists picked up as interest among psychologists waned. "This has gone in a direction dictated by practical need."

The change has not dampened Miller's enthusiasm. A glance around his office reveals the depth of his fascination with the project. Dictionaries are everywhere. His favorites are in a stack half a dozen high next to his computer, while others line the shelves. His all-around favorite is the American Heritage Dictionary; the most comprehensive is Webster's Third New International Dictionary, unabridged. Then there's the Oxford English Dictionary online and medical, legal, business and many other specialty dictionaries.

"I never look a word up in just one dictionary," he said. "I don't trust one that much. I look it up in several and then I begin to get a sense of what it means."

Miller noted that if he were starting over again, he would build in several more features, such as distinguishing between the defining parts of an object (wheels on a car) and the non-essential parts (a windshield wiper).

"These are tips for the next time someone tries something crazy like this," he said.

WordNet is available for free. It can be downloaded in versions for Unix, PC and Mac from <http://www.cogsci.princeton.edu/~wn>, where it can also be accessed directly, in limited form, through a Web interface.
 



December 3, 2001
Vol. 91, No. 11



The Bulletin is published weekly during the academic year, except during University breaks and exam weeks, by the Office of Communications. Second class postage paid at Princeton. Permission is given to adapt, reprint or excerpt material from the Bulletin for use in other media.

Deadline. In general, the copy deadline for each issue is the Friday 10 days in advance of the Monday cover date. The deadline for the Bulletin that covers Jan. 14–Feb. 3 is Friday, Jan. 4. A complete publication schedule is available at <deadlines> or by calling (609) 258-3601.

Subscriptions. The Bulletin is distributed free to faculty, staff and students. Others may subscribe to the Bulletin for $28 for the academic year (half price for current Princeton parents and people over 65). Send a check to Office of Communications, Stanhope Hall, Princeton University, Princeton, NJ 08544.

Editor: Ruth Stevens
Calendar editor: Carolyn Geller
Staff writers: Jennifer Greenstein Altmann, Steven Schultz
Contributing writers: Marilyn Marks, Evelyn Tu
Photographer: Denise Applewhite
Design: Mahlon Lovett, Laurel Masten Cantor
Web edition: Mahlon Lovett