Word sense disambiguation

In computational linguistics, word sense disambiguation (WSD) is an open problem of natural language processing: identifying which sense (i.e. meaning) of a word is used in a sentence when the word has multiple meanings (polysemy). A solution to this problem would benefit other language-processing tasks, such as discourse analysis, search-engine relevance, anaphora resolution, coherence, and inference.

Research has progressed steadily to the point where WSD systems achieve high accuracy on a variety of word types and ambiguities. A rich variety of techniques has been researched. These range from dictionary-based methods that use the knowledge encoded in lexical resources, to supervised machine learning methods in which a classifier is trained for each distinct word on a corpus of manually sense-annotated examples, to completely unsupervised methods that cluster occurrences of words and thereby induce word senses. Among these, supervised learning approaches have been the most successful to date.
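As an illustration of the dictionary-based family, the sketch below implements a simplified variant of the Lesk gloss-overlap algorithm: it picks the sense whose dictionary definition shares the most words with the surrounding sentence. The sense inventory, its glosses, the stopword list and the function names (SENSES, simplified_lesk) are hypothetical toy stand-ins, not taken from any real lexical resource.

```python
import re

# Hypothetical toy sense inventory; glosses are illustrative, not from a real dictionary.
SENSES = {
    "bank": {
        "bank.finance": "an institution that accepts deposits of money and makes loans",
        "bank.river": "the sloping land beside a body of water such as a river",
    },
}

# Small assumed stopword list so that function words do not count as overlap.
STOPWORDS = {"a", "an", "the", "of", "and", "that", "such", "as", "to", "in",
             "on", "he", "she", "his", "her", "at", "by", "with", "from"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(word, sentence, inventory=SENSES):
    """Return the sense whose gloss shares the most words with the sentence.

    Ties go to the sense listed first in the inventory.
    """
    context = tokenize(sentence) - {word}  # the target word itself is not evidence
    best_sense, best_overlap = None, -1
    for sense, gloss in inventory[word].items():
        overlap = len(context & tokenize(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("bank", "He sat on the bank of the river and watched the water"))
print(simplified_lesk("bank", "She went to the bank to deposit money and get a loan"))
```

Exact word matching is deliberately naive here ("deposit" does not match "deposits"); practical gloss-overlap systems typically add stemming or lemmatization and use a full lexical resource rather than hand-written glosses.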

Current accuracy is difficult to state without a host of caveats. In English, accuracy at the coarse-grained (homograph) level is routinely above 90%, with some methods on particular homographs achieving over 96%. On finer-grained sense distinctions, top accuracies from 59.1% to 69.0% have been reported in recent evaluation exercises (SemEval-2007, Senseval-2), where the baseline accuracy of the simplest possible algorithm of always choosing the most frequent sense was 51.4% and 57%, respectively.
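The most-frequent-sense baseline mentioned above is simple enough to state in a few lines. The sketch below assumes sense-annotated data given as (word, gold_sense) pairs: it counts senses per word in training data, always predicts each word's most common sense, and reports accuracy on test data. The function name and the toy data are hypothetical; the figures it prints are not the Senseval or SemEval results quoted above.

```python
from collections import Counter


def most_frequent_sense_baseline(train, test):
    """Accuracy of always predicting each word's most frequent training sense.

    train, test: lists of (word, gold_sense) pairs (assumed input format).
    Words never seen in training get no prediction and count as errors.
    """
    if not test:
        return 0.0
    counts = {}
    for word, sense in train:
        counts.setdefault(word, Counter())[sense] += 1
    correct = 0
    for word, gold in test:
        predicted = counts[word].most_common(1)[0][0] if word in counts else None
        if predicted == gold:
            correct += 1
    return correct / len(test)


# Hypothetical toy data, not from any evaluation exercise.
train = [("bank", "finance"), ("bank", "finance"), ("bank", "river"),
         ("bass", "fish"), ("bass", "music"), ("bass", "music")]
test = [("bank", "finance"), ("bank", "river"), ("bass", "music"), ("bass", "music")]
print(most_frequent_sense_baseline(train, test))
```

On this toy data the baseline predicts "finance" for every "bank" and "music" for every "bass", so it gets three of four test items right. A baseline this strong on skewed sense distributions is why fine-grained WSD scores are usually reported alongside it.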
