A hapax legomenon (pronounced /ˈhæpɨks lɨˈɡɒmənɒn/ or /ˈheɪpæks/) (pl. hapax legomena, sometimes abbreviated to hapaxes) is a word which occurs only once in either the written record of a language, the works of an author, or in a single text. While technically incorrect, the term is also sometimes used of a word that occurs in only one of an author's works, even though it occurs more than once in that work. Hapax legomenon is a direct transliteration from the Greek form "ἅπαξ (λεγόμενον)", meaning "(something) said (only) once".
The related terms dis legomenon, tris legomenon, and tetrakis legomenon refer respectively to double, triple, or quadruple occurrences, but are far less commonly used.
Hapax legomena are quite common, as predicted by Zipf's Law, which states that the frequency of any word in a work or corpus is inversely proportional to its rank in the frequency table. For large corpora, about 40% to 60% of the words occurring are hapax legomena, and another 10% to 15% are dis legomena. Thus, in the Brown Corpus of American English, about half of the 50,000 words are hapax legomena within that corpus.
Note that the term hapax legomenon refers to a word's appearance in a body of text, not to its origins, nor to its prevalence in speech. It thus differs from a nonce word, which may never be recorded, or may find currency and be recorded widely, or may appear several times in the work which coins it, and so on.
Hapax legomena in texts pose difficulties in translation and decipherment, particularly when the words in question are used only once in the entire record of an ancient language. Inferring meaning from context is easier and more certain when there are multiple contexts to compare. For example, many of the remaining undeciphered Mayan glyphs are hapax legomena, and Biblical hapax legomena (particularly in Hebrew) pose sometimes difficult issues in translation. Hapax legomena also pose challenges in natural language processing.
Full article ▸