Background and History of the Project (1985-1994)
As part of the Princeton-IBM Project Pegasus in the 1980s, an electronic archive of 2300 Cairo Geniza transcriptions was created. IBM Corporation donated six PC-XTs, a 3812 laser printer and printer software to the project, and the Department of Near Eastern Studies funded the keyboarding. The texts transcribed total close to 10 MB. They were input with a version of Kedit for DOS configured to write Hebrew letters from right to left: Dr. Michael Sperberg-McQueen, then computer consultant to the humanities at Princeton, adapted Mansfield Software's Kedit for DOS for Hebrew and Arabic letters. The resulting product, called Kedit/Semitic, served as a limited right-to-left word processor for MS-DOS, vintage 1985, and was demonstrated at IBM higher education conferences.
In Kedit/Semitic the cursor starts at the right side of the screen and moves one space to the left after a Hebrew or Arabic letter is typed; pressing the space bar likewise moves the cursor one space to the left, and pressing the "Enter" key returns it to the right side of the screen. Because the system lacks a function to wrap Hebrew text correctly when typing reaches the left margin, it would be awkward as a general Hebrew word processor, but it is quite adequate for line-by-line transcriptions, which do not require continuous text wrap. The system also has a left-to-right mode in which it is a full-function word processor, still widely used today for its power and simplicity.
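The cursor behavior described above can be modeled as a small state machine. The Python sketch below is a hypothetical illustration, not Kedit/Semitic's actual code: the `RtlLineEditor` class, the 80-column default width, and the choice to drop input once the cursor passes the left margin are all assumptions; the text says only that the editor does not wrap.

```python
from __future__ import annotations

DEFAULT_WIDTH = 80  # assumed screen width in columns (hypothetical)


class RtlLineEditor:
    """Hypothetical model of a right-to-left, line-by-line entry mode."""

    def __init__(self, width: int = DEFAULT_WIDTH) -> None:
        self.width = width
        self.lines: list[list[str]] = [[" "] * width]
        self.col = width - 1  # cursor starts at the right side of the screen

    def type_key(self, key: str) -> None:
        if key == "\n":
            # "Enter": open a new line and return the cursor to the right side
            self.lines.append([" "] * self.width)
            self.col = self.width - 1
            return
        if self.col < 0:
            # No word wrap: input past the left margin is dropped (assumption)
            return
        # A letter or the space bar fills the current cell, then steps left
        self.lines[-1][self.col] = key
        self.col -= 1

    def type_text(self, text: str) -> None:
        for ch in text:
            self.type_key(ch)

    def render(self) -> list[str]:
        # Each row as stored column by column, leftmost column first
        return ["".join(row) for row in self.lines]


editor = RtlLineEditor(width=10)
editor.type_text("אב ג")
# The first letter typed sits in the rightmost column (index 9)
print(editor.lines[0][9])
```

Reading a rendered row from right to left recovers the letters in the order they were typed, which is why this mode suits line-by-line transcription even without wrapping.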
Hebrew and Arabic letters are displayed on the screen and printed using the Duke Language Toolkit.
The texts come from a collection of roughly 15,000 Geniza texts (out of a total of more than 200,000) that deal with the daily life of the Jewish community of Cairo and of other places in the Mediterranean, mainly in the 11th to 13th centuries. These so-called "Geniza documents" range in size from a few words to long letters of 80 to 100 lines.
This collection was the focus of the scholarship of S. D. Goitein (1900-1985), at the Hebrew University in Jerusalem, from 1957 at the University of Pennsylvania, and, after his retirement in 1971, at the Institute for Advanced Study in Princeton. His multi-volume study, A Mediterranean Society: The Jewish Communities of the Arab World as Portrayed in the Documents of the Cairo Geniza, 6 vols. (Berkeley: University of California Press, 1967-1993), is the standard work on what has become known as the "documentary Geniza." The term distinguishes this subset of some 15,000 fragments from others that contain religious or literary material such as Bibles, rabbinic texts, liturgy, poetry, mysticism, religious philosophy, magical texts, and even fragments of books of Islamic literature.
The term "document" and hence "documentary Geniza" has a technical function in Geniza studies that is obscured by the common use of the term document to mean any electronic text file. In Geniza studies "document" means any text, a self-contained leaf or a leaf from a notebook, that originates in daily, routine activity (letters, legal documents, business accounts, marriage contracts, divorce documents and lists of all kinds) rather than being pages from a copied manuscript on a literary subject.
In 1985, under the supervision of the project director, Near Eastern Studies Professor Mark R. Cohen, keyboarders began computerizing "documents" in Judaeo-Arabic (Arabic written in Hebrew characters) and Hebrew from the Cairo Geniza. The goal was to create a free-text database of these texts that could be searched electronically for information on the economic and social history of Jews and Muslims in the medieval Mediterranean, as well as on the history of the Arabic language.
Over ten years, working at varying paces, keyboarders covered (1) all the major book-length publications of Geniza documents (many poor editions being corrected by comparison with photocopies of the manuscripts kept in the department's "S. D. Goitein Laboratory for Geniza Research"); (2) most of the documents published by S. D. Goitein in article form, incorporating corrections made by him in his personal offprints; (3) documents deciphered or re-edited by scholars M. Cohen and A. L. Udovitch of the Near Eastern Studies Department; (4) many documents "edited," that is, typed by Professor Goitein, but not published.
The roughly 2300 individual texts completed so far represent a "provisional corpus." Necessarily, it consists mostly of longer documents, of the type normally published in the course of research. Thus, in terms of actual bytes of data, it amounts to rather more than 15 percent of the target of 15,000 fragments (roughly estimated at 40 to 50 megabytes when completed). The remainder consists of the most challenging texts, since they require original decipherment by skilled students of the language of the Geniza. The provisional corpus alone, however, will immeasurably facilitate this task by giving decipherers quick access to words and phrases in the "Geniza vocabulary." It also provides a solid, if far from exhaustive, basis on which significant research can already be done.
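The figures given in this history make the "rather more than 15 percent" claim easy to check: the completed texts are about 15 percent of the target fragments by count, but, being longer than average, they make up a larger share of the projected data by volume.

$$\frac{2300}{15{,}000} \approx 15.3\% \text{ (by count)}, \qquad \frac{\approx 10\ \text{MB}}{40 \text{ to } 50\ \text{MB}} \approx 20\% \text{ to } 25\% \text{ (by volume)}$$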
We would be remiss if we did not mention the transcribers without whose dedication and spirit this archive would not have been possible. Each document bears the initials of the transcriber and the date the document was last updated.
- Catherine Beaumont - C.B.
- Daniel Beaumont - D.B.
- Olivia Remie Constable - O.C.
- Zafrira Friedmann - Z.F.
- Sumaiya Hamdani - S.H.
- Jane Hathaway - J.H.
- Naomi Heger - N.H.
- Yael Steinberg - Y.S.
- Christopher Taylor - C.T.
- Nurit Tsafrir - N.T.
Although Pegasus funding for active transcription work ended in 1990, the archive has been maintained and augmented through the scholarship of Prof. Cohen, Prof. Udovitch and their students.
Making the archive keyword-searchable was one of the top priorities of Prof. Cohen and his colleagues. Because no one can anticipate every item of information that scholars, now or in the future, may want, the project designers felt from the beginning that it was essential to use an "all-words-as-keywords" approach, in which every word in every text is searchable. The main problem during the past decade has been finding an effective search engine able to retrieve text written in right-to-left script.
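An "all-words-as-keywords" approach corresponds to what is now usually called a full-text inverted index: every word maps to the set of documents in which it occurs, so no one has to choose keywords in advance. The Python sketch below is hypothetical and is not the engine the project used. It assumes the transcriptions are available as Unicode strings in logical (reading) order, and its `\w+` tokenization rule is a simplification that ignores transcription sigla, diacritics, and Hebrew prefixed particles. The document IDs and sample texts are invented for illustration.

```python
from __future__ import annotations

import re
from collections import defaultdict

# In Python 3, \w in a str pattern is Unicode-aware, so Hebrew and Arabic
# letters count as word characters. (Simplified tokenizer: an assumption.)
TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map every word, not a pre-chosen keyword list, to the IDs of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of documents containing all query words (boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample documents
sample = {"doc-1": "שלום וברכה", "doc-2": "שלום רב"}
idx = build_index(sample)
print(sorted(search(idx, "שלום")))
```

Because the index stores character sequences in logical order, display direction matters only when entering queries and showing results, not for the lookup itself.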
Prof. Cohen has built a searchable prototype on a Toshiba laptop, using Nota Bene's Hebrew function and indexing option on a small subset of the documents dealing with poverty and assistance to the poor. While this system works well with a small set of files, it would be impractical to index the entire corpus with Nota Bene. Prof. Cohen has demonstrated the prototype at international Geniza conferences and uses it in a seminar on poverty and charity; it serves as an important tool in his work on this aspect of the communal life of medieval Mediterranean Jewish society.