Project Materials

This page contains an inventory of the materials associated with the Princeton Charrette Project (PCP), which collectively comprise the snapshot of the project as of October 2003. Each item is described by a relevant subset of the Dublin Core standard of metadata categories. Where appropriate, the items below contain links to downloadable files.

  • Manuscript Images
  • Manuscript Transcriptions
  • Majuscule Data
  • The Foulet-Uitti Edition
  • Lexical and Grammatical Data
  • Rhetorico-Poetic Data
  • The Figura Database
  • The FAUX Charrette
  • The Traditional Charrette
  • Figura TG

    Manuscript Images

    Description: The PCP hosts a collection of 341 images taken of the manuscript folio sides (i.e. pages) used by Professors Alfred Foulet and Karl D. Uitti in creating their critical edition of the Charrette poem. Also provided is a set of thumbnail images, 100 pixels in height, for convenience in creating display applications. In addition to copyright information, each image contains an overlay of line numbers, given every five lines. These refer to the corresponding line numbers in the critical edition.

    Creator: Scanning, conversion and overlay of line numbers were begun by Peter Shoemaker and Gina Greco and then completed by students supervised by Karl D. Uitti.

    Source: Each image was converted from a TIFF image, in turn generated by scanning a 35mm photograph, each taken on-site at the library where the manuscript is housed. The original TIFFs are not available.

    Format: Each image is a JPEG file, 1025 x 1536 pixels, with a bit depth of 24. The actual DPI values of the images are unknown, as the size of the original page is needed to calculate this.

    Title: The filename for each image has been changed to a standard format for referencing manuscript pages. For example, C-28-r.jpg refers to the recto side for folio 28 from Manuscript C. This naming convention is used throughout the PCP when manuscript pages are referenced, allowing easily generated links between page references and their images, either by hand or by program. The filename extension has been normalized to the more common "jpg."
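
    As a rough illustration of how such links might be generated by program, the following Python sketch parses and rebuilds page references of the form shown above. Only the example C-28-r.jpg comes from this page; the single-letter manuscript code, the purely numeric folio, and the "v" code for verso are assumptions, and the function names are hypothetical.

```python
import re
from typing import NamedTuple


class PageRef(NamedTuple):
    manuscript: str  # manuscript code, e.g. "C" (single letter is an assumption)
    folio: int       # folio number, e.g. 28
    side: str        # "r" for recto; "v" for verso is an assumption


# Assumed pattern: <manuscript>-<folio>-<side>, with an optional ".jpg" suffix.
PAGE_RE = re.compile(r"^([A-Z])-(\d+)-([rv])(?:\.jpg)?$")


def parse_page(name: str) -> PageRef:
    """Turn a page reference or image filename into its three parts."""
    match = PAGE_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognized page reference: {name!r}")
    ms, folio, side = match.groups()
    return PageRef(ms, int(folio), side)


def image_filename(ref: PageRef) -> str:
    """Build the image filename for a page reference."""
    return f"{ref.manuscript}-{ref.folio}-{ref.side}.jpg"


print(parse_page("C-28-r.jpg"))
print(image_filename(PageRef("C", 28, "r")))
```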

    Rights: The copyright information for each image is inscribed as paratext on each image. The rights to display these images from Princeton's web site were obtained by Professor Karl D. Uitti from each of the owning institutions, and these rights of use have been vouched for by Princeton University.


    Manuscript Transcriptions

    Description: A collection of XML documents containing the diplomatic transcriptions for each of the folios in the primary data set. Particular attention has been paid to representing punctuation and other glyphic features, as many of these are considered unique to the manuscript tradition. These features have been classified according to an encoding scheme that is used in the attributes of elements that begin with the string char_. The content model for this scheme is represented in the DTD file, which itself references various catalog files.

    Creator: The final digital format of the transcription files for the snapshot edition of the PCP was created by Rafael Alvarado through a simple transformation of the XML files created by Alexei Lavrentiev (see "Format" below for more information about the new format).

    Source: The scholarly content and digital encoding of these documents are the result of the work of numerous cohorts of graduate students from Princeton's Department of Romance Languages and Literatures, initially trained and led by Gina L. Greco. The first generation of transcription protocol was developed, in accord with Karl D. Uitti's principles, by Greco and Peter Shoemaker with important contributions from David Wrisley. Wrisley and Amy Ogden were significant contributors to the second generation of transcription work. Alexei Lavrentiev initiated the third generation of transcriptions, including the XML conversion, assisted and proofed by Matthieu Boyd, Katherine Brown, Peter Eubanks, K. Sarah-Jane Murray, Maud Pérez-Simon and Sinda Vanderpool.

    Format: The basic format of the files is a modified version of TEI Lite (P4), created by Rafael Alvarado, and adapted from the SGML used by Peter Shoemaker, et al. (Note that it is not TEI-compliant per se, as it adds a number of elements and attributes to the DTD.) Alexei Lavrentiev significantly extended this DTD with a set of elements to describe punctuation and other glyphic content, adapting an earlier SGML-based scheme which encoded this information within a set of entity names. Detailed information about Lavrentiev's rationale for encoding can be found in the following two documents:

    The principal change introduced by Alvarado for the snapshot edition of these files has been to associate each manuscript line with its physical position on the manuscript page, as opposed to its critical edition line number. This eliminates the need to include empty line numbers in transcriptions to represent critical edition lines not found in the source manuscript. References to critical edition lines remain as "key" attributes in the line elements, e.g. <l key="FU-32">. . .</l>. In addition, naming conventions for manuscript pages and other items have been normalized. (Note: the effect of viewing synoptically the relationship between critical edition lines and their associated manuscript pages is now accomplished with the "FAUX Charrette," which is described in detail below.)

    The data are represented as one manuscript per file.
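
    To illustrate how the "key" attributes described above can tie a physical line position back to the critical edition, here is a minimal Python sketch. Only the <l key="FU-32"> form is taken from this page; the <page> wrapper element, its "n" attribute, and the line text are invented for the example and may not match the actual DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: only <l key="FU-32"> comes from the description above.
# The <text> and <page> elements, the "n" attribute, and the line text are invented.
SAMPLE = """
<text>
  <page n="C-28-r">
    <l key="FU-31">first transcribed line</l>
    <l key="FU-32">second transcribed line</l>
  </page>
</text>
"""


def index_by_edition_line(xml_text):
    """Map each critical edition key to (page, physical line position)."""
    root = ET.fromstring(xml_text)
    index = {}
    for page in root.iter("page"):
        # Position is counted from the top of the page, starting at 1.
        for position, line in enumerate(page.iter("l"), start=1):
            key = line.get("key")
            if key is not None:  # skip any line that carries no key
                index[key] = (page.get("n"), position)
    return index


print(index_by_edition_line(SAMPLE))
```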

    Title: Each file is named like so, MS-A.tei.xml, where the capital letter following MS- refers to the manuscript code.


    The Figura Database

    Description: "Figura" refers to the relational (SQL) database that was produced for the Charrette project in order to solve the problem of collaboratively adding non-nesting content to the critical edition, viz. the rhetorical and poetic data. In addition, the database contains the Foulet-Uitti critical edition itself, the lexico-grammatic data for the critical edition, and the majuscule data for the manuscript collection. The database was created to serve as the back-end for a series of web-based applications by the same name.

    Creator: The data model was conceived of, and the DDL SQL was written by, Rafael Alvarado.

    Source: For information about the sources for the data in Figura, see the relevant items listed below.

    Format: The tables are encoded in MySQL, version 4.0. The character data are encoded in ISO-8859-1.


    The Foulet-Uitti Critical Edition

    Description: An electronic edition of the Foulet-Uitti critical edition of Chrétien de Troyes’ Le Chevalier de la Charrette. This is the text file that was parsed for input into the Figura database.

    Creators: Rafael Alvarado parsed the file for input into Figura as a table of "words" (see Format below). Before this, it was converted into various formats by Toby Paff and Peter Shoemaker from a set of WordPerfect files prepared by Karl D. Uitti.

    Source: Karl D. Uitti and Alfred Foulet prepared an edition of the Charrette poem for Classiques Garnier. Professor Karl D. Uitti reserved electronic rights to the poem when it was published by Bordas. He made slight revisions to the published edition for the electronic version.

    Format: Just prior to input into Figura, the file was plain text in the ISO-8859-1 character set. Within the database, the text is stored as a table of words, or tokens, split by the Perlish regular expression /\b/, which stands for a word boundary as understood by the C programming language. Words are associated with their following punctuation marks. Because an apostrophe is not a word character, contractions are split at it and stored as two "words" in Figura.
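
    The splitting rule can be approximated in Python as follows. This is a sketch of the technique rather than the original Perl parser: the sample phrase is invented, the function name is hypothetical, and Python's \b is Unicode-aware for strings, so accented letters may be handled differently than in the original ISO-8859-1 processing.

```python
import re


def tokenize(line):
    """Split a line at word boundaries and pair each word with what follows it."""
    # With Python 3.7+, re.split accepts the zero-width pattern \b.
    # The result alternates non-word and word pieces, starting with a
    # (possibly empty) non-word piece, so the words sit at the odd indices.
    pieces = re.split(r"\b", line)
    tokens = []
    for i in range(1, len(pieces), 2):
        word = pieces[i]
        punctuation = pieces[i + 1].strip()  # drop plain whitespace
        tokens.append((word, punctuation))
    return tokens


print(tokenize("Qu'il vint."))
# [('Qu', "'"), ('il', ''), ('vint', '.')]
```

    Note how the contraction yields two tokens, the first carrying the apostrophe as its following punctuation.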

    Rights: The electronic edition is copyrighted to Karl D. Uitti and Princeton University. When Professor Uitti published an edition of the poem in the Classiques Garnier collection, he reserved the rights to an electronic form. The version served here is slightly different from the print edition.

    Rhetorico-Poetic Data

    Description: These data, known collectively as "figures," describe the presence of adnominatio, chiasmus, enjambment, oratio obliqua, oratio recta, and rich rhyme in the critical edition.

    Creators: The data model and specific table design were conceived of by Rafael Alvarado, in consultation with those who contributed to the source data (see next item). Adéle Auxier, Matthieu Boyd, Katherine Brown, Emma Goodwin, and Juliet O'Brien contributed to proofing and data-entry.

    Source: Primary data collection proceeded, with active input from Karl D. Uitti, in subcommittees led by Princeton University doctoral students: Deborah Thalheimer Long (adnominatio), Jessica Vitz McGibbon and Julia Zarankin (enjambement), K. Sarah-Jane Murray (oratio), Ellen Thorington (rich rhyme), Catherine Witt (chiasmus). Subcommittees were coordinated by general editor Sarah-Jane Murray.

    Format: Figures are classified by a genus table, a species table, a token table, and a segment table. A figure token instantiates a genus and species, and consists of one or more segments. A segment is a set of contiguous words (defined in terms of the span of word positions). Segments (and series of segments) may be named, as in the case of chiasmus (e.g. A1,B1,B2,A2). The two oratio rhetorical figures are further qualified by their voice; Figura contains an "agent" table that stores the names of all the characters who speak in the poem, including the narrator.

    The mapping of figures onto sets of words within a database, rather than a pure mark-up approach that would employ container elements in a file of the critical edition itself, follows the pattern of "stand-off" or "just-in-time" mark-up, which finds theoretical justification in the concept of "annotation graphs" described by Steven Bird.
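
    The relationship between figure tokens, segments, and word positions described above can be sketched in Python as follows. The class and field names are hypothetical and do not reproduce the actual Figura table definitions, and the word positions in the example are invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    name: str        # e.g. "A1"; segment names are optional in the description
    first_word: int  # position of the first word in the span
    last_word: int   # position of the last word in the span (inclusive)


@dataclass
class FigureToken:
    genus: str
    species: str
    segments: List[Segment] = field(default_factory=list)

    def word_positions(self):
        """All word positions covered by the figure, in ascending order."""
        positions = set()
        for seg in self.segments:
            positions.update(range(seg.first_word, seg.last_word + 1))
        return sorted(positions)


def figures_at(figures, word_position):
    """Stand-off lookup: which figures cover a given word position?"""
    return [f for f in figures if word_position in f.word_positions()]


# Invented example: a chiasmus whose segments follow the A1,B1,B2,A2 pattern.
chiasmus = FigureToken(
    genus="chiasmus",
    species="(unspecified)",
    segments=[
        Segment("A1", 10, 11),
        Segment("B1", 12, 12),
        Segment("B2", 15, 16),
        Segment("A2", 17, 17),
    ],
)

print(chiasmus.word_positions())
print([f.genus for f in figures_at([chiasmus], 15)])
```

    Because figures point at word positions instead of wrapping the text, overlapping figures do not have to nest, which is the problem the database approach was meant to solve.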


    Lexical and Grammatical Data

    Description: The lexical and grammatical data define the "dictionary forms" and grammatical features of each and every word found in the critical edition. Grammatical features include such categories as part of speech, tense, mood, number, etc.

    Creators: Molly Robinson Kelly, Toby Paff, Peter Shoemaker, Karl D. Uitti and David Wrisley were responsible for the original conception and modeling of the database. Rafael Alvarado implemented the final version of the database (a single table) in Figura.

    Source: As Associate Editor, Kelly trained assistants and supervised the collection of data through 2002; Juliet O'Brien served as coordinator from 2002-2003. Visiting Researchers and Fellows contributed significantly to the database: Emma Goodwin (Visiting Researcher), Alexei Lavrentiev (Visiting Research Fellow and Associate Editor), Maud Pérez-Simon (Visiting Research Fellow and Associate Editor).

    Format: The data are stored in a single table within the database, which was imported directly from the Access table created by Ms. Kelly.


    Majuscule Data

    Description: These data describe the location and features of the large, decorated letters found throughout the manuscript tradition. For example, this majuscule, of the letter "A," is found in Ms. A:

    Creator: The original Access database, of which the Figura version is merely a copy, was created by Sarah-Jane Murray. Sinda Vanderpool was responsible for adding new data to the Access database table. It was imported without significant structural modification into Figura by Rafael Alvarado.

    Source: The majuscule data have two primary sources. First, they build on the transcription work described above; Sarah-Jane Murray extracted the data from the transcription files and put these in an Access table. Second, after noting various gaps and discrepancies, Murray added new fields and data to the table, including information about size and ligatures. Sinda Vanderpool was enlisted to validate the database by systematically comparing it with the manuscript images, and updating or extending the data whenever necessary. In the process, Vanderpool introduced several modifications to the data that were crucial to the success of the project.

    Format: The data are stored as a single table, with 11 fields devoted to describing the glyphic features of the majuscule, such as height and color.


    The FAUX Charrette

    Description: The FAUX Charrette is an export of the entire contents of the Figura database into XML form. "FAUX" stands for Figura-as-unitary-XML. The value of a file in this format is that it can be used independently of a database management system, such as the MySQL server that hosts the SQL data, and can be read by anyone with a text editor (and enough memory on their computer). FAUX shines, however, when used with more advanced XML tools such as XPath, XSLT and XQuery. With these, one may run complex searches of the text, such as "find all figures of a certain genus in which the character Lancelot is named," with relative ease. Users should consider using a native XML database, such as eXist or MarkLogic, to reap the full benefits of FAUX.

    Creator: FAUX was created by Rafael Alvarado via a Perl script.

    Source: The source of FAUX is both the Figura database and the source of the Figura database. That is, it is a transformation of both the data model and the data itself found in Figura.

    Format: FAUX is formatted using a set of intuitive XML tagnames that follow from the logic of the source database. The hierarchy of data described by the implicit schema is roughly as follows:

    • Episode
      • Line
        • Word
          • Grammatical data
          • String
          • Punctuation
          • Encompassing figures, if present
        • Manuscript lines, if present
          • Line content
          • Majuscule data, if present
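
    To give a feel for the kind of query this hierarchy supports, the following Python sketch searches a tiny FAUX-like document for figures of a given genus that encompass a named word. The tag names, attributes, and sample content are all invented for illustration and follow only the rough outline above; the real FAUX tagnames may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical FAUX-like fragment: every tag name, attribute, and word here
# is invented and follows only the rough hierarchy listed above.
SAMPLE = """
<faux>
  <episode n="1">
    <line n="1">
      <word><string>Lancelot</string><figure id="f1" genus="adnominatio"/></word>
      <word><string>vint</string><figure id="f1" genus="adnominatio"/></word>
    </line>
    <line n="2">
      <word><string>Lancelot</string><figure id="f2" genus="chiasmus"/></word>
    </line>
  </episode>
</faux>
"""


def figures_naming(root, genus, name):
    """Ids of figures of the given genus that encompass a word equal to name."""
    found = set()
    for word in root.iter("word"):
        if word.findtext("string") != name:
            continue
        # ElementTree supports this limited XPath attribute predicate.
        for fig in word.findall(f"figure[@genus='{genus}']"):
            found.add(fig.get("id"))
    return sorted(found)


root = ET.fromstring(SAMPLE)
print(figures_naming(root, "chiasmus", "Lancelot"))
```

    A full XPath or XQuery engine would let the same search be written as a single expression; the standard-library module is used here only to keep the sketch self-contained.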


    The Traditional Charrette

    Description: The "Traditional Charrette" refers to the original web site that was created to display the results of the Charrette Project, and which the current site supplants. At its core is a set of pages that each represent a span of transcribed lines for a given manuscript, with each line labelled according to its critical edition number. Each page in turn contains links to manuscript images and to other manuscript pages with corresponding lines. In addition to these pages, the site contains a synoptic menu page with links to each page, and a key to the conventions used to encode the variety of glyphic features found in the manuscripts. The current web site retains the core of the previous site for both historical and functional purposes, as the previous site presents the materials in a useful and intuitive manner.

    Creator: The earliest version of the site was created by Peter Shoemaker in the 1990s and then updated and redesigned by Sarah-Jane Murray in 1999. The site has been modified slightly by Rafael Alvarado for hosting on the current web site.

    Source: The sources of the transcriptions and the images are essentially those described for the same materials listed above, with the exception that the transcriptions in the site do not reflect the significant changes introduced by Alexei Lavrentiev.

    Format: The format of the site is HTML with SGML entities represented literally. The format of the images is described above. The format of the transcriptions is based on the TEI versions described above, where the lines are identified by their corresponding critical edition line and the glyphic features are represented as SGML entities.

    Title: The site is known informally as the "Traditional Charrette" and sometimes the "Original Charrette."

    Identifier: The site is hosted at the following URL:


    Figura TG

    Description: Figura TG is a web-accessible front-end to the Figura database described above. Although it provides dynamic and interactive access to all of the data associated with the Princeton Charrette Project, its distinctive feature is its technique of visually representing the rhetorico-poetic data. Figura TG also allows the user to exploit the transitive relations that exist in the database so that the user may move freely from manuscript page image to text line to figure and vice versa. Figura TG provides four points of entry into the Charrette database: Pages, Figures, Words and Text. The "Pages" entree provides access to the manuscript pages; "Figures" leads to the catalog of rhetorical figures; "Words" leads to an index of words in the text organized by their "dictionary form," or "dform"; and "Text" leads to the text itself in 50-line intervals. Each point of entree produces a list of results with links that allow the user to drill down in different directions, depending upon context.

    Creator: Figura TG was conceived, designed and coded by Rafael Alvarado.

    Source: The source materials that comprise the content served by Figura TG are those of the Figura database, described above.

    Format: Figura TG was originally written in Perl 5, using ModPerl and EmbedPerl, and Oracle 8. It was subsequently converted to PHP 4 and MySQL 4. The site makes extensive use of CSS and DHTML.

    Title: The expression "Figura TG" refers to the database-driven web site described here. The term "Figura" has been used to describe a suite of interactive web applications used to support the Charrette Project, and the database itself that resulted from the process of collaborative data entry and data integration.

    Identifier: Figura TG can be accessed at the following URL:


