Syntax in Aleksandr Pushkin’s Eugene Onegin:
A Statistical and Interpretive Approach

Adviser: Michael A. Wachtel

Andrew T. Davis

Slavic Languages and Literatures

“Striking out into new territory might have proved stressful at first, but it gave me the opportunity to take ownership of the ideas I was advancing.”


Some would say that my thesis experience was fairly uncommon. I have no thesis horror stories: I slept regularly, likely wasted more time than I should have, and still managed to write an interesting thesis. The thesis itself was something of an oddity; I produced an original discussion of Russian poetry using the same statistical software as many economics concentrators. In retrospect, however, I feel that my experience reflected some of the best qualities of the thesis process, which allowed me to write something that was a product of my own perspective on my field.

I stumbled upon my thesis entirely by chance. As a junior, I faced an unusual challenge: as part of the applied and computational mathematics certificate program, I had to incorporate applied mathematics into a piece of independent work for my department. For a Slavic major, this meant finding a topic at the intersection of Russian literature and mathematics. Upon hearing of this requirement, I doubted such a topic existed. Yet two days later I came across one. I mentioned my problem to Michael Gordin, a professor in the history department, who suggested that I investigate some of the prominent mathematicians of early-20th-century Russia. At that time, there was significant crossover between disciplines, with poets, mathematicians, and theologians influencing one another. I set out to choose a mathematician to investigate, settling on Andrei Kolmogorov, an important figure in probability theory. Looking through a complete bibliography of his work, I came across a list of articles in something called “mathematical versification.” Though I had little sense of what mathematical versification was at the time, it proved to be the intellectual focus of my final two years at Princeton.

Mathematical versification is a branch of scholarship in Russian poetry that uses statistics and probability theory to analyze the structure of poetic form: its rhyme, rhythm, and syntax. Traditional work in this discipline places the most emphasis on poetic rhythm and meter. Meter denotes where stress is expected to fall in the line; for example, the strong positions in a line of iambic tetrameter are the second, fourth, sixth, and eighth syllables. Rhythm, in contrast, is where stress actually falls in a given line of verse. In the Russian tradition, stress may fall on any strong position but is obligatory only on the last, and it can never fall on a weak position. As a result, a given meter of Russian poetry admits only a finite number of allowable rhythms. Mathematical versification investigates both the “typical” rhythm of a particular meter and notable rhythmic patterns in a given work. The ultimate aim of this research is to gain insight into the relationship between form and meaning.
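As a rough illustration of the counting rule just described, the sketch below enumerates the rhythms it permits for a line of iambic tetrameter. It takes the rule exactly as stated in this essay (the last strong position is always stressed, every other strong position is optional, weak positions are never stressed). The function name and the `x`/`X` notation are invented for the example and do not come from the versification literature.

```python
from itertools import product

def allowable_rhythms(feet=4):
    """Enumerate stress patterns for an iambic line with `feet` feet.

    Assumed rule (as stated in the essay): strong positions are the
    even-numbered syllables; the final one must be stressed, the others
    may or may not be, and weak positions are never stressed.
    'X' marks a stressed syllable, 'x' an unstressed one.
    """
    rhythms = []
    # Each non-final strong position is either stressed or not.
    for choice in product([True, False], repeat=feet - 1):
        stressed = list(choice) + [True]  # final strong position is obligatory
        rhythms.append("".join("x" + ("X" if s else "x") for s in stressed))
    return rhythms

rhythms = allowable_rhythms()
for r in rhythms:
    print(r)
```

Under this rule the three optional strong positions give 2 × 2 × 2 = 8 theoretical patterns for the tetrameter. This is only the upper bound the rule allows; how often each pattern actually occurs in a poem is the empirical question the discipline studies.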

Kolmogorov’s work in versification was the foundation for my first piece of independent work at Princeton, a junior paper (JP) I wrote with Professor Michael Wachtel as my adviser. I was fortunate to have had the opportunity to work with Professor Wachtel during my time at Princeton; he is not only incredibly knowledgeable about poetry, but also exceptionally well versed in the study of versification. Understanding Kolmogorov’s work was certainly difficult at times; I was forced to do research in a language that was still very foreign to me, and many of Kolmogorov’s ideas were applications of statistics that, at first glance, were highly unorthodox. Still, the JP allowed me to see how one very intelligent researcher approached the problem of mixing poetic analysis and mathematics, and with the constant support of Professor Wachtel, I acquired not only an understanding of how Kolmogorov viewed poetry, but also my own understanding of how poetry functions and how one should analyze form in the context of the work.

Thus, I based my thesis on an almost ideal foundation—a command of versification developed by an amazing adviser and a JP in the discipline. On one hand, I needed only to choose a thesis topic in which I could make an original contribution to the literature, hopefully contributing to a broader discussion by applying my mathematical background; at the same time, the topic could not be so broad that I would not be able to do it justice in only a year. I finally decided upon my thesis topic when my adviser proposed attacking an open question in the literature on one of the most significant works in the Russian tradition, Aleksandr Pushkin’s Eugene Onegin. In short, the work portrays a young Russian dandy who, upon inheriting his uncle’s landed estate, moves to the country; the plot revolves around his friendship with a young poet and nobleman, Vladimir Lensky, and his acquaintance with a young woman, Tatiana. What particularly interested me about Eugene Onegin, however, was its unique form; called a “novel in verse” by Pushkin, the entire work is composed of more than 350 14-line stanzas, each with the same rhyme scheme. The aim of my thesis was to look at how the syntax of this stanza—that is, how each 14-line stanza was divided into syntactic units—varied across all of Eugene Onegin, and posit the role that this syntactic variability played in the semantics of the work. In this sense, there were two clear questions that my thesis set out to address. First of all, what should the syntax of an Onegin stanza look like? And second of all, how can we better understand Eugene Onegin through its syntax?

To my mind, the first question was essentially statistical: I needed only a way to represent the syntax of a stanza quantitatively, and then I could use basic statistics to get a sense of the “average” stanza. This procedure stood in contrast with much of the other literature on the subject, which asserted a type of ideal structure without any rigorous justification. Representing the syntax of a stanza in a spreadsheet seemed difficult at first, but I simply decided to adapt the methodology of a previous researcher, essentially doing in statistical software what he had originally done by hand. The second half of the thesis, however, was more daunting; because I did not have any clear sense of what the syntax of the stanza would look like, there was no way of predicting what the bulk of my thesis would look like.
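One way to picture a quantitative representation of this kind is sketched below. It is an assumption made purely for illustration, not necessarily the encoding used in the thesis or by the earlier researcher: each stanza is recorded as the set of line numbers after which a syntactic break occurs, and the breaks are then averaged position by position. The function names and the toy input are hypothetical; the numbers are invented and are not data from Eugene Onegin.

```python
STANZA_LINES = 14  # every Onegin stanza has 14 lines

def encode_stanza(break_lines):
    """Return a 14-element 0/1 vector; 1 means a syntactic break after that line."""
    return [1 if n in break_lines else 0 for n in range(1, STANZA_LINES + 1)]

def break_frequencies(stanzas):
    """For each line position, the share of stanzas with a break after it."""
    vectors = [encode_stanza(s) for s in stanzas]
    return [sum(col) / len(col) for col in zip(*vectors)]

# Invented toy input (NOT real Onegin data): each set lists the lines
# after which a hypothetical annotator marked a syntactic break.
toy_stanzas = [{4, 8, 12, 14}, {4, 9, 14}, {6, 8, 14}, {4, 8, 14}]

freqs = break_frequencies(toy_stanzas)
for line_no, f in enumerate(freqs, start=1):
    print(f"after line {line_no:2d}: {f:.2f}")
```

A profile like this gives one possible picture of an “average” stanza: positions where the frequency is high are where syntactic units tend to end, and individual stanzas can then be compared against that baseline.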

This made the thesis somewhat dangerous; I was approaching an issue that was relatively untouched in the literature. My particular statistical methodology was completely new; the second half of my thesis struck out into completely new territory, and it was based largely on the portrait of the stanza’s structure that I developed in the first half of my thesis. I was, without question, taking a risk, doing a great deal of work on my own. At the same time, the renown of Onegin made the thesis particularly nerve-wracking, as I was attempting to put together original research on a work that had been endlessly discussed to that point. In a sense, this thesis topic was akin to attempting to produce an original interpretation of Hamlet: everything imaginable had already been written on the subject, and if an approach had not been taken, it was likely flawed. Thus, even though I had a good sense of the literature on the subject, there was the nagging feeling that someone must have addressed this issue, if not exactly in the same way I planned to. Moreover, if no one had taken this perspective, my approach ran the risk of simply being incorrect.

Neither of these fears proved to be legitimate. Striking out into new territory might have proved stressful at first, but it gave me the opportunity to take ownership of the ideas I was advancing. Just as the thesis itself somehow reflected my background, I identified with its reasoning and conclusions. As such, I invested a great deal of myself in my thesis, and as a result, I enjoyed the process much more than I might have imagined my freshman year. At the same time, my adviser was an incredible resource during the entire process. Professor Wachtel not only has a tremendous knowledge of Russian poetry, but he was consistently available throughout the process to discuss ideas and read drafts. Generally, advisers can be a resource that makes the thesis process much more manageable.

It was also a blessing that the bulk of my thesis work consisted of reading Eugene Onegin. Because it is one of the most significant works in the Russian language, reading it as a non-native speaker felt like an accomplishment. Moreover, the work is simply beautiful, and I suspect there will be few times in my life when sitting down to read 30 pages of poetry can be considered an afternoon’s worth of work. It was helpful, too, to keep in mind that the end result of the process, a thesis of almost one hundred pages, was not its only aim. Regardless of whether my thesis had any scholarly value, it required me to think and work independently more than at any other point in my Princeton career. One of the realizations of the process is that these skills are ends in themselves.

What ultimately surprised me most about the thesis process was how smoothly it went. I was able to put together a thesis that was recognized as a critical addition to the scholarly literature, both in developing a statistical model of the Onegin stanza and in at least starting a discussion on the role that syntax plays in the work. Moreover, it did not take away all of my free time, and my friendships did not suffer. Really, it’s nothing to be afraid of: with a bit of luck and a good amount of work, the thesis proved to be not only surmountable, but also an unforgettable part of my time at Princeton.

Michael A. Wachtel

Professor of Slavic Languages and Literatures

“Andrew showed how statistics and poetics can be mutually illuminating.”

What I like most about thesis advising is that students are forced to take on broad topics that couldn’t be covered elsewhere. The greatest reward of advising theses is when a student comes up with something that you yourself had not known and could not have figured out on your own.

In my particular specialization (Russian poetry), it’s rare to find a student with sufficient linguistic skill to take on a really ambitious topic. Though Andrew started Russian late (the summer after his sophomore year), he made incredible progress. Moreover, he was intellectually prepared for advanced work. Part of this was natural inclination (he has an enviable combination of curiosity, fearlessness, and industriousness); part of it was preparation, as he had done his junior independent work with me and taken my graduate seminar on poetics.

Before deciding on Slavic, Andrew intended to major in math. As it happens, Russia has a rich tradition of mathematical study of metrics. This was begun by the great Symbolist poet and prose writer Andrei Belyi, and it continued in the Soviet period under Andrei Kolmogorov, one of the most brilliant mathematicians of the 20th century, who applied probability theory to questions of verse. In the United States, there are few scholars who work on this subject, but in Russia it continues to thrive.

Andrew took a problem that has been haunting the field of Russian poetry for at least a century: the function of the “Onegin stanza,” a 14-line verse form devised by Aleksandr Pushkin for his “novel in verse” Eugene Onegin, the cornerstone of the Russian literary tradition. The basic question always has been how and why it works so well.

Andrew read through the fundamental works on this subject, written by the most formidable scholars of Russian poetry. (These essays are all written in Russian, so this in itself was no easy task for an undergraduate.) He then used his computer skills to subject Pushkin’s novel to a rigorous syntactic analysis. The main question could be summarized as follows: To what extent does the fixed rhyme scheme determine the syntax? The sonnet, for example, has a very firm tripartite structure, which breaks down according to the rhyme scheme. The Onegin stanza, as Andrew proved, does not. Once one recognizes this freedom (and many scholars have not), one begins to understand the creative possibilities that this affords the poet. This is what Andrew explored in the thesis.

Andrew’s discoveries confirmed some of what was known, but they also extended our understanding of the form. Most significantly, he went beyond numbers, showing how the statistics help us interpret the novel. In short, Andrew showed how statistics and poetics can be mutually illuminating.

I myself had worked on the Onegin stanza, and I was the one who suggested to Andrew that he take this on in his thesis. (In early September, he was planning to study Andrei Belyi’s poetry.) I was delighted to see how he took to the subject. Not only did I learn from his work; I’m convinced that others will benefit as well, and I have encouraged him to submit parts of it for publication.

As far as advice for future seniors: The key is, of course, to start early. Though Andrew only decided on this topic in the fall, he went at it with complete energy from the beginning. He met all departmental deadlines for chapters; in fact, there was no need for me to remind him to do so. He worked independently, but he came by frequently whenever he had a question. Hence I was able to be a sounding board when he encountered problems, and I could point him toward new secondary literature when his work went in unanticipated directions. But let me emphasize that this was truly Andrew’s thesis: my contribution was minimal.