If you are of a generation of Germanists who were trained after the Second World War and who became fascinated with Broch's TDV as an epic pinnacle, the order of the 200 words below will stimulate an amazing aesthetic/intellectual cascade effect. For the rest of you - who have not struggled with TDV - I am afraid this whole example will not mean much; however, a few words of explanation may be in order. It is precisely the linguistic difficulty that obscures a tremendous promise - in the sense of potential. TDV promises no less than a first-hand account of the German Götterdämmerung of the thirties and forties from a lofty neoclassical perspective, but a neoclassicism that understands the proletariat, that understands power and the "state" and that understands human transcendence.
On one level there is aesthetic transcendence - obviously - but there is also a mystical transcendence of life on the edge of death. [Vergil expires within hours of his arrival in Brundisium and after one final interview with his and all Romans' political and cultural master - Augustus Caesar, the physical incarnation and embodiment of all spirit that emanates from the Mediterranean.] It is the transcendence through death in Book 4. It is the actual death of Broch, and the consequent lack of further explanations, that has added to the mystery and fascination.
Every encounter with TDV has a mysterious dimension - several hours with the book and a brandy - an hour reading passages to a friend - weeks spent trying to nail down chronologies or space relationships, not to speak of days spent figuring out what others may have thought.
Through all that - the tantalizingly impenetrable fog of Broch's language defeats all systematic analysis - that is - until you see the list below for the first time. Poof goes the fog. Suddenly, all sorts of questions can be asked...
|
5296 und 4809 der 4292 die 2651 das 2324 es 2295 in 1936 war 1847 des 1786 zu 1779 er 1592 sich 1544 nicht 1543 den 1443 dem 1324 sie 1138 von 1102 ist 940 ein 928 als 927 du 860 mit 810 ich 806 im 785 noch 781 so 780 daß 719 wie 683 auch 594 auf |
593 hatte 562 um 561 an 559 mehr 546 aus 522 zur 514 ihm 484 da 471 zum 470 für 468 sein 457 nur 401 nichts 393 oh 386 doch 383 eine 361 ihn 350 wird 348 aber 336 vor 333 selber 327 wurde 321 werden 317 über 311 dies 309 ihr 297 immer 294 denn 285 wieder |
279 nun 275 ja 270 vom 269 dir 267 oder 264 wir 254 einem 242 ihrer 242 alles 240 was 237 ins 236 sehr 236 ohne 235 waren 235 einer 234 wenn 233 seiner 232 dich 231 durch 230 nach 220 selbst 217 gewesen 216 hat 215 seine 214 mir 203 augustus 200 zeit 196 schon 190 vergil |
185 wirklichkeit 180 sind 179 mich 174 menschen 174 ihre 169 weil 167 hier 167 alle 166 man 165 hast 163 erkenntnis 156 wäre 154 plotius 152 kein 151 nacht 150 wissen 149 unter 149 dort 148 dein 146 keine 146 hätte 146 cäsar 145 stimme 143 mein 142 lucius 141 all 139 leben 139 haben 139 am |
138 uns 138 ihnen 137 dieses 137 damit 137 bis 136 jetzt 135 diese 135 dennoch 135 bloß 134 trotzdem 133 dieser 132 seinem 131 ob 131 geworden 131 eines 130 habe 130 etwas 129 zwischen 126 einen 122 bist 121 irdischen 120 dann 119 sogar 114 dessen 113 muß 113 kaum 112 schönheit 111 wohl 110 niemals |
110 aller 109 tod 109 einmal 108 will 108 liebe 107 plotia 107 allein 105 weiter 105 nein 105 mußte 103 schicksal 103 deine 103 also 102 vielleicht 102 seinen 102 sagte 100 sprache 100 knabe 100 bei |
[NOTE: Hermann Broch, Der Tod des Vergil, (Rhein Verlag: Zurich) 1952 and Hermann Broch, The Death of Vergil, (Pantheon Books: New York) 1945. The 1945 English translation and the 1952 German edition were put into electronic form by means of a Kurzweil Data Entry Machine at the Duke University Humanities Computing Facility.]
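A sorted frequency list of the kind shown above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program actually used at Duke: we assume that a "word" is any unbroken run of letters, that case is ignored (the list above is all lowercase), and that ties are broken in descending alphabetical order, which is what the list above appears to do. The name `frequency_list` is our own.

```python
import re
from collections import Counter

def frequency_list(text, minimum=1):
    """Return (total words, unique words, [(count, word), ...]) for a text.

    Assumption: a word is a run of letters, compared in lower case.
    """
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    # Highest count first; ties fall in descending alphabetical order,
    # as they appear to do in the printed list.
    rows = sorted(
        ((count, word) for word, count in counts.items() if count >= minimum),
        reverse=True,
    )
    return len(words), len(counts), rows

# Invented sample sentence, not a quotation from TDV.
total, unique, rows = frequency_list("Und die Nacht und der Tod und die Zeit")
print(total, unique)
for count, word in rows:
    print(count, word)
```

Calling `frequency_list(text, minimum=100)` on a full text would cut the list off at a frequency of 100, as the table above does.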
A glance at the alphabetical wordlist as well as the sorted frequency list can yield tremendous insight into a text. For example, there are 145,822 words in the original German edition; of those, 21,572 are unique. To rephrase that, there are 21,572 different words in the text; in the table above you see those with frequencies of 100 or more. Figure 2, following, shows the frequencies for the English translation, which contains 165,491 words, of which 13,332 are unique. Even a cursory comparison of the two lists yields amazing insights. There are roughly 20,000 more words in the English translation than in the original. Yet there are many more (8,000+) unique German words than English words.
This is best seen in tabular format:
| Text | Total words | Unique words |
| German (TDV) | 145,822 | 21,572 |
| English (DOV) | 165,491 | 13,332 |
| | BK 1 | BK 2 | BK 3 | BK 4 |
| Total German | b1 | b2 | b3 | b4 |
| Unique German | b1 | b2 | b3 | b4 |
| Total English | b1 | b2 | b3 | b4 |
| Unique English | b1 | b2 | b3 | b4 |
This seems to indicate that German has many more compound forms and many more inflected forms than English. However, the counts for the coordinating conjunctions "und" and "and" are 5296 and 5339 respectively. Given the wide disparity in the total number of words and in the number of unique words, a difference of only 43 for a coordinating conjunction seems puzzling. A closer look at the breakdown of chapter frequencies shows that the numbers stay very close in each chapter. Further investigation leads to preliminary conclusions about the use of and-pairs to anchor a translation.
| | BK 1 | BK 2 | BK 3 | BK 4 |
| und (TDV) | b1 | b2 | b3 | b4 |
| and (DOV) | b1 | b2 | b3 | b4 |
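The whole-text figures quoted above can be put on a common scale with a little arithmetic. The sketch below is our own illustration and uses only numbers given in the text: it computes the share of unique words in each text and the rate of "und" and "and" per thousand words. The function names are hypothetical, and the ratios are one possible way of reading the figures, not a result reported here.

```python
# Whole-text figures as quoted in the discussion above.
TEXTS = {
    "German (TDV)": {"total": 145_822, "unique": 21_572, "word": "und", "count": 5_296},
    "English (DOV)": {"total": 165_491, "unique": 13_332, "word": "and", "count": 5_339},
}

def unique_share(unique, total):
    """Unique words as a fraction of all words."""
    return unique / total

def per_thousand(count, total):
    """Occurrences of one word per 1,000 words of running text."""
    return 1000 * count / total

for name, t in TEXTS.items():
    print(
        f"{name}: {unique_share(t['unique'], t['total']):.1%} unique, "
        f"'{t['word']}' {per_thousand(t['count'], t['total']):.1f} per 1,000 words"
    )
```

Seen this way, the absolute counts of the two conjunctions are nearly identical even though the two texts differ by almost 20,000 words in length, which is what makes the near match puzzling.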
|
13231 the 6588 of 5339 and 4397 to 3165 in 2706 a 2523 it 2484 was 2048 that 1786 he 1466 as 1430 had 1358 for 1274 you 1239 his 1223 not 1220 with 1141 which 1128 this 1071 by 1058 is 1056 be 1033 from 868 into 841 ? 803 i 796 all 790 on 755 no |
747 but 726 its 714 him 709 one 653 have 619 so 564 there 557 were 551 more 543 been 540 at 537 an 532 even 522 they 497 only 496 their 447 who 438 oh 434 your 432 are 415 or 408 out 392 them 391 if 383 time 374 now 371 again 355 itself 352 still |
348 yet 320 could 318 my 315 nothing 311 like 308 me 300 would 298 without 297 will 290 though 284 what 280 death 274 over 263 reality 260 through 256 must 255 himself 254 up 253 we 252 being 246 than 246 back 236 human 232 life 232 because 225 own 223 other 222 did 220 do |
218 has 214 light 213 night 211 perception 207 very 197 augustus 196 beyond 195 too 184 longer 183 voice 182 virgil 181 every 179 within 179 just 178 here 178 earthly 176 once 174 come 173 when 173 became 170 also 169 boy 168 toward 165 then 165 before 163 came 162 gods 161 might 161 her |
158 man 156 well 156 should 156 never 156 become 155 such 152 yes 152 plotius 152 knowledge 150 world 150 people 148 these 148 caesar 148 beauty 146 any 146 almost 146 after 144 fate 141 lucius 140 most 140 down 139 way 137 work 137 order 136 said 136 about 132 last 132 first 132 dream |
132 between 131 where 131 may 131 love 131 always 130 able 128 off 127 state 127 memory 127 how 127 both 126 she 124 our 124 hand 124 creation 121 truth 121 existence 121 earth 121 breath 121 art 120 slave 119 same 115 soul 115 name 115 already 114 while 114 us 114 something 114 away |
114 although 113 remained 112 upon 112 am 111 everything 108 plotia 108 nor 108 darkness 108 above 105 shall 104 seemed 104 know 104 held 103 however 103 far 102 past 102 face 101 symbol 101 new 101 great 101 each 100 much |
A quick glance at the whole wordlist shows some predictable but also some surprising items. For example, there are 74 instances of grenze. Some quick detective work shows that Günter Grass' Die Blechtrommel [NOTE: Günter Grass, Die Blechtrommel, (xxx:XXX, dddd).], a work roughly contemporary with Broch's Death of Vergil, has only three references. Without any idea of proving anything, or even of knowing what this particular feature might mean, let us investigate some of the grammatical variants and compounds. A search for the string "grenz" surrounded by wild cards, "*grenz*", will yield the following list:
|
2 begrenzt 2 begrenzte 4 begrenzten 2 begrenztheit 1 begrenzung 1 gewölbegrenzen 74 grenze 31 grenzen 6 grenzenlos 2 grenzenlose 6 grenzenlosen 1 grenzenloser 5 grenzenlosigkeit 1 grenzentrückten 1 grenzerkenntnis 1 grenzfernen 1 grenzgleichgewicht 2 grenzjenseitigkeit 1 grenznachbarn 2 grenzraum 1 grenzraumes 1 grenzspiel 1 grenzt |
1 grenzüberschreitung 1 grenzüberwachsend 1 grenzumschlossen 1 grenzzustand 1 kippgrenze 1 klarheitsgrenzen 1 nachtgrenzen 1 raumesgrenze 1 raumesgrenzen 2 reichsgrenzen 1 sphärengrenze 1 sphärengrenzen 3 traumgrenze 4 unbegrenzt 1 unbegrenztheit 1 unendlichkeitsgrenze 1 unwirklichkeitsgrenzen 1 waldgrenze 1 weltengrenzen 1 wendegrenze 1 wirklichkeitsgrenze 2 zeitengrenze 1 zeitgrenze |
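A wild card search of this kind can be sketched against a wordlist as follows. This is a minimal illustration, assuming the wordlist is held as a Python dictionary of word and count; it uses the standard `fnmatch` module, in which `*` stands for any run of characters. The function name `wildcard_search` is our own, and the sample holds only a handful of entries copied from the lists above.

```python
from fnmatch import fnmatchcase

def wildcard_search(frequencies, pattern):
    """Return (word, count) pairs whose word matches the pattern, alphabetically."""
    return sorted(
        (word, count)
        for word, count in frequencies.items()
        if fnmatchcase(word, pattern)
    )

# A few entries taken from the lists above.
sample = {
    "begrenzt": 2, "grenze": 74, "grenzen": 31, "traumgrenze": 3,
    "unbegrenzt": 4, "nacht": 151, "zeit": 200,
}

for word, count in wildcard_search(sample, "*grenz*"):
    print(count, word)
```

Dropping the leading wild card ("grenz*") would restrict the search to words that begin with the string, leaving out compounds such as traumgrenze.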
We shall consider the implications of such data for a quite specific and detailed interpretation of Tod des Vergil in the theoretical section of this presentation. At present we would like only to clear some intellectual space, to create a positive climate for this sort of computer magic.
The pulley forever changed the method of erecting buildings; the gun forever changed warfare. More subtle but equally far-reaching in their effects on civilization were the opportunities in media management created by the pencil and eraser or the loose-leaf binder. Perhaps electronic indexing of texts will have an analogous influence on the way we interpret and analyze texts. Today we can say with some confidence that the computer will be involved in all phases of life, especially the intellectual life, and not just confined to a technological, scientific-engineering sphere.
But let us give the concept of a "tool" in the study of literature a historical dimension within the debates of the time. For example, the tools of the early 19th century philologists, paleography and Indo-European linguistics, must have seemed bizarre to the majority of conservative, rationalist teachers of the classics of that time. Similarly, today the tools of "Ideologiekritik," the explication of cultural bias,
[NOTE: Or perhaps one should speak of explicating texts of various periods in terms of cultural and political determinants. See: Christa Bürger, Textanalyse als Ideologiekritik. Zur Rezeption zeitgenössischer Unterhaltungsliteratur (Frankfurt a.M.: Athenäum, 1973). Bürger is an early practitioner of "Ideologiekritik" who traces her roots to the discussion of "Kulturindustrie" by Adorno and Horkheimer in the sixties, p. 55. The focus on "popular literature" (read: Trivialliteratur, as the genre was generally recognized at the time before the contemporary canon crisis) is logical since one is not studying "eternal aesthetic or moral values" whispered into human consciousness by muses, but the products of a semi-industrialized production process.]
seem bizarre to the practitioners of close readings and the discovery of aesthetic values. There is a natural tension of competing perspectives that seems to divide scholars along ideological lines. We hope that the implementation of "electronic tools" will not be seen as a last chance by some faction to defend the discipline from the evil of the "word-counting devils" who are chipping away at their view of the semiotic process. Rather we hope that electronic tools will be embraced by all factions as an opportunity to make more precise the gathering of data to support arguments, be they historical, aesthetic or gender-based.
Thus we would like to carve out for the computer and the indexing of literary texts a role more fundamental than most analysis paradigms would claim. Full text research has a "pre-analysis" data gathering function. It is possible that many competing and succeeding analysis paradigms would ALL use computers and indexed texts. These arguments can be considered an internal question, to be addressed by literary critics of various stripes. We will try to focus on describing the actual "tool" and not be sidetracked by what we may want to prove by the tool. The actual tool, however, the computer and the theories that make it work, are quite external questions that demand attention to other fields of study.
We will develop two points in our discussion of the relevance of computing to literature, texts and literary scholars:
DOS word processing has been widely recognized as a "Trojan horse" to bring computers to non-technological areas. However, we would like to supplement this commonplace by adding a corollary - familiarity or virtuosity with the DOS word processing interface puts the capability to do powerful and sophisticated computing into the hands of relative computer neophytes. Often the humanists who master word processing think no more of that skill than they do about mastering typewriting. This is a vast underestimation of the potential for computer-aided research within easy grasp.
[NOTE: The only prerequisite for this type of work is to expand the range of activity of a user of word processing. A computer literate humanist already has a word processor and is familiar with the concept of "text," i.e. word processor files, on-line. The only additional concepts are to replace the word processor with an indexing program and to replace the word processing files with the "full" text of the primary works to be studied.]
The capacity to handle "strings" (defined as a sequence of letters - a word or a phrase) is a very fundamental and easy function for a computer. "Indexing" strings puts words into an alphabetical order within the computer that allows rapid access to all occurrences of the words. Thus, for example, the word "butterfly" occurs 14 times in Lord Jim. In the index entry for the string "butterfly," fourteen pointers indicate the location of the fourteen references in the text in the memory of the computer. At the touch of a single key, [Return], all fourteen references can be pointed to and retrieved without any elaborate search in a time frame of milliseconds. Thus even inexpensive ($1,000) and computationally slow microcomputers can do very rapid retrieval of strings.
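The idea of an index of pointers can be sketched as follows. This is a minimal illustration and not a description of any particular indexing program: we assume a word is a run of letters, that case is ignored, and that a "pointer" is simply the character position at which the word begins. The names `build_index` and `contexts` are our own, and the sample sentence is invented rather than quoted from Lord Jim.

```python
import re
from collections import defaultdict

def build_index(text):
    """Map each lower-cased word to the list of positions where it starts."""
    index = defaultdict(list)
    for match in re.finditer(r"[^\W\d_]+", text):
        index[match.group().lower()].append(match.start())
    return dict(index)

def contexts(text, index, word, width=15):
    """Retrieve each occurrence of a word with a little surrounding text."""
    found = []
    for position in index.get(word.lower(), []):
        start = max(0, position - width)
        end = min(len(text), position + len(word) + width)
        found.append(text[start:end])
    return found

# Invented sample text.
text = "The butterfly rose. He watched the butterfly settle."
index = build_index(text)

print(sorted(index))          # the indexed words in alphabetical order
print(index["butterfly"])     # the pointers for one word
for line in contexts(text, index, "butterfly"):
    print(line)
```

The text is scanned only once, when the index is built. After that, retrieving every occurrence of a word is a single lookup followed by one jump per pointer, which is why even a slow machine can answer such a query almost at once.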
From the field of linguistics we shall appropriate semantics, since defining semantic categories is an important prerequisite to deriving useful information from a full text data base. In addition, we will appropriate the notion of syntactic structures to disambiguate strings into the appropriate semantic category and to achieve a "lemmatized" text in which the retrieval of a "root" form will bring out all inflected, regular and irregular forms.
[NOTE: For an introduction to some of the questions, discussions and vocabulary of this area of computational linguistics see: Graeme Hirst, Semantic Interpretation and the resolution of ambiguity (Cambridge: Cambridge University Press, 1987). The book treats both lexical and structural disambiguation, gives a summary of representative projects and sketches the direction of the discipline.]
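What retrieval by "root" form means in practice can be sketched with a small lookup table. The table below is a hypothetical, hand-made fragment that we supply purely for illustration; the counts are taken from the German frequency list above. A real lemmatizer would have to derive such a table, and resolve ambiguous forms, by the kind of syntactic analysis just described.

```python
# Hypothetical lemma table: inflected form -> root form.
# "sein" itself is deliberately left out: the surface form may be the
# infinitive or the possessive, and a lookup table alone cannot tell which.
LEMMAS = {
    "war": "sein", "waren": "sein", "gewesen": "sein",
    "ist": "sein", "sind": "sein", "bist": "sein",
}

# Counts copied from the German frequency list above.
FREQUENCIES = {
    "war": 1936, "ist": 1102, "sein": 468, "waren": 235,
    "gewesen": 217, "sind": 180, "bist": 122, "nacht": 151,
}

def forms_of(lemma, lemmas):
    """All inflected forms recorded for a root form, alphabetically."""
    return sorted(form for form, root in lemmas.items() if root == lemma)

def lemma_count(lemma, frequencies, lemmas):
    """Sum the frequencies of every recorded form of a root form."""
    return sum(frequencies.get(form, 0) for form in forms_of(lemma, lemmas))

print(forms_of("sein", LEMMAS))
print(lemma_count("sein", FREQUENCIES, LEMMAS))
```

The 468 occurrences of the string "sein" are left uncounted here because some are the verb and some the possessive. Sorting them into the right category is exactly the disambiguation problem that a plain index of strings cannot solve.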
We shall have more to say in the body of the book about the relationship between linguistics (computational and otherwise) and literary studies, which should be exploited for the creation of parsing tools for humanists so that the historical and aesthetic side of language is not lost entirely in the quest for the universal phrase structure grammar.