What is a "Tool" in Textual Studies?
Prerequisites

In the area of text preparation, we know what a tool is: it is a pen, a pencil, a typewriter, a word processor. The hand, in the case of a pencil, or the fingertips, in the case of a word processor, manipulate the tool in order to dip into the stream of the writer's thoughts and to affix them to paper so others can read them. But alas, that is not the whole story of computers and text any more than the Stanley Steamer was the whole story of vehicular locomotion.

In the discussion below we will introduce several electronic tools that were designed to facilitate the study of texts. Our goal is to present information for all sorts of humanists: literary scholars, philologists, students of particular philosophical traditions, medievalists, and colleagues in other departments who study textual monuments and who have been exposed to microcomputers via word processing. We would like to offer very practical pointers for textual scholars who would presume to use their microcomputers for more than manuscript preparation.

Our chief focus will be the indexing of texts and the retrieving of patterns of words and phrases in the text. These patterns, of course, can be displayed in 3-D graphics to give a statistical picture of the text structures. All this sounds, prima facie, like a criminal trespass in the area of computer science and the area of computational linguistics, and not something a respectable textual scholar would do. But here it is: humanities types have learned word processing, have gotten to be very good at word processing, and now they may dare to use their glorified typewriters to do what professional programmers, engineers and linguists have been doing for years.

We hope to show that this perception of trespassing dilettantishly in important fields of science is not really valid. We hope to show that these efforts pose no real threat to established scientific disciplines; nor is there great danger of grievous errors creeping into literary research or into the continental hermeneutic discussion since Husserl. Quite to the contrary, "Electronic Text Research" or perhaps "Text Computing" arises more from the commercial availability of electronic productivity aids than from an articulated technological (and thus weirdly dangerous) epistemology. Of course we shall flirt with danger when we try to focus - by way of digression - on the question of whether a microscope, for example, should be considered a labor-saving device, a productivity tool; or whether, indeed, it revolutionized the study of biology.

For the sake of argument, imagine a researcher trying to convince a biologist of the era before Leeuwenhoek (1632-1723) that the entire field would be revolutionized through the application of the microscope. "Fie'on't - who would study nature by studying motes through a glass." Yet Leeuwenhoek is a classic case of the application of a tool in the advancement of science. To strengthen the analogy to present-day text studies in all disciplines, however, it should be said that seventeenth-century biology represents more precisely a struggle to attain a scientific perspective than the advance of an articulated scientific method.

Leeuwenhoek did not have a grand theory of how the single-lens microscope would revolutionize the study of nature; he did not even have a suspicion that the study of minute life forms would lead him to discover life cycles quite analogous to those of life forms observable with the naked eye. However, he was quite clear on the concept that careful observation, careful documentation, and the avoidance of premature conclusions would yield insight into minute structures in the human body and in plants and animals in general. He was the first to describe the capillary system, red corpuscles and spermatozoa. He also drew the first bacteria and discovered the life cycle of weevils. While Leeuwenhoek was obviously not able to synthesize everything he saw under the glass, he was able to lay to rest contemporary theories that weevils spontaneously generated from corruption and that fleas generated from dust or sand.

Today, with our knowledge explosion, we are presented with a bewildering mass of descriptive analyses. It is not that this ever-growing bibliography cannot be typed and organized; the problem is rather that diametrically opposed descriptions of the same artifact have to coexist. Perhaps something can be both the sun and the moon, black and white, liberating and oppressive, great and beneath contempt. I often wonder if we are not as ignorant about texts today as Leeuwenhoek's contemporaries were about fleas. This is not to say that Leeuwenhoek's contemporaries did not have theories about fleas, think a lot about fleas, pray to be rid of fleas and write learned treatises about fleas; it is just that they did not have the tools to observe closely the life cycle of fleas.

By the same token, we have theories about literature and we write learned treatises. However, our main tool of observation is still only the memory of the lived experience while reading. Our most widely used aid to analysis is writing notes in the margin. We are surrounded by literature, both primary and secondary, much as Europeans were by fleas in the 17th century, yet for all our theories, arguments and publications, we cannot say with any degree of certainty whether literature is a mechanism for oppression or liberation. Nor have we declared the question moot - rather we still delight in destroying models not our own and defending our models from attack by merciless enemies bent on our very destruction. It is one thing to have contentious debate about meaning - or to try out new and radical ideas. It is something else to be fundamentally confused about the material being studied. One could argue, from a radical skepticism, that even our most advanced science ages at such a rate that today's efforts might seem risible in a few years. In the field of "Textual Studies," taken comprehensively, I sense a pre-scientific epistemology. One cannot describe what one cannot see. We are surrounded by texts morning, noon and night - something has been affixed to the page and thus can be transmitted/shared/preserved - for a time. Much human activity revolves around text; much time is spent learning how to deal with texts. Much good and evil is done in reference to texts. Yet, by comparison to what we know of the purpose and nature of texts, Johann Fischart's (1546-1590) Floehasz seems a systematic taxonomy.

Alas, should we claim that computer indexing applied to texts would take us out of critical doldrums equal to the scientific doldrums of biology before the microscope? Well, no: the humility of a reasoned position precludes any hubris of solving everybody's persistent problems. But the implications are clear: computer indexing will affect our study of texts and language as the microscope affected etc. etc... And why not? Four generations ago people could not imagine a car. Twenty years ago, people thought computers would always cost millions of dollars. Ten years ago 640 KB of RAM was thought to be more than anyone would ever want. Today there are some 100 million microcomputers all over the world. That should make a difference in any number of fields, the study of texts not excepted.

However that may be, we would like to stress that the software industry is really driving this development. There is no clear movement by the leaders in professional literary scholarship or in literary criticism or in philosophical scholarship to embrace indexing tools. Text preparation tools, however, have become universally accepted. The few researchers in the area of indexing studies lead a precarious existence on the periphery of the discipline. In the literary arena, discussions of computer techniques are limited to a few special sessions at the Modern Language Association. Conferences that are dedicated to computers and the humanities generally attract fewer than five hundred practitioners. In the area of linguistics both the theory and the computational skills are continually increasing; yet the work of bringing both theory and skills to bear in the humanities arena is not generally seen as part of the mission of scientific linguistics. However, our task here is not to indict the majority for myopia; succeeding generations of researchers usually reserve that right. As a first step, it is important to describe research techniques that are increasingly available to scholars with desktop computers.


Indexing as a "Tool"

The chief computational concept which underlies this presentation is the indexing of text files and the retrieving of information from the indexed text in an interface vehicle that is very similar to a word processor. This dense formulation requires some further elucidation. There are three concepts involved here: (1) index, (2) retrieval and (3) interface. The indexing of a text is the crucial first step on the way to quick retrieval of information from that text. Indexing is a computer procedure that makes the more familiar searching of a text in a word processor seem slow and cumbersome. Any word processing novice knows that searching through a text to find specific words takes a long time by computer standards, often one or more seconds for each word searched in a long file. The reason for this is that word processing text is generally not indexed and the computer has to examine each word in a linear search. If a text has been indexed, ALL references to a word, for example, "hath" in the 37 plays of Shakespeare, can be retrieved in a second. The speed owes little to the power of a specific computer, and it depends only weakly on the size of a text (5 MB for the 37 plays of Shakespeare); the speed is chiefly a function of indexing. The cost of reading through the whole text is paid once, when the index is built. Once the indexing is done, the retrieval is practically instantaneous.
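The contrast between a linear search and an indexed lookup can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the design of any particular indexing package: the function names (tokenize, build_index, linear_search) and the three sample lines are invented for the example and are not quotations from Shakespeare.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_index(lines):
    """Read the whole text once and record, for every word,
    the (line number, word position) of each occurrence."""
    index = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        for pos, word in enumerate(tokenize(line)):
            index[word].append((line_no, pos))
    return index

def linear_search(lines, word):
    """What an unindexed search must do: examine every word
    of the text again for every query."""
    hits = []
    for line_no, line in enumerate(lines, start=1):
        for pos, token in enumerate(tokenize(line)):
            if token == word:
                hits.append((line_no, pos))
    return hits

# Invented sample lines (hypothetical, for illustration only).
sample = [
    "The king hath a crown",
    "The fool hath a song",
    "The queen has a crown",
]

index = build_index(sample)          # the one-time cost of indexing
indexed_hits = index["hath"]         # a single dictionary lookup
scanned_hits = linear_search(sample, "hath")  # a full pass over the text
```

Both approaches find the same occurrences; the difference is that the indexed lookup does not reread the text. On three lines the gain is invisible, but on a corpus the size of the complete plays the lookup stays a single step while the scan grows with the length of the text.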

By retrieval we mean that the computer will dip into its index of the text and in less than a second put the various citations onto a screen. The retrieval will generally allow the display of words, lists of words, phrases, and various combinations based on Boolean operators (AND, OR, NOT).
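Retrieval with Boolean operators amounts to combining the sets of locations that the index holds for each word. The following Python sketch shows one simple way this might be done; the function names (build_line_index, find_all, find_any, find_without) and the sample lines are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict

def build_line_index(lines):
    """Map each word to the set of line numbers on which it occurs."""
    index = defaultdict(set)
    for line_no, line in enumerate(lines, start=1):
        for word in re.findall(r"[a-z']+", line.lower()):
            index[word].add(line_no)
    return index

def find_all(index, *words):
    """AND: lines that contain every one of the words."""
    sets = [index.get(w, set()) for w in words]
    return set.intersection(*sets) if sets else set()

def find_any(index, *words):
    """OR: lines that contain at least one of the words."""
    return set().union(*(index.get(w, set()) for w in words))

def find_without(index, wanted, unwanted):
    """AND NOT: lines that contain `wanted` but not `unwanted`."""
    return index.get(wanted, set()) - index.get(unwanted, set())

# Invented sample lines (hypothetical, for illustration only).
sample = [
    "The king hath a crown",
    "The fool hath a song",
    "The queen has a crown",
    "The king sings a song",
]

index = build_line_index(sample)
king_and_crown = find_all(index, "king", "crown")       # lines with both words
hath_or_has = find_any(index, "hath", "has")            # lines with either word
king_not_crown = find_without(index, "king", "crown")   # king, but no crown
```

Each query touches only the entries for the words involved, never the text itself. Phrase searching would additionally need the word positions within each line, which this sketch leaves out.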

Now let us turn to the third concept, interface. The retrieval is achieved by means of an interface, which means that even someone who is not a computer expert will be able to achieve the retrieval. An interface is a series of screens that has been designed with a user in mind. A good interface, which can also be called a friendly interface, will allow a novice to intuit what keys to press to achieve a certain result. Although interfaces still have to be learned and some subtleties have to be mastered, the whole concept of an interface is quite an advancement in making computers easier to use. Before designers worked out friendly and intuitive interfaces, learning the command syntax of a particular system could be an arduous task requiring some professional training. We are developing a Web-based interface that combines the presentation of information with an open-ended research environment where complex data can be mined for as yet unimagined relationships.

The juncture of index, retrieval and friendly network interfaces is the gateway to the vast land of text processing beyond mere manuscript preparation. By analogy one could say that the juncture of inexpensive microcomputers and friendly text handling interfaces caused the widespread use of word processing.