Let us follow some of the threads with graphical output:
The graph below shows the themes sounded in the first sentence:
Let us pull out some of these lines so we can get specific juxtapositions.
For example, the graph below
focuses on Selbstbesinnung and Geisteswissenschaften:
The graph below focuses on Geisteswissenschaften and
Naturwissenschaften:
Thematically, the Naturwissenschaften - Geisteswissenschaften (nwiss -
gwiss) juxtaposition will probably yield many (as yet undiscovered)
patterns of statistical anomaly, which will in turn yield pointers to
(as yet undiscovered) thematic emphasis. For an immediate payoff we
can take the surprisingly regular profile (beginning-middle-end)
of the item Selbstbesinnung. By checking all lexical items in 1.I-1
and in 3-1 we should be able to find any similar profiles. It is, of
course, obvious that other "first sentences" of other units will also be
profiled.
Of course, these frequencies are "raw" - which means they are exactly the
number of times the word Geisteswissenschaft occurs in a
particular chapter - for example: zero times in II-1 and 65 times
in I-1 of the Erster Teil. We have no way of knowing whether zero or
65 is unexpectedly high, unexpectedly low, or just average for the text
as a whole.
In order to determine that we need to know what percentage of the text
I-1 and II-2 represent: 8% and 6% respectively. Since we know that the
chapters are roughly of similar size we would expect roughly equal
distribution of lexical items - that is unless there is a thematic
emphasis. So the question is: have we discovered a particular shift in
the argument, abandoning completely a theme sounded in 1.II-1 which dies
down only to resurface in the 2. Teil? And by the way - what's up
with all of the Selbstbesinnung at the beginning of 2. Teil
and at the end of 3. Teil? The graph is almost too neat.
Let us review: we must know the relative size of the individual sections
of a text. The formula for calculating that is:
total words of a particular chapter divided by total words of the whole work
This fraction represents the share of the whole work's words that falls
in a particular chapter; multiplied by 100, it is the percentage.
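The relative-size calculation can be sketched in a few lines of Python.
This is only an illustration: the function name chapter_fraction and the
word counts are invented here (the counts are chosen so that the result
matches the 8% quoted above for I-1) and are not taken from the text
being analyzed.

```python
def chapter_fraction(total_chapter: int, total_all_words: int) -> float:
    """Share of the whole work's running words that fall in one chapter."""
    if total_all_words <= 0:
        raise ValueError("the whole work must contain at least one word")
    return total_chapter / total_all_words


# Invented word counts, chosen only so that the result matches the 8%
# quoted for chapter I-1; they are NOT the real counts for the text.
fraction_I_1 = chapter_fraction(total_chapter=8_000, total_all_words=100_000)
print(f"I-1 holds {fraction_I_1:.0%} of the text")
```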
The calculation of "expected frequencies" is really of greatest interest.
Take the total number of occurrences of a particular lexical item in the
whole work and multiply by the "fraction" calculated above:
total_word times total_chapter divided by total_all_words
This is how expected frequencies are calculated - if you would like to
quarrel with the formula above - I am afraid that this transmission has
become too unstable to continue.
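For those who prefer to see the arithmetic run, here is a minimal Python
sketch of the expected-frequency formula. The variable names follow the
formula above; the observed count of 65 for Geisteswissenschaft in I-1 is
the one quoted earlier, but the total of 200 occurrences and both word
counts are invented for illustration. The final ratio of observed to
expected is merely one simple way to read the result, an assumption of
this sketch rather than a measure prescribed here.

```python
def expected_frequency(total_word: int, total_chapter: int,
                       total_all_words: int) -> float:
    """total_word times total_chapter divided by total_all_words."""
    if total_all_words <= 0:
        raise ValueError("the whole work must contain at least one word")
    return total_word * total_chapter / total_all_words


# 65 is the raw count quoted for Geisteswissenschaft in I-1.
observed = 65

# Everything below is hypothetical: 200 occurrences in the whole work,
# a chapter of 8,000 words in a work of 100,000 words (i.e. 8%).
expected = expected_frequency(total_word=200,
                              total_chapter=8_000,
                              total_all_words=100_000)

# A ratio above 1 means the word occurs more often than chapter size
# alone would predict; below 1, less often.
ratio = observed / expected
print(f"expected {expected:.1f}, observed {observed}, ratio {ratio:.2f}")
```

With these invented totals the chapter would be expected to hold 16
occurrences, so an observed 65 would point to a strong emphasis; with the
real totals the figures will of course differ.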
Those who have survived this leap into statistical abstraction are now
ready to reap their rewards.
However, should we even be allowed to ask: Why
does "Wahrheit" occur 28 times in a particular
chapter? What sort of tool is it that makes it
possible to ask such a question and what are
its implications for the study of philosophy?
In the study of philosophy, as opposed to
the production of a monograph, using a
computer for more than simple manuscript
preparation and its addenda is
controversial. The fact is that in
philosophy the introduction of a "tool" is
tantamount to the introduction of a
"method of analysis."
It is difficult to introduce new tools
without engaging in polemic against existing
tools and participating in a Darwinian
process in which the rules are evolving and
unpredictable. The discussion below is
designed to seek an exemption for "electronic
tools" from the usual scrutiny and skepticism
that are lavished on new methodologies. For
this purpose perhaps an additional
distinction between "tool" and "method"
would be helpful. We see the text
processing computer as a ubiquitous
presence - yes, it will affect methodology -
but no, one cannot filter out the perspective
gained by electronic representation of text.
If one were to posit a pure form of text or a
pure form of interacting with text that would
be corrupted by the introduction of an
electronic representation, we would find
ourselves in a difficult logical situation.
We would have to hypostatize text in some
form, for example, Hegel's Phenomenology of
Spirit read
in a leather-bound edition hand-crafted circa
1895; or, the Harper Torchbook edition of the same work as I read
it and marked up in a seminar in 1967.
That would be tantamount to arguing that one should ban the use of textual
representations of medieval literature in analysis,
since much of it was exclusively oral and thus would be distorted by the
written form. Indeed, I would not be surprised if this
turned out to be a dangerous suggestion, since there are enough
anthropologists disguised as literary critics who might see it as a
welcome unburdening of literary studies from the tedious traditions of
philology.
Let us consider a
more controversial use of the same tool, one which crosses the line from
advanced word processing routines into the
realm of "method."