**COMPUTERS, MATHEMATICS, AND COMPUTER SCIENCE**

The discipline of computer science has evolved dynamically since the creation of the device that now generally goes by that name, that is, the electronic digital computer with random-access memory in which instructions and data are stored together and hence the instructions can change as the program is running. The first working computers were the creation not of scientific theory, but of electrical and electronic engineering practice applied to a longstanding effort to mechanize calculation. The first applications of the computer to scientific calculation and electronic data processing were tailored to the machines on which they ran, and in many respects they reflected earlier forms of calculating and tabulating devices. Originally designed as a tool of numerical calculation, in particular for the solution of non-linear differential equations for which no analytical solution was available, the computer led immediately to the new field of numerical analysis. There, speed and efficiency of computation determined the initial agenda, as mathematicians wrestled large calculations into machines with small memories and with basic operations measured in milliseconds.

As the computer left the laboratory in the mid-1950s and entered both the defense industry and the business world as a tool for data processing, for real-time command and control systems, and for operations research, practitioners encountered new problems of non-numerical computation posed by the need to search and sort large bodies of data, to make efficient use of limited (and expensive) computing resources by distributing tasks over several processors, and to automate the work of programmers who, despite rapid growth in numbers, were falling behind the even more quickly growing demand for systems and application software. The emergence during the 1960s of time-sharing operating systems, of computer graphics, of communications between computers, and of artificial intelligence increasingly refocused attention from the physical machine to abstract models of computation as a dynamic process.

Most practitioners viewed those models as mathematical in nature and hence computer science as a mathematical discipline. But it was mathematics with a difference. While insisting that computer science deals with the structures and transformations of information analyzed mathematically, the first Curriculum Committee on Computer Science of the Association for Computing Machinery (ACM) in 1965 emphasized the computer scientists' concern with effective procedures:

The computer scientist is interested in discovering the pragmatic means by which information can be transformed to model and analyze the information transformations in the real world. The pragmatic aspect of this interest leads to inquiry into effective ways to accomplish these at reasonable cost.A report on the state of the field in 1980 reiterated both the comparison with mathematics and the distinction from it:^{(1)}

Mathematics deals with theorems, infinite processes, and static relationships, while computer science emphasizes algorithms, finitary constructions, and dynamic relationships. If accepted, the frequently quoted mathematical aphorism, 'the system is finite, therefore trivial,' dismisses much of computer science.Computer people knew from experience that 'finite' does not mean 'feasible' and hence that the study of algorithms required its own body of principles and techniques, leading in the mid-1960s to the new field of computational complexity. Talk of costs, traditionally associated with engineering rather than science, involved more than money. The currency was time and space, as practitioners strove to identify and contain the exponential demand on both as even seemingly simple algorithms were applied to ever larger bodies of data. Yet, central as algorithms were to computer science, the report continued, they did not exhaust the field, "since there are important organizational, policy, and nondeterministic aspects of computing that do not fit the algorithmic mold."^{(2)}

There have been some notable exceptions to the notion of computing as
essentially a mathematical science. In a widely read letter to the editor
of *Communications of the ACM*, Allan Newell, A.J. Perlis, and Herbert
Simon, each a recipient of the ACM's highest honor, the Turing Award, argued
that the huge gap between what programmers intend and what computers actually
do makes computing an empirical science. Other winners have echoed that
view, among them Marvin Minsky, one of the creators of artificial intelligence,
and Donald Knuth, whose two-volume *Art of Computer Programming* has
long been the standard reference tool for programmers.

Thus, in striving toward theoretical autonomy, computer science has always maintained contact with practical applications, blurring commonly made distinctions among science, engineering, and craft practice. That characteristic makes the field both a resource for the reexamination of these distinctions and an elusive subject to encompass in a short historical account. What follows, therefore, focuses on the core of the search for a theory of computing, namely the effort to express the computer and computation in mathematical terms adequate to practical experience and applicable to it.

In tracing the emergence of a discipline, it is useful to think in terms
of its *agenda*, that is, what practitioners of the discipline agree
ought to be done, a consensus concerning the problems of the field, their
order of importance or priority, the means of solving them, and perhaps
most importantly, what constitute solutions. Becoming a recognized practitioner
of a discipline means learning the agenda and then helping to carry it
out. Knowing what questions to ask is the mark of a full-fledged practitioner,
as is the capacity to distinguish between trivial and profound problems.
Whatever specific meaning may attach to "profound," generally it means
moving the agenda forward. One acquires standing in the field by solving
the problems with high priority, and especially by doing so in a way that
extends or reshapes the agenda, or by posing profound problems. The standing
of the field may be measured by its capacity to set its own agenda. New
disciplines emerge by acquiring that autonomy. Conflicts within a discipline
often come down to disagreements over the agenda: what are the really important
problems? Irresolvable conflict may lead to new disciplines in the form
of separate agendas.

As the shared Latin root indicates, agendas are about action: what is
to be *done*? Since what practitioners do is all but indistinguishable
from the way they go about doing it, it follows that the tools and techniques
of a field embody its agenda. When those tools are employed outside the
field, either by a practitioner or by an outsider borrowing them, they
bring the agenda of the field with them. Using those tools to address another
agenda means reshaping the latter to fit the tools, even if it may also
lead to a redesign of the tools. What gets reshaped, and to what extent,
depends on the relative strengths of the agendas of borrower and borrowed.

**MACHINES THAT COMPUTE**

At the outset, computing had no agenda of its own from which a science might have emerged. As a physical device, the computer was not the product of a scientific theory from which it inherited an agenda. Rather, computers and computing posed a constellation of problems that intersected with the agendas of various fields. As practitioners of those fields took up the problems, applying to them the tools and techniques familiar to them, they gradually defined an agenda for computer science. Or, rather, they defined a variety of agendas, some mutually supportive, some orthogonal to one another, and some even at cross purposes. Theories are about questions, and where the nascent subject of computing could not supply the next question, the agenda of the outside field often provided it.

When the computer came into existence, there were two areas of mathematics concerned with it as a subject in itself rather than as a means for doing mathematics. One was mathematical logic and the theory of computable functions. To make the notion of 'computable' as clear and simple as possible, Alan Turing proposed in 1936 a mechanical model of what a human does when computing:

We may compare a man in the process of computing a real number to a machine which is only capable of a finite number of conditionsTuring imagined, then, a tape divided into cells, each containing one of a finite number of symbols. The tape passes through a machine that can read the contents of a cell, write to it, and move the tape one cell in either direction. What the machine does depends on its current state, which includes a signal to read or write, a signal to move the tape right or left, and a shift to the next state. The number of states is finite, and the set of states corresponds to the computation. Since a state may be described in terms of three symbols (read/write, shift right/left, next state), a computation may itself be expressed as a sequence of symbols, which can also be placed on the tape, thus making possible a universal machine that can read a computation and then carry it out by emulating the machine described by it.q_{1},q_{2}, ...,q_{R}which will be called 'm-configurations.' The machine is supplied with a 'tape' (the analogue of paper) running through it, and divided into sections (called 'squares') each capable of bearing a 'symbol.'^{(3)}

Turing's machine, or rather his monograph, fitted into the current agenda
of mathematical logic. The *Entscheidungsproblem* stemmed from David
Hilbert's program of formalizing mathematics; as stated in the textbook
he wrote with W. Ackermann,

TheHaving written the paper for W.H.A. Newman's senior course at Cambridge, Turing turned for graduate study to Alonzo Church at Princeton, who had recently introduced an equivalent notion of 'computability,' which he called 'effective calculability.' Turing subsequently showed that his machine had the same power as Church's lambda calculus or Stephen Kleene's recursive function theory for determining the range and limitations of axiom systems for mathematics.Entscheidungsproblemis solved when one knows a procedure by which one can decide in a finite number of operations whether a given logical expression is generally valid or is satisfiable. The solution of theEntscheidungsproblemis of fundamental importance for the theory of all fields, the theorems of which are at all capable of logical development from finitely many axioms.^{(4)}

The other area of mathematics pertinent to the computer originated in
a 1938 paper by Claude Shannon of MIT and Bell Telephone Laboratories.^{(6)}
In it Shannon had shown how Boolean algebra could be used to analyze and
design switching circuits. It took some time for the new technique to take
hold, and it came to general attention among electrical engineers only
in the late '40s and early 1950s. The literature grew rapidly in the following
years, but much of it took a standard form. Addressed to an audience of
electrical engineers, most articles included an introduction to Boolean
algebra, emphasizing its affinity to ordinary algebra and the rules of
idempotency and duality that set it apart. Either by way of motivation
at the outset, or of application at the end, the articles generally included
diagrams of circuits and of various forms of switches and relays, the former
to establish the correspondence between the circuits and Boolean expressions,
the latter to illustrate various means of embodying basic Boolean expressions
in concrete electronic units. The articles gave examples of the transformation
of Boolean expressions to equivalent forms, which corresponded to the analysis
of circuits. Then they turned to synthesis by introducing the notion of
a Boolean function and its expression in the normal or canonical form of
a Boolean polynomial. The articles showed how the specification of a switching
circuit in the form of a table of inputs and outputs can be translated
into a polynomial, the terms of which consist of products expressing the
various combinations of inputs. From the polynomial one could then calculate
the number of switches required to realize the circuit and seek various
transformations to reduce that number to a minimum.

As the notion of an algebraic theory of switching circuits took root, authors began shifting their attention to its inadequacies. They were not trivial, especially when applied to synthesis or design of circuits, rather than analysis of given circuits. For circuits of any complexity, representation in Boolean algebra presented large, complicated expressions for reduction to minimal or canonical form, but the algebra lacked any systematic, i.e., algorithmic, procedure for carrying out that reduction. The symbolic tool meant to make circuits more tractable itself became intractable when the number of inputs grew to any appreciable size.

Neither Turing's nor Shannon's mathematics quite fit the actual computer. Turing effectively showed what a computer could not do, even if given infinite time and space to do it. But Turing's model gave little insight into what a finite machine could do. As Michael Rabin and Dana Scott observed in their seminal paper on finite automata:

Turing machines are widely considered to be the abstract prototype of digital computers; workers in the field, however, have felt more and more that the notion of a Turing machine is too general to serve as an accurate model of actual computers. It is well known that even for simple calculations it is impossible to give anBy contrast, Boolean algebra described how the individual steps of a calculation were carried out, but it said nothing about the nature of the computational tasks that could be accomplished by switching circuits. A theory of computation apposite to the electronic digital stored-program computer with random-access memory lay somewhere between the Turing machine and the switching circuit. Computer science arose out of the effort to fill that gap between the infinite and the finite, a region that had hitherto attracted little interest from mathematicians.a prioriupper bound on the amount of tape a Turing machine will need for any given computation. It is precisely this feature that renders Turing's concept unrealistic.^{(7)}

Just what and who would fill it was not self-evident. In a lecture "On a logical and general theory of automata" in 1948, John von Neumann pointed out that automata and hence computing machines had been the province of logicians rather than mathematicians, and a methodological gulf separated the two:

There exists today a very elaborate system of formal logic, and, specifically, of logic as applied to mathematics. This is a discipline with many good sides, but also with certain serious weaknesses. This is not the occasion to enlarge upon the good sides, which I certainly have no intention to belittle. About the inadequacies, however, this may be said: Everybody who has worked in formal logic will confirm that it is one of the technically most refractory parts of mathematics. The reason for this is that it deals with rigid, all-or-none concepts, and has very little contact with the continuous concept of the real or of the complex number, that is, with mathematical analysis. Yet analysis is the technically most successful and best-elaborated part of mathematics. Thus formal logic is, by the nature of its approach, cut off from the best cultivated portions of mathematics, and forced onto the most difficult part of the mathematical terrain, into combinatorics.Von Neumann subsequently made it clear he wanted to pull the theory back toward the realm of analysis, and he did not expand upon the nature of the combinatory mathematics that might be applicable to it.

The theory of automata, of the digital, all-or-none type, as discussed up to now, is certainly a chapter in formal logic. It will have to be, from the mathematical point of view, combinatory rather than analytical.^{(8)}

It was not clear what tools to use in large part because it was not clear what job to do. What should a theory of computing be about? Different answers to that question led to different problems calling for different methods. Or rather, it opened the theory of computing to different approaches, shaped as much by the taste and training of the theorist as by the parameters of the problem. As Figure 1 illustrates, people came to the theory of computing from a variety of disciplinary backgrounds, most of them mathematical in form yet diverse in the particular kind of mathematics they used. Some of them were centrally concerned with computing; others took problems in computing as targets of opportunity for methods devised for other purposes altogether. In the long run, it all seemed to work out so well as to lie in the nature of the subject. But computers had no nature until theorists created one for them, and different theorists had different ideas about the nature of computers.

**THEORY OF AUTOMATA**

As von Neumann's statement makes clear, automata traditionally lay in
the province of logicians, who had contemplated the mechanization of reasoning
at least since Leibniz expounded the resolution of disagreements through
calculation. Since logical reasoning was associated with rational thought,
automata formed common ground for logicians and neurophysiologists. Before
Turing himself made the case for a thinking machine and proposed his famous
test in 1950, his original paper provoked Warren McCulloch and Walter Pitts
in 1943 to model the behavior of connected neurons in the symbolic calculus
of propositions and to show that in principle such 'nervous nets' were
capable of carrying out any task for which complete and unambiguous instructions
could be specified.^{(9)} The behavior of the
nets did not depend on the material of the components but only on their
behavior and connectivity, and hence they could equally well be realized
by switching circuits.

Automata thus brought convergence to the agendas of the neurophysiologists and of the electrical engineers. Shannon's use of Boolean algebra to analyze and design switching circuits spread quickly in the late 1940s and early 1950s. In its original form, the technique applied to combinational circuits, the output of which depends only on the input. Placing a secondary relay in a circuit made its behavior dependent on its internal state as well, which could be viewed as a form of memory. Such sequential circuits could also be represented by Boolean expressions. For both circuits, the basic problem remained the reduction of complex Boolean expressions to their simplest form, which in some cases (but not all) meant the least number of components, a major concern when switches were built from expensive and unreliable vacuum tubes.

The analysis of sequential circuits gave rise then to the notion of a sequential machine, described in terms of a set of states, a set of input symbols, a set of output symbols, and a pair of functions mapping input and current state to next state and output, respectively. Starting in an initial state, a sequential machine read a sequence of symbols and outputted a string of symbols in response to it. The output was not essential to the model; one could restrict one's attention to the final state of the machine after reading the input. 'Black-boxing' the machine shifted attention from the circuits to the input and output, and thus to its description in terms of the inputs it recognizes or the transformations it carries out on them. With the shift came two new sets of questions: first, what can one say about the states of a machine on the basis simply of its input-output behavior and, second, what sorts of sequences can a machine with a finite set of states recognize and what sorts of transformations can it carry out on them?

In 1951, working from the paper by McCulloch and Pitts and stimulated
by the page proofs of von Neumann's lecture quoted above, Stephen C. Kleene
established a fundamental property of finite-state automata, namely that
the sets of tapes recognized by them form a Boolean algebra. That is, if
L_{1} and L_{2} are sets of strings recognized by a finite
automaton, then so too are their union, product, complements, and iterate
or closure.^{(10)} Still working in the context
of nerve nets, Kleene called such sets 'regular events,' but they soon
came to be called 'regular languages,' marking thereby a new area of convergence
with coding theory and linguistics.

By the end of the 1950s a common mathematical model of the finite automaton
had emerged among the convergent agendas, set forth in the seminal paper
by Michael Rabin and Dana Scott mentioned above. Their formalism, aimed
at retaining the flavor of the machine and its ties to Turing machines,
became the standard for the field, beginning, significantly, with tapes
formed by finite sequences of symbols from a finite alphabet . The class
of all such tapes, including the empty tape , is denoted *T*. A finite
automaton *A* over an alphabet is a quadruple (*S*, *M*,
*s*_{0},
*F*), where *S* is a finite set of states,
*M* is a function
which for every pair consisting of a state and a symbol specifies a resulting
state, *s*_{0} is an initial state, and *F* is a subset
of *S* comprising the final states. As defined, the automaton works
sequentially on symbols, but by an obvious extension of *M* to sequences,
it becomes a tape machine, so that
*M*(*s*,*x*) is the state
at which the machine arrives after starting in state *s* and traversing
the tape *x*. In particular, if
*M*(*s*_{0},*x*)
is one of the final states in
*F*, the automaton 'accepts' *x*.
Let *T*(*A*) be the set of tapes in *T* accepted, or defined,
by automaton *A*, and let be the class of all sets of tapes defined
by some automaton.

Rabin and Scott pursued two related sets of questions: first, what can
be determined about the set of tapes *T*(*A*) accepted by a given
automaton *A*, and, second, what is the mathematical structure of
*T*(*A*)
and of the larger class ? Taking a lead from Kleene's results, they established
that constitutes a Boolean algebra of sets and hence that it contains all
finite sets of tapes. Two central decision problems were now immediately
solvable. First, if an automaton *A* accepts any tapes, it accepts
a tape of length less than the number of its internal states, and therefore
an effective procedure exists for determining whether
*T*(*A*)
is empty (the 'emptiness problem'). Second, if *A* is an automaton
with *r* internal states, then *T*(*A*) is infinite if and
only if it contains a tape of length greater than or equal to *r*
and less than or equal to 2*r*; again, an effective procedure follows
for determining whether the set of definable tapes is infinite.

At two points in their study, Rabin and Scott identified the set *T*
with the semigroup, an algebraic structure that had until then had received
very little attention from mathematicians. But they did not make much use
of the concept, approaching their model combinatorially and in essence
constructing automata to meet the conditions of the problem. That reflected
the agenda of mathematical logic, a subject still largely on the periphery
of mathematics, outside its increasingly algebraic agenda.^{(11)}
Yet, as the reference to the semigroup suggests, it was making contact
with algebra. The link had been forged by way of coding theory.

Coding theory arose from another path-breaking paper of Shannon's, his
*Mathematical
Theory of Communication*, first published in 1948. There Shannon offered
a mathematical model of information in terms of sequences of signals transmitted
over a line and posed the fundamental question of how to preserve the integrity
of the sequence in the presence of distortion. The answer to that question
took the form of sequential codes that by their very structure revealed
the presence of errors and enabled their correction.

In a vaguely related way, circuit theory and coding theory verged on a common model from opposite directions. The former viewed a sequential machine as shifting from one internal state to another in response to input; the transition might, but need not involve output, and the final state could be used as a criterion of acceptance of the input. The main question was how to design the optimal circuit for given inputs, a matter of some importance when one worried about the cost and reliability of gates. Coding theory focused on the problem of mapping the set of messages to be transmitted into the set of signals available for transmission in such a way that the latter permit unambiguous retranslation when received. That means, first and foremost, that the stream of signals can be uniquely scanned into the constituents of the message. In addition, the process must take place as it occurs, and it should be immune to noise. That is, the code must be unambiguously decipherable as encountered sequentially, even if garbled in transmission.

It was in the context of coding theory that the agendas grouped around the finite automaton intersected with an important mathematical agenda centered in Paris. In "Une théorie algébrique du codage," Marcel P. Schützenberger, a member of Pierre Dubreil's seminar on algebra and number theory at the Faculté des Sciences in Paris, linked coding to the 'fundamental algebraic structure' of Bourbaki:

It is particularly noteworthy that the fundamental concepts of semigroup theory, introduced by P. Dubreil in 1941 and studied by him and his school from an abstract point of view, should have immediate and important interpretations on the level of the concrete realization of coding machines and transducers.Schützenberger translated the problem of coding into the conceptual structure of abstract semigroup theory by viewing a sequence of symbols as a semigroup and codes as homomorphisms between semigroups. The details are too complex to follow here, but the trick lay in determining that the structural properties of semigroups and their homomorphisms gave insight into the structure of codes.^{(12)}

The resulting semigroup theory of machines thus gave a mathematical
account of sequential machines that pulled together a range of results
and perspectives. More importantly, perhaps, it suggested mathematical
structures for which there was at the time no correlate in automata theory.
Within Schützenberger's mathematical agenda, one moved from basic
structures to their extension by means of other structures. In "Un problème
de la théorie des automates" he set out the semigroup model of the
finite automaton and then looked beyond it, aiming at an algebraic characterization
based on viewing a regular language as a formal sum, or formal power series,
*x*
of the strings it comprises and considering the series as a 'rational'
function of the *x* *X*:

We then generalize that property by showing that if, instead of being finite,That is, the extended sum takes the form <Sis the direct product of a finite set byZ(the additive group of integers) and if the mapping (S,X)S' and the subsetS_{1}are defined in a suitable fashion, then the corresponding sum is in a certain sense 'algebraic.'

Initially, Schützenberger suggested no interpretation for the extension; that is, it corresponded to no new class of automata or of languages accepted by them. In fact, both the languages and the automata were waiting in the wings. They came on stage through the convergence of finite automata with yet another previously independent line of investigation, namely the new mathematical linguistics under development by Noam Chomsky. The convergence of agendas took the concrete form of Schützenberger's collaboration with Chomsky on context-free grammars, then just getting underway. The result was the mathematical theory of formal languages.

**FORMAL LANGUAGES**

Formal language theory similarly rested at the start on an empirical base. Chomsky's seminal article, "Three models of language" (1956), considered grammar as a mapping of finite strings of symbols into all and only the grammatical sentences of the language:

The grammar of a language can be viewed as a theory of the structure of this language. Any scientific theory is based on a certain finite set of observations and, by establishing general laws stated in terms of certain hypothetical constructs, it attempts to account for these observations, to show how they are interrelated, and to predict an indefinite number of new phenomena. A mathematical theory has the additional property that predictions follow rigorously from the body of theory.The theory he was seeking concerned the fact of grammars, rather than grammars themselves. Native speakers of a language extract its grammar from a finite number of experienced utterances and use it to construct new sentences, all of them grammatical, while readily rejecting ungrammatical sequences. Chomsky was looking for a metatheory to explain how to accomplish this feat, or at least to reveal what it entailed. (Ultimately the physical system being explained was the human brain, the material basis of both language and thought.) In 1958 he made the connection with finite automata through a collaborative investigation with George Miller on finite state languages and subsequently joined forces with Schützenberger, then a visitor at MIT.

As powerful as the semigroup seemed to be in capturing the structure
of a finite automaton, the model was too limited by its generality to handle
the questions Chomsky was asking. It recognized languages produced by translations
from tokens to strings of the form *x*, *x* (, non-terminals,
*x*
a string of terminals), by then called 'finite state,' or 'regular' languages,
but not languages generated by recursive productions such as
*uv*,
*x*, by which sequences may be embedded in other sequences an arbitrary
number of times. These were the languages determined by the phrase structure
grammars introduced by Chomsky in 1959, foremost among them the context-free
grammars, and modeling them mathematically required more structure than
the monoid could provide.^{(14)} That was
reflected in the need for internal memory, represented by the addition
of a second (two-way) tape. It was also reflected in the fact, quickly
ascertained, that the languages are not closed under complementation or
intersection and hence cannot be represented by something so tidy as a
Boolean algebra.

What made the context-free grammars so important was the demonstration by Seymour Ginsburg and H.G. Rice in 1961 that large portions of the grammar of the new internationally designed programming language, Algol, as specified in John Backus's new formal notation, BNF (Backus Normal Form, or later Backus-Naur Form), based on the productions of Emil Post, are context-free. When the productions are viewed as set equations, the iterated solutions form lattices, which have as fixpoints the final solutions that correspond to the portions of the language specified by the productions. Of course, were Algol as a whole context-free, then the language so determined would consist of all syntactically valid programs. Here, Schützenberger could bring his model of formal power series to bear. In a group of papers in the early 1960s, he showed that the series he had characterized as 'algebraic' were the fixpoint solutions of context-free grammars. The approach through formal power series also enabled him to show that the grammars were undecidedly ambiguous and that they could be partitioned into two distinct subfamilies, only one of which fitted Chomsky's hierarchy.

Several agendas again converged at this point, as a mix of theoretical
and practical work in both the United States and Germany on parenthesis-free
notation, sequential formula translation for compilers, management of recursive
procedures, and syntactic analysis for machine translation more or less
independently led to the notion of the 'stack' (in German, *Keller*),
or list of elements accessible at one end only on a last in-first out (LIFO)
basis. In the parlance of automata theory, a stack provided a finite automaton
with additional memory in the form of a tape that ran both ways but moved
only one cell at a time and contained only data placed there by the automaton
during operation. Schützenberger and Chomsky showed that such 'pushdown
automata' correspond precisely to context-free languages and embedded the
correspondence in Schützenberger's algebraic model.

The development, hand-in-hand, of automata theory and formal language
theory during the late 1950s and early 1960s gave an empirical aspect to
the construction of a mathematical theory of computation. The matching
of Chomsky's hierarchy of phrase-structure grammars with distinct classes
of automata enlisted such immediate assent precisely because it encompassed
a variety of independent and seemingly disparate agendas, both theoretical
and practical, ranging from machine translation to the specification of
ALGOL and from noise-tolerant codes to natural language. These it united
theoretically in much the same way that Newton's mechanics united not only
terrestrial and celestial phenomena but also a variety of agendas in mathematics,
mechanics, and natural philosophy. By the late 1960s, the theory of automata
and formal languages had assumed canonical shape around an agenda of its
own. In 1969, John Hopcroft and Jeffrey Ullman began their text,
*Formal
Languages and Their Relation to Automata*, by highlighting Chomsky's
mathematical grammar, the context-free definition of ALGOL, syntax-directed
compilation, and the concept of the compiler-compiler.

Since then a considerable flurry of activity has taken place, the results of which have related formal languages and automata theory to such an extent that it is impossible to treat the areas separately. By now, no serious study of computer science would be complete without a knowledge of the techniques and results from language and automata theory.Far from merely complementing the study of computer science, the subject of automata and formal languages became the theoretical core of the curriculum during the 1970s, especially as it was embedded in such tools as lexical analyzers and parser generators. With those tools, what had once required years of effort by teams of programmers became an undergraduate term project.

While the bulk of that new agenda focused on grammars and their application
to programming, part of it pulled the theory back toward the mathematics
from which it had emerged. Increasingly pursued at a level of abstraction
that tended to direct attention away from the physical system, the theory
of automata verged at times on a machine theory of mathematics rather than
a mathematical theory of the machine. "It appeared to me," wrote Samuel
Eilenberg, co-creator of category theory, in the preface of his four-volume
*Automata,
Languages, and Machines* (1974),

that the time is ripe to try and give the subject a coherent mathematical presentation that will bring out its intrinsic aesthetic qualities and bring to the surface many deep results which merit becoming part of mathematics, regardless of any external motivation.Characterized by constructive algorithms rather than proofs of existence, those results constituted, he went on, nothing less than a "new algebra, ... contain[ing] methods and results that are deep and elegant" and that would someday be "regarded as a standard part of algebra." That is, its ties to the computer would disappear.

**FORMAL SEMANTICS**

Yet, even when the computer remained at the focus, some practitioners expressed doubt that formal language theory would suffice as a theory of computation, since computing involved more than the grammar of programming languages could encompass. As noted above, the practical activity of programming created the theoretical space to be filled by a mathematical model of computation. As the size and complexity of the programs expanded, driven by a widely shared sense of the all but unlimited power of the computer as a 'thinking machine,' so too did the expectations of what would constitute a suitable theory. One concept of a "mathematical theory of computation" stems from two papers in the early 1960s by John McCarthy, the creator of LISP and a progenitor of 'artificial intelligence,' yet another agenda of computing, with links both to neurophysiology and to logic. It was McCarthy who invoked Newton as historical precedent for the enterprise:

In this paper I shall discuss the prospects for a mathematical science of computation. In a mathematical science, it is possible to deduce from the basic assumptions, the important properties of the entities treated by the science. Thus, from Newton's law of gravitation and his laws of motion, one can deduce that the planetary orbits obey Kepler's laws.In another version of the paper, he changed the precedent only slightly:

It is reasonable to hope that the relationship between computation and mathematical logic will be as fruitful in the next century as that between analysis and physics in the last. The development of this relationship demands a concern for both applications and for mathematical elegance.The elegance proved easier to achieve than the applications. The trick was to determine just what constituted the entities of computer science.

McCarthy made clear what he expected from a suitable theory in terms of computing: first, a universal programming language along the lines of Algol but with richer data descriptions; second, a theory of the equivalence of computational processes, by which equivalence-preserving transformations would allow a choice of among various forms of an algorithm, adapted to particular circumstances; third, a form of symbolic representation of algorithms that could accommodate significant changes in behavior by simple changes in the symbolic expressions; fourth, a formal way of representing computers along with computation; and finally a quantitative theory of computation along the lines of Shannon's measure of information. What these criteria, recognizably drawn from the variety of agendas mentioned above, came down to in terms of McCarthy's precedents was something akin to traditional mathematical physics: a dynamical representation of a program such that, given the initial values of its parameters, one could determine its state at any later point. Essential to that model was the notion of converging on complexity by perturbations on a fundamental, simple solution, as in the case of a planetary orbit based initially on a two-body, central force configuration. Formal language theory shared with machine translation a similar assumption about converging on natural language through increasingly complex artificial languages.

For McCarthy, that program meant that automata theory would not suffice. Its focus on the structure of the tape belied both the architecture and the complexity of the machine processing the tape. As McCarthy put it,

Computer science must study the various ways elements of data spaces are represented in the memory of the computer and how procedures are represented by computer programs. From this point of view, most of the current work on automata theory is beside the point.It did not suffice to know that a program was syntactically correct. One should be able to give a mathematical account of its semantics, that is, of the meaning of the symbols constituting it in terms of the values assigned to the variables and of the way in which the procedures change those values.

Such an account required a shift in the traditional mathematical view
of a function as a mapping of input values to output values, that is, as
a set of ordered pairs of values. To study how procedures change values,
one must be able to express and transform the structure of the function
as an abstract entity in itself, a sequence of operations by which the
output values are generated from the input values. Seeking an appropriate
tool for that task, McCarthy reached back to mathematical logic in the
1930s to revive Alonzo Church's lambda calculus, originally devised in
1929 in an effort to find a type-free route around Russell's paradox.^{(15)}
The attempt failed, and Church had long since abandoned the approach. The
notation, however, seemed to fit McCarthy's purposes and, moreover, lay
quite close structurally to LISP, which he had been using to explore mechanical
theorem-proving.

The developments leading up to the creation of denotational, or mathematical, semantics in 1970 are too complicated to pursue in detail here. McCarthy's own efforts faltered on the lack of a mathematical model for the lambda calculus itself. It also failed to take account of the peculiar problem posed by the storage of program and data in a common memory. Because computers store programs and data in the same memory, programming languages allow unrestricted procedures which could have unrestricted procedures as values; in particular a procedure can be applied to itself. "To date," Dana Scott claimed in his "Outline of a mathematical theory of computation" in 1970,

no mathematical theory of functions has ever been able to supply conveniently such a free-wheeling notion of function except at the cost of being inconsistent. The mainOne did not need unrestricted procedures to appreciate the problems posed by the self-application facilitated by the design of the computer. Consider the structure of computer memory, representing it mathematically as a mapping of contents to locations. That is, a state is a function mapping each elementmathematicalnovelty of the present study is the creation of a proper mathematical theory of functions which accomplishes these aims (consistently!) and which can be used as the basis for themetamathematicalproject of providing the 'correct' approach to semantics.

Recent work on interval arithmetic suggested to Scott that one might seek justification through a partial ordering of data types and their functions based on the notion of 'approximation' or 'informational content.' With the addition of an undefined element as 'worst approximation' or 'containing no information,' the data types formed a complete lattice, and monotonic functions of them preserved the lattice. They also preserved the limits of sequences of partially ordered data types and hence were continuous. Scott showed that the least upper bound of the lattice, considered as the limit of sequences, was therefore the least fixed point of the function and was determined by the fixed point operator of the lambda calculus. Hence self-applicative functions of the sort needed for computers had a consistent mathematical model. And so too, by the way, did the lambda calculus for the first time in its history. Computer science had come around full circle to the mathematical logic in which it had originated.

**THE LIMITS OF MATHEMATICAL COMPUTER SCIENCE**

Beginning in 1975 and extending over the late 1970s, the Computer Science
and Engineering Research Study, chaired by Bruce Arden of Princeton University,
took stock of the field and its current directions of research and published
the results under the title *What Can Be Automated?* The committee
on theoretical computer science argued forcefully that a process of abstraction
was necessary to understand the complex systems constructed on computers
and that the abstraction "must rest on a mathematical basis" for three
main reasons.

(1) Computers and programs are inherently mathematical objects. They manipulate formal symbols, and their input-output behavior can be described by mathematical functions. The notations we use to represent them strongly resemble the formal notations which are used throughout mathematics and systematically studied in mathematical logic.Defining theoretical computer science as "the field concerned with fundamental mathematical questions about computers, programs, algorithms, and information processing systems in general," the committee acknowledged that those questions tended to follow developments in technology and its application, and hence to aim at a moving target -- strange behavior for mathematical objects.

(2) Programs often accept arbitrarily large amounts of input data; hence, they have a potentially unbounded number of possible inputs. Thus a program embraces, in finite terms, an infinite number of possible computations; and mathematics provides powerful tools for reasoning about infinite numbers of cases.

(3) Solving complex information-processing problems requires mathematical analysis. While some of this analysis is highly problem-dependent and belongs to specific application areas, some constructions and proof methods are broadly applicable, and thus become the subject of theoretical computer science.^{(16)}

Nonetheless, the committee could identify several broad issues of continuing concern, which were being addressed in the areas of computational complexity, data structures and search algorithms, language and automata theory, the logic of computer programming, and mathematical semantics. In each of these areas, it could point to substantial achievements in bringing some form of mathematics to bear on the central questions of computing. Yet, the summaries at the end of each section sounded a repeating chord. For all the depth of results in computational complexity, "the complexity of most computational tasks we are familiar with -- such as sorting, multiplying integers or matrices, or finding shortest paths -- is still unknown."(215) Despite the close ties between mathematics and language theory, "by and large, the more mathematical aspects of language theory have not been applied in practice. Their greatest potential service is probably pedagogic, in codifying and given clear economical form to key ideas for handling formal languages."(234) Efforts to bring mathematical rigor to programming quickly reach a level of complexity that makes the techniques of verification subject to the very concerns that prompted their development.

One might hope that the above ideas, suitably extended and incorporated into verification systems, would enable us to guarantee that programs are correct with absolute certainty. We are about to discuss certain theoretical and philosophical limitations that will prevent this goal from ever being reached. These limitations are inherent in the program verification process, and cannot be surmounted by any technical innovations.(246)Mathematical semantics could show "precisely why [a] nasty surprise can arise from a seemingly well-designed programming language," but not how to eliminate the problems from the outset. As a design tool, mathematical semantics was still far from the goal of correcting the anomalies that gave rise to errors in real programming languages.

If computers and programs were "inherently mathematical objects," the
mathematics of the computers and programs of real practical concern had
so far proved elusive. Although programming languages borrowed the trappings
of mathematical symbolism, they did not translate readily into mathematical
structures that captured the behavior of greatest concern to computing.
To a large extent, theoretical computer science continued the line of inquiry
that had led to the computer in the first place, namely, the establishment
of the limits of what can be computed. While that had some value in directing
research and development away from dead ends, it offered little help in
resolving the problems of computing that lay well within those limits.
For that reason, computer science remains an amalgam of mathematical theory,
engineering practice, and craft skill.

1. "An Undergraduate Program in Computer Science--Preliminary
Recommendations," *Communications of the ACM*, 8,9(1965), 543-552;
at 544.

2. Bruce W. Arden (ed.), *What Can Be Automated?:
The Computer Science and Engineering Research Study (COSERS)* (Cambridge,
MA: MIT Press, 1980), 9. The aphorism was not an exaggeration. For example,
Garrett Birkhoff's lattice theory proved a fundamental tool of the field,
yet he himself had discerned little of interest in its finite structures:
"The two unsolved problems in binary algebra which I have described illustrate
the fact that *genuine applications can suggest simple and natural but
extremely difficult problems*, which are overlooked by pure theorists.
Thus, while working for 30 years (1935-1965) on generalizing Boolean algebra
to lattice theory, I regarded finite Boolean algebras as trivial because
they could all be described up to isomorphism, and completely ignored the
basic 'shortest form' and 'optimal packing' problems described above."

3. "On Computable Numbers, with an Application to
the Entscheidungsproblem,"
*Proceedings of the London Mathematical Society*,
ser.2, vol. 42(1936), 230-265; at 231.

4. D. Hilbert and W. Ackermann, *Grundzüge
der theoretischen Logik* (Berlin: Springer, 1928), 73-4: Das Entscheidungsproblem
ist gelöst, wenn man ein Verfahren kennt, das bei einem vorgelegten
logischen Ausdruck durch endlich viele Operationen die Entscheidung über
die Allgemeingültigkeit bzw. Erfüllbarkeit erlaubt. Die Lösung
des Entscheidungsproblems ist für die Theorie aller Gebiete, deren
Sätze überhaupt einer logischen Entwickelbarkeit aus endlich
vielen Axiomen fähig sind, von grundsätzlicher Wichtigkeit.

5. Stephen C. Kleene, "Origins of Recursive Function
Theory," *AHC* 3,1(1981), 52-67

6. Claude E. Shannon, "A symbolic analysis of relay
and switching circuits,"
*Transactions of the AIEE* 57(1938), 713-23

7. Michael O. Rabin and Dana Scott, "Finite automata
and their decision problems,"
*IBM Journal* (April, 1959), 114-25;
at 114.

8. Published in *Cerebral Mechanisms in Behavior--The
Hixon Symposium*, ed. L.A. Jeffries (New York: Wiley, 1951), 1-31; repr.
in *Papers of John von Neumann on Computing and Computer Theory*,
ed. William Aspray and Arthur Burks (Cambridge, MA/London: MIT Press; Los
Angeles/San Francisco: Tomash Publishers, 1987), 391-431; at 406.

9. Alan M. Turing, "Computing Machinery and Intelligence,"
Warren S. McCulloch and Walter Pitts, "A logical calculus of the ideas
immanent in nervous activity,"
*Bulletin of Mathematical Biophysics*
5(1943), 115-33.

10. "Representation of events in nerve nets and
finite automata," in *Automata Studies*, ed. Claude Shannon and John
McCarthy (Princeton: Princeton University Press, 1956), 3-41. The *union*
of two sets of strings *U* and *V* is the set of strings belonging
to one or the other; the *product* is the set of strings *uv*
such that *u* *U* and *v* *V*. The* complement U
*of
*U* is the set of strings not in *U*, and the *iterate*
of *U* is the union of all products *u*^{n}, where *u*^{0}
is the empty string, and *u*^{k+1} = *uu*^{k};
i.e. the 'power set' of *U*.

11. For the agenda of mathematics see *inter alia*
Garret Birkhoff, "The Rise of Modern Algebra to 1936" and "The Rise of
Modern Algebra, 1936-1950" in his
*Selected Papers on Algebra and Topology*,
ed. Gian-Carlo Rota and Joseph S. Oliveira, 561-605, and, more generally,
Herbert Mehrtens, *Moderne Sprache Mathematik: Eine Geschichte des Streits
um die Grundlagen der Disziplin und des Subjekts formaler Systeme *(Frankfurt
am Main: Suhrkamp, 1990).

12. "Il est particulièrement remarquable
que les concepts fondamentaux de [la théorie des demi-groupes],
introduits par P. Dubreil en 1941, et étudiés par lui-même
et son école du point de vue abstrait, aient des interprétations
immédiates et importantes sur le plan de la réalisation concrète
des machines codeuses ou transcodeuse." M.P.Schützenberger, "Une théorie
algébrique du codage,"
*Séminaire P. Dubreil et C. Pisot*
(Faculté des Sciences de Paris), Année 1955/56, No.15 (dated
27 February 1956); p.15-02. Cf. "On an application of semi groups[!] methods
to some problems in coding," *IRE Transactions in Information Theory*
2,3(1956), 47-60. Pierre Dubreil was a member of Bourbaki and one of the
foremost exponents of semigroups.

13. A semigroup consists of a set of objects and
an associative operation; that is, if *s*_{1} and *s*_{2}
are elements of the set and '' denotes the operation, then *s*_{1}*s*_{2}
is also an element of the set, and *s*_{1}(*s*_{2}*s*_{3})
= (*s*_{1}*s*_{2})*s*_{3}. A homomorphism
is a mapping from one mathematical structure to another that preserves
operations. For example, the logarithmic function is a homomorphism between
positive real numbers under multiplication and real numbers under addition:
*log
AB* = *log A + log B*.

14. Chomsky began his formalization of phrase-structure
grammars by characterizing a grammar *G* as a semigroup generated
by concatenation of strings formed from a finite set *V* of symbols
together with an identity element *I*. *V* is the union of two
disjoint subsets *V*_{T} and *V*_{N}, the former
comprising the terminal symbols, the latter the non-terminal symbols, of
the vocabulary. *V*_{T} includes a boundary element #; *V*_{N}
a symbol *S* denoting a sentence. , meaning 'can be rewritten as,'
is a two-place relation defined on the elements of *G* by four axioms:
(1) is irreflexive, (2) *A* *V*_{N} iff there exist ,,
such that *A* , (3) there are no ,, such that #, and (4) there is
a finite set of pairs (_{1}, _{1}),...,(_{n},
_{n})
such that, for all and , iff there exist _{1}, _{2}, and
*j*
*n* such that =
_{1j2} and = _{1j2}. Chomsky then
defined a -derivation of as a set (_{1},...,_{n}) (*n*
1) such that = _{1}, = _{n}, and _{ii+1} for 1*i*<*n*.
Such a derivation is terminated if it is not a proper initial subsequence
of any derivation. The terminal language *L*_{G} generated
by *G* is the set of strings for which there is a terminated #*S*#-derivation,
and two grammars are equivalent if their terminal languages are equal.
Finally, if there is a -derivation of .

It was from restrictions on these definitions that Chomsky derived the
classes of grammars. Consider a rule in terms of a non-terminal *A*
and symbols _{1}, _{2}, such that = _{1}*A*_{2}
and = _{12}. If one requires only that not be the identity element,
then the condition specifies all rules of the form _{1}*A*_{212},
that is, *A* in the context of _{1} and _{2}, either
of which may be null. If one requires that they both be null, i.e. that
*A* directly, then another class emerges, which Chomsky later termed
'context-free.' If, finally, the rules are limited to the form *AaB*
or *A* *a*, where *a* is a terminal, then the grammar describes
a finite-state language. Labeling the three conditions in order, Chomsky
denoted the corresponding grammars and languages as 'type
*i*.' Type
0 denoted the Turing machine, and type 3 corresponded to deterministic
finite automata. Types 1 and 2 constituted the phrase-structure grammars
of linguistic interest. Whether they too had machine representations of
a similar sort was the next question.

15. As a system of notation the lambda calculus
rewrites the usual functional notation, say, *f*(*x*) = *x*^{2}
+ 1, to abstract *f* itself as an operation on a variable and to designate
the variable(s) bound by the function, e.g. *f* = *x*.[*x*^{2}
+ 1]. *f* acquires a value by applying that operation to an argument,
e.g. *f*(3) = *x*.[*x*^{2} + 1](3) = 10. Lambda
expressions may be applied to other lambda expressions as arguments; indeed,
that is where its power lies as a tool of mathematical logic. Such applications
quickly lead to quite complicated expressions reducible in several different
ways. The original agenda involved demonstrating the existence of normal
forms and the equivalence of reductions to them. One of Turing's first
extensions of his 1936 paper was to show that the lambda calculus and Turing
machines are of equivalent power as models of computability.

16. *What Can Be Automated?*, 139.

**FURTHER READING**

*ACM Turing Award Lectures: The First Twenty Years, 1966 to 1985*
(New York: Reading, Mass.: ACM Press; Addison-Wesley Pub. Co., 1987); lectures
since 1986 published annually in *Communications of the ACM*

William Aspray (ed.), *Computing Before Computers* (Ames: Iowa
State University Press, 1990)

Bruce W. Arden (ed.), *What Can Be Automated?: The Computer Science
and Engineering Research Study (COSERS)* (Cambridge, MA: MIT Press,
1980)

Thomas J. Bergin, Jr. and Richard G. Gibson, Jr. (eds.), *History
of Programming Languages II* (New York : ACM Press; Reading, Mass.:
Addison-Wesley Pub. Co., 1996)

Martin Campbell-Kelly and William Aspray, *Computer: A History of
the Information Machine* (New Yorl: Basic Books, 1996)

Jan van Leeuwen (ed.), *Handbook of Theoretical Computer Science*
(2 vols., Amsterdam; New York: Elsevier; Cambridge, Mass.: MIT Press, 1990)

Herbert Mehrtens, *Moderne Sprache Mathematik: Eine Geschichte des
Streits um die Grundlagen der Disziplin und des Subjekts formaler Systeme
*(Frankfurt
am Main: Suhrkamp, 1990)

Richard L. Wexelblat (ed.) *History of Programming Languages* (New
York : Academic Press, 1981)