The Search for a Mathematical Theory

Michael S. Mahoney

[Published, with abbreviated notes, in John Krige and Dominique Pestre (eds.), Science in the 20th Century (Amsterdam: Harwood Academic Publishers, 1997), Chap. 31]


The discipline of computer science has evolved dynamically since the creation of the device that now generally goes by that name, that is, the electronic digital computer with random-access memory in which instructions and data are stored together and hence the instructions can change as the program is running. The first working computers were the creation not of scientific theory, but of electrical and electronic engineering practice applied to a longstanding effort to mechanize calculation. The first applications of the computer to scientific calculation and electronic data processing were tailored to the machines on which they ran, and in many respects they reflected earlier forms of calculating and tabulating devices. Originally designed as a tool of numerical calculation, in particular for the solution of non-linear differential equations for which no analytical solution was available, the computer led immediately to the new field of numerical analysis. There, speed and efficiency of computation determined the initial agenda, as mathematicians wrestled large calculations into machines with small memories and with basic operations measured in milliseconds.

As the computer left the laboratory in the mid-1950s and entered both the defense industry and the business world as a tool for data processing, for real-time command and control systems, and for operations research, practitioners encountered new problems of non-numerical computation posed by the need to search and sort large bodies of data, to make efficient use of limited (and expensive) computing resources by distributing tasks over several processors, and to automate the work of programmers who, despite rapid growth in numbers, were falling behind the even more quickly growing demand for systems and application software. The emergence during the 1960s of time-sharing operating systems, of computer graphics, of communications between computers, and of artificial intelligence increasingly refocused attention from the physical machine to abstract models of computation as a dynamic process.

Most practitioners viewed those models as mathematical in nature and hence computer science as a mathematical discipline. But it was mathematics with a difference. While insisting that computer science deals with the structures and transformations of information analyzed mathematically, the first Curriculum Committee on Computer Science of the Association for Computing Machinery (ACM) in 1965 emphasized the computer scientists' concern with effective procedures:

The computer scientist is interested in discovering the pragmatic means by which information can be transformed to model and analyze the information transformations in the real world. The pragmatic aspect of this interest leads to inquiry into effective ways to accomplish these at reasonable cost. (1)
1. "An Undergraduate Program in Computer Science--Preliminary Recommendations," Communications of the ACM, 8,9(1965), 543-552; at 544.
A report on the state of the field in 1980 reiterated both the comparison with mathematics and the distinction from it:

Mathematics deals with theorems, infinite processes, and static relationships, while computer science emphasizes algorithms, finitary constructions, and dynamic relationships. If accepted, the frequently quoted mathematical aphorism, 'the system is finite, therefore trivial,' dismisses much of computer science. (2)
Computer people knew from experience that 'finite' does not mean 'feasible' and hence that the study of algorithms required its own body of principles and techniques, leading in the mid-1960s to the new field of computational complexity. Talk of costs, traditionally associated with engineering rather than science, involved more than money. The currency was time and space, as practitioners strove to identify and contain the exponential demand on both as even seemingly simple algorithms were applied to ever larger bodies of data. Yet, central as algorithms were to computer science, the report continued, they did not exhaust the field, "since there are important organizational, policy, and nondeterministic aspects of computing that do not fit the algorithmic mold."

There have been some notable exceptions to the notion of computing as essentially a mathematical science. In a widely read letter to the editor of Communications of the ACM, Allan Newell, A.J. Perlis, and Herbert Simon, each a recipient of the ACM's highest honor, the Turing Award, argued that the huge gap between what programmers intend and what computers actually do makes computing an empirical science. Other winners have echoed that view, among them Marvin Minsky, one of the creators of artificial intelligence, and Donald Knuth, whose two-volume Art of Computer Programming has long been the standard reference tool for programmers.

Thus, in striving toward theoretical autonomy, computer science has always maintained contact with practical applications, blurring commonly made distinctions among science, engineering, and craft practice. That characteristic makes the field both a resource for the reexamination of these distinctions and an elusive subject to encompass in a short historical account. What follows, therefore, focuses on the core of the search for a theory of computing, namely the effort to express the computer and computation in mathematical terms adequate to practical experience and applicable to it.

2. Bruce W. Arden (ed.), What Can Be Automated?: The Computer Science and Engineering Research Study (COSERS) (Cambridge, MA: MIT Press, 1980), 9. The aphorism was not an exaggeration. For example, Garrett Birkhoff's lattice theory proved a fundamental tool of the field, yet he himself had discerned little of interest in its finite structures: "The two unsolved problems in binary algebra which I have described illustrate the fact that genuine applications can suggest simple and natural but extremely difficult problems, which are overlooked by pure theorists. Thus, while working for 30 years (1935-1965) on generalizing Boolean algebra to lattice theory, I regarded finite Boolean algebras as trivial because they could all be described up to isomorphism, and completely ignored the basic 'shortest form' and 'optimal packing' problems described above."
In tracing the emergence of a discipline, it is useful to think in terms of its agenda, that is, what practitioners of the discipline agree ought to be done, a consensus concerning the problems of the field, their order of importance or priority, the means of solving them, and perhaps most importantly, what constitute solutions. Becoming a recognized practitioner of a discipline means learning the agenda and then helping to carry it out. Knowing what questions to ask is the mark of a full-fledged practitioner, as is the capacity to distinguish between trivial and profound problems. Whatever specific meaning may attach to "profound," generally it means moving the agenda forward. One acquires standing in the field by solving the problems with high priority, and especially by doing so in a way that extends or reshapes the agenda, or by posing profound problems.(2) The standing of the field may be measured by its capacity to set its own agenda. New disciplines emerge by acquiring that autonomy. Conflicts within a discipline often come down to disagreements over the agenda: what are the really important problems? Irresolvable conflict may lead to new disciplines in the form of separate agendas.

As the shared Latin root indicates, agendas are about action: what is to be done? Since what practitioners do is all but indistinguishable from the way they go about doing it, it follows that the tools and techniques of a field embody its agenda. When those tools are employed outside the field, either by a practitioner or by an outsider borrowing them, they bring the agenda of the field with them. Using those tools to address another agenda means reshaping the latter to fit the tools, even if it may also lead to a redesign of the tools. What gets reshaped, and to what extent, depends on the relative strengths of the agendas of borrower and borrowed.

2a. (Added 4/21/05) In "The Phenomenology of Mathematical Proof", Gian-Carlo Rota says of Andrew Wiles' famed proof of Fermat's "last theorem": "The actual value of what Wiles and his collaborators did is far greater than the proof of Fermat's conjecture. The point of the proof of Fermat's conjecture is to open up new possibilities for mathematics. Wiles and his collaborators show that the conjectures of Taniyama and Weil do indeed have the power that they were suspected to have. They rekindle our faith in the central role of the theory of elliptic functions in mathematics. They develop a host of new techniques that will lead to further connections between number theory and algebraic geometry. Future mathematicians will discover new applications, they will solve other problems, even problems of great practical interest, by exploiting Wiles' proof and techniques. To sum up, the value of Wiles' proof lies not in what it proves, but in what it has opened up, in what it will make possible." Indiscrete Thoughts (Boston/Basel/Berlin: Birhäuser, 1997), 143-44.


At the outset, computing had no agenda of its own from which a science might have emerged. As a physical device, the computer was not the product of a scientific theory from which it inherited an agenda. Rather, computers and computing posed a constellation of problems that intersected with the agendas of various fields. As practitioners of those fields took up the problems, applying to them the tools and techniques familiar to them, they gradually defined an agenda for computer science. Or, rather, they defined a variety of agendas, some mutually supportive, some orthogonal to one another, and some even at cross purposes. Theories are about questions, and where the nascent subject of computing could not supply the next question, the agenda of the outside field often provided it.

When the computer came into existence, there were two areas of mathematics concerned with it as a subject in itself rather than as a means for doing mathematics. One was mathematical logic and the theory of computable functions. To make the notion of 'computable' as clear and simple as possible, Alan Turing proposed in 1936 a mechanical model of what a human does when computing:

We may compare a man in the process of computing a real number to a machine which is only capable of a finite number of conditions q1, q2, ..., qR which will be called 'm-configurations.' The machine is supplied with a 'tape' (the analogue of paper) running through it, and divided into sections (called 'squares') each capable of bearing a 'symbol.' (3)
3. "On Computable Numbers, with an Application to the Entscheidungsproblem," Proceedings of the London Mathematical Society, ser.2, vol. 42(1936), 230-265; at 231.
Turing imagined, then, a tape divided into cells, each containing one of a finite number of symbols. The tape passes through a machine that can read the contents of a cell, write to it, and move the tape one cell in either direction. What the machine does depends on its current state, which includes a signal to read or write, a signal to move the tape right or left, and a shift to the next state. The number of states is finite, and the set of states corresponds to the computation. Since a state may be described in terms of five symbols (current state, input, output, shift right/left, next state), a computation may itself be expressed as a sequence of symbols, which can also be placed on the tape, thus making possible a universal machine that can read a computation and then carry it out by emulating the machine described by it. [Correction, July 2004] The original article spoke here of "three symbols (read/write, shift left/right, next state)", but as an entry in a state table, the description must begin with the current state itself and the input symbol on which the next three actions depend. Each state will have as many entries as there are input symbols.

Turing's machine, or rather his monograph, fitted into the current agenda of mathematical logic. The Entscheidungsproblem stemmed from David Hilbert's program of formalizing mathematics; as stated in the textbook he wrote with W. Ackermann,

The Entscheidungsproblem is solved when one knows a procedure by which one can decide in a finite number of operations whether a given logical expression is generally valid or is satisfiable. The solution of the Entscheidungsproblem is of fundamental importance for the theory of all fields, the theorems of which are at all capable of logical development from finitely many axioms. (4)

Having written the paper for W.H.A. Newman's senior course at Cambridge, Turing turned for graduate study to Alonzo Church at Princeton, who had recently introduced an equivalent notion of 'computability,' which he called 'effective calculability.' Turing subsequently showed that his machine had the same power as Church's lambda calculus or Stephen Kleene's recursive function theory for determining the range and limitations of axiom systems for mathematics. (5)

The other area of mathematics pertinent to the computer originated in a 1938 paper by Claude Shannon of MIT and Bell Telephone Laboratories. (6) In it Shannon had shown how Boolean algebra could be used to analyze and design switching circuits. It took some time for the new technique to take hold, and it came to general attention among electrical engineers only in the late '40s and early 1950s. The literature grew rapidly in the following years, but much of it took a standard form. Addressed to an audience of electrical engineers, most articles included an introduction to Boolean algebra, emphasizing its affinity to ordinary algebra and the rules of idempotency and duality that set it apart. Either by way of motivation at the outset, or of application at the end, the articles generally included diagrams of circuits and of various forms of switches and relays, the former to establish the correspondence between the circuits and Boolean expressions, the latter to illustrate various means of embodying basic Boolean expressions in concrete electronic units. The articles gave examples of the transformation of Boolean expressions to equivalent forms, which corresponded to the analysis of circuits. Then they turned to synthesis by introducing the notion of a Boolean function and its expression in the normal or canonical form of a Boolean polynomial. The articles showed how the specification of a switching circuit in the form of a table of inputs and outputs can be translated into a polynomial, the terms of which consist of products expressing the various combinations of inputs. From the polynomial one could then calculate the number of switches required to realize the circuit and seek various transformations to reduce that number to a minimum.

4. D. Hilbert and W. Ackermann, Grundzüge der theoretischen Logik (Berlin: Springer, 1928), 73-4: Das Entscheidungsproblem ist gelöst, wenn man ein Verfahren kennt, das bei einem vorgelegten logischen Ausdruck durch endlich viele Operationen die Entscheidung über die Allgemeingültigkeit bzw. Erfüllbarkeit erlaubt. Die Lösung des Entscheidungsproblems ist für die Theorie aller Gebiete, deren Sätze überhaupt einer logischen Entwickelbarkeit aus endlich vielen Axiomen fähig sind, von grundsätzlicher Wichtigkeit.

5. Stephen C. Kleene, "Origins of Recursive Function Theory," AHC 3,1(1981), 52-67

6. Claude E. Shannon, "A symbolic analysis of relay and switching circuits," Transactions of the AIEE 57(1938), 713-23

As the notion of an algebraic theory of switching circuits took root, authors began shifting their attention to its inadequacies. They were not trivial, especially when applied to synthesis or design of circuits, rather than analysis of given circuits. For circuits of any complexity, representation in Boolean algebra presented large, complicated expressions for reduction to minimal or canonical form, but the algebra lacked any systematic, i.e., algorithmic, procedure for carrying out that reduction. The symbolic tool meant to make circuits more tractable itself became intractable when the number of inputs grew to any appreciable size.

Neither Turing's nor Shannon's mathematics quite fit the actual computer. Turing effectively showed what a computer could not do, even if given infinite time and space to do it. But Turing's model gave little insight into what a finite machine could do. As Michael Rabin and Dana Scott observed in their seminal paper on finite automata:

Turing machines are widely considered to be the abstract prototype of digital computers; workers in the field, however, have felt more and more that the notion of a Turing machine is too general to serve as an accurate model of actual computers. It is well known that even for simple calculations it is impossible to give an a priori upper bound on the amount of tape a Turing machine will need for any given computation. It is precisely this feature that renders Turing's concept unrealistic.(7)
7. Michael O. Rabin and Dana Scott, "Finite automata and their decision problems," IBM Journal (April, 1959), 114-25; at 114.
By contrast, Boolean algebra described how the individual steps of a calculation were carried out, but it said nothing about the nature of the computational tasks that could be accomplished by switching circuits. A theory of computation apposite to the electronic digital stored-program computer with random-access memory lay somewhere between the Turing machine and the switching circuit.

Computer science arose out of the effort to fill that gap between the infinite and the finite, a region that had hitherto attracted little interest from mathematicians. Just what and who would fill it was not self-evident. In a lecture "On a logical and general theory of automata" in 1948, John von Neumann pointed out that automata and hence computing machines had been the province of logicians rather than mathematicians, and a methodological gulf separated the two:

There exists today a very elaborate system of formal logic, and, specifically, of logic as applied to mathematics. This is a discipline with many good sides, but also with certain serious weaknesses. This is not the occasion to enlarge upon the good sides, which I certainly have no intention to belittle. About the inadequacies, however, this may be said: Everybody who has worked in formal logic will confirm that it is one of the technically most refractory parts of mathematics. The reason for this is that it deals with rigid, all-or-none concepts, and has very little contact with the continuous concept of the real or of the complex number, that is, with mathematical analysis. Yet analysis is the technically most successful and best-elaborated part of mathematics. Thus formal logic is, by the nature of its approach, cut off from the best cultivated portions of mathematics, and forced onto the most difficult part of the mathematical terrain, into combinatorics.
The theory of automata, of the digital, all-or-none type, as discussed up to now, is certainly a chapter in formal logic. It will have to be, from the mathematical point of view, combinatory rather than analytical. (8)
8. Published in Cerebral Mechanisms in Behavior--The Hixon Symposium, ed. L.A. Jeffries (New York: Wiley, 1951), 1-31; repr. in Papers of John von Neumann on Computing and Computer Theory, ed. William Aspray and Arthur Burks (Cambridge, MA/London: MIT Press; Los Angeles/San Francisco: Tomash Publishers, 1987), 391-431; at 406.
Von Neumann subsequently made it clear he wanted to pull the theory back toward the realm of analysis, and he did not expand upon the nature of the combinatory mathematics that might be applicable to it.

It was not clear what tools to use in large part because it was not clear what job to do. What should a theory of computing be about? Different answers to that question led to different problems calling for different methods. Or rather, it opened the theory of computing to different approaches, shaped as much by the taste and training of the theorist as by the parameters of the problem. As Figure 1 illustrates, people came to the theory of computing from a variety of disciplinary backgrounds, most of them mathematical in form yet diverse in the particular kind of mathematics they used. Some of them were centrally concerned with computing; others took problems in computing as targets of opportunity for methods devised for other purposes altogether. In the long run, it all seemed to work out so well as to lie in the nature of the subject. But computers had no nature until theorists created one for them, and different theorists had different ideas about the nature of computers.


As von Neumann's statement makes clear, automata traditionally lay in the province of logicians, who had contemplated the mechanization of reasoning at least since Leibniz expounded the resolution of disagreements through calculation. Since logical reasoning was associated with rational thought, automata formed common ground for logicians and neurophysiologists. Before Turing himself made the case for a thinking machine and proposed his famous test in 1950, his original paper provoked Warren McCulloch and Walter Pitts in 1943 to model the behavior of connected neurons in the symbolic calculus of propositions and to show that in principle such 'nervous nets' were capable of carrying out any task for which complete and unambiguous instructions could be specified. (9) The behavior of the nets did not depend on the material of the components but only on their behavior and connectivity, and hence they could equally well be realized by switching circuits.

9. Alan M. Turing, "Computing Machinery and Intelligence," Warren S. McCulloch and Walter Pitts, "A logical calculus of the ideas immanent in nervous activity," Bulletin of Mathematical Biophysics 5(1943), 115-33.

Automata thus brought convergence to the agendas of the neurophysiologists and of the electrical engineers. Shannon's use of Boolean algebra to analyze and design switching circuits spread quickly in the late 1940s and early 1950s. In its original form, the technique applied to combinational circuits, the output of which depends only on the input. Placing a secondary relay in a circuit made its behavior dependent on its internal state as well, which could be viewed as a form of memory. Such sequential circuits could also be represented by Boolean expressions. For both circuits, the basic problem remained the reduction of complex Boolean expressions to their simplest form, which in some cases (but not all) meant the least number of components, a major concern when switches were built from expensive and unreliable vacuum tubes.

The analysis of sequential circuits gave rise then to the notion of a sequential machine, described in terms of a set of states, a set of input symbols, a set of output symbols, and a pair of functions mapping input and current state to next state and output, respectively. Starting in an initial state, a sequential machine read a sequence of symbols and outputted a string of symbols in response to it. The output was not essential to the model; one could restrict one's attention to the final state of the machine after reading the input. 'Black-boxing' the machine shifted attention from the circuits to the input and output, and thus to its description in terms of the inputs it recognizes or the transformations it carries out on them. With the shift came two new sets of questions: first, what can one say about the states of a machine on the basis simply of its input-output behavior and, second, what sorts of sequences can a machine with a finite set of states recognize and what sorts of transformations can it carry out on them?

In 1951, working from the paper by McCulloch and Pitts and stimulated by the page proofs of von Neumann's lecture quoted above, Stephen C. Kleene established a fundamental property of finite-state automata, namely that the sets of tapes recognized by them form a Boolean algebra. That is, if L1 and L2 are sets of strings recognized by a finite automaton, then so too are their union, product, complements, and iterate or closure. (10) Still working in the context of nerve nets, Kleene called such sets 'regular events,' but they soon came to be called 'regular languages,' marking thereby a new area of convergence with coding theory and linguistics.

10. "Representation of events in nerve nets and finite automata," in Automata Studies, ed. Claude Shannon and John McCarthy (Princeton: Princeton University Press, 1956), 3-41. The union of two sets of strings U and V is the set of strings belonging to one or the other; the product is the set of strings uv such that u U and v V. The complement U of U is the set of strings not in U, and the iterate of U is the union of all products un, where u0 is the empty string, and uk+1 = uuk; i.e. the 'power set' of U.

By the end of the 1950s a common mathematical model of the finite automaton had emerged among the convergent agendas, set forth in the seminal paper by Michael Rabin and Dana Scott mentioned above. Their formalism, aimed at retaining the flavor of the machine and its ties to Turing machines, became the standard for the field, beginning, significantly, with tapes formed by finite sequences of symbols from a finite alphabet . The class of all such tapes, including the empty tape , is denoted T. A finite automaton A over an alphabet is a quadruple (S, M, s0, F), where S is a finite set of states, M is a function which for every pair consisting of a state and a symbol specifies a resulting state, s0 is an initial state, and F is a subset of S comprising the final states. As defined, the automaton works sequentially on symbols, but by an obvious extension of M to sequences, it becomes a tape machine, so that M(s,x) is the state at which the machine arrives after starting in state s and traversing the tape x. In particular, if M(s0,x) is one of the final states in F, the automaton 'accepts' x. Let T(A) be the set of tapes in T accepted, or defined, by automaton A, and let be the class of all sets of tapes defined by some automaton.

Rabin and Scott pursued two related sets of questions: first, what can be determined about the set of tapes T(A) accepted by a given automaton A, and, second, what is the mathematical structure of T(A) and of the larger class ? Taking a lead from Kleene's results, they established that constitutes a Boolean algebra of sets and hence that it contains all finite sets of tapes. Two central decision problems were now immediately solvable. First, if an automaton A accepts any tapes, it accepts a tape of length less than the number of its internal states, and therefore an effective procedure exists for determining whether T(A) is empty (the 'emptiness problem'). Second, if A is an automaton with r internal states, then T(A) is infinite if and only if it contains a tape of length greater than or equal to r and less than or equal to 2r; again, an effective procedure follows for determining whether the set of definable tapes is infinite.

At two points in their study, Rabin and Scott identified the set T with the semigroup, an algebraic structure that had until then had received very little attention from mathematicians. But they did not make much use of the concept, approaching their model combinatorially and in essence constructing automata to meet the conditions of the problem. That reflected the agenda of mathematical logic, a subject still largely on the periphery of mathematics, outside its increasingly algebraic agenda. (11) Yet, as the reference to the semigroup suggests, it was making contact with algebra. The link had been forged by way of coding theory.

11. For the agenda of mathematics see inter alia Garret Birkhoff, "The Rise of Modern Algebra to 1936" and "The Rise of Modern Algebra, 1936-1950" in his Selected Papers on Algebra and Topology, ed. Gian-Carlo Rota and Joseph S. Oliveira, 561-605, and, more generally, Herbert Mehrtens, Moderne Sprache Mathematik: Eine Geschichte des Streits um die Grundlagen der Disziplin und des Subjekts formaler Systeme (Frankfurt am Main: Suhrkamp, 1990).

Coding theory arose from another path-breaking paper of Shannon's, his Mathematical Theory of Communication, first published in 1948. There Shannon offered a mathematical model of information in terms of sequences of signals transmitted over a line and posed the fundamental question of how to preserve the integrity of the sequence in the presence of distortion. The answer to that question took the form of sequential codes that by their very structure revealed the presence of errors and enabled their correction.

In a vaguely related way, circuit theory and coding theory verged on a common model from opposite directions. The former viewed a sequential machine as shifting from one internal state to another in response to input; the transition might, but need not involve output, and the final state could be used as a criterion of acceptance of the input. The main question was how to design the optimal circuit for given inputs, a matter of some importance when one worried about the cost and reliability of gates. Coding theory focused on the problem of mapping the set of messages to be transmitted into the set of signals available for transmission in such a way that the latter permit unambiguous retranslation when received. That means, first and foremost, that the stream of signals can be uniquely scanned into the constituents of the message. In addition, the process must take place as it occurs, and it should be immune to noise. That is, the code must be unambiguously decipherable as encountered sequentially, even if garbled in transmission.

It was in the context of coding theory that the agendas grouped around the finite automaton intersected with an important mathematical agenda centered in Paris. In "Une théorie algébrique du codage," Marcel P. Schützenberger, a member of Pierre Dubreil's seminar on algebra and number theory at the Faculté des Sciences in Paris, linked coding to the 'fundamental algebraic structure' of Bourbaki:

It is particularly noteworthy that the fundamental concepts of semigroup theory, introduced by P. Dubreil in 1941 and studied by him and his school from an abstract point of view, should have immediate and important interpretations on the level of the concrete realization of coding machines and transducers. (12)

12. "Il est particulièrement remarquable que les concepts fondamentaux de [la théorie des demi-groupes], introduits par P. Dubreil en 1941, et étudiés par lui-même et son école du point de vue abstrait, aient des interprétations immédiates et importantes sur le plan de la réalisation concrète des machines codeuses ou transcodeuse." M.P.Schützenberger, "Une théorie algébrique du codage," Séminaire P. Dubreil et C. Pisot (Faculté des Sciences de Paris), Année 1955/56, No.15 (dated 27 February 1956); p.15-02. Cf. "On an application of semi groups[!] methods to some problems in coding," IRE Transactions in Information Theory 2,3(1956), 47-60. Pierre Dubreil was a member of Bourbaki and one of the foremost exponents of semigroups.
Schützenberger translated the problem of coding into the conceptual structure of abstract semigroup theory by viewing a sequence of symbols as a semigroup and codes as homomorphisms between semigroups. The details are too complex to follow here, but the trick lay in determining that the structural properties of semigroups and their homomorphisms gave insight into the structure of codes. (13) Schützenberger showed that they did and extended his analysis from the specific question of codes to the more general problem of identifying the structures of sequences of symbols with the structure of devices that recognize those structures, that is, to the problem of automata.

13. A semigroup consists of a set of objects and an associative operation; that is, if s1 and s2 are elements of the set and denotes the operation, then s1 s2 is also an element of the set, and s1(s2s3) = (s1s2)s3. A homomorphism is a mapping from one mathematical structure to another that preserves operations. For example, the logarithmic function is a homomorphism between positive real numbers under multiplication and real numbers under addition: log AB = log A + log B.

The resulting semigroup theory of machines thus gave a mathematical account of sequential machines that pulled together a range of results and perspectives. More importantly, perhaps, it suggested mathematical structures for which there was at the time no correlate in automata theory. Within Schützenberger's mathematical agenda, one moved from basic structures to their extension by means of other structures. In "Un problème de la théorie des automates" he set out the semigroup model of the finite automaton and then looked beyond it, aiming at an algebraic characterization based on viewing a regular language as a formal sum, or formal power series, x of the strings it comprises and considering the series as a 'rational' function of the x X:

We then generalize that property by showing that if, instead of being finite, S is the direct product of a finite set by Z (the additive group of integers) and if the mapping (S, X) S' and the subset S1 are defined in a suitable fashion, then the corresponding sum is in a certain sense 'algebraic.'
That is, the extended sum takes the form <z,x>x, where <z,x> is the integer mapped to the string x. The terminology reflected the underlying model. Were the semigroup of tapes commutative, the sums would correspond to ordinary power series, to which the terms 'rational' and 'algebraic' would directly apply. The generalization was aimed at embedding automata in the larger context of the algebra of rings.

Initially, Schützenberger suggested no interpretation for the extension; that is, it corresponded to no new class of automata or of languages accepted by them. In fact, both the languages and the automata were waiting in the wings. They came on stage through the convergence of finite automata with yet another previously independent line of investigation, namely the new mathematical linguistics under development by Noam Chomsky. The convergence of agendas took the concrete form of Schützenberger's collaboration with Chomsky on context-free grammars, then just getting underway. The result was the mathematical theory of formal languages.


Formal language theory similarly rested at the start on an empirical base. Chomsky's seminal article, "Three models of language" (1956), considered grammar as a mapping of finite strings of symbols into all and only the grammatical sentences of the language:

The grammar of a language can be viewed as a theory of the structure of this language. Any scientific theory is based on a certain finite set of observations and, by establishing general laws stated in terms of certain hypothetical constructs, it attempts to account for these observations, to show how they are interrelated, and to predict an indefinite number of new phenomena. A mathematical theory has the additional property that predictions follow rigorously from the body of theory.

The theory he was seeking concerned the fact of grammars, rather than grammars themselves. Native speakers of a language extract its grammar from a finite number of experienced utterances and use it to construct new sentences, all of them grammatical, while readily rejecting ungrammatical sequences. Chomsky was looking for a metatheory to explain how to accomplish this feat, or at least to reveal what it entailed. (Ultimately the physical system being explained was the human brain, the material basis of both language and thought.) In 1958 he made the connection with finite automata through a collaborative investigation with George Miller on finite state languages and subsequently joined forces with Schützenberger, then a visitor at MIT.

As powerful as the semigroup seemed to be in capturing the structure of a finite automaton, the model was too limited by its generality to handle the questions Chomsky was asking. It recognized languages produced by translations from tokens to strings of the form α x, α xβ, (α, β non-terminals, x a string of terminals), by then called 'finite state,' or 'regular' languages, but not languages generated by recursive productions such as α uαv, α x, by which sequences may be embedded in other sequences an arbitrary number of times. These were the languages determined by the phrase structure grammars introduced by Chomsky in 1959, foremost among them the context-free grammars, and modeling them mathematically required more structure than the monoid could provide. (14) That was reflected in the need for internal memory, represented by the addition of a second (two-way) tape. It was also reflected in the fact, quickly ascertained, that the languages are not closed under complementation or intersection and hence cannot be represented by something so tidy as a Boolean algebra.

14. Chomsky's phrase structure grammars.
What made the context-free grammars so important was the demonstration by Seymour Ginsburg and H.G. Rice in 1961 that large portions of the grammar of the new internationally designed programming language, Algol, as specified in John Backus's new formal notation, BNF (Backus Normal Form, or later Backus-Naur Form), based on the productions of Emil Post, are context-free. When the productions are viewed as set equations, the iterated solutions form lattices, which have as fixpoints the final solutions that correspond to the portions of the language specified by the productions. Of course, were Algol as a whole context-free, then the language so determined would consist of all syntactically valid programs. Here, Schützenberger could bring his model of formal power series to bear. In a group of papers in the early 1960s, he showed that the series he had characterized as 'algebraic' were the fixpoint solutions of context-free grammars. The approach through formal power series also enabled him to show that the grammars were undecidably ambiguous and that they could be partitioned into two distinct subfamilies, only one of which fitted Chomsky's hierarchy.

Several agendas again converged at this point, as a mix of theoretical and practical work in both the United States and Germany on parenthesis-free notation, sequential formula translation for compilers, management of recursive procedures, and syntactic analysis for machine translation more or less independently led to the notion of the 'stack' (in German, Keller), or list of elements accessible at one end only on a last in-first out (LIFO) basis. In the parlance of automata theory, a stack provided a finite automaton with additional memory in the form of a tape that ran both ways but moved only one cell at a time and contained only data placed there by the automaton during operation. Schützenberger and Chomsky showed that such 'pushdown automata' correspond precisely to context-free languages and embedded the correspondence in Schützenberger's algebraic model.

The development, hand-in-hand, of automata theory and formal language theory during the late 1950s and early 1960s gave an empirical aspect to the construction of a mathematical theory of computation. The matching of Chomsky's hierarchy of phrase-structure grammars with distinct classes of automata enlisted such immediate assent precisely because it encompassed a variety of independent and seemingly disparate agendas, both theoretical and practical, ranging from machine translation to the specification of ALGOL and from noise-tolerant codes to natural language. These it united theoretically in much the same way that Newton's mechanics united not only terrestrial and celestial phenomena but also a variety of agendas in mathematics, mechanics, and natural philosophy. By the late 1960s, the theory of automata and formal languages had assumed canonical shape around an agenda of its own. In 1969, John Hopcroft and Jeffrey Ullman began their text, Formal Languages and Their Relation to Automata, by highlighting Chomsky's mathematical grammar, the context-free definition of ALGOL, syntax-directed compilation, and the concept of the compiler-compiler.

Since then a considerable flurry of activity has taken place, the results of which have related formal languages and automata theory to such an extent that it is impossible to treat the areas separately. By now, no serious study of computer science would be complete without a knowledge of the techniques and results from language and automata theory.
Far from merely complementing the study of computer science, the subject of automata and formal languages became the theoretical core of the curriculum during the 1970s, especially as it was embedded in such tools as lexical analyzers and parser generators. With those tools, what had once required years of effort by teams of programmers became an undergraduate term project.

While the bulk of that new agenda focused on grammars and their application to programming, part of it pulled the theory back toward the mathematics from which it had emerged. Increasingly pursued at a level of abstraction that tended to direct attention away from the physical system, the theory of automata verged at times on a machine theory of mathematics rather than a mathematical theory of the machine. "It appeared to me," wrote Samuel Eilenberg, co-creator of category theory, in the preface of his four-volume Automata, Languages, and Machines (1974),

that the time is ripe to try and give the subject a coherent mathematical presentation that will bring out its intrinsic aesthetic qualities and bring to the surface many deep results which merit becoming part of mathematics, regardless of any external motivation.
Characterized by constructive algorithms rather than proofs of existence, those results constituted, he went on, nothing less than a "new algebra, ... contain[ing] methods and results that are deep and elegant" and that would someday be "regarded as a standard part of algebra." That is, its ties to the computer would disappear.


Yet, even when the computer remained at the focus, some practitioners expressed doubt that formal language theory would suffice as a theory of computation, since computing involved more than the grammar of programming languages could encompass. As noted above, the practical activity of programming created the theoretical space to be filled by a mathematical model of computation. As the size and complexity of the programs expanded, driven by a widely shared sense of the all but unlimited power of the computer as a 'thinking machine,' so too did the expectations of what would constitute a suitable theory. One concept of a "mathematical theory of computation" stems from two papers in the early 1960s by John McCarthy, the creator of LISP and a progenitor of 'artificial intelligence,' yet another agenda of computing, with links both to neurophysiology and to logic. It was McCarthy who invoked Newton as historical precedent for the enterprise:

In this paper I shall discuss the prospects for a mathematical science of computation. In a mathematical science, it is possible to deduce from the basic assumptions, the important properties of the entities treated by the science. Thus, from Newton's law of gravitation and his laws of motion, one can deduce that the planetary orbits obey Kepler's laws.
In another version of the paper, he changed the precedent only slightly:

It is reasonable to hope that the relationship between computation and mathematical logic will be as fruitful in the next century as that between analysis and physics in the last. The development of this relationship demands a concern for both applications and for mathematical elegance.
The elegance proved easier to achieve than the applications. The trick was to determine just what constituted the entities of computer science.

McCarthy made clear what he expected from a suitable theory in terms of computing: first, a universal programming language along the lines of Algol but with richer data descriptions; second, a theory of the equivalence of computational processes, by which equivalence-preserving transformations would allow a choice of among various forms of an algorithm, adapted to particular circumstances; third, a form of symbolic representation of algorithms that could accommodate significant changes in behavior by simple changes in the symbolic expressions; fourth, a formal way of representing computers along with computation; and finally a quantitative theory of computation along the lines of Shannon's measure of information. What these criteria, recognizably drawn from the variety of agendas mentioned above, came down to in terms of McCarthy's precedents was something akin to traditional mathematical physics: a dynamical representation of a program such that, given the initial values of its parameters, one could determine its state at any later point. Essential to that model was the notion of converging on complexity by perturbations on a fundamental, simple solution, as in the case of a planetary orbit based initially on a two-body, central force configuration. Formal language theory shared with machine translation a similar assumption about converging on natural language through increasingly complex artificial languages.

For McCarthy, that program meant that automata theory would not suffice. Its focus on the structure of the tape belied both the architecture and the complexity of the machine processing the tape.As McCarthy put it,

Computer science must study the various ways elements of data spaces are represented in the memory of the computer and how procedures are represented by computer programs. From this point of view, most of the current work on automata theory is beside the point.

It did not suffice to know that a program was syntactically correct. One should be able to give a mathematical account of its semantics, that is, of the meaning of the symbols constituting it in terms of the values assigned to the variables and of the way in which the procedures change those values.

Such an account required a shift in the traditional mathematical view of a function as a mapping of input values to output values, that is, as a set of ordered pairs of values. To study how procedures change values, one must be able to express and transform the structure of the function as an abstract entity in itself, a sequence of operations by which the output values are generated from the input values. Seeking an appropriate tool for that task, McCarthy reached back to mathematical logic in the 1930s to revive Alonzo Church's lambda calculus, originally devised in 1929 in an effort to find a type-free route around Russell's paradox. (15) The attempt failed, and Church had long since abandoned the approach. The notation, however, seemed to fit McCarthy's purposes and, moreover, lay quite close structurally to LISP, which he had been using to explore mechanical theorem-proving.

15. As a system of notation the lambda calculus rewrites the usual functional notation, say, f(x) = x2 + 1, to abstract f itself as an operation on a variable and to designate the variable(s) bound by the function, e.g. f = λx.[x2 + 1]. f acquires a value by applying that operation to an argument, e.g. f(3) = λx.[x2 + 1](3) = 10. Lambda expressions may be applied to other lambda expressions as arguments; indeed, that is where its power lies as a tool of mathematical logic. Such applications quickly lead to quite complicated expressions reducible in several different ways. The original agenda involved demonstrating the existence of normal forms and the equivalence of reductions to them. One of Turing's first extensions of his 1936 paper was to show that the lambda calculus and Turing machines are of equivalent power as models of computability.

The developments leading up to the creation of denotational, or mathematical, semantics in 1970 are too complicated to pursue in detail here. McCarthy's own efforts faltered on the lack of a mathematical model for the lambda calculus itself. It also failed to take account of the peculiar problem posed by the storage of program and data in a common memory. Because computers store programs and data in the same memory, programming languages allow unrestricted procedures which could have unrestricted procedures as values; in particular a procedure can be applied to itself. "To date," Dana Scott claimed in his "Outline of a mathematical theory of computation" in 1970,

no mathematical theory of functions has ever been able to supply conveniently such a free-wheeling notion of function except at the cost of being inconsistent. The main mathematical novelty of the present study is the creation of a proper mathematical theory of functions which accomplishes these aims (consistently!) and which can be used as the basis for the metamathematical project of providing the 'correct' approach to semantics.

One did not need unrestricted procedures to appreciate the problems posed by the self-application facilitated by the design of the computer. Consider the structure of computer memory, representing it mathematically as a mapping of contents to locations. That is, a state σ is a function mapping each element l of the set L of locations to its value σ(l) in V, the set of allowable values. A command effects a change of state; it is a function from the set of states S into S. Storing a command means that can take the form σ(l), and hence σ(l)(σ) should be well defined. Yet, as Scott insisted in his paper, "[t]his is just an insignificant step away from the self-application problem p(p) for 'unrestricted' procedures p, and it is just as hard to justify mathematically."

Recent work on interval arithmetic suggested to Scott that one might seek justification through a partial ordering of data types and their functions based on the notion of 'approximation' or 'informational content.' With the addition of an undefined element as 'worst approximation' or 'containing no information,' the data types formed a complete lattice, and monotonic functions of them preserved the lattice. They also preserved the limits of sequences of partially ordered data types and hence were continuous. Scott showed that the least upper bound of the lattice, considered as the limit of sequences, was therefore the least fixed point of the function and was determined by the fixed point operator of the lambda calculus. Hence self-applicative functions of the sort needed for computers had a consistent mathematical model. And so too, by the way, did the lambda calculus for the first time in its history. Computer science had come around full circle to the mathematical logic in which it had originated.


Beginning in 1975 and extending over the late 1970s, the Computer Science and Engineering Research Study, chaired by Bruce Arden of Princeton University, took stock of the field and its current directions of research and published the results under the title What Can Be Automated? The committee on theoretical computer science argued forcefully that a process of abstraction was necessary to understand the complex systems constructed on computers and that the abstraction "must rest on a mathematical basis" for three main reasons.

(1) Computers and programs are inherently mathematical objects. They manipulate formal symbols, and their input-output behavior can be described by mathematical functions. The notations we use to represent them strongly resemble the formal notations which are used throughout mathematics and systematically studied in mathematical logic.
(2) Programs often accept arbitrarily large amounts of input data; hence, they have a potentially unbounded number of possible inputs. Thus a program embraces, in finite terms, an infinite number of possible computations; and mathematics provides powerful tools for reasoning about infinite numbers of cases.
(3) Solving complex information-processing problems requires mathematical analysis. While some of this analysis is highly problem-dependent and belongs to specific application areas, some constructions and proof methods are broadly applicable, and thus become the subject of theoretical computer science. (16)
16. What Can Be Automated?, 139.
Defining theoretical computer science as "the field concerned with fundamental mathematical questions about computers, programs, algorithms, and information processing systems in general," the committee acknowledged that those questions tended to follow developments in technology and its application, and hence to aim at a moving target -- strange behavior for mathematical objects.

Nonetheless, the committee could identify several broad issues of continuing concern, which were being addressed in the areas of computational complexity, data structures and search algorithms, language and automata theory, the logic of computer programming, and mathematical semantics. In each of these areas, it could point to substantial achievements in bringing some form of mathematics to bear on the central questions of computing. Yet, the summaries at the end of each section sounded a repeating chord. For all the depth of results in computational complexity, "the complexity of most computational tasks we are familiar with -- such as sorting, multiplying integers or matrices, or finding shortest paths -- is still unknown."(215) Despite the close ties between mathematics and language theory, "by and large, the more mathematical aspects of language theory have not been applied in practice. Their greatest potential service is probably pedagogic, in codifying and given clear economical form to key ideas for handling formal languages."(234) Efforts to bring mathematical rigor to programming quickly reach a level of complexity that makes the techniques of verification subject to the very concerns that prompted their development.

One might hope that the above ideas, suitably extended and incorporated into verification systems, would enable us to guarantee that programs are correct with absolute certainty. We are about to discuss certain theoretical and philosophical limitations that will prevent this goal from ever being reached. These limitations are inherent in the program verification process, and cannot be surmounted by any technical innovations.(246)

Mathematical semantics could show "precisely why [a] nasty surprise can arise from a seemingly well-designed programming language," but not how to eliminate the problems from the outset. As a design tool, mathematical semantics was still far from the goal of correcting the anomalies that gave rise to errors in real programming languages.

If computers and programs were "inherently mathematical objects," the mathematics of the computers and programs of real practical concern had so far proved elusive. Although programming languages borrowed the trappings of mathematical symbolism, they did not translate readily into mathematical structures that captured the behavior of greatest concern to computing. To a large extent, theoretical computer science continued the line of inquiry that had led to the computer in the first place, namely, the establishment of the limits of what can be computed. While that had some value in directing research and development away from dead ends, it offered little help in resolving the problems of computing that lay well within those limits. For that reason, computer science remains an amalgam of mathematical theory, engineering practice, and craft skill.


ACM Turing Award Lectures: The First Twenty Years, 1966 to 1985
(New York: Reading, Mass.: ACM Press; Addison-Wesley Pub. Co., 1987); lectures since 1986 published annually in Communications of the ACM

William Aspray (ed.), Computing Before Computers (Ames: Iowa State University Press, 1990)

Bruce W. Arden (ed.), What Can Be Automated?: The Computer Science and Engineering Research Study (COSERS) (Cambridge, MA: MIT Press, 1980)

Thomas J. Bergin, Jr. and Richard G. Gibson, Jr. (eds.), History of Programming Languages II (New York : ACM Press; Reading, Mass.: Addison-Wesley Pub. Co., 1996)

Martin Campbell-Kelly and William Aspray, Computer: A History of the Information Machine (New Yorl: Basic Books, 1996)

Jan van Leeuwen (ed.), Handbook of Theoretical Computer Science (2 vols., Amsterdam; New York: Elsevier; Cambridge, Mass.: MIT Press, 1990)

Herbert Mehrtens, Moderne Sprache Mathematik: Eine Geschichte des Streits um die Grundlagen der Disziplin und des Subjekts formaler Systeme (Frankfurt am Main: Suhrkamp, 1990)

Richard L. Wexelblat (ed.) History of Programming Languages (New York : Academic Press, 1981)