SOFTWARE: THE SELF-PROGRAMMING MACHINE

Michael S. Mahoney
Princeton University

(Published in From 0 to 1: An Authoritative History of Modern Computing, ed. A. Akera and F. Nebeker, New York: Oxford U.P., 2002)

The Importance of Software

In May 1973 Datamation published a Rand report filed six months earlier by Barry Boehm and based on studies undertaken by the Air Force Systems Command, which was concerned about the growing mismatch between its needs and its resources in the design and development of computer-based systems. Titled "Software and its Impact: A Quantitative Assessment", the article attached numbers to the generally shared sense of malaise in the industry: software was getting more and more costly. Drawing on various empirical studies of programming and programmers undertaken in the late 1960s, Boehm tried to indicate where to look for relief by disaggregating the costs into the major stages of software projects. Perhaps the most striking visualization of the problem was a graph with a flattened logistic curve illustrating the inversion of the relative costs of hardware and software over the thirty-year period 1955-85. Whereas software had constituted less than 20% of the cost of a system in 1955, current trends suggested that it would make up over 90% by 1985. At the time of Boehm's study, software's share already stood at 75%.

Boehm's article belongs to the larger issue of the "software crisis" and the origins of software engineering, to which I shall return presently, but for the moment it also serves to make a historiographical point. Software development has remained a labor-intensive activity, an art rather than a science. Indeed, that is what computer people have found so troublesome and what some have tried to remedy. Boehm's figures show that by 1970 some three-quarters of the productive energies of the computer industry were going into software. By then at the latest, the history of computing had become the history of software.

At present the literature of the history of computing does not reflect that fact. Except perhaps for the major programming languages, the story of software has been largely neglected. The history of areas such as operating systems, databases, graphics, real-time and interactive computing still lies in past survey articles, prefaces of textbooks, and retrospectives by the people involved. When one turns from systems software to applications programming, the gap widens. Applications, after all, are what make the computer worth having; without them a computer is of no more utility or value than a television set without broadcasting. James Cortada has provided a start toward a history of applications through his quite useful bibliographic guide, but there are studies of only a few of the largest and most famous programs, such as SAGE, SABRE, and ERMA. We have practically no historical accounts of how, starting in the early 1950s, government, business, and industry put their operations on the computer. Aside from a few studies with a primarily sociological focus in the 1970s, programming as a new technical activity and programmers as a new labor force have received no historical attention. Except for very recent studies of the origins and development of the Internet, we have no substantial histories of the word processor, the spreadsheet, communications, or the other software on which the personal computer industry and some of the nation's largest personal fortunes rest.

Software, then, presents a huge territory awaiting historical exploration, with only a few guideposts by which to maintain one's bearings. One guiding principle in particular seems clear: if application software is about getting the computer to do something useful in the world, systems software is about getting the computer to do the applications programming. It is the latter theme that I shall mainly pursue here. Eventually, I shall come back to applications programming by way of software engineering, but only insofar as it touches on the main theme.

Programming Computers

Basically, programming is a simple, logical procedure, but as the problems to be solved grow, the labor of programming also increases, and the aid of the computer is enlisted to devise its own programs. (Werner Buchholz, 1953)

The idea of programs that write programs is inherent in the concept of the universal Turing machine, set forth by Alan M. Turing in 1936. He showed that any computation can be described in terms of a machine shifting among a finite number of states in response to a sequence of symbols read and written one at a time on a potentially infinite tape. Since the description of the machine can itself be expressed as a sequence of symbols, Turing went on to describe a universal machine which can read the description of any specific machine and then carry out the computation it describes. The computation in question can very well be a description of a universal Turing machine, a notion which John von Neumann pursued to its logical conclusion in his work on self-replicating automata. As a form of Turing machine, the stored-program computer is in principle a self-programming device, limited in practice by finite memory. That limitation seemed overwhelming at first, but in the mid-1950s, the concept of computer-assisted programming began to meet with striking success in the form of programming languages, programming and operating systems, and databases and report generators.
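To see the idea in miniature, consider a sketch in a modern language (Python here, and purely hypothetical rather than anything from the period): the transition table describing a machine is ordinary data, the same kind of symbol sequence a universal machine reads, and a single generic interpreter will execute whatever table it is handed.

    # A minimal sketch, assuming nothing beyond standard Python: a machine's
    # description is just data (a transition table), and one generic
    # interpreter runs any such description -- the essence of Turing's
    # universal machine. Hypothetical illustration, not period code.

    def run(table, tape, state="start", head=0, blank="_", halt="halt", max_steps=1000):
        """Execute the machine described by `table` on `tape` (a list of symbols)."""
        for _ in range(max_steps):
            if state == halt:
                break
            if head >= len(tape):              # extend the tape on demand
                tape.append(blank)
            write, move, state = table[(state, tape[head])]   # look up the rule
            tape[head] = write
            head += 1 if move == "R" else -1   # this sample machine only moves right
        return "".join(tape)

    # A one-state machine that flips every bit and halts at the first blank.
    inverter = {
        ("start", "0"): ("1", "R", "start"),
        ("start", "1"): ("0", "R", "start"),
        ("start", "_"): ("_", "R", "halt"),
    }

    print(run(inverter, list("10110")))        # prints 01001_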

Indeed, that success emboldened people to think about programming languages and programming environments that would obviate the need for programmers in the long run and in the meantime bring them under increasingly effective managerial control. By 1961 Herbert A. Simon was not alone in predicting that

... we can dismiss the notion that computer programmers will become a powerful elite in the automated corporation. It is far more likely that the programming occupation will become extinct (through the further development of self-programming techniques) than that it will become all powerful. More and more, computers will program themselves; and direction will be given to computers through the mediation of compiling systems that will be completely neutral so far as the content of the decision rules is concerned.(1)
Simon was talking about 1985, yet, as we near the millennium, programmers are neither extinct nor even an endangered species. Indeed, old COBOL programmers have recently found renewed life in patching the Y2K problem.

Coincidentally, Simon's remarks were reprinted by John Diebold in 1973, which is just about the point of transition between the successful and the less successful phases of the project of the self-programming computer. By the early '70s, the basic elements of current systems software were in place, and development efforts since then have been aimed largely at their refinement and extension. With few exceptions, the programming languages covered in the two ACM History of Programming Languages conferences in 1978 and 1993 were conceived before 1975. They include the major languages currently in use for applications and systems programming. In particular, C and Unix both date from the turn of the '70s, as do IBM's current operating systems. The graphical user interfaces (GUIs) of Windows and MacOS rest on foundations laid at Stanford in the 1960s and Xerox PARC in the early and mid-1970s. The seminal innovations in both local and wide-area networking also date from that time. Developments since then have built on those foundations.

By 1973, too, "software engineering" was underway as a conscious effort to resolve the problems of producing software on time, within budget, and to specifications. Among the concepts driving that effort, conceived of as a form of industrial engineering, is the "software factory", either on the Taylorist model of "the one best way" of programming enforced by the programming environment or on the Ford model of the assembly line, where automated programming removes the need for enforcement by severely reducing the role of human judgment. At a conference in August 1996 on the history of software engineering, leading figures of the field agreed only that after almost thirty years, whatever form it might eventually take as an engineering discipline, it wasn't one yet. While software development environments have automated some tasks, programming "in the large" remains a labor-intensive form of craft production.

So we can perhaps usefully break systems software up into programming tools and programming environments on the one hand and software development (or, if you prefer, software engineering) on the other. Both fall under the general theme of getting the computer to do the programming. Both have become prerequisites to getting the computer to do something useful.

Programming Tools

It is a commonplace that a computer can do anything for which precise and unambiguous instructions can be given. The difficulties of programming computers seem to have caught their creators by surprise. Werner Buchholz's optimism is counterbalanced by Maurice Wilkes's realization that he would be spending much of his life debugging programs.(2) On a larger scale, companies that introduced computers into their operations faced the problem of communication between the people who knew how the organization worked and those who knew how the computer worked. IBM had built its electrical accounting machinery (EAM) business in large part by providing that mediation through its sales staff, whose job it was to match IBM's equipment to the customer's business. At first it seemed that computers meant little more than changing the "E" in "EAM" from "Electrical" to "Electronic", but experience soon showed otherwise. Programming the computer proved to be difficult, time-consuming, and error-prone. Even when completed, programs required maintenance in the form of addition of functions not initially specified, adjustment of unanticipated outcomes, and correction of previously undetected mistakes. With each change of computer to a larger or newer model came the need to repeat the programming process from the start, since the old code would not run on the new machine. The situation placed a strain on both the customer and IBM, and together with other manufacturers they therefore shared an interest in means of easing and speeding the task of programming and of making programs compatible with a variety of computers.

In addition to having to work within the confines of the machine's instruction set and hardware protocols, one had to do one's own clerical tasks of assigning variables to memory and of keeping track of the numerical order of the instructions. The last became a systematic problem on Cambridge's Electronic Delay Storage Automatic Calculator (EDSAC) as the notion of a library of subroutines took hold, necessitating the incorporation of the modules at various points in a program. Symbolic assemblers began to appear in the early 1950s, enabling programmers to number instructions provisionally for easy insertion, deletion, and reference and, more important, turning over to the assembler the allocation of memory for symbolically denoted variables.(3) Although symbolic assemblers took over the clerical tasks, they remained tied to the basic instruction set, albeit mnemonic rather than numeric. During the late '50s macro assemblers enabled programmers to group sequences of instructions as functions and procedures (with parameters) in forms closer to their own way of thinking and thus to extend the instruction set.
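The clerical relief an assembler offered can be suggested by a small sketch, using hypothetical mnemonics and a modern language rather than any historical assembler's notation: a first pass assigns an address to every instruction and records the labels, and a second pass substitutes those addresses for the symbolic names.

    # A sketch of the clerical work a symbolic assembler took over, with
    # hypothetical mnemonics and modern Python rather than any historical
    # machine's notation. Pass 1 assigns an address to every instruction and
    # records the labels; pass 2 substitutes numeric addresses for the names.

    source = [
        "loop:  LOAD  x",
        "       ADD   one",
        "       STORE x",
        "       JUMP  loop",
        "x:     DATA  0",
        "one:   DATA  1",
    ]

    # Pass 1: walk the program, numbering each line and recording label addresses.
    symbols, lines = {}, []
    for addr, line in enumerate(source):
        if ":" in line:
            label, line = line.split(":", 1)
            symbols[label.strip()] = addr
        lines.append(line.split())

    # Pass 2: replace every symbolic operand with its numeric address.
    for addr, (op, *operands) in enumerate(lines):
        resolved = [str(symbols.get(o, o)) for o in operands]
        print(addr, op, *resolved)             # e.g. "0 LOAD 4", "3 JUMP 0"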

The first high-level programming languages, perhaps most famously FORTRAN in 1957, followed over the next three years by LISP, COBOL, and Algol, took a quite different approach to programming by differentiating between the language in which humans think about problems and the language by which the machine is addressed. To the clerical tasks of the assembler, compilers and interpreters added the functions of parsing the syntax and construing the semantics of the human-oriented programming language and then translating them into the appropriate sequences of assembler or machine instructions. At first, as with FORTRAN, developers of compilers strove for little more than a program that would fit into the target machine and that would produce reasonably efficient code. Once they had established the practicality of compilers, however, they shifted their goals.

In translating human-oriented languages into machine code, compilers separated programming from the machines on which the programs ran: "ALGOL 60 is the name of a notation for expressing computational processes, irrespective of any particular uses or computer implementations," said one of its creators.(4) Subsequently, the design of programming languages increasingly focused on the forms of computational reasoning best suited to various domains of application, while the design of compilers attended to the issues of accurate translation across a range of machines. With that shift of focus at the turn of the 1960s, the development of programming languages and their compilers converged with research in theoretical computer science. That research first established the general principles underlying lexical analysis and the parsing of formal languages, and then embodied those principles in general programs that moved from a formal specification of the vocabulary and grammar of a language to the corresponding lexical analyzer and parser. The generated analyzer and parser not only resolved the source program into its constituents and verified its syntactical correctness but also allowed the incorporation of preset blocks of machine code associated with those constituents, producing the compiler itself. By means of such tools, for example lex and yacc in the Unix system, a compiler that in the late 1950s would have required several staff-years became feasible for a pair of undergraduates in a semester. By contrast, automatic generation of code, or the translation of the abstract terms of the programming language into the concrete instruction set of the target machine, proved more resistant to theoretical understanding (formal semantics) and thus to automation, especially in a form that assured semantic invariance across platforms.
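The scale of that change is easiest to appreciate in miniature. The following toy is a hypothetical sketch, not lex and yacc themselves: a parser written line for line from a two-rule grammar for arithmetic expressions, which verifies the syntax and emits code for an imaginary stack machine as it goes.

    # A toy sketch of grammar-driven compilation -- not lex and yacc, but the
    # same idea, under the assumption of a two-rule grammar:
    #     expr -> term (('+' | '-') term)*      term -> NUMBER
    # The parser follows the grammar rule for rule, verifying the syntax and
    # emitting instructions for an imaginary stack machine as it goes.

    import re

    def compile_expr(text):
        tokens = re.findall(r"\d+|[+\-]", text)    # lexical analysis
        code, pos = [], 0

        def term():
            nonlocal pos
            if pos >= len(tokens) or not tokens[pos].isdigit():
                raise SyntaxError(f"number expected at token {pos}")
            code.append(("PUSH", int(tokens[pos])))
            pos += 1

        term()                                     # expr -> term ...
        while pos < len(tokens) and tokens[pos] in "+-":
            op = "ADD" if tokens[pos] == "+" else "SUB"
            pos += 1
            term()                                 # ... (('+' | '-') term)*
            code.append((op,))
        return code

    print(compile_expr("12 + 5 - 3"))
    # [('PUSH', 12), ('PUSH', 5), ('ADD',), ('PUSH', 3), ('SUB',)]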

Systems Software

"Problem-oriented languages", as they were called, were designed to facilitate the work of programmers by freeing them from the operational details of the computer or computers on which their programs would run. The more abstract the language, the more it depended on a programming system to supply those details whether through a library of standard routines or through compilers, linkers, and loaders that fitted the program to the mode of operation of the particular computer. Thus software aimed at shielding the programmer from the machine intersected with software, namely operating systems, meant to shield the machine from the programmer.

Operating systems emerged in the mid-1950s, largely out of concerns to enhance the efficiency of computer operations by minimizing non-productive time between runs. Rather than allowing programmers to set up and run their jobs one by one, the systems enabled operators to load a batch of programs with accompanying instructions for set-up and turn them over to a supervisory program to run them and to alert the operators when their intervention was required. With improvements in hardware, the systems expanded to include transfer and allocation of tasks among several processors (multiprocessing), in particular separating slower I/O operations from the main computation. At the turn of the 1960s, with the development of techniques for handling communications between processors, multiprogramming systems began running several programs in common memory, switching control back and forth among them according to increasingly sophisticated scheduling algorithms and memory-protection schemes.
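The core of the multiprogramming idea can be suggested by a deliberately simplified sketch in a modern language, with hypothetical job names and running times: several jobs share the processor, and a supervisory routine switches control among them, here by the simple round-robin policy of granting each job a fixed quantum of time.

    # A deliberately simplified sketch of multiprogramming (hypothetical job
    # names and times): several jobs share the processor, and a supervisory
    # routine switches control among them, here by the simple round-robin
    # policy of charging each job a fixed quantum of time.

    from collections import deque

    def round_robin(jobs, quantum=2):
        """jobs: dict of name -> time units still needed. Returns an execution trace."""
        ready, trace, clock = deque(jobs.items()), [], 0
        while ready:
            name, remaining = ready.popleft()
            ran = min(quantum, remaining)          # run for at most one quantum
            clock += ran
            trace.append((name, ran, clock))
            if remaining > ran:                    # not finished: back of the queue
                ready.append((name, remaining - ran))
        return trace

    for name, ran, t in round_robin({"payroll": 5, "report": 3, "backup": 4}):
        print(f"t={t:2d}  {name} ran {ran} unit(s)")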

The development of hardware and software for rapid transfer of data between core and secondary storage essentially removed the limits on the former by mapping it into the latter by segments, or "pages", and swapping them in and out as required by the program currently running. Such a system could then circulate control among a large number of programs, some or all of which could be processes interacting online with users at consoles (time-sharing). With each step in this development, applications programmers moved farther down an expanding hierarchy of layers of control that intervened between them and the computer itself. Only the layer at the top corresponded to a real machine; all the rest were virtual machines requiring translation to the layer above. Indeed, in IBM's OS/360 even the top layer was a virtual machine, translated by microprogrammed firmware into the specific instruction sets of the computers making up System/360. Despite appearances of direct control, this layering of abstract machines was as true of interactive systems as of batch systems. It remains true of current personal computing environments, the development of which has for the most part recapitulated the evolution of mainframe systems, adding to them a new layer of graphical user interfaces (GUIs). For example, Windows NT does not allow any application to communicate directly with the basic I/O system (BIOS), thus disabling some DOS and Windows 9x software.
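Again a simplified sketch may help, hypothetical and with a naive first-in-first-out replacement policy standing in for the more elaborate schemes actual systems used: a handful of physical frames backs a larger virtual space, pages are brought in as they are referenced, and the oldest resident page is swapped out when no frame is free.

    # A simplified sketch of demand paging (hypothetical, with a naive
    # first-in-first-out replacement policy standing in for the more elaborate
    # schemes real systems used): a few physical frames back a larger virtual
    # space; a page is swapped in when referenced, and the oldest resident
    # page is swapped out when no frame is free.

    from collections import OrderedDict

    class PagedMemory:
        def __init__(self, frames):
            self.frames = frames                   # number of physical frames
            self.resident = OrderedDict()          # page -> frame, in load order
            self.faults = 0

        def touch(self, page):
            if page in self.resident:              # already in core: no work to do
                return
            self.faults += 1                       # page fault: must bring the page in
            if len(self.resident) >= self.frames:
                evicted, frame = self.resident.popitem(last=False)
                print(f"swap out page {evicted} from frame {frame}")
            else:
                frame = len(self.resident)
            print(f"swap in  page {page} to frame {frame}")
            self.resident[page] = frame

    mem = PagedMemory(frames=2)
    for page in [0, 1, 0, 2, 1]:                   # a program's reference string
        mem.touch(page)
    print("page faults:", mem.faults)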

What is important for present purposes about this highly condensed account of a history which remains largely uninvestigated is the extent to which operating systems increasingly realized the ideal of the computer as a self-programming device. In the evolution from the monitors of the mid-'50s to the interactive time-sharing systems of the early '70s, programs themselves became dynamic entities. The programmer specified in abstract terms the structure of the data and the flow of computation. Making those terms concrete became the job of the system software, which in turn relied on increasingly elaborate addressing schemes to vary the specific links in response to run-time conditions. The operating system became the master choreographer in an ever more complex dance of processes, coordinating them to move tightly among one another, singly and in groups, yet without colliding. The task required the development of sophisticated techniques of exception-handling and dynamic data management, but the possibility of carrying it out at all rested ultimately on the computer's capacity to rewrite its own tape.

Software Systems

Having concentrated during the 1960s on programming languages and operating systems as the means of addressing the problems of programming and software, the computing community shifted, or at least split, its attention during the following decade. Participants at the 1968 NATO Conference on Software Engineering reinforced each other's growing sense that the cost overruns, slips in schedule, and failure to meet specifications that plagued the development of large-scale and mission-critical software systems reflected a systemic disorder, to be remedied only by placing "software manufacture ... on the types of theoretical foundations and practical disciplines that are traditional in the established branches of engineering."(5) Different views of the nature of engineering led to different approaches to this goal, but in general they built on developments in systems software, extending programming languages and systems to encompass programming methodologies. Two main strains are of particular interest here: the use of programming environments to constrain the behavior of programmers and the extension of programming systems to encompass and ultimately to automate the entire software development cycle.

By the early 1970s, it seemed clear that, whatever the long-range prospects for automatic programming or at least for programming systems capable of representing large-scale computations in effective operational form, the development of software over the short term would rely on large numbers of programmers. Increasingly, programming systems came to be viewed in terms of disciplining programmers. Structured programming languages, enforced by diagnostic compilers, were aimed at constraining programmers to write clear, self-documenting, machine-independent programs. To place those programmers in a supportive environment, software engineers turned from mathematics and computer science to industrial engineering and project management for models of engineering practice. Arguing that "Economical products of high quality are not possible (in most instances) when one instructs the programmer in good practice and merely hopes that he will make his invisible product according to those rules and standards," R.W. Bemer of GE spoke in 1968 of a "software factory" centered on the computer:

It appears that we have few specific environments (factory facilities) for the economical production of programs. I contend that the production costs are affected far more adversely by the absence of such an environment than by the absence of any tools in the environment (e.g. writing a program in PL/1 is using a tool.)

A factory supplies power, work space, shipping and receiving, labor distribution, and financial controls, etc. Thus a software factory should be a programming environment residing upon and controlled by a computer. Program construction, checkout and usage should be done entirely within this environment. Ideally it should be impossible to produce programs exterior to this environment.(6)

Much of the effort in software engineering during the 1970s and '80s was directed toward the design and implementation of such environments, as the concept of the "software factory" took on a succession of forms. CASE (computer-assisted software engineering) tools are perhaps the best example.

The Grail of Automatic Programming

While some software engineers thought of factories in terms of human workers organized toward efficient use of their labor, others looked to the automated factory first realized by Henry Ford's assembly line, where the product was built into the machines of production, leaving little or nothing to the skill of the worker. One aspect of that system attracted particular attention. Production by means of interchangeable parts was translated into such concepts as "mass-produced software components", modular programming, object-oriented programming, and reusable software. At the same time, in a manner similar to earlier work in compiler theory or indeed as an extension of it, research into formal methods of requirements analysis, specification, and design went hand in hand with the development of corresponding languages aimed at providing a continuous, automatic translation of a system from a description of its intended behavior to a working computer program. These efforts have so far met with only limited success. The production of programs remains in the hands of programmers.


Further Reading

Frederick P. Brooks, The Mythical Man-Month: Essays on Software Engineering (Reading, MA: Addison-Wesley Publishing Co., 1975; 3rd ed. 1995)

Thomas J. Bergin and Richard G. Gibson (eds.), History of Programming Languages II (New York: ACM Press; Reading, MA: Addison-Wesley Publishing Co., 1996)

Paul W. Oman and Ted G. Lewis (eds.), Milestones in Software Evolution (Los Alamitos, CA: IEEE Computer Society Press, 1990)

Saul Rosen, Programming Systems and Languages (New York: McGraw-Hill, 1967)

Norman Weizer, "A History of Operating Systems", Datamation (January 1981), 119-126

Richard Wexelblat (ed.), History of Programming Languages (New York: Academic Press, 1981)


Notes

1. Herbert A. Simon, "The corporation: will it be managed by machines?", published in 1961 in a volume of essays on Management and Corporation: 1985, ed. Anshen & Bach (McGraw Hill, 1961); reprinted in The World of the Computer, ed. John Diebold (Random House, 1973), p. 154.

2. Maurice V. Wilkes, Memoirs of a Computer Pioneer (Cambridge, MA: MIT Press, 1985), 145.

3. H. Rutishauser wrote in 1967 ("Description of Algol 60", Handbook for Automatic Computation, Berlin: Springer) that "...by 1954 the idea of using a computer for assisting the programmer had been seriously considered in Europe, but apparently none of these early algorithmic languages was ever put to actual use. The situation was quite different in the USA, where an assembly language epoch preceded the introduction of algorithmic languages. To some extent this may have diverted attention and energy from the latter, but on the other hand it helped to make automatic programming popular in the USA." (quoted by Peter Naur in "The European Side of the Last Phase of the Development of Algol 60", History of Programming Languages, 93.)

4. Naur, "European side", 95-6. In designing Algol 60, the members of the committee expressly barred discussions of implementation of the features of the language, albeit on the shared assumption that no one would propose a feature he did not know how to implement, at least in principle.

5. Peter Naur and Brian Randell (eds.), Software Engineering: Report on a conference sponsored by the NATO Science Committee, Garmisch, Germany, 7th to 11th October 1968 (Brussels: NATO Scientific Affairs Division, January 1969), 13.

6. R.W. Bemer, "Position Paper for Panel Discussion [on] the Economics of Program Production", Information Processing 68 (Amsterdam: North-Holland Publishing Company, 1969), II, 1626.