A compiler-compiler or compiler generator is a tool that creates a parser, interpreter, or compiler from some form of formal description of a language and machine. The earliest and still most common form of compiler-compiler is a parser generator, whose input is a grammar (usually in BNF) of a programming language, and whose generated output is the source code of a parser often used as a component of a compiler.
The ideal compiler compiler takes a description of a programming language and a target instruction set architecture, and automatically generates a usable compiler from them. In practice, the state of the art has yet to reach this degree of sophistication and most compiler generators are not capable of handling semantic or target architecture information.
A typical parser generator associates executable code with each of the rules of the grammar that should be executed when these rules are applied by the parser. These pieces of code are sometimes referred to as semantic action routines since they define the semantics of the syntactic structure that is analysed by the parser. Depending upon the type of parser that should be generated, these routines may construct a parse tree (or abstract syntax tree), or generate executable code directly.
Some experimental compiler-compilers take as input a formal description of programming language semantics, typically using denotational semantics. This approach is often called 'semantics-based compiling', and was pioneered by Peter Mosses' Semantic Implementation System (SIS) in 1978. However, both the generated compiler and the code it produced were inefficient in time and space. No production compilers are currently built in this way, but research continues.
The Production Quality Compiler-Compiler project at Carnegie-Mellon University does not formalize semantics, but does have a semi-formal framework for machine description.
Compiler-compilers exist in many flavors, including bottom-up rewrite machine generators (see JBurg) used to tile syntax trees according to a rewrite grammar for code generation, and attribute grammar parser generators (e.g. ANTLR can be used for simultaneous type checking, constant propagation, and more during the parsing stage).
Full article ▸