Extended Backus–Naur Form

related topics
{math, number, function}
{language, word, form}
{company, market, business}
{style, bgcolor, rowspan}
{black, white, people}

In computer science, Extended Backus–Naur Form (EBNF) is a family of metasyntax notations used for expressing context-free grammars: that is, a formal way to describe computer programming languages and other formal languages. They are extensions of the basic Backus–Naur Form (BNF) metasyntax notation.

The earliest EBNF was originally developed by Niklaus Wirth. However, many variants of EBNF are in use. The International Organization for Standardization has adopted an EBNF standard (ISO/IEC 14977). This article uses EBNF as specified by the ISO for examples applying to all EBNF:s. Other EBNF variants use somewhat different syntactic conventions.

Contents

Basics

A code, e.g. source code of a computer program consisting of terminal symbols—that is, visible characters, digits, punctuation marks, white space characters, etc.

The EBNF defines production rules where sequences of symbols are respectively assigned to a nonterminal:

digit excluding zero = "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
digit                = "0" | digit excluding zero ;

This production rule defines the nonterminal digit which is on the left side of the assignment. The vertical bar represents an alternative and the terminal symbols are enclosed with quotation marks followed by a semicolon as terminating character. Hence a Digit is a 0 or a digit excluding zero that can be 1 or 2 or 3 and so forth until 9.

A production rule can also include a sequence of terminals or nonterminals, each separated by a comma:

twelve                          = "1" , "2" ;
two hundred one                 = "2" , "0" , "1" ;
three hundred twelve            = "3" , twelve ;
twelve thousand two hundred one = twelve , two hundred one ;

Expressions that may be omitted or repeated can be represented through curly braces { ... }:

natural number = digit excluding zero , { digit } ;

In this case, the strings 1, 2, ...,10,...,12345,... are correct expressions. To represent this, everything that is set within the curly braces may be repeated arbitrarily often, including not at all.

An option can be represented through squared brackets [ ... ]. That is, everything that is set within the square brackets may be present just once, or not at all:

integer = "0" | [ "-" ] , natural number ;

Therefore an integer is a zero (0) or a natural number that may be preceded by an optional minus sign.

EBNF also provides, among other things, the syntax to describe repetitions of a specified number of times, to exclude some part of a production, or to insert comments in an EBNF grammar.

[edit] Table of symbols

The following represents a proposed standard.

Usage Notation
definition =
concatenation ,
termination  ;
alternation |
option [ ... ]
repetition { ... }
grouping ( ... )
terminal string " ... "
terminal string ' ... '
comment (* ... *)
special sequence  ? ... ?
exception -

[edit] Another example

A pascal-like programming language that allows only assignments can be defined in EBNF as follows:

(* a simple program syntax in EBNF − Wikipedia *)
program = 'PROGRAM' , white space , identifier , white space ,
           'BEGIN' , white space ,
           { assignment , ";" , white space } ,
           'END.' ;
identifier = alphabetic character , { alphabetic character | digit } ;
number = [ "-" ] , digit , { digit } ;
string = '"' , { all characters − '"' } , '"' ;
assignment = identifier , ":=" , ( number | identifier | string ) ;
alphabetic character = "A" | "B" | "C" | "D" | "E" | "F" | "G"
                     | "H" | "I" | "J" | "K" | "L" | "M" | "N"
                     | "O" | "P" | "Q" | "R" | "S" | "T" | "U"
                     | "V" | "W" | "X" | "Y" | "Z" ;
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
white space = ? white space characters ? ;
all characters = ? all visible characters ? ;

A syntactically correct program then would be:

PROGRAM DEMO1
BEGIN
  A0:=3;
  B:=45;
  H:=-100023;
  C:=A;
  D123:=B34A;
  BABOON:=GIRAFFE;
  TEXT:="Hello world!";
END.

The language can easily be extended with control flows, arithmetical expressions, and Input/Output instructions. Then a small, usable programming language would be developed.

Full article ▸

related documents
Unicity distance
Legendre symbol
Axiom of pairing
Elementary group theory
Associativity
Functional analysis
Lagrange inversion theorem
Richard's paradox
Haar measure
Splitting lemma
Examples of groups
Extended real number line
Referential transparency (computer science)
Statistical independence
Consistency
Quotient group
Meromorphic function
Mathematical model
Assignment problem
Nial
Ring (mathematics)
Tree (graph theory)
Queue (data structure)
B-tree
Idempotence
Presburger arithmetic
Coprime
Preorder
XSL Transformations
Oracle machine