S-expressions or sexps (for "symbolic expression") are list-based data structures that represent semi-structured data. An S-expression may be a nested list of smaller S-expressions. S-expressions are probably best known for their use in the Lisp family of programming languages. They are typically represented in text by parenthesized, whitespace-separated sequences of character strings, as in
(= 4 (+ 2 2)), which represents the Boolean expression commonly written as
4==2+2 in C and its related programming languages.
In Lisp, the first element of every S-expression is an operator and any remaining elements are treated as data. This is called "prefix notation" or "Cambridge Polish notation". Due to the close association between S-expressions and written representations of data, the term "S-expression" in casual conversation often refers to the written representation of an S-expression.
Other uses of S-expressions are in Lisp-derived languages such as DSSSL, and as mark-up in communications protocols like IMAP and John McCarthy's CBCL. The details of the syntax and supported data types vary in the different languages, but the most common feature among these languages is the use of S-expressions and prefix notation.
In Lisp, S-expressions are used to store both source code and data (see McCarthy Recursive Functions of Symbolic Expressions). S-expressions were originally intended only for data to be manipulated by M-expressions, but the first implementation of Lisp was an interpreter of S-expression encodings of M-expressions, and Lisp programmers soon became accustomed to using S-expressions for both code and data. This means that Lisp is homoiconic, that is, the primary representation of programs is also a data structure in a primitive type of the language itself.
S-expressions are defined recursively as either single data objects called "atoms" or lists of S-expressions. Numbers, arrays, character strings, and symbols are all atoms.
An S-expression may be a list of other S-expressions, which may also be lists, so S-expressions may be arbitrarily nested to any depth. In Lisp, S-expression lists are constructed from small data structures called cons pairs, written as
(x . y). The first element of the cons (x) is the first element of the list, and the second element (y) is the remainder of the list. Longer lists are made up of nested cons pairs, for example
(1 . (2 . (3 . nil))). The special symbol
nil marks the end of the list. Cons-based lists are usually written more compactly as
(1 2 3). Nested lists can also be written as S-expressions:
((milk juice) (honey marmalade)) is a two-element S-expression whose elements are also two-element S-expressions. The whitespace-separated notation used in Lisp (and this article) is typical. Line breaks (newline characters) usually qualify as separators.
Full article ▸