


LEX(1)                   USER COMMANDS                     LEX(1)



NAME
     lex - lexical analysis program generator

SYNOPSIS
     lex [ -fntv ] [ filename ] ...

DESCRIPTION
     lex generates programs to be used in simple lexical analysis
     of text.  Each filename (the standard input by default)
     contains regular expressions to search for, and actions
     written in C to be executed when expressions are found.

     A C source program, lex.yy.c, is generated, to be compiled
     as follows:

          cc lex.yy.c -ll

     This program, when run, copies unrecognized portions of  the
     input  to  the  output, and executes the associated C action
     for each regular expression that is recognized.  The  actual
     string  matched  is  left  in  yytext, an external character
     array.

     Matching is done in order of the strings in the file.  The
     strings may contain square brackets to indicate character
     classes, as in [abx-z] to indicate a, b, x, y, and z; and
     the operators *, + and ?, which mean, respectively, any
     nonnegative number, any positive number, or either zero or
     one occurrence of the previous character or character class.
     The "dot" character (`.') is the class of all ASCII
     characters except NEWLINE.
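
     As a rough illustration of what the pattern [abx-z]+ accepts,
     the following C sketch measures the longest matching prefix of
     a string by hand.  The helper names in_class and
     match_class_plus are hypothetical and are not part of lex; the
     sketch assumes an ASCII character set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: nonzero if c belongs to the class [abx-z]. */
static int in_class(char c)
{
    return c == 'a' || c == 'b' || (c >= 'x' && c <= 'z');
}

/* Length of the longest prefix of s matching [abx-z]+, or 0 if the
   first character is not in the class (the + needs at least one). */
static size_t match_class_plus(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != '\0' && in_class(s[n]))
        n++;
    return n;
}
```

     Replacing + with * would make a length of zero count as a
     match; replacing it with ? would cap the length at one.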

     Parentheses for grouping and vertical bar for alternation
     are also supported.  The notation r{d,e} in a rule indicates
     between d and e instances of regular expression r.  It has a
     higher precedence than |, but lower than that of *, ?, +, or
     concatenation.  The ^ (caret character) at the beginning of
     an expression permits a successful match only immediately
     after a NEWLINE, and the $ character at the end of an
     expression requires a trailing NEWLINE.

     The / character in an expression indicates trailing context;
     only the part of the expression up to the slash is returned
     in yytext, although the remainder of the expression must
     follow in the input stream.
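
     A minimal sketch of the idea, using a hypothetical rule ab/cd:
     the function below (its name is an assumption made for
     illustration) reports how many characters would be placed in
     yytext.  The context "cd" must be present, but it is not
     counted, so scanning would resume at the "c".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Emulates the hypothetical rule  ab/cd  on the string s.
   Returns 2 (the length of "ab") when s starts with "ab" followed
   by "cd", and 0 when the rule does not match. */
static size_t match_ab_slash_cd(const char *s)
{
    assert(s != NULL);
    if (strncmp(s, "abcd", 4) != 0)
        return 0;       /* either "ab" or the context is missing */
    return 2;           /* only "ab" belongs to the match */
}
```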

     An operator character may be used as an ordinary  symbol  if
     it is within `"' symbols or preceded by `\'.

     Three subroutines defined as macros are expected: input() to
     read a character; unput(c) to replace a character read; and
     output(c) to place an output character.  They are defined in



Sun Release 4.1   Last change: 1 December 1988                  1
     terms of the standard streams, but you can override them.
     The generated function is named yylex(), and the library
     contains a main() which calls it.  The action REJECT on the
     right side of the rule rejects this match and executes the
     next suitable match; the function yymore() accumulates
     additional characters into the same yytext; and the function
     yyless(n) trims the match, where n is the number of
     characters to retain in yytext.  The macros input and output
     use files yyin and yyout to read from and write to,
     defaulted to stdin and stdout, respectively.
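
     The relationship between input() and unput(c) can be sketched
     with a small pushback stack.  This is not the code lex
     generates: struct source, src_input, src_unput, and the
     PUSHBACK_MAX limit are all hypothetical names chosen for this
     illustration, and returning 0 at end of input is an assumption
     of the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define PUSHBACK_MAX 16          /* arbitrary limit for this sketch */

struct source {
    const char *text;            /* input not yet read */
    char pushed[PUSHBACK_MAX];   /* characters given back, newest last */
    size_t npushed;              /* number of entries in pushed[] */
};

/* Analogue of input(): pushed-back characters are returned first,
   then the remaining text; 0 signals end of input. */
static int src_input(struct source *s)
{
    assert(s != NULL);
    if (s->npushed > 0)
        return (unsigned char)s->pushed[--s->npushed];
    if (*s->text == '\0')
        return 0;
    return (unsigned char)*s->text++;
}

/* Analogue of unput(c): the next src_input() call returns c. */
static void src_unput(struct source *s, int c)
{
    assert(s != NULL && s->npushed < PUSHBACK_MAX);
    s->pushed[s->npushed++] = (char)c;
}
```

     Overriding the real macros follows the same pattern: the
     replacements must agree on where pushed-back characters are
     kept, since the scanner relies on unput() to look ahead.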

     In a lex program, any line beginning with a blank is assumed
     to  contain  only C text and is copied; if it precedes %% it
     is copied into the external definition area of the  lex.yy.c
     file.   All  rules  should  follow  a  %%, as in YACC. Lines
     preceding %% which begin with a  nonblank  character  define
     the  string  on the left to be the remainder of the line; it
     can be used later by surrounding it with  {}.   Note:  curly
     brackets  do not imply parentheses; only string substitution
     is done.

     The external names generated by lex all begin with the  pre-
     fix yy or YY.

     Certain table sizes for the resulting  finite-state  machine
     can be set in the definitions section:

          %p _n number of positions is _n (default 2000)

          %n _n number of states is _n (500)

          %t _n number of parse tree nodes is _n (1000)

          %a _n number of transitions is _n (3000)

     The use of one or more of the  above  automatically  implies
     the -v option, unless the -n option is used.

OPTIONS
     -f   Faster compilation. Do not bother to pack the resulting
          tables; limited to small programs.

     -n   Opposite of -v; -n is default.

     -t   Place the result on the standard output instead  of  in
          file lex.yy.c.

     -v   Print a one-line summary of statistics of the generated
          analyzer.

EXAMPLES
     The following command line:
          lex lexcommands

     would draw lex instructions from the file  lexcommands,  and
     place the output in lex.yy.c.

     The following:

          %%
          [A-Z]   putchar(yytext[0]+'a'-'A');
          [ ]+$   ;
          [ ]+    putchar(' ');

     is an example of a lex program.  It converts upper  case  to
     lower, removes blanks at the end of lines, and replaces mul-
     tiple blanks by single blanks.
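
     For comparison, the same three transformations can be written
     by hand in C.  This is only a sketch of the behavior described
     above, not the scanner lex would generate; the function name
     squeeze and its string-to-buffer interface are assumptions
     made so the example is self-contained.

```c
#include <assert.h>
#include <string.h>

/* Copies in to out, folding upper case to lower case, dropping
   blanks that come directly before a newline, and collapsing any
   other run of blanks to one blank.  out must have room for at
   least strlen(in) + 1 characters. */
static void squeeze(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        if (*in >= 'A' && *in <= 'Z') {
            *out++ = (char)(*in++ + 'a' - 'A');   /* [A-Z] rule */
        } else if (*in == ' ') {
            while (*in == ' ')                    /* consume the run */
                in++;
            if (*in != '\n')
                *out++ = ' ';                     /* [ ]+ rule */
            /* otherwise the [ ]+$ rule applies: emit nothing */
        } else {
            *out++ = *in++;                       /* unmatched: copy */
        }
    }
    *out = '\0';
}
```

     The final else branch plays the role of the default behavior
     of a lex program, which copies unrecognized input to the
     output unchanged.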

          D       [0-9]
          %%
          if      printf("IF statement\n");
          [a-z]+  printf("tag, value %s\n",yytext);
          0{D}+   printf("octal number %s\n",yytext);
          {D}+    printf("decimal number %s\n",yytext);
          "++"    printf("unary op\n");
          "+"     printf("binary op\n");
          "/*"    {       /* skip the body of a C comment */
                          loop:
                          while (input() != '*');
                          switch (input())
                                  {
                                  case '/': break;
                                  case '*': unput('*');  /* fall through */
                                  default: goto loop;
                                  }
                          }
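
     The comment-skipping action above can be restated over a
     plain string, which makes the control flow easier to follow.
     The function name skip_comment is hypothetical.  Unlike the
     lex action, this sketch also stops at the end of the string,
     so an unterminated comment does not loop forever.

```c
#include <assert.h>
#include <stddef.h>

/* s points just past an opening slash-star.  Returns a pointer to
   the first character after the closing star-slash, or to the
   terminating '\0' if the comment is never closed. */
static const char *skip_comment(const char *s)
{
    assert(s != NULL);
    for (;;) {
        while (*s != '\0' && *s != '*')   /* while (input() != '*'); */
            s++;
        if (*s == '\0')
            return s;                     /* unterminated comment */
        s++;                              /* consume the '*' */
        if (*s == '/')
            return s + 1;                 /* case '/': done */
        /* Any other character: scan again from here.  If it is a
           '*', it has not been consumed, which corresponds to the
           unput('*') in the lex action. */
    }
}
```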

FILES
     lex.yy.c

SEE ALSO
     sed(1V), yacc(1)

     _P_r_o_g_r_a_m_m_i_n_g _U_t_i_l_i_t_i_e_s _a_n_d _L_i_b_r_a_r_i_e_s

NOTES
     The lex command has not been changed to support 8-bit symbol
     names, as this would produce lex source code that is not
     portable between systems.









