usr.bin/lex/lex.1

   1 .\" $FreeBSD$
   2 .\"
   3 .TH FLEX 1 "May 21, 2013" "Version 2.5.37"
   4 .SH NAME
   5 flex, lex \- fast lexical analyzer generator
   6 .SH SYNOPSIS
   7 .B flex
   8 .B [\-bcdfhilnpstvwBFILTV78+? \-C[aefFmr] \-ooutput \-Pprefix \-Sskeleton]
   9 .B [\-\-help \-\-version]
  10 .I [filename ...]
  11 .SH OVERVIEW
  12 This manual describes
  13 .I flex,
  14 a tool for generating programs that perform pattern-matching on text.
  15 The manual includes both tutorial and reference sections:
  16 .nf
  17
  18     Description
  19         a brief overview of the tool
  20
  21     Some Simple Examples
  22
  23     Format Of The Input File
  24
  25     Patterns
  26         the extended regular expressions used by flex
  27
  28     How The Input Is Matched
  29         the rules for determining what has been matched
  30
  31     Actions
  32         how to specify what to do when a pattern is matched
  33
  34     The Generated Scanner
  35         details regarding the scanner that flex produces;
  36         how to control the input source
  37
  38     Start Conditions
  39         introducing context into your scanners, and
  40         managing "mini-scanners"
  41
  42     Multiple Input Buffers
  43         how to manipulate multiple input sources; how to
  44         scan from strings instead of files
  45
  46     End-of-file Rules
  47         special rules for matching the end of the input
  48
  49     Miscellaneous Macros
  50         a summary of macros available to the actions
  51
  52     Values Available To The User
  53         a summary of values available to the actions
  54
  55     Interfacing With Yacc
  56         connecting flex scanners together with yacc parsers
  57
  58     Options
  59         flex command-line options, and the "%option"
  60         directive
  61
  62     Performance Considerations
  63         how to make your scanner go as fast as possible
  64
  65     Generating C++ Scanners
  66         the (experimental) facility for generating C++
  67         scanner classes
  68
  69     Incompatibilities With Lex And POSIX
  70         how flex differs from AT&T lex and the POSIX lex
  71         standard
  72
  73     Diagnostics
  74         those error messages produced by flex (or scanners
  75         it generates) whose meanings might not be apparent
  76
  77     Files
  78         files used by flex
  79
  80     Deficiencies / Bugs
  81         known problems with flex
  82
  83     See Also
  84         other documentation, related tools
  85
  86     Author
  87         includes contact information
  88
  89 .fi
  90 .SH DESCRIPTION
  91 .I flex
  92 is a tool for generating
  93 .I scanners:
  94 programs which recognize lexical patterns in text.
  95 .I flex
  96 reads
  97 the given input files, or its standard input if no file names are given,
  98 for a description of a scanner to generate.
  99 The description is in the form of pairs
 100 of regular expressions and C code, called
 101 .I rules.
 102 .I flex
 103 generates as output a C source file,
 104 .B lex.yy.c,
 105 which defines a routine
 106 .B yylex().
 107 This file is compiled and linked with the
 108 .B \-ll
 109 library to produce an executable.
 110 When the executable is run,
 111 it analyzes its input for occurrences
 112 of the regular expressions.
 113 Whenever it finds one, it executes
 114 the corresponding C code.
 115 .SH SOME SIMPLE EXAMPLES
 116 Here are some simple examples to give the flavor of how one uses
 117 .I flex.
 118 The following
 119 .I flex
 120 input specifies a scanner which, whenever it encounters the string
 121 "username", replaces it with the user's login name:
 122 .nf
 123
 124     %%
 125     username    printf( "%s", getlogin() );
 126
 127 .fi
 128 By default, any text not matched by a
 129 .I flex
 130 scanner
 131 is copied to the output, so the net effect of this scanner is
 132 to copy its input file to its output with each occurrence
 133 of "username" expanded.
 134 In this input, there is just one rule.
 135 "username" is the
 136 .I pattern
 137 and the "printf" is the
 138 .I action.
 139 The "%%" marks the beginning of the rules.
 140 .PP
 141 Here's another simple example:
 142 .nf
 143
 144     %{
 145             int num_lines = 0, num_chars = 0;
 146     %}
 147
 148     %%
 149     \\n      ++num_lines; ++num_chars;
 150     .       ++num_chars;
 151
 152     %%
 153     main()
 154             {
 155             yylex();
 156             printf( "# of lines = %d, # of chars = %d\\n",
 157                     num_lines, num_chars );
 158             }
 159
 160 .fi
 161 This scanner counts the number of characters and the number
 162 of lines in its input (it produces no output other than the
 163 final report on the counts).
 164 The first line
 165 declares two globals, "num_lines" and "num_chars", which are accessible
 166 both inside
 167 .B yylex()
 168 and in the
 169 .B main()
 170 routine declared after the second "%%".
 171 There are two rules, one
 172 which matches a newline ("\\n") and increments both the line count and
 173 the character count, and one which matches any character other than
 174 a newline (indicated by the "." regular expression).
 175 .PP
 176 A somewhat more complicated example:
 177 .nf
 178
 179     /* scanner for a toy Pascal-like language */
 180
 181     %{
 182     /* need this for the call to atof() below */
 183     #include <stdlib.h>
 184     %}
 185
 186     DIGIT    [0-9]
 187     ID       [a-z][a-z0-9]*
 188
 189     %%
 190
 191     {DIGIT}+    {
 192                 printf( "An integer: %s (%d)\\n", yytext,
 193                         atoi( yytext ) );
 194                 }
 195
 196     {DIGIT}+"."{DIGIT}*        {
 197                 printf( "A float: %s (%g)\\n", yytext,
 198                         atof( yytext ) );
 199                 }
 200
 201     if|then|begin|end|procedure|function        {
 202                 printf( "A keyword: %s\\n", yytext );
 203                 }
 204
 205     {ID}        printf( "An identifier: %s\\n", yytext );
 206
 207     "+"|"-"|"*"|"/"   printf( "An operator: %s\\n", yytext );
 208
 209     "{"[^}\\n]*"}"     /* eat up one-line comments */
 210
 211     [ \\t\\n]+          /* eat up whitespace */
 212
 213     .           printf( "Unrecognized character: %s\\n", yytext );
 214
 215     %%
 216
 217     main( argc, argv )
 218     int argc;
 219     char **argv;
 220         {
 221         ++argv, --argc;  /* skip over program name */
 222         if ( argc > 0 )
 223                 yyin = fopen( argv[0], "r" );
 224         else
 225                 yyin = stdin;
 226
 227         yylex();
 228         }
 229
 230 .fi
 231 This is the beginning of a simple scanner for a language like
 232 Pascal.
 233 It identifies different types of
 234 .I tokens
 235 and reports on what it has seen.
 236 .PP
 237 The details of this example will be explained in the following
 238 sections.
 239 .SH FORMAT OF THE INPUT FILE
 240 The
 241 .I flex
 242 input file consists of three sections, separated by a line with just
 243 .B %%
 244 in it:
 245 .nf
 246
 247     definitions
 248     %%
 249     rules
 250     %%
 251     user code
 252
 253 .fi
 254 The
 255 .I definitions
 256 section contains declarations of simple
 257 .I name
 258 definitions to simplify the scanner specification, and declarations of
 259 .I start conditions,
 260 which are explained in a later section.
 261 .PP
 262 Name definitions have the form:
 263 .nf
 264
 265     name definition
 266
 267 .fi
 268 The "name" is a word beginning with a letter or an underscore ('_')
 269 followed by zero or more letters, digits, '_', or '-' (dash).
 270 The definition is taken to begin at the first non-white-space character
 271 following the name and to continue to the end of the line.
 272 The definition can subsequently be referred to using "{name}", which
 273 will expand to "(definition)".
 274 For example,
 275 .nf
 276
 277     DIGIT    [0-9]
 278     ID       [a-z][a-z0-9]*
 279
 280 .fi
 281 defines "DIGIT" to be a regular expression which matches a
 282 single digit, and
 283 "ID" to be a regular expression which matches a letter
 284 followed by zero-or-more letters-or-digits.
 285 A subsequent reference to
 286 .nf
 287
 288     {DIGIT}+"."{DIGIT}*
 289
 290 .fi
 291 is identical to
 292 .nf
 293
 294     ([0-9])+"."([0-9])*
 295
 296 .fi
 297 and matches one-or-more digits followed by a '.' followed
 298 by zero-or-more digits.
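.PP
As a further sketch of why the parenthesized expansion matters (the
names "SIGN" and "NUMBER" below are illustrative assumptions, not part
of flex), consider a definition that contains an alternation:
.nf

    SIGN     "+"|"-"
    NUMBER   {SIGN}?[0-9]+

    %%
    {NUMBER}    printf( "A signed integer: %s\\n", yytext );

.fi
Because "{SIGN}" expands to ("+"|"-"), the optional operator applies to
the whole alternation; without the added parentheses it would bind only
to the "-".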
 299 .PP
 300 The
 301 .I rules
 302 section of the
 303 .I flex
 304 input contains a series of rules of the form:
 305 .nf
 306
 307     pattern   action
 308
 309 .fi
 310 where the pattern must be unindented and the action must begin
 311 on the same line.
 312 .PP
 313 See below for a further description of patterns and actions.
 314 .PP
 315 Finally, the user code section is simply copied to
 316 .B lex.yy.c
 317 verbatim.
 318 It is used for companion routines which call or are called
 319 by the scanner.
 320 The presence of this section is optional;
 321 if it is missing, the second
 322 .B %%
 323 in the input file may be skipped, too.
 324 .PP
 325 In the definitions and rules sections, any
 326 .I indented
 327 text or text enclosed in
 328 .B %{
 329 and
 330 .B %}
 331 is copied verbatim to the output (with the %{}'s removed).
 332 The %{}'s must appear unindented on lines by themselves.
 333 .PP
 334 In the rules section,
 335 any indented or %{} text appearing before the
 336 first rule may be used to declare variables
 337 which are local to the scanning routine and (after the declarations)
 338 code which is to be executed whenever the scanning routine is entered.
 339 Other indented or %{} text in the rule section is still copied to the output,
 340 but its meaning is not well-defined and it may well cause compile-time
 341 errors (this feature is present for
 342 .I POSIX
 343 compliance; see below for other such features).
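.PP
A minimal sketch of the first case, assuming the scanner is linked with
.B \-ll
(the variable name "words" and the messages are hypothetical choices
made for this example):
.nf

    %%
            int words = 0;  /* local to yylex(); set on each entry */

    [a-zA-Z]+   ++words;
    \\n          printf( "%d word(s)\\n", words ); return 1;
    .           /* ignore anything else */

    %%
    int main( void )
        {
        while ( yylex() )
            ;   /* one call per input line */
        return 0;
        }

.fi
Since
.B yylex()
returns after each newline, "words" is reinitialized on the next call,
so each report covers a single line.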
 344 .PP
 345 In the definitions section (but not in the rules section),
 346 an unindented comment (i.e., a line
 347 beginning with "/*") is also copied verbatim to the output up
 348 to the next "*/".
 349 .SH PATTERNS
 350 The patterns in the input are written using an extended set of regular
 351 expressions.
 352 These are:
 353 .nf
 354
 355     x          match the character 'x'
 356     .          any character (byte) except newline
 357     [xyz]      a "character class"; in this case, the pattern
 358                  matches either an 'x', a 'y', or a 'z'
 359     [abj-oZ]   a "character class" with a range in it; matches
 360                  an 'a', a 'b', any letter from 'j' through 'o',
 361                  or a 'Z'
 362     [^A-Z]     a "negated character class", i.e., any character
 363                  but those in the class.  In this case, any
 364                  character EXCEPT an uppercase letter.
 365     [^A-Z\\n]   any character EXCEPT an uppercase letter or
 366                  a newline
 367     r*         zero or more r's, where r is any regular expression
 368     r+         one or more r's
 369     r?         zero or one r's (that is, "an optional r")
 370     r{2,5}     anywhere from two to five r's
 371     r{2,}      two or more r's
 372     r{4}       exactly 4 r's
 373     {name}     the expansion of the "name" definition
 374                (see above)
 375     "[xyz]\\"foo"
 376                the literal string: [xyz]"foo
 377     \\X         if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
 378                  then the ANSI-C interpretation of \\x.
 379                  Otherwise, a literal 'X' (used to escape
 380                  operators such as '*')
 381     \\0         a NUL character (ASCII code 0)
 382     \\123       the character with octal value 123
 383     \\x2a       the character with hexadecimal value 2a
 384     (r)        match an r; parentheses are used to override
 385                  precedence (see below)
 386
 387
 388     rs         the regular expression r followed by the
 389                  regular expression s; called "concatenation"
 390
 391
 392     r|s        either an r or an s
 393
 394
 395     r/s        an r but only if it is followed by an s.  The
 396                  text matched by s is included when determining
 397                  whether this rule is the "longest match",
 398                  but is then returned to the input before
 399                  the action is executed.  So the action only
 400                  sees the text matched by r.  This type
 401                  of pattern is called "trailing context".
 402                  (There are some combinations of r/s that flex
 403                  cannot match correctly; see notes in the
 404                  Deficiencies / Bugs section below regarding
 405                  "dangerous trailing context".)
 406     ^r         an r, but only at the beginning of a line (i.e.,
 407                  when just starting to scan, or right after a
 408                  newline has been scanned).
 409     r$         an r, but only at the end of a line (i.e., just
 410                  before a newline).  Equivalent to "r/\\n".
 411
 412                Note that flex's notion of "newline" is exactly
 413                whatever the C compiler used to compile flex
 414                interprets '\\n' as; in particular, on some DOS
 415                systems you must either filter out \\r's in the
 416                input yourself, or explicitly use r/\\r\\n for "r$".
 417
 418
 419     <s>r       an r, but only in start condition s (see
 420                  below for discussion of start conditions)
 421     <s1,s2,s3>r
 422                same, but in any of start conditions s1,
 423                  s2, or s3
 424     <*>r       an r in any start condition, even an exclusive one.
 425
 426
 427     <<EOF>>    an end-of-file
 428     <s1,s2><<EOF>>
 429                an end-of-file when in start condition s1 or s2
 430
 431 .fi
 432 Note that inside of a character class, all regular expression operators
 433 lose their special meaning except escape ('\\') and the character class
 434 operators, '-', ']', and, at the beginning of the class, '^'.
 435 .PP
 436 The regular expressions listed above are grouped according to
 437 precedence, from highest precedence at the top to lowest at the bottom.
 438 Those grouped together have equal precedence.
 439 For example,
 440 .nf
 441
 442     foo|bar*
 443
 444 .fi
 445 is the same as
 446 .nf
 447
 448     (foo)|(ba(r*))
 449
 450 .fi
 451 since the '*' operator has higher precedence than concatenation,
 452 and concatenation higher than alternation ('|').
 453 This pattern
 454 therefore matches
 455 .I either
 456 the string "foo"
 457 .I or
 458 the string "ba" followed by zero-or-more r's.
 459 To match "foo" or zero-or-more "bar"'s, use:
 460 .nf
 461
 462     foo|(bar)*
 463
 464 .fi
 465 and to match zero-or-more "foo"'s-or-"bar"'s:
 466 .nf
 467
 468     (foo|bar)*
 469
 470 .fi
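.PP
As an illustrative sketch of the trailing context operator listed
above (the unit "km" and the messages are assumptions made for this
example), the following rules report a number differently when it is
immediately followed by "km":
.nf

    %%
    [0-9]+/km   printf( "distance: %s\\n", yytext );
    [0-9]+      printf( "number: %s\\n", yytext );
    km          /* rescanned after the first rule's action */
    .|\\n        /* discard everything else */

.fi
On the input "42km" the first rule wins because its match, counting the
trailing "km", is the longer one, yet its action sees only "42" in
.B yytext;
the "km" is returned to the input and is then matched by the third rule.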
 471 .PP
 472 In addition to characters and ranges of characters, character classes
 473 can also contain character class
 474 .I expressions.
 475 These are expressions enclosed inside
 476 .B [:
 477 and
 478 .B :]
 479 delimiters (which themselves must appear between the '[' and ']' of the
 480 character class; other elements may occur inside the character class, too).
 481 The valid expressions are:
 482 .nf
 483
 484     [:alnum:] [:alpha:] [:blank:]
 485     [:cntrl:] [:digit:] [:graph:]
 486     [:lower:] [:print:] [:punct:]
 487     [:space:] [:upper:] [:xdigit:]
 488
 489 .fi
 490 These expressions all designate a set of characters equivalent to
 491 the corresponding standard C
 492 .B isXXX
 493 function.
 494 For example,
 495 .B [:alnum:]
 496 designates those characters for which
 497 .B isalnum()
 498 returns true, i.e., any alphabetic or numeric character.
 499 Some systems don't provide
 500 .B isblank(),
 501 so flex defines
 502 .B [:blank:]
 503 as a blank or a tab.
 504 .PP
 505 For example, the following character classes are all equivalent:
 506 .nf
 507
 508     [[:alnum:]]
 509     [[:alpha:][:digit:]]
 510     [[:alpha:]0-9]
 511     [a-zA-Z0-9]
 512
 513 .fi
 514 If your scanner is case-insensitive (the
 515 .B \-i
 516 flag), then
 517 .B [:upper:]
 518 and
 519 .B [:lower:]
 520 are equivalent to
 521 .B [:alpha:].
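.PP
For instance, a rule for C-style hexadecimal constants might be
sketched as follows (the message text is an assumption for this
example):
.nf

    %%
    0[xX][[:xdigit:]]+    printf( "hex constant: %s\\n", yytext );

.fi
Here
.B [:xdigit:]
appears inside the brackets of an ordinary character class, as required.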
 522 .PP
 523 Some notes on patterns:
 524 .IP -
 525 A negated character class such as the example "[^A-Z]"
 526 above
 527 .I will match a newline
 528 unless "\\n" (or an equivalent escape sequence) is one of the
 529 characters explicitly present in the negated character class
 530 (e.g., "[^A-Z\\n]").
 531 This is unlike how many other regular
 532 expression tools treat negated character classes, but unfortunately
 533 the inconsistency is historically entrenched.
 534 Matching newlines means that a pattern like [^"]* can match the entire
 535 input unless there's another quote in the input.
 536 .IP -
 537 A rule can have at most one instance of trailing context (the '/' operator
 538 or the '$' operator).
 539 The start condition, '^', and "<<EOF>>" patterns
 540 can only occur at the beginning of a pattern and, like '/' and '$',
 541 cannot be grouped inside parentheses.
 542 A '^' which does not occur at
 543 the beginning of a rule or a '$' which does not occur at the end of
 544 a rule loses its special properties and is treated as a normal character.
 545 .IP
 546 The following are illegal:
 547 .nf
 548
 549     foo/bar$
 550     <sc1>foo<sc2>bar
 551
 552 .fi
 553 Note that the first of these can be written "foo/bar\\n".
 554 .IP
 555 The following will result in '$' or '^' being treated as a normal character:
 556 .nf
 557
 558     foo|(bar$)
 559     foo|^bar
 560
 561 .fi
 562 If what's wanted is a "foo" or a bar-followed-by-a-newline, the following
 563 could be used (the special '|' action is explained below):
 564 .nf
 565
 566     foo      |
 567     bar$     /* action goes here */
 568
 569 .fi
 570 A similar trick will work for matching a foo or a
 571 bar-at-the-beginning-of-a-line.
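.PP
As a sketch of two of the notes above (the message text is an
assumption for this example), the first rule below keeps a quoted
string from running past the end of its line by naming the newline in
the negated class, and the last two rules apply the "similar trick" to
match a foo or a bar-at-the-beginning-of-a-line:
.nf

    %%
    \\"[^"\\n]*\\"    printf( "string: %s\\n", yytext );

    foo      |
    ^bar     /* action goes here */

.fi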
 572 .SH HOW THE INPUT IS MATCHED
 573 When the generated scanner is run, it analyzes its input looking
 574 for strings which match any of its patterns.
 575 If it finds more than
 576 one match, it takes the one matching the most text (for trailing
 577 context rules, this includes the length of the trailing part, even
 578 though it will then be returned to the input).
 579 If it finds two
 580 or more matches of the same length, the
 581 rule listed first in the
 582 .I flex
 583 input file is chosen.
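.PP
A minimal sketch of both tie-breaking rules (the messages are
assumptions for this example):
.nf

    %%
    if          printf( "keyword\\n" );
    [a-z]+      printf( "identifier: %s\\n", yytext );
    [ \\t\\n]+    /* skip whitespace */

.fi
On the input "if" both of the first two rules match two characters, so
the rule listed first is chosen and "keyword" is reported.
On the input "iffy" the second rule matches four characters, so
it is chosen even though it is listed later.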
 584 .PP
 585 Once the match is determined, the text corresponding to the match
 586 (called the
 587 .I token)
 588 is made available in the global character pointer
 589 .B yytext,
 590 and its length in the global integer
 591 .B yyleng.
 592 The
 593 .I action
 594 corresponding to the matched pattern is then executed (a more
 595 detailed description of actions follows), and then the remaining
 596 input is scanned for another match.
 597 .PP
 598 If no match is found, then the
 599 .I default rule
 600 is executed: the next character in the input is considered matched and
 601 copied to the standard output.
 602 Thus, the simplest legal
 603 .I flex
 604 input is:
 605 .nf
 606
 607     %%
 608
 609 .fi
 610 which generates a scanner that simply copies its input (one character
 611 at a time) to its output.
 612 .PP
 613 Note that
 614 .B yytext
 615 can be defined in two different ways: either as a character
 616 .I pointer
 617 or as a character
 618 .I array.
 619 You can control which definition
 620 .I flex
 621 uses by including one of the special directives
 622 .B %pointer
 623 or
 624 .B %array
 625 in the first (definitions) section of your flex input.
 626 The default is
 627 .B %pointer,
 628 unless you use the
 629 .B -l
 630 lex compatibility option, in which case
 631 .B yytext
 632 will be an array.
 633 The advantage of using
 634 .B %pointer
 635 is substantially faster scanning and no buffer overflow when matching
 636 very large tokens (unless you run out of dynamic memory).
 637 The disadvantage
 638 is that you are restricted in how your actions can modify
 639 .B yytext
 640 (see the next section), and calls to the
 641 .B unput()
 642 function destroy the present contents of
 643 .B yytext,
 644 which can be a considerable porting headache when moving between different
 645 .I lex
 646 versions.
 647 .PP
 648 The advantage of
 649 .B %array
 650 is that you can then modify
 651 .B yytext
 652 to your heart's content, and calls to
 653 .B unput()
 654 do not destroy
 655 .B yytext
 656 (see below).
 657 Furthermore, existing
 658 .I lex
 659 programs sometimes access
 660 .B yytext
 661 externally using declarations of the form:
 662 .nf
 663     extern char yytext[];
 664 .fi
 665 This definition is erroneous when used with
 666 .B %pointer,
 667 but correct for
 668 .B %array.
 669 .PP
 670 .B %array
 671 defines
 672 .B yytext
 673 to be an array of
 674 .B YYLMAX
 675 characters, which defaults to a fairly large value.
 676 You can change
 677 the size by simply #define'ing
 678 .B YYLMAX
 679 to a different value in the first section of your
 680 .I flex
 681 input.
 682 As mentioned above, with
 683 .B %pointer
 684 yytext grows dynamically to accommodate large tokens.
 685 While this means your
 686 .B %pointer
 687 scanner can accommodate very large tokens (such as matching entire blocks
 688 of comments), bear in mind that each time the scanner must resize
 689 .B yytext
 690 it also must rescan the entire token from the beginning, so matching such
 691 tokens can prove slow.
 692 .B yytext
 693 presently does
 694 .I not
 695 dynamically grow if a call to
 696 .B unput()
 697 results in too much text being pushed back; instead, a run-time error results.
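.PP
A minimal sketch of selecting the array form and enlarging it (the
value 16384 is an arbitrary assumption chosen for illustration):
.nf

    %array
    %{
    #define YYLMAX 16384
    %}

    %%
    [^ \\t\\n]+    printf( "token of length %d\\n", (int) yyleng );
    [ \\t\\n]+     /* skip whitespace */

.fi
A separate source file could then refer to the token text with the
declaration "extern char yytext[];" shown above.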
 698 .PP
 699 Also note that you cannot use
 700 .B %array
 701 with C++ scanner classes
 702 (the
 703 .B c++
 704 option; see below).
 705 .SH ACTIONS
 706 Each pattern in a rule has a corresponding action, which can be any
 707 arbitrary C statement.
 708 The pattern ends at the first non-escaped
 709 whitespace character; the remainder of the line is its action.
 710 If the
 711 action is empty, then when the pattern is matched the input token
 712 is simply discarded.
 713 For example, here is the specification for a program
 714 which deletes all occurrences of "zap me" from its input:
 715 .nf
 716
 717     %%
 718     "zap me"
 719
 720 .fi
 721 (It will copy all other characters in the input to the output since
 722 they will be matched by the default rule.)
 723 .PP
 724 Here is a program which compresses multiple blanks and tabs down to
 725 a single blank, and throws away whitespace found at the end of a line:
 726 .nf
 727
 728     %%
 729     [ \\t]+        putchar( ' ' );
 730     [ \\t]+$       /* ignore this token */
 731
 732 .fi
 733 .PP
 734 If the action contains a '{', then the action extends until the balancing '}'
 735 is found, and the action may cross multiple lines.
 736 .I flex
 737 knows about C strings and comments and won't be fooled by braces found
 738 within them, but also allows actions to begin with
 739 .B %{
 740 and will consider the action to be all the text up to the next
 741 .B %}
 742 (regardless of ordinary braces inside the action).
 743 .PP
 744 An action consisting solely of a vertical bar ('|') means "same as
 745 the action for the next rule."  See below for an illustration.
 746 .PP
 747 Actions can include arbitrary C code, including
 748 .B return
 749 statements to return a value to whatever routine called
 750 .B yylex().
 751 Each time
 752 .B yylex()
 753 is called it continues processing tokens from where it last left
 754 off until it either reaches
 755 the end of the file or executes a return.
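.PP
The following sketch shows one way a caller might drive such a scanner,
assuming it is linked with
.B \-ll
(the token codes "NUMBER" and "WORD" are hypothetical names introduced
for this example):
.nf

    %{
    enum { NUMBER = 1, WORD };
    %}

    %%
    [0-9]+      return NUMBER;
    [a-z]+      return WORD;
    .|\\n        /* skip everything else */

    %%
    int main( void )
        {
        int tok;

        /* each call resumes where the previous one stopped */
        while ( (tok = yylex()) != 0 )
            printf( "%d: %s\\n", tok, yytext );
        return 0;
        }

.fi
The token codes start at 1 so that they cannot be confused with the 0
that
.B yylex()
returns at end-of-file.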
 756 .PP
 757 Actions are free to modify
 758 .B yytext
 759 except for lengthening it (adding
 760 characters to its end--these will overwrite later characters in the
 761 input stream).
 762 This however does not apply when using
 763 .B %array
 764 (see above); in that case,
 765 .B yytext
 766 may be freely modified in any way.
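.PP
As a sketch of a modification that is safe with either definition of
.B yytext,
the following action rewrites the token in place without lengthening
it (the choice of upper-casing is an assumption for this example):
.nf

    %{
    #include <ctype.h>
    %}

    %%
    [a-z]+    {
              int i;
              for ( i = 0; i < (int) yyleng; ++i )
                  yytext[i] = toupper( (unsigned char) yytext[i] );
              ECHO;
              }

.fi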
 767 .PP
 768 Actions are free to modify
 769 .B yyleng
 770 except they should not do so if the action also includes use of
 771 .B yymore()
 772 (see below).
 773 .PP
 774 There are a number of special directives which can be included within
 775 an action:
 776 .IP -
 777 .B ECHO
 778 copies yytext to the scanner's output.
 779 .IP -
 780 .B BEGIN
 781 followed by the name of a start condition places the scanner in the
 782 corresponding start condition (see below).
 783 .IP -
 784 .B REJECT
 785 directs the scanner to proceed on to the "second best" rule which matched the
 786 input (or a prefix of the input).
 787 The rule is chosen as described
 788 above in "How the Input is Matched", and
 789 .B yytext
 790 and
 791 .B yyleng
 792 set up appropriately.
 793 It may either be one which matched as much text
 794 as the originally chosen rule but came later in the
 795 .I flex
 796 input file, or one which matched less text.
 797 For example, the following will both count the
 798 words in the input and call the routine special() whenever "frob" is seen:
 799 .nf
 800
 801             int word_count = 0;
 802     %%
 803
 804     frob        special(); REJECT;
 805     [^ \\t\\n]+   ++word_count;
 806
 807 .fi
 808 Without the
 809 .B REJECT,
 810 any "frob"'s in the input would not be counted as words, since the
 811 scanner normally executes only one action per token.
 812 Multiple
 813 .B REJECT's
 814 are allowed, each one finding the next best choice to the currently
 815 active rule.
 816 For example, when the following scanner scans the token
 817 "abcd", it will write "abcdabcaba" to the output:
 818 .nf
 819
 820     %%
 821     a        |
 822     ab       |
 823     abc      |
 824     abcd     ECHO; REJECT;
 825     .|\\n     /* eat up any unmatched character */
 826
 827 .fi
 828 (The first three rules share the fourth's action since they use
 829 the special '|' action.)
 830 .B REJECT
 831 is a particularly expensive feature in terms of scanner performance;
 832 if it is used in
 833 .I any
 834 of the scanner's actions it will slow down
 835 .I all
 836 of the scanner's matching.
 837 Furthermore,
 838 .B REJECT
 839 cannot be used with the
 840 .I -Cf
 841 or
 842 .I -CF
 843 options (see below).
 844 .IP
 845 Note also that unlike the other special actions,
 846 .B REJECT
 847 is a
 848 .I branch;
 849 code immediately following it in the action will
 850 .I not
 851 be executed.
 852 .IP -
 853 .B yymore()
 854 tells the scanner that the next time it matches a rule, the corresponding
 855 token should be
 856 .I appended
 857 onto the current value of
 858 .B yytext
 859 rather than replacing it.
 860 For example, given the input "mega-kludge"
 861 the following will write "mega-mega-kludge" to the output:
 862 .nf
 863
 864     %%
 865     mega-    ECHO; yymore();
 866     kludge   ECHO;
 867
 868 .fi
 869 First "mega-" is matched and echoed to the output.
 870 Then "kludge"
 871 is matched, but the previous "mega-" is still hanging around at the
 872 beginning of
 873 .B yytext
 874 so the
 875 .B ECHO
 876 for the "kludge" rule will actually write "mega-kludge".
 877 .PP
 878 Two notes regarding use of
 879 .B yymore().
 880 First,
 881 .B yymore()
 882 depends on the value of
 883 .I yyleng
 884 correctly reflecting the size of the current token, so you must not
 885 modify
 886 .I yyleng
 887 if you are using
 888 .B yymore().
 889 Second, the presence of
 890 .B yymore()
 891 in the scanner's action entails a minor performance penalty in the
 892 scanner's matching speed.
 893 .IP -
 894 .B yyless(n)
 895 returns all but the first
 896 .I n
 897 characters of the current token back to the input stream, where they
 898 will be rescanned when the scanner looks for the next match.
 899 .B yytext
 900 and
 901 .B yyleng
 902 are adjusted appropriately (e.g.,
 903 .B yyleng
 904 will now be equal to
 905 .IR n ).
 907 For example, on the input "foobar" the following will write out
 908 "foobarbar":
 909 .nf
 910
 911     %%
 912     foobar    ECHO; yyless(3);
 913     [a-z]+    ECHO;
 914
 915 .fi
 916 An argument of 0 to
 917 .B yyless
 918 will cause the entire current input string to be scanned again.
 919 Unless you've
 920 changed how the scanner will subsequently process its input (using
 921 .B BEGIN,
 922 for example), this will result in an endless loop.
 923 .PP
 924 Note that
 925 .B yyless
 926 is a macro and can only be used in the flex input file, not from
 927 other source files.
 928 .IP -
 929 .B unput(c)
 930 puts the character
 931 .I c
 932 back onto the input stream.
 933 It will be the next character scanned.
 934 The following action will take the current token and cause it
 935 to be rescanned enclosed in parentheses.
 936 .nf
 937
 938     {
 939     int i;
 940     /* Copy yytext because unput() trashes yytext */
 941     char *yycopy = strdup( yytext );
 942     unput( ')' );
 943     for ( i = yyleng - 1; i >= 0; --i )
 944         unput( yycopy[i] );
 945     unput( '(' );
 946     free( yycopy );
 947     }
 948
 949 .fi
 950 Note that since each
 951 .B unput()
 952 puts the given character back at the
 953 .I beginning
 954 of the input stream, pushing back strings must be done back-to-front.
 955 .PP
 956 An important potential problem when using
 957 .B unput()
 958 is that if you are using
 959 .B %pointer
 960 (the default), a call to
 961 .B unput()
 962 .I destroys
 963 the contents of
 964 .I yytext,
 965 starting with its rightmost character and devouring one character to
 966 the left with each call.
 967 If you need the value of yytext preserved
 968 after a call to
 969 .B unput()
 970 (as in the above example),
 971 you must either first copy it elsewhere, or build your scanner using
 972 .B %array
 973 instead (see How The Input Is Matched).
 974 .PP
 975 Finally, note that you cannot put back
 976 .B EOF
 977 to attempt to mark the input stream with an end-of-file.
 978 .IP -
 979 .B input()
 980 reads the next character from the input stream.
 981 For example,
 982 the following is one way to eat up C comments:
 983 .nf
 984
 985     %%
 986     "/*"        {
 987                 int c;
 988
 989                 for ( ; ; )
 990                     {
 991                     while ( (c = input()) != '*' &&
 992                             c != EOF )
 993                         ;    /* eat up text of comment */
 994
 995                     if ( c == '*' )
 996                         {
 997                         while ( (c = input()) == '*' )
 998                             ;
 999                         if ( c == '/' )
1000                             break;    /* found the end */
1001                         }
1002
1003                     if ( c == EOF )
1004                         {
1005                         error( "EOF in comment" );
1006                         break;
1007                         }
1008                     }
1009                 }
1010
1011 .fi
1012 (Note that if the scanner is compiled using
1013 .B C++,
1014 then
1015 .B input()
1016 is instead referred to as
1017 .B yyinput(),
1018 in order to avoid a name clash with the
1019 .B C++
1020 stream by the name of
1021 .I input.)
1022 .IP -
1023 .B YY_FLUSH_BUFFER
1024 flushes the scanner's internal buffer
1025 so that the next time the scanner attempts to match a token, it will
1026 first refill the buffer using
1027 .B YY_INPUT
1028 (see The Generated Scanner, below).
1029 This action is a special case
1030 of the more general
1031 .B yy_flush_buffer()
1032 function, described below in the section Multiple Input Buffers.
1033 .IP -
1034 .B yyterminate()
1035 can be used in lieu of a return statement in an action.
1036 It terminates
1037 the scanner and returns a 0 to the scanner's caller, indicating "all done".
1038 By default,
1039 .B yyterminate()
1040 is also called when an end-of-file is encountered.
1041 It is a macro and may be redefined.
1042 .SH THE GENERATED SCANNER
1043 The output of
1044 .I flex
1045 is the file
1046 .B lex.yy.c,
1047 which contains the scanning routine
1048 .B yylex(),
1049 a number of tables used by it for matching tokens, and a number
1050 of auxiliary routines and macros.
1051 By default,
1052 .B yylex()
1053 is declared as follows:
1054 .nf
1055
1056     int yylex()
1057         {
1058         ... various definitions and the actions in here ...
1059         }
1060
1061 .fi
1062 (If your environment supports function prototypes, then it will
1063 be "int yylex( void )".)  This definition may be changed by defining
1064 the "YY_DECL" macro.
1065 For example, you could use:
1066 .nf
1067
1068     #define YY_DECL float lexscan( a, b ) float a, b;
1069
1070 .fi
1071 to give the scanning routine the name
1072 .I lexscan,
1073 returning a float, and taking two floats as arguments.
1074 Note that
1075 if you give arguments to the scanning routine using a
1076 K&R-style/non-prototyped function declaration, you must terminate
1077 the definition with a semi-colon (;).
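.PP
With a compiler that supports function prototypes, the same renaming
can be written in prototype form, in which case no trailing semicolon
is needed.
The fragment below is a sketch in plain C rather than
.I flex
output: the braces and the body are a hypothetical stand-in for the code
that follows YY_DECL in the generated scanner, included only so that the
expansion can be compiled on its own.
.nf

```c
/* Prototype-style YY_DECL: no trailing semicolon is needed. */
#define YY_DECL float lexscan( float a, float b )

/* In a generated scanner, YY_DECL is followed by the scanner
 * body.  The body below is a stand-in for illustration only.
 */
YY_DECL
    {
    return a + b;
    }
```

.fi
A caller would then invoke lexscan( x, y ) wherever it would otherwise
have called
.B yylex().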
1078 .PP
1079 Whenever
1080 .B yylex()
1081 is called, it scans tokens from the global input file
1082 .I yyin
1083 (which defaults to stdin).
1084 It continues until it either reaches
1085 an end-of-file (at which point it returns the value 0) or
1086 one of its actions executes a
1087 .I return
1088 statement.
1089 .PP
1090 If the scanner reaches an end-of-file, subsequent calls are undefined
1091 unless either
1092 .I yyin
1093 is pointed at a new input file (in which case scanning continues from
1094 that file), or
1095 .B yyrestart()
1096 is called.
1097 .B yyrestart()
1098 takes one argument, a
1099 .B FILE *
1100 pointer (which can be nil, if you've set up
1101 .B YY_INPUT
1102 to scan from a source other than
1103 .I yyin),
1104 and initializes
1105 .I yyin
1106 for scanning from that file.
1107 Essentially there is no difference between
1108 just assigning
1109 .I yyin
to a new input file and using
1111 .B yyrestart()
1112 to do so; the latter is available for compatibility with previous versions
1113 of
1114 .I flex,
1115 and because it can be used to switch input files in the middle of scanning.
1116 It can also be used to throw away the current input buffer, by calling
1117 it with an argument of
1118 .I yyin;
but it is better to use
1120 .B YY_FLUSH_BUFFER
1121 (see above).
1122 Note that
1123 .B yyrestart()
1124 does
1125 .I not
1126 reset the start condition to
1127 .B INITIAL
1128 (see Start Conditions, below).
1129 .PP
1130 If
1131 .B yylex()
1132 stops scanning due to executing a
1133 .I return
1134 statement in one of the actions, the scanner may then be called again and it
1135 will resume scanning where it left off.
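.PP
A caller typically drives the scanner with a loop that stops when 0,
the end-of-file indication, comes back.
The following is a minimal sketch, assuming a scanner whose actions
return non-zero token codes.
Both count_tokens() and stub_yylex() are hypothetical names; the stub
stands in for a generated scanner so that the loop can be compiled
without one.
.nf

```c
/* Call the scanner until it reports end-of-file (a return
 * value of 0) and count the tokens it produced on the way.
 * "scan" is assumed to behave like yylex().
 */
int count_tokens( int (*scan)( void ) )
    {
    int ntokens = 0;

    while ( scan() != 0 )
        ++ntokens;

    return ntokens;
    }

/* Hypothetical stand-in for yylex(): yields three non-zero
 * token codes and then 0 on every later call.
 */
int stub_yylex( void )
    {
    static int remaining = 3;

    return remaining > 0 ? remaining-- : 0;
    }
```

.fi
With a real scanner, the call would be count_tokens( yylex ).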
1136 .PP
1137 By default (and for purposes of efficiency), the scanner uses
1138 block-reads rather than simple
1139 .I getc()
1140 calls to read characters from
1141 .I yyin.
1142 The nature of how it gets its input can be controlled by defining the
1143 .B YY_INPUT
1144 macro.
1145 YY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)".
1146 Its action is to place up to
1147 .I max_size
1148 characters in the character array
1149 .I buf
1150 and return in the integer variable
1151 .I result
1152 either the
1153 number of characters read or the constant YY_NULL (0 on Unix systems)
1154 to indicate EOF.
1155 The default YY_INPUT reads from the
1156 global file-pointer "yyin".
1157 .PP
1158 A sample definition of YY_INPUT (in the definitions
1159 section of the input file):
1160 .nf
1161
1162     %{
1163     #define YY_INPUT(buf,result,max_size) \\
1164         { \\
1165         int c = getchar(); \\
1166         result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \\
1167         }
1168     %}
1169
1170 .fi
1171 This definition will change the input processing to occur
1172 one character at a time.
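.PP
A block-oriented variant can be sketched in the same spirit.
The fragment below is an illustration, not part of
.I flex:
the helper fill_buf() and the variable input_text are hypothetical
names, and the code assumes that the text to be scanned is a
NUL-terminated string with no embedded NULs.
It hands over up to
.I max_size
characters per call and reports 0, the value given above for YY_NULL
on Unix systems, once the string is used up.
.nf

```c
#include <string.h>

/* Hypothetical input source: a NUL-terminated string that is
 * consumed from the front as the scanner asks for more text.
 */
const char *input_text = "";

/* Copy up to max_size characters into buf and return how many
 * were copied; 0 signals end-of-file.
 */
int fill_buf( char *buf, int max_size )
    {
    size_t n = strlen( input_text );

    if ( n > (size_t) max_size )
        n = (size_t) max_size;

    memcpy( buf, input_text, n );
    input_text += n;

    return (int) n;
    }

#define YY_INPUT(buf,result,max_size) (result) = fill_buf( (buf), (max_size) )
```

.fi
In a scanner these lines would sit inside a %{ ... %} block in the
definitions section, with input_text pointed at the string before the
first call to
.B yylex().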
1173 .PP
1174 When the scanner receives an end-of-file indication from YY_INPUT,
1175 it then checks the
1176 .B yywrap()
1177 function.
1178 If
1179 .B yywrap()
1180 returns false (zero), then it is assumed that the
1181 function has gone ahead and set up
1182 .I yyin
1183 to point to another input file, and scanning continues.
1184 If it returns
1185 true (non-zero), then the scanner terminates, returning 0 to its
1186 caller.
1187 Note that in either case, the start condition remains unchanged;
1188 it does
1189 .I not
1190 revert to
1191 .B INITIAL.
1192 .PP
1193 If you do not supply your own version of
1194 .B yywrap(),
1195 then you must either use
1196 .B %option noyywrap
1197 (in which case the scanner behaves as though
1198 .B yywrap()
1199 returned 1), or you must link with
1200 .B \-ll
1201 to obtain the default version of the routine, which always returns 1.
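.PP
As an illustration of a user-supplied version, the sketch below walks
through a list of file names.
It is one possible arrangement, not the library routine: file_list is a
hypothetical variable that the program is assumed to point at a
NULL-terminated array of names, and
.I yyin
is declared here only so that the fragment compiles on its own (in a
scanner, the generated code provides it).
.nf

```c
#include <stdio.h>

FILE *yyin;                 /* provided by the scanner in real use */
const char **file_list;     /* hypothetical: NULL-terminated names */

int yywrap( void )
    {
    while ( file_list && *file_list )
        {
        FILE *next = fopen( *file_list++, "r" );

        if ( next )
            {
            if ( yyin && yyin != stdin )
                fclose( yyin );
            yyin = next;
            return 0;       /* more input: scanning continues */
            }
        /* unreadable file: skip it and try the next name */
        }

    return 1;               /* nothing left: scanner terminates */
    }
```

.fi
This version silently skips names that cannot be opened; a real program
would probably want to report them.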
1202 .PP
1203 Three routines are available for scanning from in-memory buffers rather
1204 than files:
1205 .B yy_scan_string(), yy_scan_bytes(),
1206 and
1207 .B yy_scan_buffer().
1208 See the discussion of them below in the section Multiple Input Buffers.
1209 .PP
1210 The scanner writes its
1211 .B ECHO
1212 output to the
1213 .I yyout
1214 global (default, stdout), which may be redefined by the user simply
1215 by assigning it to some other
1216 .B FILE
1217 pointer.
1218 .SH START CONDITIONS
1219 .I flex
1220 provides a mechanism for conditionally activating rules.
1221 Any rule
1222 whose pattern is prefixed with "<sc>" will only be active when
1223 the scanner is in the start condition named "sc".
1224 For example,
1225 .nf
1226
1227     <STRING>[^"]*        { /* eat up the string body ... */
1228                 ...
1229                 }
1230
1231 .fi
1232 will be active only when the scanner is in the "STRING" start
1233 condition, and
1234 .nf
1235
1236     <INITIAL,STRING,QUOTE>\\.        { /* handle an escape ... */
1237                 ...
1238                 }
1239
1240 .fi
1241 will be active only when the current start condition is
1242 either "INITIAL", "STRING", or "QUOTE".
1243 .PP
1244 Start conditions
1245 are declared in the definitions (first) section of the input
1246 using unindented lines beginning with either
1247 .B %s
1248 or
1249 .B %x
1250 followed by a list of names.
1251 The former declares
1252 .I inclusive
1253 start conditions, the latter
1254 .I exclusive
1255 start conditions.
1256 A start condition is activated using the
1257 .B BEGIN
1258 action.
1259 Until the next
1260 .B BEGIN
1261 action is executed, rules with the given start
1262 condition will be active and
1263 rules with other start conditions will be inactive.
1264 If the start condition is
1265 .I inclusive,
1266 then rules with no start conditions at all will also be active.
1267 If it is
1268 .I exclusive,
1269 then
1270 .I only
1271 rules qualified with the start condition will be active.
1272 A set of rules contingent on the same exclusive start condition
describes a scanner which is independent of any of the other rules in the
1274 .I flex
1275 input.
1276 Because of this,
1277 exclusive start conditions make it easy to specify "mini-scanners"
1278 which scan portions of the input that are syntactically different
1279 from the rest (e.g., comments).
1280 .PP
1281 If the distinction between inclusive and exclusive start conditions
1282 is still a little vague, here's a simple example illustrating the
1283 connection between the two.
1284 The set of rules:
1285 .nf
1286
1287     %s example
1288     %%
1289
1290     <example>foo   do_something();
1291
1292     bar            something_else();
1293
1294 .fi
1295 is equivalent to
1296 .nf
1297
1298     %x example
1299     %%
1300
1301     <example>foo   do_something();
1302
1303     <INITIAL,example>bar    something_else();
1304
1305 .fi
1306 Without the
1307 .B <INITIAL,example>
1308 qualifier, the
1309 .I bar
1310 pattern in the second example wouldn't be active (i.e., couldn't match)
1311 when in start condition
1312 .B example.
1313 If we just used
1314 .B <example>
1315 to qualify
1316 .I bar,
1317 though, then it would only be active in
1318 .B example
1319 and not in
1320 .B INITIAL,
1321 while in the first example it's active in both, because in the first
1322 example the
1323 .B example
1324 start condition is an
1325 .I inclusive
1326 .B (%s)
1327 start condition.
1328 .PP
1329 Also note that the special start-condition specifier
1330 .B <*>
1331 matches every start condition.
Thus, the above example could also have been written:
1333 .nf
1334
1335     %x example
1336     %%
1337
1338     <example>foo   do_something();
1339
1340     <*>bar    something_else();
1341
1342 .fi
1343 .PP
1344 The default rule (to
1345 .B ECHO
1346 any unmatched character) remains active in start conditions.
1347 It
1348 is equivalent to:
1349 .nf
1350
1351     <*>.|\\n     ECHO;
1352
1353 .fi
1354 .PP
1355 .B BEGIN(0)
1356 returns to the original state where only the rules with
1357 no start conditions are active.
1358 This state can also be
1359 referred to as the start-condition "INITIAL", so
1360 .B BEGIN(INITIAL)
1361 is equivalent to
1362 .B BEGIN(0).
1363 (The parentheses around the start condition name are not required but
1364 are considered good style.)
1365 .PP
1366 .B BEGIN
1367 actions can also be given as indented code at the beginning
1368 of the rules section.
1369 For example, the following will cause
1370 the scanner to enter the "SPECIAL" start condition whenever
1371 .B yylex()
1372 is called and the global variable
1373 .I enter_special
1374 is true:
1375 .nf
1376
1377             int enter_special;
1378
1379     %x SPECIAL
1380     %%
1381             if ( enter_special )
1382                 BEGIN(SPECIAL);
1383
1384     <SPECIAL>blahblahblah
1385     ...more rules follow...
1386
1387 .fi
1388 .PP
1389 To illustrate the uses of start conditions,
1390 here is a scanner which provides two different interpretations
1391 of a string like "123.456".
1392 By default it will treat it as
1393 three tokens, the integer "123", a dot ('.'), and the integer "456".
1394 But if the string is preceded earlier in the line by the string
1395 "expect-floats"
1396 it will treat it as a single token, the floating-point number
1397 123.456:
1398 .nf
1399
1400     %{
    #include <stdlib.h>
1402     %}
1403     %s expect
1404
1405     %%
1406     expect-floats        BEGIN(expect);
1407
1408     <expect>[0-9]+"."[0-9]+      {
1409                 printf( "found a float, = %f\\n",
1410                         atof( yytext ) );
1411                 }
1412     <expect>\\n           {
1413                 /* that's the end of the line, so
                 * we need another "expect-floats"
1415                  * before we'll recognize any more
1416                  * numbers
1417                  */
1418                 BEGIN(INITIAL);
1419                 }
1420
1421     [0-9]+      {
1422                 printf( "found an integer, = %d\\n",
1423                         atoi( yytext ) );
1424                 }
1425
1426     "."         printf( "found a dot\\n" );
1427
1428 .fi
1429 Here is a scanner which recognizes (and discards) C comments while
1430 maintaining a count of the current input line.
1431 .nf
1432
1433     %x comment
1434     %%
1435             int line_num = 1;
1436
1437     "/*"         BEGIN(comment);
1438
1439     <comment>[^*\\n]*        /* eat anything that's not a '*' */
1440     <comment>"*"+[^*/\\n]*   /* eat up '*'s not followed by '/'s */
1441     <comment>\\n             ++line_num;
1442     <comment>"*"+"/"        BEGIN(INITIAL);
1443
1444 .fi
1445 This scanner goes to a bit of trouble to match as much
1446 text as possible with each rule.
1447 In general, when attempting to write
a high-speed scanner, try to match as much as possible in each rule, as
1449 it's a big win.
1450 .PP
Note that start condition names are really integer values and
1452 can be stored as such.
1453 Thus, the above could be extended in the
1454 following fashion:
1455 .nf
1456
1457     %x comment foo
1458     %%
1459             int line_num = 1;
1460             int comment_caller;
1461
1462     "/*"         {
1463                  comment_caller = INITIAL;
1464                  BEGIN(comment);
1465                  }
1466
1467     ...
1468
1469     <foo>"/*"    {
1470                  comment_caller = foo;
1471                  BEGIN(comment);
1472                  }
1473
1474     <comment>[^*\\n]*        /* eat anything that's not a '*' */
1475     <comment>"*"+[^*/\\n]*   /* eat up '*'s not followed by '/'s */
1476     <comment>\\n             ++line_num;
1477     <comment>"*"+"/"        BEGIN(comment_caller);
1478
1479 .fi
1480 Furthermore, you can access the current start condition using
1481 the integer-valued
1482 .B YY_START
1483 macro.
1484 For example, the above assignments to
1485 .I comment_caller
1486 could instead be written
1487 .nf
1488
1489     comment_caller = YY_START;
1490
1491 .fi
1492 Flex provides
1493 .B YYSTATE
1494 as an alias for
1495 .B YY_START
1496 (since that is what's used by AT&T
1497 .I lex).
1498 .PP
1499 Note that start conditions do not have their own name-space; %s's and %x's
1500 declare names in the same fashion as #define's.
1501 .PP
1502 Finally, here's an example of how to match C-style quoted strings using
1503 exclusive start conditions, including expanded escape sequences (but
1504 not including checking for a string that's too long):
1505 .nf
1506
1507     %x str
1508
1509     %%
1510             char string_buf[MAX_STR_CONST];
1511             char *string_buf_ptr;
1512
1513
1514     \\"      string_buf_ptr = string_buf; BEGIN(str);
1515
1516     <str>\\"        { /* saw closing quote - all done */
1517             BEGIN(INITIAL);
1518             *string_buf_ptr = '\\0';
1519             /* return string constant token type and
1520              * value to parser
1521              */
1522             }
1523
1524     <str>\\n        {
1525             /* error - unterminated string constant */
1526             /* generate error message */
1527             }
1528
1529     <str>\\\\[0-7]{1,3} {
1530             /* octal escape sequence */
            unsigned int result;

            (void) sscanf( yytext + 1, "%o", &result );

            if ( result > 0xff )
                    {
                    /* error, constant is out-of-bounds */
                    }

            *string_buf_ptr++ = result;
1539             }
1540
1541     <str>\\\\[0-9]+ {
1542             /* generate error - bad escape sequence; something
1543              * like '\\48' or '\\0777777'
1544              */
1545             }
1546
1547     <str>\\\\n  *string_buf_ptr++ = '\\n';
1548     <str>\\\\t  *string_buf_ptr++ = '\\t';
1549     <str>\\\\r  *string_buf_ptr++ = '\\r';
1550     <str>\\\\b  *string_buf_ptr++ = '\\b';
1551     <str>\\\\f  *string_buf_ptr++ = '\\f';
1552
1553     <str>\\\\(.|\\n)  *string_buf_ptr++ = yytext[1];
1554
1555     <str>[^\\\\\\n\\"]+        {
1556             char *yptr = yytext;
1557
1558             while ( *yptr )
1559                     *string_buf_ptr++ = *yptr++;
1560             }
1561
1562 .fi
1563 .PP
1564 Often, such as in some of the examples above, you wind up writing a
1565 whole bunch of rules all preceded by the same start condition(s).
1566 Flex makes this a little easier and cleaner by introducing a notion of
1567 start condition
1568 .I scope.
1569 A start condition scope is begun with:
1570 .nf
1571
1572     <SCs>{
1573
1574 .fi
1575 where
1576 .I SCs
1577 is a list of one or more start conditions.
1578 Inside the start condition
1579 scope, every rule automatically has the prefix
1580 .I <SCs>
1581 applied to it, until a
1582 .I '}'
1583 which matches the initial
1584 .I '{'.
1585 So, for example,
1586 .nf
1587
1588     <ESC>{
1589         "\\\\n"   return '\\n';
1590         "\\\\r"   return '\\r';
1591         "\\\\f"   return '\\f';
1592         "\\\\0"   return '\\0';
1593     }
1594
1595 .fi
1596 is equivalent to:
1597 .nf
1598
1599     <ESC>"\\\\n"  return '\\n';
1600     <ESC>"\\\\r"  return '\\r';
1601     <ESC>"\\\\f"  return '\\f';
1602     <ESC>"\\\\0"  return '\\0';
1603
1604 .fi
1605 Start condition scopes may be nested.
1606 .PP
1607 Three routines are available for manipulating stacks of start conditions:
1608 .TP
1609 .B void yy_push_state(int new_state)
1610 pushes the current start condition onto the top of the start condition
1611 stack and switches to
1612 .I new_state
1613 as though you had used
1614 .B BEGIN new_state
1615 (recall that start condition names are also integers).
1616 .TP
1617 .B void yy_pop_state()
1618 pops the top of the stack and switches to it via
1619 .B BEGIN.
1620 .TP
1621 .B int yy_top_state()
1622 returns the top of the stack without altering the stack's contents.
1623 .PP
1624 The start condition stack grows dynamically and so has no built-in
1625 size limitation.
1626 If memory is exhausted, program execution aborts.
1627 .PP
1628 To use start condition stacks, your scanner must include a
1629 .B %option stack
1630 directive (see Options below).
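.PP
The behaviour of the three routines can be modelled in a few lines of C.
The sketch below is not the code that
.I flex
generates; every name in it is hypothetical, and it exists only to make
the push, pop and top semantics described above concrete.
.nf

```c
#include <stdlib.h>

static int cur_state = 0;   /* models YY_START; 0 is INITIAL */
static int *sc_stack = NULL;
static int sc_depth = 0, sc_capacity = 0;

int model_start( void )
    {
    return cur_state;
    }

void model_push_state( int new_state )
    {
    if ( sc_depth == sc_capacity )
        {   /* grow on demand; give up if memory runs out */
        sc_capacity = sc_capacity ? 2 * sc_capacity : 8;
        sc_stack = realloc( sc_stack, sc_capacity * sizeof( int ) );
        if ( ! sc_stack )
            abort();
        }

    sc_stack[sc_depth++] = cur_state;   /* remember the caller */
    cur_state = new_state;              /* like BEGIN(new_state) */
    }

void model_pop_state( void )
    {
    if ( sc_depth == 0 )
        abort();                        /* nothing to pop */
    cur_state = sc_stack[--sc_depth];
    }

int model_top_state( void )
    {
    return sc_stack[sc_depth - 1];      /* stack left untouched */
    }
```

.fi
In scanner terms, a rule that recognizes the start of a comment in
several start conditions can call yy_push_state( comment ), and the
rule that recognizes its end can call yy_pop_state() to return to
whichever condition was active before.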
1631 .SH MULTIPLE INPUT BUFFERS
1632 Some scanners (such as those which support "include" files)
1633 require reading from several input streams.
1634 As
1635 .I flex
1636 scanners do a large amount of buffering, one cannot control
1637 where the next input will be read from by simply writing a
1638 .B YY_INPUT
1639 which is sensitive to the scanning context.
1640 .B YY_INPUT
1641 is only called when the scanner reaches the end of its buffer, which
1642 may be a long time after scanning a statement such as an "include"
1643 which requires switching the input source.
1644 .PP
1645 To negotiate these sorts of problems,
1646 .I flex
1647 provides a mechanism for creating and switching between multiple
1648 input buffers.
1649 An input buffer is created by using:
1650 .nf
1651
1652     YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )
1653
1654 .fi
1655 which takes a
1656 .I FILE
1657 pointer and a size and creates a buffer associated with the given
1658 file and large enough to hold
1659 .I size
1660 characters (when in doubt, use
1661 .B YY_BUF_SIZE
1662 for the size).
1663 It returns a
1664 .B YY_BUFFER_STATE
1665 handle, which may then be passed to other routines (see below).
1666 The
1667 .B YY_BUFFER_STATE
1668 type is a pointer to an opaque
1669 .B struct yy_buffer_state
1670 structure, so you may safely initialize YY_BUFFER_STATE variables to
1671 .B ((YY_BUFFER_STATE) 0)
1672 if you wish, and also refer to the opaque structure in order to
1673 correctly declare input buffers in source files other than that
1674 of your scanner.
1675 Note that the
1676 .I FILE
1677 pointer in the call to
1678 .B yy_create_buffer
1679 is only used as the value of
1680 .I yyin
1681 seen by
1682 .B YY_INPUT;
1683 if you redefine
1684 .B YY_INPUT
1685 so it no longer uses
1686 .I yyin,
1687 then you can safely pass a nil
1688 .I FILE
1689 pointer to
1690 .B yy_create_buffer.
1691 You select a particular buffer to scan from using:
1692 .nf
1693
1694     void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )
1695
1696 .fi
which switches the scanner's input buffer so that subsequent tokens will
1698 come from
1699 .I new_buffer.
1700 Note that
1701 .B yy_switch_to_buffer()
1702 may be used by yywrap() to set things up for continued scanning, instead
1703 of opening a new file and pointing
1704 .I yyin
1705 at it.
1706 Note also that switching input sources via either
1707 .B yy_switch_to_buffer()
1708 or
1709 .B yywrap()
1710 does
1711 .I not
1712 change the start condition.
1713 .nf
1714
1715     void yy_delete_buffer( YY_BUFFER_STATE buffer )
1716
1717 .fi
1718 is used to reclaim the storage associated with a buffer.
.RB ( buffer
1721 can be nil, in which case the routine does nothing.)
1722 You can also clear the current contents of a buffer using:
1723 .nf
1724
1725     void yy_flush_buffer( YY_BUFFER_STATE buffer )
1726
1727 .fi
1728 This function discards the buffer's contents,
1729 so the next time the scanner attempts to match a token from the
1730 buffer, it will first fill the buffer anew using
1731 .B YY_INPUT.
1732 .PP
1733 .B yy_new_buffer()
1734 is an alias for
1735 .B yy_create_buffer(),
1736 provided for compatibility with the C++ use of
1737 .I new
1738 and
1739 .I delete
1740 for creating and destroying dynamic objects.
1741 .PP
1742 Finally, the
1743 .B YY_CURRENT_BUFFER
1744 macro returns a
1745 .B YY_BUFFER_STATE
1746 handle to the current buffer.
1747 .PP
1748 Here is an example of using these features for writing a scanner
1749 which expands include files (the
1750 .B <<EOF>>
1751 feature is discussed below):
1752 .nf
1753
1754     /* the "incl" state is used for picking up the name
1755      * of an include file
1756      */
1757     %x incl
1758
1759     %{
1760     #define MAX_INCLUDE_DEPTH 10
1761     YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
1762     int include_stack_ptr = 0;
1763     %}
1764
1765     %%
1766     include             BEGIN(incl);
1767
1768     [a-z]+              ECHO;
1769     [^a-z\\n]*\\n?        ECHO;
1770
1771     <incl>[ \\t]*      /* eat the whitespace */
1772     <incl>[^ \\t\\n]+   { /* got the include file name */
1773             if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
1774                 {
1775                 fprintf( stderr, "Includes nested too deeply" );
1776                 exit( 1 );
1777                 }
1778
1779             include_stack[include_stack_ptr++] =
1780                 YY_CURRENT_BUFFER;
1781
1782             yyin = fopen( yytext, "r" );
1783
1784             if ( ! yyin )
1785                 error( ... );
1786
1787             yy_switch_to_buffer(
1788                 yy_create_buffer( yyin, YY_BUF_SIZE ) );
1789
1790             BEGIN(INITIAL);
1791             }
1792
1793     <<EOF>> {
1794             if ( --include_stack_ptr < 0 )
1795                 {
1796                 yyterminate();
1797                 }
1798
1799             else
1800                 {
1801                 yy_delete_buffer( YY_CURRENT_BUFFER );
1802                 yy_switch_to_buffer(
1803                      include_stack[include_stack_ptr] );
1804                 }
1805             }
1806
1807 .fi
1808 Three routines are available for setting up input buffers for
1809 scanning in-memory strings instead of files.
1810 All of them create
1811 a new input buffer for scanning the string, and return a corresponding
1812 .B YY_BUFFER_STATE
1813 handle (which you should delete with
1814 .B yy_delete_buffer()
1815 when done with it).
1816 They also switch to the new buffer using
1817 .B yy_switch_to_buffer(),
1818 so the next call to
1819 .B yylex()
1820 will start scanning the string.
1821 .TP
1822 .B yy_scan_string(const char *str)
1823 scans a NUL-terminated string.
1824 .TP
1825 .B yy_scan_bytes(const char *bytes, int len)
1826 scans
1827 .I len
1828 bytes (including possibly NUL's)
1829 starting at location
1830 .I bytes.
1831 .PP
1832 Note that both of these functions create and scan a
1833 .I copy
1834 of the string or bytes.
1835 (This may be desirable, since
1836 .B yylex()
1837 modifies the contents of the buffer it is scanning.)  You can avoid the
1838 copy by using:
1839 .TP
1840 .B yy_scan_buffer(char *base, yy_size_t size)
1841 which scans in place the buffer starting at
1842 .I base,
1843 consisting of
1844 .I size
1845 bytes, the last two bytes of which
1846 .I must
1847 be
1848 .B YY_END_OF_BUFFER_CHAR
1849 (ASCII NUL).
1850 These last two bytes are not scanned; thus, scanning
1851 consists of
1852 .B base[0]
1853 through
.B base[size-3],
1855 inclusive.
1856 .IP
1857 If you fail to set up
1858 .I base
1859 in this manner (i.e., forget the final two
1860 .B YY_END_OF_BUFFER_CHAR
1861 bytes), then
1862 .B yy_scan_buffer()
1863 returns a nil pointer instead of creating a new input buffer.
1864 .IP
1865 The type
1866 .B yy_size_t
1867 is an integral type to which you can cast an integer expression
1868 reflecting the size of the buffer.
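.PP
Preparing a buffer in the required layout is easy to get wrong, so a
small helper may be worthwhile.
The sketch below is one way to do it; make_scan_buffer() is a
hypothetical name, not a
.I flex
routine, and the text is assumed to be NUL-terminated.
.nf

```c
#include <stdlib.h>
#include <string.h>

/* Return a freshly allocated copy of text followed by the two
 * NUL bytes that yy_scan_buffer() requires, and store the total
 * size (text length + 2) in *size.  Returns NULL if memory is
 * exhausted.  The caller owns the storage and must free it.
 */
char *make_scan_buffer( const char *text, size_t *size )
    {
    size_t len = strlen( text );
    char *base = malloc( len + 2 );

    if ( ! base )
        return NULL;

    memcpy( base, text, len );
    base[len] = 0;          /* first YY_END_OF_BUFFER_CHAR */
    base[len + 1] = 0;      /* second YY_END_OF_BUFFER_CHAR */
    *size = len + 2;

    return base;
    }
```

.fi
The result would be handed to the scanner as
yy_scan_buffer( base, (yy_size_t) size ).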
1869 .SH END-OF-FILE RULES
1870 The special rule "<<EOF>>" indicates
1871 actions which are to be taken when an end-of-file is
1872 encountered and yywrap() returns non-zero (i.e., indicates
1873 no further files to process).
1874 The action must finish
1875 by doing one of four things:
1876 .IP -
1877 assigning
1878 .I yyin
1879 to a new input file (in previous versions of flex, after doing the
1880 assignment you had to call the special action
1881 .B YY_NEW_FILE;
1882 this is no longer necessary);
1883 .IP -
1884 executing a
1885 .I return
1886 statement;
1887 .IP -
1888 executing the special
1889 .B yyterminate()
1890 action;
1891 .IP -
1892 or, switching to a new buffer using
1893 .B yy_switch_to_buffer()
1894 as shown in the example above.
1895 .PP
1896 <<EOF>> rules may not be used with other
1897 patterns; they may only be qualified with a list of start
1898 conditions.
1899 If an unqualified <<EOF>> rule is given, it
1900 applies to
1901 .I all
1902 start conditions which do not already have <<EOF>> actions.
1903 To
1904 specify an <<EOF>> rule for only the initial start condition, use
1905 .nf
1906
1907     <INITIAL><<EOF>>
1908
1909 .fi
1910 .PP
1911 These rules are useful for catching things like unclosed comments.
1912 An example:
1913 .nf
1914
1915     %x quote
1916     %%
1917
1918     ...other rules for dealing with quotes...
1919
1920     <quote><<EOF>>   {
1921              error( "unterminated quote" );
1922              yyterminate();
1923              }
1924     <<EOF>>  {
1925              if ( *++filelist )
1926                  yyin = fopen( *filelist, "r" );
1927              else
1928                 yyterminate();
1929              }
1930
1931 .fi
1932 .SH MISCELLANEOUS MACROS
1933 The macro
1934 .B YY_USER_ACTION
1935 can be defined to provide an action
1936 which is always executed prior to the matched rule's action.
1937 For example,
1938 it could be #define'd to call a routine to convert yytext to lower-case.
1939 When
1940 .B YY_USER_ACTION
1941 is invoked, the variable
1942 .I yy_act
1943 gives the number of the matched rule (rules are numbered starting with 1).
1944 Suppose you want to profile how often each of your rules is matched.
1945 The following would do the trick:
1946 .nf
1947
1948     #define YY_USER_ACTION ++ctr[yy_act]
1949
1950 .fi
1951 where
1952 .I ctr
1953 is an array to hold the counts for the different rules.
1954 Note that the macro
1955 .B YY_NUM_RULES
1956 gives the total number of rules (including the default rule, even if
1957 you use
1958 .B \-s),
1959 so a correct declaration for
1960 .I ctr
1961 is:
1962 .nf
1963
1964     int ctr[YY_NUM_RULES];
1965
1966 .fi
1967 .PP
1968 The macro
1969 .B YY_USER_INIT
1970 may be defined to provide an action which is always executed before
1971 the first scan (and before the scanner's internal initializations are done).
1972 For example, it could be used to call a routine to read
1973 in a data table or open a logging file.
1974 .PP
1975 The macro
1976 .B yy_set_interactive(is_interactive)
1977 can be used to control whether the current buffer is considered
1978 .I interactive.
1979 An interactive buffer is processed more slowly,
1980 but must be used when the scanner's input source is indeed
1981 interactive to avoid problems due to waiting to fill buffers
1982 (see the discussion of the
1983 .B \-I
1984 flag below).
1985 A non-zero value
1986 in the macro invocation marks the buffer as interactive, a zero
1987 value as non-interactive.
1988 Note that use of this macro overrides
.BR "%option interactive" ,
1990 .B %option always-interactive
1991 or
1992 .B %option never-interactive
1993 (see Options below).
1994 .B yy_set_interactive()
1995 must be invoked prior to beginning to scan the buffer that is
1996 (or is not) to be considered interactive.
1997 .PP
1998 The macro
1999 .B yy_set_bol(at_bol)
2000 can be used to control whether the current buffer's scanning
2001 context for the next token match is done as though at the
2002 beginning of a line.
A non-zero macro argument makes rules anchored with '^' active,
while a zero argument makes '^' rules inactive.
2005 .PP
2006 The macro
2007 .B YY_AT_BOL()
2008 returns true if the next token scanned from the current buffer
2009 will have '^' rules active, false otherwise.
2010 .PP
2011 In the generated scanner, the actions are all gathered in one large
2012 switch statement and separated using
2013 .B YY_BREAK,
2014 which may be redefined.
2015 By default, it is simply a "break", to separate
2016 each rule's action from the following rule's.
2017 Redefining
2018 .B YY_BREAK
2019 allows, for example, C++ users to
2020 #define YY_BREAK to do nothing (while being very careful that every
rule ends with a "break" or a "return"!) to avoid the
unreachable-statement warnings that arise when a rule's action ends with
"return", which leaves the following
.B YY_BREAK
inaccessible.
2026 .SH VALUES AVAILABLE TO THE USER
2027 This section summarizes the various values available to the user
2028 in the rule actions.
2029 .IP -
2030 .B char *yytext
2031 holds the text of the current token.
2032 It may be modified but not lengthened
2033 (you cannot append characters to the end).
2034 .IP
2035 If the special directive
2036 .B %array
2037 appears in the first section of the scanner description, then
2038 .B yytext
2039 is instead declared
2040 .B char yytext[YYLMAX],
2041 where
2042 .B YYLMAX
2043 is a macro definition that you can redefine in the first section
2044 if you don't like the default value (generally 8KB).
2045 Using
2046 .B %array
2047 results in somewhat slower scanners, but the value of
2048 .B yytext
2049 becomes immune to calls to
2050 .I input()
2051 and
2052 .I unput(),
2053 which potentially destroy its value when
2054 .B yytext
2055 is a character pointer.
2056 The opposite of
2057 .B %array
2058 is
2059 .B %pointer,
2060 which is the default.
2061 .IP
2062 You cannot use
2063 .B %array
2064 when generating C++ scanner classes
2065 (the
2066 .B \-+
2067 flag).
2068 .IP -
2069 .B int yyleng
2070 holds the length of the current token.
2071 .IP -
2072 .B FILE *yyin
2073 is the file which by default
2074 .I flex
2075 reads from.
2076 It may be redefined but doing so only makes sense before
2077 scanning begins or after an EOF has been encountered.
2078 Changing it in the midst of scanning will have unexpected results since
2079 .I flex
2080 buffers its input; use
2081 .B yyrestart()
2082 instead.
2083 Once scanning terminates because an end-of-file
has been seen, you can point
2085 .I yyin
2086 at the new input file and then call the scanner again to continue scanning.
2087 .IP -
2088 .B void yyrestart( FILE *new_file )
2089 may be called to point
2090 .I yyin
2091 at the new input file.
2092 The switch-over to the new file is immediate
2093 (any previously buffered-up input is lost).
2094 Note that calling
2095 .B yyrestart()
2096 with
2097 .I yyin
2098 as an argument thus throws away the current input buffer and continues
2099 scanning the same input file.
2100 .IP -
2101 .B FILE *yyout
2102 is the file to which
2103 .B ECHO
2104 actions are done.
2105 It can be reassigned by the user.
2106 .IP -
2107 .B YY_CURRENT_BUFFER
2108 returns a
2109 .B YY_BUFFER_STATE
2110 handle to the current buffer.
2111 .IP -
2112 .B YY_START
2113 returns an integer value corresponding to the current start
2114 condition.
2115 You can subsequently use this value with
2116 .B BEGIN
2117 to return to that start condition.
2118 .SH INTERFACING WITH YACC
2119 One of the main uses of
2120 .I flex
2121 is as a companion to the
2122 .I yacc
2123 parser-generator.
2124 .I yacc
2125 parsers expect to call a routine named
2126 .B yylex()
2127 to find the next input token.
2128 The routine is supposed to
2129 return the type of the next token as well as putting any associated
2130 value in the global
2131 .B yylval.
2132 To use
2133 .I flex
2134 with
2135 .I yacc,
2136 one specifies the
2137 .B \-d
2138 option to
2139 .I yacc
2140 to instruct it to generate the file
2141 .B y.tab.h
2142 containing definitions of all the
2143 .B %tokens
2144 appearing in the
2145 .I yacc
2146 input.
2147 This file is then included in the
2148 .I flex
2149 scanner.
2150 For example, if one of the tokens is "TOK_NUMBER",
2151 part of the scanner might look like:
2152 .nf
2153
    %{
    #include <stdlib.h>    /* for atoi() */
    #include "y.tab.h"
    %}
2157
2158     %%
2159
2160     [0-9]+        yylval = atoi( yytext ); return TOK_NUMBER;
2161
2162 .fi
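.PP
A slightly fuller scanner along the same lines might also skip blanks and
pass single-character operators straight through to the parser.
This sketch assumes that the grammar leaves
.B yylval
as a plain
.B int
and uses character literals such as '+' as tokens:
.nf

    %{
    #include <stdlib.h>
    #include "y.tab.h"
    %}

    %option noyywrap

    %%

    [0-9]+      yylval = atoi( yytext ); return TOK_NUMBER;
    [ \\t]+      /* skip blanks */
    \\n          return '\\n';
    .           return yytext[0];

.fi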
2163 .SH OPTIONS
2164 .I flex
2165 has the following options:
2166 .TP
.B \-b, \-\-backup
2168 Generate backing-up information to
2169 .I lex.backup.
2170 This is a list of scanner states which require backing up
2171 and the input characters on which they do so.
2172 By adding rules one
2173 can remove backing-up states.
2174 If
2175 .I all
2176 backing-up states are eliminated and
2177 .B \-Cf
2178 or
2179 .B \-CF
2180 is used, the generated scanner will run faster (see the
2181 .B \-p
2182 flag).
2183 Only users who wish to squeeze every last cycle out of their
2184 scanners need worry about this option.
2185 (See the section on Performance Considerations below.)
2186 .TP
2187 .B \-c
2188 is a do-nothing, deprecated option included for POSIX compliance.
2189 .TP
2190 .B \-d, \-\-debug
2191 makes the generated scanner run in
2192 .I debug
2193 mode.
2194 Whenever a pattern is recognized and the global
2195 .B yy_flex_debug
2196 is non-zero (which is the default),
2197 the scanner will write to
2198 .I stderr
2199 a line of the form:
2200 .nf
2201
2202     --accepting rule at line 53 ("the matched text")
2203
2204 .fi
2205 The line number refers to the location of the rule in the file
2206 defining the scanner (i.e., the file that was fed to flex).
2207 Messages are also generated when the scanner backs up, accepts the
2208 default rule, reaches the end of its input buffer (or encounters
2209 a NUL; at this point, the two look the same as far as the scanner's concerned),
2210 or reaches an end-of-file.
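.IP
Because
.B yy_flex_debug
is a global variable, the tracing can be switched at run time.
A minimal sketch, in which the rule set and the convention of enabling
tracing whenever any argument is given are assumptions made for illustration:
.nf

    %option debug noyywrap
    %%
    [a-z]+    /* a word */
    .|\\n      /* anything else */
    %%
    int main( int argc, char **argv )
        {
        (void) argv;
        yy_flex_debug = ( argc > 1 );   /* trace only on request */
        return yylex();
        }

.fi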
2211 .TP
2212 .B \-f, \-\-full
specifies a
.I fast scanner
that uses the full table representation.
2215 No table compression is done and stdio is bypassed.
2216 The result is large but fast.
2217 This option is equivalent to
2218 .B \-Cfr
2219 (see below).
2220 .TP
2221 .B \-h, \-\-help
2222 generates a "help" summary of
2223 .I flex's
2224 options to
2225 .I stdout
2226 and then exits.
2227 .B \-?
2228 and
2229 .B \-\-help
2230 are synonyms for
2231 .B \-h.
2232 .TP
2233 .B \-i, \-\-case-insensitive
2234 instructs
2235 .I flex
2236 to generate a
2237 .I case-insensitive
2238 scanner.
2239 The case of letters given in the
2240 .I flex
2241 input patterns will
2242 be ignored, and tokens in the input will be matched regardless of case.
2243 The matched text given in
2244 .I yytext
will keep its original case (i.e., it will not be folded).
2246 .TP
2247 .B \-l, \-\-lex\-compat
2248 turns on maximum compatibility with the original AT&T
2249 .I lex
2250 implementation.
2251 Note that this does not mean
2252 .I full
2253 compatibility.
2254 Use of this option costs a considerable amount of
2255 performance, and it cannot be used with the
.B \-+, \-f, \-F, \-Cf,
or
.B \-CF
2259 options.
2260 For details on the compatibilities it provides, see the section
2261 "Incompatibilities With Lex And POSIX" below.
2262 This option also results
2263 in the name
2264 .B YY_FLEX_LEX_COMPAT
2265 being #define'd in the generated scanner.
2266 .TP
2267 .B \-n
2268 is another do-nothing, deprecated option included only for
2269 POSIX compliance.
2270 .TP
2271 .B \-p, \-\-perf\-report
2272 generates a performance report to stderr.
2273 The report consists of comments regarding features of the
2274 .I flex
2275 input file which will cause a serious loss of performance in the resulting
2276 scanner.
2277 If you give the flag twice, you will also get comments regarding
2278 features that lead to minor performance losses.
2279 .IP
2280 Note that the use of
2281 .B REJECT,
2282 .B %option yylineno,
2283 and variable trailing context (see the Deficiencies / Bugs section below)
2284 entails a substantial performance penalty; use of
2285 .I yymore(),
2286 the
2287 .B ^
2288 operator,
2289 and the
2290 .B \-I
2291 flag entail minor performance penalties.
2292 .TP
2293 .B \-s, \-\-no\-default
2294 causes the
2295 .I default rule
2296 (that unmatched scanner input is echoed to
2297 .I stdout)
2298 to be suppressed.
2299 If the scanner encounters input that does not
2300 match any of its rules, it aborts with an error.
2301 This option is
2302 useful for finding holes in a scanner's rule set.
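.IP
For instance, a scanner generated from the following sketch (the rules are
illustrative) aborts on the first character that is neither a digit nor
whitespace, rather than echoing it:
.nf

    %option nodefault noyywrap
    %%
    [0-9]+      printf( "number\\n" );
    [ \\t\\n]+    /* skip whitespace */

.fi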
2303 .TP
2304 .B \-t, \-\-stdout
2305 instructs
2306 .I flex
2307 to write the scanner it generates to standard output instead
2308 of
2309 .B lex.yy.c.
2310 .TP
2311 .B \-v, \-\-verbose
2312 specifies that
2313 .I flex
2314 should write to
2315 .I stderr
2316 a summary of statistics regarding the scanner it generates.
2317 Most of the statistics are meaningless to the casual
2318 .I flex
2319 user, but the first line identifies the version of
2320 .I flex
2321 (same as reported by
2322 .B \-V),
2323 and the next line the flags used when generating the scanner, including
2324 those that are on by default.
2325 .TP
2326 .B \-w, \-\-nowarn
2327 suppresses warning messages.
2328 .TP
2329 .B \-B, \-\-batch
2330 instructs
2331 .I flex
2332 to generate a
2333 .I batch
2334 scanner, the opposite of
2335 .I interactive
2336 scanners generated by
2337 .B \-I
2338 (see below).
2339 In general, you use
2340 .B \-B
2341 when you are
2342 .I certain
2343 that your scanner will never be used interactively, and you want to
2344 squeeze a
2345 .I little
2346 more performance out of it.
2347 If your goal is instead to squeeze out a
2348 .I lot
2349 more performance, you should be using the
2350 .B \-Cf
2351 or
2352 .B \-CF
2353 options (discussed below), which turn on
2354 .B \-B
2355 automatically anyway.
2356 .TP
2357 .B \-F, \-\-fast
2358 specifies that the
2359 .ul
2360 fast
2361 scanner table representation should be used (and stdio
2362 bypassed).
2363 This representation is about as fast as the full table representation
2364 .B (-f),
2365 and for some sets of patterns will be considerably smaller (and for
2366 others, larger).
2367 In general, if the pattern set contains both "keywords"
2368 and a catch-all, "identifier" rule, such as in the set:
2369 .nf
2370
2371     "case"    return TOK_CASE;
2372     "switch"  return TOK_SWITCH;
2373     ...
2374     "default" return TOK_DEFAULT;
2375     [a-z]+    return TOK_ID;
2376
2377 .fi
2378 then you're better off using the full table representation.
2379 If only
2380 the "identifier" rule is present and you then use a hash table or some such
2381 to detect the keywords, you're better off using
2382 .B -F.
2383 .IP
2384 This option is equivalent to
2385 .B \-CFr
2386 (see below).
2387 It cannot be used with
2388 .B \-+.
2389 .TP
2390 .B \-I, \-\-interactive
2391 instructs
2392 .I flex
2393 to generate an
2394 .I interactive
2395 scanner.
2396 An interactive scanner is one that only looks ahead to decide
2397 what token has been matched if it absolutely must.
2398 It turns out that
2399 always looking one extra character ahead, even if the scanner has already
2400 seen enough text to disambiguate the current token, is a bit faster than
2401 only looking ahead when necessary.
2402 But scanners that always look ahead
2403 give dreadful interactive performance; for example, when a user types
2404 a newline, it is not recognized as a newline token until they enter
2405 .I another
2406 token, which often means typing in another whole line.
2407 .IP
2408 .I Flex
2409 scanners default to
2410 .I interactive
2411 unless you use the
2412 .B \-Cf
2413 or
2414 .B \-CF
2415 table-compression options (see below).
2416 That's because if you're looking
2417 for high-performance you should be using one of these options, so if you
2418 didn't,
2419 .I flex
2420 assumes you'd rather trade off a bit of run-time performance for intuitive
2421 interactive behavior.
2422 Note also that you
2423 .I cannot
2424 use
2425 .B \-I
2426 in conjunction with
2427 .B \-Cf
2428 or
2429 .B \-CF.
2430 Thus, this option is not really needed; it is on by default for all those
2431 cases in which it is allowed.
2432 .IP
2433 Note that if
2434 .B isatty()
2435 returns false for the scanner input, flex will revert to batch mode, even if
2436 .B \-I
2437 was specified.
2438 To force interactive mode no matter what, use
2439 .B %option always-interactive
2440 (see Options below).
2441 .IP
2442 You can force a scanner to
2443 .I not
2444 be interactive by using
2445 .B \-B
2446 (see above).
2447 .TP
2448 .B \-L, \-\-noline
2449 instructs
2450 .I flex
2451 not to generate
2452 .B #line
2453 directives.
2454 Without this option,
2455 .I flex
2456 peppers the generated scanner
2457 with #line directives so error messages in the actions will be correctly
2458 located with respect to either the original
2459 .I flex
2460 input file (if the errors are due to code in the input file), or
2461 .B lex.yy.c
2462 (if the errors are
2463 .I flex's
2464 fault -- you should report these sorts of errors to the email address
2465 given below).
2466 .TP
2467 .B \-T, \-\-trace
2468 makes
2469 .I flex
2470 run in
2471 .I trace
2472 mode.
2473 It will generate a lot of messages to
2474 .I stderr
2475 concerning
2476 the form of the input and the resultant non-deterministic and deterministic
2477 finite automata.
2478 This option is mostly for use in maintaining
2479 .I flex.
2480 .TP
2481 .B \-V, \-\-version
2482 prints the version number to
2483 .I stdout
2484 and exits.
2485 .B \-\-version
2486 is a synonym for
2487 .B \-V.
2488 .TP
2489 .B \-7, \-\-7bit
2490 instructs
2491 .I flex
2492 to generate a 7-bit scanner, i.e., one which can only recognize 7-bit
2493 characters in its input.
2494 The advantage of using
2495 .B \-7
2496 is that the scanner's tables can be up to half the size of those generated
2497 using the
2498 .B \-8
2499 option (see below).
2500 The disadvantage is that such scanners often hang
2501 or crash if their input contains an 8-bit character.
2502 .IP
2503 Note, however, that unless you generate your scanner using the
2504 .B \-Cf
2505 or
2506 .B \-CF
2507 table compression options, use of
2508 .B \-7
2509 will save only a small amount of table space, and make your scanner
2510 considerably less portable.
2511 .I Flex's
2512 default behavior is to generate an 8-bit scanner unless you use the
2513 .B \-Cf
2514 or
2515 .B \-CF,
2516 in which case
2517 .I flex
2518 defaults to generating 7-bit scanners unless your site was always
2519 configured to generate 8-bit scanners (as will often be the case
2520 with non-USA sites).
2521 You can tell whether flex generated a 7-bit
2522 or an 8-bit scanner by inspecting the flag summary in the
2523 .B \-v
2524 output as described above.
2525 .IP
2526 Note that if you use
2527 .B \-Cfe
2528 or
2529 .B \-CFe
2530 (those table compression options, but also using equivalence classes as
discussed below), flex still defaults to generating an 8-bit
2532 scanner, since usually with these compression options full 8-bit tables
2533 are not much more expensive than 7-bit tables.
2534 .TP
2535 .B \-8, \-\-8bit
2536 instructs
2537 .I flex
2538 to generate an 8-bit scanner, i.e., one which can recognize 8-bit
2539 characters.
2540 This flag is only needed for scanners generated using
2541 .B \-Cf
2542 or
2543 .B \-CF,
2544 as otherwise flex defaults to generating an 8-bit scanner anyway.
2545 .IP
2546 See the discussion of
2547 .B \-7
2548 above for flex's default behavior and the tradeoffs between 7-bit
2549 and 8-bit scanners.
2550 .TP
2551 .B \-+, \-\-c++
2552 specifies that you want flex to generate a C++
2553 scanner class.
2554 See the section on Generating C++ Scanners below for
2555 details.
2556 .TP
2557 .B \-C[aefFmr]
2558 controls the degree of table compression and, more generally, trade-offs
2559 between small scanners and fast scanners.
2560 .IP
2561 .B \-Ca, \-\-align
2562 ("align") instructs flex to trade off larger tables in the
2563 generated scanner for faster performance because the elements of
2564 the tables are better aligned for memory access and computation.
2565 On some
2566 RISC architectures, fetching and manipulating longwords is more efficient
2567 than with smaller-sized units such as shortwords.
2568 This option can
2569 double the size of the tables used by your scanner.
2570 .IP
2571 .B \-Ce, \-\-ecs
2572 directs
2573 .I flex
2574 to construct
2575 .I equivalence classes,
2576 i.e., sets of characters
2577 which have identical lexical properties (for example, if the only
2578 appearance of digits in the
2579 .I flex
2580 input is in the character class
2581 "[0-9]" then the digits '0', '1', ..., '9' will all be put
2582 in the same equivalence class).
2583 Equivalence classes usually give
2584 dramatic reductions in the final table/object file sizes (typically
2585 a factor of 2-5) and are pretty cheap performance-wise (one array
2586 look-up per character scanned).
2587 .IP
2588 .B \-Cf
2589 specifies that the
2590 .I full
2591 scanner tables should be generated -
2592 .I flex
2593 should not compress the
tables by taking advantage of similar transition functions for
2595 different states.
2596 .IP
2597 .B \-CF
2598 specifies that the alternative fast scanner representation (described
2599 above under the
2600 .B \-F
2601 flag)
2602 should be used.
2603 This option cannot be used with
2604 .B \-+.
2605 .IP
2606 .B \-Cm, \-\-meta-ecs
2607 directs
2608 .I flex
2609 to construct
2610 .I meta-equivalence classes,
2611 which are sets of equivalence classes (or characters, if equivalence
2612 classes are not being used) that are commonly used together.
2613 Meta-equivalence
2614 classes are often a big win when using compressed tables, but they
2615 have a moderate performance impact (one or two "if" tests and one
2616 array look-up per character scanned).
2617 .IP
2618 .B \-Cr, \-\-read
2619 causes the generated scanner to
2620 .I bypass
2621 use of the standard I/O library (stdio) for input.
2622 Instead of calling
2623 .B fread()
2624 or
2625 .B getc(),
2626 the scanner will use the
2627 .B read()
2628 system call, resulting in a performance gain which varies from system
2629 to system, but in general is probably negligible unless you are also using
2630 .B \-Cf
2631 or
2632 .B \-CF.
2633 Using
2634 .B \-Cr
2635 can cause strange behavior if, for example, you read from
2636 .I yyin
2637 using stdio prior to calling the scanner (because the scanner will miss
2638 whatever text your previous reads left in the stdio input buffer).
2639 .IP
2640 .B \-Cr
2641 has no effect if you define
2642 .B YY_INPUT
2643 (see The Generated Scanner above).
2644 .IP
2645 A lone
2646 .B \-C
2647 specifies that the scanner tables should be compressed but neither
2648 equivalence classes nor meta-equivalence classes should be used.
2649 .IP
2650 The options
2651 .B \-Cf
2652 or
2653 .B \-CF
2654 and
2655 .B \-Cm
2656 do not make sense together - there is no opportunity for meta-equivalence
2657 classes if the table is not being compressed.
2658 Otherwise the options
2659 may be freely mixed, and are cumulative.
2660 .IP
2661 The default setting is
2662 .B \-Cem,
2663 which specifies that
2664 .I flex
2665 should generate equivalence classes
2666 and meta-equivalence classes.
2667 This setting provides the highest degree of table compression.
You can obtain
faster-executing scanners at the cost of larger tables, with
the following generally being true:
2671 .nf
2672
2673     slowest & smallest
2674           -Cem
2675           -Cm
2676           -Ce
2677           -C
2678           -C{f,F}e
2679           -C{f,F}
2680           -C{f,F}a
2681     fastest & largest
2682
2683 .fi
2684 Note that scanners with the smallest tables are usually generated and
2685 compiled the quickest, so
2686 during development you will usually want to use the default, maximal
2687 compression.
2688 .IP
2689 .B \-Cfe
2690 is often a good compromise between speed and size for production
2691 scanners.
2692 .TP
2693 .B \-ooutput, \-\-outputfile=FILE
2694 directs flex to write the scanner to the file
2695 .B output
2696 instead of
2697 .B lex.yy.c.
2698 If you combine
2699 .B \-o
2700 with the
2701 .B \-t
2702 option, then the scanner is written to
2703 .I stdout
2704 but its
2705 .B #line
2706 directives (see the
.B \-L
2708 option above) refer to the file
2709 .B output.
2710 .TP
2711 .B \-Pprefix, \-\-prefix=STRING
2712 changes the default
2713 .I "yy"
2714 prefix used by
2715 .I flex
2716 for all globally-visible variable and function names to instead be
2717 .I prefix.
2718 For example,
2719 .B \-Pfoo
2720 changes the name of
2721 .B yytext
2722 to
2723 .B footext.
2724 It also changes the name of the default output file from
2725 .B lex.yy.c
2726 to
2727 .B lex.foo.c.
2728 Here are all of the names affected:
2729 .nf
2730
2731     yy_create_buffer
2732     yy_delete_buffer
2733     yy_flex_debug
2734     yy_init_buffer
2735     yy_flush_buffer
2736     yy_load_buffer_state
2737     yy_switch_to_buffer
2738     yyin
2739     yyleng
2740     yylex
2741     yylineno
2742     yyout
2743     yyrestart
2744     yytext
2745     yywrap
2746
2747 .fi
2748 (If you are using a C++ scanner, then only
2749 .B yywrap
2750 and
2751 .B yyFlexLexer
2752 are affected.)
2753 Within your scanner itself, you can still refer to the global variables
2754 and functions using either version of their name; but externally, they
2755 have the modified name.
2756 .IP
2757 This option lets you easily link together multiple
2758 .I flex
2759 programs into the same executable.
2760 Note, though, that using this option also renames
2761 .B yywrap(),
2762 so you now
2763 .I must
2764 either
2765 provide your own (appropriately-named) version of the routine for your
2766 scanner, or use
2767 .B %option noyywrap,
2768 as linking with
2769 .B \-ll
2770 no longer provides one for you by default.
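.IP
As a sketch of how this looks in practice, the specification below builds a
scanner whose entry point is
.B foolex()
rather than
.B yylex().
The wrapper
.B scan_foo()
is a hypothetical helper, not something
.I flex
provides:
.nf

    %option prefix="foo" noyywrap
    %%
    [0-9]+    printf( "foo scanner: %s\\n", yytext );
    .|\\n      /* ignore */
    %%
    /* Hypothetical wrapper for callers in other source files. */
    int scan_foo( FILE *fp )
        {
        yyin = fp;          /* seen by the linker as fooin */
        return yylex();     /* seen by the linker as foolex */
        }

.fi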
2771 .TP
2772 .B \-Sskeleton_file, \-\-skel=FILE
2773 overrides the default skeleton file from which
2774 .I flex
2775 constructs its scanners.
2776 You'll never need this option unless you are doing
2777 .I flex
2778 maintenance or development.
2779 .TP
2780 .B \-X, \-\-posix\-compat
2781 maximal compatibility with POSIX lex.
2782 .TP
2783 .B \-\-yylineno
2784 track line count in yylineno.
2785 .TP
2786 .B \-\-yyclass=NAME
2787 name of C++ class.
2788 .TP
2789 .B \-\-header\-file=FILE
2790 create a C header file in addition to the scanner.
2791 .TP
2792 .B \-\-tables\-file[=FILE]
2793 write tables to FILE.
2794 .TP
.B \-Dmacro[=defn]
2796 #define macro defn (default defn is '1').
2797 .TP
.B \-R, \-\-reentrant
generate a reentrant C scanner.
2800 .TP
2801 .B \-\-bison\-bridge
2802 scanner for bison pure parser.
2803 .TP
2804 .B \-\-bison\-locations
2805 include yylloc support.
2806 .TP
2807 .B \-\-stdinit
2808 initialize yyin/yyout to stdin/stdout.
2809 .TP
.B \-\-noansi\-definitions
old\-style function definitions.
2811 .TP
2812 .B \-\-noansi\-prototypes
2813 empty parameter list in prototypes.
2814 .TP
2815 .B \-\-nounistd
2816 do not include <unistd.h>.
2817 .TP
2818 .B \-\-noFUNCTION
2819 do not generate a particular FUNCTION.
2820 .PP
2821 .I flex
2822 also provides a mechanism for controlling options within the
2823 scanner specification itself, rather than from the flex command-line.
2824 This is done by including
2825 .B %option
2826 directives in the first section of the scanner specification.
2827 You can specify multiple options with a single
2828 .B %option
2829 directive, and multiple directives in the first section of your flex input
2830 file.
2831 .PP
2832 Most options are given simply as names, optionally preceded by the
2833 word "no" (with no intervening whitespace) to negate their meaning.
2834 A number are equivalent to flex flags or their negation:
2835 .nf
2836
2837     7bit            -7 option
2838     8bit            -8 option
2839     align           -Ca option
2840     backup          -b option
2841     batch           -B option
2842     c++             -+ option
2843
2844     caseful or
2845     case-sensitive  opposite of -i (default)
2846
2847     case-insensitive or
2848     caseless        -i option
2849
2850     debug           -d option
2851     default         opposite of -s option
2852     ecs             -Ce option
2853     fast            -F option
2854     full            -f option
2855     interactive     -I option
2856     lex-compat      -l option
2857     meta-ecs        -Cm option
2858     perf-report     -p option
2859     read            -Cr option
2860     stdout          -t option
2861     verbose         -v option
2862     warn            opposite of -w option
2863                     (use "%option nowarn" for -w)
2864
2865     array           equivalent to "%array"
2866     pointer         equivalent to "%pointer" (default)
2867
2868 .fi
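.PP
For instance, going by the table above, a first section containing the
following directives (a sketch; the particular combination is arbitrary)
requests the same behavior as running
.I flex
with
.B \-8 \-B \-s \-w:
.nf

    %option 8bit batch
    %option nodefault nowarn

.fi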
2869 Some
2870 .B %option's
2871 provide features otherwise not available:
2872 .TP
2873 .B always-interactive
2874 instructs flex to generate a scanner which always considers its input
2875 "interactive".
2876 Normally, on each new input file the scanner calls
2877 .B isatty()
2878 in an attempt to determine whether
2879 the scanner's input source is interactive and thus should be read a
2880 character at a time.
2881 When this option is used, however, then no
2882 such call is made.
2883 .TP
2884 .B main
2885 directs flex to provide a default
2886 .B main()
2887 program for the scanner, which simply calls
2888 .B yylex().
2889 This option implies
2890 .B noyywrap
2891 (see below).
2892 .TP
2893 .B never-interactive
2894 instructs flex to generate a scanner which never considers its input
2895 "interactive" (again, no call made to
2896 .B isatty()).
2897 This is the opposite of
2898 .B always-interactive.
2899 .TP
2900 .B stack
2901 enables the use of start condition stacks (see Start Conditions above).
2902 .TP
2903 .B stdinit
2904 if set (i.e.,
2905 .B %option stdinit)
2906 initializes
2907 .I yyin
2908 and
2909 .I yyout
2910 to
2911 .I stdin
2912 and
2913 .I stdout,
instead of the default of a null pointer.
2916 Some existing
2917 .I lex
2918 programs depend on this behavior, even though it is not compliant with
2919 ANSI C, which does not require
2920 .I stdin
2921 and
2922 .I stdout
2923 to be compile-time constant.
2924 .TP
2925 .B yylineno
2926 directs
2927 .I flex
2928 to generate a scanner that maintains the number of the current line
2929 read from its input in the global variable
2930 .B yylineno.
2931 This option is implied by
2932 .B %option lex-compat.
2933 .TP
2934 .B yywrap
2935 if unset (i.e.,
2936 .B %option noyywrap),
2937 makes the scanner not call
2938 .B yywrap()
2939 upon an end-of-file, but simply assume that there are no more
2940 files to scan (until the user points
2941 .I yyin
2942 at a new file and calls
2943 .B yylex()
2944 again).
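.PP
Several of these options can be combined.
The following sketch, whose rules are illustrative only, is a complete
program that strips C-style comments from its input;
.B %option main
supplies
.B main()
and
.B %option stack
makes
.B yy_push_state()
and
.B yy_pop_state()
available:
.nf

    %option main stack
    %x comment
    %%
    "/*"             yy_push_state( comment );
    <comment>"*/"    yy_pop_state();
    <comment>.|\\n    /* discard comment text */

.fi
Text outside comments is copied to the output by the default rule.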
2945 .PP
2946 .I flex
2947 scans your rule actions to determine whether you use the
2948 .B REJECT
2949 or
2950 .B yymore()
2951 features.
2952 The
2953 .B reject
2954 and
2955 .B yymore
options are available to override its decision as to whether you use the
features, either by setting them (e.g.,
2958 .B %option reject)
2959 to indicate the feature is indeed used, or
2960 unsetting them to indicate it actually is not used
2961 (e.g.,
2962 .B %option noyymore).
2963 .PP
2964 Three options take string-delimited values, offset with '=':
2965 .nf
2966
2967     %option outfile="ABC"
2968
2969 .fi
2970 is equivalent to
2971 .B -oABC,
2972 and
2973 .nf
2974
2975     %option prefix="XYZ"
2976
2977 .fi
2978 is equivalent to
2979 .B -PXYZ.
2980 Finally,
2981 .nf
2982
2983     %option yyclass="foo"
2984
2985 .fi
only applies when generating a C++ scanner (the
.B \-+
option).
2989 It informs
2990 .I flex
2991 that you have derived
2992 .B foo
2993 as a subclass of
2994 .B yyFlexLexer,
2995 so
2996 .I flex
2997 will place your actions in the member function
2998 .B foo::yylex()
2999 instead of
3000 .B yyFlexLexer::yylex().
3001 It also generates a
3002 .B yyFlexLexer::yylex()
3003 member function that emits a run-time error (by invoking
3004 .B yyFlexLexer::LexerError())
3005 if called.
3006 See Generating C++ Scanners, below, for additional information.
3007 .PP
3008 A number of options are available for lint purists who want to suppress
3009 the appearance of unneeded routines in the generated scanner.
3010 Each of the following, if unset
3011 (e.g.,
.B %option nounput),
results in the corresponding routine not appearing in
3014 the generated scanner:
3015 .nf
3016
3017     input, unput
3018     yy_push_state, yy_pop_state, yy_top_state
3019     yy_scan_buffer, yy_scan_bytes, yy_scan_string
3020
3021 .fi
3022 (though
3023 .B yy_push_state()
3024 and friends won't appear anyway unless you use
3025 .B %option stack).
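.PP
For example, a scanner that uses start condition stacks but never calls
.B unput(),
.B input()
or
.B yy_top_state()
could begin as follows (a sketch):
.nf

    %option stack
    %option nounput noinput noyy_top_state

.fi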
3026 .SH PERFORMANCE CONSIDERATIONS
3027 The main design goal of
3028 .I flex
3029 is that it generate high-performance scanners.
3030 It has been optimized
3031 for dealing well with large sets of rules.
3032 Aside from the effects on scanner speed of the table compression
3033 .B \-C
3034 options outlined above,
3035 there are a number of options/actions which degrade performance.
3036 These are, from most expensive to least:
3037 .nf
3038
3039     REJECT
3040     %option yylineno
3041     arbitrary trailing context
3042
3043     pattern sets that require backing up
3044     %array
3045     %option interactive
3046     %option always-interactive
3047
3048     '^' beginning-of-line operator
3049     yymore()
3050
3051 .fi
3052 with the first three all being quite expensive and the last two
3053 being quite cheap.
3054 Note also that
3055 .B unput()
3056 is implemented as a routine call that potentially does quite a bit of
3057 work, while
3058 .B yyless()
3059 is a quite-cheap macro; so if just putting back some excess text you
3060 scanned, use
3061 .B yyless().
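.PP
As a minimal sketch of the
.B yyless()
approach, the first rule below matches a word together with a trailing "++",
then hands the "++" back so that it is scanned again as a token of its own.
The patterns and messages are illustrative only:
.nf

    %option noyywrap
    %%
    [a-z]+"++"    {
                  yyless( yyleng - 2 );   /* return "++" to the input */
                  printf( "word: %s\\n", yytext );
                  }
    "++"          printf( "increment\\n" );
    .|\\n          /* ignore */

.fi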
3062 .PP
3063 .B REJECT
3064 should be avoided at all costs when performance is important.
3065 It is a particularly expensive option.
3066 .PP
3067 Getting rid of backing up is messy and often may be an enormous
3068 amount of work for a complicated scanner.
In principle, one begins by using the
3070 .B \-b
3071 flag to generate a
3072 .I lex.backup
3073 file.
3074 For example, on the input
3075 .nf
3076
3077     %%
3078     foo        return TOK_KEYWORD;
3079     foobar     return TOK_KEYWORD;
3080
3081 .fi
3082 the file looks like:
3083 .nf
3084
3085     State #6 is non-accepting -
3086      associated rule line numbers:
3087            2       3
3088      out-transitions: [ o ]
3089      jam-transitions: EOF [ \\001-n  p-\\177 ]
3090
3091     State #8 is non-accepting -
3092      associated rule line numbers:
3093            3
3094      out-transitions: [ a ]
3095      jam-transitions: EOF [ \\001-`  b-\\177 ]
3096
3097     State #9 is non-accepting -
3098      associated rule line numbers:
3099            3
3100      out-transitions: [ r ]
3101      jam-transitions: EOF [ \\001-q  s-\\177 ]
3102
3103     Compressed tables always back up.
3104
3105 .fi
3106 The first few lines tell us that there's a scanner state in
3107 which it can make a transition on an 'o' but not on any other
3108 character, and that in that state the currently scanned text does not match
3109 any rule.
3110 The state occurs when trying to match the rules found
3111 at lines 2 and 3 in the input file.
3112 If the scanner is in that state and then reads
3113 something other than an 'o', it will have to back up to find
3114 a rule which is matched.
3115 With a bit of headscratching one can see that this must be the
3116 state it's in when it has seen "fo".
3117 When this has happened,
3118 if anything other than another 'o' is seen, the scanner will
3119 have to back up to simply match the 'f' (by the default rule).
3120 .PP
3121 The comment regarding State #8 indicates there's a problem
3122 when "foob" has been scanned.
3123 Indeed, on any character other
3124 than an 'a', the scanner will have to back up to accept "foo".
3125 Similarly, the comment for State #9 concerns when "fooba" has
3126 been scanned and an 'r' does not follow.
3127 .PP
3128 The final comment reminds us that there's no point going to
3129 all the trouble of removing backing up from the rules unless
3130 we're using
3131 .B \-Cf
3132 or
3133 .B \-CF,
3134 since there's no performance gain doing so with compressed scanners.
3135 .PP
3136 The way to remove the backing up is to add "error" rules:
3137 .nf
3138
3139     %%
3140     foo         return TOK_KEYWORD;
3141     foobar      return TOK_KEYWORD;
3142
3143     fooba       |
3144     foob        |
3145     fo          {
3146                 /* false alarm, not really a keyword */
3147                 return TOK_ID;
3148                 }
3149
3150 .fi
3151 .PP
3152 Eliminating backing up among a list of keywords can also be
3153 done using a "catch-all" rule:
3154 .nf
3155
3156     %%
3157     foo         return TOK_KEYWORD;
3158     foobar      return TOK_KEYWORD;
3159
3160     [a-z]+      return TOK_ID;
3161
3162 .fi
3163 This is usually the best solution when appropriate.
3164 .PP
3165 Backing up messages tend to cascade.
3166 With a complicated set of rules it's not uncommon to get hundreds
3167 of messages.
If one can decipher them, though, it often
only takes a dozen or so rules to eliminate the backing up.
It's easy, however, to make a mistake and have an error rule accidentally
match a valid token.
A possible future
.I flex
feature will be to automatically add rules to eliminate backing up.
3175 .PP
3176 It's important to keep in mind that you gain the benefits of eliminating
3177 backing up only if you eliminate
3178 .I every
3179 instance of backing up.
3180 Leaving just one means you gain nothing.
3181 .PP
3182 .I Variable
3183 trailing context (where both the leading and trailing parts do not have
3184 a fixed length) entails almost the same performance loss as
3185 .B REJECT
3186 (i.e., substantial).
3187 So when possible a rule like:
3188 .nf
3189
3190     %%
3191     mouse|rat/(cat|dog)   run();
3192
3193 .fi
3194 is better written:
3195 .nf
3196
3197     %%
3198     mouse/cat|dog         run();
3199     rat/cat|dog           run();
3200
3201 .fi
3202 or as
3203 .nf
3204
3205     %%
3206     mouse|rat/cat         run();
3207     mouse|rat/dog         run();
3208
3209 .fi
3210 Note that here the special '|' action does
3211 .I not
3212 provide any savings, and can even make things worse (see
3213 Deficiencies / Bugs below).
3214 .LP
3215 Another area where the user can increase a scanner's performance
3216 (and one that's easier to implement) arises from the fact that
3217 the longer the tokens matched, the faster the scanner will run.
3218 This is because with long tokens the processing of most input
3219 characters takes place in the (short) inner scanning loop, and
3220 does not often have to go through the additional work of setting up
3221 the scanning environment (e.g.,
3222 .B yytext)
3223 for the action.
3224 Recall the scanner for C comments:
3225 .nf
3226
3227     %x comment
3228     %%
3229             int line_num = 1;
3230
3231     "/*"         BEGIN(comment);
3232
3233     <comment>[^*\\n]*
3234     <comment>"*"+[^*/\\n]*
3235     <comment>\\n             ++line_num;
3236     <comment>"*"+"/"        BEGIN(INITIAL);
3237
3238 .fi
3239 This could be sped up by writing it as:
3240 .nf
3241
3242     %x comment
3243     %%
3244             int line_num = 1;
3245
3246     "/*"         BEGIN(comment);
3247
3248     <comment>[^*\\n]*
3249     <comment>[^*\\n]*\\n      ++line_num;
3250     <comment>"*"+[^*/\\n]*
3251     <comment>"*"+[^*/\\n]*\\n ++line_num;
3252     <comment>"*"+"/"        BEGIN(INITIAL);
3253
3254 .fi
3255 Now instead of each newline requiring the processing of another
3256 action, recognizing the newlines is "distributed" over the other rules
3257 to keep the matched text as long as possible.
3258 Note that
3259 .I adding
3260 rules does
3261 .I not
3262 slow down the scanner!  The speed of the scanner is independent
3263 of the number of rules or (modulo the considerations given at the
3264 beginning of this section) how complicated the rules are with
3265 regard to operators such as '*' and '|'.
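.PP
The saving from longer matches can be estimated without running flex.
The following C sketch hand-simulates the two comment rule sets above and
counts how many matches (and therefore action dispatches) each one needs
for a given comment body.
It is an illustration under stated assumptions, not code produced by flex:
the helper name count_comment_matches() and the NEWLINE constant are
hypothetical, and the longest-match behavior is coded by hand.
.nf

```c
    #include <assert.h>
    #include <stddef.h>

    enum { NEWLINE = 10 };  /* ASCII line feed, written as a number */

    /*
     * Hypothetical helper: count the matches needed to scan a comment
     * body (the text after the opening delimiter, up to and including
     * the closing one).  If merge_newlines is zero, a newline is always
     * a match of its own, as in the first rule set; otherwise a newline
     * is absorbed into the match before it, as in the second.
     */
    static int count_comment_matches(const char *s, int merge_newlines)
    {
        int matches = 0;
        size_t i = 0;

        assert(s != NULL);
        while (s[i] != 0) {
            if (s[i] == '*') {
                while (s[i] == '*')
                    i++;
                if (s[i] == '/')
                    return matches + 1;    /* closing delimiter */
                /* stars, then text free of stars, slashes, newlines */
                while (s[i] != 0 && s[i] != '*' && s[i] != '/' &&
                       s[i] != NEWLINE)
                    i++;
            } else if (s[i] != NEWLINE) {
                /* text free of stars and newlines */
                while (s[i] != 0 && s[i] != '*' && s[i] != NEWLINE)
                    i++;
            } else if (!merge_newlines) {
                i++;                       /* newline matched alone */
                matches++;
                continue;
            }
            if (merge_newlines && s[i] == NEWLINE)
                i++;                       /* newline joins this match */
            matches++;
        }
        return matches;
    }
```

.fi
For a body consisting of the lines "one" and "two" followed by the closing
delimiter, the sketch counts five matches with the first rule set and three
with the second.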
3266 .PP
3267 A final example in speeding up a scanner: suppose you want to scan
3268 through a file containing identifiers and keywords, one per line
3269 and with no other extraneous characters, and recognize all the
3270 keywords.
3271 A natural first approach is:
3272 .nf
3273
3274     %%
3275     asm      |
3276     auto     |
3277     break    |
3278     ... etc ...
3279     volatile |
3280     while    /* it's a keyword */
3281
3282     .|\\n     /* it's not a keyword */
3283
3284 .fi
To eliminate the backing up, introduce a catch-all rule:
3286 .nf
3287
3288     %%
3289     asm      |
3290     auto     |
3291     break    |
3292     ... etc ...
3293     volatile |
3294     while    /* it's a keyword */
3295
3296     [a-z]+   |
3297     .|\\n     /* it's not a keyword */
3298
3299 .fi
3300 Now, if it's guaranteed that there's exactly one word per line,
3301 then we can reduce the total number of matches by a half by
3302 merging in the recognition of newlines with that of the other
3303 tokens:
3304 .nf
3305
3306     %%
3307     asm\\n    |
3308     auto\\n   |
3309     break\\n  |
3310     ... etc ...
3311     volatile\\n |
3312     while\\n  /* it's a keyword */
3313
3314     [a-z]+\\n |
3315     .|\\n     /* it's not a keyword */
3316
3317 .fi
3318 One has to be careful here, as we have now reintroduced backing up
3319 into the scanner.
3320 In particular, while
3321 .I we
3322 know that there will never be any characters in the input stream
3323 other than letters or newlines,
3324 .I flex
3325 can't figure this out, and it will plan for possibly needing to back up
3326 when it has scanned a token like "auto" and then the next character
3327 is something other than a newline or a letter.
3328 Previously it would
3329 then just match the "auto" rule and be done, but now it has no "auto"
3330 rule, only an "auto\\n" rule.
3331 To eliminate the possibility of backing up,
3332 we could either duplicate all rules but without final newlines, or,
3333 since we never expect to encounter such an input and therefore don't
care how it's classified, we can introduce one more catch-all rule, this
3335 one which doesn't include a newline:
3336 .nf
3337
3338     %%
3339     asm\\n    |
3340     auto\\n   |
3341     break\\n  |
3342     ... etc ...
3343     volatile\\n |
3344     while\\n  /* it's a keyword */
3345
3346     [a-z]+\\n |
3347     [a-z]+   |
3348     .|\\n     /* it's not a keyword */
3349
3350 .fi
3351 Compiled with
3352 .B \-Cf,
3353 this is about as fast as one can get a
3354 .I flex
3355 scanner to go for this particular problem.
3356 .PP
3357 A final note:
3358 .I flex
3359 is slow when matching NUL's, particularly when a token contains
3360 multiple NUL's.
3361 It's best to write rules which match
3362 .I short
3363 amounts of text if it's anticipated that the text will often include NUL's.
3364 .PP
3365 Another final note regarding performance: as mentioned above in the section
3366 How the Input is Matched, dynamically resizing
3367 .B yytext
3368 to accommodate huge tokens is a slow process because it presently requires that
3369 the (huge) token be rescanned from the beginning.
3370 Thus if performance is
3371 vital, you should attempt to match "large" quantities of text but not
3372 "huge" quantities, where the cutoff between the two is at about 8K
3373 characters/token.
3374 .SH GENERATING C++ SCANNERS
3375 .I flex
3376 provides two different ways to generate scanners for use with C++.
3377 The first way is to simply compile a scanner generated by
3378 .I flex
3379 using a C++ compiler instead of a C compiler.
3380 You should not encounter
any compilation errors (please report any you find to the email address
3382 given in the Author section below).
3383 You can then use C++ code in your rule actions instead of C code.
3384 Note that the default input source for your scanner remains
3385 .I yyin,
3386 and default echoing is still done to
3387 .I yyout.
3388 Both of these remain
3389 .I FILE *
3390 variables and not C++
3391 .I streams.
3392 .PP
3393 You can also use
3394 .I flex
3395 to generate a C++ scanner class, using the
3396 .B \-+
3397 option (or, equivalently,
3398 .B %option c++),
3399 which is automatically specified if the name of the flex
3400 executable ends in a '+', such as
3401 .I flex++.
3402 When using this option, flex defaults to generating the scanner to the file
3403 .B lex.yy.cc
3404 instead of
3405 .B lex.yy.c.
3406 The generated scanner includes the header file
3407 .I FlexLexer.h,
3408 which defines the interface to two C++ classes.
3409 .PP
3410 The first class,
3411 .B FlexLexer,
3412 provides an abstract base class defining the general scanner class
3413 interface.
3414 It provides the following member functions:
3415 .TP
3416 .B const char* YYText()
3417 returns the text of the most recently matched token, the equivalent of
3418 .B yytext.
3419 .TP
3420 .B int YYLeng()
3421 returns the length of the most recently matched token, the equivalent of
3422 .B yyleng.
3423 .TP
3424 .B int lineno() const
3425 returns the current input line number
3426 (see
3427 .B %option yylineno),
3428 or
3429 .B 1
3430 if
3431 .B %option yylineno
3432 was not used.
3433 .TP
3434 .B void set_debug( int flag )
3435 sets the debugging flag for the scanner, equivalent to assigning to
3436 .B yy_flex_debug
3437 (see the Options section above).
3438 Note that you must build the scanner using
3439 .B %option debug
3440 to include debugging information in it.
3441 .TP
3442 .B int debug() const
3443 returns the current setting of the debugging flag.
3444 .PP
3445 Also provided are member functions equivalent to
3446 .B yy_switch_to_buffer(),
3447 .B yy_create_buffer()
3448 (though the first argument is an
3449 .B std::istream*
3450 object pointer and not a
3451 .B FILE*),
3452 .B yy_flush_buffer(),
3453 .B yy_delete_buffer(),
3454 and
3455 .B yyrestart()
3456 (again, the first argument is a
3457 .B std::istream*
3458 object pointer).
3459 .PP
3460 The second class defined in
3461 .I FlexLexer.h
3462 is
3463 .B yyFlexLexer,
3464 which is derived from
3465 .B FlexLexer.
3466 It defines the following additional member functions:
3467 .TP
3468 .B
3469 yyFlexLexer( std::istream* arg_yyin = 0, std::ostream* arg_yyout = 0 )
3470 constructs a
3471 .B yyFlexLexer
3472 object using the given streams for input and output.
3473 If not specified, the streams default to
3474 .B cin
3475 and
3476 .B cout,
3477 respectively.
3478 .TP
3479 .B virtual int yylex()
performs the same role as
3481 .B yylex()
3482 does for ordinary flex scanners: it scans the input stream, consuming
3483 tokens, until a rule's action returns a value.
3484 If you derive a subclass
3485 .B S
3486 from
3487 .B yyFlexLexer
3488 and want to access the member functions and variables of
3489 .B S
3490 inside
3491 .B yylex(),
3492 then you need to use
3493 .B %option yyclass="S"
3494 to inform
3495 .I flex
3496 that you will be using that subclass instead of
3497 .B yyFlexLexer.
3498 In this case, rather than generating
3499 .B yyFlexLexer::yylex(),
3500 .I flex
3501 generates
3502 .B S::yylex()
3503 (and also generates a dummy
3504 .B yyFlexLexer::yylex()
3505 that calls
3506 .B yyFlexLexer::LexerError()
3507 if called).
3508 .TP
3509 .B
3510 virtual void switch_streams(std::istream* new_in = 0,
3511 .B
3512 std::ostream* new_out = 0)
3513 reassigns
3514 .B yyin
3515 to
3516 .B new_in
(if non-null)
3518 and
3519 .B yyout
3520 to
3521 .B new_out
3522 (ditto), deleting the previous input buffer if
3523 .B yyin
3524 is reassigned.
3525 .TP
3526 .B
3527 int yylex( std::istream* new_in, std::ostream* new_out = 0 )
3528 first switches the input streams via
3529 .B switch_streams( new_in, new_out )
3530 and then returns the value of
3531 .B yylex().
3532 .PP
3533 In addition,
3534 .B yyFlexLexer
3535 defines the following protected virtual functions which you can redefine
3536 in derived classes to tailor the scanner:
3537 .TP
3538 .B
3539 virtual int LexerInput( char* buf, int max_size )
3540 reads up to
3541 .B max_size
3542 characters into
3543 .B buf
3544 and returns the number of characters read.
3545 To indicate end-of-input, return 0 characters.
3546 Note that "interactive" scanners (see the
3547 .B \-B
3548 and
3549 .B \-I
3550 flags) define the macro
3551 .B YY_INTERACTIVE.
3552 If you redefine
3553 .B LexerInput()
3554 and need to take different actions depending on whether or not
3555 the scanner might be scanning an interactive input source, you can
3556 test for the presence of this name via
3557 .B #ifdef.
3558 .TP
3559 .B
3560 virtual void LexerOutput( const char* buf, int size )
3561 writes out
3562 .B size
3563 characters from the buffer
3564 .B buf,
3565 which, while NUL-terminated, may also contain "internal" NUL's if
3566 the scanner's rules can match text with NUL's in them.
3567 .TP
3568 .B
3569 virtual void LexerError( const char* msg )
3570 reports a fatal error message.
3571 The default version of this function writes the message to the stream
3572 .B cerr
3573 and exits.
3574 .PP
3575 Note that a
3576 .B yyFlexLexer
3577 object contains its
3578 .I entire
3579 scanning state.
3580 Thus you can use such objects to create reentrant scanners.
3581 You can instantiate multiple instances of the same
3582 .B yyFlexLexer
3583 class, and you can also combine multiple C++ scanner classes together
3584 in the same program using the
3585 .B \-P
3586 option discussed above.
3587 .PP
3588 Finally, note that the
3589 .B %array
3590 feature is not available to C++ scanner classes; you must use
3591 .B %pointer
3592 (the default).
3593 .PP
3594 Here is an example of a simple C++ scanner:
3595 .nf
3596
3597         // An example of using the flex C++ scanner class.
3598
    %{
    #include <iostream>
    using std::cout;

    int mylineno = 0;
    %}
3602
3603     string  \\"[^\\n"]+\\"
3604
3605     ws      [ \\t]+
3606
3607     alpha   [A-Za-z]
3608     dig     [0-9]
3609     name    ({alpha}|{dig}|\\$)({alpha}|{dig}|[_.\\-/$])*
3610     num1    [-+]?{dig}+\\.?([eE][-+]?{dig}+)?
3611     num2    [-+]?{dig}*\\.{dig}+([eE][-+]?{dig}+)?
3612     number  {num1}|{num2}
3613
3614     %%
3615
3616     {ws}    /* skip blanks and tabs */
3617
3618     "/*"    {
3619             int c;
3620
3621             while((c = yyinput()) != 0)
3622                 {
3623                 if(c == '\\n')
3624                     ++mylineno;
3625
3626                 else if(c == '*')
3627                     {
3628                     if((c = yyinput()) == '/')
3629                         break;
3630                     else
3631                         unput(c);
3632                     }
3633                 }
3634             }
3635
3636     {number}  cout << "number " << YYText() << '\\n';
3637
3638     \\n        mylineno++;
3639
3640     {name}    cout << "name " << YYText() << '\\n';
3641
3642     {string}  cout << "string " << YYText() << '\\n';
3643
3644     %%
3645
3646     int main( int /* argc */, char** /* argv */ )
3647         {
3648         FlexLexer* lexer = new yyFlexLexer;
3649         while(lexer->yylex() != 0)
3650             ;
3651         return 0;
3652         }
3653 .fi
3654 If you want to create multiple (different) lexer classes, you use the
3655 .B \-P
3656 flag (or the
3657 .B prefix=
3658 option) to rename each
3659 .B yyFlexLexer
3660 to some other
3661 .B xxFlexLexer.
You can then include
3663 .B <FlexLexer.h>
3664 in your other sources once per lexer class, first renaming
3665 .B yyFlexLexer
3666 as follows:
3667 .nf
3668
3669     #undef yyFlexLexer
3670     #define yyFlexLexer xxFlexLexer
3671     #include <FlexLexer.h>
3672
3673     #undef yyFlexLexer
3674     #define yyFlexLexer zzFlexLexer
3675     #include <FlexLexer.h>
3676
3677 .fi
3678 if, for example, you used
3679 .B %option prefix="xx"
3680 for one of your scanners and
3681 .B %option prefix="zz"
3682 for the other.
3683 .PP
3684 IMPORTANT: the present form of the scanning class is
3685 .I experimental
3686 and may change considerably between major releases.
3687 .SH INCOMPATIBILITIES WITH LEX AND POSIX
3688 .I flex
3689 is a rewrite of the AT&T Unix
3690 .I lex
3691 tool (the two implementations do not share any code, though),
3692 with some extensions and incompatibilities, both of which
3693 are of concern to those who wish to write scanners acceptable
3694 to either implementation.
3695 Flex is fully compliant with the POSIX
3696 .I lex
3697 specification, except that when using
3698 .B %pointer
3699 (the default), a call to
3700 .B unput()
3701 destroys the contents of
3702 .B yytext,
3703 which is counter to the POSIX specification.
3704 .PP
3705 In this section we discuss all of the known areas of incompatibility
3706 between flex, AT&T lex, and the POSIX specification.
3707 .PP
3708 .I flex's
3709 .B \-l
3710 option turns on maximum compatibility with the original AT&T
3711 .I lex
3712 implementation, at the cost of a major loss in the generated scanner's
3713 performance.
3714 We note below which incompatibilities can be overcome
3715 using the
3716 .B \-l
3717 option.
3718 .PP
3719 .I flex
3720 is fully compatible with
3721 .I lex
3722 with the following exceptions:
3723 .IP -
3724 The undocumented
3725 .I lex
3726 scanner internal variable
3727 .B yylineno
3728 is not supported unless
3729 .B \-l
3730 or
3731 .B %option yylineno
3732 is used.
3733 .IP
3734 .B yylineno
3735 should be maintained on a per-buffer basis, rather than a per-scanner
3736 (single global variable) basis.
3737 .IP
3738 .B yylineno
3739 is not part of the POSIX specification.
3740 .IP -
3741 The
3742 .B input()
3743 routine is not redefinable, though it may be called to read characters
3744 following whatever has been matched by a rule.
3745 If
3746 .B input()
3747 encounters an end-of-file the normal
3748 .B yywrap()
3749 processing is done.
3750 A ``real'' end-of-file is returned by
3751 .B input()
3752 as
3753 .I EOF.
3754 .IP
3755 Input is instead controlled by defining the
3756 .B YY_INPUT
3757 macro.
3758 .IP
3759 The
3760 .I flex
3761 restriction that
3762 .B input()
3763 cannot be redefined is in accordance with the POSIX specification,
3764 which simply does not specify any way of controlling the
3765 scanner's input other than by making an initial assignment to
3766 .I yyin.
3767 .IP -
3768 The
3769 .B unput()
3770 routine is not redefinable.
3771 This restriction is in accordance with POSIX.
3772 .IP -
3773 .I flex
3774 scanners are not as reentrant as
3775 .I lex
3776 scanners.
3777 In particular, if you have an interactive scanner and
3778 an interrupt handler which long-jumps out of the scanner, and
3779 the scanner is subsequently called again, you may get the following
3780 message:
3781 .nf
3782
3783     fatal flex scanner internal error--end of buffer missed
3784
3785 .fi
3786 To reenter the scanner, first use
3787 .nf
3788
3789     yyrestart( yyin );
3790
3791 .fi
3792 Note that this call will throw away any buffered input; usually this
3793 isn't a problem with an interactive scanner.
3794 .IP
3795 Also note that flex C++ scanner classes
3796 .I are
3797 reentrant, so if using C++ is an option for you, you should use
3798 them instead.
3799 See "Generating C++ Scanners" above for details.
3800 .IP -
3801 .B output()
3802 is not supported.
3803 Output from the
3804 .B ECHO
3805 macro is done to the file-pointer
3806 .I yyout
3807 (default
3808 .I stdout).
3809 .IP
3810 .B output()
3811 is not part of the POSIX specification.
3812 .IP -
3813 .I lex
3814 does not support exclusive start conditions (%x), though they
3815 are in the POSIX specification.
3816 .IP -
3817 When definitions are expanded,
3818 .I flex
3819 encloses them in parentheses.
3820 With lex, the following:
3821 .nf
3822
3823     NAME    [A-Z][A-Z0-9]*
3824     %%
3825     foo{NAME}?      printf( "Found it\\n" );
3826     %%
3827
3828 .fi
3829 will not match the string "foo" because when the macro
3830 is expanded the rule is equivalent to "foo[A-Z][A-Z0-9]*?"
3831 and the precedence is such that the '?' is associated with
3832 "[A-Z0-9]*".
3833 With
3834 .I flex,
3835 the rule will be expanded to
3836 "foo([A-Z][A-Z0-9]*)?" and so the string "foo" will match.
3837 .IP
3838 Note that if the definition begins with
3839 .B ^
3840 or ends with
3841 .B $
3842 then it is
3843 .I not
3844 expanded with parentheses, to allow these operators to appear in
3845 definitions without losing their special meanings.
3846 But the
3847 .B <s>, /,
3848 and
3849 .B <<EOF>>
3850 operators cannot be used in a
3851 .I flex
3852 definition.
3853 .IP
3854 Using
3855 .B \-l
3856 results in the
3857 .I lex
3858 behavior of no parentheses around the definition.
3859 .IP
3860 The POSIX specification is that the definition be enclosed in parentheses.
3861 .IP -
3862 Some implementations of
3863 .I lex
3864 allow a rule's action to begin on a separate line, if the rule's pattern
3865 has trailing whitespace:
3866 .nf
3867
3868     %%
3869     foo|bar<space here>
3870       { foobar_action(); }
3871
3872 .fi
3873 .I flex
3874 does not support this feature.
3875 .IP -
3876 The
3877 .I lex
3878 .B %r
3879 (generate a Ratfor scanner) option is not supported.
3880 It is not part
3881 of the POSIX specification.
3882 .IP -
3883 After a call to
3884 .B unput(),
3885 .I yytext
3886 is undefined until the next token is matched, unless the scanner
3887 was built using
3888 .B %array.
3889 This is not the case with
3890 .I lex
3891 or the POSIX specification.
3892 The
3893 .B \-l
3894 option does away with this incompatibility.
3895 .IP -
3896 The precedence of the
3897 .B {}
3898 (numeric range) operator is different.
3899 .I lex
3900 interprets "abc{1,3}" as "match one, two, or
3901 three occurrences of 'abc'", whereas
3902 .I flex
3903 interprets it as "match 'ab'
3904 followed by one, two, or three occurrences of 'c'".
3905 The latter is in agreement with the POSIX specification.
3906 .IP -
3907 The precedence of the
3908 .B ^
3909 operator is different.
3910 .I lex
3911 interprets "^foo|bar" as "match either 'foo' at the beginning of a line,
3912 or 'bar' anywhere", whereas
3913 .I flex
3914 interprets it as "match either 'foo' or 'bar' if they come at the beginning
3915 of a line".
3916 The latter is in agreement with the POSIX specification.
3917 .IP -
3918 The special table-size declarations such as
3919 .B %a
3920 supported by
3921 .I lex
3922 are not required by
3923 .I flex
3924 scanners;
3925 .I flex
3926 ignores them.
3927 .IP -
3928 The name
3929 .B FLEX_SCANNER
3930 is #define'd so scanners may be written for use with either
3931 .I flex
3932 or
3933 .I lex.
3934 Scanners also include
3935 .B YY_FLEX_MAJOR_VERSION
3936 and
3937 .B YY_FLEX_MINOR_VERSION
3938 indicating which version of
3939 .I flex
3940 generated the scanner
3941 (for example, for the 2.5 release, these defines would be 2 and 5
3942 respectively).
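.PP
Code shared between flex and lex scanners can test these names with the
preprocessor.
The following C sketch is a minimal illustration: the helper name
generator_name() and the strings it produces are hypothetical, and only
FLEX_SCANNER, YY_FLEX_MAJOR_VERSION and YY_FLEX_MINOR_VERSION come from
the text above.
.nf

```c
    #include <assert.h>
    #include <stdio.h>

    /*
     * Hypothetical helper: write a short description of the scanner
     * generator into buf and return buf.
     */
    static const char *generator_name(char *buf, size_t len)
    {
        assert(buf != NULL && len > 0);
    #ifdef FLEX_SCANNER
        snprintf(buf, len, "flex %d.%d",
                 YY_FLEX_MAJOR_VERSION, YY_FLEX_MINOR_VERSION);
    #else
        snprintf(buf, len, "not flex");
    #endif
        return buf;
    }
```

.fi
Compiled outside a flex-generated scanner, where FLEX_SCANNER is not
defined, the helper reports "not flex".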
3943 .PP
3944 The following
3945 .I flex
3946 features are not included in
3947 .I lex
3948 or the POSIX specification:
3949 .nf
3950
3951     C++ scanners
3952     %option
3953     start condition scopes
3954     start condition stacks
3955     interactive/non-interactive scanners
3956     yy_scan_string() and friends
3957     yyterminate()
3958     yy_set_interactive()
3959     yy_set_bol()
3960     YY_AT_BOL()
3961     <<EOF>>
3962     <*>
3963     YY_DECL
3964     YY_START
3965     YY_USER_ACTION
3966     YY_USER_INIT
3967     #line directives
3968     %{}'s around actions
3969     multiple actions on a line
3970
3971 .fi
3972 plus almost all of the flex flags.
3973 The last feature in the list refers to the fact that with
3974 .I flex
3975 you can put multiple actions on the same line, separated with
3976 semi-colons, while with
3977 .I lex,
3978 the following
3979 .nf
3980
3981     foo    handle_foo(); ++num_foos_seen;
3982
3983 .fi
3984 is (rather surprisingly) truncated to
3985 .nf
3986
3987     foo    handle_foo();
3988
3989 .fi
3990 .I flex
3991 does not truncate the action.
3992 Actions that are not enclosed in
3993 braces are simply terminated at the end of the line.
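.PP
The numeric-range precedence described earlier in this section can be
checked against POSIX extended regular expressions, which bind the range
operator to the single preceding character in the same way.
The following C sketch uses regcomp(3) and regexec(3) only as a stand-in
for that interpretation; it does not run flex, and the helper name
ere_matches() is hypothetical.
.nf

```c
    #include <assert.h>
    #include <regex.h>
    #include <stddef.h>

    /*
     * Hypothetical helper: return 1 if text matches the POSIX extended
     * regular expression pattern, 0 if it does not, and -1 if the
     * pattern fails to compile.  Callers anchor the pattern themselves.
     */
    static int ere_matches(const char *pattern, const char *text)
    {
        regex_t re;
        int rc;

        assert(pattern != NULL && text != NULL);
        if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
            return -1;
        rc = regexec(&re, text, 0, NULL, 0);
        regfree(&re);
        return rc == 0;
    }
```

.fi
With the anchored pattern "^abc{1,3}$" the helper accepts "abccc" and
rejects "abcabc", while the parenthesized pattern "^(abc){1,3}$" accepts
"abcabc".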
3994 .SH DIAGNOSTICS
3995 .I warning, rule cannot be matched
3996 indicates that the given rule
3997 cannot be matched because it follows other rules that will
3998 always match the same text as it.
3999 For example, in the following "foo" cannot be matched because it comes after
4000 an identifier "catch-all" rule:
4001 .nf
4002
4003     [a-z]+    got_identifier();
4004     foo       got_foo();
4005
4006 .fi
4007 Using
4008 .B REJECT
4009 in a scanner suppresses this warning.
4010 .PP
4011 .I warning,
4012 .B \-s
4013 .I
4014 option given but default rule can be matched
4015 means that it is possible (perhaps only in a particular start condition)
4016 that the default rule (match any single character) is the only one
4017 that will match a particular input.
4018 Since
4019 .B \-s
4020 was given, presumably this is not intended.
4021 .PP
4022 .I reject_used_but_not_detected undefined
4023 or
4024 .I yymore_used_but_not_detected undefined -
4025 These errors can occur at compile time.
4026 They indicate that the scanner uses
4027 .B REJECT
4028 or
4029 .B yymore()
4030 but that
4031 .I flex
4032 failed to notice the fact, meaning that
4033 .I flex
4034 scanned the first two sections looking for occurrences of these actions
4035 and failed to find any, but somehow you snuck some in (via a #include
4036 file, for example).
4037 Use
4038 .B %option reject
4039 or
4040 .B %option yymore
4041 to indicate to flex that you really do use these features.
4042 .PP
4043 .I flex scanner jammed -
4044 a scanner compiled with
4045 .B \-s
4046 has encountered an input string which wasn't matched by
4047 any of its rules.
4048 This error can also occur due to internal problems.
4049 .PP
4050 .I token too large, exceeds YYLMAX -
4051 your scanner uses
4052 .B %array
4053 and one of its rules matched a string longer than the
4054 .B YYLMAX
4055 constant (8K bytes by default).
4056 You can increase the value by
4057 #define'ing
4058 .B YYLMAX
4059 in the definitions section of your
4060 .I flex
4061 input.
4062 .PP
4063 .I scanner requires \-8 flag to
4064 .I use the character 'x' -
4065 Your scanner specification includes recognizing the 8-bit character
4066 .I 'x'
4067 and you did not specify the \-8 flag, and your scanner defaulted to 7-bit
4068 because you used the
4069 .B \-Cf
4070 or
4071 .B \-CF
4072 table compression options.
4073 See the discussion of the
4074 .B \-7
4075 flag for details.
4076 .PP
4077 .I flex scanner push-back overflow -
4078 you used
4079 .B unput()
4080 to push back so much text that the scanner's buffer could not hold
4081 both the pushed-back text and the current token in
4082 .B yytext.
4083 Ideally the scanner should dynamically resize the buffer in this case, but at
4084 present it does not.
4085 .PP
4086 .I
4087 input buffer overflow, can't enlarge buffer because scanner uses REJECT -
4088 the scanner was working on matching an extremely large token and needed
4089 to expand the input buffer.
4090 This doesn't work with scanners that use
4091 .B
4092 REJECT.
4093 .PP
4094 .I
4095 fatal flex scanner internal error--end of buffer missed -
4096 This can occur in a scanner which is reentered after a long-jump
4097 has jumped out (or over) the scanner's activation frame.
4098 Before reentering the scanner, use:
4099 .nf
4100
4101     yyrestart( yyin );
4102
4103 .fi
4104 or, as noted above, switch to using the C++ scanner class.
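.PP
The recovery sequence can be sketched in plain C.
The functions stub_yylex() and stub_yyrestart() below are hypothetical
stand-ins for yylex() and yyrestart( yyin ); they only model a scanner that
refuses to be reentered while it still holds buffered state, and are not
flex code.
.nf

```c
    #include <assert.h>
    #include <setjmp.h>

    static jmp_buf reenter_point;
    static int mid_buffer;   /* nonzero while a scan is in progress */
    static int restarts;     /* number of stub_yyrestart() calls */

    /* Stand-in for yyrestart( yyin ): discard buffered state. */
    static void stub_yyrestart(void)
    {
        mid_buffer = 0;
        restarts++;
    }

    /* Stand-in for yylex(); interrupt != 0 models a long-jump out. */
    static int stub_yylex(int interrupt)
    {
        assert(!mid_buffer);     /* models "end of buffer missed" */
        mid_buffer = 1;
        if (interrupt)
            longjmp(reenter_point, 1);
        mid_buffer = 0;
        return 0;
    }

    /* Interrupt one scan, recover, scan again; returns restart count. */
    static int scan_with_recovery(void)
    {
        mid_buffer = 0;
        restarts = 0;
        if (setjmp(reenter_point) != 0) {
            stub_yyrestart();    /* reset before reentering */
            stub_yylex(0);
            return restarts;
        }
        stub_yylex(1);           /* jumps back to the setjmp above */
        return -1;               /* not reached */
    }
```

.fi
Here scan_with_recovery() returns 1, the single restart performed after the
simulated long-jump; without the stub_yyrestart() call, the assertion in
stub_yylex() would fail on reentry.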
4105 .PP
4106 .I too many start conditions in <> construct! -
4107 you listed more start conditions in a <> construct than exist (so
4108 you must have listed at least one of them twice).
4109 .SH FILES
4110 .TP
4111 .B \-ll
4112 library with which scanners must be linked.
4113 .TP
4114 .I lex.yy.c
4115 generated scanner (called
4116 .I lexyy.c
4117 on some systems).
4118 .TP
4119 .I lex.yy.cc
4120 generated C++ scanner class, when using
4121 .B -+.
4122 .TP
4123 .I <FlexLexer.h>
4124 header file defining the C++ scanner base class,
4125 .B FlexLexer,
4126 and its derived class,
4127 .B yyFlexLexer.
4128 .TP
4129 .I flex.skl
4130 skeleton scanner.
4131 This file is only used when building flex, not when flex executes.
4132 .TP
4133 .I lex.backup
4134 backing-up information for
4135 .B \-b
4136 flag (called
4137 .I lex.bck
4138 on some systems).
4139 .SH DEFICIENCIES / BUGS
4140 Some trailing context
4141 patterns cannot be properly matched and generate
4142 warning messages ("dangerous trailing context").
4143 These are patterns where the ending of the
4144 first part of the rule matches the beginning of the second
4145 part, such as "zx*/xy*", where the 'x*' matches the 'x' at
4146 the beginning of the trailing context.
4147 (Note that the POSIX draft
4148 states that the text matched by such patterns is undefined.)
4149 .PP
4150 For some trailing context rules, parts which are actually fixed-length are
4151 not recognized as such, leading to the above mentioned performance loss.
4152 In particular, parts using '|' or {n} (such as "foo{3}") are always
4153 considered variable-length.
4154 .PP
4155 Combining trailing context with the special '|' action can result in
4156 .I fixed
4157 trailing context being turned into the more expensive
4158 .I variable
4159 trailing context.
For example, this can happen with the following rules:
4161 .nf
4162
4163     %%
4164     abc      |
4165     xyz/def
4166
4167 .fi
4168 .PP
4169 Use of
4170 .B unput()
4171 invalidates yytext and yyleng, unless the
4172 .B %array
4173 directive
4174 or the
4175 .B \-l
4176 option has been used.
4177 .PP
4178 Pattern-matching of NUL's is substantially slower than matching other
4179 characters.
4180 .PP
4181 Dynamic resizing of the input buffer is slow, as it entails rescanning
4182 all the text matched so far by the current (generally huge) token.
4183 .PP
4184 Due to both buffering of input and read-ahead, you cannot intermix
calls to <stdio.h> routines, such as
4186 .B getchar(),
4187 with
4188 .I flex
4189 rules and expect it to work.
4190 Call
4191 .B input()
4192 instead.
4193 .PP
The total number of table entries listed by the
4195 .B \-v
4196 flag excludes the number of table entries needed to determine
4197 what rule has been matched.
4198 The number of entries is equal
4199 to the number of DFA states if the scanner does not use
4200 .B REJECT,
4201 and somewhat greater than the number of states if it does.
4202 .PP
4203 .B REJECT
4204 cannot be used with the
4205 .B \-f
4206 or
4207 .B \-F
4208 options.
4209 .PP
4210 The
4211 .I flex
4212 internal algorithms need documentation.
4213 .SH SEE ALSO
4214 lex(1), yacc(1), sed(1), awk(1).
4215 .PP
4216 John Levine, Tony Mason, and Doug Brown,
4217 .I Lex & Yacc,
4218 O'Reilly and Associates.
4219 Be sure to get the 2nd edition.
4220 .PP
4221 M. E. Lesk and E. Schmidt,
4222 .I LEX \- Lexical Analyzer Generator
4223 .PP
4224 Alfred Aho, Ravi Sethi and Jeffrey Ullman,
4225 .I Compilers: Principles, Techniques and Tools,
4226 Addison-Wesley (1986).
4227 Describes the pattern-matching techniques used by
4228 .I flex
4229 (deterministic finite automata).
4230 .SH AUTHOR
4231 Vern Paxson, with the help of many ideas and much inspiration from
4232 Van Jacobson.
4233 Original version by Jef Poskanzer.
4234 The fast table
4235 representation is a partial implementation of a design done by Van
4236 Jacobson.
4237 The implementation was done by Kevin Gong and Vern Paxson.
4238 .PP
4239 Thanks to the many
4240 .I flex
4241 beta-testers, feedbackers, and contributors, especially Francois Pinard,
4242 Casey Leedom,
4243 Robert Abramovitz,
4244 Stan Adermann, Terry Allen, David Barker-Plummer, John Basrai,
4245 Neal Becker, Nelson H.F. Beebe, benson@odi.com,
4246 Karl Berry, Peter A. Bigot, Simon Blanchard,
4247 Keith Bostic, Frederic Brehm, Ian Brockbank, Kin Cho, Nick Christopher,
4248 Brian Clapper, J.T. Conklin,
4249 Jason Coughlin, Bill Cox, Nick Cropper, Dave Curtis, Scott David
4250 Daniels, Chris G. Demetriou, Theo de Raadt,
4251 Mike Donahue, Chuck Doucette, Tom Epperly, Leo Eskin,
4252 Chris Faylor, Chris Flatters, Jon Forrest, Jeffrey Friedl,
4253 Joe Gayda, Kaveh R. Ghazi, Wolfgang Glunz,
4254 Eric Goldman, Christopher M. Gould, Ulrich Grepel, Peer Griebel,
4255 Jan Hajic, Charles Hemphill, NORO Hideo,
4256 Jarkko Hietaniemi, Scott Hofmann,
4257 Jeff Honig, Dana Hudes, Eric Hughes, John Interrante,
4258 Ceriel Jacobs, Michal Jaegermann, Sakari Jalovaara, Jeffrey R. Jones,
4259 Henry Juengst, Klaus Kaempf, Jonathan I. Kamens, Terrence O Kane,
4260 Amir Katz, ken@ken.hilco.com, Kevin B. Kenny,
4261 Steve Kirsch, Winfried Koenig, Marq Kole, Ronald Lamprecht,
4262 Greg Lee, Rohan Lenard, Craig Leres, John Levine, Steve Liddle,
4263 David Loffredo, Mike Long,
4264 Mohamed el Lozy, Brian Madsen, Malte, Joe Marshall,
4265 Bengt Martensson, Chris Metcalf,
4266 Luke Mewburn, Jim Meyering, R. Alexander Milowski, Erik Naggum,
4267 G.T. Nicol, Landon Noll, James Nordby, Marc Nozell,
4268 Richard Ohnemus, Karsten Pahnke,
4269 Sven Panne, Roland Pesch, Walter Pelissero, Gaumond
4270 Pierre, Esmond Pitt, Jef Poskanzer, Joe Rahmeh, Jarmo Raiha,
4271 Frederic Raimbault, Pat Rankin, Rick Richardson,
4272 Kevin Rodgers, Kai Uwe Rommel, Jim Roskind, Alberto Santini,
4273 Andreas Scherer, Darrell Schiebel, Raf Schietekat,
4274 Doug Schmidt, Philippe Schnoebelen, Andreas Schwab,
Larry Schwimmer, Alex Siegel, Eckehard Stolz, Jan-Erik Str\(:omquist,
4276 Mike Stump, Paul Stuart, Dave Tallman, Ian Lance Taylor,
4277 Chris Thewalt, Richard M. Timoney, Jodi Tsai,
4278 Paul Tuinenga, Gary Weik, Frank Whaley, Gerhard Wilhelms, Kent Williams, Ken
4279 Yap, Ron Zellar, Nathan Zelle, David Zuhn,
4280 and those whose names have slipped my marginal
4281 mail-archiving skills but whose contributions are appreciated all the
4282 same.
4283 .PP
4284 Thanks to Keith Bostic, Jon Forrest, Noah Friedman,
4285 John Gilmore, Craig Leres, John Levine, Bob Mulcahy, G.T.
4286 Nicol, Francois Pinard, Rich Salz, and Richard Stallman for help with various
4287 distribution headaches.
4288 .PP
4289 Thanks to Esmond Pitt and Earle Horton for 8-bit character support; to
4290 Benson Margulies and Fred Burke for C++ support; to Kent Williams and Tom
4291 Epperly for C++ class support; to Ove Ewerlid for support of NUL's; and to
4292 Eric Hughes for support of multiple buffers.
4293 .PP
4294 This work was primarily done when I was with the Real Time Systems Group
4295 at the Lawrence Berkeley Laboratory in Berkeley, CA.
4296 Many thanks to all there for the support I received.
4297 .PP
4298 Send comments to vern@ee.lbl.gov.