contrib/gperf/doc/gperf.texi

   1 \input texinfo @c -*- texinfo -*-
   2 @c %**start of header
   3 @setfilename gperf.info
   4 @settitle Perfect Hash Function Generator
   5 @c @setchapternewpage odd
   6 @c %**end of header
   7
   8 @c some day we should @include version.texi instead of defining
   9 @c these values at hand.
  10 @set UPDATED 31 March 2007
  11 @set EDITION 3.0.3
  12 @set VERSION 3.0.3
  13 @c ---------------------
  14
  15 @c remove the black boxes generated in the GPL appendix.
  16 @finalout
  17
  18 @c Merge functions into the concept index
  19 @syncodeindex fn cp
  20 @c @synindex pg cp
  21
  22 @dircategory Programming Tools
  23 @direntry
  24 * Gperf: (gperf).                Perfect Hash Function Generator.
  25 @end direntry
  26
  27 @ifinfo
  28 This file documents the features of the GNU Perfect Hash Function
  29 Generator @value{VERSION}.
  30
  31 Copyright @copyright{} 1989-2006 Free Software Foundation, Inc.
  32
  33 Permission is granted to make and distribute verbatim copies of this
  34 manual provided the copyright notice and this permission notice are
  35 preserved on all copies.
  36
  37 @ignore
  38 Permission is granted to process this file through TeX and print the
  39 results, provided the printed document carries a copying permission
  40 notice identical to this one except for the removal of this paragraph
  41 (this paragraph not being relevant to the printed manual).
  42
  43 @end ignore
  44
  45 Permission is granted to copy and distribute modified versions of this
  46 manual under the conditions for verbatim copying, provided also that the
  47 section entitled ``GNU General Public License'' is included exactly as
  48 in the original, and provided that the entire resulting derived work is
  49 distributed under the terms of a permission notice identical to this
  50 one.
  51
  52 Permission is granted to copy and distribute translations of this manual
  53 into another language, under the above conditions for modified versions,
  54 except that the section entitled ``GNU General Public License'' and this
  55 permission notice may be included in translations approved by the Free
  56 Software Foundation instead of in the original English.
  57
  58 @end ifinfo
  59
  60 @titlepage
  61 @title User's Guide to @code{gperf} @value{VERSION}
  62 @subtitle The GNU Perfect Hash Function Generator
  63 @subtitle Edition @value{EDITION}, @value{UPDATED}
  64 @author Douglas C. Schmidt
  65 @author Bruno Haible
  66
  67 @page
  68 @vskip 0pt plus 1filll
  69 Copyright @copyright{} 1989-2007 Free Software Foundation, Inc.
  70
  71
  72 Permission is granted to make and distribute verbatim copies of
  73 this manual provided the copyright notice and this permission notice
  74 are preserved on all copies.
  75
  76 Permission is granted to copy and distribute modified versions of this
  77 manual under the conditions for verbatim copying, provided also that the
  78 section entitled ``GNU General Public License'' is included
  79 exactly as in the original, and provided that the entire resulting
  80 derived work is distributed under the terms of a permission notice
  81 identical to this one.
  82
  83 Permission is granted to copy and distribute translations of this manual
  84 into another language, under the above conditions for modified versions,
  85 except that the section entitled ``GNU General Public License'' may be
  86 included in a translation approved by the author instead of in the
  87 original English.
  88 @end titlepage
  89
  90 @ifinfo
  91 @node Top, Copying, (dir), (dir)
  92 @top Introduction
  93
  94 This manual documents the GNU @code{gperf} perfect hash function generator
  95 utility, focusing on its features and how to use them, and how to report
  96 bugs.
  97
  98 @menu
  99 * Copying::                     GNU @code{gperf} General Public License says
 100                                 how you can copy and share @code{gperf}.
 101 * Contributors::                People who have contributed to @code{gperf}.
 102 * Motivation::                  The purpose of @code{gperf}.
 103 * Search Structures::           Static search structures and GNU @code{gperf}
 104 * Description::                 High-level discussion of how GPERF functions.
 105 * Options::                     A description of options to the program.
 106 * Bugs::                        Known bugs and limitations with GPERF.
 107 * Projects::                    Things still left to do.
 108 * Bibliography::                Material Referenced in this Report.
 109
 110 * Concept Index::
 111
 112 @detailmenu --- The Detailed Node Listing ---
 113
 114 High-Level Description of GNU @code{gperf}
 115
 116 * Input Format::                Input Format to @code{gperf}
 117 * Output Format::               Output Format for Generated C Code with @code{gperf}
 118 * Binary Strings::              Use of NUL bytes
 119
 120 Input Format to @code{gperf}
 121
 122 * Declarations::                Declarations.
 123 * Keywords::                    Format for Keyword Entries.
 124 * Functions::                   Including Additional C Functions.
 125 * Controls for GNU indent::     Where to place directives for GNU @code{indent}.
 126
 127 Declarations
 128
 129 * User-supplied Struct::        Specifying keywords with attributes.
 130 * Gperf Declarations::          Embedding command line options in the input.
 131 * C Code Inclusion::            Including C declarations and definitions.
 132
 133 Invoking @code{gperf}
 134
 135 * Input Details::               Options that affect Interpretation of the Input File
 136 * Output Language::             Specifying the Language for the Output Code
 137 * Output Details::              Fine tuning Details in the Output Code
 138 * Algorithmic Details::         Changing the Algorithms employed by @code{gperf}
 139 * Verbosity::                   Informative Output
 140
 141 @end detailmenu
 142 @end menu
 143
 144 @end ifinfo
 145
 146 @node Copying, Contributors, Top, Top
 147 @unnumbered GNU GENERAL PUBLIC LICENSE
 148 @include gpl.texinfo
 149
 150 @node Contributors, Motivation, Copying, Top
 151 @unnumbered Contributors to GNU @code{gperf} Utility
 152
 153 @itemize @bullet
 154 @item
 155 @cindex Bugs
 156 The GNU @code{gperf} perfect hash function generator utility was
 157 written in GNU C++ by Douglas C. Schmidt.  The general
 158 idea for the perfect hash function generator was inspired by Keith
 159 Bostic's algorithm written in C, and distributed to net.sources around
 160 1984.  The current program is a heavily modified, enhanced, and extended
 161 implementation of Keith's basic idea, created at the University of
 162 California, Irvine.  Bugs, patches, and suggestions should be reported
 163 to @code{<bug-gnu-gperf@@gnu.org>}.
 164
 165 @item
 166 Special thanks is extended to Michael Tiemann and Doug Lea, for
 167 providing a useful compiler, and for giving me a forum to exhibit my
 168 creation.
 169
 170 In addition, Adam de Boor and Nels Olson provided many tips and insights
 171 that greatly helped improve the quality and functionality of @code{gperf}.
 172
 173 @item
 174 Bruno Haible enhanced and optimized the search algorithm.  He also rewrote
 175 the input routines and the output routines for better reliability, and
 176 added a testsuite.
 177 @end itemize
 178
 179 @node Motivation, Search Structures, Contributors, Top
 180 @chapter Introduction
 181
 182 @code{gperf} is a perfect hash function generator written in C++.  It
 183 transforms an @var{n} element user-specified keyword set @var{W} into a
 184 perfect hash function @var{F}.  @var{F} uniquely maps keywords in
 185 @var{W} onto the range 0..@var{k}, where @var{k} >= @var{n-1}.  If @var{k}
 186 = @var{n-1} then @var{F} is a @emph{minimal} perfect hash function.
 187 @code{gperf} generates a 0..@var{k} element static lookup table and a
 188 pair of C functions.  These functions determine whether a given
 189 character string @var{s} occurs in @var{W}, using at most one probe into
 190 the lookup table.
 191
 192 @code{gperf} currently generates the reserved keyword recognizer for
 193 lexical analyzers in several production and research compilers and
 194 language processing tools, including GNU C, GNU C++, GNU Java, GNU Pascal,
 195 GNU Modula 3, and GNU indent.  Complete C++ source code for @code{gperf} is
 196 available from @code{http://ftp.gnu.org/pub/gnu/gperf/}.
 197 A paper describing @code{gperf}'s design and implementation in greater
 198 detail is available in the Second USENIX C++ Conference proceedings
 199 or from @code{http://www.cs.wustl.edu/~schmidt/resume.html}.
 200
 201 @node Search Structures, Description, Motivation, Top
 202 @chapter Static search structures and GNU @code{gperf}
 203 @cindex Static search structure
 204
 205 A @dfn{static search structure} is an Abstract Data Type with certain
 206 fundamental operations, e.g., @emph{initialize}, @emph{insert},
 207 and @emph{retrieve}.  Conceptually, all insertions occur before any
 208 retrievals.  In practice, @code{gperf} generates a @emph{static} array
 209 containing search set keywords and any associated attributes specified
 210 by the user.  Thus, there is essentially no execution-time cost for the
 211 insertions.  It is a useful data structure for representing @emph{static
 212 search sets}.  Static search sets occur frequently in software system
 213 applications.  Typical static search sets include compiler reserved
 214 words, assembler instruction opcodes, and built-in shell interpreter
 215 commands.  Search set members, called @dfn{keywords}, are inserted into
 216 the structure only once, usually during program initialization, and are
 217 not generally modified at run-time.
 218
 219 Numerous static search structure implementations exist, e.g.,
 220 arrays, linked lists, binary search trees, digital search tries, and
 221 hash tables.  Different approaches offer trade-offs between space
 222 utilization and search time efficiency.  For example, an @var{n} element
 223 sorted array is space efficient, though the average-case time
 224 complexity for retrieval operations using binary search is
 225 proportional to log @var{n}.  Conversely, hash table implementations
 226 often locate a table entry in constant time, but typically impose
 227 additional memory overhead and exhibit poor worst case performance.
 228
 229 @cindex Minimal perfect hash functions
 230 @emph{Minimal perfect hash functions} provide an optimal solution for a
 231 particular class of static search sets.  A minimal perfect hash
 232 function is defined by two properties:
 233
 234 @itemize @bullet
 235 @item
 236 It allows keyword recognition in a static search set using at most
 237 @emph{one} probe into the hash table.  This represents the ``perfect''
 238 property.
 239 @item
 240 The actual memory allocated to store the keywords is precisely large
 241 enough for the keyword set, and @emph{no larger}.  This is the
 242 ``minimal'' property.
 243 @end itemize
 244
 245 For most applications it is far easier to generate @emph{perfect} hash
 246 functions than @emph{minimal perfect} hash functions.  Moreover,
 247 non-minimal perfect hash functions frequently execute faster than
 248 minimal ones in practice.  This phenomena occurs since searching a
 249 sparse keyword table increases the probability of locating a ``null''
 250 entry, thereby reducing string comparisons.  @code{gperf}'s default
 251 behavior generates @emph{near-minimal} perfect hash functions for
 252 keyword sets.  However, @code{gperf} provides many options that permit
 253 user control over the degree of minimality and perfection.
 254
 255 Static search sets often exhibit relative stability over time.  For
 256 example, Ada's 63 reserved words have remained constant for nearly a
 257 decade.  It is therefore frequently worthwhile to expend concerted
 258 effort building an optimal search structure @emph{once}, if it
 259 subsequently receives heavy use multiple times.  @code{gperf} removes
 260 the drudgery associated with constructing time- and space-efficient
 261 search structures by hand.  It has proven a useful and practical tool
 262 for serious programming projects.  Output from @code{gperf} is currently
 263 used in several production and research compilers, including GNU C, GNU
 264 C++, GNU Java, GNU Pascal, and GNU Modula 3.  The latter two compilers are
 265 not yet part of the official GNU distribution.  Each compiler utilizes
 266 @code{gperf} to automatically generate static search structures that
 267 efficiently identify their respective reserved keywords.
 268
 269 @node Description, Options, Search Structures, Top
 270 @chapter High-Level Description of GNU @code{gperf}
 271
 272 @menu
 273 * Input Format::                Input Format to @code{gperf}
 274 * Output Format::               Output Format for Generated C Code with @code{gperf}
 275 * Binary Strings::              Use of NUL bytes
 276 @end menu
 277
 278 The perfect hash function generator @code{gperf} reads a set of
 279 ``keywords'' from an input file (or from the standard input by
 280 default).  It attempts to derive a perfect hashing function that
 281 recognizes a member of the @dfn{static keyword set} with at most a
 282 single probe into the lookup table.  If @code{gperf} succeeds in
 283 generating such a function it produces a pair of C source code routines
 284 that perform hashing and table lookup recognition.  All generated C code
 285 is directed to the standard output.  Command-line options described
 286 below allow you to modify the input and output format to @code{gperf}.
 287
 288 By default, @code{gperf} attempts to produce time-efficient code, with
 289 less emphasis on efficient space utilization.  However, several options
 290 exist that permit trading-off execution time for storage space and vice
 291 versa.  In particular, expanding the generated table size produces a
 292 sparse search structure, generally yielding faster searches.
 293 Conversely, you can direct @code{gperf} to utilize a C @code{switch}
 294 statement scheme that minimizes data space storage size.  Furthermore,
 295 using a C @code{switch} may actually speed up the keyword retrieval time
 296 somewhat.  Actual results depend on your C compiler, of course.
 297
 298 In general, @code{gperf} assigns values to the bytes it is using
 299 for hashing until some set of values gives each keyword a unique value.
 300 A helpful heuristic is that the larger the hash value range, the easier
 301 it is for @code{gperf} to find and generate a perfect hash function.
 302 Experimentation is the key to getting the most from @code{gperf}.
 303
 304 @node Input Format, Output Format, Description, Description
 305 @section Input Format to @code{gperf}
 306 @cindex Format
 307 @cindex Declaration section
 308 @cindex Keywords section
 309 @cindex Functions section
 310 You can control the input file format by varying certain command-line
 311 arguments, in particular the @samp{-t} option.  The input's appearance
 312 is similar to GNU utilities @code{flex} and @code{bison} (or UNIX
 313 utilities @code{lex} and @code{yacc}).  Here's an outline of the general
 314 format:
 315
 316 @example
 317 @group
 318 declarations
 319 %%
 320 keywords
 321 %%
 322 functions
 323 @end group
 324 @end example
 325
 326 @emph{Unlike} @code{flex} or @code{bison}, the declarations section and
 327 the functions section are optional.  The following sections describe the
 328 input format for each section.
 329
 330 @menu
 331 * Declarations::                Declarations.
 332 * Keywords::                    Format for Keyword Entries.
 333 * Functions::                   Including Additional C Functions.
 334 * Controls for GNU indent::     Where to place directives for GNU @code{indent}.
 335 @end menu
 336
 337 It is possible to omit the declaration section entirely, if the @samp{-t}
 338 option is not given.  In this case the input file begins directly with the
 339 first keyword line, e.g.:
 340
 341 @example
 342 @group
 343 january
 344 february
 345 march
 346 april
 347 ...
 348 @end group
 349 @end example
 350
 351 @node Declarations, Keywords, Input Format, Input Format
 352 @subsection Declarations
 353
 354 The keyword input file optionally contains a section for including
 355 arbitrary C declarations and definitions, @code{gperf} declarations that
 356 act like command-line options, as well as for providing a user-supplied
 357 @code{struct}.
 358
 359 @menu
 360 * User-supplied Struct::        Specifying keywords with attributes.
 361 * Gperf Declarations::          Embedding command line options in the input.
 362 * C Code Inclusion::            Including C declarations and definitions.
 363 @end menu
 364
 365 @node User-supplied Struct, Gperf Declarations, Declarations, Declarations
 366 @subsubsection User-supplied @code{struct}
 367
 368 If the @samp{-t} option (or, equivalently, the @samp{%struct-type} declaration)
 369 @emph{is} enabled, you @emph{must} provide a C @code{struct} as the last
 370 component in the declaration section from the input file.  The first
 371 field in this struct must be of type @code{char *} or @code{const char *}
 372 if the @samp{-P} option is not given, or of type @code{int} if the option
 373 @samp{-P} (or, equivalently, the @samp{%pic} declaration) is enabled.
 374 This first field must be called @samp{name}, although it is possible to modify
 375 its name with the @samp{-K} option (or, equivalently, the
 376 @samp{%define slot-name} declaration) described below.
 377
 378 Here is a simple example, using months of the year and their attributes as
 379 input:
 380
 381 @example
 382 @group
 383 struct month @{ char *name; int number; int days; int leap_days; @};
 384 %%
 385 january,   1, 31, 31
 386 february,  2, 28, 29
 387 march,     3, 31, 31
 388 april,     4, 30, 30
 389 may,       5, 31, 31
 390 june,      6, 30, 30
 391 july,      7, 31, 31
 392 august,    8, 31, 31
 393 september, 9, 30, 30
 394 october,  10, 31, 31
 395 november, 11, 30, 30
 396 december, 12, 31, 31
 397 @end group
 398 @end example
 399
 400 @cindex @samp{%%}
 401 Separating the @code{struct} declaration from the list of keywords and
 402 other fields are a pair of consecutive percent signs, @samp{%%},
 403 appearing left justified in the first column, as in the UNIX utility
 404 @code{lex}.
 405
 406 If the @code{struct} has already been declared in an include file, it can
 407 be mentioned in an abbreviated form, like this:
 408
 409 @example
 410 @group
 411 struct month;
 412 %%
 413 january,   1, 31, 31
 414 ...
 415 @end group
 416 @end example
 417
 418 @node Gperf Declarations, C Code Inclusion, User-supplied Struct, Declarations
 419 @subsubsection Gperf Declarations
 420
 421 The declaration section can contain @code{gperf} declarations.  They
 422 influence the way @code{gperf} works, like command line options do.
 423 In fact, every such declaration is equivalent to a command line option.
 424 There are three forms of declarations:
 425
 426 @enumerate
 427 @item
 428 Declarations without argument, like @samp{%compare-lengths}.
 429
 430 @item
 431 Declarations with an argument, like @samp{%switch=@var{count}}.
 432
 433 @item
 434 Declarations of names of entities in the output file, like
 435 @samp{%define lookup-function-name @var{name}}.
 436 @end enumerate
 437
 438 When a declaration is given both in the input file and as a command line
 439 option, the command-line option's value prevails.
 440
 441 The following @code{gperf} declarations are available.
 442
 443 @table @samp
 444 @item %delimiters=@var{delimiter-list}
 445 @cindex @samp{%delimiters}
 446 Allows you to provide a string containing delimiters used to
 447 separate keywords from their attributes.  The default is ",".  This
 448 option is essential if you want to use keywords that have embedded
 449 commas or newlines.
 450
 451 @item %struct-type
 452 @cindex @samp{%struct-type}
 453 Allows you to include a @code{struct} type declaration for generated
 454 code; see above for an example.
 455
 456 @item %ignore-case
 457 @cindex @samp{%ignore-case}
 458 Consider upper and lower case ASCII characters as equivalent.  The string
 459 comparison will use a case insignificant character comparison.  Note that
 460 locale dependent case mappings are ignored.
 461
 462 @item %language=@var{language-name}
 463 @cindex @samp{%language}
 464 Instructs @code{gperf} to generate code in the language specified by the
 465 option's argument.  Languages handled are currently:
 466
 467 @table @samp
 468 @item KR-C
 469 Old-style K&R C.  This language is understood by old-style C compilers and
 470 ANSI C compilers, but ANSI C compilers may flag warnings (or even errors)
 471 because of lacking @samp{const}.
 472
 473 @item C
 474 Common C.  This language is understood by ANSI C compilers, and also by
 475 old-style C compilers, provided that you @code{#define const} to empty
 476 for compilers which don't know about this keyword.
 477
 478 @item ANSI-C
 479 ANSI C.  This language is understood by ANSI C compilers and C++ compilers.
 480
 481 @item C++
 482 C++.  This language is understood by C++ compilers.
 483 @end table
 484
 485 The default is C.
 486
 487 @item %define slot-name @var{name}
 488 @cindex @samp{%define slot-name}
 489 This declaration is only useful when option @samp{-t} (or, equivalently, the
 490 @samp{%struct-type} declaration) has been given.
 491 By default, the program assumes the structure component identifier for
 492 the keyword is @samp{name}.  This option allows an arbitrary choice of
 493 identifier for this component, although it still must occur as the first
 494 field in your supplied @code{struct}.
 495
 496 @item %define initializer-suffix @var{initializers}
 497 @cindex @samp{%define initializer-suffix}
 498 This declaration is only useful when option @samp{-t} (or, equivalently, the
 499 @samp{%struct-type} declaration) has been given.
 500 It permits to specify initializers for the structure members following
 501 @var{slot-name} in empty hash table entries.  The list of initializers
 502 should start with a comma.  By default, the emitted code will
 503 zero-initialize structure members following @var{slot-name}.
 504
 505 @item %define hash-function-name @var{name}
 506 @cindex @samp{%define hash-function-name}
 507 Allows you to specify the name for the generated hash function.  Default
 508 name is @samp{hash}.  This option permits the use of two hash tables in
 509 the same file.
 510
 511 @item %define lookup-function-name @var{name}
 512 @cindex @samp{%define lookup-function-name}
 513 Allows you to specify the name for the generated lookup function.
 514 Default name is @samp{in_word_set}.  This option permits multiple
 515 generated hash functions to be used in the same application.
 516
 517 @item %define class-name @var{name}
 518 @cindex @samp{%define class-name}
 519 This option is only useful when option @samp{-L C++} (or, equivalently,
 520 the @samp{%language=C++} declaration) has been given.  It
 521 allows you to specify the name of generated C++ class.  Default name is
 522 @code{Perfect_Hash}.
 523
 524 @item %7bit
 525 @cindex @samp{%7bit}
 526 This option specifies that all strings that will be passed as arguments
 527 to the generated hash function and the generated lookup function will
 528 solely consist of 7-bit ASCII characters (bytes in the range 0..127).
 529 (Note that the ANSI C functions @code{isalnum} and @code{isgraph} do
 530 @emph{not} guarantee that a byte is in this range.  Only an explicit
 531 test like @samp{c >= 'A' && c <= 'Z'} guarantees this.)
 532
 533 @item %compare-lengths
 534 @cindex @samp{%compare-lengths}
 535 Compare keyword lengths before trying a string comparison.  This option
 536 is mandatory for binary comparisons (@pxref{Binary Strings}).  It also might
 537 cut down on the number of string comparisons made during the lookup, since
 538 keywords with different lengths are never compared via @code{strcmp}.
 539 However, using @samp{%compare-lengths} might greatly increase the size of the
 540 generated C code if the lookup table range is large (which implies that
 541 the switch option @samp{-S} or @samp{%switch} is not enabled), since the length
 542 table contains as many elements as there are entries in the lookup table.
 543
 544 @item %compare-strncmp
 545 @cindex @samp{%compare-strncmp}
 546 Generates C code that uses the @code{strncmp} function to perform
 547 string comparisons.  The default action is to use @code{strcmp}.
 548
 549 @item %readonly-tables
 550 @cindex @samp{%readonly-tables}
 551 Makes the contents of all generated lookup tables constant, i.e.,
 552 ``readonly''.  Many compilers can generate more efficient code for this
 553 by putting the tables in readonly memory.
 554
 555 @item %enum
 556 @cindex @samp{%enum}
 557 Define constant values using an enum local to the lookup function rather
 558 than with #defines.  This also means that different lookup functions can
 559 reside in the same file.  Thanks to James Clark @code{<jjc@@ai.mit.edu>}.
 560
 561 @item %includes
 562 @cindex @samp{%includes}
 563 Include the necessary system include file, @code{<string.h>}, at the
 564 beginning of the code.  By default, this is not done; the user must
 565 include this header file himself to allow compilation of the code.
 566
 567 @item %global-table
 568 @cindex @samp{%global-table}
 569 Generate the static table of keywords as a static global variable,
 570 rather than hiding it inside of the lookup function (which is the
 571 default behavior).
 572
 573 @item %pic
 574 @cindex @samp{%pic}
 575 Optimize the generated table for inclusion in shared libraries.  This
 576 reduces the startup time of programs using a shared library containing
 577 the generated code.  If the @samp{%struct-type} declaration (or,
 578 equivalently, the option @samp{-t}) is also given, the first field of the
 579 user-defined struct must be of type @samp{int}, not @samp{char *}, because
 580 it will contain offsets into the string pool instead of actual strings.
 581 To convert such an offset to a string, you can use the expression
 582 @samp{stringpool + @var{o}}, where @var{o} is the offset.  The string pool
 583 name can be changed through the @samp{%define string-pool-name} declaration.
 584
 585 @item %define string-pool-name @var{name}
 586 @cindex @samp{%define string-pool-name}
 587 Allows you to specify the name of the generated string pool created by
 588 the declaration @samp{%pic} (or, equivalently, the option @samp{-P}).
 589 The default name is @samp{stringpool}.  This declaration permits the use of
 590 two hash tables in the same file, with @samp{%pic} and even when the
 591 @samp{%global-table} declaration (or, equivalently, the option @samp{-G})
 592 is given.
 593
 594 @item %null-strings
 595 @cindex @samp{%null-strings}
 596 Use NULL strings instead of empty strings for empty keyword table entries.
 597 This reduces the startup time of programs using a shared library containing
 598 the generated code (but not as much as the declaration @samp{%pic}), at the
 599 expense of one more test-and-branch instruction at run time.
 600
 601 @item %define word-array-name @var{name}
 602 @cindex @samp{%define word-array-name}
 603 Allows you to specify the name for the generated array containing the
 604 hash table.  Default name is @samp{wordlist}.  This option permits the
 605 use of two hash tables in the same file, even when the option @samp{-G}
 606 (or, equivalently, the @samp{%global-table} declaration) is given.
 607
 608 @item %define length-table-name @var{name}
 609 @cindex @samp{%define length-table-name}
 610 Allows you to specify the name for the generated array containing the
 611 length table.  Default name is @samp{lengthtable}.  This option permits the
 612 use of two length tables in the same file, even when the option @samp{-G}
 613 (or, equivalently, the @samp{%global-table} declaration) is given.
 614
 615 @item %switch=@var{count}
 616 @cindex @samp{%switch}
 617 Causes the generated C code to use a @code{switch} statement scheme,
 618 rather than an array lookup table.  This can lead to a reduction in both
 619 time and space requirements for some input files.  The argument to this
 620 option determines how many @code{switch} statements are generated.  A
 621 value of 1 generates 1 @code{switch} containing all the elements, a
 622 value of 2 generates 2 tables with 1/2 the elements in each
 623 @code{switch}, etc.  This is useful since many C compilers cannot
 624 correctly generate code for large @code{switch} statements.  This option
 625 was inspired in part by Keith Bostic's original C program.
 626
 627 @item %omit-struct-type
 628 @cindex @samp{%omit-struct-type}
 629 Prevents the transfer of the type declaration to the output file.  Use
 630 this option if the type is already defined elsewhere.
 631 @end table
 632
 633 @node C Code Inclusion,  , Gperf Declarations, Declarations
 634 @subsubsection C Code Inclusion
 635
 636 @cindex @samp{%@{}
 637 @cindex @samp{%@}}
 638 Using a syntax similar to GNU utilities @code{flex} and @code{bison}, it
 639 is possible to directly include C source text and comments verbatim into
 640 the generated output file.  This is accomplished by enclosing the region
 641 inside left-justified surrounding @samp{%@{}, @samp{%@}} pairs.  Here is
 642 an input fragment based on the previous example that illustrates this
 643 feature:
 644
 645 @example
 646 @group
 647 %@{
 648 #include <assert.h>
 649 /* This section of code is inserted directly into the output. */
 650 int return_month_days (struct month *months, int is_leap_year);
 651 %@}
 652 struct month @{ char *name; int number; int days; int leap_days; @};
 653 %%
 654 january,   1, 31, 31
 655 february,  2, 28, 29
 656 march,     3, 31, 31
 657 ...
 658 @end group
 659 @end example
 660
 661 @node Keywords, Functions, Declarations, Input Format
 662 @subsection Format for Keyword Entries
 663
 664 The second input file format section contains lines of keywords and any
 665 associated attributes you might supply.  A line beginning with @samp{#}
 666 in the first column is considered a comment.  Everything following the
 667 @samp{#} is ignored, up to and including the following newline.  A line
 668 beginning with @samp{%} in the first column is an option declaration and
 669 must not occur within the keywords section.
 670
 671 The first field of each non-comment line is always the keyword itself.  It
 672 can be given in two ways: as a simple name, i.e., without surrounding
 673 string quotation marks, or as a string enclosed in double-quotes, in
 674 C syntax, possibly with backslash escapes like @code{\"} or @code{\234}
 675 or @code{\xa8}.  In either case, it must start right at the beginning
 676 of the line, without leading whitespace.
 677 In this context, a ``field'' is considered to extend up to, but
 678 not include, the first blank, comma, or newline.  Here is a simple
 679 example taken from a partial list of C reserved words:
 680
 681 @example
 682 @group
 683 # These are a few C reserved words, see the c.gperf file
 684 # for a complete list of ANSI C reserved words.
 685 unsigned
 686 sizeof
 687 switch
 688 signed
 689 if
 690 default
 691 for
 692 while
 693 return
 694 @end group
 695 @end example
 696
 697 Note that unlike @code{flex} or @code{bison} the first @samp{%%} marker
 698 may be elided if the declaration section is empty.
 699
 700 Additional fields may optionally follow the leading keyword.  Fields
 701 should be separated by commas, and terminate at the end of line.  What
 702 these fields mean is entirely up to you; they are used to initialize the
 703 elements of the user-defined @code{struct} provided by you in the
 704 declaration section.  If the @samp{-t} option (or, equivalently, the
 705 @samp{%struct-type} declaration) is @emph{not} enabled
 706 these fields are simply ignored.  All previous examples except the last
 707 one contain keyword attributes.
 708
 709 @node Functions, Controls for GNU indent, Keywords, Input Format
 710 @subsection Including Additional C Functions
 711
 712 The optional third section also corresponds closely with conventions
 713 found in @code{flex} and @code{bison}.  All text in this section,
 714 starting at the final @samp{%%} and extending to the end of the input
 715 file, is included verbatim into the generated output file.  Naturally,
 716 it is your responsibility to ensure that the code contained in this
 717 section is valid C.
 718
 719 @node Controls for GNU indent,  , Functions, Input Format
 720 @subsection Where to place directives for GNU @code{indent}.
 721
 722 If you want to invoke GNU @code{indent} on a @code{gperf} input file,
 723 you will see that GNU @code{indent} doesn't understand the @samp{%%},
 724 @samp{%@{} and @samp{%@}} directives that control @code{gperf}'s
 725 interpretation of the input file.  Therefore you have to insert some
 726 directives for GNU @code{indent}.  More precisely, assuming the most
 727 general input file structure
 728
 729 @example
 730 @group
 731 declarations part 1
 732 %@{
 733 verbatim code
 734 %@}
 735 declarations part 2
 736 %%
 737 keywords
 738 %%
 739 functions
 740 @end group
 741 @end example
 742
 743 @noindent
 744 you would insert @samp{*INDENT-OFF*} and @samp{*INDENT-ON*} comments
 745 as follows:
 746
 747 @example
 748 @group
 749 /* *INDENT-OFF* */
 750 declarations part 1
 751 %@{
 752 /* *INDENT-ON* */
 753 verbatim code
 754 /* *INDENT-OFF* */
 755 %@}
 756 declarations part 2
 757 %%
 758 keywords
 759 %%
 760 /* *INDENT-ON* */
 761 functions
 762 @end group
 763 @end example
 764
 765 @node Output Format, Binary Strings, Input Format, Description
 766 @section Output Format for Generated C Code with @code{gperf}
 767 @cindex hash table
 768
 769 Several options control how the generated C code appears on the standard
 770 output.  Two C functions are generated.  They are called @code{hash} and
 771 @code{in_word_set}, although you may modify their names with a command-line
 772 option.  Both functions require two arguments, a string, @code{char *}
 773 @var{str}, and a length parameter, @code{int} @var{len}.  Their default
 774 function prototypes are as follows:
 775
 776 @deftypefun {unsigned int} hash (const char * @var{str}, unsigned int @var{len})
 777 By default, the generated @code{hash} function returns an integer value
 778 created by adding @var{len} to several user-specified @var{str} byte
 779 positions indexed into an @dfn{associated values} table stored in a
 780 local static array.  The associated values table is constructed
 781 internally by @code{gperf} and later output as a static local C array
 782 called @samp{hash_table}.  The relevant selected positions (i.e. indices
 783 into @var{str}) are specified via the @samp{-k} option when running
 784 @code{gperf}, as detailed in the @emph{Options} section below (@pxref{Options}).
 785 @end deftypefun
 786
 787 @deftypefun {} in_word_set (const char * @var{str}, unsigned int @var{len})
 788 If @var{str} is in the keyword set, returns a pointer to that
 789 keyword.  More exactly, if the option @samp{-t} (or, equivalently, the
 790 @samp{%struct-type} declaration) was given, it returns
 791 a pointer to the matching keyword's structure.  Otherwise it returns
 792 @code{NULL}.
 793 @end deftypefun
 794
 795 If the option @samp{-c} (or, equivalently, the @samp{%compare-strncmp}
 796 declaration) is not used, @var{str} must be a NUL terminated
 797 string of exactly length @var{len}.  If @samp{-c} (or, equivalently, the
 798 @samp{%compare-strncmp} declaration) is used, @var{str} must
 799 simply be an array of @var{len} bytes and does not need to be NUL
 800 terminated.
 801
 802 The code generated for these two functions is affected by the following
 803 options:
 804
 805 @table @samp
 806 @item -t
 807 @itemx --struct-type
 808 Make use of the user-defined @code{struct}.
 809
 810 @item -S @var{total-switch-statements}
 811 @itemx --switch=@var{total-switch-statements}
 812 @cindex @code{switch}
 813 Generate 1 or more C @code{switch} statement rather than use a large,
 814 (and potentially sparse) static array.  Although the exact time and
 815 space savings of this approach vary according to your C compiler's
 816 degree of optimization, this method often results in smaller and faster
 817 code.
 818 @end table
 819
 820 If the @samp{-t} and @samp{-S} options (or, equivalently, the
 821 @samp{%struct-type} and @samp{%switch} declarations) are omitted, the default
 822 action
 823 is to generate a @code{char *} array containing the keywords, together with
 824 additional empty strings used for padding the array.  By experimenting
 825 with the various input and output options, and timing the resulting C
 826 code, you can determine the best option choices for different keyword
 827 set characteristics.
 828
 829 @node Binary Strings,  , Output Format, Description
 830 @section Use of NUL bytes
 831 @cindex NUL
 832
 833 By default, the code generated by @code{gperf} operates on zero
 834 terminated strings, the usual representation of strings in C.  This means
 835 that the keywords in the input file must not contain NUL bytes,
 836 and the @var{str} argument passed to @code{hash} or @code{in_word_set}
 837 must be NUL terminated and have exactly length @var{len}.
 838
 839 If option @samp{-c} (or, equivalently, the @samp{%compare-strncmp}
 840 declaration) is used, then the @var{str} argument does not need
 841 to be NUL terminated.  The code generated by @code{gperf} will only
 842 access the first @var{len}, not @var{len+1}, bytes starting at @var{str}.
 843 However, the keywords in the input file still must not contain NUL
 844 bytes.
 845
 846 If option @samp{-l} (or, equivalently, the @samp{%compare-lengths}
 847 declaration) is used, then the hash table performs binary
 848 comparison.  The keywords in the input file may contain NUL bytes,
 849 written in string syntax as @code{\000} or @code{\x00}, and the code
 850 generated by @code{gperf} will treat NUL like any other byte.
 851 Also, in this case the @samp{-c} option (or, equivalently, the
 852 @samp{%compare-strncmp} declaration) is ignored.
 853
 854 @node Options, Bugs, Description, Top
 855 @chapter Invoking @code{gperf}
 856
 857 There are @emph{many} options to @code{gperf}.  They were added to make
 858 the program more convenient for use with real applications.  ``On-line''
 859 help is readily available via the @samp{--help} option.  Here is the
 860 complete list of options.
 861
 862 @menu
 863 * Output File::                 Specifying the Location of the Output File
 864 * Input Details::               Options that affect Interpretation of the Input File
 865 * Output Language::             Specifying the Language for the Output Code
 866 * Output Details::              Fine tuning Details in the Output Code
 867 * Algorithmic Details::         Changing the Algorithms employed by @code{gperf}
 868 * Verbosity::                   Informative Output
 869 @end menu
 870
 871 @node Output File, Input Details, Options, Options
 872 @section Specifying the Location of the Output File
 873
 874 @table @samp
 875 @item --output-file=@var{file}
 876 Allows you to specify the name of the file to which the output is written to.
 877 @end table
 878
 879 The results are written to standard output if no output file is specified
 880 or if it is @samp{-}.
 881
 882 @node Input Details, Output Language, Output File, Options
 883 @section Options that affect Interpretation of the Input File
 884
 885 These options are also available as declarations in the input file
 886 (@pxref{Gperf Declarations}).
 887
 888 @table @samp
 889 @item -e @var{keyword-delimiter-list}
 890 @itemx --delimiters=@var{keyword-delimiter-list}
 891 @cindex Delimiters
 892 Allows you to provide a string containing delimiters used to
 893 separate keywords from their attributes.  The default is ",".  This
 894 option is essential if you want to use keywords that have embedded
 895 commas or newlines.  One useful trick is to use -e'TAB', where TAB is
 896 the literal tab character.
 897
 898 @item -t
 899 @itemx --struct-type
 900 Allows you to include a @code{struct} type declaration for generated
 901 code.  Any text before a pair of consecutive @samp{%%} is considered
 902 part of the type declaration.  Keywords and additional fields may follow
 903 this, one group of fields per line.  A set of examples for generating
 904 perfect hash tables and functions for Ada, C, C++, Pascal, Modula 2,
 905 Modula 3 and JavaScript reserved words are distributed with this release.
 906
 907 @item --ignore-case
 908 Consider upper and lower case ASCII characters as equivalent.  The string
 909 comparison will use a case insignificant character comparison.  Note that
 910 locale dependent case mappings are ignored.  This option is therefore not
 911 suitable if a properly internationalized or locale aware case mapping
 912 should be used.  (For example, in a Turkish locale, the upper case equivalent
 913 of the lowercase ASCII letter @samp{i} is the non-ASCII character
 914 @samp{capital i with dot above}.)  For this case, it is better to apply
 915 an uppercase or lowercase conversion on the string before passing it to
 916 the @code{gperf} generated function.
 917 @end table
 918
 919 @node Output Language, Output Details, Input Details, Options
 920 @section Options to specify the Language for the Output Code
 921
 922 These options are also available as declarations in the input file
 923 (@pxref{Gperf Declarations}).
 924
 925 @table @samp
 926 @item -L @var{generated-language-name}
 927 @itemx --language=@var{generated-language-name}
 928 Instructs @code{gperf} to generate code in the language specified by the
 929 option's argument.  Languages handled are currently:
 930
 931 @table @samp
 932 @item KR-C
 933 Old-style K&R C.  This language is understood by old-style C compilers and
 934 ANSI C compilers, but ANSI C compilers may flag warnings (or even errors)
 935 because of lacking @samp{const}.
 936
 937 @item C
 938 Common C.  This language is understood by ANSI C compilers, and also by
 939 old-style C compilers, provided that you @code{#define const} to empty
 940 for compilers which don't know about this keyword.
 941
 942 @item ANSI-C
 943 ANSI C.  This language is understood by ANSI C compilers and C++ compilers.
 944
 945 @item C++
 946 C++.  This language is understood by C++ compilers.
 947 @end table
 948
 949 The default is C.
 950
 951 @item -a
 952 This option is supported for compatibility with previous releases of
 953 @code{gperf}.  It does not do anything.
 954
 955 @item -g
 956 This option is supported for compatibility with previous releases of
 957 @code{gperf}.  It does not do anything.
 958 @end table
 959
 960 @node Output Details, Algorithmic Details, Output Language, Options
 961 @section Options for fine tuning Details in the Output Code
 962
 963 Most of these options are also available as declarations in the input file
 964 (@pxref{Gperf Declarations}).
 965
 966 @table @samp
 967 @item -K @var{slot-name}
 968 @itemx --slot-name=@var{slot-name}
 969 @cindex Slot name
 970 This option is only useful when option @samp{-t} (or, equivalently, the
 971 @samp{%struct-type} declaration) has been given.
 972 By default, the program assumes the structure component identifier for
 973 the keyword is @samp{name}.  This option allows an arbitrary choice of
 974 identifier for this component, although it still must occur as the first
 975 field in your supplied @code{struct}.
 976
 977 @item -F @var{initializers}
 978 @itemx --initializer-suffix=@var{initializers}
 979 @cindex Initializers
 980 This option is only useful when option @samp{-t} (or, equivalently, the
 981 @samp{%struct-type} declaration) has been given.
 982 It permits to specify initializers for the structure members following
 983 @var{slot-name} in empty hash table entries.  The list of initializers
 984 should start with a comma.  By default, the emitted code will
 985 zero-initialize structure members following @var{slot-name}.
 986
 987 @item -H @var{hash-function-name}
 988 @itemx --hash-function-name=@var{hash-function-name}
 989 Allows you to specify the name for the generated hash function.  Default
 990 name is @samp{hash}.  This option permits the use of two hash tables in
 991 the same file.
 992
 993 @item -N @var{lookup-function-name}
 994 @itemx --lookup-function-name=@var{lookup-function-name}
 995 Allows you to specify the name for the generated lookup function.
 996 Default name is @samp{in_word_set}.  This option permits multiple
 997 generated hash functions to be used in the same application.
 998
 999 @item -Z @var{class-name}
1000 @itemx --class-name=@var{class-name}
1001 @cindex Class name
1002 This option is only useful when option @samp{-L C++} (or, equivalently,
1003 the @samp{%language=C++} declaration) has been given.  It
1004 allows you to specify the name of generated C++ class.  Default name is
1005 @code{Perfect_Hash}.
1006
1007 @item -7
1008 @itemx --seven-bit
1009 This option specifies that all strings that will be passed as arguments
1010 to the generated hash function and the generated lookup function will
1011 solely consist of 7-bit ASCII characters (bytes in the range 0..127).
1012 (Note that the ANSI C functions @code{isalnum} and @code{isgraph} do
1013 @emph{not} guarantee that a byte is in this range.  Only an explicit
1014 test like @samp{c >= 'A' && c <= 'Z'} guarantees this.) This was the
1015 default in versions of @code{gperf} earlier than 2.7; now the default is
1016 to support 8-bit and multibyte characters.
1017
1018 @item -l
1019 @itemx --compare-lengths
1020 Compare keyword lengths before trying a string comparison.  This option
1021 is mandatory for binary comparisons (@pxref{Binary Strings}).  It also might
1022 cut down on the number of string comparisons made during the lookup, since
1023 keywords with different lengths are never compared via @code{strcmp}.
1024 However, using @samp{-l} might greatly increase the size of the
1025 generated C code if the lookup table range is large (which implies that
1026 the switch option @samp{-S} or @samp{%switch} is not enabled), since the length
1027 table contains as many elements as there are entries in the lookup table.
1028
1029 @item -c
1030 @itemx --compare-strncmp
1031 Generates C code that uses the @code{strncmp} function to perform
1032 string comparisons.  The default action is to use @code{strcmp}.
1033
1034 @item -C
1035 @itemx --readonly-tables
1036 Makes the contents of all generated lookup tables constant, i.e.,
1037 ``readonly''.  Many compilers can generate more efficient code for this
1038 by putting the tables in readonly memory.
1039
1040 @item -E
1041 @itemx  --enum
1042 Define constant values using an enum local to the lookup function rather
1043 than with #defines.  This also means that different lookup functions can
1044 reside in the same file.  Thanks to James Clark @code{<jjc@@ai.mit.edu>}.
1045
1046 @item -I
1047 @itemx --includes
1048 Include the necessary system include file, @code{<string.h>}, at the
1049 beginning of the code.  By default, this is not done; the user must
1050 include this header file himself to allow compilation of the code.
1051
1052 @item -G
1053 @itemx --global-table
1054 Generate the static table of keywords as a static global variable,
1055 rather than hiding it inside of the lookup function (which is the
1056 default behavior).
1057
1058 @item -P
1059 @itemx --pic
1060 Optimize the generated table for inclusion in shared libraries.  This
1061 reduces the startup time of programs using a shared library containing
1062 the generated code.  If the option @samp{-t} (or, equivalently, the
1063 @samp{%struct-type} declaration) is also given, the first field of the
1064 user-defined struct must be of type @samp{int}, not @samp{char *}, because
1065 it will contain offsets into the string pool instead of actual strings.
1066 To convert such an offset to a string, you can use the expression
1067 @samp{stringpool + @var{o}}, where @var{o} is the offset.  The string pool
1068 name can be changed through the option @samp{--string-pool-name}.
1069
1070 @item -Q @var{string-pool-name}
1071 @itemx --string-pool-name=@var{string-pool-name}
1072 Allows you to specify the name of the generated string pool created by
1073 option @samp{-P}.  The default name is @samp{stringpool}.  This option
1074 permits the use of two hash tables in the same file, with @samp{-P} and
1075 even when the option @samp{-G} (or, equivalently, the @samp{%global-table}
1076 declaration) is given.
1077
1078 @item --null-strings
1079 Use NULL strings instead of empty strings for empty keyword table entries.
1080 This reduces the startup time of programs using a shared library containing
1081 the generated code (but not as much as option @samp{-P}), at the expense
1082 of one more test-and-branch instruction at run time.
1083
1084 @item -W @var{hash-table-array-name}
1085 @itemx --word-array-name=@var{hash-table-array-name}
1086 @cindex Array name
1087 Allows you to specify the name for the generated array containing the
1088 hash table.  Default name is @samp{wordlist}.  This option permits the
1089 use of two hash tables in the same file, even when the option @samp{-G}
1090 (or, equivalently, the @samp{%global-table} declaration) is given.
1091
1092 @itemx --length-table-name=@var{length-table-array-name}
1093 @cindex Array name
1094 Allows you to specify the name for the generated array containing the
1095 length table.  Default name is @samp{lengthtable}.  This option permits the
1096 use of two length tables in the same file, even when the option @samp{-G}
1097 (or, equivalently, the @samp{%global-table} declaration) is given.
1098
1099 @item -S @var{total-switch-statements}
1100 @itemx --switch=@var{total-switch-statements}
1101 @cindex @code{switch}
1102 Causes the generated C code to use a @code{switch} statement scheme,
1103 rather than an array lookup table.  This can lead to a reduction in both
1104 time and space requirements for some input files.  The argument to this
1105 option determines how many @code{switch} statements are generated.  A
1106 value of 1 generates 1 @code{switch} containing all the elements, a
1107 value of 2 generates 2 tables with 1/2 the elements in each
1108 @code{switch}, etc.  This is useful since many C compilers cannot
1109 correctly generate code for large @code{switch} statements.  This option
1110 was inspired in part by Keith Bostic's original C program.
1111
1112 @item -T
1113 @itemx --omit-struct-type
1114 Prevents the transfer of the type declaration to the output file.  Use
1115 this option if the type is already defined elsewhere.
1116
1117 @item -p
1118 This option is supported for compatibility with previous releases of
1119 @code{gperf}.  It does not do anything.
1120 @end table
1121
1122 @node Algorithmic Details, Verbosity, Output Details, Options
1123 @section Options for changing the Algorithms employed by @code{gperf}
1124
1125 @table @samp
1126 @item -k @var{selected-byte-positions}
1127 @itemx --key-positions=@var{selected-byte-positions}
1128 Allows selection of the byte positions used in the keywords'
1129 hash function.  The allowable choices range between 1-255, inclusive.
1130 The positions are separated by commas, e.g., @samp{-k 9,4,13,14};
1131 ranges may be used, e.g., @samp{-k 2-7}; and positions may occur
1132 in any order.  Furthermore, the wildcard '*' causes the generated
1133 hash function to consider @strong{all} byte positions in each keyword,
1134 whereas '$' instructs the hash function to use the ``final byte''
1135 of a keyword (this is the only way to use a byte position greater than
1136 255, incidentally).
1137
1138 For instance, the option @samp{-k 1,2,4,6-10,'$'} generates a hash
1139 function that considers positions 1,2,4,6,7,8,9,10, plus the last
1140 byte in each keyword (which may be at a different position for each
1141 keyword, obviously).  Keywords
1142 with length less than the indicated byte positions work properly, since
1143 selected byte positions exceeding the keyword length are simply not
1144 referenced in the hash function.
1145
1146 This option is not normally needed since version 2.8 of @code{gperf};
1147 the default byte positions are computed depending on the keyword set,
1148 through a search that minimizes the number of byte positions.
1149
1150 @item -D
1151 @itemx --duplicates
1152 @cindex Duplicates
1153 Handle keywords whose selected byte sets hash to duplicate values.
1154 Duplicate hash values can occur if a set of keywords has the same names, but
1155 possesses different attributes, or if the selected byte positions are not well
1156 chosen.  With the -D option @code{gperf} treats all these keywords as
1157 part of an equivalence class and generates a perfect hash function with
1158 multiple comparisons for duplicate keywords.  It is up to you to completely
1159 disambiguate the keywords by modifying the generated C code.  However,
1160 @code{gperf} helps you out by organizing the output.
1161
1162 Using this option usually means that the generated hash function is no
1163 longer perfect.  On the other hand, it permits @code{gperf} to work on
1164 keyword sets that it otherwise could not handle.
1165
1166 @item -m @var{iterations}
1167 @itemx --multiple-iterations=@var{iterations}
1168 Perform multiple choices of the @samp{-i} and @samp{-j} values, and
1169 choose the best results.  This increases the running time by a factor of
1170 @var{iterations} but does a good job minimizing the generated table size.
1171
1172 @item -i @var{initial-value}
1173 @itemx --initial-asso=@var{initial-value}
1174 Provides an initial @var{value} for the associate values array.  Default
1175 is 0.  Increasing the initial value helps inflate the final table size,
1176 possibly leading to more time efficient keyword lookups.  Note that this
1177 option is not particularly useful when @samp{-S} (or, equivalently,
1178 @samp{%switch}) is used.  Also,
1179 @samp{-i} is overridden when the @samp{-r} option is used.
1180
1181 @item -j @var{jump-value}
1182 @itemx --jump=@var{jump-value}
1183 @cindex Jump value
1184 Affects the ``jump value'', i.e., how far to advance the associated
1185 byte value upon collisions.  @var{Jump-value} is rounded up to an
1186 odd number, the default is 5.  If the @var{jump-value} is 0 @code{gperf}
1187 jumps by random amounts.
1188
1189 @item -n
1190 @itemx --no-strlen
1191 Instructs the generator not to include the length of a keyword when
1192 computing its hash value.  This may save a few assembly instructions in
1193 the generated lookup table.
1194
1195 @item -r
1196 @itemx --random
1197 Utilizes randomness to initialize the associated values table.  This
1198 frequently generates solutions faster than using deterministic
1199 initialization (which starts all associated values at 0).  Furthermore,
1200 using the randomization option generally increases the size of the
1201 table.
1202
1203 @item -s @var{size-multiple}
1204 @itemx --size-multiple=@var{size-multiple}
1205 Affects the size of the generated hash table.  The numeric argument for
1206 this option indicates ``how many times larger or smaller'' the maximum
1207 associated value range should be, in relationship to the number of keywords.
1208 It can be written as an integer, a floating-point number or a fraction.
1209 For example, a value of 3 means ``allow the maximum associated value to be
1210 about 3 times larger than the number of input keywords''.
1211 Conversely, a value of 1/3 means ``allow the maximum associated value to
1212 be about 3 times smaller than the number of input keywords''.  Values
1213 smaller than 1 are useful for limiting the overall size of the generated hash
1214 table, though the option @samp{-m} is better at this purpose.
1215
1216 If `generate switch' option @samp{-S} (or, equivalently, @samp{%switch}) is
1217 @emph{not} enabled, the maximum
1218 associated value influences the static array table size, and a larger
1219 table should decrease the time required for an unsuccessful search, at
1220 the expense of extra table space.
1221
1222 The default value is 1, thus the default maximum associated value about
1223 the same size as the number of keywords (for efficiency, the maximum
1224 associated value is always rounded up to a power of 2).  The actual
1225 table size may vary somewhat, since this technique is essentially a
1226 heuristic.
1227 @end table
1228
1229 @node Verbosity,  , Algorithmic Details, Options
1230 @section Informative Output
1231
1232 @table @samp
1233 @item -h
1234 @itemx --help
1235 Prints a short summary on the meaning of each program option.  Aborts
1236 further program execution.
1237
1238 @item -v
1239 @itemx --version
1240 Prints out the current version number.
1241
1242 @item -d
1243 @itemx --debug
1244 Enables the debugging option.  This produces verbose diagnostics to
1245 ``standard error'' when @code{gperf} is executing.  It is useful both for
1246 maintaining the program and for determining whether a given set of
1247 options is actually speeding up the search for a solution.  Some useful
1248 information is dumped at the end of the program when the @samp{-d}
1249 option is enabled.
1250 @end table
1251
1252 @node Bugs, Projects, Options, Top
1253 @chapter Known Bugs and Limitations with @code{gperf}
1254
1255 The following are some limitations with the current release of
1256 @code{gperf}:
1257
1258 @itemize @bullet
1259 @item
1260 The @code{gperf} utility is tuned to execute quickly, and works quickly
1261 for small to medium size data sets (around 1000 keywords).  It is
1262 extremely useful for maintaining perfect hash functions for compiler
1263 keyword sets.  Several recent enhancements now enable @code{gperf} to
1264 work efficiently on much larger keyword sets (over 15,000 keywords).
1265 When processing large keyword sets it helps greatly to have over 8 megs
1266 of RAM.
1267
1268 @item
1269 The size of the generate static keyword array can get @emph{extremely}
1270 large if the input keyword file is large or if the keywords are quite
1271 similar.  This tends to slow down the compilation of the generated C
1272 code, and @emph{greatly} inflates the object code size.  If this
1273 situation occurs, consider using the @samp{-S} option to reduce data
1274 size, potentially increasing keyword recognition time a negligible
1275 amount.  Since many C compilers cannot correctly generate code for
1276 large switch statements it is important to qualify the @var{-S} option
1277 with an appropriate numerical argument that controls the number of
1278 switch statements generated.
1279
1280 @item
1281 The maximum number of selected byte positions has an
1282 arbitrary limit of 255.  This restriction should be removed, and if
1283 anyone considers this a problem write me and let me know so I can remove
1284 the constraint.
1285 @end itemize
1286
1287 @node Projects, Bibliography, Bugs, Top
1288 @chapter Things Still Left to Do
1289
1290 It should be ``relatively'' easy to replace the current perfect hash
1291 function algorithm with a more exhaustive approach; the perfect hash
1292 module is essential independent from other program modules.  Additional
1293 worthwhile improvements include:
1294
1295 @itemize @bullet
1296 @item
1297 Another useful extension involves modifying the program to generate
1298 ``minimal'' perfect hash functions (under certain circumstances, the
1299 current version can be rather extravagant in the generated table size).
1300 This is mostly of theoretical interest, since a sparse table
1301 often produces faster lookups, and use of the @samp{-S} @code{switch}
1302 option can minimize the data size, at the expense of slightly longer
1303 lookups (note that the gcc compiler generally produces good code for
1304 @code{switch} statements, reducing the need for more complex schemes).
1305
1306 @item
1307 In addition to improving the algorithm, it would also be useful to
1308 generate an Ada package as the code output, in addition to the current
1309 C and C++ routines.
1310 @end itemize
1311
1312 @page
1313
1314 @node Bibliography, Concept Index, Projects, Top
1315 @chapter Bibliography
1316
1317 [1] Chang, C.C.: @i{A Scheme for Constructing Ordered Minimal Perfect
1318 Hashing Functions} Information Sciences 39(1986), 187-195.
1319
1320 [2] Cichelli, Richard J. @i{Author's Response to ``On Cichelli's Minimal Perfect Hash
1321 Functions Method''} Communications of the ACM, 23, 12(December 1980), 729.
1322
1323 [3] Cichelli, Richard J. @i{Minimal Perfect Hash Functions Made Simple}
1324 Communications of the ACM, 23, 1(January 1980), 17-19.
1325
1326 [4] Cook, C. R. and Oldehoeft, R.R. @i{A Letter Oriented Minimal
1327 Perfect Hashing Function} SIGPLAN Notices, 17, 9(September 1982), 18-27.
1328
1329 [5] Cormack, G. V. and Horspool, R. N. S. and Kaiserwerth, M.
1330 @i{Practical Perfect Hashing} Computer Journal, 28, 1(January 1985), 54-58.
1331
1332 [6] Jaeschke, G. @i{Reciprocal Hashing: A Method for Generating Minimal
1333 Perfect Hashing Functions} Communications of the ACM, 24, 12(December
1334 1981), 829-833.
1335
1336 [7] Jaeschke, G. and Osterburg, G. @i{On Cichelli's Minimal Perfect
1337 Hash Functions Method} Communications of the ACM, 23, 12(December 1980),
1338 728-729.
1339
1340 [8] Sager, Thomas J. @i{A Polynomial Time Generator for Minimal Perfect
1341 Hash Functions} Communications of the ACM, 28, 5(December 1985), 523-532
1342
1343 [9] Schmidt, Douglas C. @i{GPERF: A Perfect Hash Function Generator}
1344 Second USENIX C++ Conference Proceedings, April 1990.
1345
1346 [10] Schmidt, Douglas C. @i{GPERF: A Perfect Hash Function Generator}
1347 C++ Report, SIGS 10 10 (November/December 1998).
1348
1349 [11] Sebesta, R.W. and Taylor, M.A. @i{Minimal Perfect Hash Functions
1350 for Reserved Word Lists}  SIGPLAN Notices, 20, 12(September 1985), 47-53.
1351
1352 [12] Sprugnoli, R. @i{Perfect Hashing Functions: A Single Probe
1353 Retrieving Method for Static Sets} Communications of the ACM, 20
1354 11(November 1977), 841-850.
1355
1356 [13] Stallman, Richard M. @i{Using and Porting GNU CC} Free Software Foundation,
1357 1988.
1358
1359 [14] Stroustrup, Bjarne @i{The C++ Programming Language.} Addison-Wesley, 1986.
1360
1361 [15] Tiemann, Michael D. @i{User's Guide to GNU C++} Free Software
1362 Foundation, 1989.
1363
1364 @node Concept Index,  , Bibliography, Top
1365 @unnumbered Concept Index
1366
1367 @printindex cp
1368
1369 @contents
1370 @bye