@setfilename regex.info

@c \\{fill-paragraph} works better (for me, anyway) if the text in the
@c source file isn't indented.

@c Define a new index for our magic constants.

@c Put everything in one index (arbitrarily chosen to be the concept index).

@c Here is what we use in the Info `dir' file:
@c * Regex: (regex). Regular expression library.

This file documents the GNU regular expression library.

Copyright (C) 1992, 1993 Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to process this file through TeX and print the
results, provided the printed document carries a copying permission
notice identical to this one except for the removal of this paragraph
(this paragraph not being relevant to the printed manual).

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided also that the
section entitled ``GNU General Public License'' is included exactly as
in the original, and provided that the entire resulting derived work is
distributed under the terms of a permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that the section entitled ``GNU General Public License'' may be
included in a translation approved by the Free Software Foundation
instead of in the original English.

@subtitle edition 0.12a
@subtitle 19 September 1992
@author Kathryn A. Hargreaves

@vskip 0pt plus 1filll
Copyright @copyright{} 1992 Free Software Foundation.

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided also that the
section entitled ``GNU General Public License'' is included exactly as
in the original, and provided that the entire resulting derived work is
distributed under the terms of a permission notice identical to this
one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that the section entitled ``GNU General Public License'' may be
included in a translation approved by the Free Software Foundation
instead of in the original English.

@node Top, Overview, (dir), (dir)
@top Regular Expression Library

This manual documents how to program with the GNU regular expression
library. This is edition 0.12a of the manual, 19 September 1992.

The first part of this master menu lists the major nodes in this Info
document, including the index. The rest of the menu lists all the
lower level nodes in the document.

* Regular Expression Syntax::
* GNU Emacs Operators::
* What Gets Matched?::
* Programming with Regex::
* Copying:: Copying and sharing Regex.
* Index:: General index.
--- The Detailed Node Listing ---

Regular Expression Syntax

* Predefined Syntaxes::
* Collating Elements vs. Characters::
* The Backslash Character::

* Match-self Operator:: Ordinary characters.
* Match-any-character Operator:: .
* Concatenation Operator:: Juxtaposition.
* Repetition Operators:: * + ? @{@}
* Alternation Operator:: |
* List Operators:: [...] [^...]
* Grouping Operators:: (...)
* Back-reference Operator:: \digit
* Anchoring Operators:: ^ $

* Match-zero-or-more Operator:: *
* Match-one-or-more Operator:: +
* Match-zero-or-one Operator:: ?
* Interval Operators:: @{@}

List Operators (@code{[} @dots{} @code{]} and @code{[^} @dots{} @code{]})

* Character Class Operators:: [:class:]
* Range Operator:: start-end

* Match-beginning-of-line Operator:: ^
* Match-end-of-line Operator:: $

* Non-Emacs Syntax Tables::
* Match-word-boundary Operator:: \b
* Match-within-word Operator:: \B
* Match-beginning-of-word Operator:: \<
* Match-end-of-word Operator:: \>
* Match-word-constituent Operator:: \w
* Match-non-word-constituent Operator:: \W

* Match-beginning-of-buffer Operator:: \`
* Match-end-of-buffer Operator:: \'

* Syntactic Class Operators::

Syntactic Class Operators

* Emacs Syntax Tables::
* Match-syntactic-class Operator:: \sCLASS
* Match-not-syntactic-class Operator:: \SCLASS

Programming with Regex

* GNU Regex Functions::
* POSIX Regex Functions::
* BSD Regex Functions::

* GNU Pattern Buffers:: The re_pattern_buffer type.
* GNU Regular Expression Compiling:: re_compile_pattern ()
* GNU Matching:: re_match ()
* GNU Searching:: re_search ()
* Matching/Searching with Split Data:: re_match_2 (), re_search_2 ()
* Searching with Fastmaps:: re_compile_fastmap ()
* GNU Translate Tables:: The `translate' field.
* Using Registers:: The re_registers type and related fns.
* Freeing GNU Pattern Buffers:: regfree ()

POSIX Regex Functions

* POSIX Pattern Buffers:: The regex_t type.
* POSIX Regular Expression Compiling:: regcomp ()
* POSIX Matching:: regexec ()
* Reporting Errors:: regerror ()
* Using Byte Offsets:: The regmatch_t type.
* Freeing POSIX Pattern Buffers:: regfree ()

* BSD Regular Expression Compiling:: re_comp ()
* BSD Searching:: re_exec ()

@node Overview, Regular Expression Syntax, Top, Top

A @dfn{regular expression} (or @dfn{regexp}, or @dfn{pattern}) is a text
string that describes some (mathematical) set of strings. A regexp
@var{r} @dfn{matches} a string @var{s} if @var{s} is in the set of
strings described by @var{r}.

Using the Regex library, you can:

see if a string matches a specified pattern as a whole, and

search within a string for a substring matching a specified pattern.

Some regular expressions match only one string, i.e., the set they
describe has only one member. For example, the regular expression
@samp{foo} matches the string @samp{foo} and no others. Other regular
expressions match more than one string, i.e., the set they describe has
more than one member. For example, the regular expression @samp{f*}
matches the set of strings made up of any number (including zero) of
@samp{f}s. As you can see, some characters in regular expressions match
themselves (such as @samp{f}) and some don't (such as @samp{*}); the
ones that don't match themselves instead let you specify patterns that
describe many different strings.

To either match or search for a regular expression with the Regex
library functions, you must first compile it with a Regex pattern
compiling function. A @dfn{compiled pattern} is a regular expression
converted to the internal format used by the library functions. Once
you've compiled a pattern, you can use it for matching or searching any
number of times.

The Regex library consists of two source files: @file{regex.h} and

Regex provides three groups of functions with which you can operate on
regular expressions. One group---the @sc{gnu} group---is more powerful
but not completely compatible with the other two, namely the @sc{posix}
and Berkeley @sc{unix} groups; its interface was designed specifically
for @sc{gnu}. The other groups have the same interfaces as do the
regular expression functions in @sc{posix} and Berkeley
@sc{unix}.

We wrote this chapter with programmers in mind, not users of
programs---such as Emacs---that use Regex. We describe the Regex
library in its entirety, not how to write regular expressions that a
particular program understands.
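To make the compile-once, use-many-times cycle concrete, here is a minimal sketch in C. It assumes a C library whose @file{regex.h} provides the @sc{posix} functions @code{regcomp}, @code{regexec} and @code{regfree}; the helper @code{count_matches} is hypothetical and not part of Regex.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile PATTERN (POSIX extended syntax) once,
   then reuse the compiled pattern to search each of the N strings in
   TEXTS.  Returns how many strings contain a match, or -1 if PATTERN
   does not compile.  */
static int count_matches(const char *pattern, const char *const texts[], size_t n)
{
    regex_t re;        /* the compiled pattern */
    int hits = 0;

    /* REG_NOSUB: we only need to know whether there is a match. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    for (size_t i = 0; i < n; i++)   /* same compiled pattern every time */
        if (regexec(&re, texts[i], 0, NULL, 0) == 0)
            hits++;

    regfree(&re);      /* release what regcomp allocated */
    return hits;
}
```

Because @code{regexec} searches each string rather than matching it as a whole, a pattern such as @samp{f*} succeeds on every string: it can always match the empty string.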

@node Regular Expression Syntax, Common Operators, Overview, Top
@chapter Regular Expression Syntax

@cindex regular expressions, syntax of
@cindex syntax of regular expressions

@dfn{Characters} are things you can type. @dfn{Operators} are things in
a regular expression that match one or more characters. You compose
regular expressions from operators, which in turn you specify using one
or more characters.

Most characters represent what we call the match-self operator, i.e.,
they match themselves; we call these characters @dfn{ordinary}. Other
characters represent either all or parts of fancier operators; e.g.,
@samp{.} represents what we call the match-any-character operator
(which, no surprise, matches (almost) any character); we call these
characters @dfn{special}. Two different things determine what
characters represent what operators:

the regular expression syntax your program has told the Regex library to
recognize, and

the context of the character in the regular expression.

In the following sections, we describe these things in more detail.

* Predefined Syntaxes::
* Collating Elements vs. Characters::
* The Backslash Character::

@node Syntax Bits, Predefined Syntaxes, , Regular Expression Syntax

In any particular syntax for regular expressions, some characters are
always special, others are sometimes special, and others are never
special. The particular syntax that Regex recognizes for a given
regular expression depends on the value in the @code{syntax} field of
the pattern buffer of that regular expression.

You get a pattern buffer by compiling a regular expression. @xref{GNU
Pattern Buffers}, and @ref{POSIX Pattern Buffers}, for more information
on pattern buffers. @xref{GNU Regular Expression Compiling}, @ref{POSIX
Regular Expression Compiling}, and @ref{BSD Regular Expression
Compiling}, for more information on compiling.

Regex considers the value of the @code{syntax} field to be a collection
of bits; we refer to these bits as @dfn{syntax bits}. In most cases,
they affect what characters represent what operators. We describe the
meanings of the operators to which we refer in @ref{Common Operators},
@ref{GNU Operators}, and @ref{GNU Emacs Operators}.
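Because the syntax is just a set of bits, you can build one with bitwise OR and see how each bit changes what a character represents. The following is a rough sketch that assumes the GNU C Library's @file{regex.h}, which declares the @sc{gnu} interface (@code{re_set_syntax}, @code{re_compile_pattern}, @code{re_match}) and the @code{RE_*} bits only when @code{_GNU_SOURCE} is defined. The helper @code{match_len} is hypothetical. It contrasts a syntax with no bits set against one with @code{RE_BK_PLUS_QM} set, and shows that @code{RE_LIMITED_OPS} overrides it.

```c
#define _GNU_SOURCE            /* glibc: expose the GNU interface in <regex.h> */
#include <regex.h>
#include <string.h>

/* Hypothetical helper: compile PATTERN under the syntax bits SYNTAX
   using the GNU interface, then return how many characters of TEXT it
   matches starting at position 0 (-1 if it doesn't match there).
   Returns -3 if PATTERN is invalid under SYNTAX.  */
static int match_len(reg_syntax_t syntax, const char *pattern, const char *text)
{
    struct re_pattern_buffer buf;
    int len;

    /* No buffer, fastmap, or translate table yet: let Regex allocate. */
    memset(&buf, 0, sizeof buf);

    /* re_compile_pattern takes its syntax from a global setting. */
    re_set_syntax(syntax);
    if (re_compile_pattern(pattern, strlen(pattern), &buf) != NULL)
        return -3;             /* a non-NULL result is an error message */

    len = re_match(&buf, text, (int) strlen(text), 0, NULL);
    regfree(&buf);             /* struct re_pattern_buffer is regex_t */
    return len;
}
```

For example, with no bits set @code{match_len (0, "ca+r", "caaar")} is 5, whereas with @code{RE_BK_PLUS_QM} set the same pattern only matches the literal string @samp{ca+r}.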

For reference, here is the complete list of syntax bits, in alphabetical
order:

@cnindex RE_BACKSLASH_ESCAPE_IN_LISTS
@item RE_BACKSLASH_ESCAPE_IN_LISTS
If this bit is set, then @samp{\} inside a list (@pxref{List Operators})
quotes (makes ordinary, if it's special) the following character; if
this bit isn't set, then @samp{\} is an ordinary character inside lists.
(@xref{The Backslash Character}, for what @samp{\} does outside of lists.)

@cnindex RE_BK_PLUS_QM
@item RE_BK_PLUS_QM
If this bit is set, then @samp{\+} represents the match-one-or-more
operator and @samp{\?} represents the match-zero-or-one operator; if
this bit isn't set, then @samp{+} represents the match-one-or-more
operator and @samp{?} represents the match-zero-or-one operator. This
bit is irrelevant if @code{RE_LIMITED_OPS} is set.

@cnindex RE_CHAR_CLASSES
@item RE_CHAR_CLASSES
If this bit is set, then you can use character classes in lists; if this
bit isn't set, then you can't.

@cnindex RE_CONTEXT_INDEP_ANCHORS
@item RE_CONTEXT_INDEP_ANCHORS
If this bit is set, then @samp{^} and @samp{$} are special anywhere outside
a list; if this bit isn't set, then these characters are special only in
certain contexts. @xref{Match-beginning-of-line Operator}, and
@ref{Match-end-of-line Operator}.

@cnindex RE_CONTEXT_INDEP_OPS
@item RE_CONTEXT_INDEP_OPS
If this bit is set, then certain characters are special anywhere outside
a list; if this bit isn't set, then those characters are special only in
some contexts and are ordinary elsewhere. Specifically, if this bit
isn't set then @samp{*}, and (if the syntax bit @code{RE_LIMITED_OPS}
isn't set) @samp{+} and @samp{?} (or @samp{\+} and @samp{\?}, depending
on the syntax bit @code{RE_BK_PLUS_QM}) represent repetition operators
only if they're not first in a regular expression or just after an
open-group or alternation operator. The same holds for @samp{@{} (or
@samp{\@{}, depending on the syntax bit @code{RE_NO_BK_BRACES}) if
it is the beginning of a valid interval and the syntax bit
@code{RE_INTERVALS} is set.

@cnindex RE_CONTEXT_INVALID_OPS
@item RE_CONTEXT_INVALID_OPS
If this bit is set, then repetition and alternation operators can't be
in certain positions within a regular expression. Specifically, the
regular expression is invalid if it has:

a repetition operator first in the regular expression or just after a
match-beginning-of-line, open-group, or alternation operator; or

an alternation operator first or last in the regular expression, just
before a match-end-of-line operator, or just after an alternation or
match-beginning-of-line operator.

If this bit isn't set, then you can put the characters representing the
repetition and alternation operators anywhere in a regular expression.
Whether or not they will in fact be operators in certain positions
depends on other syntax bits.

@cnindex RE_DOT_NEWLINE
@item RE_DOT_NEWLINE
If this bit is set, then the match-any-character operator matches
a newline; if this bit isn't set, then it doesn't.

@cnindex RE_DOT_NOT_NULL
@item RE_DOT_NOT_NULL
If this bit is set, then the match-any-character operator doesn't match
a null character; if this bit isn't set, then it does.

@cnindex RE_INTERVALS
@item RE_INTERVALS
If this bit is set, then Regex recognizes interval operators; if this bit
isn't set, then it doesn't.

@cnindex RE_LIMITED_OPS
@item RE_LIMITED_OPS
If this bit is set, then Regex doesn't recognize the match-one-or-more,
match-zero-or-one, or alternation operators; if this bit isn't set, then
it does.

@cnindex RE_NEWLINE_ALT
@item RE_NEWLINE_ALT
If this bit is set, then newline represents the alternation operator; if
this bit isn't set, then newline is ordinary.

@cnindex RE_NO_BK_BRACES
@item RE_NO_BK_BRACES
If this bit is set, then @samp{@{} represents the open-interval operator
and @samp{@}} represents the close-interval operator; if this bit isn't
set, then @samp{\@{} represents the open-interval operator and
@samp{\@}} represents the close-interval operator. This bit is relevant
only if @code{RE_INTERVALS} is set.

@cnindex RE_NO_BK_PARENS
@item RE_NO_BK_PARENS
If this bit is set, then @samp{(} represents the open-group operator and
@samp{)} represents the close-group operator; if this bit isn't set, then
@samp{\(} represents the open-group operator and @samp{\)} represents
the close-group operator.

@cnindex RE_NO_BK_REFS
@item RE_NO_BK_REFS
If this bit is set, then Regex doesn't recognize @samp{\}@var{digit} as
the back-reference operator; if this bit isn't set, then it does.

@cnindex RE_NO_BK_VBAR
@item RE_NO_BK_VBAR
If this bit is set, then @samp{|} represents the alternation operator;
if this bit isn't set, then @samp{\|} represents the alternation
operator. This bit is irrelevant if @code{RE_LIMITED_OPS} is set.

@cnindex RE_NO_EMPTY_RANGES
@item RE_NO_EMPTY_RANGES
If this bit is set, then a regular expression with a range whose ending
point collates lower than its starting point is invalid; if this bit
isn't set, then Regex considers such a range to be empty.

@cnindex RE_UNMATCHED_RIGHT_PAREN_ORD
@item RE_UNMATCHED_RIGHT_PAREN_ORD
If this bit is set and a close-group operator has no matching open-group
operator, then Regex considers what would otherwise be that close-group
operator (@samp{)} or @samp{\)}, depending on how @code{RE_NO_BK_PARENS}
is set) to match @samp{)}.
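With the @sc{posix} interface you don't set syntax bits yourself; @code{regcomp} picks a syntax from its flags. The sketch below assumes the GNU C Library, where compiling without @code{REG_EXTENDED} selects @code{RE_SYNTAX_POSIX_BASIC} (with @code{RE_NO_BK_PARENS} and @code{RE_NO_BK_VBAR} clear) and compiling with it selects @code{RE_SYNTAX_POSIX_EXTENDED} (with both bits set). The helper @code{search_at} is hypothetical.

```c
#include <regex.h>

/* Hypothetical helper: compile PATTERN with CFLAGS, search TEXT, and
   return the offset at which the first match starts, -1 if there is
   no match, or -2 if PATTERN does not compile.  */
static int search_at(const char *pattern, int cflags, const char *text)
{
    regex_t re;
    regmatch_t m;      /* offsets of the whole match */
    int start = -1;

    if (regcomp(&re, pattern, cflags) != 0)
        return -2;
    if (regexec(&re, text, 1, &m, 0) == 0)
        start = (int) m.rm_so;
    regfree(&re);
    return start;
}
```

In basic syntax, @samp{\(a\|b\)c} groups and alternates while @samp{(a|b)c} matches only itself; in extended syntax it's the other way around. Accepting @samp{\|} in basic syntax is a GNU extension, so a strictly @sc{posix} basic implementation need not do so.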

@node Predefined Syntaxes, Collating Elements vs. Characters, Syntax Bits, Regular Expression Syntax
@section Predefined Syntaxes

If you're programming with Regex, you can set a pattern buffer's
(@pxref{GNU Pattern Buffers}, and @ref{POSIX Pattern Buffers})
@code{syntax} field either to an arbitrary combination of syntax bits
(@pxref{Syntax Bits}) or else to one of the configurations defined by Regex.
These configurations define the syntaxes used by certain
programs---@sc{gnu} Emacs,
Egrep---in addition to syntaxes for @sc{posix} basic and extended
regular expressions.

The predefined syntaxes, taken directly from @file{regex.h}, are:

@node Collating Elements vs. Characters, The Backslash Character, Predefined Syntaxes, Regular Expression Syntax
@section Collating Elements vs.@: Characters

@sc{posix} generalizes the notion of a character to that of a
collating element. It defines a @dfn{collating element} to be ``a
sequence of one or more bytes defined in the current collating sequence
as a unit of collation.''

This generalizes the notion of a character in
two ways. First, a single character can map into two or more collating
elements. For example, the German @ss{}
collates as the collating element @samp{s} followed by another collating
element @samp{s}. Second, two or more characters can map into one
collating element. For example, the Spanish @samp{ll} collates after
@samp{l} and before @samp{m}.

Since @sc{posix}'s ``collating element'' preserves the essential idea of
a ``character,'' we use the latter, more familiar, term in this document.

@node The Backslash Character, , Collating Elements vs. Characters, Regular Expression Syntax
@section The Backslash Character

The @samp{\} character has one of four different meanings, depending on
the context in which you use it and what syntax bits are set
(@pxref{Syntax Bits}). It can: 1) stand for itself, 2) quote the next
character, 3) introduce an operator, or 4) do nothing.

It stands for itself inside a list
(@pxref{List Operators}) if the syntax bit
@code{RE_BACKSLASH_ESCAPE_IN_LISTS} is not set. For example, @samp{[\]}
would match @samp{\}.

It quotes (makes ordinary, if it's special) the next character when you
use it either:

outside a list,@footnote{Sometimes
you don't have to explicitly quote special characters to make
them ordinary. For instance, most characters lose any special meaning
inside a list (@pxref{List Operators}). In addition, if the syntax bits
@code{RE_CONTEXT_INVALID_OPS} and @code{RE_CONTEXT_INDEP_OPS}
aren't set, then (for historical reasons) the matcher considers special
characters ordinary if they are in contexts where the operations they
represent make no sense; for example, @samp{*}, which normally represents
the match-zero-or-more operator, matches itself in the regular
expression @samp{*foo} because there is no preceding expression on which
it can operate. It is poor practice, however, to depend on this
behavior; if you want a special character to be ordinary outside a list,
it's better to always quote it, regardless.} or

inside a list and the syntax bit @code{RE_BACKSLASH_ESCAPE_IN_LISTS} is set.

It introduces an operator when followed by certain ordinary
characters---sometimes only when certain syntax bits are set. See the
cases @code{RE_BK_PLUS_QM}, @code{RE_NO_BK_BRACES}, @code{RE_NO_BK_VBAR},
@code{RE_NO_BK_PARENS}, @code{RE_NO_BK_REFS} in @ref{Syntax Bits}. Also:

@samp{\b} represents the match-word-boundary operator
(@pxref{Match-word-boundary Operator}).

@samp{\B} represents the match-within-word operator
(@pxref{Match-within-word Operator}).

@samp{\<} represents the match-beginning-of-word operator @*
(@pxref{Match-beginning-of-word Operator}).

@samp{\>} represents the match-end-of-word operator
(@pxref{Match-end-of-word Operator}).

@samp{\w} represents the match-word-constituent operator
(@pxref{Match-word-constituent Operator}).

@samp{\W} represents the match-non-word-constituent operator
(@pxref{Match-non-word-constituent Operator}).

@samp{\`} represents the match-beginning-of-buffer
operator and @samp{\'} represents the match-end-of-buffer operator
(@pxref{Buffer Operators}).

If Regex was compiled with the C preprocessor symbol @code{emacs}
defined, then @samp{\s@var{class}} represents the match-syntactic-class
operator and @samp{\S@var{class}} represents the
match-not-syntactic-class operator (@pxref{Syntactic Class Operators}).

In all other cases, Regex ignores @samp{\}. For example,
@samp{\n} matches @samp{n}.
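The sketch below checks three of these meanings through the @sc{posix} interface. It assumes the GNU C Library, whose extended syntax leaves @code{RE_BACKSLASH_ESCAPE_IN_LISTS} clear and which treats @samp{\} before an ordinary character such as @samp{n} as described above. The helper @code{found} is hypothetical. Note that each @samp{\} in a regular expression has to be written @samp{\\} inside a C string literal.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 if PATTERN (POSIX extended syntax) matches
   somewhere in TEXT, 0 if it doesn't, -1 if PATTERN doesn't compile.  */
static int found(const char *pattern, const char *text)
{
    regex_t re;
    int hit;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    hit = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return hit;
}
```

Here @samp{[\]} finds a backslash (the @samp{\} stands for itself inside the list), @samp{a\.b} matches @samp{a.b} but not @samp{axb} (the @samp{\} quotes @samp{.}), and @samp{\n} matches the letter @samp{n} rather than a newline.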

@node Common Operators, GNU Operators, Regular Expression Syntax, Top
@chapter Common Operators

You compose regular expressions from operators. In the following
sections, we describe the regular expression operators specified by
@sc{posix}; @sc{gnu} also uses these. Most operators have more than one
representation as characters. @xref{Regular Expression Syntax}, for
what characters represent what operators under what circumstances.

For most operators that can be represented in two ways, one
representation is a single character and the other is that character
preceded by @samp{\}. For example, either @samp{(} or @samp{\(}
represents the open-group operator. Which one does depends on the
setting of a syntax bit, in this case @code{RE_NO_BK_PARENS}. Why is
this so? Historical reasons dictate some of the varying
representations, while @sc{posix} dictates others.

Finally, almost all characters lose any special meaning inside a list
(@pxref{List Operators}).

* Match-self Operator:: Ordinary characters.
* Match-any-character Operator:: .
* Concatenation Operator:: Juxtaposition.
* Repetition Operators:: * + ? @{@}
* Alternation Operator:: |
* List Operators:: [...] [^...]
* Grouping Operators:: (...)
* Back-reference Operator:: \digit
* Anchoring Operators:: ^ $

@node Match-self Operator, Match-any-character Operator, , Common Operators
@section The Match-self Operator (@var{ordinary character})

This operator matches the character itself. All ordinary characters
(@pxref{Regular Expression Syntax}) represent this operator. For
example, @samp{f} is always an ordinary character, so the regular
expression @samp{f} matches only the string @samp{f}. In
particular, it does @emph{not} match the string @samp{ff}.

@node Match-any-character Operator, Concatenation Operator, Match-self Operator, Common Operators
@section The Match-any-character Operator (@code{.})

This operator matches any single printing or nonprinting character
except that it won't match a newline if the syntax bit
@code{RE_DOT_NEWLINE} isn't set, and it won't match a null character if
the syntax bit @code{RE_DOT_NOT_NULL} is set.

The @samp{.} (period) character represents this operator. For example,
@samp{a.b} matches any three-character string beginning with @samp{a}
and ending with @samp{b}.
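As a small illustration, this sketch searches with @samp{a.b} through the @sc{posix} interface. It assumes the GNU C Library, where @code{REG_EXTENDED} selects a syntax with @code{RE_DOT_NEWLINE} set and adding the @code{REG_NEWLINE} flag clears that bit; the helper @code{dot_search} is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 if PATTERN (POSIX extended syntax, plus any
   extra CFLAGS) matches somewhere in TEXT, 0 if not, -1 if PATTERN
   doesn't compile.  */
static int dot_search(const char *pattern, int cflags, const char *text)
{
    regex_t re;
    int hit;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB | cflags) != 0)
        return -1;
    hit = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return hit;
}
```

The null-character case can't be checked this way, because @code{regexec} treats its string argument as ending at the first null.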

@node Concatenation Operator, Repetition Operators, Match-any-character Operator, Common Operators
@section The Concatenation Operator

This operator concatenates two regular expressions @var{a} and @var{b}.
No character represents this operator; you simply put @var{b} after
@var{a}. The result is a regular expression that will match a string if
@var{a} matches its first part and @var{b} matches the rest. For
example, @samp{xy} (two match-self operators) matches @samp{xy}.

@node Repetition Operators, Alternation Operator, Concatenation Operator, Common Operators
@section Repetition Operators

Repetition operators repeat the preceding regular expression a specified
number of times.

* Match-zero-or-more Operator:: *
* Match-one-or-more Operator:: +
* Match-zero-or-one Operator:: ?
* Interval Operators:: @{@}

@node Match-zero-or-more Operator, Match-one-or-more Operator, , Repetition Operators
@subsection The Match-zero-or-more Operator (@code{*})

This operator repeats the smallest possible preceding regular expression
as many times as necessary (including zero) to match the pattern.
@samp{*} represents this operator. For example, @samp{o*}
matches any string made up of zero or more @samp{o}s. Since this
operator operates on the smallest preceding regular expression,
@samp{fo*} has a repeating @samp{o}, not a repeating @samp{fo}. So,
@samp{fo*} matches @samp{f}, @samp{fo}, @samp{foo}, and so on.

Since the match-zero-or-more operator is a suffix operator, it may be
useless as such when no regular expression precedes it. This is the
case when it:

is first in a regular expression, or

follows a match-beginning-of-line, open-group, or alternation
operator.

Three different things can happen in these cases:

If the syntax bit @code{RE_CONTEXT_INVALID_OPS} is set, then the
regular expression is invalid.

If @code{RE_CONTEXT_INVALID_OPS} isn't set, but
@code{RE_CONTEXT_INDEP_OPS} is, then @samp{*} represents the
match-zero-or-more operator (which then operates on the empty string).

Otherwise, @samp{*} is ordinary.
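Assuming the GNU C Library, the @sc{posix} basic syntax (what @code{regcomp} uses without @code{REG_EXTENDED}) sets neither @code{RE_CONTEXT_INVALID_OPS} nor @code{RE_CONTEXT_INDEP_OPS}, while the extended syntax sets both. The sketch below uses that to show the first and third outcomes for @samp{*foo}; the helper @code{compile_and_search} is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile PATTERN with CFLAGS and search TEXT.
   Returns 0 on a match, REG_NOMATCH if there is none, or regcomp's
   nonzero error code if PATTERN is invalid.  */
static int compile_and_search(const char *pattern, int cflags, const char *text)
{
    regex_t re;
    int rc = regcomp(&re, pattern, cflags);

    if (rc != 0)
        return rc;     /* e.g. REG_BADRPT for a misplaced repetition */
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc;
}
```

In basic syntax the leading @samp{*} is ordinary, so @samp{*foo} finds the literal text @samp{*foo}; in extended syntax @code{regcomp} rejects the expression with @code{REG_BADRPT}.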

The matcher processes a match-zero-or-more operator by first matching as
many repetitions of the smallest preceding regular expression as it can.
Then it continues to match the rest of the pattern.

If it can't match the rest of the pattern, it backtracks (as many times
as necessary), each time discarding one of the matches until it can
either match the entire pattern or be certain that it cannot get a
match. For example, when matching @samp{ca*ar} against @samp{caaar},
the matcher first matches all three @samp{a}s of the string with the
@samp{a*} of the regular expression. However, it cannot then match the
final @samp{ar} of the regular expression against the final @samp{r} of
the string. So it backtracks, discarding the match of the last @samp{a}
in the string. It can then match the remaining @samp{ar}.
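The outcome of this process shows up in the match offsets that the @sc{posix} function @code{regexec} reports. In this sketch (the helper @code{match_end} is hypothetical, and the @sc{posix} extended syntax is assumed), @samp{ca*ar} still matches all of @samp{caaar}, and @samp{fo*} repeats only the @samp{o}.

```c
#include <regex.h>

/* Hypothetical helper: return the end offset of the first match of
   PATTERN (POSIX extended syntax) in TEXT, -1 if there is no match,
   or -2 if PATTERN does not compile.  */
static int match_end(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;      /* m.rm_so and m.rm_eo delimit the whole match */
    int end = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -2;
    if (regexec(&re, text, 1, &m, 0) == 0)
        end = (int) m.rm_eo;
    regfree(&re);
    return end;
}
```

For instance, against @samp{fofo} the pattern @samp{fo*} stops after the first @samp{fo}, because @samp{*} applies only to @samp{o}.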

@node Match-one-or-more Operator, Match-zero-or-one Operator, Match-zero-or-more Operator, Repetition Operators
@subsection The Match-one-or-more Operator (@code{+} or @code{\+})

If the syntax bit @code{RE_LIMITED_OPS} is set, then Regex doesn't recognize
this operator. Otherwise, if the syntax bit @code{RE_BK_PLUS_QM} isn't
set, then @samp{+} represents this operator; if it is, then @samp{\+}
does.

This operator is similar to the match-zero-or-more operator except that
it repeats the preceding regular expression at least once;
see @ref{Match-zero-or-more Operator}, for what it operates on, how some
syntax bits affect it, and how Regex backtracks to match it.

For example, supposing that @samp{+} represents the match-one-or-more
operator, @samp{ca+r} matches, e.g., @samp{car} and
@samp{caaaar}, but not @samp{cr}.
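Here is a quick check of that example, assuming the @sc{posix} extended syntax, in which @code{RE_BK_PLUS_QM} is clear and so @samp{+} is the operator. The helper @code{matches_whole} is hypothetical; it accepts a match only if it covers the entire string.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: 1 if PATTERN (POSIX extended syntax) matches
   all of TEXT, 0 if not, -1 if PATTERN does not compile.  */
static int matches_whole(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    int whole = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    /* Count it only if the match starts at 0 and ends at the string's end. */
    if (regexec(&re, text, 1, &m, 0) == 0)
        whole = m.rm_so == 0 && (size_t) m.rm_eo == strlen(text);
    regfree(&re);
    return whole;
}
```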

@node Match-zero-or-one Operator, Interval Operators, Match-one-or-more Operator, Repetition Operators
@subsection The Match-zero-or-one Operator (@code{?} or @code{\?})

If the syntax bit @code{RE_LIMITED_OPS} is set, then Regex doesn't
recognize this operator. Otherwise, if the syntax bit
@code{RE_BK_PLUS_QM} isn't set, then @samp{?} represents this operator;
if it is, then @samp{\?} does.

This operator is similar to the match-zero-or-more operator except that
it repeats the preceding regular expression once or not at all;
see @ref{Match-zero-or-more Operator}, for what it operates on, how
some syntax bits affect it, and how Regex backtracks to match it.

For example, supposing that @samp{?} represents the match-zero-or-one
operator, @samp{ca?r} matches both @samp{car} and @samp{cr}, but
nothing else.
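The sketch below checks the same example in both @sc{posix} syntaxes, assuming the GNU C Library. Its extended syntax leaves @code{RE_BK_PLUS_QM} clear, so @samp{?} is the operator. Its basic syntax sets that bit, so @samp{\?} is the operator and @samp{?} is ordinary. The helper @code{whole_match} is hypothetical.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: 1 if PATTERN, compiled with CFLAGS, matches all
   of TEXT; 0 if not; -1 if PATTERN does not compile.  */
static int whole_match(const char *pattern, int cflags, const char *text)
{
    regex_t re;
    regmatch_t m;
    int whole = 0;

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    if (regexec(&re, text, 1, &m, 0) == 0)
        whole = m.rm_so == 0 && (size_t) m.rm_eo == strlen(text);
    regfree(&re);
    return whole;
}
```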

@node Interval Operators, , Match-zero-or-one Operator, Repetition Operators
@subsection Interval Operators (@code{@{} @dots{} @code{@}} or @code{\@{} @dots{} @code{\@}})

@cindex interval expression

If the syntax bit @code{RE_INTERVALS} is set, then Regex recognizes
@dfn{interval expressions}. They repeat the smallest possible preceding
regular expression a specified number of times.

If the syntax bit @code{RE_NO_BK_BRACES} is set, @samp{@{} represents
the @dfn{open-interval operator} and @samp{@}} represents the
@dfn{close-interval operator}; otherwise, @samp{\@{} and @samp{\@}} do.

Specifically, supposing that @samp{@{} and @samp{@}} represent the
open-interval and close-interval operators, then:

@item @{@var{count}@}
matches exactly @var{count} occurrences of the preceding regular
expression.

@item @{@var{min},@}
matches @var{min} or more occurrences of the preceding regular
expression.

@item @{@var{min},@var{max}@}
matches at least @var{min} but no more than @var{max} occurrences of
the preceding regular expression.
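For instance, under the @sc{posix} extended syntax (where @code{RE_INTERVALS} and @code{RE_NO_BK_BRACES} are set), the three forms behave as shown in this sketch; under the basic syntax the same intervals are written with @samp{\@{} and @samp{\@}}. It assumes the GNU C Library, and the helper @code{anchored_match} is hypothetical; the patterns use @samp{^} and @samp{$} so that only whole-string matches count.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 if PATTERN, compiled with CFLAGS, matches
   somewhere in TEXT, 0 if not, -1 if PATTERN doesn't compile.  */
static int anchored_match(const char *pattern, int cflags, const char *text)
{
    regex_t re;
    int hit;

    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    hit = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return hit;
}
```

With these patterns, @samp{^ca@{2@}r$} accepts only @samp{caar}, @samp{^ca@{2,@}r$} accepts two or more @samp{a}s, and @samp{^ca@{1,2@}r$} accepts one or two.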

The interval expression (but not necessarily the regular expression that
contains it) is invalid if:

@var{min} is greater than @var{max}, or

any of @var{count}, @var{min}, or @var{max} are outside the range
zero to @code{RE_DUP_MAX} (which symbol @file{regex.h}
defines).

If the interval expression is invalid and the syntax bit
@code{RE_NO_BK_BRACES} is set, then Regex considers all the
characters in the would-be interval to be ordinary. If that bit
isn't set, then the regular expression is invalid.

If the interval expression is valid but there is no preceding regular
expression on which to operate, then if the syntax bit
@code{RE_CONTEXT_INVALID_OPS} is set, the regular expression is invalid.
If that bit isn't set, then Regex considers all the characters---other
than backslashes, which it ignores---in the would-be interval to be
ordinary.
851 @section The Alternation Operator (@code{|} or @code{\|})
855 @cindex alternation operator
858 If the syntax bit @code{RE_LIMITED_OPS} is set, then Regex doesn't
859 recognize this operator. Otherwise, if the syntax bit
860 @code{RE_NO_BK_VBAR} is set, then @samp{|} represents this operator;
861 otherwise, @samp{\|} does.
863 Alternatives match one of a choice of regular expressions:
864 if you put the character(s) representing the alternation operator between
865 any two regular expressions @var{a} and @var{b}, the result matches
866 the union of the strings that @var{a} and @var{b} match. For
867 example, supposing that @samp{|} is the alternation operator, then
868 @samp{foo|bar|quux} would match any of @samp{foo}, @samp{bar} or

@c Nobody needs to disallow empty alternatives any more.
If the syntax bit @code{RE_NO_EMPTY_ALTS} is set, then if either of the regular
expressions @var{a} or @var{b} is empty, the
regular expression is invalid. More precisely, if this syntax bit is
set, then the alternation operator can't:

be first or last in a regular expression;

follow either another alternation operator or an open-group operator
(@pxref{Grouping Operators}); or

precede a close-group operator.

For example, supposing @samp{(} and @samp{)} represent the open and
close-group operators, then @samp{|foo}, @samp{foo|}, @samp{foo||bar},
@samp{foo(|bar)}, and @samp{(foo|)bar} would all be invalid.

The alternation operator operates on the @emph{largest} possible
surrounding regular expressions. (Put another way, it has the lowest
precedence of any regular expression operator.)
Thus, the only way you can
delimit its arguments is to use grouping. For example, if @samp{(} and
@samp{)} are the open and close-group operators, then @samp{fo(o|b)ar}
would match either @samp{fooar} or @samp{fobar}. (@samp{foo|bar} would
match @samp{foo} or @samp{bar}.)
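To see the difference grouping makes, the sketch below reports where each match starts and ends, assuming the @sc{posix} extended syntax (so @samp{(}, @samp{)} and @samp{|} are the operators). The helper @code{first_match} is hypothetical.

```c
#include <regex.h>

/* Hypothetical helper: search TEXT for PATTERN (POSIX extended syntax).
   On a match, store its start and end offsets in *SO and *EO and
   return 1; return 0 if there is no match, -1 if PATTERN doesn't
   compile.  */
static int first_match(const char *pattern, const char *text, int *so, int *eo)
{
    regex_t re;
    regmatch_t m;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, text, 1, &m, 0);
    regfree(&re);
    if (rc != 0)
        return 0;
    *so = (int) m.rm_so;
    *eo = (int) m.rm_eo;
    return 1;
}
```

Against @samp{fooar}, @samp{fo(o|b)ar} matches all five characters, but @samp{foo|bar} matches only the first three, because its alternatives are the whole of @samp{foo} and the whole of @samp{bar}.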

The matcher usually tries all combinations of alternatives so as to
match the longest possible string. For example, when matching
@samp{(fooq|foo)*(qbarquux|bar)} against @samp{fooqbarquux}, it cannot
take, say, the first (``depth-first'') combination it could match, since
then it would be content to match just @samp{fooqbar}.
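@sc{posix} requires @code{regexec} to report the longest match starting at the leftmost possible position, so this example can be checked with the @sc{posix} interface. In the sketch below (assuming the GNU C Library; @code{match_groups} is a hypothetical helper), the whole string matches, which forces the first group to take @samp{foo} rather than @samp{fooq}.

```c
#include <regex.h>

/* Hypothetical helper: search TEXT for PATTERN (POSIX extended syntax)
   and store the whole match in M[0] and the first two groups in M[1]
   and M[2].  Returns 1 on a match, 0 otherwise (including when PATTERN
   doesn't compile).  */
static int match_groups(const char *pattern, const char *text, regmatch_t m[3])
{
    regex_t re;
    int ok;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;
    ok = regexec(&re, text, 3, m, 0) == 0;
    regfree(&re);
    return ok;
}
```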

@comment xx something about leftmost-longest

@node List Operators, Grouping Operators, Alternation Operator, Common Operators
@section List Operators (@code{[} @dots{} @code{]} and @code{[^} @dots{} @code{]})

@cindex matching list
@cindex nonmatching list
@cindex matching newline
@cindex bracket expression

@dfn{Lists}, also called @dfn{bracket expressions}, are a set of one or
more items. An @dfn{item} is a character,
(These get added when they get implemented.)
a collating symbol, an equivalence class expression,
a character class expression, or a range expression. The syntax bits
affect which kinds of items you can put in a list. We explain the last
two items in subsections below. Empty lists are invalid.
940 A @dfn{matching list} matches a single character represented by one of
941 the list items. You form a matching list by enclosing one or more items
942 within an @dfn{open-matching-list operator} (represented by @samp{[})
943 and a @dfn{close-list operator} (represented by @samp{]}).
945 For example, @samp{[ab]} matches either @samp{a} or @samp{b}.
946 @samp{[ad]*} matches the empty string and any string composed of just
947 @samp{a}s and @samp{d}s in any order. Regex considers a regular
948 expression invalid if it contains a @samp{[} but no matching @samp{]}.
951 @dfn{Nonmatching lists} are similar to matching lists except that they
952 match a single character @emph{not} represented by one of the list
953 items. You use an @dfn{open-nonmatching-list operator} (represented by
954 @samp{[^}@footnote{Regex therefore doesn't consider the @samp{^} to be
955 the first character in the list. If you put a @samp{^} character first
956 in (what you think is) a matching list, you'll turn it into a
957 nonmatching list.}) instead of an open-matching-list operator to start a list.
960 For example, @samp{[^ab]} matches any character except @samp{a} or @samp{b}.
963 If the @code{posix_newline} field in the pattern buffer (@pxref{GNU
964 Pattern Buffers}) is set, then nonmatching lists do not match a newline.
966 Most characters lose any special meaning inside a list. The following
967 characters are special inside a list:
971 ends the list if it's not the first list item. So, if you want to make
972 the @samp{]} character a list item, you must put it first.
975 quotes the next character if the syntax bit @code{RE_BACKSLASH_ESCAPE_IN_LISTS} is set.
979 Put these in if they get implemented.
982 represents the open-collating-symbol operator (@pxref{Collating Symbol
986 represents the close-collating-symbol operator.
989 represents the open-equivalence-class operator (@pxref{Equivalence Class
993 represents the close-equivalence-class operator.
998 represents the open-character-class operator (@pxref{Character Class
999 Operators}) if the syntax bit @code{RE_CHAR_CLASSES} is set and what
1000 follows is a valid character class expression.
1003 represents the close-character-class operator if the syntax bit
1004 @code{RE_CHAR_CLASSES} is set and what precedes it is an
1005 open-character-class operator followed by a valid character class name.
1008 represents the range operator (@pxref{Range Operator}) if it's
1009 not first or last in a list or the ending point of a range.
1014 All other characters are ordinary. For example, @samp{[.*]} matches
1015 @samp{.} and @samp{*}.
1018 * Character Class Operators:: [:class:]
1019 * Range Operator:: start-end
1023 (If collating symbols and equivalence class expressions get implemented,
1026 node Collating Symbol Operators
1027 subsubsection Collating Symbol Operators (@code{[.} @dots{} @code{.]})
1029 If the syntax bit @code{XX} is set, then you can represent
1030 collating symbols inside lists. You form a @dfn{collating symbol} by
1031 putting a collating element between an @dfn{open-collating-symbol
1032 operator} and a @dfn{close-collating-symbol operator}. @samp{[.}
1033 represents the open-collating-symbol operator and @samp{.]} represents
1034 the close-collating-symbol operator. For example, if @samp{ll} is a
1035 collating element, then @samp{[[.ll.]]} would match @samp{ll}.
1037 node Equivalence Class Operators
1038 subsubsection Equivalence Class Operators (@code{[=} @dots{} @code{=]})
1039 @cindex equivalence class expression in regex
1040 @cindex @samp{[=} in regex
1041 @cindex @samp{=]} in regex
1043 If the syntax bit @code{XX} is set, then Regex recognizes equivalence class
1044 expressions inside lists. An @dfn{equivalence class expression} is a set
1045 of collating elements which all belong to the same equivalence class.
1046 You form an equivalence class expression by putting a collating
1047 element between an @dfn{open-equivalence-class operator} and a
1048 @dfn{close-equivalence-class operator}. @samp{[=} represents the
1049 open-equivalence-class operator and @samp{=]} represents the
1050 close-equivalence-class operator. For example, if @samp{a} and @samp{A}
1051 were an equivalence class, then both @samp{[[=a=]]} and @samp{[[=A=]]}
1052 would match both @samp{a} and @samp{A}. If the collating element in an
1053 equivalence class expression isn't part of an equivalence class, then
1054 the matcher considers the equivalence class expression to be a collating
1059 @node Character Class Operators, Range Operator, , List Operators
1060 @subsection Character Class Operators (@code{[:} @dots{} @code{:]})
1062 @cindex character classes
1063 @cindex @samp{[:} in regex
1064 @cindex @samp{:]} in regex
1066 If the syntax bit @code{RE_CHAR_CLASSES} is set, then Regex
1067 recognizes character class expressions inside lists. A @dfn{character
1068 class expression} matches one character from a given class. You form a
1069 character class expression by putting a character class name between an
1070 @dfn{open-character-class operator} (represented by @samp{[:}) and a
1071 @dfn{close-character-class operator} (represented by @samp{:]}). The
1072 character class names and their meanings are:
1083 system-dependent; for @sc{gnu}, a space or tab
1086 control characters (in the @sc{ascii} encoding, code 0177 and codes less than 040)
1093 same as @code{print} except omits space
1099 printable characters (in the @sc{ascii} encoding, space through
1100 tilde, that is, codes 040 through 0176)
1103 printing characters that are neither space nor alphanumeric
1106 space, carriage return, newline, vertical tab, and form feed
1112 hexadecimal digits: @code{0}--@code{9}, @code{a}--@code{f}, @code{A}--@code{F}
1117 These correspond to the definitions in the C library's @file{<ctype.h>}
1118 facility. For example, @samp{[:alpha:]} corresponds to the standard
1119 facility @code{isalpha}. Regex recognizes character class expressions
1120 only inside lists; so @samp{[[:alpha:]]} matches any letter, but
1121 @samp{[:alpha:]} outside of a bracket expression is an ordinary list,
1122 which matches any one of the characters @samp{:}, @samp{a}, @samp{l}, @samp{p}, and @samp{h}.
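A sketch, again assuming the @sc{posix} @code{regcomp} and @code{regexec} functions (which, in the @sc{gnu} implementation, compile with a syntax that sets @code{RE_CHAR_CLASSES}); the helper @code{all_in_class} is hypothetical. It wraps a class name in a list, as a character class expression requires, and reports whether a whole string consists of characters from that class.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: return 1 if every character of the nonempty
   string S belongs to the character class named CLASS_NAME (e.g.
   "alpha"), 0 if not, and -1 if the pattern does not compile, as
   happens with an unknown class name.  */
int
all_in_class (const char *class_name, const char *s)
{
  char pattern[64];
  regex_t re;
  int result;

  /* The class expression must sit inside a list: ^[[:NAME:]]+$  */
  snprintf (pattern, sizeof pattern, "^[[:%s:]]+$", class_name);
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  result = regexec (&re, s, 0, NULL, 0) == 0;
  regfree (&re);
  return result;
}
```

@code{all_in_class ("alpha", "Hello")} should return 1 and @code{all_in_class ("alpha", "Hello1")} should return 0; an unknown name such as @samp{nosuchclass} makes @code{regcomp} fail, so the helper returns @math{-1}.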
1124 @node Range Operator, , Character Class Operators, List Operators
1125 @subsection The Range Operator (@code{-})
1127 Regex recognizes @dfn{range expressions} inside a list. They represent those characters
1129 that fall between two elements in the current collating sequence. You
1130 form a range expression by putting a @dfn{range operator} between two
1132 (If these get implemented, then substitute this for ``characters.'')
1133 of any of the following: characters, collating elements, collating symbols,
1134 and equivalence class expressions. The starting point of the range and
1135 the ending point of the range don't have to be the same kind of item,
1136 e.g., the starting point could be a collating element and the ending
1137 point could be an equivalence class expression. If a range's ending
1138 point is an equivalence class, then all the collating elements in that
1139 class will be in the range.
1141 characters.@footnote{You can't use a character class for the starting
1142 or ending point of a range, since a character class is not a single
1143 character.} @samp{-} represents the range operator. For example,
1144 @samp{a-f} within a list represents all the characters from @samp{a} through @samp{f}, inclusive.
1148 If the syntax bit @code{RE_NO_EMPTY_RANGES} is set, then if the range's
1149 ending point collates less than its starting point, the range (and the
1150 regular expression containing it) is invalid. For example, the regular
1151 expression @samp{[z-a]} would be invalid. If this bit isn't set, then
1152 Regex considers such a range to be empty.
1154 Since @samp{-} represents the range operator, if you want to make a
1155 @samp{-} character itself
1156 a list item, you must do one of the following:
1160 Put the @samp{-} either first or last in the list.
1163 Include a range whose starting point collates strictly lower than
1164 @samp{-} and whose ending point collates equal or higher. Unless a
1165 range is the first item in a list, a @samp{-} can't be its starting
1166 point, but @emph{can} be its ending point. That is because Regex
1167 considers @samp{-} to be the range operator unless it is preceded by
1168 another @samp{-}. For example, in the @sc{ascii} encoding, @samp{)},
1169 @samp{*}, @samp{+}, @samp{,}, @samp{-}, @samp{.}, and @samp{/} are
1170 contiguous characters in the collating sequence. You might think that
1171 @samp{[)-+--/]} has two ranges: @samp{)-+} and @samp{--/}. Rather, it
1172 has the ranges @samp{)-+} and @samp{+--}, plus the character @samp{/}, so
1173 it matches, e.g., @samp{,}, not @samp{.}.
1176 Put a range whose starting point is @samp{-} first in the list.
1180 For example, @samp{[-a-z]} matches a lowercase letter or a hyphen (in
1181 English, in @sc{ascii}).
1184 @node Grouping Operators, Back-reference Operator, List Operators, Common Operators
1185 @section Grouping Operators (@code{(} @dots{} @code{)} or @code{\(} @dots{} @code{\)})
1192 @cindex subexpressions
1193 @cindex parenthesizing
1195 A @dfn{group}, also known as a @dfn{subexpression}, consists of an
1196 @dfn{open-group operator}, any number of other operators, and a
1197 @dfn{close-group operator}. Regex treats this sequence as a unit, just
1198 as mathematics and programming languages treat a parenthesized
1199 expression as a unit.
1201 Therefore, using @dfn{groups}, you can:
1205 delimit the argument(s) to an alternation operator (@pxref{Alternation
1206 Operator}) or a repetition operator (@pxref{Repetition Operators}).
1210 keep track of the indices of the substring that matched a given group.
1211 @xref{Using Registers}, for a precise explanation.
1216 use the back-reference operator (@pxref{Back-reference Operator}).
1219 use registers (@pxref{Using Registers}).
1225 If the syntax bit @code{RE_NO_BK_PARENS} is set, then @samp{(} represents
1226 the open-group operator and @samp{)} represents the
1227 close-group operator; otherwise, @samp{\(} and @samp{\)} do.
1229 If the syntax bit @code{RE_UNMATCHED_RIGHT_PAREN_ORD} is set and a
1230 close-group operator has no matching open-group operator, then Regex
1231 considers it to match @samp{)}.
1234 @node Back-reference Operator, Anchoring Operators, Grouping Operators, Common Operators
1235 @section The Back-reference Operator (@code{\}@var{digit})
1237 @cindex back references
1239 If the syntax bit @code{RE_NO_BK_REF} isn't set, then Regex recognizes
1240 back references. A back reference matches a specified preceding group.
1241 The back reference operator is represented by @samp{\@var{digit}}
1242 anywhere after the end of a regular expression's @w{@var{digit}-th}
1243 group (@pxref{Grouping Operators}).
1245 @var{digit} must be between @samp{1} and @samp{9}. The matcher assigns
1246 numbers 1 through 9 to the first nine groups it encounters. By using
1247 one of @samp{\1} through @samp{\9} after the corresponding group's
1248 close-group operator, you can match a substring identical to the
1249 one that the group does.
1251 Back references match according to the following (in all examples below,
1252 @samp{(} represents the open-group, @samp{)} the close-group, @samp{@{}
1253 the open-interval and @samp{@}} the close-interval operator):
1257 If the group matches a substring, the back reference matches an
1258 identical substring. For example, @samp{(a)\1} matches @samp{aa} and
1259 @samp{(bana)na\1bo\1} matches @samp{bananabanabobana}. Likewise,
1260 @samp{(.*)\1} matches any (newline-free if the syntax bit
1261 @code{RE_DOT_NEWLINE} isn't set) string that is composed of two
1262 identical halves; the @samp{(.*)} matches the first half and the
1263 @samp{\1} matches the second half.
1266 If the group matches more than once (as it might if followed
1267 by, e.g., a repetition operator), then the back reference matches the
1268 substring the group @emph{last} matched. For example,
1269 @samp{((a*)b)*\1\2} matches @samp{aabababa}; first @w{group 1} (the
1270 outer one) matches @samp{aab} and @w{group 2} (the inner one) matches
1271 @samp{aa}. Then @w{group 1} matches @samp{ab} and @w{group 2} matches
1272 @samp{a}. So, @samp{\1} matches @samp{ab} and @samp{\2} matches @samp{a}.
1276 If the group doesn't participate in a match, i.e., it is part of an
1277 alternative not taken or a repetition operator allows zero repetitions
1278 of it, then the back reference makes the whole match fail. For example,
1279 @samp{(one()|two())-and-(three\2|four\3)} matches @samp{one-and-three}
1280 and @samp{two-and-four}, but not @samp{one-and-four} or
1281 @samp{two-and-three}. That is, if the pattern matches
1282 @samp{one-and-}, then its @w{group 2} matches the empty string and its
1283 @w{group 3} doesn't participate in the match. So, if it then matches
1284 @samp{four}, then when it tries to back reference @w{group 3}---which it
1285 will attempt to do because @samp{\3} follows the @samp{four}---the match
1286 will fail because @w{group 3} didn't participate in the match.
1290 You can use a back reference as an argument to a repetition operator. For
1291 example, @samp{(a(b))\2*} matches @samp{a} followed by one or more
1292 @samp{b}s. Similarly, @samp{(a(b))\2@{3@}} matches @samp{abbbb}.
1294 If there is no preceding @w{@var{digit}-th} subexpression, the regular
1295 expression is invalid.
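The examples above can be checked with a short sketch. It assumes the @sc{posix} basic syntax (@code{regcomp} without @code{REG_EXTENDED}), in which groups are written @samp{\(} @dots{} @samp{\)}, intervals @samp{\@{} @dots{} @samp{\@}}, and back references are part of the @sc{posix} standard; the helper @code{bre_matches} is hypothetical. Note the doubled backslashes that C string literals require.

```c
#include <assert.h>
#include <regex.h>

/* Hypothetical helper: return 1 if PATTERN (POSIX basic syntax)
   matches somewhere in STRING, 0 if it does not, and -1 if PATTERN
   is invalid.  */
int
bre_matches (const char *pattern, const char *string)
{
  regex_t re;
  regmatch_t whole;   /* request the overall match span */
  int result;

  if (regcomp (&re, pattern, 0) != 0)
    return -1;
  result = regexec (&re, string, 1, &whole, 0) == 0;
  regfree (&re);
  return result;
}
```

For instance, @samp{^\(a\)\2$} is rejected, since there is no second group for @samp{\2} to refer to.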
1298 @node Anchoring Operators, , Back-reference Operator, Common Operators
1299 @section Anchoring Operators
1302 @cindex regexp anchoring
1304 These operators can constrain a pattern to match only at the beginning or
1305 end of the entire string or at the beginning or end of a line.
1308 * Match-beginning-of-line Operator:: ^
1309 * Match-end-of-line Operator:: $
1313 @node Match-beginning-of-line Operator, Match-end-of-line Operator, , Anchoring Operators
1314 @subsection The Match-beginning-of-line Operator (@code{^})
1317 @cindex beginning-of-line operator
1320 This operator can match the empty string either at the beginning of the
1321 string or after a newline character. Thus, it is said to @dfn{anchor}
1322 the pattern to the beginning of a line.
1324 In the following cases, @samp{^} represents this operator. (Otherwise,
1325 @samp{^} is ordinary.)
1330 It (the @samp{^}) is first in the pattern, as in @samp{^foo}.
1332 @cnindex RE_CONTEXT_INDEP_ANCHORS @r{(and @samp{^})}
1334 The syntax bit @code{RE_CONTEXT_INDEP_ANCHORS} is set, and it is outside
1335 a bracket expression.
1337 @cindex open-group operator and @samp{^}
1338 @cindex alternation operator and @samp{^}
1340 It follows an open-group or alternation operator, as in @samp{a\(^b\)}
1341 and @samp{a\|^b}. @xref{Grouping Operators}, and @ref{Alternation Operator}.
1346 These rules imply that some valid patterns containing @samp{^} cannot be
1347 matched; for example, @samp{foo^bar} if @code{RE_CONTEXT_INDEP_ANCHORS} is set.
1350 @vindex not_bol @r{field in pattern buffer}
1351 If the @code{not_bol} field is set in the pattern buffer (@pxref{GNU
1352 Pattern Buffers}), then @samp{^} fails to match at the beginning of the
1353 string. @xref{POSIX Matching}, for when you might find this useful.
1355 @vindex newline_anchor @r{field in pattern buffer}
1356 If the @code{newline_anchor} field in the pattern buffer is zero, then
1357 @samp{^} fails to match after a newline. This is useful when you do not
1358 regard the string to be matched as broken into lines.
1361 @node Match-end-of-line Operator, , Match-beginning-of-line Operator, Anchoring Operators
1362 @subsection The Match-end-of-line Operator (@code{$})
1365 @cindex end-of-line operator
1368 This operator can match the empty string either at the end of
1369 the string or before a newline character in the string. Thus, it is
1370 said to @dfn{anchor} the pattern to the end of a line.
1372 It is always represented by @samp{$}. For example, @samp{foo$} usually
1373 matches @samp{foo} and also the first three characters of a string consisting of @samp{foo}, a newline, and @samp{bar}.
1376 Its interaction with the syntax bits and pattern buffer fields is
1377 exactly the dual of @samp{^}'s; see the previous section. (That is,
1378 ``beginning'' becomes ``end'', ``next'' becomes ``previous'', and
1379 ``after'' becomes ``before''.)
1382 @node GNU Operators, GNU Emacs Operators, Common Operators, Top
1383 @chapter GNU Operators
1385 Following are operators that @sc{gnu} defines (and @sc{posix} doesn't).
1389 * Buffer Operators::
1392 @node Word Operators, Buffer Operators, , GNU Operators
1393 @section Word Operators
1395 The operators in this section require Regex to recognize parts of words.
1396 Regex uses a syntax table to determine whether or not a character is
1397 part of a word, i.e., whether or not it is @dfn{word-constituent}.
1400 * Non-Emacs Syntax Tables::
1401 * Match-word-boundary Operator:: \b
1402 * Match-within-word Operator:: \B
1403 * Match-beginning-of-word Operator:: \<
1404 * Match-end-of-word Operator:: \>
1405 * Match-word-constituent Operator:: \w
1406 * Match-non-word-constituent Operator:: \W
1409 @node Non-Emacs Syntax Tables, Match-word-boundary Operator, , Word Operators
1410 @subsection Non-Emacs Syntax Tables
1412 A @dfn{syntax table} is an array indexed by the characters in your
1413 character set. With 8-bit characters, therefore, a syntax table
1414 has 256 elements. Regex always uses a @code{char *} variable
1415 @code{re_syntax_table} as its syntax table. In some cases, it
1416 initializes this variable and in others it expects you to initialize it.
1420 If Regex is compiled with the preprocessor symbols @code{emacs} and
1421 @code{SYNTAX_TABLE} both undefined, then Regex allocates
1422 @code{re_syntax_table} and initializes an element @var{i} either to
1423 @code{Sword} (which it defines) if @var{i} is a letter, number, or
1424 @samp{_}, or to zero if it's not.
1427 If Regex is compiled with @code{emacs} undefined but @code{SYNTAX_TABLE}
1428 defined, then Regex expects you to define a @code{char *} variable
1429 @code{re_syntax_table} to be a valid syntax table.
1432 @xref{Emacs Syntax Tables}, for what happens when Regex is compiled with
1433 the preprocessor symbol @code{emacs} defined.
1437 @node Match-word-boundary Operator, Match-within-word Operator, Non-Emacs Syntax Tables, Word Operators
1438 @subsection The Match-word-boundary Operator (@code{\b})
1441 @cindex word boundaries, matching
1443 This operator (represented by @samp{\b}) matches the empty string at
1444 either the beginning or the end of a word. For example, @samp{\brat\b}
1445 matches the separate word @samp{rat}.
1447 @node Match-within-word Operator, Match-beginning-of-word Operator, Match-word-boundary Operator, Word Operators
1448 @subsection The Match-within-word Operator (@code{\B})
1452 This operator (represented by @samp{\B}) matches the empty string within
1453 a word. For example, @samp{c\Brat\Be} matches @samp{crate}, but
1454 @samp{dirty \Brat} doesn't match @samp{dirty rat}.
1456 @node Match-beginning-of-word Operator, Match-end-of-word Operator, Match-within-word Operator, Word Operators
1457 @subsection The Match-beginning-of-word Operator (@code{\<})
1461 This operator (represented by @samp{\<}) matches the empty string at the
1462 beginning of a word.
1464 @node Match-end-of-word Operator, Match-word-constituent Operator, Match-beginning-of-word Operator, Word Operators
1465 @subsection The Match-end-of-word Operator (@code{\>})
1469 This operator (represented by @samp{\>}) matches the empty string at the end of a word.
1472 @node Match-word-constituent Operator, Match-non-word-constituent Operator, Match-end-of-word Operator, Word Operators
1473 @subsection The Match-word-constituent Operator (@code{\w})
1477 This operator (represented by @samp{\w}) matches any word-constituent character.
1480 @node Match-non-word-constituent Operator, , Match-word-constituent Operator, Word Operators
1481 @subsection The Match-non-word-constituent Operator (@code{\W})
1485 This operator (represented by @samp{\W}) matches any character that is
1486 not word-constituent.
1489 @node Buffer Operators, , Word Operators, GNU Operators
1490 @section Buffer Operators
1492 Following are operators which work on buffers. In Emacs, a @dfn{buffer}
1493 is, naturally, an Emacs buffer. For other programs, Regex considers the
1494 entire string to be matched as the buffer.
1497 * Match-beginning-of-buffer Operator:: \`
1498 * Match-end-of-buffer Operator:: \'
1502 @node Match-beginning-of-buffer Operator, Match-end-of-buffer Operator, , Buffer Operators
1503 @subsection The Match-beginning-of-buffer Operator (@code{\`})
1507 This operator (represented by @samp{\`}) matches the empty string at the
1508 beginning of the buffer.
1510 @node Match-end-of-buffer Operator, , Match-beginning-of-buffer Operator, Buffer Operators
1511 @subsection The Match-end-of-buffer Operator (@code{\'})
1515 This operator (represented by @samp{\'}) matches the empty string at the end of the buffer.
1519 @node GNU Emacs Operators, What Gets Matched?, GNU Operators, Top
1520 @chapter GNU Emacs Operators
1522 Following are operators that @sc{gnu} defines (and @sc{posix} doesn't)
1523 that you can use only when Regex is compiled with the preprocessor
1524 symbol @code{emacs} defined.
1527 * Syntactic Class Operators::
1531 @node Syntactic Class Operators, , , GNU Emacs Operators
1532 @section Syntactic Class Operators
1534 The operators in this section require Regex to recognize the syntactic
1535 classes of characters. Regex uses a syntax table to determine this.
1538 * Emacs Syntax Tables::
1539 * Match-syntactic-class Operator:: \sCLASS
1540 * Match-not-syntactic-class Operator:: \SCLASS
1543 @node Emacs Syntax Tables, Match-syntactic-class Operator, , Syntactic Class Operators
1544 @subsection Emacs Syntax Tables
1546 A @dfn{syntax table} is an array indexed by the characters in your
1547 character set. With 8-bit characters, therefore, a syntax table has 256 elements.
1550 If Regex is compiled with the preprocessor symbol @code{emacs} defined,
1551 then Regex expects you to define and initialize the variable
1552 @code{re_syntax_table} to be an Emacs syntax table. Emacs' syntax
1553 tables are more complicated than Regex's own (@pxref{Non-Emacs Syntax
1554 Tables}). @xref{Syntax, , Syntax, emacs, The GNU Emacs User's Manual},
1555 for a description of Emacs' syntax tables.
1557 @node Match-syntactic-class Operator, Match-not-syntactic-class Operator, Emacs Syntax Tables, Syntactic Class Operators
1558 @subsection The Match-syntactic-class Operator (@code{\s}@var{class})
1562 This operator matches any character whose syntactic class is represented
1563 by a specified character. @samp{\s@var{class}} represents this operator
1564 where @var{class} is the character representing the syntactic class you
1565 want. For example, @samp{w} represents the syntactic
1566 class of word-constituent characters, so @samp{\sw} matches any
1567 word-constituent character.
1569 @node Match-not-syntactic-class Operator, , Match-syntactic-class Operator, Syntactic Class Operators
1570 @subsection The Match-not-syntactic-class Operator (@code{\S}@var{class})
1574 This operator is similar to the match-syntactic-class operator except
1575 that it matches any character whose syntactic class is @emph{not}
1576 represented by the specified character. @samp{\S@var{class}} represents
1577 this operator. For example, @samp{w} represents the syntactic class of
1578 word-constituent characters, so @samp{\Sw} matches any character that is
1579 not word-constituent.
1582 @node What Gets Matched?, Programming with Regex, GNU Emacs Operators, Top
1583 @chapter What Gets Matched?
1585 Regex usually matches strings according to the ``leftmost longest''
1586 rule; that is, it chooses the longest of the leftmost matches. This
1587 does not mean that, for a regular expression containing subexpressions,
1588 it simply chooses the longest match for each subexpression, left to
1589 right; the overall match must also be the longest possible one.
1591 For example, @samp{(ac*)(c*d[ac]*)\1} matches @samp{acdacaaa}, not
1592 @samp{acdac}, as it would if it were to choose the longest match for the
1593 first subexpression.
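A sketch that reports the length of the overall match, assuming the @sc{posix} @code{regcomp} and @code{regexec} functions of the @sc{gnu} C library (the back-reference pattern is written in basic syntax); the helper @code{match_len} is hypothetical. The extended-syntax pattern @samp{(a|ab)(c|bcd)} is a second illustration of the same rule: leftmost-longest matching takes all of @samp{abcd} rather than stopping at @samp{abc}.

```c
#include <assert.h>
#include <regex.h>

/* Hypothetical helper: compile PATTERN with CFLAGS and return the
   length of the overall (leftmost-longest) match in STRING, or -1 if
   PATTERN is invalid or does not match.  */
int
match_len (const char *pattern, int cflags, const char *string)
{
  regex_t re;
  regmatch_t whole;   /* slot 0: span of the whole match */
  int length = -1;

  if (regcomp (&re, pattern, cflags) != 0)
    return -1;
  if (regexec (&re, string, 1, &whole, 0) == 0)
    length = (int) (whole.rm_eo - whole.rm_so);
  regfree (&re);
  return length;
}
```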
1596 @node Programming with Regex, Copying, What Gets Matched?, Top
1597 @chapter Programming with Regex
1599 Here we describe how you use the Regex data structures and functions in
1600 C programs. Regex has three interfaces: one designed for @sc{gnu}, one
1601 compatible with @sc{posix} and one compatible with Berkeley @sc{unix}.
1604 * GNU Regex Functions::
1605 * POSIX Regex Functions::
1606 * BSD Regex Functions::
1610 @node GNU Regex Functions, POSIX Regex Functions, , Programming with Regex
1611 @section GNU Regex Functions
1613 If you're writing code that doesn't need to be compatible with either
1614 @sc{posix} or Berkeley @sc{unix}, you can use these functions. They
1615 provide more options than the other interfaces.
1618 * GNU Pattern Buffers:: The re_pattern_buffer type.
1619 * GNU Regular Expression Compiling:: re_compile_pattern ()
1620 * GNU Matching:: re_match ()
1621 * GNU Searching:: re_search ()
1622 * Matching/Searching with Split Data:: re_match_2 (), re_search_2 ()
1623 * Searching with Fastmaps:: re_compile_fastmap ()
1624 * GNU Translate Tables:: The `translate' field.
1625 * Using Registers:: The re_registers type and related fns.
1626 * Freeing GNU Pattern Buffers:: regfree ()
1630 @node GNU Pattern Buffers, GNU Regular Expression Compiling, , GNU Regex Functions
1631 @subsection GNU Pattern Buffers
1633 @cindex pattern buffer, definition of
1634 @tindex re_pattern_buffer @r{definition}
1635 @tindex struct re_pattern_buffer @r{definition}
1637 To compile, match, or search for a given regular expression, you must
1638 supply a pattern buffer. A @dfn{pattern buffer} holds one compiled
1639 regular expression.@footnote{Regular expressions are also referred to as
1640 ``patterns,'' hence the name ``pattern buffer.''}
1642 You can have several different pattern buffers simultaneously, each
1643 holding a compiled pattern for a different regular expression.
1645 @file{regex.h} defines the pattern buffer @code{struct} as follows:
1648 [[[ pattern_buffer ]]]
1652 @node GNU Regular Expression Compiling, GNU Matching, GNU Pattern Buffers, GNU Regex Functions
1653 @subsection GNU Regular Expression Compiling
1655 In @sc{gnu}, you can both match and search for a given regular
1656 expression. To do either, you must first compile it in a pattern buffer
1657 (@pxref{GNU Pattern Buffers}).
1659 @cindex syntax initialization
1660 @vindex re_syntax_options @r{initialization}
1661 Regular expressions match according to the syntax with which they were
1662 compiled; with @sc{gnu}, you indicate what syntax you want by setting
1663 the variable @code{re_syntax_options} (declared in @file{regex.h} and
1664 defined in @file{regex.c}) before calling the compiling function,
1665 @code{re_compile_pattern} (see below). @xref{Syntax Bits}, and
1666 @ref{Predefined Syntaxes}.
1668 You can change the value of @code{re_syntax_options} at any time.
1669 Usually, however, you set its value once and then never change it.
1671 @cindex pattern buffer initialization
1672 @code{re_compile_pattern} takes a pattern buffer as an argument. You
1673 must initialize the following fields:
1677 @item translate @r{initialization}
1680 @vindex translate @r{initialization}
1681 Initialize this to point to a translate table if you want one, or to
1682 zero if you don't. We explain translate tables in @ref{GNU Translate Tables}.
1686 @vindex fastmap @r{initialization}
1687 Initialize this to point to a 256-byte array that Regex can use as a fastmap if you want one, or to zero if you don't.
1692 @vindex buffer @r{initialization}
1693 @vindex allocated @r{initialization}
1695 If you want @code{re_compile_pattern} to allocate memory for the
1696 compiled pattern, set both of these to zero. If you have an existing
1697 block of memory (allocated with @code{malloc}) you want Regex to use,
1698 set @code{buffer} to its address and @code{allocated} to its size (in bytes).
1701 @code{re_compile_pattern} uses @code{realloc} to extend the space for
1702 the compiled pattern as necessary.
1706 To compile a pattern buffer, use:
1708 @findex re_compile_pattern
1711 re_compile_pattern (const char *@var{regex}, const int @var{regex_size},
1712 struct re_pattern_buffer *@var{pattern_buffer})
1716 @var{regex} is the regular expression's address, @var{regex_size} is its
1717 length, and @var{pattern_buffer} is the pattern buffer's address.
1719 If @code{re_compile_pattern} successfully compiles the regular
1720 expression, it returns zero and sets @code{*@var{pattern_buffer}} to the
1721 compiled pattern. It sets the pattern buffer's fields as follows:
1725 @vindex buffer @r{field, set by @code{re_compile_pattern}}
1726 to the compiled pattern.
1729 @vindex used @r{field, set by @code{re_compile_pattern}}
1730 to the number of bytes the compiled pattern in @code{buffer} occupies.
1733 @vindex syntax @r{field, set by @code{re_compile_pattern}}
1734 to the current value of @code{re_syntax_options}.
1737 @vindex re_nsub @r{field, set by @code{re_compile_pattern}}
1738 to the number of subexpressions in @var{regex}.
1740 @item fastmap_accurate
1741 @vindex fastmap_accurate @r{field, set by @code{re_compile_pattern}}
1742 to zero on the theory that the pattern you're compiling is different
1743 from the one previously compiled into @code{buffer}; in that case (since
1744 you can't make a fastmap without a compiled pattern),
1745 @code{fastmap} would either contain an incompatible fastmap, or nothing at all.
1751 If @code{re_compile_pattern} can't compile @var{regex}, it returns an
1752 error string corresponding to one of the errors listed in @ref{POSIX
1753 Regular Expression Compiling}.
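Putting these steps together, here is a minimal sketch that initializes a pattern buffer and compiles a pattern. It assumes the @sc{gnu} C library, whose @code{<regex.h>} declares @code{re_compile_pattern}, @code{re_syntax_options} and the predefined syntax @code{RE_SYNTAX_POSIX_BASIC} when @code{_GNU_SOURCE} is defined; the helper @code{compile_gnu} is hypothetical. In that library, @code{regfree} also releases the fastmap.

```c
#define _GNU_SOURCE   /* exposes the GNU interface in glibc's <regex.h> */
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: initialize *BUF and compile PATTERN into it
   using POSIX basic syntax.  Returns NULL on success or Regex's error
   string on failure.  Either way, the caller should release *BUF with
   regfree.  */
const char *
compile_gnu (struct re_pattern_buffer *buf, const char *pattern)
{
  memset (buf, 0, sizeof *buf);  /* buffer = allocated = 0: let Regex
                                    allocate; translate = 0: no table */
  buf->fastmap = malloc (256);   /* request a fastmap (NULL = none) */
  re_syntax_options = RE_SYNTAX_POSIX_BASIC;
  return re_compile_pattern (pattern, strlen (pattern), buf);
}
```

After @code{compile_gnu (&pb, "\\(ab\\)\\(c\\)")} returns zero, @code{pb.re_nsub} should be 2; an unbalanced pattern such as @code{"a\\("} yields an error string instead.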
1756 @node GNU Matching, GNU Searching, GNU Regular Expression Compiling, GNU Regex Functions
1757 @subsection GNU Matching
1759 @cindex matching with GNU functions
1761 Matching the @sc{gnu} way means trying to match as much of a string as
1762 possible, starting at a position you specify. Once you've compiled
1763 a pattern into a pattern buffer (@pxref{GNU Regular Expression
1764 Compiling}), you can ask the matcher to match that pattern against a string using:
1770 re_match (struct re_pattern_buffer *@var{pattern_buffer},
1771 const char *@var{string}, const int @var{size},
1772 const int @var{start}, struct re_registers *@var{regs})
1776 @var{pattern_buffer} is the address of a pattern buffer containing a
1777 compiled pattern. @var{string} is the string you want to match; it can
1778 contain newline and null characters. @var{size} is the length of that
1779 string. @var{start} is the string index at which you want to
1780 begin matching; the first character of @var{string} is at index zero.
1781 @xref{Using Registers}, for an explanation of @var{regs}; you can safely pass zero if you don't need register information.
1784 @code{re_match} matches the regular expression in @var{pattern_buffer}
1785 against the string @var{string} according to the syntax in
1786 @var{pattern_buffer}'s @code{syntax} field. (@xref{GNU Regular
1787 Expression Compiling}, for how to set it.) The function returns
1788 @math{-1} if the compiled pattern does not match any part of
1789 @var{string} and @math{-2} if an internal error happens; otherwise, it
1790 returns how many (possibly zero) characters of @var{string} the pattern matched.
1793 An example: suppose @var{pattern_buffer} points to a pattern buffer
1794 containing the compiled pattern for @samp{a*}, and @var{string} points
1795 to @samp{aaaaab} (in which case @var{size} should be 6). Then if @var{start}
1796 is 2, @code{re_match} returns 3, i.e., @samp{a*} would have matched the
1797 last three @samp{a}s in @var{string}. If @var{start} is 0,
1798 @code{re_match} returns 5, i.e., @samp{a*} would have matched all the
1799 @samp{a}s in @var{string}. If @var{start} is either 5 or 6, it returns
1802 If @var{start} is not between zero and @var{size}, then
1803 @code{re_match} returns @math{-1}.
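@code{re_match} itself is a @sc{gnu} extension. As a rough, portable illustration of what it returns, the sketch below reproduces the numbers of the @samp{a*} example above using only the @sc{posix} functions @code{regcomp} and @code{regexec} (@pxref{POSIX Regex Functions}). The helper @code{match_length_at} is hypothetical. It assumes a null-terminated string, @sc{posix} extended syntax, and @sc{posix} leftmost-longest matching, under which a match beginning at the first character is reported whenever one exists.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper approximating re_match's return value with the
   portable POSIX calls: the length of the longest match of PATTERN
   (POSIX extended syntax) that begins exactly at STRING[START], or -1
   if there is none.  Unlike re_match, it needs a null-terminated STRING,
   and it reports a pattern that fails to compile as -2.  */
int match_length_at(const char *pattern, const char *string, int start)
{
    regex_t re;
    regmatch_t m[1];
    int len = -1;
    int size = (int) strlen(string);

    if (start < 0 || start > size)
        return -1;                      /* same rule as re_match */
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -2;
    /* Leftmost-longest: if any match can begin at offset 0 of the
       substring, regexec reports that one, so rm_so == 0 means an
       anchored match exists.  REG_NOTBOL: START > 0 is not the real
       beginning of the string.  */
    if (regexec(&re, string + start, 1, m, start > 0 ? REG_NOTBOL : 0) == 0
        && m[0].rm_so == 0)
        len = (int) m[0].rm_eo;
    regfree(&re);
    return len;
}
```

For instance, @code{match_length_at ("a*", "aaaaab", 2)} should return 3, just as @code{re_match} does in the example above.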
1806 @node GNU Searching, Matching/Searching with Split Data, GNU Matching, GNU Regex Functions
1807 @subsection GNU Searching
1809 @cindex searching with GNU functions
1811 @dfn{Searching} means trying to match starting at successive positions
1812 within a string. The function @code{re_search} does this.
1814 Before calling @code{re_search}, you must compile your regular
1815 expression. @xref{GNU Regular Expression Compiling}.
1817 Here is the function declaration:
1822 re_search (struct re_pattern_buffer *@var{pattern_buffer},
1823 const char *@var{string}, const int @var{size},
1824 const int @var{start}, const int @var{range},
1825 struct re_registers *@var{regs})
1829 @vindex start @r{argument to @code{re_search}}
1830 @vindex range @r{argument to @code{re_search}}
1831 whose arguments are the same as those to @code{re_match} (@pxref{GNU
1832 Matching}) except that the two arguments @var{start} and @var{range}
1833 replace @code{re_match}'s argument @var{start}.
1835 If @var{range} is positive, then @code{re_search} attempts a match
1836 starting first at index @var{start}, then at @math{@var{start} + 1} if
1837 that fails, and so on, up to @math{@var{start} + @var{range}}; if
1838 @var{range} is negative, then it attempts a match starting first at
index @var{start}, then at @math{@var{start} - 1} if that fails, and so
1842 If @var{start} is not between zero and @var{size}, then @code{re_search}
1843 returns @math{-1}. When @var{range} is positive, @code{re_search}
1844 adjusts @var{range} so that @math{@var{start} + @var{range} - 1} is
1845 between zero and @var{size}, if necessary; that way it won't search
1846 outside of @var{string}. Similarly, when @var{range} is negative,
1847 @code{re_search} adjusts @var{range} so that @math{@var{start} +
1848 @var{range} + 1} is between zero and @var{size}, if necessary.
1850 If the @code{fastmap} field of @var{pattern_buffer} is zero,
1851 @code{re_search} matches starting at consecutive positions; otherwise,
1852 it uses @code{fastmap} to make the search more efficient.
1853 @xref{Searching with Fastmaps}.
1855 If no match is found, @code{re_search} returns @math{-1}. If
1856 a match is found, it returns the index where the match began. If an
1857 internal error happens, it returns @math{-2}.
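A forward search whose positive @var{range} covers the rest of the string can be approximated portably with @code{regexec}, which scans the remainder of the string itself. In this sketch the helper @code{search_forward} is hypothetical; like @code{re_search}, it returns the index, relative to the start of the whole string, at which the match began. Passing @code{REG_NOTBOL} (@pxref{POSIX Matching}) keeps a match-beginning-of-line operator from matching at @var{start}.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: a portable sketch of a forward re_search over the
   rest of a null-terminated STRING.  Returns the index at which the
   first match at or after START begins, -1 if there is none (or START is
   out of range), and -2 if PATTERN (POSIX ERE) does not compile.  */
int search_forward(const char *pattern, const char *string, int start)
{
    regex_t re;
    regmatch_t m[1];
    int where = -1;

    if (start < 0 || start > (int) strlen(string))
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -2;
    /* REG_NOTBOL keeps '^' from matching at START when START is not
       the real beginning of the string.  */
    if (regexec(&re, string + start, 1, m, start > 0 ? REG_NOTBOL : 0) == 0)
        where = start + (int) m[0].rm_so;   /* offset in the whole string */
    regfree(&re);
    return where;
}
```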
1860 @node Matching/Searching with Split Data, Searching with Fastmaps, GNU Searching, GNU Regex Functions
1861 @subsection Matching and Searching with Split Data
1863 Using the functions @code{re_match_2} and @code{re_search_2}, you can
1864 match or search in data that is divided into two strings.
1871 re_match_2 (struct re_pattern_buffer *@var{buffer},
1872 const char *@var{string1}, const int @var{size1},
1873 const char *@var{string2}, const int @var{size2},
1874 const int @var{start},
1875 struct re_registers *@var{regs},
1876 const int @var{stop})
1880 is similar to @code{re_match} (@pxref{GNU Matching}) except that you
1881 pass @emph{two} data strings and sizes, and an index @var{stop} beyond
1882 which you don't want the matcher to try matching. As with
@code{re_match}, if it succeeds, @code{re_match_2} returns how many
characters it matched. Regard @var{string1} and @var{string2} as
concatenated when you set the arguments @var{start} and @var{stop} and
when you use the contents of @var{regs}; @code{re_match_2} never
returns a value larger than @math{@var{size1} + @var{size2}}.
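The following sketch shows the index arithmetic implied by treating the two strings as one. The helper @code{virtual_char_at} is hypothetical and not part of Regex; it only maps a position in the concatenation back to the string that holds it.

```c
/* Hypothetical helper illustrating how positions in split data are
   numbered: positions 0 .. SIZE1-1 fall in STRING1 and positions
   SIZE1 .. SIZE1+SIZE2-1 fall in STRING2.  Returns the character at
   virtual position POS, or -1 if POS is outside the concatenation.  */
int virtual_char_at(const char *string1, int size1,
                    const char *string2, int size2, int pos)
{
    if (pos < 0 || pos >= size1 + size2)
        return -1;
    if (pos < size1)
        return (unsigned char) string1[pos];
    return (unsigned char) string2[pos - size1];
}
```

With @var{string1} @samp{ab} and @var{string2} @samp{cd}, a match of @samp{bc} spans the boundary and would be described by index 1 up to index 3 in that combined numbering.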
1894 re_search_2 (struct re_pattern_buffer *@var{buffer},
1895 const char *@var{string1}, const int @var{size1},
1896 const char *@var{string2}, const int @var{size2},
1897 const int @var{start}, const int @var{range},
1898 struct re_registers *@var{regs},
1899 const int @var{stop})
1903 is similarly related to @code{re_search}.
1906 @node Searching with Fastmaps, GNU Translate Tables, Matching/Searching with Split Data, GNU Regex Functions
1907 @subsection Searching with Fastmaps
1910 If you're searching through a long string, you should use a fastmap.
1911 Without one, the searcher tries to match at consecutive positions in the
string. Usually, most of the characters in the string cannot start
1913 a match. It takes much longer to try matching at a given position in the
1914 string than it does to check in a table whether or not the character at
1915 that position could start a match. A @dfn{fastmap} is such a table.
1917 More specifically, a fastmap is an array indexed by the characters in
1918 your character set. Under the @sc{ascii} encoding, therefore, a fastmap
1919 has 256 elements. If you want the searcher to use a fastmap with a
1920 given pattern buffer, you must allocate the array and assign the array's
address to the pattern buffer's @code{fastmap} field. You can either
compile the fastmap yourself or have @code{re_search} do it for you;
1923 when @code{fastmap} is nonzero, it automatically compiles a fastmap the
1924 first time you search using a particular compiled pattern.
1926 To compile a fastmap yourself, use:
1928 @findex re_compile_fastmap
1931 re_compile_fastmap (struct re_pattern_buffer *@var{pattern_buffer})
1935 @var{pattern_buffer} is the address of a pattern buffer. If the
1936 character @var{c} could start a match for the pattern,
1937 @code{re_compile_fastmap} makes
1938 @code{@var{pattern_buffer}->fastmap[@var{c}]} nonzero. It returns
1939 @math{0} if it can compile a fastmap and @math{-2} if there is an
1940 internal error. For example, if @samp{|} is the alternation operator
1941 and @var{pattern_buffer} holds the compiled pattern for @samp{a|b}, then
1942 @code{re_compile_fastmap} sets @code{fastmap['a']} and
1943 @code{fastmap['b']} (and no others).
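To make the table concrete, here is a hand-built version of the fastmap just described for @samp{a|b}, together with the kind of scan a searcher can do with it. Both functions are hypothetical illustrations, not part of Regex; they assume 8-bit characters and ignore patterns that can match the empty string, for which every position is a candidate.

```c
/* Number of entries in a fastmap for 8-bit characters.  */
#define FASTMAP_SIZE 256

/* Hand-build the fastmap described above for the pattern a|b:
   only 'a' and 'b' can start a match.  */
void fastmap_for_a_or_b(char fastmap[FASTMAP_SIZE])
{
    for (int c = 0; c < FASTMAP_SIZE; c++)
        fastmap[c] = 0;
    fastmap['a'] = 1;
    fastmap['b'] = 1;
}

/* Hypothetical helper: return the first index at or after START whose
   character is marked in FASTMAP, or -1 if there is none.  A searcher
   would attempt a full match only at such positions.  */
int next_candidate(const char fastmap[FASTMAP_SIZE],
                   const char *string, int size, int start)
{
    for (int i = start; i < size; i++)
        if (fastmap[(unsigned char) string[i]])
            return i;
    return -1;
}
```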
1945 @code{re_search} uses a fastmap as it moves along in the string: it
1946 checks the string's characters until it finds one that's in the fastmap.
1947 Then it tries matching at that character. If the match fails, it
1948 repeats the process. So, by using a fastmap, @code{re_search} doesn't
1949 waste time trying to match at positions in the string that couldn't
1952 If you don't want @code{re_search} to use a fastmap,
1953 store zero in the @code{fastmap} field of the pattern buffer before
1954 calling @code{re_search}.
1956 Once you've initialized a pattern buffer's @code{fastmap} field, you
1957 need never do so again---even if you compile a new pattern in
1958 it---provided the way the field is set still reflects whether or not you
1959 want a fastmap. @code{re_search} will still either do nothing if
1960 @code{fastmap} is null or, if it isn't, compile a new fastmap for the
1963 @node GNU Translate Tables, Using Registers, Searching with Fastmaps, GNU Regex Functions
1964 @subsection GNU Translate Tables
1966 If you set the @code{translate} field of a pattern buffer to a translate
1967 table, then the @sc{gnu} Regex functions to which you've passed that
1968 pattern buffer use it to apply a simple transformation
1969 to all the regular expression and string characters at which they look.
1971 A @dfn{translate table} is an array indexed by the characters in your
1972 character set. Under the @sc{ascii} encoding, therefore, a translate
1973 table has 256 elements. The array's elements are also characters in
1974 your character set. When the Regex functions see a character @var{c},
1975 they use @code{translate[@var{c}]} in its place, with one exception: the
character after a @samp{\} is not translated. (This ensures that
operators such as @samp{\B} and @samp{\b} are always distinguishable.)
1979 For example, a table that maps all lowercase letters to the
1980 corresponding uppercase ones would cause the matcher to ignore
1981 differences in case.@footnote{A table that maps all uppercase letters to
1982 the corresponding lowercase ones would work just as well for this
1983 purpose.} Such a table would map all characters except lowercase letters
1984 to themselves, and lowercase letters to the corresponding uppercase
1985 ones. Under the @sc{ascii} encoding, here's how you could initialize
1986 such a table (we'll call it @code{case_fold}):
1989 for (i = 0; i < 256; i++)
1991 for (i = 'a'; i <= 'z'; i++)
1992 case_fold[i] = i - ('a' - 'A');
1995 You tell Regex to use a translate table on a given pattern buffer by
1996 assigning that table's address to the @code{translate} field of that
1997 buffer. If you don't want Regex to do any translation, put zero into
this field. Your results will be unpredictable if you change the table's
contents at any time between compiling the pattern buffer, compiling its
fastmap, and matching or searching with the pattern buffer.
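The sketch below builds the @code{case_fold} table from the example above and uses it the way a translating matcher would, by comparing characters only after looking them up in the table. The comparison helper @code{translated_equal} is hypothetical, and the code assumes @sc{ascii}, where the lowercase letters are contiguous.

```c
/* Build the case_fold table: every character maps to itself except
   the lowercase letters, which map to their uppercase counterparts.  */
void init_case_fold(char case_fold[256])
{
    int i;
    for (i = 0; i < 256; i++)
        case_fold[i] = (char) i;
    for (i = 'a'; i <= 'z'; i++)
        case_fold[i] = (char) (i - ('a' - 'A'));
}

/* Hypothetical helper: nonzero if A and B are equal once every
   character has been replaced by its entry in CASE_FOLD.  */
int translated_equal(const char case_fold[256], const char *a, const char *b)
{
    for (; *a != '\0' && *b != '\0'; a++, b++)
        if (case_fold[(unsigned char) *a] != case_fold[(unsigned char) *b])
            return 0;
    return *a == *b;   /* equal only if both strings end together */
}
```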
2002 @node Using Registers, Freeing GNU Pattern Buffers, GNU Translate Tables, GNU Regex Functions
2003 @subsection Using Registers
A group in a regular expression can match a (possibly empty) substring
2006 of the string that regular expression as a whole matched. The matcher
2007 remembers the beginning and end of the substring matched by
2010 To find out what they matched, pass a nonzero @var{regs} argument to a
2011 @sc{gnu} matching or searching function (@pxref{GNU Matching} and
2012 @ref{GNU Searching}), i.e., the address of a structure of this type, as
2013 defined in @file{regex.h}:
2015 @c We don't bother to include this directly from regex.h,
2016 @c since it changes so rarely.
2018 @tindex re_registers
2019 @vindex num_regs @r{in @code{struct re_registers}}
2020 @vindex start @r{in @code{struct re_registers}}
2021 @vindex end @r{in @code{struct re_registers}}
2030 Except for (possibly) the @var{num_regs}'th element (see below), the
2031 @var{i}th element of the @code{start} and @code{end} arrays records
2032 information about the @var{i}th group in the pattern. (They're declared
2033 as C pointers, but this is only because not all C compilers accept
2034 zero-length arrays; conceptually, it is simplest to think of them as
2037 The @code{start} and @code{end} arrays are allocated in various ways,
2038 depending on the value of the @code{regs_allocated}
2039 @vindex regs_allocated
2040 field in the pattern buffer passed to the matcher.
2042 The simplest and perhaps most useful is to let the matcher (re)allocate
2043 enough space to record information for all the groups in the regular
2044 expression. If @code{regs_allocated} is @code{REGS_UNALLOCATED},
2045 @vindex REGS_UNALLOCATED
the matcher allocates @math{1 + @var{re_nsub}} elements (@var{re_nsub} is
another field in the pattern buffer; @pxref{GNU Pattern Buffers}). It sets
the extra element to @math{-1} and sets @code{regs_allocated} to
@code{REGS_REALLOCATE}.
2049 @vindex REGS_REALLOCATE
2050 Then on subsequent calls with the same pattern buffer and @var{regs}
2051 arguments, the matcher reallocates more space if necessary.
2053 It would perhaps be more logical to make the @code{regs_allocated} field
2054 part of the @code{re_registers} structure, instead of part of the
2055 pattern buffer. But in that case the caller would be forced to
2056 initialize the structure before passing it. Much existing code doesn't
2057 do this initialization, and it's arguably better to avoid it anyway.
@code{re_compile_pattern} sets @code{regs_allocated} to
@code{REGS_UNALLOCATED}, so if you use the @sc{gnu} regular expression
functions, you get this behavior by default.
@c xx document re_set_registers
2066 @sc{posix}, on the other hand, requires a different interface: the
2067 caller is supposed to pass in a fixed-length array which the matcher
2068 fills. Therefore, if @code{regs_allocated} is @code{REGS_FIXED}
2070 the matcher simply fills that array.
2072 The following examples illustrate the information recorded in the
2073 @code{re_registers} structure. (In all of them, @samp{(} represents the
2074 open-group and @samp{)} the close-group operator. The first character
2075 in the string @var{string} is at index 0.)
2077 @c xx i'm not sure this is all true anymore.
If the regular expression has an @w{@var{i}-th} group, not contained
within another group, that matches a substring of @var{string}, then the
function sets
2085 @code{@w{@var{regs}->}start[@var{i}]} to the index in @var{string} where
2086 the substring matched by the @w{@var{i}-th} group begins, and
2087 @code{@w{@var{regs}->}end[@var{i}]} to the index just beyond that
2088 substring's end. The function sets @code{@w{@var{regs}->}start[0]} and
2089 @code{@w{@var{regs}->}end[0]} to analogous information about the entire
2092 For example, when you match @samp{((a)(b))} against @samp{ab}, you get:
2096 0 in @code{@w{@var{regs}->}start[0]} and 2 in @code{@w{@var{regs}->}end[0]}
2099 0 in @code{@w{@var{regs}->}start[1]} and 2 in @code{@w{@var{regs}->}end[1]}
2102 0 in @code{@w{@var{regs}->}start[2]} and 1 in @code{@w{@var{regs}->}end[2]}
2105 1 in @code{@w{@var{regs}->}start[3]} and 2 in @code{@w{@var{regs}->}end[3]}
2109 If a group matches more than once (as it might if followed by,
2110 e.g., a repetition operator), then the function reports the information
2111 about what the group @emph{last} matched.
2113 For example, when you match the pattern @samp{(a)*} against the string
2118 0 in @code{@w{@var{regs}->}start[0]} and 2 in @code{@w{@var{regs}->}end[0]}
2121 1 in @code{@w{@var{regs}->}start[1]} and 2 in @code{@w{@var{regs}->}end[1]}
2125 If the @w{@var{i}-th} group does not participate in a
2126 successful match, e.g., it is an alternative not taken or a
2127 repetition operator allows zero repetitions of it, then the function
2128 sets @code{@w{@var{regs}->}start[@var{i}]} and
2129 @code{@w{@var{regs}->}end[@var{i}]} to @math{-1}.
2131 For example, when you match the pattern @samp{(a)*b} against
2132 the string @samp{b}, you get:
2136 0 in @code{@w{@var{regs}->}start[0]} and 1 in @code{@w{@var{regs}->}end[0]}
2139 @math{-1} in @code{@w{@var{regs}->}start[1]} and @math{-1} in @code{@w{@var{regs}->}end[1]}
2143 If the @w{@var{i}-th} group matches a zero-length string, then the
2144 function sets @code{@w{@var{regs}->}start[@var{i}]} and
2145 @code{@w{@var{regs}->}end[@var{i}]} to the index just beyond that
2148 For example, when you match the pattern @samp{(a*)b} against the string
2153 0 in @code{@w{@var{regs}->}start[0]} and 1 in @code{@w{@var{regs}->}end[0]}
2156 0 in @code{@w{@var{regs}->}start[1]} and 0 in @code{@w{@var{regs}->}end[1]}
2160 The function sets @code{@w{@var{regs}->}start[0]} and
2161 @code{@w{@var{regs}->}end[0]} to analogous information about the entire
2164 For example, when you match the pattern @samp{(a*)} against the empty
2169 0 in @code{@w{@var{regs}->}start[0]} and 0 in @code{@w{@var{regs}->}end[0]}
2172 0 in @code{@w{@var{regs}->}start[1]} and 0 in @code{@w{@var{regs}->}end[1]}
2177 If an @w{@var{i}-th} group contains a @w{@var{j}-th} group
2178 in turn not contained within any other group within group @var{i} and
2179 the function reports a match of the @w{@var{i}-th} group, then it
2180 records in @code{@w{@var{regs}->}start[@var{j}]} and
2181 @code{@w{@var{regs}->}end[@var{j}]} the last match (if it matched) of
2182 the @w{@var{j}-th} group.
For example, when you match the pattern @samp{((a*)b)*} against the
string @samp{abb}, @w{group 2} last matches the empty string at index 2,
so you get:
2190 0 in @code{@w{@var{regs}->}start[0]} and 3 in @code{@w{@var{regs}->}end[0]}
2193 2 in @code{@w{@var{regs}->}start[1]} and 3 in @code{@w{@var{regs}->}end[1]}
2196 2 in @code{@w{@var{regs}->}start[2]} and 2 in @code{@w{@var{regs}->}end[2]}
2199 When you match the pattern @samp{((a)*b)*} against the string
2200 @samp{abb}, @w{group 2} doesn't participate in the last match, so you
2205 0 in @code{@w{@var{regs}->}start[0]} and 3 in @code{@w{@var{regs}->}end[0]}
2208 2 in @code{@w{@var{regs}->}start[1]} and 3 in @code{@w{@var{regs}->}end[1]}
2211 0 in @code{@w{@var{regs}->}start[2]} and 1 in @code{@w{@var{regs}->}end[2]}
2215 If an @w{@var{i}-th} group contains a @w{@var{j}-th} group
2216 in turn not contained within any other group within group @var{i}
2217 and the function sets
2218 @code{@w{@var{regs}->}start[@var{i}]} and
2219 @code{@w{@var{regs}->}end[@var{i}]} to @math{-1}, then it also sets
2220 @code{@w{@var{regs}->}start[@var{j}]} and
2221 @code{@w{@var{regs}->}end[@var{j}]} to @math{-1}.
2223 For example, when you match the pattern @samp{((a)*b)*c} against the
2224 string @samp{c}, you get:
2228 0 in @code{@w{@var{regs}->}start[0]} and 1 in @code{@w{@var{regs}->}end[0]}
2231 @math{-1} in @code{@w{@var{regs}->}start[1]} and @math{-1} in @code{@w{@var{regs}->}end[1]}
2234 @math{-1} in @code{@w{@var{regs}->}start[2]} and @math{-1} in @code{@w{@var{regs}->}end[2]}
2239 @node Freeing GNU Pattern Buffers, , Using Registers, GNU Regex Functions
2240 @subsection Freeing GNU Pattern Buffers
2242 To free any allocated fields of a pattern buffer, you can use the
2243 @sc{posix} function described in @ref{Freeing POSIX Pattern Buffers},
2244 since the type @code{regex_t}---the type for @sc{posix} pattern
2245 buffers---is equivalent to the type @code{re_pattern_buffer}. After
2246 freeing a pattern buffer, you need to again compile a regular expression
2247 in it (@pxref{GNU Regular Expression Compiling}) before passing it to
2248 a matching or searching function.
2251 @node POSIX Regex Functions, BSD Regex Functions, GNU Regex Functions, Programming with Regex
2252 @section POSIX Regex Functions
2254 If you're writing code that has to be @sc{posix} compatible, you'll need
2255 to use these functions. Their interfaces are as specified by @sc{posix},
2259 * POSIX Pattern Buffers:: The regex_t type.
2260 * POSIX Regular Expression Compiling:: regcomp ()
2261 * POSIX Matching:: regexec ()
2262 * Reporting Errors:: regerror ()
2263 * Using Byte Offsets:: The regmatch_t type.
2264 * Freeing POSIX Pattern Buffers:: regfree ()
2268 @node POSIX Pattern Buffers, POSIX Regular Expression Compiling, , POSIX Regex Functions
2269 @subsection POSIX Pattern Buffers
2271 To compile or match a given regular expression the @sc{posix} way, you
2272 must supply a pattern buffer exactly the way you do for @sc{gnu}
2273 (@pxref{GNU Pattern Buffers}). @sc{posix} pattern buffers have type
2274 @code{regex_t}, which is equivalent to the @sc{gnu} pattern buffer
2275 type @code{re_pattern_buffer}.
2278 @node POSIX Regular Expression Compiling, POSIX Matching, POSIX Pattern Buffers, POSIX Regex Functions
2279 @subsection POSIX Regular Expression Compiling
With @sc{posix}, you can only search for a given regular expression; you
can't match it. To search for it, you must first compile it in a
pattern buffer, using @code{regcomp}.
2286 Before calling @code{regcomp}, you must initialize this pattern buffer
2287 as you do for @sc{gnu} (@pxref{GNU Regular Expression Compiling}). See
2288 below, however, for how to choose a syntax with which to compile.
2291 To compile a pattern buffer, use:
2296 regcomp (regex_t *@var{preg}, const char *@var{regex}, int @var{cflags})
2300 @var{preg} is the initialized pattern buffer's address, @var{regex} is
the regular expression's address, and @var{cflags} holds the compilation
flags, which Regex treats as a collection of bits. Here are the
2303 valid bits, as defined in @file{regex.h}:
2308 @vindex REG_EXTENDED
says to use @sc{posix} Extended Regular Expression syntax; if this isn't
set, @code{regcomp} uses @sc{posix} Basic Regular Expression syntax.
2311 @code{regcomp} sets @var{preg}'s @code{syntax} field accordingly.
2315 @cindex ignoring case
2316 says to ignore case; @code{regcomp} sets @var{preg}'s @code{translate}
2317 field to a translate table which ignores case, replacing anything you've
2322 says to set @var{preg}'s @code{no_sub} field; @pxref{POSIX Matching},
2323 for what this means.
2332 match-any-character operator (@pxref{Match-any-character
2333 Operator}) doesn't match a newline.
2336 nonmatching list not containing a newline (@pxref{List
2337 Operators}) matches a newline.
2340 match-beginning-of-line operator (@pxref{Match-beginning-of-line
2341 Operator}) matches the empty string immediately after a newline,
2342 regardless of how @code{REG_NOTBOL} is set (@pxref{POSIX Matching}, for
2343 an explanation of @code{REG_NOTBOL}).
match-end-of-line operator (@pxref{Match-end-of-line
Operator}) matches the empty string immediately before a newline,
2348 regardless of how @code{REG_NOTEOL} is set (@pxref{POSIX Matching},
2349 for an explanation of @code{REG_NOTEOL}).
2355 If @code{regcomp} successfully compiles the regular expression, it
returns zero and sets @code{*@var{preg}} to the compiled
2357 pattern. Except for @code{syntax} (which it sets as explained above), it
2358 also sets the same fields the same way as does the @sc{gnu} compiling
2359 function (@pxref{GNU Regular Expression Compiling}).
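As a minimal sketch of the calling sequence, the function below compiles with @code{REG_EXTENDED} and @code{REG_ICASE} (plus @code{REG_NOSUB}, since it only needs a yes-or-no answer), checks the value @code{regcomp} returns, matches with @code{regexec}, and frees the buffer with @code{regfree}. The helper name @code{icase_matches} is hypothetical.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: 1 if PATTERN (POSIX ERE, case ignored) matches
   somewhere in TEXT, 0 if it doesn't, -1 if regcomp fails.  */
int icase_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    /* Start from a zeroed pattern buffer; regcomp fills in the fields
       it needs.  */
    memset(&re, 0, sizeof re);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
        return -1;                      /* e.g. REG_EPAREN for "a(" */
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;                     /* 1 on a match, 0 otherwise */
}
```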
2361 If @code{regcomp} can't compile the regular expression, it returns one
of the error codes listed here. (Except where noted otherwise, the
syntax in all examples below is basic regular expression syntax.)
2367 @comment repetitions
2369 For example, the consecutive repetition operators @samp{**} in
2370 @samp{a**} are invalid. As another example, if the syntax is extended
2371 regular expression syntax, then the repetition operator @samp{*} with
2372 nothing on which to operate in @samp{*} is invalid.
2375 For example, the @var{count} @samp{-1} in @samp{a\@{-1} is invalid.
2378 For example, @samp{a\@{1} is missing a close-interval operator.
2382 For example, @samp{[a} is missing a close-list operator.
For example, the range @samp{[z-a]} is invalid because its ending point
@samp{a} collates lower than its starting point @samp{z}. Also, the
range with the character class @samp{[:alpha:]} as its starting point in
@samp{[[:alpha:]-|]} is invalid.
For example, the character class name @samp{foo} in @samp{[[:foo:]]} is
2396 For example, @samp{a\)} is missing an open-group operator and @samp{\(a}
2397 is missing a close-group operator.
2400 For example, the back reference @samp{\2} that refers to a nonexistent
2401 subexpression in @samp{\(a\)\2} is invalid.
2403 @comment unfinished business
2406 Returned when a regular expression causes no other more specific error.
2409 For example, the trailing backslash @samp{\} in @samp{a\} is invalid, as is the
2412 @comment kitchen sink
2414 For example, in the extended regular expression syntax, the empty group
2415 @samp{()} in @samp{a()b} is invalid.
2419 Returned when a regular expression needs a pattern buffer larger than
Returned when a regular expression makes Regex run out of memory.
2428 @node POSIX Matching, Reporting Errors, POSIX Regular Expression Compiling, POSIX Regex Functions
2429 @subsection POSIX Matching
2431 Matching the @sc{posix} way means trying to match a null-terminated
2432 string starting at its first character. Once you've compiled a pattern
2433 into a pattern buffer (@pxref{POSIX Regular Expression Compiling}), you
2434 can ask the matcher to match that pattern against a string using:
2439 regexec (const regex_t *@var{preg}, const char *@var{string},
2440 size_t @var{nmatch}, regmatch_t @var{pmatch}[], int @var{eflags})
2444 @var{preg} is the address of a pattern buffer for a compiled pattern.
2445 @var{string} is the string you want to match.
2447 @xref{Using Byte Offsets}, for an explanation of @var{pmatch}. If you
2448 pass zero for @var{nmatch} or you compiled @var{preg} with the
2449 compilation flag @code{REG_NOSUB} set, then @code{regexec} will ignore
2450 @var{pmatch}; otherwise, you must allocate it to have at least
@var{nmatch} elements. @code{regexec} will record up to @var{nmatch}
pairs of byte offsets in @var{pmatch}, and set to @math{-1} any unused
elements up to @code{@var{pmatch}[@var{nmatch} - 1]}.
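Here is a small sketch of those rules, assuming @sc{posix} extended syntax. With the pattern @samp{(a)|(b)} and an @var{nmatch} of 4, matching @samp{b} should leave group 1 at @math{-1} (its alternative was not taken), report group 2 at offsets 0 and 1, and set the fourth element, which corresponds to no group, to @math{-1}. The helper @code{match_offsets} is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile PATTERN (POSIX ERE), match it against
   STRING, and leave up to NMATCH offset pairs in PMATCH.  Returns 0 on a
   match, a nonzero code such as REG_NOMATCH otherwise, and -1 if the
   pattern does not compile.  */
int match_offsets(const char *pattern, const char *string,
                  size_t nmatch, regmatch_t pmatch[])
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    /* On a match, regexec fills pmatch[0] .. pmatch[nmatch - 1]; groups
       that did not participate, and elements beyond the last group,
       get -1 in both rm_so and rm_eo.  */
    rc = regexec(&re, string, nmatch, pmatch, 0);
    regfree(&re);
    return rc;
}
```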
2455 @var{eflags} specifies @dfn{execution flags}---namely, the two bits
2456 @code{REG_NOTBOL} and @code{REG_NOTEOL} (defined in @file{regex.h}). If
2457 you set @code{REG_NOTBOL}, then the match-beginning-of-line operator
2458 (@pxref{Match-beginning-of-line Operator}) always fails to match.
2459 This lets you match against pieces of a line, as you would need to if,
2460 say, searching for repeated instances of a given pattern in a line; it
2461 would work correctly for patterns both with and without
2462 match-beginning-of-line operators. @code{REG_NOTEOL} works analogously
2463 for the match-end-of-line operator (@pxref{Match-end-of-line
2464 Operator}); it exists for symmetry.
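For example, the following sketch counts the non-overlapping matches of a pattern in a line by calling @code{regexec} repeatedly on the rest of the line, passing @code{REG_NOTBOL} on every call after the first. With it, @samp{^a} matches @samp{aaa} once rather than three times. The helper @code{count_matches} is hypothetical and assumes @sc{posix} extended syntax.

```c
#include <regex.h>

/* Hypothetical helper: count the non-overlapping matches of PATTERN
   (POSIX ERE) in LINE.  Returns -1 if the pattern does not compile.  */
int count_matches(const char *pattern, const char *line)
{
    regex_t re;
    regmatch_t m[1];
    const char *p = line;
    int eflags = 0, count = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    while (regexec(&re, p, 1, m, eflags) == 0) {
        count++;
        if (m[0].rm_eo == m[0].rm_so) {   /* empty match: step past it */
            if (p[m[0].rm_eo] == '\0')
                break;
            p += m[0].rm_eo + 1;
        } else {
            p += m[0].rm_eo;              /* resume after this match */
        }
        /* The remainder no longer begins the line, so '^' must fail.  */
        eflags = REG_NOTBOL;
    }
    regfree(&re);
    return count;
}
```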
2466 @code{regexec} tries to find a match for @var{preg} in @var{string}
2467 according to the syntax in @var{preg}'s @code{syntax} field.
2468 (@xref{POSIX Regular Expression Compiling}, for how to set it.) The
2469 function returns zero if the compiled pattern matches @var{string} and
2470 @code{REG_NOMATCH} (defined in @file{regex.h}) if it doesn't.
2472 @node Reporting Errors, Using Byte Offsets, POSIX Matching, POSIX Regex Functions
2473 @subsection Reporting Errors
If either @code{regcomp} or @code{regexec} fails, it returns a nonzero
2476 error code, the possibilities for which are defined in @file{regex.h}.
2477 @xref{POSIX Regular Expression Compiling}, and @ref{POSIX Matching}, for
2478 what these codes mean. To get an error string corresponding to these
2484 regerror (int @var{errcode},
2485 const regex_t *@var{preg},
2487 size_t @var{errbuf_size})
2491 @var{errcode} is an error code, @var{preg} is the address of the pattern
2492 buffer which provoked the error, @var{errbuf} is the error buffer, and
2493 @var{errbuf_size} is @var{errbuf}'s size.
2495 @code{regerror} returns the size in bytes of the error string
2496 corresponding to @var{errcode} (including its terminating null). If
@var{errbuf} and @var{errbuf_size} are nonzero, it also stores in
2498 @var{errbuf} the first @math{@var{errbuf_size} - 1} characters of the
2499 error string, followed by a null.
2500 @var{errbuf_size} must be a nonnegative number less than or equal to the
2501 size in bytes of @var{errbuf}.
2503 You can call @code{regerror} with a null @var{errbuf} and a zero
@var{errbuf_size} to determine how large @var{errbuf} needs to be to
2505 accommodate @code{regerror}'s error string.
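A minimal sketch of that two-step use of @code{regerror}: compile, and on failure ask for the message length first, then allocate a buffer of exactly that size. The helper @code{compile_error} is hypothetical; it uses @sc{posix} basic syntax, in which @samp{[a} fails with @code{REG_EBRACK}.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: compile PATTERN as a POSIX basic regular
   expression.  On failure, store the error code in *CODE and return a
   malloc'd copy of the error message (the caller frees it); on success,
   store 0 and return NULL.  */
char *compile_error(const char *pattern, int *code)
{
    regex_t re;
    size_t len;
    char *msg;

    *code = regcomp(&re, pattern, 0);
    if (*code == 0) {
        regfree(&re);
        return NULL;
    }
    /* First call: a null buffer and zero size just ask for the length,
       including the terminating null.  */
    len = regerror(*code, &re, NULL, 0);
    msg = malloc(len);
    if (msg != NULL)
        regerror(*code, &re, msg, len);   /* second call fills MSG */
    return msg;
}
```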
2507 @node Using Byte Offsets, Freeing POSIX Pattern Buffers, Reporting Errors, POSIX Regex Functions
2508 @subsection Using Byte Offsets
In @sc{posix}, variables of type @code{regmatch_t} hold information
analogous, but not identical, to @sc{gnu}'s registers (@pxref{Using
Registers}). To get information about registers in @sc{posix}, pass to
@code{regexec} a nonzero @var{pmatch}, i.e., the address of an array of
@code{regmatch_t} structures; this type is defined in
When reading @ref{Using Registers} about how the matching function
stores the information into the registers, substitute @var{pmatch} for
@var{regs}, @code{@w{@var{pmatch}[@var{i}].}rm_so} for
@code{@w{@var{regs}->}start[@var{i}]} and
@code{@w{@var{pmatch}[@var{i}].}rm_eo} for
@code{@w{@var{regs}->}end[@var{i}]}.
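The sketch below applies that substitution to the @samp{((a)(b))} example in the section on registers (@pxref{Using Registers}). It writes the offset pairs for the whole match and for each group, so matching @samp{((a)(b))} against @samp{ab} should produce @samp{0,2 0,2 0,1 1,2}. The helper @code{describe_offsets} is hypothetical, uses @sc{posix} extended syntax, and assumes at most nine groups.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: match PATTERN (POSIX ERE) against STRING and
   write "so,eo" for the whole match and for each group into BUF
   (BUFSIZE > 0), separated by spaces.  Returns the regexec result, or
   -1 if the pattern does not compile.  */
int describe_offsets(const char *pattern, const char *string,
                     char *buf, size_t bufsize)
{
    regex_t re;
    regmatch_t pmatch[10];
    int rc;
    size_t used = 0;

    buf[0] = '\0';
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, string, 10, pmatch, 0);
    if (rc == 0) {
        /* pmatch[i].rm_so and pmatch[i].rm_eo play the roles of
           regs->start[i] and regs->end[i].  */
        for (size_t i = 0; i <= re.re_nsub && i < 10; i++) {
            int n = snprintf(buf + used, bufsize - used, "%s%d,%d",
                             i ? " " : "",
                             (int) pmatch[i].rm_so, (int) pmatch[i].rm_eo);
            if (n < 0 || (size_t) n >= bufsize - used)
                break;                  /* BUF is full; output truncated */
            used += (size_t) n;
        }
    }
    regfree(&re);
    return rc;
}
```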
2533 @node Freeing POSIX Pattern Buffers, , Using Byte Offsets, POSIX Regex Functions
2534 @subsection Freeing POSIX Pattern Buffers
2536 To free any allocated fields of a pattern buffer, use:
2541 regfree (regex_t *@var{preg})
2545 @var{preg} is the pattern buffer whose allocated fields you want freed.
2546 @code{regfree} also sets @var{preg}'s @code{allocated} and @code{used}
2547 fields to zero. After freeing a pattern buffer, you need to again
2548 compile a regular expression in it (@pxref{POSIX Regular Expression
2549 Compiling}) before passing it to the matching function (@pxref{POSIX
2553 @node BSD Regex Functions, , POSIX Regex Functions, Programming with Regex
2554 @section BSD Regex Functions
If you're writing code that has to be Berkeley @sc{unix} compatible,
you'll need to use these functions, whose interfaces are the same as
those in Berkeley @sc{unix}.
2561 * BSD Regular Expression Compiling:: re_comp ()
2562 * BSD Searching:: re_exec ()
2565 @node BSD Regular Expression Compiling, BSD Searching, , BSD Regex Functions
2566 @subsection BSD Regular Expression Compiling
2568 With Berkeley @sc{unix}, you can only search for a given regular
2569 expression; you can't match one. To search for it, you must first
compile it. Before you compile it, you must choose the syntax it is
compiled with by setting the variable @code{re_syntax_options} (declared
in @file{regex.h}) to some syntax (@pxref{Regular Expression Syntax}).
2575 To compile a regular expression use:
2580 re_comp (char *@var{regex})
2584 @var{regex} is the address of a null-terminated regular expression.
2585 @code{re_comp} uses an internal pattern buffer, so you can use only the
most recently compiled pattern. This means that if you want to
use a given regular expression that you've already compiled---but it
isn't the latest one you've compiled---you'll have to recompile it. If
you call @code{re_comp} with a null pointer (@emph{not} the empty
2590 string) as the argument, it doesn't change the contents of the pattern
2593 If @code{re_comp} successfully compiles the regular expression, it
2594 returns zero. If it can't compile the regular expression, it returns
2595 an error string. @code{re_comp}'s error messages are identical to those
2596 of @code{re_compile_pattern} (@pxref{GNU Regular Expression
2599 @node BSD Searching, , BSD Regular Expression Compiling, BSD Regex Functions
2600 @subsection BSD Searching
2602 Searching the Berkeley @sc{unix} way means searching in a string
2603 starting at its first character and trying successive positions within
2604 it to find a match. Once you've compiled a pattern using @code{re_comp}
2605 (@pxref{BSD Regular Expression Compiling}), you can ask Regex
2606 to search for that pattern in a string using:
2611 re_exec (char *@var{string})
2615 @var{string} is the address of the null-terminated string in which you
2618 @code{re_exec} returns either 1 for success or 0 for failure. It
2619 automatically uses a @sc{gnu} fastmap (@pxref{Searching with Fastmaps}).
2622 @node Copying, Index, Programming with Regex, Top
2623 @appendix GNU GENERAL PUBLIC LICENSE
2624 @center Version 2, June 1991
2627 Copyright @copyright{} 1989, 1991 Free Software Foundation, Inc.
2628 675 Mass Ave, Cambridge, MA 02139, USA
2630 Everyone is permitted to copy and distribute verbatim copies
2631 of this license document, but changing it is not allowed.
2634 @unnumberedsec Preamble
2636 The licenses for most software are designed to take away your
2637 freedom to share and change it. By contrast, the GNU General Public
2638 License is intended to guarantee your freedom to share and change free
2639 software---to make sure the software is free for all its users. This
2640 General Public License applies to most of the Free Software
2641 Foundation's software and to any other program whose authors commit to
2642 using it. (Some other Free Software Foundation software is covered by
2643 the GNU Library General Public License instead.) You can apply it to
2646 When we speak of free software, we are referring to freedom, not
2647 price. Our General Public Licenses are designed to make sure that you
2648 have the freedom to distribute copies of free software (and charge for
2649 this service if you wish), that you receive source code or can get it
2650 if you want it, that you can change the software or use pieces of it
2651 in new free programs; and that you know you can do these things.
2653 To protect your rights, we need to make restrictions that forbid
2654 anyone to deny you these rights or to ask you to surrender the rights.
2655 These restrictions translate to certain responsibilities for you if you
2656 distribute copies of the software, or if you modify it.
2658 For example, if you distribute copies of such a program, whether
2659 gratis or for a fee, you must give the recipients all the rights that
2660 you have. You must make sure that they, too, receive or can get the
2661 source code. And you must show them these terms so they know their rights.
2664 We protect your rights with two steps: (1) copyright the software, and
2665 (2) offer you this license which gives you legal permission to copy,
2666 distribute and/or modify the software.
2668 Also, for each author's protection and ours, we want to make certain
2669 that everyone understands that there is no warranty for this free
2670 software. If the software is modified by someone else and passed on, we
2671 want its recipients to know that what they have is not the original, so
2672 that any problems introduced by others will not reflect on the original
2673 authors' reputations.
2675 Finally, any free program is threatened constantly by software
2676 patents. We wish to avoid the danger that redistributors of a free
2677 program will individually obtain patent licenses, in effect making the
2678 program proprietary. To prevent this, we have made it clear that any
2679 patent must be licensed for everyone's free use or not licensed at all.
2681 The precise terms and conditions for copying, distribution and
2682 modification follow.
2685 @unnumberedsec TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
2688 @center TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
2693 This License applies to any program or other work which contains
2694 a notice placed by the copyright holder saying it may be distributed
2695 under the terms of this General Public License. The ``Program'', below,
2696 refers to any such program or work, and a ``work based on the Program''
2697 means either the Program or any derivative work under copyright law:
2698 that is to say, a work containing the Program or a portion of it,
2699 either verbatim or with modifications and/or translated into another
2700 language. (Hereinafter, translation is included without limitation in
2701 the term ``modification''.) Each licensee is addressed as ``you''.
2703 Activities other than copying, distribution and modification are not
2704 covered by this License; they are outside its scope. The act of
2705 running the Program is not restricted, and the output from the Program
2706 is covered only if its contents constitute a work based on the
2707 Program (independent of having been made by running the Program).
2708 Whether that is true depends on what the Program does.
2711 You may copy and distribute verbatim copies of the Program's
2712 source code as you receive it, in any medium, provided that you
2713 conspicuously and appropriately publish on each copy an appropriate
2714 copyright notice and disclaimer of warranty; keep intact all the
2715 notices that refer to this License and to the absence of any warranty;
2716 and give any other recipients of the Program a copy of this License
2717 along with the Program.
2719 You may charge a fee for the physical act of transferring a copy, and
2720 you may at your option offer warranty protection in exchange for a fee.
2723 You may modify your copy or copies of the Program or any portion
2724 of it, thus forming a work based on the Program, and copy and
2725 distribute such modifications or work under the terms of Section 1
2726 above, provided that you also meet all of these conditions:
2730 You must cause the modified files to carry prominent notices
2731 stating that you changed the files and the date of any change.
2734 You must cause any work that you distribute or publish, that in
2735 whole or in part contains or is derived from the Program or any
2736 part thereof, to be licensed as a whole at no charge to all third
2737 parties under the terms of this License.
2740 If the modified program normally reads commands interactively
2741 when run, you must cause it, when started running for such
2742 interactive use in the most ordinary way, to print or display an
2743 announcement including an appropriate copyright notice and a
2744 notice that there is no warranty (or else, saying that you provide
2745 a warranty) and that users may redistribute the program under
2746 these conditions, and telling the user how to view a copy of this
2747 License. (Exception: if the Program itself is interactive but
2748 does not normally print such an announcement, your work based on
2749 the Program is not required to print an announcement.)
2752 These requirements apply to the modified work as a whole. If
2753 identifiable sections of that work are not derived from the Program,
2754 and can be reasonably considered independent and separate works in
2755 themselves, then this License, and its terms, do not apply to those
2756 sections when you distribute them as separate works. But when you
2757 distribute the same sections as part of a whole which is a work based
2758 on the Program, the distribution of the whole must be on the terms of
2759 this License, whose permissions for other licensees extend to the
2760 entire whole, and thus to each and every part regardless of who wrote it.
2762 Thus, it is not the intent of this section to claim rights or contest
2763 your rights to work written entirely by you; rather, the intent is to
2764 exercise the right to control the distribution of derivative or
2765 collective works based on the Program.
2767 In addition, mere aggregation of another work not based on the Program
2768 with the Program (or with a work based on the Program) on a volume of
2769 a storage or distribution medium does not bring the other work under
2770 the scope of this License.
2773 You may copy and distribute the Program (or a work based on it,
2774 under Section 2) in object code or executable form under the terms of
2775 Sections 1 and 2 above provided that you also do one of the following:
2779 Accompany it with the complete corresponding machine-readable
2780 source code, which must be distributed under the terms of Sections
2781 1 and 2 above on a medium customarily used for software interchange; or,
2784 Accompany it with a written offer, valid for at least three
2785 years, to give any third party, for a charge no more than your
2786 cost of physically performing source distribution, a complete
2787 machine-readable copy of the corresponding source code, to be
2788 distributed under the terms of Sections 1 and 2 above on a medium
2789 customarily used for software interchange; or,
2792 Accompany it with the information you received as to the offer
2793 to distribute corresponding source code. (This alternative is
2794 allowed only for noncommercial distribution and only if you
2795 received the program in object code or executable form with such
2796 an offer, in accord with Subsection b above.)
2799 The source code for a work means the preferred form of the work for
2800 making modifications to it. For an executable work, complete source
2801 code means all the source code for all modules it contains, plus any
2802 associated interface definition files, plus the scripts used to
2803 control compilation and installation of the executable. However, as a
2804 special exception, the source code distributed need not include
2805 anything that is normally distributed (in either source or binary
2806 form) with the major components (compiler, kernel, and so on) of the
2807 operating system on which the executable runs, unless that component
2808 itself accompanies the executable.
2810 If distribution of executable or object code is made by offering
2811 access to copy from a designated place, then offering equivalent
2812 access to copy the source code from the same place counts as
2813 distribution of the source code, even though third parties are not
2814 compelled to copy the source along with the object code.
2817 You may not copy, modify, sublicense, or distribute the Program
2818 except as expressly provided under this License. Any attempt
2819 otherwise to copy, modify, sublicense or distribute the Program is
2820 void, and will automatically terminate your rights under this License.
2821 However, parties who have received copies, or rights, from you under
2822 this License will not have their licenses terminated so long as such
2823 parties remain in full compliance.
2826 You are not required to accept this License, since you have not
2827 signed it. However, nothing else grants you permission to modify or
2828 distribute the Program or its derivative works. These actions are
2829 prohibited by law if you do not accept this License. Therefore, by
2830 modifying or distributing the Program (or any work based on the
2831 Program), you indicate your acceptance of this License to do so, and
2832 all its terms and conditions for copying, distributing or modifying
2833 the Program or works based on it.
2836 Each time you redistribute the Program (or any work based on the
2837 Program), the recipient automatically receives a license from the
2838 original licensor to copy, distribute or modify the Program subject to
2839 these terms and conditions. You may not impose any further
2840 restrictions on the recipients' exercise of the rights granted herein.
2841 You are not responsible for enforcing compliance by third parties to this License.
2845 If, as a consequence of a court judgment or allegation of patent
2846 infringement or for any other reason (not limited to patent issues),
2847 conditions are imposed on you (whether by court order, agreement or
2848 otherwise) that contradict the conditions of this License, they do not
2849 excuse you from the conditions of this License. If you cannot
2850 distribute so as to satisfy simultaneously your obligations under this
2851 License and any other pertinent obligations, then as a consequence you
2852 may not distribute the Program at all. For example, if a patent
2853 license would not permit royalty-free redistribution of the Program by
2854 all those who receive copies directly or indirectly through you, then
2855 the only way you could satisfy both it and this License would be to
2856 refrain entirely from distribution of the Program.
2858 If any portion of this section is held invalid or unenforceable under
2859 any particular circumstance, the balance of the section is intended to
2860 apply and the section as a whole is intended to apply in other circumstances.
2863 It is not the purpose of this section to induce you to infringe any
2864 patents or other property right claims or to contest validity of any
2865 such claims; this section has the sole purpose of protecting the
2866 integrity of the free software distribution system, which is
2867 implemented by public license practices. Many people have made
2868 generous contributions to the wide range of software distributed
2869 through that system in reliance on consistent application of that
2870 system; it is up to the author/donor to decide if he or she is willing
2871 to distribute software through any other system and a licensee cannot impose that choice.
2874 This section is intended to make thoroughly clear what is believed to
2875 be a consequence of the rest of this License.
2878 If the distribution and/or use of the Program is restricted in
2879 certain countries either by patents or by copyrighted interfaces, the
2880 original copyright holder who places the Program under this License
2881 may add an explicit geographical distribution limitation excluding
2882 those countries, so that distribution is permitted only in or among
2883 countries not thus excluded. In such case, this License incorporates
2884 the limitation as if written in the body of this License.
2887 The Free Software Foundation may publish revised and/or new versions
2888 of the General Public License from time to time. Such new versions will
2889 be similar in spirit to the present version, but may differ in detail to
2890 address new problems or concerns.
2892 Each version is given a distinguishing version number. If the Program
2893 specifies a version number of this License which applies to it and ``any
2894 later version'', you have the option of following the terms and conditions
2895 either of that version or of any later version published by the Free
2896 Software Foundation. If the Program does not specify a version number of
2897 this License, you may choose any version ever published by the Free Software Foundation.
2901 If you wish to incorporate parts of the Program into other free
2902 programs whose distribution conditions are different, write to the author
2903 to ask for permission. For software which is copyrighted by the Free
2904 Software Foundation, write to the Free Software Foundation; we sometimes
2905 make exceptions for this. Our decision will be guided by the two goals
2906 of preserving the free status of all derivatives of our free software and
2907 of promoting the sharing and reuse of software generally.
2910 @heading NO WARRANTY
2917 BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
2918 FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
2919 OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
2920 PROVIDE THE PROGRAM ``AS IS'' WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
2921 OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
2922 MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
2923 TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
2924 PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
2925 REPAIR OR CORRECTION.
2928 IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
2929 WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
2930 REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
2931 INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
2932 OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
2933 TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
2934 YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
2935 PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
2936 POSSIBILITY OF SUCH DAMAGES.
2940 @heading END OF TERMS AND CONDITIONS
2943 @center END OF TERMS AND CONDITIONS
2947 @unnumberedsec Appendix: How to Apply These Terms to Your New Programs
2949 If you develop a new program, and you want it to be of the greatest
2950 possible use to the public, the best way to achieve this is to make it
2951 free software which everyone can redistribute and change under these terms.
2953 To do so, attach the following notices to the program. It is safest
2954 to attach them to the start of each source file to most effectively
2955 convey the exclusion of warranty; and each file should have at least
2956 the ``copyright'' line and a pointer to where the full notice is found.
2959 @var{one line to give the program's name and a brief idea of what it does.}
2960 Copyright (C) 19@var{yy} @var{name of author}
2962 This program is free software; you can redistribute it and/or modify
2963 it under the terms of the GNU General Public License as published by
2964 the Free Software Foundation; either version 2 of the License, or
2965 (at your option) any later version.
2967 This program is distributed in the hope that it will be useful,
2968 but WITHOUT ANY WARRANTY; without even the implied warranty of
2969 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
2970 GNU General Public License for more details.
2972 You should have received a copy of the GNU General Public License
2973 along with this program; if not, write to the Free Software
2974 Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
2977 Also add information on how to contact you by electronic and paper mail.
2979 If the program is interactive, make it output a short notice like this
2980 when it starts in an interactive mode:
2983 Gnomovision version 69, Copyright (C) 19@var{yy} @var{name of author}
2984 Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
2985 This is free software, and you are welcome to redistribute it
2986 under certain conditions; type `show c' for details.
2989 The hypothetical commands @samp{show w} and @samp{show c} should show
2990 the appropriate parts of the General Public License. Of course, the
2991 commands you use may be called something other than @samp{show w} and
2992 @samp{show c}; they could even be mouse-clicks or menu items---whatever suits your program.
2995 You should also get your employer (if you work as a programmer) or your
2996 school, if any, to sign a ``copyright disclaimer'' for the program, if
2997 necessary. Here is a sample; alter the names:
3000 Yoyodyne, Inc., hereby disclaims all copyright interest in the program
3001 `Gnomovision' (which makes passes at compilers) written by James Hacker.
3003 @var{signature of Ty Coon}, 1 April 1989
3004 Ty Coon, President of Vice
3007 This General Public License does not permit incorporating your program into
3008 proprietary programs. If your program is a subroutine library, you may
3009 consider it more useful to permit linking proprietary applications with the
3010 library. If this is what you want to do, use the GNU Library General
3011 Public License instead of this License.
3014 @node Index, , Copying, Top