.IP "" "\w'\fB\\$1\ \ \fP'u"
.CT 1 files prog_other
awk \- pattern-directed scanning and processing language
for lines that match any of a set of patterns specified literally in
or in one or more files
there can be an associated action that will be performed
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
means the standard input.
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
is an assignment to be done before
options may be present.
option defines the input field separator to be the regular expression
An input line is normally made up of fields separated by white space,
or by the regular expression
The fields are denoted
refers to the entire line.
is null, the input line is split into one field per character.
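As a minimal sketch of field splitting, assuming a POSIX awk on the search path and made-up input lines, the commands below print NF and selected fields, first with the default white-space splitting and then with a regular expression given to -F:

```shell
#!/bin/sh
# Minimal sketch, assuming a POSIX awk on PATH; the input lines are made up.

# Default splitting: runs of blanks and tabs separate fields,
# and leading blanks are ignored.
default=$(printf '%s\n' '  alpha   beta gamma' | awk '{ print NF ":" $1 ":" $3 }')
printf '%s\n' "$default"    # 3:alpha:gamma

# With -F the field separator is a regular expression:
# here, a comma followed by any number of spaces.
custom=$(printf '%s\n' 'red, green,blue' | awk -F ', *' '{ print NF "|" $2 "|" $0 }')
printf '%s\n' "$custom"     # 3|green|red, green,blue
```

Note that $0 still holds the whole, unsplit line.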
A pattern-action statement has the form:
.IB pattern " { " action " }
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
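A minimal sketch of these rules, assuming a POSIX awk and a made-up three-line input. The first program holds two pattern-action statements separated by a semicolon; the first has no action, so its matching lines are printed unchanged. The second program has an action with no pattern, which therefore runs on every line.

```shell
#!/bin/sh
# Minimal sketch, assuming a POSIX awk on PATH; the input is made up.
input='apple 3
banana 12
cherry 7'

# Every pattern-action statement is tried against every line, in program order.
matched=$(printf '%s\n' "$input" | awk '$2 > 5; /^a/ { print "starts with a:", $1 }')
printf '%s\n' "$matched"

# A missing pattern always matches, so this action runs for each line.
count=$(printf '%s\n' "$input" | awk '{ n++ } END { print n }')
printf '%s\n' "$count"    # 3
```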
An action is a sequence of statements.
A statement can be one of the following:
.ta \w'\f(CWdelete array[expression]\fR'u
if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
while(\fI expression \fP)\fI statement\fP
for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
for(\fI var \fPin\fI array \fP)\fI statement\fP
do\fI statement \fPwhile(\fI expression \fP)
{\fR [\fP\fI statement ... \fP\fR] \fP}
\fIexpression\fP #\fR commonly\fP\fI var = expression\fP
print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
return\fR [ \fP\fIexpression \fP\fR]\fP
next #\fR skip remaining patterns on this input line\fP
nextfile #\fR skip rest of this file, open next, start at top\fP
delete\fI array\fP[\fI expression \fP] #\fR delete an array element\fP
delete\fI array\fP #\fR delete all elements of array\fP
exit\fR [ \fP\fIexpression \fP\fR]\fP #\fR exit immediately; status is \fP\fIexpression\fP
Statements are terminated by
semicolons, newlines or right braces.
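A sketch that exercises several of the statements listed above in one program (next, for, for-in, if/else, delete, do-while), assuming a POSIX awk and made-up input text; the variable names are illustrative only. Because for (var in array) visits keys in an unspecified order, the program only counts keys instead of printing them in sequence.

```shell
#!/bin/sh
# Minimal sketch, assuming a POSIX awk on PATH; text and names are made up.
text='the cat and the hat
# a comment line
the end'

out=$(printf '%s\n' "$text" | awk '
/^#/ { next }                  # skip comment lines entirely
{
    for (i = 1; i <= NF; i++)  # C-style loop over the fields
        count[$i]++            # associative array keyed by word
}
END {
    delete count["and"]        # remove a single element
    for (w in count)           # visits each key, in unspecified order
        if (count[w] > 1) repeated++
        else single++
    i = 3
    do { i-- } while (i > 0)   # body runs at least once
    print repeated, single, count["the"], i
}')
printf '%s\n' "$out"    # 1 3 3 0
```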
String constants are quoted \&\f(CW"\ "\fR,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
(exponentiation), and concatenation (indicated by white space).
! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
are permitted; the constituents are concatenated,
separated by the value of
statement prints its arguments on the standard output
is present or on a pipe if
is present), separated by the current output field separator,
and terminated by the output record separator.
may be literal names or parenthesized expressions;
identical string values in different statements denote
statement formats its expression list according to the
The built-in function
closes the file or pipe
The built-in function
flushes any buffered output for the file or pipe
The mathematical functions
Other built-in functions:
the length of its argument
number of elements in an array for an array argument,
random number on [0,1).
and returns the previous seed.
truncates to an integer value.
\fBsubstr(\fIs\fB, \fIm\fR [\fB, \fIn\^\fR]\fB)\fR
that begins at position
use the rest of the string.
.BI index( s , " t" )
occurs, or 0 if it does not.
.BI match( s , " r" )
where the regular expression
occurs, or 0 if it does not.
are set to the position and length of the matched string.
\fBsplit(\fIs\fB, \fIa \fR[\fB, \fIfs\^\fR]\fB)\fR
The separation is done with the regular expression
or with the field separator
An empty string as field separator splits the string
into one array element per character.
\fBsub(\fIr\fB, \fIt \fR[, \fIs\^\fR]\fB)
for the first occurrence of the regular expression
\fBgsub(\fIr\fB, \fIt \fR[, \fIs\^\fR]\fB)
except that all occurrences of the regular expression
return the number of replacements.
\fBgensub(\fIpat\fB, \fIrepl\fB, \fIhow\fR [\fB, \fItarget\fR]\fB)\fR
replaces instances of
is \fB"g"\fR or \fB"G"\fR, do so globally. Otherwise,
is a number indicating which occurrence to replace. If no
Return the resulting string;
.BI sprintf( fmt , " expr" , " ...\fB)
the string resulting from formatting
returns the current date and time as a standard
``seconds since the epoch'' value.
.BI strftime( fmt ", " timestamp\^ )
(a value in seconds since the epoch)
which is a format string as supported by
may be omitted; if no
the current time of day is used, and if no
a default format of \fB"%a %b %e %H:%M:%S %Z %Y"\fR is used.
and returns its exit status. This will be \-1 upon error,
exit status upon a normal exit,
upon death-by-signal, where
is the number of the murdering signal,
if there was a core dump.
with all upper-case characters translated to their
corresponding lower-case equivalents.
with all lower-case characters translated to their
corresponding upper-case equivalents.
to the next input record from the current input file;
to the next record from
returns the next line of output from
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
perform the corresponding bitwise operations on their
operands, which are first truncated to integer.
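A minimal sketch exercising several of the built-in functions above, assuming a POSIX awk on the search path. It deliberately sticks to functions that POSIX awk also defines, so gensub, strftime and the bitwise functions are not used; the strings and the echo command are made-up inputs.

```shell
#!/bin/sh
# Minimal sketch, assuming a POSIX awk on PATH; all strings are made up.
out=$(awk 'BEGIN {
    s = "hello, world"
    print length(s)                    # 12
    print substr(s, 8)                 # "world": n omitted, rest of string
    print substr(s, 1, 5)              # "hello"
    print index(s, "o")                # 5: positions start at 1

    m = match(s, /wor+/)               # also sets RSTART and RLENGTH
    print m, RSTART, RLENGTH           # 8 8 3

    n = split("a:b:c", parts, ":")     # fills parts[1] .. parts[n]
    print n, parts[1], parts[3]        # 3 a c

    t = "banana"
    g = gsub(/an/, "AN", t)            # returns the number of replacements
    print g, t                         # 2 bANANa

    u = "banana"
    sub(/a/, "[&]", u)                 # & stands for the matched text
    print u                            # b[a]nana

    print toupper("abc") tolower("DEF")    # ABCdef

    cmd = "echo from-a-pipe"
    if ((cmd | getline line) > 0)      # 1 = success, 0 = end of file, -1 = error
        print line                     # from-a-pipe
    close(cmd)
}')
printf '%s\n' "$out"
```

Results are assigned to variables before printing so that the order in which match and gsub set their side effects does not depend on how print evaluates its arguments.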
Patterns are arbitrary Boolean combinations
of regular expressions and
relational expressions.
Regular expressions are as in
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
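A minimal sketch of range patterns and of regular expressions inside relational expressions, assuming a POSIX awk and a made-up log text. The regular expression on the right of ~ is an ordinary string variable passed in with -v, which is the "any string may be used as a regular expression" case described above.

```shell
#!/bin/sh
# Minimal sketch, assuming a POSIX awk on PATH; the log text is made up.
log='noise
start
one
two
stop
more noise
start
three
stop'

# Range pattern: every line from a "start" line through the next "stop"
# line, inclusive; matching begins again at the following "start".
ranges=$(printf '%s\n' "$log" | awk '/^start$/, /^stop$/')
printf '%s\n' "$ranges"

# ~ and !~ inside a Boolean combination; re holds a dynamic regular expression.
picked=$(printf '%s\n' "$log" | awk -v re='^t' '$0 ~ re && $0 !~ /ee/')
printf '%s\n' "$picked"    # two
```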
A relational expression is one of the following:
.I expression matchop regular-expression
.I expression relop expression
.IB expression " in " array-name
.BI ( expr , expr,... ") in " array-name
is any of the six relational operators in C,
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
may be used to capture control before the first input line is read
do not combine with other patterns.
They may appear multiple times in a program and execute
in the order they are read by
Variable names with special meanings:
argument count, assignable.
argument array, assignable;
non-null members are taken as filenames.
conversion format used when converting numbers
array of environment variables; subscripts are names.
the name of the current input file.
ordinal number of the current record in the current file.
regular expression used to separate fields; also settable
number of fields in the current record.
ordinal number of the current record.
output format for numbers (default
output field separator (default space).
output record separator (default newline).
the length of a string matched by
input record separator (default newline).
If empty, blank lines separate records.
If more than one character long,
is treated as a regular expression, and records are
separated by text matching the expression.
the start position of a string matched by
separates multiple subscripts (default 034).
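A minimal sketch of several special variables, assuming a POSIX awk on the search path and a system that provides mktemp -d; the temporary files f1 and f2 and their contents are made up. It shows NR versus FNR across two files, OFS between print items, paragraph mode with an empty RS, and a multiple subscript that is stored under one key joined with SUBSEP and can be tested with the (i, j) in array form.

```shell
#!/bin/sh
# Minimal sketch, assuming a POSIX awk and mktemp -d; files and data are made up.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
printf 'a\nb\n' > "$tmp/f1"
printf 'c\n' > "$tmp/f2"

# NR counts records across all input files; FNR restarts at 1 in each file.
nums=$(awk '{ print FNR ":" NR }' "$tmp/f1" "$tmp/f2")

# OFS is written between the comma-separated items of a print statement.
joined=$(printf 'x y z\n' | awk 'BEGIN { OFS = "-" } { print $1, $3 }')

# An empty RS makes one or more blank lines separate records.
paras=$(printf 'a b\nc\n\n\nd e\n' | awk 'BEGIN { RS = "" } { print NR ": " NF }')

# grid[1, 2] is stored under the single key 1 SUBSEP 2.
keys=$(awk 'BEGIN {
    grid[1, 2] = "x"
    if ((1, 2) in grid) print "found"
    for (k in grid) { split(k, idx, SUBSEP); print idx[1], idx[2] }
}')

printf '%s\n' "$nums" "$joined" "$paras" "$keys"
```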
Functions may be defined (at the position of a pattern-action statement) thus:
function foo(a, b, c) { ...; return x }
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
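A minimal sketch of these calling rules, assuming a POSIX awk on the search path; the function names fact and bump are hypothetical and chosen only for illustration. The scalar argument is copied, the array argument is shared with the caller, and the extra parameter i acts as a local, so the global i keeps its value.

```shell
#!/bin/sh
# Minimal sketch, assuming a POSIX awk on PATH; fact and bump are hypothetical names.
out=$(awk '
# Recursion is allowed.
function fact(n) { return (n <= 1) ? 1 : n * fact(n - 1) }

# x is a scalar, so bump works on a copy; a is an array, so changes are
# visible to the caller. i is never supplied by the caller, so it serves
# as a local variable and the global i is untouched.
function bump(x, a,    i) {
    x = x + 100
    for (i = 1; i <= 3; i++)
        a[i] = i * 10
    return x
}

BEGIN {
    v = 1; i = 7
    split("", arr)          # portable way to start with an empty array
    r = bump(v, arr)
    print fact(5), v, r, arr[2], i
}')
printf '%s\n' "$out"    # 120 1 101 20 7
```

By convention the extra local parameters are separated from the real ones by additional spaces, as in bump above.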
.SH ENVIRONMENT VARIABLES
is set in the environment, then
follows the POSIX rules for
with respect to consecutive backslashes and ampersands.
Print lines longer than 72 characters.
Print first two fields in opposite order.
BEGIN { FS = ",[ \et]*|[ \et]+" }
Same, with input fields separated by comma and/or spaces and tabs.
END { print "sum is", s, " average is", s/NR }
Add up first column, print sum and average.
Print all lines between start/stop pairs.
BEGIN { # Simulate echo(1)
for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
.IR "The AWK Programming Language" ,
Addison-Wesley, 1988. ISBN 0-201-07981-X.
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
The scope rules for variables in functions are a botch;
Only eight-bit character sets are handled correctly.
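A minimal sketch of the conversion idioms described above, assuming a POSIX awk on the search path and made-up constants: comparing two string constants is a string comparison, adding 0 forces numeric values, and concatenating an empty string forces string values.

```shell
#!/bin/sh
# Minimal sketch, assuming a POSIX awk on PATH; the constants are made up.
out=$(awk 'BEGIN {
    a = "10"; b = "10.0"             # string constants
    as_strings = (a == b)            # 0: two strings compare as strings
    as_numbers = (a + 0 == b + 0)    # 1: adding 0 forces numeric values

    x = 9; y = 10                    # numeric constants
    num_lt = (x < y)                 # 1: numeric comparison
    str_lt = ((x "") < (y ""))       # 0: as strings, "9" sorts after "10"
    print as_strings, as_numbers, num_lt, str_lt
}')
printf '%s\n' "$out"    # 0 1 1 0
```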
.SH UNUSUAL FLOATING-POINT VALUES
was designed before IEEE 754 arithmetic defined Not-A-Number (NaN)
and Infinity values, which are supported by all modern floating-point
to convert string values to double-precision floating-point values,
modern C libraries also convert strings starting with
into infinity and NaN values respectively. This led to strange results,
with something like this:
echo nancy | awk '{ print $1 + 0 }'
now follows GNU AWK, and prefilters string values before attempting
to convert them to numbers, as follows:
.I "Hexadecimal values"
Hexadecimal values (allowed since C99) convert to zero, as they did
(case independent) convert to NaN. No others do.
(NaNs can have signs.)
(case independent) convert to positive and negative infinity, respectively.