1 .\" $OpenBSD: awk.1,v 1.44 2015/09/14 20:06:58 schwarze Exp $
3 .\" Copyright (C) Lucent Technologies 1997
4 .\" All Rights Reserved
6 .\" Permission to use, copy, modify, and distribute this software and
7 .\" its documentation for any purpose and without fee is hereby
8 .\" granted, provided that the above copyright notice appear in all
9 .\" copies and that both that the copyright notice and this
10 .\" permission notice and warranty disclaimer appear in supporting
11 .\" documentation, and that the name Lucent Technologies or any of
12 .\" its entities not be used in advertising or publicity pertaining
13 .\" to distribution of the software without specific, written prior
16 .\" LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
17 .\" INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
18 .\" IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
19 .\" SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
20 .\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
21 .\" IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
22 .\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
31 .Nd pattern-directed scanning and processing language
38 .Op Fl v Ar var Ns = Ns Ar value
39 .Op Ar prog | Fl f Ar progfile
45 for lines that match any of a set of patterns specified literally in
47 or in one or more files specified as
49 With each pattern there can be an associated action that will be performed
53 Each line is matched against the
54 pattern portion of every pattern-action statement;
55 the associated action is performed for each matched pattern.
58 means the standard input.
62 .Ar var Ns = Ns Ar value
63 is treated as an assignment, not a filename,
64 and is executed at the time it would have been opened if it were a filename.
66 The options are as follows:
67 .Bl -tag -width "-safe "
75 A value greater than 1 causes
77 to dump core on fatal errors.
79 Define the input field separator to be the regular expression
82 Read program code from the specified file
84 instead of from the command line.
91 .Ar cmd | Ic getline ,
95 and access to the environment
97 see the section on variables below).
99 .Pq and not very reliable
105 Print the version number of
107 to standard output and exit.
108 .It Fl v Ar var Ns = Ns Ar value
118 options may be present.
121 The input is normally made up of input lines
123 separated by newlines, or by the value of
127 is null, then any number of blank lines are used as the record separator,
128 and newlines are used as field separators
129 (in addition to the value of
131 This is convenient when working with multi-line records.
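.Pp
As a minimal sketch of the multi-line record handling described above,
the following assumes hypothetical address data in which records are
separated by a blank line.
The function names and the sample data are illustrative only.

```shell
# Hypothetical sample data: two address records separated by a blank line.
records() {
    cat <<'EOF'
Jane Doe
123 Main St

John Roe
456 Oak Ave
EOF
}

# With RS set to the empty string, each blank-line-separated block is one
# record; setting FS to a newline makes each line of the block one field.
first_lines() {
    records | awk 'BEGIN { RS = ""; FS = "\n" }
                   { print NR ": " $1 " (" NF " lines)" }'
}

first_lines
```

Each output line names the first line of a record and the number of lines the record contains.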
133 An input line is normally made up of fields separated by whitespace,
134 or by the extended regular expression
137 The fields are denoted
141 refers to the entire line.
144 is null, the input line is split into one field per character.
145 While both gawk and mawk have the same behavior, it is unspecified in the
150 is a single space, then leading and trailing blank and newline characters are
152 Fields are delimited by one or more blank or newline characters.
153 A blank character is a space or a tab.
156 is a single character, other than space, fields are delimited by each single
157 occurrence of that character.
160 variable defaults to a single space.
162 Normally, any number of blanks separate fields.
163 In order to set the field separator to a single blank, use the
165 option with a value of
167 If a field separator of
173 had been specified and uses
175 as the field separator.
176 In order to use a literal
178 as the field separator, use the
180 option with a value of
183 A pattern-action statement has the form
185 .D1 Ar pattern Ic \&{ Ar action Ic \&}
188 .Ic \&{ Ar action Ic \&}
189 means print the line;
190 a missing pattern always matches.
191 Pattern-action statements are separated by newlines or semicolons.
193 Newlines are permitted after a terminating statement or following a comma
206 or after the closing parenthesis of an
212 Additionally, a backslash
214 can be used to escape a newline between tokens.
216 An action is a sequence of statements.
217 A statement can be one of the following:
219 .Bl -tag -width Ds -offset indent -compact
220 .It Ic if Ar ( expression ) Ar statement Op Ic else Ar statement
221 .It Ic while Ar ( expression ) Ar statement
222 .It Ic for Ar ( expression ; expression ; expression ) statement
223 .It Ic for Ar ( var Ic in Ar array ) statement
224 .It Ic do Ar statement Ic while Ar ( expression )
233 .Ar var No = Ar expression
236 .Op Ar expression-list
237 .Op > Ns Ar expression
239 .It Xo Ic printf Ar format
240 .Op Ar ... , expression-list
241 .Op > Ns Ar expression
243 .It Ic return Op Ar expression
245 .No # skip remaining patterns on this input line
248 .No # skip rest of this file, open next, start at top
252 .Ar array Ic \&[ Ar expression Ic \&]
254 .No # delete an array element
256 .It Xo Ic delete Ar array
257 .No # delete all elements of array
261 .No # exit immediately; status is Ar expression
265 Statements are terminated by
266 semicolons, newlines or right braces.
271 String constants are quoted
273 with the usual C escapes recognized within
276 for a complete list of these).
277 Expressions take on string or numeric values as appropriate,
278 and are built using the operators
282 .Pq indicated by whitespace .
284 .Ic \&! ++ \-\- += \-= *= /= %= ^=
285 .Ic > >= < <= == != ?\&:
286 are also available in expressions.
287 Variables may be scalars, array elements
291 Variables are initialized to the null string.
292 Array subscripts may be any string,
293 not necessarily numeric;
294 this allows for a form of associative memory.
295 Multiple subscripts such as
297 are permitted; the constituents are concatenated,
298 separated by the value of
300 .Pq see the section on variables below .
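.Pp
The associative and multiple-subscript behavior described above can be
sketched as follows.
The function name, the array names, and the input are assumptions made
for illustration; the output is piped through sort because the order of
a for-in loop over an array is not defined.

```shell
# Count how often each (first field, second field) pair occurs.
count_pairs() {
    printf '%s\n' 'a x' 'b y' 'a x' | awk '
        { seen[$1, $2]++ }               # the key is $1 SUBSEP $2
        END {
            for (k in seen) {
                split(k, part, SUBSEP)   # recover the two subscripts
                print part[1], part[2], seen[k]
            }
        }' | sort
}

count_pairs
```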
304 statement prints its arguments on the standard output
309 is present or on a pipe if
311 is present), separated by the current output field separator,
312 and terminated by the output record separator.
316 may be literal names or parenthesized expressions;
317 identical string values in different statements denote
321 statement formats its expression list according to the format
325 Patterns are arbitrary Boolean combinations
328 of regular expressions and
329 relational expressions.
331 supports extended regular expressions
335 for more information on regular expressions.
336 Isolated regular expressions
337 in a pattern apply to the entire line.
338 Regular expressions may also occur in
339 relational expressions, using the operators
344 is a constant regular expression;
345 any string (constant or variable) may be used
346 as a regular expression, except in the position of an isolated regular expression
349 A pattern may consist of two patterns separated by a comma;
350 in this case, the action is performed for all lines
351 from an occurrence of the first pattern
352 through an occurrence of the second.
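.Pp
A small sketch of a comma-separated pattern pair, assuming hypothetical
marker lines BEGIN_CFG and END_CFG in the input:

```shell
# Print, with record numbers, every line from the first marker
# through the second marker, inclusive.
extract_block() {
    printf '%s\n' intro BEGIN_CFG 'a = 1' END_CFG outro |
        awk '/^BEGIN_CFG$/, /^END_CFG$/ { print NR ": " $0 }'
}

extract_block
```

The lines before the first marker and after the second marker are not printed.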
354 A relational expression is one of the following:
356 .Bl -tag -width Ds -offset indent -compact
357 .It Ar expression matchop regular-expression
358 .It Ar expression relop expression
359 .It Ar expression Ic in Ar array-name
361 .Ar expr , expr , \&... Ns Ic \&) in
368 is any of the six relational operators in C, and a
376 A conditional is an arithmetic expression,
377 a relational expression,
378 or a Boolean combination
385 may be used to capture control before the first input line is read
390 do not combine with other patterns.
392 Variable names with special meanings:
394 .Bl -tag -width "FILENAME " -compact
396 Argument count, assignable.
398 Argument array, assignable;
399 non-null members are taken as filenames.
401 Conversion format when converting numbers
405 Array of environment variables; subscripts are names.
407 The name of the current input file.
409 Ordinal number of the current record in the current file.
411 Regular expression used to separate fields; also settable
415 Number of fields in the current record.
417 can be used to obtain the value of the last field in the current record.
419 Ordinal number of the current record.
421 Output format for numbers (default
424 Output field separator (default blank).
426 Output record separator (default newline).
428 The length of the string matched by the
432 Input record separator (default newline).
434 The starting position of the string matched by the
438 Separates multiple subscripts (default 034).
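.Pp
As an illustration of a few of the variables listed above, the following
sketch sets the output field separator and prints the record number,
the field count, and the last field of each record.
The function name and the input are hypothetical.

```shell
# OFS separates the arguments of print; $NF is the last field.
show_vars() {
    printf '%s\n' 'a b c' 'd e' |
        awk 'BEGIN { OFS = "-" } { print NR, NF, $NF }'
}

show_vars
```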
441 The awk language has a variety of built-in functions:
442 arithmetic, string, input/output, general, and bit-operation.
444 Functions may be defined (at the position of a pattern-action statement)
447 .Dl function foo(a, b, c) { ...; return x }
449 Parameters are passed by value if scalar, and by reference if array name;
450 functions may be called recursively.
451 Parameters are local to the function; all other variables are global.
452 Thus local variables may be created by providing excess parameters in
453 the function definition.
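.Pp
The parameter rules above can be sketched as follows.
The function sum, its parameter names, and the input are assumptions
made for this example; the extra spaces in the parameter list are only a
common convention for marking the excess parameters used as locals.

```shell
sum_fields() {
    echo '3 4 5' | awk '
        # arr is passed by reference, n by value;
        # i and total are excess parameters serving as local variables.
        function sum(arr, n,    i, total) {
            for (i = 1; i <= n; i++)
                total += arr[i]
            return total
        }
        {
            n = split($0, v)
            i = 99               # global i, untouched by the call below
            print sum(v, n), i
        }'
}

sum_fields
```

The global variable i keeps its value because the i inside the function is a separate, local parameter.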
454 .Ss Arithmetic Functions
455 .Bl -tag -width "atan2(y, x)"
457 Return the arctangent of
467 Return the exponential of
472 truncated to an integer value.
474 Return the natural logarithm of
477 Return a random number,
481 .Pf 0 \*(Le Fa n No \*(Lt 1 .
490 Return the square root of
497 and returns the previous seed.
500 is omitted, the time of day is used instead.
503 .Bl -tag -width "split(s, a, fs)"
507 except that all occurrences of the regular expression are replaced.
509 returns the number of replacements.
515 occurs, or 0 if it does not.
522 if no argument is given.
526 where the regular expression
528 occurs, or 0 if it does not.
531 is set to the starting position of the matched string
532 .Pq which is the same as the returned value
533 or zero if no match is found.
536 is set to the length of the matched string,
537 or \-1 if no match is found.
542 .Va a[1] , a[2] , ... , a[n]
545 The separation is done with the regular expression
547 or with the field separator
552 An empty string as field separator splits the string
553 into one array element per character.
554 .It Fn sprintf fmt expr ...
555 The string resulting from formatting
564 for the first occurrence of the regular expression
577 is replaced in string
579 with regular expression
581 A literal ampersand can be specified by preceding it with two backslashes
583 A literal backslash can be specified by preceding it with another backslash
586 returns the number of replacements.
592 that begins at position
599 specifies more characters than are left in the string,
600 the length of the substring is limited by the length of
605 with all upper-case characters translated to their
606 corresponding lower-case equivalents.
610 with all lower-case characters translated to their
611 corresponding upper-case equivalents.
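.Pp
A brief sketch combining several of the string functions described
above, using a hypothetical input line and function name:

```shell
string_demo() {
    echo 'hello world' | awk '{
        if (match($0, /o w/))
            print RSTART, RLENGTH     # start and length of the match
        n = gsub(/o/, "[&]")          # & stands for the matched text
        print n, $0                   # number of replacements, new record
        print toupper(substr($0, 1, 4))
    }'
}

string_demo
```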
613 .Ss Input/Output and General Functions
614 .Bl -tag -width "getline [var] < file"
616 Closes the file or pipe
619 should match the string that was used to open the file or pipe.
620 .It Ar cmd | Ic getline Op Va var
621 Read a record of input from a stream piped from the output of
625 is omitted, the variables
633 If the stream is not open, it is opened.
634 As long as the stream remains open, subsequent calls
635 will read subsequent records from the stream.
636 The stream remains open until explicitly closed with a call to
639 returns 1 for a successful input, 0 for end of file, and \-1 for an error.
641 Flushes any buffered output for the file or pipe
643 or all open files or pipes if
647 should match the string that was used to open the file or pipe.
651 to the next input record from the current input file.
660 returns 1 for a successful input, 0 for end of file, and \-1 for an error.
661 .It Ic getline Va var
673 returns 1 for a successful input, 0 for end of file, and \-1 for an error.
675 .Ic getline Op Va var
680 to the next record from
684 is omitted, the variables
694 is not open, it is opened.
695 As long as the stream remains open, subsequent calls will read subsequent
699 remains open until explicitly closed with a call to
704 and returns its exit status.
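.Pp
The interaction between getline on a pipe and close can be sketched as
follows.
The command string and the function name are assumptions chosen for
illustration.

```shell
read_twice() {
    awk 'BEGIN {
        cmd = "echo a; echo b"            # hypothetical command
        while ((cmd | getline line) > 0)  # 1 per record, 0 at end of file
            n++
        print n, line
        close(cmd)                        # otherwise the stream stays open
        cmd | getline line                # runs cmd again from the start
        print line
    }'
}

read_twice
```

After the call to close, the same command string starts a new stream, so the final getline reads the first record again.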
706 .Ss Bit-Operation Functions
707 .Bl -tag -width "lshift(a, b)"
709 Returns the bitwise complement of integer argument x.
711 Performs a bitwise AND on all arguments provided, as integers.
712 There must be at least two values.
714 Performs a bitwise OR on all arguments provided, as integers.
715 There must be at least two values.
717 Performs a bitwise Exclusive-OR on all arguments provided, as integers.
718 There must be at least two values.
720 Returns integer argument x shifted by n bits to the left.
722 Returns integer argument x shifted by n bits to the right.
729 expression can modify the exit status.
731 Print lines longer than 72 characters:
735 Print first two fields in opposite order:
739 Same, with input fields separated by comma and/or blanks and tabs:
740 .Bd -literal -offset indent
741 BEGIN { FS = ",[ \et]*|[ \et]+" }
745 Add up first column, print sum and average:
746 .Bd -literal -offset indent
748 END { print "sum is", s, " average is", s/NR }
751 Print all lines between start/stop pairs:
756 .Bd -literal -offset indent
757 BEGIN { # Simulate echo(1)
758 for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
763 Print an error message to standard error:
764 .Bd -literal -offset indent
765 { print "error!" > "/dev/stderr" }
777 .%T The AWK Programming Language
780 .%O ISBN 0-201-07981-X
785 utility is compliant with the
790 does not support {n,m} pattern matching.
797 as well as the commands
798 .Cm fflush , compl , and , or ,
799 .Cm xor , lshift , rshift ,
800 are extensions to that specification.
807 There are no explicit conversions between numbers and strings.
808 To force an expression to be treated as a number add 0 to it;
809 to force it to be treated as a string concatenate
813 The scope rules for variables in functions are a botch;
815 .Sh DEPRECATED BEHAVIOR
816 One True Awk has accepted
820 to make it easier to specify tabs as the separator character.
821 Upstream One True Awk has deprecated this wart in the name of better
822 compatibility with other awk implementations like gawk and mawk.
829 However, since One True Awk used strtod to convert strings to floats, and since
831 is a valid hexadecimal representation of a floating point number,
835 has accepted this notation as an extension since One True Awk was imported in
837 Upstream One True Awk has restored the historical behavior for better
838 compatibility between the different awk implementations.
839 Both gawk and mawk already behave similarly.
843 will no longer accept this extension.
848 set the locale for many years to match the environment it was running in.
849 This led to pattern ranges, like
851 sometimes matching lower case characters in some locales.
852 This misbehavior was never in upstream One True Awk and has been removed as a