1 .\" $OpenBSD: awk.1,v 1.44 2015/09/14 20:06:58 schwarze Exp $
3 .\" Copyright (C) Lucent Technologies 1997
4 .\" All Rights Reserved
6 .\" Permission to use, copy, modify, and distribute this software and
7 .\" its documentation for any purpose and without fee is hereby
8 .\" granted, provided that the above copyright notice appear in all
9 .\" copies and that both that the copyright notice and this
10 .\" permission notice and warranty disclaimer appear in supporting
11 .\" documentation, and that the name Lucent Technologies or any of
12 .\" its entities not be used in advertising or publicity pertaining
13 .\" to distribution of the software without specific, written prior
16 .\" LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
17 .\" INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
18 .\" IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
19 .\" SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
20 .\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
21 .\" IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
22 .\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
29 .Nd pattern-directed scanning and processing language
36 .Op Fl v Ar var Ns = Ns Ar value
37 .Op Ar prog | Fl f Ar progfile
43 for lines that match any of a set of patterns specified literally in
45 or in one or more files specified as
47 With each pattern there can be an associated action that will be performed
51 Each line is matched against the
52 pattern portion of every pattern-action statement;
53 the associated action is performed for each matched pattern.
56 means the standard input.
60 .Ar var Ns = Ns Ar value
61 is treated as an assignment, not a filename,
62 and is executed at the time it would have been opened if it were a filename.
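.Pp
The sketch below illustrates when these assignments take effect. It assumes a POSIX sh and an awk that follows POSIX here; the function name assign_demo is hypothetical. The assignment x=1 takes effect after BEGIN but before standard input is read. The trailing x=2 takes effect after the last record and before END:

```shell
#!/bin/sh
# Sketch: show when var=value operands are executed relative to BEGIN,
# the input operand "-" (standard input), and END.
# assign_demo is a hypothetical helper name used only for illustration.
assign_demo() {
	printf 'a\nb\n' | awk '
		BEGIN { print "BEGIN: x=[" x "]" }   # no assignment has run yet
		      { print $0 ": x=[" x "]" }     # x=1 ran before "-" was opened
		END   { print "END: x=[" x "]" }     # x=2 ran after the last input
	' x=1 - x=2
}

assign_demo
```

Running it should print BEGIN: x=[], then a: x=[1] and b: x=[1], and finally END: x=[2].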
64 The options are as follows:
65 .Bl -tag -width "-safe "
73 A value greater than 1 causes
75 to dump core on fatal errors.
77 Define the input field separator to be the regular expression
80 Read program code from the specified file
82 instead of from the command line.
89 .Ar cmd | Ic getline ,
93 and access to the environment
95 see the section on variables below).
97 .Pq and not very reliable
103 Print the version number of
105 to standard output and exit.
106 .It Fl v Ar var Ns = Ns Ar value
116 options may be present.
119 The input is normally made up of input lines
121 separated by newlines, or by the value of
125 is null, then any number of blank lines are used as the record separator,
126 and newlines are used as field separators
127 (in addition to the value of
129 This is convenient when working with multi-line records.
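.Pp
Here is a minimal sketch of this paragraph mode. It assumes a POSIX sh; the helper name paragraph_demo and the address data are made up for illustration. When RS is the empty string, each block separated by blank lines becomes one record. With the default FS, the newline inside a block separates fields just as a blank does:

```shell
#!/bin/sh
# Sketch: RS = "" (paragraph mode). Runs of blank lines separate records,
# and newlines separate fields as well as blanks.
# paragraph_demo and its sample address data are hypothetical.
paragraph_demo() {
	printf 'Ann Smith\n12 Elm St\n\n\nBob Jones\n3 Oak Ave\n' | awk '
		BEGIN { RS = "" }
		{ print NR ": " $1 " " $2 ", " NF " fields" }
	'
}

paragraph_demo
```

Two blank lines separate the records in the input, yet only two records result, and each reports 5 fields.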
131 An input line is normally made up of fields separated by whitespace,
132 or by the extended regular expression
135 The fields are denoted
139 refers to the entire line.
142 is null, the input line is split into one field per character.
143 Although gawk and mawk both behave this way, the behavior is unspecified in the
148 is a single space, then leading and trailing blank and newline characters are
150 Fields are delimited by one or more blank or newline characters.
151 A blank character is a space or a tab.
154 is a single character, other than space, fields are delimited by each single
155 occurrence of that character.
158 variable defaults to a single space.
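.Pp
The three FS cases can be sketched as follows. This assumes a POSIX sh; fs_demo is a hypothetical name. The null FS case relies on behavior that POSIX leaves unspecified:

```shell
#!/bin/sh
# Sketch of the three FS cases described above; fs_demo is a hypothetical name.
fs_demo() {
	# FS is a single space (the default): leading and trailing blanks are
	# skipped and each run of blanks separates two fields.
	printf '  a   b  \n' | awk '{ print NF ": [" $1 "][" $2 "]" }'

	# FS is a single other character: every occurrence separates fields,
	# so two adjacent colons produce an empty field.
	printf 'a::b\n' | awk -F: '{ print NF ": [" $1 "][" $2 "][" $3 "]" }'

	# FS is null: one field per character (unspecified by POSIX, but
	# handled the same way by this awk, gawk and mawk).
	printf 'abc\n' | awk 'BEGIN { FS = "" } { print NF ": [" $1 "][" $2 "][" $3 "]" }'
}

fs_demo
```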
160 Normally, any number of blanks separate fields.
161 To set the field separator to a single blank, use the
163 option with a value of
165 If a field separator of
171 had been specified and uses
173 as the field separator.
174 To use a literal
176 as the field separator, use the
178 option with a value of
181 A pattern-action statement has the form
183 .D1 Ar pattern Ic \&{ Ar action Ic \&}
186 .Ic \&{ Ar action Ic \&}
187 means print the line;
188 a missing pattern always matches.
189 Pattern-action statements are separated by newlines or semicolons.
191 Newlines are permitted after a terminating statement or following a comma
204 or after the closing parenthesis of an
210 Additionally, a backslash
212 can be used to escape a newline between tokens.
214 An action is a sequence of statements.
215 A statement can be one of the following:
217 .Bl -tag -width Ds -offset indent -compact
218 .It Ic if Ar ( expression ) Ar statement Op Ic else Ar statement
219 .It Ic while Ar ( expression ) Ar statement
220 .It Ic for Ar ( expression ; expression ; expression ) statement
221 .It Ic for Ar ( var Ic in Ar array ) statement
222 .It Ic do Ar statement Ic while Ar ( expression )
231 .Ar var No = Ar expression
234 .Op Ar expression-list
235 .Op > Ns Ar expression
237 .It Xo Ic printf Ar format
238 .Op Ar ... , expression-list
239 .Op > Ns Ar expression
241 .It Ic return Op Ar expression
243 .No # skip remaining patterns on this input line
246 .No # skip rest of this file, open next, start at top
250 .Ar array Ic \&[ Ar expression Ic \&]
252 .No # delete an array element
254 .It Xo Ic delete Ar array
255 .No # delete all elements of array
259 .No # exit immediately; status is Ar expression
263 Statements are terminated by
264 semicolons, newlines or right braces.
269 String constants are quoted
271 with the usual C escapes recognized within
274 for a complete list of these).
275 Expressions take on string or numeric values as appropriate,
276 and are built using the operators
280 .Pq indicated by whitespace .
282 .Ic \&! ++ \-\- += \-= *= /= %= ^=
283 .Ic > >= < <= == != ?\&:
284 are also available in expressions.
285 Variables may be scalars, array elements
289 Variables are initialized to the null string.
290 Array subscripts may be any string,
291 not necessarily numeric;
292 this allows for a form of associative memory.
293 Multiple subscripts such as
295 are permitted; the constituents are concatenated,
296 separated by the value of
298 .Pq see the section on variables below .
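.Pp
Here is a small sketch of multiple subscripts. It assumes a POSIX sh; subsep_demo is a hypothetical helper. It shows that grid[1, 2] is stored under a single key, formed by joining the subscripts with SUBSEP:

```shell
#!/bin/sh
# Sketch: a subscript list such as [1, 2] is stored as the single string
# key "1" SUBSEP "2"; subsep_demo is a hypothetical helper name.
subsep_demo() {
	awk 'BEGIN {
		grid[1, 2] = "x"
		if (("1" SUBSEP "2") in grid) print "found by concatenation"
		if ((1, 2) in grid) print "found by (i, j) in grid"
		# Recover the parts of each key by splitting on SUBSEP.
		for (k in grid) {
			split(k, part, SUBSEP)
			print "row " part[1] ", column " part[2]
		}
	}'
}

subsep_demo
```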
302 statement prints its arguments on the standard output
307 is present or on a pipe if
309 is present), separated by the current output field separator,
310 and terminated by the output record separator.
314 may be literal names or parenthesized expressions;
315 identical string values in different statements denote
319 statement formats its expression list according to the format
323 Patterns are arbitrary Boolean combinations
326 of regular expressions and
327 relational expressions.
329 supports extended regular expressions
333 for more information on regular expressions.
334 Isolated regular expressions
335 in a pattern apply to the entire line.
336 Regular expressions may also occur in
337 relational expressions, using the operators
342 is a constant regular expression;
343 any string (constant or variable) may be used
344 as a regular expression, except in the position of an isolated regular expression
347 A pattern may consist of two patterns separated by a comma;
348 in this case, the action is performed for all lines
349 from an occurrence of the first pattern
350 through an occurrence of the second.
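.Pp
The following sketches range patterns. It assumes a POSIX sh; range_demo and its input are made up. Beyond the basic case, it shows two details that come from the POSIX definition of ranges rather than from this page. First, a range can begin and end on the same line. Second, a range whose second pattern never matches runs to the end of the input:

```shell
#!/bin/sh
# Sketch of range patterns; range_demo and its input are hypothetical.
# A range may start and end on the same line, and a range whose second
# pattern never matches continues to the end of the input.
range_demo() {
	printf 'a\nstart\nb\nstop\nc\nstart stop\nd\nstart\ne\n' |
		awk '/start/, /stop/'
}

range_demo
```

The lines a, c and d fall outside every range and are not printed.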
352 A relational expression is one of the following:
354 .Bl -tag -width Ds -offset indent -compact
355 .It Ar expression matchop regular-expression
356 .It Ar expression relop expression
357 .It Ar expression Ic in Ar array-name
359 .Ar expr , expr , \&... Ns Ic \&) in
366 is any of the six relational operators in C, and a
374 A conditional is an arithmetic expression,
375 a relational expression,
376 or a Boolean combination
383 may be used to capture control before the first input line is read
388 do not combine with other patterns.
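.Pp
Here is a minimal sketch of the special patterns. It assumes a POSIX sh; begin_end_demo is a hypothetical name. NR is still 0 in BEGIN and holds the total record count in END. A program consisting only of a BEGIN action does not read its input:

```shell
#!/bin/sh
# Sketch: BEGIN runs before any input is read and END after the last record,
# when NR holds the total record count. begin_end_demo is hypothetical.
begin_end_demo() {
	printf '3\n4\n5\n' | awk '
		BEGIN { print "before input: NR=" NR }
		      { sum += $1 }
		END   { print "after input: NR=" NR ", sum=" sum }
	'
	# A program with only a BEGIN action reads no input at all.
	awk 'BEGIN { print 6 * 7 }'
}

begin_end_demo
```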
390 Variable names with special meanings:
392 .Bl -tag -width "FILENAME " -compact
394 Argument count, assignable.
396 Argument array, assignable;
397 non-null members are taken as filenames.
399 Conversion format when converting numbers
403 Array of environment variables; subscripts are names.
405 The name of the current input file.
407 Ordinal number of the current record in the current file.
409 Regular expression used to separate fields; also settable
413 Number of fields in the current record.
415 can be used to obtain the value of the last field in the current record.
417 Ordinal number of the current record.
419 Output format for numbers (default
422 Output field separator (default blank).
424 Output record separator (default newline).
426 The length of the string matched by the
430 Input record separator (default newline).
432 The starting position of the string matched by the
436 Separates multiple subscripts (default 034).
439 The awk language has a variety of built-in functions:
440 arithmetic, string, input/output, general, and bit-operation.
442 Functions may be defined (at the position of a pattern-action statement)
445 .Dl function foo(a, b, c) { ...; return x }
447 Parameters are passed by value if scalar, and by reference if array name;
448 functions may be called recursively.
449 Parameters are local to the function; all other variables are global.
450 Thus local variables may be created by providing excess parameters in
451 the function definition.
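.Pp
The parameter rules above can be sketched as follows. This assumes a POSIX sh; the names func_demo, total and fact are hypothetical. The extra parameters i and sum act as local variables, so the global i is left untouched. The array argument, by contrast, is modified through its reference:

```shell
#!/bin/sh
# Sketch of user-defined functions; func_demo, total and fact are hypothetical.
func_demo() {
	awk '
	# i and sum are extra parameters that callers never pass, so they act as
	# locals. arr is an array, so it is passed by reference.
	function total(arr, n,    i, sum) {
		for (i = 1; i <= n; i++)
			sum += arr[i]
		arr["seen"] = 1        # this change is visible to the caller
		return sum
	}
	# Functions may call themselves recursively.
	function fact(k) {
		return k <= 1 ? 1 : k * fact(k - 1)
	}
	BEGIN {
		i = "global"
		n = split("1 2 3", v, " ")
		t = total(v, n)
		print t, i, ("seen" in v), fact(5)
	}'
}

func_demo
```

The output should be 6 global 1 120. The sum is 6, the global i is unchanged, the caller sees the element added by total, and 5! is 120.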
452 .Ss Arithmetic Functions
453 .Bl -tag -width "atan2(y, x)"
455 Return the arctangent of
465 Return the exponential of
470 truncated to an integer value.
472 Return the natural logarithm of
475 Return a random number,
479 .Pf 0 \*(Le Fa n No \*(Lt 1 .
488 Return the square root of
495 and returns the previous seed.
498 is omitted, the time of day is used instead.
501 .Bl -tag -width "split(s, a, fs)"
505 except that all occurrences of the regular expression are replaced.
507 returns the number of replacements.
513 occurs, or 0 if it does not.
520 if no argument is given.
524 where the regular expression
526 occurs, or 0 if it does not.
529 is set to the starting position of the matched string
530 .Pq which is the same as the returned value
531 or zero if no match is found.
534 is set to the length of the matched string,
535 or \-1 if no match is found.
540 .Va a[1] , a[2] , ... , a[n]
543 The separation is done with the regular expression
545 or with the field separator
550 An empty string as field separator splits the string
551 into one array element per character.
552 .It Fn sprintf fmt expr ...
553 The string resulting from formatting
562 for the first occurrence of the regular expression
575 is replaced in string
577 with regular expression
579 A literal ampersand can be specified by preceding it with two backslashes
581 A literal backslash can be specified by preceding it with another backslash
584 returns the number of replacements.
590 that begins at position
597 specifies more characters than are left in the string,
598 the length of the substring is limited by the length of
603 with all upper-case characters translated to their
604 corresponding lower-case equivalents.
608 with all lower-case characters translated to their
609 corresponding upper-case equivalents.
611 .Ss Input/Output and General Functions
612 .Bl -tag -width "getline [var] < file"
614 Closes the file or pipe
617 should match the string that was used to open the file or pipe.
618 .It Ar cmd | Ic getline Op Va var
619 Read a record of input from a stream piped from the output of
623 is omitted, the variables
631 If the stream is not open, it is opened.
632 As long as the stream remains open, subsequent calls
633 will read subsequent records from the stream.
634 The stream remains open until explicitly closed with a call to
637 returns 1 for a successful input, 0 for end of file, and \-1 for an error.
639 Flushes any buffered output for the file or pipe
641 or all open files or pipes if
645 should match the string that was used to open the file or pipe.
649 to the next input record from the current input file.
658 returns 1 for a successful input, 0 for end of file, and \-1 for an error.
659 .It Ic getline Va var
671 returns 1 for a successful input, 0 for end of file, and \-1 for an error.
673 .Ic getline Op Va var
678 to the next record from
682 is omitted, the variables
692 is not open, it is opened.
693 As long as the stream remains open, subsequent calls will read subsequent
697 remains open until explicitly closed with a call to
702 and returns its exit status.
704 .Ss Bit-Operation Functions
705 .Bl -tag -width "lshift(a, b)"
707 Returns the bitwise complement of integer argument x.
709 Performs a bitwise AND on all arguments provided, as integers.
710 There must be at least two values.
712 Performs a bitwise OR on all arguments provided, as integers.
713 There must be at least two values.
715 Performs a bitwise Exclusive-OR on all arguments provided, as integers.
716 There must be at least two values.
718 Returns integer argument x shifted by n bits to the left.
720 Returns integer argument x shifted by n bits to the right.
727 expression can modify the exit status.
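.Pp
Here is a sketch of how exit sets the status. It assumes a POSIX sh; exit_demo is a hypothetical name. Calling exit 2 in a main action still runs the END action. Because END does not call exit again, awk terminates with status 2:

```shell
#!/bin/sh
# Sketch: exit in a main action still runs the END action, and the value given
# to exit becomes the exit status of awk. exit_demo is a hypothetical name.
exit_demo() {
	st=0
	printf 'ok\nbad\nok\n' | awk '
		$0 == "bad" { bad = NR; exit 2 }
		END { print "stopped at record " bad }
	' || st=$?
	echo "awk exit status: $st"
}

exit_demo
```

The status is captured with || so the sketch also works under set -e.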
729 Print lines longer than 72 characters:
733 Print first two fields in opposite order:
737 Same, with input fields separated by comma and/or blanks and tabs:
738 .Bd -literal -offset indent
739 BEGIN { FS = ",[ \et]*|[ \et]+" }
743 Add up first column, print sum and average:
744 .Bd -literal -offset indent
746 END { print "sum is", s, "average is", s/NR }
749 Print all lines between start/stop pairs:
754 .Bd -literal -offset indent
755 BEGIN { # Simulate echo(1)
756 for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
761 Print an error message to standard error:
762 .Bd -literal -offset indent
763 { print "error!" > "/dev/stderr" }
775 .%T The AWK Programming Language
778 .%O ISBN 0-201-07981-X
783 utility is compliant with the
788 does not support {n,m} pattern matching.
795 as well as the functions
796 .Cm fflush , compl , and , or ,
797 .Cm xor , lshift , rshift ,
798 are extensions to that specification.
805 There are no explicit conversions between numbers and strings.
806 To force an expression to be treated as a number add 0 to it;
807 to force it to be treated as a string concatenate
811 The scope rules for variables in functions are a botch;
813 .Sh DEPRECATED BEHAVIOR
814 One True Awk has accepted
818 to make it easier to specify tabs as the separator character.
819 Upstream One True Awk has deprecated this wart in the name of better
820 compatibility with other awk implementations like gawk and mawk.
827 However, since One True Awk used strtod to convert strings to floats, and since
829 is a valid hexadecimal representation of a floating point number,
833 has accepted this notation as an extension since One True Awk was imported in
835 Upstream One True Awk has restored the historical behavior for better
836 compatibility between the different awk implementations.
837 Both gawk and mawk already behave similarly.
841 will no longer accept this extension.
846 set the locale for many years to match the environment it was running in.
847 This led to pattern ranges, like
849 sometimes matching lower case characters in some locales.
850 This misbehavior was never in upstream One True Awk and has been removed as a