.\" $OpenBSD: awk.1,v 1.44 2015/09/14 20:06:58 schwarze Exp $
.\" Copyright (C) Lucent Technologies 1997
.\" All Rights Reserved
.\" Permission to use, copy, modify, and distribute this software and
.\" its documentation for any purpose and without fee is hereby
.\" granted, provided that the above copyright notice appear in all
.\" copies and that both that the copyright notice and this
.\" permission notice and warranty disclaimer appear in supporting
.\" documentation, and that the name Lucent Technologies or any of
.\" its entities not be used in advertising or publicity pertaining
.\" to distribution of the software without specific, written prior
.\" LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
.\" INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
.\" IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
.\" SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
.\" IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
.\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
.Dd $Mdocdate: June 6 2020 $
.Nd pattern-directed scanning and processing language
.Op Fl v Ar var Ns = Ns Ar value
.Op Ar prog | Fl f Ar progfile
for lines that match any of a set of patterns specified literally in
or in one or more files specified as
With each pattern there can be an associated action that will be performed
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
means the standard input.
.Ar var Ns = Ns Ar value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
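.Pp
As a minimal sketch of that timing, assuming a POSIX shell and an invented
variable name
.Va n ,
the operand assignment below takes effect only when it is reached in the
operand list, so it is not yet visible in a
.Ic BEGIN
action:
.Bd -literal -offset indent
echo x | awk 'BEGIN { print "begin[" n "]" } { print "main[" n "]" }' n=5 -
.Ed
.Pp
This should print
.Ql begin[]
followed by
.Ql main[5] .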
The options are as follows:
.Bl -tag -width "-safe "
A value greater than 1 causes
to dump core on fatal errors.
Define the input field separator to be the regular expression
Read program code from the specified file
instead of from the command line.
.Ar cmd | Ic getline ,
and access to the environment
see the section on variables below).
.Pq and not very reliable
Print the version number of
to standard output and exit.
.It Fl v Ar var Ns = Ns Ar value
options may be present.
The input is normally made up of input lines
separated by newlines, or by the value of
is null, then any number of blank lines are used as the record separator,
and newlines are used as field separators
(in addition to the value of
This is convenient when working with multi-line records.
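.Pp
A short illustration, assuming a POSIX shell and made-up input consisting of
two records separated by a blank line:
.Bd -literal -offset indent
{ echo Alice; echo Paris; echo; echo Bob; echo Rome; } |
    awk 'BEGIN { RS = "" } { print $1 " lives in " $2 }'
.Ed
.Pp
Because newlines also separate fields in this mode, each two-line record
yields two fields, and the output should be
.Ql Alice lives in Paris
followed by
.Ql Bob lives in Rome .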
An input line is normally made up of fields separated by whitespace,
or by the regular expression
The fields are denoted
refers to the entire line.
is null, the input line is split into one field per character.
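.Pp
For illustration, assuming a POSIX shell and an invented colon-separated
input line, the following sketch prints the first field, the last field
.Pq Va $NF
and the whole line
.Pq Va $0 :
.Bd -literal -offset indent
echo "root:x:0:0" | awk -F: '{ print $1; print $NF; print $0 }'
.Ed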
Normally, any number of blanks separate fields.
In order to set the field separator to a single blank, use the
option with a value of
If a field separator of
had been specified and uses
as the field separator.
In order to use a literal
as the field separator, use the
option with a value of
A pattern-action statement has the form
.D1 Ar pattern Ic \&{ Ar action Ic \&}
.Ic \&{ Ar action Ic \&}
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
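.Pp
A sketch of these defaults, assuming a POSIX shell and invented numeric
input.
The first statement has a pattern but no action, the second an action but
no pattern:
.Bd -literal -offset indent
{ echo 5; echo 15; echo 25; } | awk '
    $1 > 10                     # no action: print the matching line
    { n++ }                     # no pattern: runs for every line
    END { print n, "lines read" }'
.Ed
.Pp
The two matching lines should be printed, followed by
.Ql 3 lines read .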
Newlines are permitted after a terminating statement or following a comma
or after the closing parenthesis of an
Additionally, a backslash
can be used to escape a newline between tokens.
An action is a sequence of statements.
A statement can be one of the following:
.Bl -tag -width Ds -offset indent -compact
.It Ic if Ar ( expression ) Ar statement Op Ic else Ar statement
.It Ic while Ar ( expression ) Ar statement
.It Ic for Ar ( expression ; expression ; expression ) statement
.It Ic for Ar ( var Ic in Ar array ) statement
.It Ic do Ar statement Ic while Ar ( expression )
.Ar var No = Ar expression
.Op Ar expression-list
.Op > Ns Ar expression
.It Xo Ic printf Ar format
.Op Ar ... , expression-list
.Op > Ns Ar expression
.It Ic return Op Ar expression
.No # skip remaining patterns on this input line
.No # skip rest of this file, open next, start at top
.Ar array Ic \&[ Ar expression Ic \&]
.No # delete an array element
.It Xo Ic delete Ar array
.No # delete all elements of array
.No # exit immediately; status is Ar expression
Statements are terminated by
semicolons, newlines or right braces.
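.Pp
The following sketch, which assumes a POSIX shell and an invented array
named
.Va seen ,
combines a few of these statements:
.Bd -literal -offset indent
awk 'BEGIN {
         i = 0
         do {
             i++
             if (i % 2) continue    # skip odd values
             seen[i] = 1
         } while (i < 6)
         delete seen[4]             # delete a single element
         for (k in seen) n++
         print n
     }'
.Ed
.Pp
Three even values are recorded and one is deleted, so this should print
.Ql 2 .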
String constants are quoted
with the usual C escapes recognized within
for a complete list of these).
Expressions take on string or numeric values as appropriate,
and are built using the operators
.Pq indicated by whitespace .
.Ic \&! ++ \-\- += \-= *= /= %= ^=
.Ic > >= < <= == != ?\&:
are also available in expressions.
Variables may be scalars, array elements
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
are permitted; the constituents are concatenated,
separated by the value of
.Pq see the section on variables below .
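.Pp
Two small sketches, assuming a POSIX shell; the array names
.Va count ,
.Va m
and
.Va p
are invented for illustration.
The first uses string subscripts to count words, the second splits a
multiple subscript back into its constituents:
.Bd -literal -offset indent
echo "a b a c b a" | awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
    END { print count["a"], count["b"], count["c"] }'
awk 'BEGIN { m[1, 2] = "x"
             for (k in m) { split(k, p, SUBSEP); print p[1], p[2], m[k] } }'
.Ed
.Pp
These should print
.Ql 3 2 1
and
.Ql 1 2 x
respectively.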
statement prints its arguments on the standard output
is present or on a pipe if
is present), separated by the current output field separator,
and terminated by the output record separator.
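.Pp
As a hedged illustration, assuming a POSIX shell with
.Xr sort 1
available and invented two-field input, the sketch below sets the output
field separator and sends the printed records to a pipe:
.Bd -literal -offset indent
{ echo "b 2"; echo "a 1"; } |
    awk 'BEGIN { OFS = ":" } { print $1, $2 | "sort" }'
.Ed
.Pp
The sorted output should be
.Ql a:1
followed by
.Ql b:2 .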
may be literal names or parenthesized expressions;
identical string values in different statements denote
statement formats its expression list according to the format
Patterns are arbitrary Boolean combinations
of regular expressions and
relational expressions.
supports extended regular expressions
for more information on regular expressions.
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
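.Pp
The two patterns need not be regular expressions.
A minimal sketch, assuming a POSIX shell and invented five-line input,
that selects records two through four by record number:
.Bd -literal -offset indent
{ echo a; echo b; echo c; echo d; echo e; } | awk 'NR == 2, NR == 4'
.Ed
.Pp
This should print the lines
.Ql b ,
.Ql c
and
.Ql d .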
A relational expression is one of the following:
.Bl -tag -width Ds -offset indent -compact
.It Ar expression matchop regular-expression
.It Ar expression relop expression
.It Ar expression Ic in Ar array-name
.Ar expr , expr , \&... Ns Ic \&) in
is any of the six relational operators in C, and a
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
may be used to capture control before the first input line is read
do not combine with other patterns.
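.Pp
A brief sketch of both special patterns, assuming a POSIX shell and
invented numeric input; the variable name
.Va s
is arbitrary:
.Bd -literal -offset indent
{ echo 3; echo 4; } |
    awk 'BEGIN { print "start" } { s += $1 } END { print "total", s }'
.Ed
.Pp
The output should be
.Ql start
followed by
.Ql total 7 .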
Variable names with special meanings:
.Bl -tag -width "FILENAME " -compact
Argument count, assignable.
Argument array, assignable;
non-null members are taken as filenames.
Conversion format when converting numbers
Array of environment variables; subscripts are names.
The name of the current input file.
Ordinal number of the current record in the current file.
Regular expression used to separate fields; also settable
Number of fields in the current record.
can be used to obtain the value of the last field in the current record.
Ordinal number of the current record.
Output format for numbers (default
Output field separator (default blank).
Output record separator (default newline).
The length of the string matched by the
Input record separator (default newline).
The starting position of the string matched by the
Separates multiple subscripts (default 034).
The awk language has a variety of built-in functions:
arithmetic, string, input/output, general, and bit-operation.
Functions may be defined (at the position of a pattern-action statement)
.Dl function foo(a, b, c) { ...; return x }
Parameters are passed by value if scalar, and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
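.Pp
A minimal sketch of the excess-parameter idiom, assuming a POSIX shell;
the function name
.Fn fact
and the variable
.Va r
are invented for illustration.
The extra spacing in the parameter list is only a common convention for
marking locals:
.Bd -literal -offset indent
awk 'function fact(n,    r) {    # r is a local: an extra parameter
         r = 1
         while (n > 1) r *= n--
         return r
     }
     BEGIN { r = "global"; print fact(5), r }'
.Ed
.Pp
The global
.Va r
is untouched by the call, so this should print
.Ql 120 global .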
.Ss Arithmetic Functions
.Bl -tag -width "atan2(y, x)"
Return the arctangent of
Return the exponential of
truncated to an integer value.
Return the natural logarithm of
Return a random number,
.Pf 0 \*(Le Fa n No \*(Lt 1 .
Return the square root of
and returns the previous seed.
is omitted, the time of day is used instead.
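.Pp
A short sketch exercising several of the arithmetic functions above,
assuming a POSIX shell; the seed value 42 and the variable names are
arbitrary.
Reseeding with the same value is expected to reproduce the same random
number:
.Bd -literal -offset indent
awk 'BEGIN { print int(3.9), int(-3.9), sqrt(16), exp(0), log(1)
             srand(42); a = rand(); srand(42); b = rand()
             same = (a == b); inrange = (a >= 0 && a < 1)
             print same, inrange }'
.Ed
.Pp
This should print
.Ql 3 -3 4 1 0
followed by
.Ql 1 1 .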
.Bl -tag -width "split(s, a, fs)"
except that all occurrences of the regular expression are replaced.
returns the number of replacements.
occurs, or 0 if it does not.
if no argument is given.
where the regular expression
occurs, or 0 if it does not.
is set to the starting position of the matched string
.Pq which is the same as the returned value
or zero if no match is found.
is set to the length of the matched string,
or \-1 if no match is found.
.Va a[1] , a[2] , ... , a[n]
The separation is done with the regular expression
or with the field separator
An empty string as field separator splits the string
into one array element per character.
.It Fn sprintf fmt expr ...
The string resulting from formatting
for the first occurrence of the regular expression
is replaced in string
with regular expression
A literal ampersand can be specified by preceding it with two backslashes
A literal backslash can be specified by preceding it with another backslash
returns the number of replacements.
that begins at position
specifies more characters than are left in the string,
the length of the substring is limited by the length of
with all upper-case characters translated to their
corresponding lower-case equivalents.
with all lower-case characters translated to their
corresponding upper-case equivalents.
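.Pp
The string functions above can be tried together in a sketch such as the
following, which assumes a POSIX shell; the sample strings and the names
.Va s ,
.Va part
and
.Va k
are invented:
.Bd -literal -offset indent
awk 'BEGIN {
         s = "hello world"
         n = gsub(/o/, "0", s)           # s is now "hell0 w0rld"
         print n, s
         print index(s, "w"), length(s), substr(s, 1, 4)
         if (match(s, /w[0-9a-z]+/)) print RSTART, RLENGTH
         k = split("a:b:c", part, ":")
         print k, part[3], toupper(part[1])
     }'
.Ed
.Pp
The four output lines should be
.Ql 2 hell0 w0rld ,
.Ql 7 11 hell ,
.Ql 7 5
and
.Ql 3 c A .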
.Ss Input/Output and General Functions
.Bl -tag -width "getline [var] < file"
Closes the file or pipe
should match the string that was used to open the file or pipe.
.It Ar cmd | Ic getline Op Va var
Read a record of input from a stream piped from the output of
is omitted, the variables
If the stream is not open, it is opened.
As long as the stream remains open, subsequent calls
will read subsequent records from the stream.
The stream remains open until explicitly closed with a call to
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
Flushes any buffered output for the file or pipe
or all open files or pipes if
should match the string that was used to open the file or pipe.
to the next input record from the current input file.
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Ic getline Va var
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.Ic getline Op Va var
to the next record from
is omitted, the variables
is not open, it is opened.
As long as the stream remains open, subsequent calls will read subsequent
remains open until explicitly closed with a call to
and returns its exit status.
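.Pp
A hedged sketch of reading from a command, closing the stream and running
a command for its exit status, assuming a POSIX shell; the command strings
and variable names are invented.
The loop tests for a return value greater than 0 so that both end of file
and errors end it:
.Bd -literal -offset indent
awk 'BEGIN {
         cmd = "echo one; echo two"
         while ((cmd | getline line) > 0) n++
         close(cmd)                      # same string that opened the pipe
         status = system("exit 3")
         print n, line, status
     }'
.Ed
.Pp
This should print
.Ql 2 two 3 .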
.Ss Bit-Operation Functions
.Bl -tag -width "lshift(a, b)"
Returns the bitwise complement of integer argument x.
Performs a bitwise AND on all arguments provided, as integers.
There must be at least two values.
Performs a bitwise OR on all arguments provided, as integers.
There must be at least two values.
Performs a bitwise Exclusive-OR on all arguments provided, as integers.
There must be at least two values.
Returns integer argument x shifted by n bits to the left.
Returns integer argument x shifted by n bits to the right.
expression can modify the exit status.
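.Pp
For example, assuming a POSIX shell, the following sketch sets the exit
status from within the program and then reports it:
.Bd -literal -offset indent
awk 'BEGIN { exit 3 }' || echo "status: $?"
.Ed
.Pp
This should print
.Ql status: 3 .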
Print lines longer than 72 characters:
Print first two fields in opposite order:
Same, with input fields separated by comma and/or blanks and tabs:
.Bd -literal -offset indent
BEGIN { FS = ",[ \et]*|[ \et]+" }
Add up first column, print sum and average:
.Bd -literal -offset indent
END { print "sum is", s, " average is", s/NR }
Print all lines between start/stop pairs:
.Bd -literal -offset indent
BEGIN { # Simulate echo(1)
for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
Print an error message to standard error:
.Bd -literal -offset indent
{ print "error!" > "/dev/stderr" }
.%T The AWK Programming Language
.%O ISBN 0-201-07981-X
utility is compliant with the
does not support {n,m} pattern matching.
as well as the commands
.Cm fflush , compl , and , or ,
.Cm xor , lshift , rshift ,
are extensions to that specification.
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
The scope rules for variables in functions are a botch;