The previous
subsection
discussed the use of single characters or simple strings as the
value of FS
.
More generally, the value of FS
may be a string containing any
regular expression. In this case, each match in the record for the regular
expression separates fields. For example, the assignment:
FS = ", \t"
makes every area of an input line that consists of a comma followed by a space and a tab, into a field separator. (`\t' is an escape sequence that stands for a tab; see section Escape Sequences, for the complete list of similar escape sequences.)
For a less trivial example of a regular expression, suppose you want
single spaces to separate fields the way single commas were used above.
You can set FS
to "[ ]"
(left bracket, space, right
bracket). This regular expression matches a single space and nothing else
(see section Regular Expressions).
There is an important difference between the two cases of `FS = " "'
(a single space) and `FS = "[ \t\n]+"' (left bracket, space,
backslash, "t", backslash, "n", right bracket, which is a regular
expression matching one or more spaces, tabs, or newlines). For both
values of FS
, fields are separated by runs of spaces, tabs
and/or newlines. However, when the value of FS
is "
"
, awk
will first strip leading and trailing whitespace from
the record, and then decide where the fields are.
For example, the following pipeline prints `b':
$ echo ' a b c d ' | awk '{ print $2 }' -| b
However, this pipeline prints `a' (note the extra spaces around each letter):
$ echo ' a b c d ' | awk 'BEGIN { FS = "[ \t]+" } > { print $2 }' -| a
In this case, the first field is null, or empty.
The stripping of leading and trailing whitespace also comes into
play whenever $0
is recomputed. For instance, study this pipeline:
$ echo ' a b c d' | awk '{ print; $2 = $2; print }' -| a b c d -| a b c d
The first print
statement prints the record as it was read,
with leading whitespace intact. The assignment to $2
rebuilds
$0
by concatenating $1
through $NF
together,
separated by the value of OFS
. Since the leading whitespace
was ignored when finding $1
, it is not part of the new $0
.
Finally, the last print
statement prints the new $0
.
Go to the first, previous, next, last section, table of contents.