The GNU Awk User's Guide - Field Splitting Summary


Field Splitting Summary

According to the POSIX standard, awk is supposed to behave as if each record is split into fields at the time that it is read. In particular, this means that you can change the value of FS after a record is read, and the values of the fields (i.e., how they were split) should reflect the old value of FS, not the new one.

However, many implementations of awk do not work this way. Instead, they defer splitting the fields until a field is actually referenced, so the fields are split using the current value of FS! (d.c.) This behavior can be difficult to diagnose. The following example illustrates the difference between the two methods. (The sed command prints just the first line of `/etc/passwd'.)

sed 1q /etc/passwd | awk '{ FS = ":" ; print $1 }'

will usually print

root

on an incorrect implementation of awk, while gawk will print something like

root:nSijPlPhZZwgE:0:0:Root:/:
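
To get the same answer from either kind of implementation, a script can set FS before the first record is read, or explicitly re-split the record after changing FS. The following is a minimal sketch rather than a definitive recipe. The sample record is made up for illustration and stands in for the first line of `/etc/passwd', and the third variant relies on awk splitting the record again whenever $0 is assigned.

```shell
#!/bin/sh
# Hypothetical sample record, standing in for the first line of /etc/passwd.
line='root:x:0:0:Root:/:/bin/sh'

# 1. Assign FS in a BEGIN rule, which runs before any record is read.
from_begin=$(printf '%s\n' "$line" | awk 'BEGIN { FS = ":" } { print $1 }')

# 2. Use the -F option, which also sets FS before input starts.
from_option=$(printf '%s\n' "$line" | awk -F: '{ print $1 }')

# 3. Change FS inside a rule, then assign $0 to itself so that the
#    current record is split again using the new FS.
from_resplit=$(printf '%s\n' "$line" | awk '{ FS = ":"; $0 = $0; print $1 }')

echo "$from_begin"
echo "$from_option"
echo "$from_resplit"
```

All three variants print `root', because in each case the record is split only after FS already holds ":".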

The following table summarizes how fields are split, based on the value of FS. (`==' means "is equal to.")

FS == " ": Fields are separated by runs of whitespace. Leading and trailing whitespace are ignored. This is the default.
FS == any other single character: Fields are separated by each occurrence of the character. Multiple successive occurrences delimit empty fields, as do leading and trailing occurrences. The character can even be a regexp metacharacter; it does not need to be escaped.
FS == regexp: Fields are separated by occurrences of characters that match regexp. Leading and trailing matches of regexp delimit empty fields.
FS == "": Each individual character in the record becomes a separate field. (This is a common extension; it is not specified by the POSIX standard.)
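
The rules in the table can be checked by counting fields with NF. The following is a small sketch; the input strings are invented for illustration, and FS is set in a BEGIN rule each time so that it is in effect before the record is read.

```shell
#!/bin/sh
# FS == " " (the default): runs of whitespace separate fields, and
# leading and trailing whitespace is ignored.
nf_default=$(printf '  a   b  c \n' | awk '{ print NF }')

# FS == ":" (a single character): every occurrence is a separator, so
# leading, trailing, and doubled colons delimit empty fields.
nf_colon=$(printf ':a::b:\n' | awk 'BEGIN { FS = ":" } { print NF }')

# FS == "|" (a single regexp metacharacter): taken literally, no escaping.
nf_pipe=$(printf 'a|b|c\n' | awk 'BEGIN { FS = "|" } { print NF }')

# FS == "[0-9]+" (a regexp): each run of digits is one separator, and
# the leading match delimits an empty first field.
nf_regexp=$(printf '12a345b6c\n' | awk 'BEGIN { FS = "[0-9]+" } { print NF }')

echo "default=$nf_default colon=$nf_colon pipe=$nf_pipe regexp=$nf_regexp"
```

This prints `default=3 colon=5 pipe=3 regexp=4'. The FS == "" case is left out of the sketch because POSIX does not specify it, so its result can depend on the awk in use.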
