The GNU Awk User's Guide - Leftmost Longest


How Much Text Matches?

Consider the following example:

echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }'

This example uses the sub function (which we haven't discussed yet; see section Built-in Functions for String Manipulation) to make a change to the input record. Here, the regexp /a+/ indicates "one or more `a' characters," and the replacement text is `<A>'.

The input contains four `a' characters. What will the output be? In other words, how many is "one or more": will awk match two, three, or all four `a' characters?

The answer is, awk (and POSIX) regular expressions always match the leftmost, longest sequence of input characters that can match. Thus, in this example, all four `a' characters are replaced with `<A>'.

$ echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }'
-| <A>bcd
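
To see both halves of the rule at work, here is a small sketch. The input strings and the shell variable names are made up for illustration; they are not part of the example above. The first command shows "leftmost" taking priority: the match begins at the earliest possible position, even though a longer run of `a' characters appears later in the input. The second shows "longest": when two alternatives can match starting at the same position, the longer one is chosen, regardless of the order in which the alternatives are written.

```shell
# Leftmost comes first: the two-character run at position 2 is matched,
# not the longer three-character run further along.
leftmost=$(echo xaayaaaz | awk '{ sub(/a+/, "<A>"); print }')
echo "$leftmost"

# Among matches that start at the same position, the longest is chosen,
# even though the shorter alternative "ab" is listed first.
longest=$(echo abcd | awk '{ sub(/ab|abcd/, "<X>"); print }')
echo "$longest"
```

With a POSIX-compliant awk, this should print `x<A>yaaaz' and then `<X>'.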

For simple match/no-match tests, this is not so important. But it is very important for text matching and substitutions with the match, sub, gsub, and gensub functions, and for regexp-based record and field splitting (see section How Input is Split into Records, and also see section Specifying How Fields are Separated).
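
The following sketch suggests how the rule shows up in those situations. The input strings, the `:+' field separator, and the shell variable names are assumptions chosen for illustration only. It uses match (which sets RSTART and RLENGTH), gsub (which returns the number of substitutions made), and a regexp field separator given with the -F option.

```shell
# match sets RSTART and RLENGTH to the position and length of the match;
# the longest match covers all four "a" characters.
span=$(echo aaaabcd | awk '{ if (match($0, /a+/)) print RSTART, RLENGTH }')
echo "$span"

# gsub reports one substitution, because the single longest match
# consumes the whole run rather than one "a" at a time.
count=$(echo aaaabcd | awk '{ n = gsub(/a+/, "<A>"); print n, $0 }')
echo "$count"

# With a regexp field separator, each run of colons is one separator,
# so no empty fields appear between "a" and "b".
fields=$(echo a:::b:c | awk -F':+' '{ print NF, $2 }')
echo "$fields"
```

Under these assumptions the three commands should print `1 4', `1 <A>bcd', and `3 b'. If the match were not the longest one, the run of colons would be split into several separators, and the record would appear to contain empty fields.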
