

Using Regular Expression Constants

When used on the right-hand side of the `~' or `!~' operators, a regexp constant merely stands for the regexp that is to be matched.

Regexp constants (such as /foo/) may be used like simple expressions. When a regexp constant appears by itself, it has the same meaning as if it appeared in a pattern, i.e. `($0 ~ /foo/)' (d.c.) (see section Expressions as Patterns). This means that the two code segments,

if ($0 ~ /barfly/ || $0 ~ /camelot/)
    print "found"

and

if (/barfly/ || /camelot/)
    print "found"

are exactly equivalent.
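
As a minimal sketch of this equivalence, the two forms can be run over the same input from a POSIX shell and compared. The three sample records and the variable names input, explicit and implicit are made up for illustration.

```shell
# Hypothetical sample input: two records match, one does not.
input='the barfly sat down
nothing to see here
welcome to camelot'

# Explicit form: each regexp constant is matched against $0 with the ~ operator.
explicit=$(printf '%s\n' "$input" |
    awk '{ if ($0 ~ /barfly/ || $0 ~ /camelot/) print "found: " $0 }')

# Implicit form: a bare regexp constant is matched against $0.
implicit=$(printf '%s\n' "$input" |
    awk '{ if (/barfly/ || /camelot/) print "found: " $0 }')

printf '%s\n' "$explicit"
printf '%s\n' "$implicit"
```

Both commands should print the same two "found:" lines, one for the barfly record and one for the camelot record.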

One rather bizarre consequence of this rule is that the following boolean expression is valid, but does not do what the user probably intended:

# note that /foo/ is on the left of the ~
if (/foo/ ~ $1) print "found foo"

This code is "obviously" testing $1 for a match against the regexp /foo/. But in fact, the expression `/foo/ ~ $1' actually means `($0 ~ /foo/) ~ $1'. In other words, first match the input record against the regexp /foo/. The result will be either one or zero, depending upon whether the match succeeds or fails. Then match that result against the first field in the record, with the contents of $1 used as the regexp.

Since it is unlikely that you would ever really wish to make this kind of test, gawk will issue a warning when it sees this construct in a program.
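
The following sketch, run from a POSIX shell, contrasts the reversed test with the one that was probably intended. The sample records and the variable names surprising and intended are hypothetical, chosen so that the two tests disagree on every record.

```shell
# Hypothetical sample input, chosen so the two tests disagree on every record.
input='foo bar
1 foo
0 bar'

# Reversed operands: evaluated as ($0 ~ /foo/) ~ $1, so the 1 or 0 produced
# by the first match is itself matched against $1 used as a regexp.
# Any diagnostic the awk implementation prints is discarded here.
surprising=$(printf '%s\n' "$input" |
    awk '/foo/ ~ $1 { print "matched: " $0 }' 2>/dev/null)

# Probable intent: test whether the first field matches /foo/.
intended=$(printf '%s\n' "$input" |
    awk '$1 ~ /foo/ { print "matched: " $0 }')

printf 'reversed:\n%s\n' "$surprising"
printf 'intended:\n%s\n' "$intended"
```

With this input the reversed form selects "1 foo" and "0 bar", because the strings "1" and "0" match the regexps 1 and 0 taken from $1, while the record whose first field really is foo is skipped.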

Another consequence of this rule is that the assignment statement

matches = /foo/

will assign one to the variable matches if the current input record matches /foo/, and zero otherwise.
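
A small sketch of this assignment, run from a POSIX shell over two made-up records (the shell variable result is only there to capture the output):

```shell
# Print the value assigned to matches in front of each record.
result=$(printf '%s\n' 'some foo here' 'nothing here' |
    awk '{ matches = /foo/; print matches ": " $0 }')

printf '%s\n' "$result"
```

The first record contains foo, so its line starts with 1; the second does not, so its line starts with 0.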

This feature of the language was never well documented until the POSIX specification.

Constant regular expressions are also used as the first argument for the gensub, sub and gsub functions, and as the second argument of the match function (see section Built-in Functions for String Manipulation). Modern implementations of awk, including gawk, allow the third argument of split to be a regexp constant, while some older implementations do not (d.c.).
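
As an illustrative sketch, the program below passes regexp constants to gsub, match and split from a POSIX shell. The sample strings and the variable names line, n, count, parts and out are assumptions made for the example; gensub is left out because it is specific to gawk.

```shell
out=$(echo 'hi! hi yourself!' | awk '{
    line = $0
    # Regexp constant as the first argument of gsub.
    n = gsub(/hi/, "howdy", line)
    print n, line

    # Regexp constant as the second argument of match.
    print match($0, /your/)

    # Regexp constant as the third argument of split
    # (some older implementations may not accept this).
    count = split("a1b22c", parts, /[0-9]+/)
    print count, parts[1], parts[2], parts[3]
}')

printf '%s\n' "$out"
```

In each of these calls the regexp constant is used as a regexp, not as a test against $0, because the built-in functions expect a regexp in that argument position.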

This special treatment applies only to the built-in functions, which can lead to confusion when attempting to use regexp constants as arguments to user-defined functions (see section User-defined Functions). For example:

function mysub(pat, repl, str, global)
{
    if (global)
        gsub(pat, repl, str)
    else
        sub(pat, repl, str)
    return str
}

{
    ...
    text = "hi! hi yourself!"
    mysub(/hi/, "howdy", text, 1)
    ...
}

In this example, the programmer wishes to pass a regexp constant to the user-defined function mysub, which will in turn pass it on to either sub or gsub. However, what really happens is that the pat parameter will be either one or zero, depending upon whether or not $0 matches /hi/.

As it is unlikely that you would ever really wish to pass a truth value in this way, gawk will issue a warning when it sees a regexp constant used as a parameter to a user-defined function.
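
One possible way around this, sketched here as an assumption, not as a recommendation from the original text, is to pass the pattern as a string, which sub and gsub then treat as a regexp computed at run time. The sketch reuses mysub from the example above; the input record 'plain record' is made up and deliberately does not match /hi/.

```shell
out=$(echo 'plain record' | awk '
function mysub(pat, repl, str, global)
{
    if (global)
        gsub(pat, repl, str)
    else
        sub(pat, repl, str)
    return str
}

{
    text = "hi! hi yourself!"

    # Regexp constant: pat receives 0 here, because $0 does not match /hi/,
    # so gsub looks for the regexp 0 and leaves the text unchanged.
    print mysub(/hi/, "howdy", text, 1)

    # String: pat receives "hi", which gsub uses as the regexp.
    print mysub("hi", "howdy", text, 1)
}' 2>/dev/null)   # discard any warning about the first call

printf '%s\n' "$out"
```

The first line of output is the original text and the second has both occurrences of hi replaced. A string pattern goes through string escape processing before it is used as a regexp, so patterns containing backslashes need extra care.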

