

Using Dynamic Regexps

The right-hand side of a `~' or `!~' operator need not be a regexp constant (i.e., a string of characters between slashes). It may be any expression. The expression is evaluated and, if necessary, converted to a string; the contents of that string are then used as the regexp. A regexp computed in this way is called a dynamic regexp. For example:

BEGIN { identifier_regexp = "[A-Za-z_][A-Za-z_0-9]*" }
$0 ~ identifier_regexp    { print }

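sets identifier_regexp to a regexp that describes awk variable names, and prints each input record that matches this regexp. Because the regexp is not anchored with `^' and `$', a record matches if any part of it looks like a variable name.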
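Because the right-hand side of `~' can be any expression, a regexp can also be assembled at run time. The following is a minimal sketch. It assumes a POSIX shell and an awk in the PATH, and the shell function is_identifier is hypothetical, introduced only for this illustration. The sketch concatenates `^' and `$' around identifier_regexp, so the whole record, not just some substring of it, must look like a variable name.

```shell
#!/bin/sh
# Hypothetical helper for illustration; assumes a POSIX shell and an awk in PATH.
# Prints "yes" if the argument is a valid awk variable name, "no" otherwise.
is_identifier() {
    printf '%s\n' "$1" | awk '
        BEGIN { identifier_regexp = "[A-Za-z_][A-Za-z_0-9]*" }
        # The right-hand side of ~ is an ordinary expression: here the regexp
        # is built by concatenation, anchored so the whole record must match.
        $0 ~ ("^" identifier_regexp "$") { print "yes"; next }
        { print "no" }
    '
}

is_identifier "x"        # yes
is_identifier "count_2"  # yes
is_identifier "2count"   # no
```

Without the anchors, "2count" would also match, because its substring "count" looks like a variable name.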

Caution: When using the `~' and `!~' operators, there is a difference between a regexp constant enclosed in slashes and a string constant enclosed in double quotes. If you use a string constant, you need to understand that the string is, in essence, scanned twice: the first time when awk reads your program, and the second time when it matches the string on the left-hand side of the operator against the pattern on the right. This is true of any string-valued expression (such as identifier_regexp above), not just of string constants.
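The first scan can be observed directly by printing a string before using it as a dynamic regexp. This is a minimal sketch. It assumes a POSIX shell and an awk in the PATH, and show_two_scans is a hypothetical name used only here. The printed value is what the regexp matcher receives on the second scan.

```shell
#!/bin/sh
# Hypothetical helper for illustration; assumes a POSIX shell and an awk in PATH.
show_two_scans() {
    awk 'BEGIN {
        re = "a\\.b"          # first scan: the escape \\ becomes one backslash
        print re              # the value that will be used as the regexp: a\.b
        print ("a.b" ~ re)    # second scan: \. matches only a literal dot, so 1
        print ("axb" ~ re)    # x is not a literal dot, so 0
    }'
}

show_two_scans   # prints three lines: a\.b, 1, 0
```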

What difference does it make if the string is scanned twice? The answer has to do with escape sequences, and particularly with backslashes. To get a backslash into a regular expression inside a string, you have to type two backslashes.

For example, /\*/ is a regexp constant for a literal `*'. Only one backslash is needed. To do the same thing with a string, you would have to type "\\*". The first backslash escapes the second one, so that the string actually contains the two characters `\' and `*'.
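The `*' example can be checked directly. This is a minimal sketch. It assumes a POSIX shell and an awk in the PATH, and has_star is a hypothetical helper name. It shows that the regexp constant /\*/ and the string constant "\\*" both match only a literal `*'.

```shell
#!/bin/sh
# Hypothetical helper for illustration; assumes a POSIX shell and an awk in PATH.
# Prints two flags: the result of the regexp-constant match and of the string match.
has_star() {
    printf '%s\n' "$1" | awk '{
        a = ($0 ~ /\*/)     # regexp constant: one backslash is enough
        b = ($0 ~ "\\*")    # string constant: becomes the two characters \ and *
        print a, b
    }'
}

has_star "2*3"   # 1 1
has_star "23"    # 0 0
```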

Given that you can use both regexp and string constants to describe regular expressions, which should you use? The answer is "regexp constants," for several reasons.

  1. String constants are more complicated to write, and more difficult to read. Using regexp constants makes your programs less error-prone. Not understanding the difference between the two kinds of constants is a common source of errors.
  2. Regexp constants are also more efficient: awk can recognize that you have supplied a regexp and store it internally in a form that makes pattern matching faster. With a string constant, awk must first convert the string into this internal form and only then perform the match.
  3. Using regexp constants is better style; it shows clearly that you intend a regexp match.
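The first reason becomes obvious when matching a literal backslash, which must be doubled once for each scan. This is a minimal sketch. It assumes a POSIX shell and an awk in the PATH, and has_backslash is a hypothetical helper name. In the sketch, the regexp constant /\\/ and the string constant "\\\\" describe the same regexp, but the string form needs four backslashes.

```shell
#!/bin/sh
# Hypothetical helper for illustration; assumes a POSIX shell and an awk in PATH.
has_backslash() {
    printf '%s\n' "$1" | awk '{
        a = ($0 ~ /\\/)       # regexp constant: \\ means one literal backslash
        b = ($0 ~ "\\\\")     # string constant: first scan leaves \\ for the matcher
        print a, b
    }'
}

has_backslash 'C:\tmp'   # 1 1
has_backslash 'C:/tmp'   # 0 0
```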

