The GNU Awk User's Guide - Regexp Summary


Regular Expressions

Regular expressions are based on POSIX EREs (extended regular expressions). The escape sequences allowed in string constants are also valid in regular expressions (see section Escape Sequences). Regexps are composed of characters as follows:

c: matches the character c (assuming c is none of the characters listed below).
\c: matches the literal character c.
.: matches any character, including newline. In strict POSIX mode, `.' does not match the NUL character, which is a character with all bits equal to zero.
^: matches the beginning of a string.
$: matches the end of a string.
[abc...]: matches any of the characters abc... (character list).
[[:class:]]: matches any character in the character class class. Allowable classes are alnum, alpha, blank, cntrl, digit, graph, lower, print, punct, space, upper, and xdigit.
[[.symbol.]]: matches the multi-character collating symbol symbol. gawk does not currently support collating symbols.
[[=classname=]]: matches any of the equivalent characters in the current locale named by the equivalence class classname. gawk does not currently support equivalence classes.
[^abc...]: matches any character except abc... (negated character list).
r1|r2: matches either r1 or r2 (alternation).
r1r2: matches r1, and then r2 (concatenation).
r+: matches one or more r's.
r*: matches zero or more r's.
r?: matches zero or one r's.
(r): matches r (grouping).
r{n}: matches exactly n occurrences of r (interval expression).
r{n,}: matches n or more occurrences of r.
r{n,m}: matches at least n and at most m occurrences of r.
\y: matches the empty string at either the beginning or the end of a word.
\B: matches the empty string within a word.
\<: matches the empty string at the beginning of a word.
\>: matches the empty string at the end of a word.
\w: matches any word-constituent character (alphanumeric characters and the underscore).
\W: matches any character that is not word-constituent.
\`: matches the empty string at the beginning of a buffer (same as a string in gawk).
\': matches the empty string at the end of a buffer.
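
As an illustration of the operators listed above, here is a minimal sketch that filters a few made-up input lines. It deliberately sticks to the POSIX operators, so it assumes only that some POSIX-conforming awk is available as `awk`; the sample data and the variable names `r1` through `r4` are invented for the example.

```shell
# Each command feeds a few lines to awk, which prints only the matching lines.

# Anchor and character list: lines that start with a, b, or c.
r1=$(printf 'apple\nberry\ndate\n' | awk '/^[abc]/')

# Negated character list and end anchor: lines whose last character is not a digit.
r2=$(printf 'room101\nlobby\n' | awk '/[^0-9]$/')

# Alternation, grouping, and ?: whole lines spelling color/colour or gray/grey.
r3=$(printf 'colour\ncolor\ngrey\ngreen\n' | awk '/^(colou?r|gr[ae]y)$/')

# Character class with +: lines made up of one or more digits and nothing else.
r4=$(printf '42\n4x2\n\n' | awk '/^[[:digit:]]+$/')

printf '%s\n' "$r1" "$r2" "$r3" "$r4"
```

Because a pattern with no action prints every matching record, each variable ends up holding just the lines that the regexp accepted.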

Various command-line options control how gawk interprets characters in regexps:

No options: In the default case, gawk provides all the facilities of POSIX regexps and the GNU regexp operators described above. However, interval expressions are not supported.
--posix: Only POSIX regexps are supported; the GNU operators are not special (e.g., `\w' matches a literal `w'). Interval expressions are allowed.
--traditional: Traditional Unix awk regexps are matched. The GNU operators are not special, interval expressions are not available, and neither are the POSIX character classes ([[:alnum:]] and so on). Characters described by octal and hexadecimal escape sequences are treated literally, even if they represent regexp metacharacters.
--re-interval: Allow interval expressions in regexps, even if `--traditional' has been provided.
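
The effect of these options can be sketched with a few one-liners. This is only an illustration: it assumes GNU awk is installed under the name `gawk` (the whole block is skipped otherwise), and the sample strings and the variable names `d`, `p`, and `i` are invented for the example.

```shell
# Run the examples only when GNU awk is available as `gawk` (an assumption).
if command -v gawk >/dev/null 2>&1; then
    # No options: \y is a GNU word-boundary operator, so only the
    # standalone word "cat" is replaced, not the "cat" inside "concatenates".
    d=$(echo 'the cat concatenates' | gawk '{ gsub(/\ycat\y/, "DOG"); print }')

    # --posix: the GNU operators are not special, so \w is a literal "w".
    # Standard error is discarded in case gawk issues a diagnostic about the escape.
    p=$(echo 'wow' | gawk --posix '{ gsub(/\w/, "X"); print }' 2>/dev/null)

    # --re-interval: interval expressions are enabled explicitly; the
    # leftmost-longest match of a{2,3} in "aaaa" covers the first three a's.
    i=$(echo 'aaaa' | gawk --re-interval '{ sub(/a{2,3}/, "X"); print }')

    printf '%s\n' "$d" "$p" "$i"
fi
```

Passing `--re-interval' explicitly keeps the last command independent of whether the installed gawk enables interval expressions by default.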

See section Regular Expressions.
