Regular expressions are based on POSIX EREs (extended regular expressions).
The escape sequences allowed in string constants are also valid in
regular expressions (see section Escape Sequences).
Regexps are composed of characters as follows:
c
-
matches the character c (assuming c is none of the characters
listed below).
\c
-
matches the literal character c.
.
-
matches any character, including newline.
In strict POSIX mode, `.' does not match the NUL
character, which is a character with all bits equal to zero.
^
-
matches the beginning of a string.
$
-
matches the end of a string.
[abc...]
-
matches any of the characters abc... (character list).
[[:class:]]
-
matches any character in the character class class. Allowable classes
are alnum, alpha, blank, cntrl, digit, graph, lower, print, punct,
space, upper, and xdigit.
[[.symbol.]]
-
matches the multi-character collating symbol symbol.
gawk does not currently support collating symbols.
[[=classname=]]
-
matches any of the equivalent characters in the current locale named by the
equivalence class classname.
gawk does not currently support equivalence classes.
[^abc...]
-
matches any character except abc... (negated
character list).
r1|r2
-
matches either r1 or r2 (alternation).
r1r2
-
matches r1, and then r2 (concatenation).
r+
-
matches one or more r's.
r*
-
matches zero or more r's.
r?
-
matches zero or one r's.
(r)
-
matches r (grouping).
r{n}
-
r{n,}
-
r{n,m}
-
match exactly n, n or more, or from n to m
occurrences of r, respectively (interval expressions).
\y
-
matches the empty string at either the beginning or the
end of a word.
\B
-
matches the empty string within a word.
\<
-
matches the empty string at the beginning of a word.
\>
-
matches the empty string at the end of a word.
\w
-
matches any word-constituent character (alphanumeric characters and
the underscore).
\W
-
matches any character that is not word-constituent.
\`
-
matches the empty string at the beginning of a buffer (same as a string
in gawk).
\'
-
matches the empty string at the end of a buffer.
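As a rough illustration of several operators from the list above, the following shell sketch runs a few regexps over made-up sample words. It assumes a POSIX-style awk is available on the search path as awk, and it uses gawk (only if it is installed) for the GNU word-boundary operators. The sample data and the variable names are invented for this example.

```shell
# Sample input, one word per line (made up for illustration).
words='cat
cot
coat
ct
cart
x42'

# ^ and $ anchor the match; [ao] is a character list.
list=$(printf '%s\n' "$words" | awk '/^c[ao]t$/')

# (r) groups, so that ? (zero or one) applies to "oa" as a unit.
opt=$(printf '%s\n' "$words" | awk '/^c(oa)?t$/')

# r1|r2 is alternation; [[:digit:]]+ is one or more digits.
alt=$(printf '%s\n' "$words" | awk '/^cart$|[[:digit:]]+$/')

printf '%s\n' "$list" "$opt" "$alt"

# \< and \> are GNU operators, so this part is run only with gawk.
if command -v gawk >/dev/null 2>&1; then
    # Matches "cat" as a whole word, not inside "concat" or "cats".
    word=$(printf 'cat\nconcat cats\n' | gawk '/\<cat\>/')
    printf '%s\n' "$word"
fi
```

Each pattern is used without an action, so awk prints every input line that matches.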
The various command-line options control how gawk
interprets characters in regexps.
No options
-
In the default case, gawk provides all the facilities of
POSIX regexps and the GNU regexp operators described above.
However, interval expressions are not supported.
--posix
-
Only POSIX regexps are supported; the GNU operators are not special
(e.g., `\w' matches a literal `w'). Interval expressions
are allowed.
--traditional
-
Traditional Unix awk regexps are matched. The GNU operators
are not special, interval expressions are not available, and neither
are the POSIX character classes ([[:alnum:]] and so on).
Characters described by octal and hexadecimal escape sequences are
treated literally, even if they represent regexp metacharacters.
--re-interval
-
Allow interval expressions in regexps, even if `--traditional'
has been provided.
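The effect of `--re-interval' can be sketched from the shell as follows. This is a minimal example, assuming gawk is installed under the name gawk; the input lines and variable names are invented for illustration. Only the cases where interval expressions are explicitly enabled are shown, because the behavior without the option depends on the mode described above.

```shell
# Two input lines: "aaa" matches a{3} read as an interval expression,
# while "a{3}" would match only if the braces were literal characters.
input='aaa
a{3}'

# These options are gawk-specific, so skip quietly if gawk is missing.
if command -v gawk >/dev/null 2>&1; then
    # --re-interval turns interval expressions on explicitly.
    interval=$(printf '%s\n' "$input" | gawk --re-interval '/^a{3}$/')

    # Per the text above, --re-interval still applies with --traditional.
    trad=$(printf '%s\n' "$input" | gawk --traditional --re-interval '/^a{3}$/')

    printf '%s\n' "$interval" "$trad"
fi
```

In both invocations the braces are treated as an interval, so only the line consisting of three a's is selected.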
See section Regular Expressions.