Some characters cannot be included literally in string constants ("foo") or regexp constants (/foo/). You represent them instead with escape sequences, which are character sequences beginning with a backslash (`\').
One use of an escape sequence is to include a double-quote character in a string constant. Since a plain double-quote would end the string, you must use `\"' to represent an actual double-quote character as a part of the string. For example:
$ awk 'BEGIN { print "He said \"hi!\" to her." }'
-| He said "hi!" to her.
The backslash character itself is another character that cannot be included normally; you write `\\' to put one backslash in the string or regexp. Thus, the string whose contents are the two characters `"' and `\' must be written "\"\\".
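As a small illustration of the two-character string described above, the following sketch builds it and prints its length. The shell variable name `result` is only illustrative; the awk program sits inside single quotes so that the shell passes the backslashes through unchanged.

```shell
# Build the string whose contents are the two characters " and \,
# then print it together with its length.
result=$(awk 'BEGIN { s = "\"\\"; print s, length(s) }')
printf '%s\n' "$result"
```

The length reported is 2, which confirms that each escape sequence contributes a single character to the string.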
Another use of backslash is to represent unprintable characters such as tab or newline. While there is nothing to stop you from entering most unprintable characters directly in a string constant or regexp constant, they may look ugly.
Here is a table of all the escape sequences used in awk, and what they represent. Unless noted otherwise, all of these escape sequences apply to both string constants and regexp constants.
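\\
    A literal backslash, `\'.
\a
    The "alert" character, Control-g, ASCII code 7 (BEL).
\b
    Backspace, Control-h, ASCII code 8 (BS).
\f
    Formfeed, Control-l, ASCII code 12 (FF).
\n
    Newline, Control-j, ASCII code 10 (LF).
\r
    Carriage return, Control-m, ASCII code 13 (CR).
\t
    Horizontal tab, Control-i, ASCII code 9 (HT).
\v
    Vertical tab, Control-k, ASCII code 11 (VT).
\nnn
    The octal value nnn, where nnn are one to three digits between `0' and `7'. For example, the code for the ASCII ESC (escape) character is `\033'.
\xhh...
    The hexadecimal value hh, where hh are hexadecimal digits (`0' through `9' and either `A' through `F' or `a' through `f'). (The `\x' escape sequence is not allowed in POSIX awk.)
\/
    A literal slash (necessary for regexp constants only). Since the regexp is delimited by slashes, you need to escape a slash that is part of the pattern, in order to tell awk to keep processing the rest of the regexp.
\"
    A literal double-quote (necessary for string constants only). Since the string is delimited by double-quotes, you need to escape a quote that is part of the string, in order to tell awk to keep processing the rest of the string.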
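A brief sketch of a few entries from the table in use. The shell variable names (`octal`, `tabbed`, `slash`) and the sample text `usr/bin` are illustrative only. The `\x` form is left out because, as noted, it is not part of POSIX awk.

```shell
# \101, \102 and \103 are the octal codes for A, B and C.
octal=$(awk 'BEGIN { printf "\101\102\103\n" }')

# \t inserts a horizontal tab between the two letters.
tabbed=$(awk 'BEGIN { printf "a\tb\n" }')

# \/ lets a slash appear inside a regexp constant without ending it.
slash=$(echo "usr/bin" | awk '/usr\/bin/ { print "slash matched" }')

printf '%s\n%s\n%s\n' "$octal" "$tabbed" "$slash"
```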
In gawk, there are additional two-character sequences that begin with backslash that have special meaning in regexps. See section Additional Regexp Operators Only in gawk.
In a string constant, what happens if you place a backslash before something that is not one of the characters listed above? POSIX awk purposely leaves this case undefined. There are two choices.
Strip the backslash out.
    This is what Unix awk and gawk both do. For example, "a\qc" is the same as "aqc".
Leave the backslash alone.
    Some other awk implementations do this. In such implementations, "a\qc" is the same as if you had typed "a\\qc".
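To see which of the two choices a particular awk makes, a quick check along the following lines may help. This is only a sketch: the variable name `undefined` is illustrative, and the result depends on the implementation, so no single output is shown. Some implementations may also print a warning on standard error, which is discarded here.

```shell
# "\q" is not a defined escape sequence, so the printed string is
# either aqc (backslash stripped) or a\qc (backslash kept),
# depending on the awk in use.
undefined=$(awk 'BEGIN { print "a\qc" }' 2>/dev/null)
printf '%s\n' "$undefined"
```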
In a regexp, a backslash before any character that is not in the above table, and not listed in section Additional Regexp Operators Only in gawk, means that the next character should be taken literally, even if it would normally be a regexp operator. E.g., /a\+b/ matches the three characters `a+b'.
For complete portability, do not use a backslash before any character not listed in the table above.
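The regexp case can be illustrated with a short sketch. The two input lines and the variable name `plus` are made up for the example. The escaped `+` should match only a literal plus sign, so the line `aab` is not selected.

```shell
# /a\+b/ matches the literal text a+b; without the backslash,
# + would be a repetition operator and aab would match as well.
plus=$(printf 'a+b\naab\n' | awk '/a\+b/ { print "matched: " $0 }')
printf '%s\n' "$plus"
```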
Another interesting question arises. Suppose you use an octal or hexadecimal escape to represent a regexp metacharacter (see section Regular Expression Operators). Does awk treat the character as a literal character, or as a regexp operator?
It turns out that historically, such characters were taken literally (d.c.). However, the POSIX standard indicates that they should be treated as real metacharacters, and this is what gawk does. In compatibility mode (see section Command Line Options), though, gawk treats the characters represented by octal and hexadecimal escape sequences literally when used in regexp constants. Thus, /a\52b/ is equivalent to /a\*b/.
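One way to probe how a given awk handles this case is sketched below. The input lines and the variable name `octal_re` are illustrative. The line `a*b` is selected under either interpretation. Whether `ab` is also selected depends on whether the implementation treats `\52` as a literal `*` or as the repetition operator, so only the first line of output is predictable.

```shell
# \52 is the octal code for *. If it is taken literally, only the
# line a*b matches; if it acts as an operator, ab matches too.
octal_re=$(printf 'a*b\nab\n' | awk '/a\52b/ { print "matched: " $0 }')
printf '%s\n' "$octal_re"
```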
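To summarize:
The escape sequences in the table above are always processed first, for both string constants and regexp constants. This happens very early, as soon as awk reads your program.
gawk processes both regexp constants and dynamic regexps (see section Using Dynamic Regexps), for the special operators listed in section Additional Regexp Operators Only in gawk.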