

Escape Sequences

Some characters cannot be included literally in string constants ("foo") or regexp constants (/foo/). You represent them instead with escape sequences, which are character sequences beginning with a backslash (`\').

One use of an escape sequence is to include a double-quote character in a string constant. Since a plain double-quote would end the string, you must use `\"' to represent an actual double-quote character as a part of the string. For example:

$ awk 'BEGIN { print "He said \"hi!\" to her." }'
-| He said "hi!" to her.

The backslash character itself is another character that cannot be included normally; you write `\\' to put one backslash in the string or regexp. Thus, the string whose contents are the two characters `"' and `\' must be written "\"\\".
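As a minimal sketch of that last case, the command below prints the two-character string. It assumes a POSIX shell with awk on the PATH; the variable name quote_and_backslash is only illustrative.

```shell
# The awk program sits inside shell single quotes, so the shell passes
# every backslash through to awk unchanged.
# Inside the awk string, \" yields a double-quote and \\ yields a backslash.
quote_and_backslash=$(awk 'BEGIN { print "\"\\" }')
printf '%s\n' "$quote_and_backslash"   # prints: "\
```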

Another use of backslash is to represent unprintable characters such as tab or newline. While there is nothing to stop you from entering most unprintable characters directly in a string constant or regexp constant, they may look ugly.

Here is a table of all the escape sequences used in awk, and what they represent. Unless noted otherwise, all of these escape sequences apply to both string constants and regexp constants.

\\
A literal backslash, `\'.
\a
The "alert" character, Control-g, ASCII code 7 (BEL).
\b
Backspace, Control-h, ASCII code 8 (BS).
\f
Formfeed, Control-l, ASCII code 12 (FF).
\n
Newline, Control-j, ASCII code 10 (LF).
\r
Carriage return, Control-m, ASCII code 13 (CR).
\t
Horizontal tab, Control-i, ASCII code 9 (HT).
\v
Vertical tab, Control-k, ASCII code 11 (VT).
\nnn
The octal value nnn, where nnn are one to three digits between `0' and `7'. For example, the code for the ASCII ESC (escape) character is `\033'.
\xhh...
The hexadecimal value hh, where hh are hexadecimal digits (`0' through `9' and either `A' through `F' or `a' through `f'). Like the same construct in ANSI C, the escape sequence continues until the first non-hexadecimal digit is seen. However, using more than two hexadecimal digits produces undefined results. (The `\x' escape sequence is not allowed in POSIX awk.)
\/
A literal slash (necessary for regexp constants only). You use this when you wish to write a regexp constant that contains a slash. Since the regexp is delimited by slashes, you need to escape the slash that is part of the pattern, in order to tell awk to keep processing the rest of the regexp.
\"
A literal double-quote (necessary for string constants only). You use this when you wish to write a string constant that contains a double-quote. Since the string is delimited by double-quotes, you need to escape the quote that is part of the string, in order to tell awk to keep processing the rest of the string.
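A few of the sequences in the table can be tried as follows. This is a small sketch, assuming a POSIX shell with awk on the PATH; the variable names and the sample text are made up for the illustration. The `\x' escape is left out because it is not allowed in POSIX awk.

```shell
# \t puts a horizontal tab between the two words.
tabbed=$(awk 'BEGIN { print "left\tright" }')

# Octal escapes: 101 and 102 are the octal ASCII codes for "A" and "B".
octal=$(awk 'BEGIN { print "\101\102" }')

# \/ keeps the slash from ending the regexp constant.
slash=$(printf 'usr/bin\n' | awk '/usr\/bin/ { print "matched" }')

printf '%s\n' "$tabbed" "$octal" "$slash"
```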

In gawk, additional two-character sequences that begin with a backslash have special meaning in regexps. See section Additional Regexp Operators Only in gawk.

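In a string constant, what happens if you place a backslash before something that is not one of the characters listed above? POSIX awk purposely leaves this case undefined. There are two choices: strip the backslash out, which is what gawk does (so that "a\qc" is the same as "aqc"), or leave the backslash alone, as some other awk implementations do (so that "a\qc" is the same as "a\\qc").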
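Because the result is undefined, the simplest way to learn what a particular awk does is to ask it. The probe below is only a sketch: it assumes a POSIX shell with awk on the PATH, the name unknown_escape is illustrative, and standard error is discarded in case the implementation prints a warning. No expected output is shown, since it depends on the awk in use.

```shell
# \q is not in the table of escape sequences, so the result depends on
# the awk implementation: either the backslash is dropped or it is kept.
unknown_escape=$(awk 'BEGIN { print "a\qc" }' 2>/dev/null)
printf '%s\n' "$unknown_escape"
```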

In a regexp, a backslash before any character that is not in the above table, and not listed in section Additional Regexp Operators Only in gawk, means that the next character should be taken literally, even if it would normally be a regexp operator. For example, /a\+b/ matches the three characters `a+b'.
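The /a\+b/ case can be checked with two input lines, one that contains a literal plus sign and one that would match only if `+' were still an operator. This is a minimal sketch, assuming a POSIX shell with awk on the PATH; the variable name is illustrative.

```shell
# The backslash makes "+" an ordinary character, so the line "a+b"
# matches and the line "aab" does not.
literal_plus=$(printf 'a+b\naab\n' | awk '/a\+b/ { print "matched: " $0 }')
printf '%s\n' "$literal_plus"   # prints: matched: a+b
```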

For complete portability, do not use a backslash before any character not listed in the table above.

Another interesting question arises. Suppose you use an octal or hexadecimal escape to represent a regexp metacharacter (see section Regular Expression Operators). Does awk treat the character as a literal character, or as a regexp operator?

Historically, such characters were taken literally (d.c.). The POSIX standard, however, indicates that they should be treated as real metacharacters, and this is what gawk does. In compatibility mode (see section Command Line Options), gawk instead treats the characters represented by octal and hexadecimal escape sequences literally when they are used in regexp constants. In that mode, /a\52b/ is equivalent to /a\*b/.
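Since the answer varies with the implementation and its mode, a short probe can report which interpretation is in effect. The sketch below assumes a POSIX shell with awk on the PATH; the variable name and the two labels it prints are made up for the illustration. Octal 52 is the ASCII code for `*', and the test string "ab" matches only if that `*' acts as a repetition operator.

```shell
# If \52 is treated as the operator "*", the regexp means "zero or more
# a, then b" and matches "ab". If it is treated literally, the regexp
# requires the three characters a*b, which "ab" does not contain.
interpretation=$(awk 'BEGIN {
    if ("ab" ~ /a\52b/)
        print "operator"
    else
        print "literal"
}')
printf '%s\n' "$interpretation"
```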

To summarize:

  1. The escape sequences in the table above are always processed first, for both string constants and regexp constants. This happens very early, as soon as awk reads your program.
  2. gawk then processes both regexp constants and dynamic regexps (see section Using Dynamic Regexps) for the special operators listed in section Additional Regexp Operators Only in gawk.
  3. A backslash before any other character means to treat that character literally.
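The ordering matters most for dynamic regexps, where the pattern passes through string-constant processing before it is used as a regexp. The following sketch assumes a POSIX shell with awk on the PATH; the variable name dynamic is illustrative.

```shell
# String-constant processing turns "a\\+b" into the four characters a\+b.
# Regexp processing then sees \+ and takes the plus sign literally.
dynamic=$(awk 'BEGIN {
    if ("a+b" ~ "a\\+b")
        print "matched"
    else
        print "no match"
}')
printf '%s\n' "$dynamic"   # prints: matched
```

Writing the doubled backslash is what lets a single backslash survive the first step and reach the regexp.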

