Case is normally significant in regular expressions, both when matching ordinary characters (i.e., not metacharacters) and inside character sets. Thus a `w' in a regular expression matches only a lower-case `w' and not an upper-case `W'.
The simplest way to do a case-independent match is to use a character list: `[Ww]'. However, this can be cumbersome if you need to use it often, and it can make the regular expressions harder to read. There are two alternatives that you might prefer.
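Before turning to the alternatives, here is a minimal sketch of the character-list approach, run from the shell. The three input lines are made up for illustration; any awk should behave the same way.

```shell
# Hypothetical input: three spellings of the same word.
# [Ww] makes only the first letter case-independent, so "WORD" is not matched.
result=$(printf 'word\nWord\nWORD\n' | awk '/[Ww]ord/ { print }')
printf '%s\n' "$result"
```

Matching every spelling this way would take `[Ww][Oo][Rr][Dd]', which shows how quickly the technique becomes hard to read.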
One way to do a case-insensitive match at a particular point in the program is to convert the data to a single case, using the tolower or toupper built-in string functions (which we haven't discussed yet; see section Built-in Functions for String Manipulation). For example:
tolower($1) ~ /foo/ { ... }
converts the first field to lower-case before matching against it. This will work in any POSIX-compliant implementation of awk.
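As a rough illustration of the tolower technique, the following sketch runs such a rule over a few made-up records (a name in the first field, a number in the second). The input data and the action are assumptions chosen for the example.

```shell
# Hypothetical input: a name in the first field, a number in the second.
# tolower($1) returns a lower-case copy; $1 itself is left unchanged,
# so the original spelling is what gets printed.
result=$(printf 'Foo 1\nFOO 2\nbar 3\nfoo 4\n' |
    awk 'tolower($1) ~ /foo/ { print $1, $2 }')
printf '%s\n' "$result"
```

Only the `bar' record is skipped; the three spellings of `foo' all match, and the rest of the program remains case-sensitive.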
Another method, specific to gawk, is to set the variable IGNORECASE to a non-zero value (see section Built-in Variables). When IGNORECASE is not zero, all regexp and string operations ignore case. Changing the value of IGNORECASE dynamically controls the case sensitivity of your program as it runs. Case is significant by default because IGNORECASE (like most variables) is initialized to zero.
x = "aB"
if (x ~ /ab/) ...   # this test will fail
IGNORECASE = 1
if (x ~ /ab/) ...   # now it will succeed
In general, you cannot use IGNORECASE to make certain rules case-insensitive and other rules case-sensitive, because there is no way to set IGNORECASE just for the pattern of a particular rule. To do this, you must use character lists or tolower. However, one thing you can do only with IGNORECASE is turn case-sensitivity on or off dynamically for all the rules at once.
IGNORECASE can be set on the command line, or in a BEGIN rule (see section Other Command Line Arguments; also see section Startup and Cleanup Actions). Setting IGNORECASE from the command line is a way to make a program case-insensitive without having to edit it.
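The following sketch shows one way this might look in practice. It assumes gawk is installed, and the program text and input are made up for illustration. The variable is set with an assignment argument placed after the program text; with no file names given, gawk reads standard input.

```shell
# Requires gawk; when run as a script, stop quietly if it is not installed.
command -v gawk >/dev/null 2>&1 || exit 0

# Hypothetical program text, left unedited between the two runs.
prog='/foo/ { print }'

# Default: case is significant, so neither "Foo" nor "FOO" matches.
sensitive=$(printf 'Foo\nbar\nFOO\n' | gawk "$prog")

# Same program, with IGNORECASE set by a command-line assignment.
insensitive=$(printf 'Foo\nbar\nFOO\n' | gawk "$prog" IGNORECASE=1)

printf 'default: [%s]\n' "$sensitive"
printf 'with IGNORECASE=1:\n%s\n' "$insensitive"
```

The first run matches nothing, while the second matches both `Foo' and `FOO', even though the program itself is identical.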
Prior to version 3.0 of gawk, the value of IGNORECASE only affected regexp operations. It did not affect string comparison with `==', `!=', and so on. Beginning with version 3.0, both regexp and string comparison operations are affected by IGNORECASE.
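A small sketch of the string-comparison behavior, assuming gawk version 3.0 or later is installed. The variable names x and y are arbitrary.

```shell
# Requires gawk; when run as a script, stop quietly if it is not installed.
command -v gawk >/dev/null 2>&1 || exit 0

result=$(gawk 'BEGIN {
    x = "ABC"; y = "abc"
    print (x == y)      # case-sensitive: the strings differ
    IGNORECASE = 1
    print (x == y)      # case-insensitive: the strings compare equal
}')
printf '%s\n' "$result"
```

The first comparison prints 0 and the second prints 1, since only the value of IGNORECASE changed between them.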
Beginning with version 3.0 of gawk, the equivalences between upper-case and lower-case characters are based on the ISO-8859-1 (ISO Latin-1) character set. This character set is a superset of the traditional 128 ASCII characters, and it also provides a number of characters suitable for use with European languages.
The value of IGNORECASE has no effect if gawk is in compatibility mode (see section Command Line Options). Case is always significant in compatibility mode.
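The difference can be sketched as follows, assuming gawk is installed. The `--traditional' option used here to select compatibility mode is taken from gawk's option list, not from this section, so check it against your version. Any diagnostic that gawk prints about IGNORECASE in that mode is discarded.

```shell
# Requires gawk; when run as a script, stop quietly if it is not installed.
command -v gawk >/dev/null 2>&1 || exit 0

# Normal mode: IGNORECASE takes effect, so "Foo" matches /foo/.
normal=$(printf 'Foo\n' | gawk -v IGNORECASE=1 '/foo/ { print }')

# Compatibility mode (assumed option: --traditional): IGNORECASE is ignored,
# so the same pattern does not match.
compat=$(printf 'Foo\n' |
    gawk --traditional -v IGNORECASE=1 '/foo/ { print }' 2>/dev/null)

printf 'normal: [%s]\n' "$normal"
printf 'compat: [%s]\n' "$compat"
```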