

Case-sensitivity in Matching

Case is normally significant in regular expressions, both when matching ordinary characters (i.e. not metacharacters), and inside character sets. Thus a `w' in a regular expression matches only a lower-case `w' and not an upper-case `W'.
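You can see this from the shell, assuming a POSIX awk is available as `awk'. The two input lines differ only in case, and only the lower-case one is printed:

```shell
# Two input lines that differ only in case; /w/ matches just the first.
printf 'w\nW\n' | awk '/w/ { print "matched:", $0 }'
# prints: matched: w
```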

The simplest way to do a case-independent match is to use a character list: `[Ww]'. However, this can be cumbersome if you need to use it often, and it can make regular expressions harder to read. There are two alternatives that you might prefer.
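The following sketch shows how quickly this grows. Matching the word `foo' in any mix of cases needs one character list per letter. The sample input is made up for illustration:

```shell
# One character list per letter matches "foo" in any combination of cases.
printf 'foo\nFOO\nFoO\nbar\n' | awk '/[Ff][Oo][Oo]/'
# prints foo, FOO and FoO, one per line; "bar" is skipped
```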

One way to do a case-insensitive match at a particular point in the program is to convert the data to a single case, using the tolower or toupper built-in string functions (which we haven't discussed yet; see section Built-in Functions for String Manipulation). For example:

tolower($1) ~ /foo/  { ... }

converts the first field to lower-case before matching against it. This will work in any POSIX-compliant implementation of awk.
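Here is a runnable version of that rule. It uses a plain `print' as a stand-in for the elided action and a few made-up input lines:

```shell
# tolower($1) returns a lower-case copy of field 1; $1 itself is unchanged.
printf 'Foo 1\nFOO 2\nbar 3\nfoo 4\n' | awk 'tolower($1) ~ /foo/ { print }'
# prints "Foo 1", "FOO 2" and "foo 4", each with its original case
```

Because tolower only returns a converted copy, the matching records are printed with their original capitalization.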

Another method, specific to gawk, is to set the variable IGNORECASE to a non-zero value (see section Built-in Variables). When IGNORECASE is not zero, all regexp and string operations ignore case. Changing the value of IGNORECASE dynamically controls the case sensitivity of your program as it runs. Case is significant by default because IGNORECASE (like most variables) is initialized to zero.

x = "aB"
if (x ~ /ab/) ...   # this test will fail

IGNORECASE = 1
if (x ~ /ab/) ...   # now it will succeed

In general, you cannot use IGNORECASE to make certain rules case-insensitive and other rules case-sensitive, because there is no way to set IGNORECASE just for the pattern of a particular rule. To do this, you must use character lists or tolower. However, one thing you can do only with IGNORECASE is turn case-sensitivity on or off dynamically for all the rules at once.
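For example, the following sketch uses invented log-style input. The first rule gets a case-insensitive pattern by way of tolower, and the second rule remains case-sensitive. It uses no gawk extensions, so any POSIX awk should run it:

```shell
printf 'Error: disk\nERROR: net\nwarning: fan\nWarning: psu\n' |
awk '
tolower($0) ~ /^error/ { print "any case:", $0 }    # case-insensitive rule
/^Warning/             { print "exact case:", $0 }  # case-sensitive rule
'
# prints:
# any case: Error: disk
# any case: ERROR: net
# exact case: Warning: psu
```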

IGNORECASE can be set on the command line, or in a BEGIN rule (see section Other Command Line Arguments; also see section Startup and Cleanup Actions). Setting IGNORECASE from the command line is a way to make a program case-insensitive without having to edit it.

Prior to version 3.0 of gawk, the value of IGNORECASE only affected regexp operations. It did not affect string comparison with `==', `!=', and so on. Beginning with version 3.0, both regexp and string comparison operations are affected by IGNORECASE.

Beginning with version 3.0 of gawk, the equivalences between upper-case and lower-case characters are based on the ISO-8859-1 (ISO Latin-1) character set. This character set is a superset of the traditional 128 ASCII characters and also provides a number of characters suitable for use with European languages.

The value of IGNORECASE has no effect if gawk is in compatibility mode (see section Command Line Options). Case is always significant in compatibility mode.

