

Variable Typing and Comparison Expressions

The Guide is definitive. Reality is frequently inaccurate.
The Hitchhiker's Guide to the Galaxy

Unlike variables in many other programming languages, awk variables do not have a fixed type. Instead, a variable can hold either a number or a string, depending upon the value that is assigned to it.

The 1992 POSIX standard introduced the concept of a numeric string, which is simply a string that looks like a number, for example, " +2". This concept is used for determining the type of a variable.

The type of the variable is important, since the types of two variables determine how they are compared.

In gawk, variable typing follows these rules.

  1. A numeric literal or the result of a numeric operation has the numeric attribute.
  2. A string literal or the result of a string operation has the string attribute.
  3. Fields, getline input, FILENAME, ARGV elements, ENVIRON elements and the elements of an array created by split that are numeric strings have the strnum attribute. Otherwise, they have the string attribute. Uninitialized variables also have the strnum attribute.
  4. Attributes propagate across assignments, but are not changed by any use.

The last rule is particularly important. In the following program, a has numeric type, even though it is later used in a string operation.

BEGIN {
    a = 12.345
    b = a " is a cute number"
    print b
}
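The effect of the last rule can be observed with a comparison. The following is a minimal sketch, not part of the original program: the variable names c, numeric_cmp and string_cmp are introduced here purely for illustration. It assumes an awk that follows the POSIX comparison rules and the default value of CONVFMT.

```shell
# a keeps its numeric attribute even after being used in a concatenation.
propagation=$(awk 'BEGIN {
    a = 12.345
    b = a " is a cute number"    # string operation; a itself is unchanged
    c = a                        # assignment copies the numeric attribute to c
    numeric_cmp = (c < 9)        # numeric vs. numeric: 12.345 < 9 is false
    string_cmp  = ((c "") < 9)   # string vs. numeric: "12.345" < "9" is true
    print numeric_cmp, string_cmp
}')
echo "$propagation"    # prints: 0 1
```

Concatenating c with the null string yields a new value with the string attribute, but c itself remains numeric.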

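When two operands are compared, either string comparison or numeric comparison may be used, depending on the attributes of the operands, according to the following symmetric matrix:

        +----------------------------------------------
        |       STRING          NUMERIC         STRNUM
--------+----------------------------------------------
        |
STRING  |       string          string          string
        |
NUMERIC |       string          numeric         numeric
        |
STRNUM  |       string          numeric         numeric
--------+----------------------------------------------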

The basic idea is that user input that looks numeric, and only user input, should be treated as numeric, even though it is actually made of characters, and is therefore also a string.
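As a rough illustration of this idea, the sketch below compares the same characters, `+2', once as user input (a field) and once as a string constant in the program text. The variable names are hypothetical and chosen only for this example; the behavior shown assumes an awk that implements the POSIX numeric-string rules.

```shell
# "+2" read as a field is a numeric string; "+2" written as a constant is not.
typing=$(echo '+2' | awk '{
    field_vs_number = ($1 == 2)      # strnum vs. numeric: numeric comparison
    field_vs_string = ($1 == "2")    # strnum vs. string: string comparison
    const_vs_number = ("+2" == 2)    # string vs. numeric: string comparison
    print field_vs_number, field_vs_string, const_vs_number
}')
echo "$typing"    # prints: 1 0 0
```

Only the field compared against a number is treated numerically; every comparison that involves a string constant falls back to string comparison.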

Comparison expressions compare strings or numbers for relationships such as equality. They are written using relational operators, which are a superset of those in C. Here is a table of them:

x < y
     True if x is less than y.
x <= y
     True if x is less than or equal to y.
x > y
     True if x is greater than y.
x >= y
     True if x is greater than or equal to y.
x == y
     True if x is equal to y.
x != y
     True if x is not equal to y.
x ~ y
     True if the string x matches the regexp denoted by y.
x !~ y
     True if the string x does not match the regexp denoted by y.
subscript in array
     True if the array array has an element with the subscript subscript.

Comparison expressions have the value one if true and zero if false.
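A short sketch of these values in practice follows. The array arr and the result variables are made up for this example. Each comparison is assigned to a variable first, which avoids any confusion between the `>' operator and output redirection in a print statement.

```shell
# Every comparison expression evaluates to 1 (true) or 0 (false).
ops=$(awk 'BEGIN {
    arr["x"] = 1
    lt    = (2 < 3)
    gt    = (2 > 3)
    m     = ("foobar" ~ /oba/)
    nm    = ("foobar" !~ /oba/)
    has   = ("x" in arr)
    hasnt = ("y" in arr)
    print lt, gt, m, nm, has, hasnt
}')
echo "$ops"    # prints: 1 0 1 0 1 0
```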

When comparing operands of mixed types, numeric operands are converted to strings using the value of CONVFMT (see section Conversion of Strings and Numbers).
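The following sketch suggests how CONVFMT can influence a mixed comparison. The variables x, y, r1 and r2 are illustrative only, and the example assumes the default CONVFMT of "%.6g" at startup.

```shell
# A number compared with a string constant is first converted via CONVFMT.
convfmt=$(awk 'BEGIN {
    x = 3.14159265
    r1 = (x == "3.14159")    # default "%.6g" turns x into "3.14159"
    CONVFMT = "%.2f"
    y = 2.71828
    r2 = (y == "2.72")       # "%.2f" turns y into "2.72"
    print r1, r2
}')
echo "$convfmt"    # prints: 1 1
```

Integer values are converted as integers regardless of CONVFMT, so the effect is visible only for non-integral numbers.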

Strings are compared by comparing the first character of each, then the second character of each, and so on. Thus "10" is less than "9". If there are two strings where one is a prefix of the other, the shorter string is less than the longer one. Thus "abc" is less than "abcd".
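These ordering rules can be checked directly with string constants. The sketch below contrasts them with the corresponding numeric comparison; the variable names are placeholders for this example.

```shell
# String constants compare character by character; numbers compare by value.
ordering=$(awk 'BEGIN {
    str_10_9 = ("10" < "9")        # "1" sorts before "9", so this is true
    num_10_9 = (10 < 9)            # numeric comparison: false
    prefix   = ("abc" < "abcd")    # a prefix is less than the longer string
    print str_10_9, num_10_9, prefix
}')
echo "$ordering"    # prints: 1 0 1
```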

It is very easy to mistype the `==' operator by leaving off one of the `=' characters. The result is still valid awk code, but the program will not do what you mean:

if (a = b)   # oops! should be a == b
   ...
else
   ...

Unless b happens to be zero or the null string, the if part of the test will always succeed. Because the operators are so similar, this kind of error is very difficult to spot when scanning the source code.
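To make the difference concrete, here is a small sketch with made-up values for a and b. The first program contains the mistyped assignment and the second contains the intended comparison.

```shell
# Mistyped: a = b assigns 5 to a, and the nonzero result counts as true.
mistyped=$(awk 'BEGIN {
    a = 1; b = 5
    if (a = b) {
        print "taken, a is now " a
    } else {
        print "not taken"
    }
}')
# Intended: a == b compares the two values and leaves a alone.
intended=$(awk 'BEGIN {
    a = 1; b = 5
    if (a == b) {
        print "taken"
    } else {
        print "not taken, a is still " a
    }
}')
echo "$mistyped"    # prints: taken, a is now 5
echo "$intended"    # prints: not taken, a is still 1
```

Besides taking the wrong branch, the mistyped version silently overwrites a.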

Here are some sample expressions, how gawk compares them, and what the result of the comparison is.

1.5 <= 2.0
     numeric comparison (true)
"abc" >= "xyz"
     string comparison (false)
1.5 != " +2"
     string comparison (true)
"1e2" < "3"
     string comparison (true)
a = 2; b = "2"
a == b
     string comparison (true)
a = 2; b = " +2"
a == b
     string comparison (false)

In this example,

$ echo 1e2 3 | awk '{ print ($1 < $2) ? "true" : "false" }'
-| false

the result is `false' since both $1 and $2 are numeric strings and thus both have the strnum attribute, dictating a numeric comparison.
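For contrast, the sketch below runs the same comparison twice: once with the operands written as string constants in the program, and once with the operands read from input. The shell variable names are invented for this example, and the results assume an awk that follows the POSIX numeric-string rules.

```shell
# The same characters compare differently as constants and as input fields.
as_constants=$(awk 'BEGIN { r = ("1e2" < "3"); print r }')
as_fields=$(echo 1e2 3 | awk '{ r = ($1 < $2); print r }')
echo "$as_constants"    # prints: 1  (string comparison: "1" sorts before "3")
echo "$as_fields"       # prints: 0  (numeric comparison: 100 < 3 is false)
```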

The purpose of the comparison rules and the use of numeric strings is to attempt to produce the behavior that is "least surprising," while still "doing the right thing."

String comparisons and regular expression comparisons are very different. For example,

x == "foo"

has the value one (that is, it is true) if the variable x is precisely `foo'. By contrast,

x ~ /foo/

has the value one if x contains `foo', such as "Oh, what a fool am I!".
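The two tests can be tried side by side. This is only a sketch; the result variables are named for illustration.

```shell
# Equality requires the whole string to be foo; a regexp match does not.
matching=$(awk 'BEGIN {
    x = "Oh, what a fool am I!"
    eq_long = (x == "foo")    # false: x is not exactly foo
    re_long = (x ~ /foo/)     # true: x contains foo
    x = "foo"
    eq_exact = (x == "foo")   # true
    re_exact = (x ~ /foo/)    # true
    print eq_long, re_long, eq_exact, re_exact
}')
echo "$matching"    # prints: 0 1 1 1
```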

The right-hand operand of the `~' and `!~' operators may be either a regexp constant (/.../), or an ordinary expression, in which case the value of the expression as a string is used as a dynamic regexp (see section How to Use Regular Expressions; also see section Using Dynamic Regexps).
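As a brief sketch of the second form, the program below stores a regexp in an ordinary string variable and uses that variable as the right-hand operand. The variable digits and its pattern are assumptions chosen for this example.

```shell
# A string value on the right-hand side of ~ is used as a dynamic regexp.
dynamic=$(awk 'BEGIN {
    digits = "^[0-9]+$"           # ordinary string holding a regexp
    all_digits = ("123" ~ digits)
    has_letter = ("12a" ~ digits)
    print all_digits, has_letter
}')
echo "$dynamic"    # prints: 1 0
```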

In recent implementations of awk, a constant regular expression in slashes by itself is also an expression. The regexp /regexp/ is an abbreviation for this comparison expression:

$0 ~ /regexp/

One special place where /foo/ is not an abbreviation for `$0 ~ /foo/' is when it is the right-hand operand of `~' or `!~'! See section Using Regular Expression Constants, where this is discussed in more detail.
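The equivalence of the short and long forms can be checked with a pattern that counts matching records. This is a minimal sketch; the three input lines and the counter n are invented for the example.

```shell
# /foo/ used alone as a pattern behaves like $0 ~ /foo/.
short_form=$(printf 'foo\nbar\nfood\n' | awk '/foo/ { n++ } END { print n }')
long_form=$(printf 'foo\nbar\nfood\n' | awk '$0 ~ /foo/ { n++ } END { print n }')
echo "$short_form"    # prints: 2
echo "$long_form"     # prints: 2
```

In the long form, /foo/ is the right-hand operand of `~', so it is used as a regexp rather than being expanded into another comparison against $0.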

