The GNU Awk User's Guide - Scanning an Array

Scanning All Elements of an Array

In programs that use arrays, you often need a loop that executes once for each element of an array. In other languages, where arrays are contiguous and indices are limited to positive integers, this is easy: you can find all the valid indices by counting from the lowest index up to the highest. This technique won't do the job in awk, since any number or string can be an array index. So awk has a special kind of for statement for scanning an array:

for (var in array)
  body

This loop executes body once for each index in array that your program has previously used, with the variable var set to that index.
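As a minimal sketch of this behavior, the following command builds a small array and scans it. The array name price, its contents, and the variables item, count, and total are invented for illustration. The point to notice is that the loop variable receives each index, not the stored value; the value is fetched with price[item].

```sh
awk 'BEGIN {
    price["apple"] = 3
    price["pear"] = 5
    price[42] = 7

    # item receives each index in turn; price[item] fetches the stored value.
    for (item in price) {
        count++
        total += price[item]
    }
    print count, total
}'
```

Both results are independent of the order in which the indices are visited, so this prints `3 15` whatever order awk happens to use.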

Here is a program that uses this form of the for statement. The first rule scans the input records and notes which words appear (at least once) in the input, by storing a 1 in the array used with the word as index. The second rule scans the elements of used to find all the distinct words that appear in the input. It prints each word that is more than 10 characters long, and then prints the number of such words. For more information on the built-in function length, see section Built-in Functions for String Manipulation.

# Record a 1 for each word that is used at least once.
{
    for (i = 1; i <= NF; i++)
        used[$i] = 1
}

# Find number of distinct words more than 10 characters long.
END {
    for (x in used)
        if (length(x) > 10) {
            ++num_long_words
            print x
        }
    # Adding 0 forces a numeric result, so 0 is printed when no long words were found.
    print num_long_words + 0, "words longer than 10 characters"
}

See section Generating Word Usage Counts, for a more detailed example of this type.
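To try the long-word program, it can be run on a couple of lines of sample text. The sketch below makes two assumptions that are not part of the original program: the input text is invented, and the long words are piped through the system's sort utility so that the listing comes out in a repeatable order. The call to close waits for sort to finish before the count is printed.

```sh
printf 'the documentation covers internationalization\nmore documentation follows\n' |
awk '
# Record a 1 for each word that is used at least once.
{
    for (i = 1; i <= NF; i++)
        used[$i] = 1
}

# Print the distinct long words in sorted order, then their count.
END {
    for (x in used)
        if (length(x) > 10) {
            ++num_long_words
            print x | "sort"
        }
    close("sort")
    print num_long_words, "words longer than 10 characters"
}'
```

The word documentation occurs twice in the input but is counted once, because both occurrences store into the same array element. The output is the two words documentation and internationalization, one per line, followed by the line `2 words longer than 10 characters`.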

The order in which this statement visits the elements of the array is determined by the internal arrangement of the array elements within awk; a portable program cannot control that order and should not rely on it. This can lead to problems if statements in the loop body add new elements to array: you cannot predict whether or not the for loop will reach them. Similarly, assigning a new value to var inside the loop may produce unpredictable results. It is best to avoid both practices.
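One way to stay clear of the first problem is to copy the indices into a second array before changing anything, and then loop over the copy with an ordinary counting for statement. The following is only a sketch of that idea; the names size, keys, n, and total are invented for illustration.

```sh
awk 'BEGIN {
    size["a"] = 1
    size["b"] = 2

    # First pass: copy the indices that exist right now into keys[1..n].
    n = 0
    for (k in size)
        keys[++n] = k

    # Second pass: iterate over the snapshot, so adding elements to size
    # cannot affect which indices are visited.
    for (i = 1; i <= n; i++)
        size[keys[i] "_copy"] = size[keys[i]]

    total = 0
    for (k in size)
        total++
    print n, total
}'
```

This prints `2 4`: two indices were captured in the snapshot, and the array holds four elements once the copies have been added. The order of the entries in keys is still whatever order awk chose for the scan, but the set of indices visited by the second loop is fixed.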