In programs that use arrays, you often need a loop that executes once for each element of an array. In other languages, where arrays are contiguous and indices are limited to positive integers, this is easy: you can find all the valid indices by counting from the lowest index up to the highest. This technique won't do the job in awk, since any number or string can be an array index. So awk has a special kind of for statement for scanning an array:

    for (var in array)
        body

This loop executes body once for each index in array that your program has previously used, with the variable var set to that index.
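As a minimal sketch, assuming a POSIX shell with awk on the PATH, the program below fills a hypothetical array a with two widely separated numeric indices and one string index. A counting loop from 1 upward would never reach the index "fred", while for (k in a) visits every index that was used. Because the visiting order is unspecified, the output is piped through LC_ALL=C sort purely to make it repeatable.

```shell
#!/bin/sh
# Sketch: sparse numeric indices and a string index in one array.
# The result is sorted with "LC_ALL=C sort" only because for-in order is unspecified.
result=$(awk 'BEGIN {
    a[1] = "one"
    a[100] = "hundred"
    a["fred"] = "a string index"
    # k takes each index that has been used, in no particular order.
    for (k in a)
        print k, "->", a[k]
}' | LC_ALL=C sort)
printf '%s\n' "$result"
```

This prints the three lines 1 -> one, 100 -> hundred, and fred -> a string index; without the sort, their order could differ between awk implementations.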
Here is a program that uses this form of the for statement. The first rule scans the input records and notes which words appear (at least once) in the input, by storing a 1 into the array used with the word as the index. The second rule scans the elements of used to find all the distinct words that appear in the input. It prints each word that is more than 10 characters long, and also prints the number of such words. See section Built-in Functions for String Manipulation, for more information on the built-in function length.

    # Record a 1 for each word that is used at least once.
    {
        for (i = 1; i <= NF; i++)
            used[$i] = 1
    }

    # Find number of distinct words more than 10 characters long.
    END {
        for (x in used)
            if (length(x) > 10) {
                ++num_long_words
                print x
            }
        # Adding 0 makes the count print as 0, not an empty string,
        # when no word qualifies.
        print num_long_words + 0, "words longer than 10 characters"
    }

See section Generating Word Usage Counts, for a more detailed example of this type.
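To see the word-scanning program in action, the sketch below feeds it two hypothetical input lines from a POSIX shell. The word internationalization appears twice but is stored in used only once, so the program reports two distinct long words. The order of the two words it prints is unspecified; only the final count line is guaranteed to come last.

```shell
#!/bin/sh
# Sketch: run the word-scanning program on two made-up input lines.
result=$(printf '%s\n' \
    'internationalization is a long word' \
    'so is characteristically and internationalization repeats' |
awk '
# Record a 1 for each word that is used at least once.
{
    for (i = 1; i <= NF; i++)
        used[$i] = 1
}

# Find number of distinct words more than 10 characters long.
END {
    for (x in used)
        if (length(x) > 10) {
            ++num_long_words
            print x
        }
    print num_long_words + 0, "words longer than 10 characters"
}')
printf '%s\n' "$result"
```

The last line of output is 2 words longer than 10 characters, preceded by characteristically and internationalization in whichever order awk visits them.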
The order in which elements of the array are accessed by this statement is determined by the internal arrangement of the array elements within awk and cannot be controlled or changed. This can lead to problems if new elements are added to array by statements in the loop body; you cannot predict whether or not the for loop will reach them. Similarly, changing var inside the loop may produce strange results. It is best to avoid such things.
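If the loop body really does need to add elements, one common workaround (not something this section prescribes) is to take a snapshot of the current indices first and then loop over the snapshot. The sketch below uses hypothetical arrays count and keys: the first for-in copies each existing index into keys[1] through keys[n], and the second loop counts from 1 to n over that copy, so the new elements it adds to count cannot affect which elements the loop visits.

```shell
#!/bin/sh
# Sketch: add elements safely by scanning a snapshot of the indices,
# not the array being modified. Sorted only because for-in order is unspecified.
result=$(awk 'BEGIN {
    count["apple"] = 1
    count["pear"] = 2
    # Step 1: record the indices that exist right now as keys[1..n].
    n = 0
    for (k in count)
        keys[++n] = k
    # Step 2: adding to count is safe here, because this loop runs
    # over keys[1..n], not over count itself.
    for (i = 1; i <= n; i++)
        count[keys[i] "_copy"] = count[keys[i]]
    # Step 3: report every element of count.
    for (k in count)
        print k, count[k]
}' | LC_ALL=C sort)
printf '%s\n' "$result"
```

Here the counting loop is safe because keys has contiguous indices that the program created itself, which is exactly the situation in which counting from the lowest index to the highest works.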