nextfile
as a Function
The nextfile
statement presented in
section The nextfile
Statement,
is a gawk
-specific extension. It is not available in other
implementations of awk
. This section shows two versions of a
nextfile
function that you can use to simulate gawk
's
nextfile
statement if you cannot use gawk
.
Here is a first attempt at writing a nextfile
function.
# nextfile --- skip remaining records in current file # this should be read in before the "main" awk program function nextfile() { _abandon_ = FILENAME; next } _abandon_ == FILENAME { next }
This file should be included before the main program, because it supplies
a rule that must be executed first. This rule compares the current data
file's name (which is always in the FILENAME
variable) to a private
variable named _abandon_
. If the file name matches, then the action
part of the rule executes a next
statement, to go on to the next
record. (The use of `_' in the variable name is a convention.
It is discussed more fully in
section Naming Library Function Global Variables.)
The use of the next
statement effectively creates a loop that reads
all the records from the current data file.
Eventually, the end of the file is reached, and
a new data file is opened, changing the value of FILENAME
.
Once this happens, the comparison of _abandon_
to FILENAME
fails, and execution continues with the first rule of the "real" program.
The nextfile
function itself simply sets the value of _abandon_
and then executes a next
statement to start the loop
going.(19)
This initial version has a subtle problem. What happens if the same data file is listed twice on the command line, one right after the other, or even with just a variable assignment between the two occurrences of the file name?
In such a case,
this code will skip right through the file, a second time, even though
it should stop when it gets to the end of the first occurrence.
Here is a second version of nextfile
that remedies this problem.
# nextfile --- skip remaining records in current file # correctly handle successive occurrences of the same file # Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain # May, 1993 # this should be read in before the "main" awk program function nextfile() { _abandon_ = FILENAME; next } _abandon_ == FILENAME { if (FNR == 1) _abandon_ = "" else next }
The nextfile
function has not changed. It sets _abandon_
equal to the current file name and then executes a next
satement.
The next
statement reads the next record and increments FNR
,
so FNR
is guaranteed to have a value of at least two.
However, if nextfile
is called for the last record in the file,
then awk
will close the current data file and move on to the next
one. Upon doing so, FILENAME
will be set to the name of the new file,
and FNR
will be reset to one. If this next file is the same as
the previous one, _abandon_
will still be equal to FILENAME
.
However, FNR
will be equal to one, telling us that this is a new
occurrence of the file, and not the one we were reading when the
nextfile
function was executed. In that case, _abandon_
is reset to the empty string, so that further executions of this rule
will fail (until the next time that nextfile
is called).
If FNR
is not one, then we are still in the original data file,
and the program executes a next
statement to skip through it.
An important question to ask at this point is: "Given that the
functionality of nextfile
can be provided with a library file,
why is it built into gawk
?" This is an important question. Adding
features for little reason leads to larger, slower programs that are
harder to maintain.
The answer is that building nextfile
into gawk
provides
significant gains in efficiency. If the nextfile
function is executed
at the beginning of a large data file, awk
still has to scan the entire
file, splitting it up into records, just to skip over it. The built-in
nextfile
can simply close the file immediately and proceed to the
next one, saving a lot of time. This is particularly important in
awk
, since awk
programs are generally I/O bound (i.e.
they spend most of their time doing input and output, instead of performing
computations).
Go to the first, previous, next, last section, table of contents.