

Implementing nextfile as a Function

The nextfile statement, presented in section The nextfile Statement, is a gawk-specific extension. It is not available in other implementations of awk. This section shows two versions of a nextfile function that you can use to simulate gawk's nextfile statement if you cannot use gawk.

Here is a first attempt at writing a nextfile function.

# nextfile --- skip remaining records in current file

# this should be read in before the "main" awk program

function nextfile()    { _abandon_ = FILENAME; next }

_abandon_ == FILENAME  { next }

This file should be included before the main program, because it supplies a rule that must be executed first. This rule compares the current data file's name (which is always in the FILENAME variable) to a private variable named _abandon_. If the file name matches, then the action part of the rule executes a next statement, to go on to the next record. (The use of `_' in the variable name is a convention. It is discussed more fully in section Naming Library Function Global Variables.)

The use of the next statement effectively creates a loop that reads all the records from the current data file. Eventually, the end of the file is reached, and a new data file is opened, changing the value of FILENAME. Once this happens, the comparison of _abandon_ to FILENAME fails, and execution continues with the first rule of the "real" program.

The nextfile function itself simply sets the value of _abandon_ and then executes a next statement to start the loop going.(19)
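The following shell session is a minimal sketch of how such a library file might be used, not a definitive recipe. It makes several assumptions that are not part of the original text: the function is called skipfile (a hypothetical name, chosen because several current awk implementations treat nextfile as a reserved word), the calling rule executes next itself (because some awk implementations reject next inside a function body), and the file names a.txt, b.txt, skipfile.awk, and main.awk are invented for the demonstration.

```shell
#!/bin/sh
# Sketch: exercise a variant of the first-attempt library on two data files.
# skipfile and all file names below are hypothetical.
dir=$(mktemp -d)
printf 'a1\na2\na3\n' > "$dir/a.txt"
printf 'b1\nb2\n'     > "$dir/b.txt"

# Library file: must be read in before the main program.
cat > "$dir/skipfile.awk" <<'EOF'
# skipfile --- mark the current file as abandoned
# (assumption: the caller executes next right after calling this)
function skipfile()    { _abandon_ = FILENAME }

# skip every remaining record of the abandoned file
_abandon_ == FILENAME  { next }
EOF

# Main program: print the first record it sees in each file, then give up
# on that file.
cat > "$dir/main.awk" <<'EOF'
{ print FILENAME ": " $0; skipfile(); next }
EOF

# Run from inside the temporary directory so FILENAME is a short name.
out=$(cd "$dir" && awk -f skipfile.awk -f main.awk a.txt b.txt)
printf '%s\n' "$out"
rm -rf "$dir"
```

Under these assumptions, only the first record of each file reaches the main rule, so the run prints a.txt: a1 followed by b.txt: b1. The remaining records are still read by awk; they are simply discarded by the library rule.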

This initial version has a subtle problem. What happens if the same data file is listed twice on the command line, one right after the other, or even with just a variable assignment between the two occurrences of the file name?

In such a case, this code skips right through the file a second time, even though it should stop when it reaches the end of the first occurrence. Here is a second version of nextfile that remedies this problem.

# nextfile --- skip remaining records in current file
# correctly handle successive occurrences of the same file
# Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
# May, 1993

# this should be read in before the "main" awk program

function nextfile()   { _abandon_ = FILENAME; next }

_abandon_ == FILENAME {
      if (FNR == 1)
          _abandon_ = ""
      else
          next
}

The nextfile function has not changed. It sets _abandon_ equal to the current file name and then executes a next statement. The next statement reads the next record and increments FNR, so FNR is guaranteed to have a value of at least two. However, if nextfile is called for the last record in the file, then awk will close the current data file and move on to the next one. Upon doing so, FILENAME will be set to the name of the new file, and FNR will be reset to one. If this next file is the same as the previous one, _abandon_ will still be equal to FILENAME. However, FNR will be equal to one, telling us that this is a new occurrence of the file, and not the one we were reading when the nextfile function was executed. In that case, _abandon_ is reset to the empty string, so that further executions of this rule will fail (until the next time that nextfile is called).

If FNR is not one, then we are still in the original data file, and the program executes a next statement to skip through it.
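To make the difference between the two versions concrete, the sketch below runs both against the same data file listed twice in a row. It is an illustration under stated assumptions rather than the original library: the function is renamed skipfile (a hypothetical name, since several current awk implementations reserve the word nextfile), the calling rule executes next itself (some awk implementations reject next inside a function body), and the file names a.txt, v1.awk, v2.awk, and main.awk are invented here.

```shell
#!/bin/sh
# Sketch: compare the two library versions when one file is named twice.
# skipfile and all file names below are hypothetical.
dir=$(mktemp -d)
printf 'a1\na2\na3\n' > "$dir/a.txt"

# Variant of the first version: compares file names only.
cat > "$dir/v1.awk" <<'EOF'
function skipfile()    { _abandon_ = FILENAME }
_abandon_ == FILENAME  { next }
EOF

# Variant of the second version: FNR == 1 signals a new occurrence.
cat > "$dir/v2.awk" <<'EOF'
function skipfile()   { _abandon_ = FILENAME }
_abandon_ == FILENAME {
      if (FNR == 1)
          _abandon_ = ""
      else
          next
}
EOF

# Main program: print the first record seen in each file, then skip the rest.
cat > "$dir/main.awk" <<'EOF'
{ print FILENAME ": " $0; skipfile(); next }
EOF

out_v1=$(cd "$dir" && awk -f v1.awk -f main.awk a.txt a.txt)
out_v2=$(cd "$dir" && awk -f v2.awk -f main.awk a.txt a.txt)

printf 'first version:\n%s\n'  "$out_v1"
printf 'second version:\n%s\n' "$out_v2"
rm -rf "$dir"
```

With the first variant, the second occurrence of a.txt is skipped entirely, so a.txt: a1 appears once. With the second variant, the FNR test resets _abandon_ at the start of the second occurrence, so a.txt: a1 appears twice, once per occurrence.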

An important question to ask at this point is: "Given that the functionality of nextfile can be provided with a library file, why is it built into gawk?" The question matters, because adding features for little reason leads to larger, slower programs that are harder to maintain.

The answer is that building nextfile into gawk provides significant gains in efficiency. If the nextfile function is executed at the beginning of a large data file, awk still has to scan the entire file, splitting it up into records, just to skip over it. The built-in nextfile can simply close the file immediately and proceed to the next one, saving a lot of time. This is particularly important in awk, since awk programs are generally I/O bound (i.e., they spend most of their time doing input and output, instead of performing computations).

