

Reading Fixed-width Data

(This section discusses an advanced, experimental feature. If you are a novice awk user, you may wish to skip it on the first reading.)

gawk version 2.13 introduced a facility for dealing with fixed-width fields that have no distinctive field separator. Data of this nature arises, for example, in the input for old FORTRAN programs where numbers are run together, or in the output of programs that did not anticipate the use of their output as input for other programs.

An example of the latter is a table where all the columns are lined up by the use of a variable number of spaces and empty fields are just spaces. Clearly, awk's normal field splitting based on FS will not work well in this case. Although a portable awk program can use a series of substr calls on $0 (see section Built-in Functions for String Manipulation), this is awkward and inefficient for a large number of fields.
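As a minimal sketch of the portable approach just described, the following program pulls two fields out of a record by column position with substr. The record contents and the column positions are made up for illustration; they are not taken from any real data format.

```awk
# Portable awk: extract fixed-width fields with substr.
# The record and the column layout below are hypothetical.
BEGIN {
    record = "abcde fgh"
    first  = substr(record, 1, 5)   # columns 1 through 5
    second = substr(record, 7, 3)   # columns 7 through 9 (column 6 is a gap)
    print first                     # abcde
    print second                    # fgh
}
```

Every additional field needs another substr call with hand-computed start and length values, which is why this becomes awkward as the number of fields grows.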

The splitting of an input record into fixed-width fields is specified by assigning a string containing space-separated numbers to the built-in variable FIELDWIDTHS. Each number specifies the width of one field, including any columns between it and the neighboring fields. If you want to ignore the columns between fields, you can give those columns a width of their own, as a separate field that is subsequently ignored.
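The idea of treating the gap between two columns as a field of its own can be sketched as follows. The record is made up for illustration, and the example assumes gawk, since FIELDWIDTHS is not available in other versions of awk. Assigning to $0 in the BEGIN rule makes gawk split the string just as it would split an input record.

```awk
# gawk only: a 5-column field, a 1-column gap, and a 3-column field.
BEGIN {
    FIELDWIDTHS = "5 1 3"
    $0 = "abcde fgh"      # hypothetical record; assigning $0 re-splits it
    print $1              # abcde
    print $3              # fgh  ($2 holds the single blank and is ignored)
}
```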

The following data is the output of the Unix w utility. It is useful for illustrating the use of FIELDWIDTHS.

 10:06pm  up 21 days, 14:04,  23 users
User     tty       login  idle   JCPU   PCPU  what
hzuo     ttyV0     8:58pm            9      5  vi p24.tex 
hzang    ttyV3     6:37pm    50                -csh 
eklye    ttyV5     9:53pm            7      1  em thes.tex 
dportein ttyV6     8:17pm  1:47                -csh 
gierd    ttyD3    10:00pm     1                elm 
dave     ttyD4     9:47pm            4      4  w 
brent    ttyp0    26Jun91  4:46  26:46   4:41  bash 
dave     ttyq4    26Jun9115days     46     46  wnewmail

The following program takes the above input, converts the idle time to a number of seconds, and prints the first two fields and the calculated idle time. (This program uses a number of awk features that haven't been introduced yet.)

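BEGIN  { FIELDWIDTHS = "9 6 10 6 7 7 35" }   # one width per column
NR > 2 {                    # skip the two header lines
    idle = $4               # the fourth field is the idle column
    sub(/^  */, "", idle)   # strip leading spaces
    if (idle == "")         # a blank column counts as zero
        idle = 0
    if (idle ~ /:/) {       # two parts separated by a colon
        split(idle, t, ":")
        idle = t[1] * 60 + t[2]
    }
    if (idle ~ /days/)      # e.g. "15days"; only the leading number is used
        idle *= 24 * 60 * 60

    print $1, $2, idle
}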

Here is the result of running the program on the data:

hzuo      ttyV0  0
hzang     ttyV3  50
eklye     ttyV5  0
dportein  ttyV6  107
gierd     ttyD3  1
dave      ttyD4  0
brent     ttyp0  286
dave      ttyq4  1296000

Another (possibly more practical) example of fixed-width input data would be the input from a deck of balloting cards. In some parts of the United States, voters mark their choices by punching holes in computer cards. These cards are then processed to count the votes for any particular candidate or on any particular issue. Since a voter may choose not to vote on some issue, any column on the card may be empty. An awk program for processing such data could use the FIELDWIDTHS feature to simplify reading the data. (Of course, getting gawk to run on a system with card readers is another story!)
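A rough sketch of such a tally might look like the following. The card layout is entirely hypothetical: three one-column fields per card, with "Y" for yes, "N" for no, and a blank where the voter made no choice. The sample cards are built in the BEGIN rule so that the program runs without an input file; a real program would read the cards as input records instead.

```awk
# gawk only: tally a hypothetical three-issue ballot card.
BEGIN {
    FIELDWIDTHS = "1 1 1"
    ncards = split("YNY;Y N; NY", card, ";")   # made-up sample cards
    for (i = 1; i <= ncards; i++) {
        $0 = card[i]                 # split the card into one-column fields
        for (j = 1; j <= 3; j++) {
            if ($j == "Y")
                yes[j]++
            else if ($j == "N")
                no[j]++
            else
                blank[j]++           # empty column: no vote on this issue
        }
    }
    for (j = 1; j <= 3; j++)
        printf "issue %d: yes=%d no=%d blank=%d\n", j, yes[j], no[j], blank[j]
}
```

Because every column is a field of its own, an empty column does not shift the fields that follow it, which is what splitting on FS would do.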

Assigning a value to FS causes gawk to return to using FS for field splitting. Use `FS = FS' to make this happen, without having to know the current value of FS.
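The switch back to FS-based splitting can be sketched like this, again assuming gawk and a made-up record. The same string is split twice, once under FIELDWIDTHS and once after `FS = FS'.

```awk
# gawk only: the same record split by width, then by FS.
BEGIN {
    FIELDWIDTHS = "3 3"
    $0 = "abc def"        # hypothetical record
    print $1 "|" $2       # abc| de   (split by column position)

    FS = FS               # return to splitting on FS, whatever its value
    $0 = "abc def"
    print $1 "|" $2       # abc|def   (split on whitespace, the default FS)
}
```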

This feature is still experimental, and may evolve over time. In particular, note that gawk does not attempt to verify the sanity of the values assigned to FIELDWIDTHS.
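Since gawk does not check the value itself, a program can check it before making the assignment. The function below is a hypothetical helper, not part of gawk, and its notion of a sane value (one or more positive integers separated by spaces) is an assumption made for this sketch.

```awk
# valid_widths is a hypothetical helper, not a gawk built-in.
# It returns 1 if spec is a list of positive integers, 0 otherwise.
function valid_widths(spec,    n, i, w)
{
    n = split(spec, w, " ")
    if (n == 0)
        return 0                  # empty specification
    for (i = 1; i <= n; i++)
        if (w[i] !~ /^[0-9]+$/ || w[i] + 0 == 0)
            return 0              # not a number, or a zero width
    return 1
}

BEGIN {
    spec = "9 6 10 6 7 7 35"
    if (valid_widths(spec))
        FIELDWIDTHS = spec
    else {
        print "bad field widths: " spec > "/dev/stderr"
        exit 1
    }
}
```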

