The wc
(word count) utility counts lines, words, and characters in
one or more input files. Its usage is:
wc [-lwc] [ files ... ]
If no files are specified on the command line, wc
reads its standard
input. If there are multiple files, it will also print total counts for all
the files. The options and their meanings are:
-l
-w
awk
separates
fields in its input data.
-c
Implementing wc
in awk
is particularly elegant, since
awk
does a lot of the work for us; it splits lines into words (i.e.
fields) and counts them, it counts lines (i.e. records) for us, and it can
easily tell us how long a line is.
This version uses the getopt
library function
(see section Processing Command Line Options),
and the file transition functions
(see section Noting Data File Boundaries).
This version has one major difference from traditional versions of wc
.
Our version always prints the counts in the order lines, words,
and characters. Traditional versions note the order of the `-l',
`-w', and `-c' options on the command line, and print the counts
in that order.
The BEGIN
rule does the argument processing.
The variable print_total
will
be true if more than one file was named on the command line.
# wc.awk --- count lines, words, characters # Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain # May 1993 # Options: # -l only count lines # -w only count words # -c only count characters # # Default is to count lines, words, characters BEGIN { # let getopt print a message about # invalid options. we ignore them while ((c = getopt(ARGC, ARGV, "lwc")) != -1) { if (c == "l") do_lines = 1 else if (c == "w") do_words = 1 else if (c == "c") do_chars = 1 } for (i = 1; i < Optind; i++) ARGV[i] = "" # if no options, do all if (! do_lines && ! do_words && ! do_chars) do_lines = do_words = do_chars = 1 print_total = (ARGC - i > 2) }
The beginfile
function is simple; it just resets the counts of lines,
words, and characters to zero, and saves the current file name in
fname
.
The endfile
function adds the current file's numbers to the running
totals of lines, words, and characters. It then prints out those numbers
for the file that was just read. It relies on beginfile
to reset the
numbers for the following data file.
function beginfile(file) { chars = lines = words = 0 fname = FILENAME } function endfile(file) { tchars += chars tlines += lines twords += words if (do_lines) printf "\t%d", lines if (do_words) printf "\t%d", words if (do_chars) printf "\t%d", chars printf "\t%s\n", fname }
There is one rule that is executed for each line. It adds the length of the
record to chars
. It has to add one, since the newline character
separating records (the value of RS
) is not part of the record
itself. lines
is incremented for each line read, and words
is
incremented by the value of NF
, the number of "words" on this
line.(22)
Finally, the END
rule simply prints the totals for all the files.
# do per line { chars += length($0) + 1 # get newline lines++ words += NF } END { if (print_total) { if (do_lines) printf "\t%d", tlines if (do_words) printf "\t%d", twords if (do_chars) printf "\t%d", tchars print "\ttotal" } }
Go to the first, previous, next, last section, table of contents.