Most utilities on POSIX compatible systems take options or "switches" on
the command line that can be used to change the way a program behaves.
awk
is an example of such a program
(see section Command Line Options).
Often, options take arguments, data that the program needs to
correctly obey the command line option. For example, awk
's
`-F' option requires a string to use as the field separator.
The first occurrence on the command line of either `--' or a
string that does not begin with `-' ends the options.
Most Unix systems provide a C function named getopt
for processing
command line arguments. The programmer provides a string describing the one
letter options. If an option requires an argument, it is followed in the
string with a colon. getopt
is also passed the
count and values of the command line arguments, and is called in a loop.
getopt
processes the command line arguments for option letters.
Each time around the loop, it returns a single character representing the
next option letter that it found, or `?' if it found an invalid option.
When it returns -1, there are no options left on the command line.
When using getopt
, options that do not take arguments can be
grouped together. Furthermore, options that take arguments require that the
argument be present. The argument can immediately follow the option letter,
or it can be a separate command line argument.
Given a hypothetical program that takes three command line options, `-a', `-b', and `-c', and `-b' requires an argument, all of the following are valid ways of invoking the program:
prog -a -b foo -c data1 data2 data3 prog -ac -bfoo -- data1 data2 data3 prog -acbfoo data1 data2 data3
Notice that when the argument is grouped with its option, the rest of the command line argument is considered to be the option's argument. In the above example, `-acbfoo' indicates that all of the `-a', `-b', and `-c' options were supplied, and that `foo' is the argument to the `-b' option.
getopt
provides four external variables that the programmer can use.
optind
argv
) where the first
non-option command line argument can be found.
optarg
opterr
getopt
prints an error message when it finds an invalid
option. Setting opterr
to zero disables this feature. (An
application might wish to print its own error message.)
optopt
The following C fragment shows how getopt
might process command line
arguments for awk
.
int main(int argc, char *argv[]) { ... /* print our own message */ opterr = 0; while ((c = getopt(argc, argv, "v:f:F:W:")) != -1) { switch (c) { case 'f': /* file */ ... break; case 'F': /* field separator */ ... break; case 'v': /* variable assignment */ ... break; case 'W': /* extension */ ... break; case '?': default: usage(); break; } } ... }
As a side point, gawk
actually uses the GNU getopt_long
function to process both normal and GNU-style long options
(see section Command Line Options).
The abstraction provided by getopt
is very useful, and would be quite
handy in awk
programs as well. Here is an awk
version of
getopt
. This function highlights one of the greatest weaknesses in
awk
, which is that it is very poor at manipulating single characters.
Repeated calls to substr
are necessary for accessing individual
characters (see section Built-in Functions for String Manipulation).
The discussion walks through the code a bit at a time.
# getopt --- do C library getopt(3) function in awk # # arnold@gnu.ai.mit.edu # Public domain # # Initial version: March, 1991 # Revised: May, 1993 # External variables: # Optind -- index of ARGV for first non-option argument # Optarg -- string value of argument to current option # Opterr -- if non-zero, print our own diagnostic # Optopt -- current option letter # Returns # -1 at end of options # ? for unrecognized option # <c> a character representing the current option # Private Data # _opti index in multi-flag option, e.g., -abc
The function starts out with some documentation: who wrote the code, and when it was revised, followed by a list of the global variables it uses, what the return values are and what they mean, and any global variables that are "private" to this library function. Such documentation is essential for any program, and particularly for library functions.
function getopt(argc, argv, options, optl, thisopt, i) { optl = length(options) if (optl == 0) # no options given return -1 if (argv[Optind] == "--") { # all done Optind++ _opti = 0 return -1 } else if (argv[Optind] !~ /^-[^: \t\n\f\r\v\b]/) { _opti = 0 return -1 }
The function first checks that it was indeed called with a string of options
(the options
parameter). If options
has a zero length,
getopt
immediately returns -1.
The next thing to check for is the end of the options. A `--' ends the
command line options, as does any command line argument that does not begin
with a `-'. Optind
is used to step through the array of command
line arguments; it retains its value across calls to getopt
, since it
is a global variable.
The regexp used, /^-[^: \t\n\f\r\v\b]/
, is
perhaps a bit of overkill; it checks for a `-' followed by anything
that is not whitespace and not a colon.
If the current command line argument does not match this pattern,
it is not an option, and it ends option processing.
if (_opti == 0) _opti = 2 thisopt = substr(argv[Optind], _opti, 1) Optopt = thisopt i = index(options, thisopt) if (i == 0) { if (Opterr) printf("%c -- invalid option\n", thisopt) > "/dev/stderr" if (_opti >= length(argv[Optind])) { Optind++ _opti = 0 } else _opti++ return "?" }
The _opti
variable tracks the position in the current command line
argument (argv[Optind]
). In the case that multiple options were
grouped together with one `-' (e.g., `-abx'), it is necessary
to return them to the user one at a time.
If _opti
is equal to zero, it is set to two, the index in the string
of the next character to look at (we skip the `-', which is at position
one). The variable thisopt
holds the character, obtained with
substr
. It is saved in Optopt
for the main program to use.
If thisopt
is not in the options
string, then it is an
invalid option. If Opterr
is non-zero, getopt
prints an error
message on the standard error that is similar to the message from the C
version of getopt
.
Since the option is invalid, it is necessary to skip it and move on to the
next option character. If _opti
is greater than or equal to the
length of the current command line argument, then it is necessary to move on
to the next one, so Optind
is incremented and _opti
is reset
to zero. Otherwise, Optind
is left alone and _opti
is merely
incremented.
In any case, since the option was invalid, getopt
returns `?'.
The main program can examine Optopt
if it needs to know what the
invalid option letter actually was.
if (substr(options, i + 1, 1) == ":") { # get option argument if (length(substr(argv[Optind], _opti + 1)) > 0) Optarg = substr(argv[Optind], _opti + 1) else Optarg = argv[++Optind] _opti = 0 } else Optarg = ""
If the option requires an argument, the option letter is followed by a colon
in the options
string. If there are remaining characters in the
current command line argument (argv[Optind]
), then the rest of that
string is assigned to Optarg
. Otherwise, the next command line
argument is used (`-xFOO' vs. `-x FOO'). In either case,
_opti
is reset to zero, since there are no more characters left to
examine in the current command line argument.
if (_opti == 0 || _opti >= length(argv[Optind])) { Optind++ _opti = 0 } else _opti++ return thisopt }
Finally, if _opti
is either zero or greater than the length of the
current command line argument, it means this element in argv
is
through being processed, so Optind
is incremented to point to the
next element in argv
. If neither condition is true, then only
_opti
is incremented, so that the next option letter can be processed
on the next call to getopt
.
BEGIN { Opterr = 1 # default is to diagnose Optind = 1 # skip ARGV[0] # test program if (_getopt_test) { while ((_go_c = getopt(ARGC, ARGV, "ab:cd")) != -1) printf("c = <%c>, optarg = <%s>\n", _go_c, Optarg) printf("non-option arguments:\n") for (; Optind < ARGC; Optind++) printf("\tARGV[%d] = <%s>\n", Optind, ARGV[Optind]) } }
The BEGIN
rule initializes both Opterr
and Optind
to one.
Opterr
is set to one, since the default behavior is for getopt
to print a diagnostic message upon seeing an invalid option. Optind
is set to one, since there's no reason to look at the program name, which is
in ARGV[0]
.
The rest of the BEGIN
rule is a simple test program. Here is the
result of two sample runs of the test program.
$ awk -f getopt.awk -v _getopt_test=1 -- -a -cbARG bax -x -| c = <a>, optarg = <> -| c = <c>, optarg = <> -| c = <b>, optarg = <ARG> -| non-option arguments: -| ARGV[3] = <bax> -| ARGV[4] = <-x> $ awk -f getopt.awk -v _getopt_test=1 -- -a -x -- xyz abc -| c = <a>, optarg = <> error--> x -- invalid option -| c = <?>, optarg = <> -| non-option arguments: -| ARGV[4] = <xyz> -| ARGV[5] = <abc>
The first `--' terminates the arguments to awk
, so that it does
not try to interpret the `-a' etc. as its own options.
Several of the sample programs presented in
section Practical awk
Programs,
use getopt
to process their arguments.
Go to the first, previous, next, last section, table of contents.