The GNU Awk User's Guide - Group Functions

Reading the Group Database

Much of the discussion presented in section Reading the User Database applies to the group database as well. Although there has traditionally been a well-known file, `/etc/group', in a well-known format, the POSIX standard only provides a set of C library routines (<grp.h> and getgrent) for accessing the information. Even though this file may exist, it likely does not have complete information. Therefore, as with the user database, it is necessary to have a small C program that generates the group database as its output.

Here is grcat, a C program that "cats" the group database.

/*
 * grcat.c
 *
 * Generate a printable version of the group database
 *
 * Arnold Robbins, arnold@gnu.ai.mit.edu
 * May 1993
 * Public Domain
 */

#include <stdio.h>
#include <stdlib.h>     /* for exit() */
#include <grp.h>

int
main(int argc, char **argv)
{
    struct group *g;
    int i;

    while ((g = getgrent()) != NULL) {
        printf("%s:%s:%ld:", g->gr_name, g->gr_passwd,
                                            (long) g->gr_gid);
        for (i = 0; g->gr_mem[i] != NULL; i++) {
            printf("%s", g->gr_mem[i]);
            if (g->gr_mem[i+1] != NULL)
                putchar(',');
        }
        putchar('\n');
    }
    endgrent();
    exit(0);
}

Each line in the group database represents one group. The fields are separated with colons, and represent the following information:

Group Name: The name of the group.
Group Password: The encrypted group password. In practice, this field is never used. It is usually empty, or set to `*'.
Group ID Number: The numeric group-id number. This number should be unique within the file.
Group Member List: A comma-separated list of user names. These users are members of the group. Most Unix systems allow users to be members of several groups simultaneously. If your system does, then reading `/dev/user' will return those group-id numbers in $5 through $NF. (Note that `/dev/user' is a gawk extension; see section Special File Names in gawk.)
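
As a rough illustration of this layout, the following sketch splits one sample record on colons and then splits its member list on commas. The record is sample data, and the names nmembers and members are chosen only for this example; they are not part of the library presented in this section.

```shell
# Split one group record into its four fields, then split the member list.
# The record is sample data; nmembers and members are names made up here.
out=$(printf '%s\n' 'staff:*:10:arnold,miriam,andy' |
awk -F: '{
    # $1 = group name, $2 = password, $3 = group-id, $4 = member list
    nmembers = split($4, members, ",")
    printf "name=%s gid=%s members=%d first=%s\n", $1, $3, nmembers, members[1]
}')
printf '%s\n' "$out"
```

Setting the field separator with -F: here corresponds to the FS = ":" assignment that an awk program would make before reading such records.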

Here is what running grcat might produce:

$ grcat
-| wheel:*:0:arnold
-| nogroup:*:65534:
-| daemon:*:1:
-| kmem:*:2:
-| staff:*:10:arnold,miriam,andy
-| other:*:20:
...

Here are the functions for obtaining information from the group database. There are several, modeled after the C library functions of the same names.

# group.awk --- functions for dealing with the group file
# Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
# May 1993

BEGIN    \
{
    # Change to suit your system
    _gr_awklib = "/usr/local/libexec/awk/"
}

function _gr_init(    oldfs, oldrs, olddol0, grcat, n, a, i)
{
    if (_gr_inited)
        return

    oldfs = FS
    oldrs = RS
    olddol0 = $0
    FS = ":"
    RS = "\n"

    grcat = _gr_awklib "grcat"
    while ((grcat | getline) > 0) {
        if ($1 in _gr_byname)
            _gr_byname[$1] = _gr_byname[$1] "," $4
        else
            _gr_byname[$1] = $0
        if ($3 in _gr_bygid)
            _gr_bygid[$3] = _gr_bygid[$3] "," $4
        else
            _gr_bygid[$3] = $0

        n = split($4, a, "[ \t]*,[ \t]*")
        for (i = 1; i <= n; i++)
            if (a[i] in _gr_groupsbyuser)
                _gr_groupsbyuser[a[i]] = \
                    _gr_groupsbyuser[a[i]] " " $1
            else
                _gr_groupsbyuser[a[i]] = $1

        _gr_bycount[++_gr_count] = $0
    }
    close(grcat)
    _gr_count = 0
    _gr_inited++
    FS = oldfs
    RS = oldrs
    $0 = olddol0
}

The BEGIN rule sets a private variable to the directory where grcat is stored. Since it is used to help out an awk library routine, we have chosen to put it in `/usr/local/libexec/awk'. You might want it to be in a different directory on your system.

These routines follow the same general outline as the user database routines (see section Reading the User Database). The _gr_inited variable is used to ensure that the database is scanned no more than once. The _gr_init function first saves FS, RS, and $0, and then sets FS and RS to the correct values for scanning the group information.

The group information is stored in several associative arrays. The arrays are indexed by group name (_gr_byname), by group-id number (_gr_bygid), and by position in the database (_gr_bycount). There is an additional array indexed by user name (_gr_groupsbyuser), which holds a space-separated list of the groups that each user belongs to.

Unlike the user database, it is possible to have multiple records in the database for the same group. This is common when a group has a large number of members. Such a pair of entries might look like:

tvpeople:*:101:johny,jay,arsenio
tvpeople:*:101:david,conan,tom,joan

For this reason, _gr_init looks to see if a group name or group-id number has already been seen. If it has, then the user names are simply concatenated onto the previous list of users. (There is actually a subtle problem with the code presented above. Suppose that the first record for a group lists no members. The stored record then ends in a colon, and this code appends the later names with a leading comma. It also doesn't check that there is a $4.)
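
One possible way to avoid the leading comma is sketched below. It is not the code used by the library; it only shows the idea on invented sample records. It assumes that every record contains all three colons, so that a stored record ending in a colon means "no members so far". The array byname stands in for _gr_byname.

```shell
# Merge duplicate group records without producing a leading comma.
# The input lines are invented sample data; byname stands in for _gr_byname.
out=$(printf '%s\n' 'tvpeople:*:101:' 'tvpeople:*:101:david,conan' 'tvpeople:*:101:tom' |
awk -F: '{
    if (!($1 in byname))
        byname[$1] = $0                  # first record for this group
    else if (NF >= 4 && $4 != "") {      # skip records with no member list
        # Add a comma only when the stored member list is non-empty,
        # that is, when the stored record does not end in a colon.
        sep = (byname[$1] ~ /:$/) ? "" : ","
        byname[$1] = byname[$1] sep $4
    }
}
END { print byname["tvpeople"] }')
printf '%s\n' "$out"
```

The same test could be applied to the array indexed by group-id number. Checking NF and $4 first also covers the case of a record that has no member field to append.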

Finally, _gr_init closes the pipeline to grcat, initializes _gr_count to zero (getgrent uses it later to track its position), makes _gr_inited non-zero, and restores FS, RS, and $0.

function getgrnam(group)
{
    _gr_init()
    if (group in _gr_byname)
        return _gr_byname[group]
    return ""
}

The getgrnam function takes a group name as its argument. If that group exists, getgrnam returns its record from the database. Otherwise, it returns the null string.

function getgrgid(gid)
{
    _gr_init()
    if (gid in _gr_bygid)
        return _gr_bygid[gid]
    return ""
}

The getgrgid function is similar. It takes a numeric group-id and returns the record associated with that group-id, or the null string if there is none.

function getgruser(user)
{
    _gr_init()
    if (user in _gr_groupsbyuser)
        return _gr_groupsbyuser[user]
    return ""
}

The getgruser function does not have a C counterpart. It takes a user name, and returns the list of groups that have the user as a member.

function getgrent()
{
    _gr_init()
    if (++_gr_count in _gr_bycount)
        return _gr_bycount[_gr_count]
    return ""
}

The getgrent function steps through the database one entry at a time. It uses _gr_count to track its position in the list.

function endgrent()
{
    _gr_count = 0
}

endgrent resets _gr_count to zero so that getgrent can start over again.
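
A typical calling pattern might look like the sketch below. To keep it self-contained, it does not load the library or run grcat. Instead it defines small stand-ins (init_sketch, getgrent_sketch, and endgrent_sketch, all hypothetical names) that follow the same counter-based approach on two invented records.

```shell
# Step through all entries, rewind, and start over.
# The *_sketch functions are hypothetical stand-ins for the library routines.
out=$(awk '
function init_sketch() {
    if (inited)
        return
    # Load two invented records instead of running grcat.
    bycount[1] = "wheel:*:0:arnold"
    bycount[2] = "staff:*:10:arnold,miriam,andy"
    count = 0
    inited = 1
}
function getgrent_sketch() {
    init_sketch()
    if ((++count) in bycount)
        return bycount[count]
    return ""
}
function endgrent_sketch() {
    count = 0
}
BEGIN {
    while ((rec = getgrent_sketch()) != "") {
        split(rec, f, ":")
        print "pass 1: " f[1]
    }
    endgrent_sketch()                # rewind to the first entry
    rec = getgrent_sketch()
    split(rec, f, ":")
    print "pass 2 starts at: " f[1]
}')
printf '%s\n' "$out"
```

Because the loop stops at the first null string, this pattern relies on no record in the database being empty.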

As with the user database routines, each function calls _gr_init to initialize the arrays. Doing so only incurs the extra overhead of running grcat if these functions are used (as opposed to moving the body of _gr_init into a BEGIN rule).

Most of the work is in scanning the database and building the various associative arrays. The functions that the user calls are themselves very simple, relying on awk's associative arrays to do the work.

The id program in section Printing Out User Information uses these functions.
