The GNU Awk User's Guide

Go to the first, previous, next, last section, table of contents.

A Simple Stream Editor

The sed utility is a "stream editor," a program that reads a stream of data, makes changes to it, and passes the modified data on. It is often used to make global changes to a large file, or to a stream of data generated by a pipeline of commands.

While sed is a complicated program in its own right, its most common use is to perform global substitutions in the middle of a pipeline:

command1 < orig.data | sed 's/old/new/g' | command2 > result

Here, the `s/old/new/g' tells sed to look for the regexp `old' on each input line, and replace it with the text `new', globally (i.e. all the occurrences on a line). This is similar to awk's gsub function (see section Built-in Functions for String Manipulation).

The following program, `awksed.awk', accepts at least two command line arguments; the pattern to look for and the text to replace it with. Any additional arguments are treated as data file names to process. If none are provided, the standard input is used.

# awksed.awk --- do s/foo/bar/g using just print
#    Thanks to Michael Brennan for the idea

# Arnold Robbins, arnold@gnu.ai.mit.edu, Public Domain
# August 1995

function usage()
{
    print "usage: awksed pat repl [files...]" > "/dev/stderr"
    exit 1
}

BEGIN {
    # validate arguments
    if (ARGC < 3)
        usage()

    RS = ARGV[1]
    ORS = ARGV[2]

    # don't use arguments as files
    ARGV[1] = ARGV[2] = ""
}

# look ma, no hands!
{
    if (RT == "")
        printf "%s", $0
    else
        print
}

The program relies on gawk's ability to have RS be a regexp and on the setting of RT to the actual text that terminated the record (see section How Input is Split into Records).

The idea is to have RS be the pattern to look for. gawk will automatically set $0 to the text between matches of the pattern. This is text that we wish to keep, unmodified. Then, by setting ORS to the replacement text, a simple print statement will output the text we wish to keep, followed by the replacement text.

There is one wrinkle to this scheme, which is what to do if the last record doesn't end with text that matches RS? Using a print statement unconditionally prints the replacement text, which is not correct.

However, if the file did not end in text that matches RS, RT will be set to the null string. In this case, we can print $0 using printf (see section Using printf Statements for Fancier Printing).

The BEGIN rule handles the setup, checking for the right number of arguments, and calling usage if there is a problem. Then it sets RS and ORS from the command line arguments, and sets ARGV[1] and ARGV[2] to the null string, so that they will not be treated as file names (see section Using ARGC and ARGV).

The usage function prints an error message and exits.

Finally, the single rule handles the printing scheme outlined above, using print or printf as appropriate, depending upon the value of RT.

Go to the first, previous, next, last section, table of contents.