

Translating Between Characters and Numbers

One commercial implementation of awk supplies a built-in function, ord, which takes a character and returns the numeric value for that character in the machine's character set. If the string passed to ord has more than one character, only the first one is used.

The inverse of this function is chr (from the function of the same name in Pascal), which takes a number and returns the corresponding character.

Both functions can be written very nicely in awk; there is no real reason to build them into the awk interpreter.

# ord.awk --- do ord and chr
#
# Global identifiers:
#    _ord_:        numerical values indexed by characters
#    _ord_init:    function to initialize _ord_
#
# Arnold Robbins
# arnold@gnu.ai.mit.edu
# Public Domain
# 16 January, 1992
# 20 July, 1992, revised

BEGIN    { _ord_init() }

function _ord_init(    low, high, i, t)
{
    low = sprintf("%c", 7) # BEL is ascii 7
    if (low == "\a") {    # regular ascii
        low = 0
        high = 127
    } else if (sprintf("%c", 128 + 7) == "\a") {
        # ascii, mark parity
        low = 128
        high = 255
    } else {        # ebcdic(!)
        low = 0
        high = 255
    }

    for (i = low; i <= high; i++) {
        t = sprintf("%c", i)
        _ord_[t] = i
    }
}

Some explanation of the numbers used by _ord_init is worthwhile. The most prominent character set in use today is ASCII. Although an eight-bit byte can hold 256 distinct values (from zero to 255), ASCII defines characters only for the values from zero to 127. At least one computer manufacturer that we know of uses ASCII, but with mark parity, meaning that the leftmost bit in the byte is always one. On those systems, characters have numeric values from 128 to 255. Finally, large mainframe systems use the EBCDIC character set, which uses all 256 values. While there are other character sets in use on some older systems, they are not really worth worrying about.
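The BEL probe that separates these three cases can be tried on its own. The following is only a sketch: the function name charset_kind and its message strings are hypothetical and not part of ord.awk, and it assumes an awk that understands the "\a" escape.

```awk
# charset-probe.awk --- hypothetical helper, not part of ord.awk
#
# Returns a description of the character set that the BEL probe
# selects, using the same three-way test as _ord_init.

function charset_kind()
{
    if (sprintf("%c", 7) == "\a")             # BEL is ascii 7
        return "plain ASCII: codes 0 through 127"
    else if (sprintf("%c", 128 + 7) == "\a")  # BEL with the top bit set
        return "ASCII with mark parity: codes 128 through 255"
    else                                      # neither matched
        return "assuming EBCDIC: codes 0 through 255"
}

BEGIN    { print charset_kind() }
```

On an ordinary ASCII system the first branch should be taken. Note that the last branch is a fallback rather than a positive test for EBCDIC; any character set that fails both probes ends up there.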

function ord(str,    c)
{
    # only first character is of interest
    c = substr(str, 1, 1)
    return _ord_[c]
}

function chr(c)
{
    # force c to be numeric by adding 0
    return sprintf("%c", c + 0)
}

#### test code ####
# BEGIN    \
# {
#    for (;;) {
#        printf("enter a character: ")
#        if ((getline var) <= 0)
#            break
#        printf("ord(%s) = %d\n", var, ord(var))
#    }
# }

An obvious improvement to these functions would be to move the code for the _ord_init function into the body of the BEGIN rule. It was written this way initially for ease of development.
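A minimal sketch of that rearrangement might look like the following. One caveat: a BEGIN rule has no local variables, so low, high, i, and t would become globals. The names _ord_low, _ord_high, and _ord_i are hypothetical, chosen here only to keep them out of the way of user programs.

```awk
# Sketch: the body of _ord_init moved into the BEGIN rule.
# _ord_low, _ord_high, and _ord_i are hypothetical global names;
# they stand in for the function's local variables.

BEGIN {
    if (sprintf("%c", 7) == "\a") {    # regular ascii
        _ord_low = 0
        _ord_high = 127
    } else if (sprintf("%c", 128 + 7) == "\a") {
        # ascii, mark parity
        _ord_low = 128
        _ord_high = 255
    } else {        # ebcdic(!)
        _ord_low = 0
        _ord_high = 255
    }

    # fill the table: character -> numeric value
    for (_ord_i = _ord_low; _ord_i <= _ord_high; _ord_i++)
        _ord_[sprintf("%c", _ord_i)] = _ord_i
}
```

The trade-off is one fewer function in exchange for three extra global variables, which may or may not count as an improvement in a library file.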

The file ends with a "test program" in a BEGIN rule that exercises ord. It is commented out for production use.
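The commented-out test reads characters interactively. A check that needs no input could be sketched as below. It is a condensed, hypothetical variant rather than the library file itself: it assumes a plain ASCII system, so _ord_init skips the character set probe and maps only codes 0 through 127, and roundtrip_failures is a name invented for this example.

```awk
# roundtrip.awk --- hypothetical self-check, condensed from ord.awk
#
# Assumption: plain ASCII, so only codes 0 through 127 are mapped.

function _ord_init(    i)
{
    for (i = 0; i <= 127; i++)
        _ord_[sprintf("%c", i)] = i
}

function ord(str)
{
    # only first character is of interest
    return _ord_[substr(str, 1, 1)]
}

function chr(c)
{
    # force c to be numeric by adding 0
    return sprintf("%c", c + 0)
}

# Count the codes for which ord(chr(i)) does not give back i.
# Code 0 is skipped because some awk implementations cannot
# hold a NUL character in a string.
function roundtrip_failures(    i, bad)
{
    bad = 0
    for (i = 1; i <= 127; i++)
        if (ord(chr(i)) != i)
            bad++
    return bad
}

BEGIN {
    _ord_init()
    printf("ord(\"A\") = %d, chr(97) = %s\n", ord("A"), chr(97))
    printf("round-trip failures: %d\n", roundtrip_failures())
}
```

On an ASCII system, ord("A") should come out as 65 and chr(97) as the letter a, with no round-trip failures reported.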

