One commercial implementation of awk
supplies a built-in function,
ord
, which takes a character and returns the numeric value for that
character in the machine's character set. If the string passed to
ord
has more than one character, only the first one is used.
The inverse of this function is chr
(from the function of the same
name in Pascal), which takes a number and returns the corresponding character.
Both functions can be written very nicely in awk
; there is no real
reason to build them into the awk
interpreter.
# ord.awk --- do ord and chr # # Global identifiers: # _ord_: numerical values indexed by characters # _ord_init: function to initialize _ord_ # # Arnold Robbins # arnold@gnu.ai.mit.edu # Public Domain # 16 January, 1992 # 20 July, 1992, revised BEGIN { _ord_init() } function _ord_init( low, high, i, t) { low = sprintf("%c", 7) # BEL is ascii 7 if (low == "\a") { # regular ascii low = 0 high = 127 } else if (sprintf("%c", 128 + 7) == "\a") { # ascii, mark parity low = 128 high = 255 } else { # ebcdic(!) low = 0 high = 255 } for (i = low; i <= high; i++) { t = sprintf("%c", i) _ord_[t] = i } }
Some explanation of the numbers used by chr
is worthwhile.
The most prominent character set in use today is ASCII. Although an
eight-bit byte can hold 256 distinct values (from zero to 255), ASCII only
defines characters that use the values from zero to 127.(20)
At least one computer manufacturer that we know of
uses ASCII, but with mark parity, meaning that the leftmost bit in the byte
is always one. What this means is that on those systems, characters
have numeric values from 128 to 255.
Finally, large mainframe systems use the EBCDIC character set, which
uses all 256 values.
While there are other character sets in use on some older systems,
they are not really worth worrying about.
function ord(str, c) { # only first character is of interest c = substr(str, 1, 1) return _ord_[c] } function chr(c) { # force c to be numeric by adding 0 return sprintf("%c", c + 0) } #### test code #### # BEGIN \ # { # for (;;) { # printf("enter a character: ") # if (getline var <= 0) # break # printf("ord(%s) = %d\n", var, ord(var)) # } # }
An obvious improvement to these functions would be to move the code for the
_ord_init
function into the body of the BEGIN
rule. It was
written this way initially for ease of development.
There is a "test program" in a BEGIN
rule, for testing the
function. It is commented out for production use.
Go to the first, previous, next, last section, table of contents.