The Plotutils Package - Cyrillic and Japanese


Cyrillic and Japanese fonts

The built-in fonts discussed in the previous section include Cyrillic and Japanese vector fonts. This section explains how these fonts are encoded, i.e., how their character maps are laid out. You may use the plotfont utility to display the character map for any font, including the Cyrillic and Japanese vector fonts. See section The plotfont Utility.

The HersheyCyrillic and HersheyCyrillic-Oblique fonts use an encoding called KOI8-R, a superset of ASCII that has become the de facto standard for Unix and networking applications in the former Soviet Union. In the printable ASCII range, these fonts resemble the HersheySerif vector font, but their upper halves are different. The byte range 0xc0...0xdf contains lower-case Cyrillic characters and the byte range 0xe0...0xff contains upper-case Cyrillic characters. Additional Cyrillic characters are located at 0xa3 and 0xb3. For more on the encoding scheme, see the official KOI8-R Web page and Internet RFC 1489, which is available from the Information Sciences Institute.
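The byte ranges described above can be checked with a short sketch. It relies on Python's standard `koi8_r` codec, which is not part of the plotutils package, and the helper name `classify_koi8r` is hypothetical, introduced here only for illustration.

```python
def classify_koi8r(byte):
    """Classify one KOI8-R byte using the ranges given in the text."""
    if 0x20 <= byte <= 0x7e:
        return "ascii"
    if 0xc0 <= byte <= 0xdf:
        return "cyrillic-lower"
    if 0xe0 <= byte <= 0xff:
        return "cyrillic-upper"
    if byte in (0xa3, 0xb3):
        return "cyrillic-extra"
    return "other"


# Latin A and a, Cyrillic capital and small A, Cyrillic capital and small IO.
text = "Aa\u0410\u0430\u0401\u0451"

# Assumption: Python's built-in codec stands in for any KOI8-R encoder.
encoded = text.encode("koi8_r")
classes = [classify_koi8r(b) for b in encoded]

for b, c in zip(encoded, classes):
    print(f"0x{b:02x}  {c}")
```

A byte string produced this way is the kind of input a KOI8-R encoded font such as HersheyCyrillic would expect for a text label.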

The HersheyEUC font is a vector font that is used for displaying Japanese text. It uses the 8-bit EUC-JP encoding. EUC stands for `extended Unix code', a scheme for encoding Japanese, and also other character sets (e.g., Greek and Cyrillic), as multibyte character strings. The format of EUC strings is explained in Ken Lunde's Understanding Japanese Information Processing (O'Reilly, 1993), which contains much additional information on Japanese text processing. See also his on-line supplement.

In the HersheyEUC font, characters in the printable ASCII range, 0x20...0x7e, are similar to HersheySerif (their encoding is `JIS Roman', an ASCII variant standardized by the Japanese Industrial Standards Committee). Also, each successive pair of bytes in the 0xa1...0xfe range defines a single character in the JIS X0208 standard. The characters in the JIS X0208 standard include Japanese syllabic characters (Hiragana and Katakana), ideographic characters (Kanji), Roman, Greek, and Cyrillic alphabets, punctuation marks, and miscellaneous symbols. For example, the JIS X0208 standard indexes the 83 Hiragana as 0x2421...0x2473. To obtain the EUC code for any JIS X0208 character, you would add 0x80 to each byte (i.e., `set the high bit' on each byte). So the first of the 83 Hiragana (0x2421) would be encoded as the successive pair of bytes 0xa4 and 0xa1.
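The conversion from a JIS X0208 code to its EUC form can be sketched as follows. This is a minimal illustration, not code from the plotutils package: the function name `jis_to_euc` is hypothetical, and Python's standard `euc_jp` codec is used only as an independent cross-check.

```python
def jis_to_euc(jis_code):
    """Convert a two-byte JIS X0208 code (e.g. 0x2421) to EUC bytes.

    Each byte of the JIS code lies in 0x21...0x7e; setting the high bit
    (adding 0x80) moves it into the 0xa1...0xfe range used by EUC.
    """
    hi, lo = jis_code >> 8, jis_code & 0xff
    if not (0x21 <= hi <= 0x7e and 0x21 <= lo <= 0x7e):
        raise ValueError(f"not a JIS X0208 code: {jis_code:#06x}")
    return bytes([hi | 0x80, lo | 0x80])


# The Hiragana occupy 0x2421...0x2473 in JIS X0208.
first_hiragana = jis_to_euc(0x2421)
last_hiragana = jis_to_euc(0x2473)
hiragana_count = 0x2473 - 0x2421 + 1

print(first_hiragana.hex(), last_hiragana.hex(), hiragana_count)

# Cross-check against Python's EUC-JP codec (an assumption of this sketch).
print(first_hiragana.decode("euc_jp"), last_hiragana.decode("euc_jp"))
```

Setting the high bit with a bitwise OR is equivalent to adding 0x80 here, because every valid JIS X0208 byte is below 0x80.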

The implementation of the JIS X0208 standard in the HersheyEUC font is based on Dr. Hershey's digitizations, and is complete enough to be useful. All 83 Hiragana and 86 Katakana are available, though the little-used `half-width Katakana' are not supported. Also, 603 Kanji are available, including 596 of the 2965 JIS Level 1 (i.e., frequently used) Kanji. The Hiragana, the Katakana, and the available Kanji all have the same width. The file `kanji.doc', which on most systems is installed in `/usr/share/libplot' or `/usr/local/share/libplot', lists the 603 available Kanji. Each JIS X0208 character that is unavailable will be drawn as an `undefined character' glyph (a bundle of horizontal lines).

The eight Hewlett-Packard vector fonts in the ArcANK and StickANK typefaces are also used for displaying Japanese text. They are available when producing HP-GL output for the HP7550A graphics plotter and the HP758x, HP7595A and HP7596A drafting plotters. To ensure that they are available, you must set the environment variable or driver parameter HPGL_VERSION to "1.5".

ANK stands for Alphabet, Numerals, and Katakana. The ANK fonts use the `Kana-8' encoding. The lower half of each font uses the JIS Roman encoding, and the upper half contains half-width Katakana. Half-width Katakana are simplified Katakana that may need to be equipped with diacritical marks. The diacritical marks are included in the encoding, as separate characters.
