

Text Representations

Emacs has two text representations, that is, two ways to represent text in a string or buffer. These are called unibyte and multibyte. Each string, and each buffer, uses one of these two representations. For most purposes you can ignore the issue of representations, because Emacs converts text between them as appropriate. Occasionally, in Lisp programming, you will need to pay attention to the difference.

In unibyte representation, each character occupies one byte and therefore the possible character codes range from 0 to 255. Codes 0 through 127 are ASCII characters; the codes from 128 through 255 are used for one non-ASCII character set (you can choose which character set by setting the variable nonascii-insert-offset).
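As a small illustration of the one-byte-per-character property, the sketch below builds a unibyte string and inspects it. It assumes an Emacs recent enough to provide unibyte-string, which this section does not describe; the name my-unibyte-facts is hypothetical.

```elisp
;; Hypothetical helper: collect a few facts about a unibyte string.
;; Assumes `unibyte-string' is available in the running Emacs.
(defun my-unibyte-facts ()
  "Return (MULTIBYTE-P LENGTH BYTES CODES) for a sample unibyte string."
  (let ((s (unibyte-string 65 200 255)))
    (list (multibyte-string-p s)   ; unibyte, so not multibyte
          (length s)               ; number of characters
          (string-bytes s)         ; number of bytes
          (append s nil))))        ; the character codes as a list

(my-unibyte-facts)
;; => (nil 3 3 (65 200 255))
```

The character count and the byte count agree, and every code lies between 0 and 255.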

In multibyte representation, a character may occupy more than one byte, and as a result, the full range of Emacs character codes can be stored. The first byte of a multibyte character is always in the range 128 through 159 (octal 0200 through 0237). These values are called leading codes. The second and subsequent bytes of a multibyte character are always in the range 160 through 255 (octal 0240 through 0377); these values are trailing codes.
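The byte ranges just described can be written down as a short classifier. This is only a sketch: the function my-classify-byte is hypothetical, not part of Emacs, and treating bytes below 128 as single-byte ASCII is an assumption carried over from the description of unibyte text.

```elisp
;; Hypothetical helper, not part of Emacs: classify a byte according
;; to the ranges given in the text.
(defun my-classify-byte (byte)
  "Return a symbol describing the role of BYTE in multibyte text."
  (cond ((or (< byte 0) (> byte 255))
         (error "Not a byte: %S" byte))
        ((< byte 128) 'ascii)           ; 0..127, assumed single-byte ASCII
        ((< byte 160) 'leading-code)    ; 128..159, octal 0200..0237
        (t 'trailing-code)))            ; 160..255, octal 0240..0377

(mapcar 'my-classify-byte '(65 128 159 160 255))
;; => (ascii leading-code leading-code trailing-code trailing-code)
```

Because the leading and trailing ranges do not overlap, a program scanning multibyte text can tell from any single byte whether it begins a character.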

In a buffer, the buffer-local value of the variable enable-multibyte-characters specifies the representation used. The representation for a string is determined from the string's contents when the string is constructed.

Variable: enable-multibyte-characters
This variable specifies the current buffer's text representation. If it is non-nil, the buffer contains multibyte text; otherwise, it contains unibyte text.

You cannot set this variable directly; instead, use the function set-buffer-multibyte to change a buffer's representation.
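A minimal sketch of inspecting and changing a buffer's representation follows. The helper my-buffer-representation is a hypothetical name introduced for illustration; the example works in a temporary buffer so that no existing buffer is affected.

```elisp
;; Hypothetical helper: report the current buffer's representation.
(defun my-buffer-representation ()
  "Return `multibyte' or `unibyte' for the current buffer."
  (if enable-multibyte-characters 'multibyte 'unibyte))

;; Switch a scratch buffer to unibyte and back, recording each state.
(defun my-toggle-demo ()
  "Return the representation after each call to `set-buffer-multibyte'."
  (with-temp-buffer
    (set-buffer-multibyte nil)
    (let ((after-nil (my-buffer-representation)))
      (set-buffer-multibyte t)
      (list after-nil (my-buffer-representation)))))

(my-toggle-demo)
;; => (unibyte multibyte)
```

Note that the code only reads enable-multibyte-characters; every change goes through set-buffer-multibyte.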

Variable: default-enable-multibyte-characters
This variable's value is equivalent to (default-value 'enable-multibyte-characters), and setting this variable changes that default value. You cannot set the buffer-local binding of enable-multibyte-characters in a specific buffer, but you can change the default value; this is a reasonable thing to do, because it has no effect on existing buffers.

The `--unibyte' command line option does its job by setting the default value to nil early in startup.
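The default can be read without changing it by using default-value, as in the sketch below. The function name my-new-buffer-uses-default-p is hypothetical, and the example assumes that a freshly created temporary buffer starts out with the default representation.

```elisp
;; Read the default representation flag; this does not modify it.
(default-value 'enable-multibyte-characters)

;; Hypothetical helper: check that a new buffer picks up the default.
(defun my-new-buffer-uses-default-p ()
  "Return t if a fresh temporary buffer has the default representation."
  (with-temp-buffer
    (eq (not enable-multibyte-characters)
        (not (default-value 'enable-multibyte-characters)))))

(my-new-buffer-uses-default-p)
```

The comparison negates both values first so that any two non-nil values count as equal.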

Function: multibyte-string-p string
Return t if string is a multibyte string, that is, one that uses the multibyte representation; otherwise return nil.
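The following sketch shows the predicate on two strings with the same contents but different representations. It assumes an Emacs that provides string-to-multibyte, a conversion function not described in this section; the name my-representation-pair is hypothetical.

```elisp
;; Hypothetical helper: test an ASCII-only literal and a multibyte
;; copy of it.  Assumes `string-to-multibyte' is available.
(defun my-representation-pair ()
  "Return the `multibyte-string-p' results for \"abc\" and its multibyte copy."
  (let ((plain "abc"))
    (list (multibyte-string-p plain)
          (multibyte-string-p (string-to-multibyte plain)))))

(my-representation-pair)
;; => (nil t)
```

The second result is t even though the string holds only ASCII characters, which shows that the predicate reports the representation rather than the presence of non-ASCII text.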

