Oracle® Database Globalization Support Guide 10g Release 1 (10.1) Part Number B10749-01
This appendix offers an introduction to Unicode character assignments. This appendix contains:
Table B-1 contains code ranges that have been allocated in Unicode for UTF-16 character codes.
Table B-2 contains code ranges that have been allocated in Unicode for UTF-8 character codes.
Note: Blank spaces represent nonapplicable code assignments. Character codes are shown in hexadecimal representation.
As shown in Table B-1, UTF-16 character codes for some characters (Additional Chinese/Japanese/Korean characters and Private Use Area #2) are represented as two 16-bit units. These are supplementary characters. The first 16-bit value of a supplementary character is in the range 0xD800 to 0xDBFF, and the second 16-bit value is in the range 0xDC00 to 0xDFFF. With supplementary characters, UTF-16 character codes can represent more than one million characters. Without supplementary characters, at most 65,536 characters can be represented. Oracle's AL16UTF16 character set supports supplementary characters.
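The two ranges described above can be illustrated with a minimal Python sketch. It uses the standard UTF-16 surrogate arithmetic: subtract 0x10000 from the code point, then place the upper 10 bits in the first value and the lower 10 bits in the second. The function name `to_surrogate_pair` is hypothetical and is not part of any Oracle interface. Because each range holds 0x400 values, the pairs cover 0x400 × 0x400 = 1,048,576 supplementary code points, which is where "more than one million" comes from.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point into its two 16-bit UTF-16 values.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000          # 20-bit value
    first = 0xD800 + (offset >> 10)        # upper 10 bits -> 0xD800..0xDBFF
    second = 0xDC00 + (offset & 0x3FF)     # lower 10 bits -> 0xDC00..0xDFFF
    return first, second


# U+20000 is the first code point of an additional CJK block.
first, second = to_surrogate_pair(0x20000)
print(hex(first), hex(second))  # 0xd840 0xdc00

# Cross-check against Python's own big-endian UTF-16 encoder.
encoded = chr(0x20000).encode("utf-16-be")
print(encoded.hex())  # d840dc00
```

The cross-check at the end shows the same two 16-bit values produced by Python's built-in codec, so the arithmetic can be verified without any database connection.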
The UTF-8 character codes in Table B-2 show that the following conditions are true:
Oracle's AL32UTF8 character set supports 1-byte, 2-byte, 3-byte, and 4-byte values. Oracle's UTF8 character set supports 1-byte, 2-byte, and 3-byte values, but not 4-byte values.
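As a rough illustration of the byte lengths discussed above, the following Python sketch measures how many bytes the standard UTF-8 encoding uses for a few sample characters. It relies on Python's built-in `utf-8` codec, not on an Oracle character set, and the helper names `utf8_length` and `needs_four_byte_values` are hypothetical. The second helper only flags text that contains supplementary characters, which are the ones that require 4-byte values.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes in the standard UTF-8 encoding of ch."""
    return len(ch.encode("utf-8"))


def needs_four_byte_values(text: str) -> bool:
    """Return True if text contains a supplementary character (above U+FFFF).

    Hypothetical helper: such characters take 4 bytes in standard UTF-8.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


# One sample per byte length: ASCII, Latin-1 letter, euro sign, supplementary CJK.
for ch in ["A", "\u00e9", "\u20ac", "\U00020000"]:
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s)")

print(needs_four_byte_values("price: \u20ac5"))      # False
print(needs_four_byte_values("name: \U00020000"))    # True
```

A check like `needs_four_byte_values` could be run over sample data before choosing between the two character sets, since the text above states that only AL32UTF8 supports 4-byte values.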