Oracle® Database Globalization Support Guide
10g Release 1 (10.1)

Part Number B10749-01
B Unicode Character Code Assignments

This appendix offers an introduction to Unicode character assignments. It contains the following topics: Unicode Code Ranges, UTF-16 Encoding, and UTF-8 Encoding.

Unicode Code Ranges

Table B-1 contains code ranges that have been allocated in Unicode for UTF-16 character codes.

Table B-1 Unicode Character Code Ranges for UTF-16 Character Codes

Types of Characters
    First 16 Bits    Second 16 Bits

ASCII
    0000 - 007F      -

European (except ASCII), Arabic, Hebrew
    0080 - 07FF      -

Indic, Thai, certain symbols (such as the euro symbol), Chinese, Japanese, Korean
    0800 - 0FFF      -
    1000 - CFFF      -
    D000 - D7FF      -
    F900 - FFFF      -

Private Use Area #1
    E000 - EFFF      -
    F000 - F8FF      -

Supplementary characters: Additional Chinese, Japanese, and Korean characters; historic characters; musical symbols; mathematical symbols
    D800 - D8BF      DC00 - DFFF
    D8C0 - DABF      DC00 - DFFF
    DAC0 - DB7F      DC00 - DFFF

Private Use Area #2
    DB80 - DBBF      DC00 - DFFF
    DBC0 - DBFF      DC00 - DFFF
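
The following Python sketch shows one way to read Table B-1 programmatically. It uses Python's built-in "utf-16-be" codec to produce the 16-bit code units of a character and then finds the row whose First 16 Bits range contains the first unit. The names TABLE_B1, utf16_units, and table_b1_row are hypothetical helpers for illustration only (they are not part of Oracle Database); adjacent ranges that belong to the same row are merged, and the row labels are abbreviated.

```python
from __future__ import annotations

# Hypothetical lookup table built from Table B-1: ranges of the FIRST 16-bit
# code unit. Adjacent ranges of one row are merged (0800-0FFF, 1000-CFFF,
# and D000-D7FF become 0800-D7FF).
TABLE_B1 = [
    (0x0000, 0x007F, "ASCII"),
    (0x0080, 0x07FF, "European (except ASCII), Arabic, Hebrew"),
    (0x0800, 0xD7FF, "Indic, Thai, certain symbols, Chinese, Japanese, Korean"),
    (0xD800, 0xDB7F, "Supplementary characters"),
    (0xDB80, 0xDBFF, "Private Use Area #2"),
    (0xE000, 0xF8FF, "Private Use Area #1"),
    (0xF900, 0xFFFF, "Indic, Thai, certain symbols, Chinese, Japanese, Korean"),
]


def utf16_units(ch: str) -> list[int]:
    """Return the UTF-16 code units of a single character as integers."""
    data = ch.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def table_b1_row(ch: str) -> str:
    """Return the Table B-1 row whose range contains the first code unit."""
    first = utf16_units(ch)[0]
    for low, high, label in TABLE_B1:
        if low <= first <= high:
            return label
    raise ValueError(f"no Table B-1 row for U+{ord(ch):04X}")


if __name__ == "__main__":
    for ch in ["A", "\u00e9", "\u20ac", "\ue000", "\U0001d11e", "\U000f0000"]:
        units = " ".join(f"{u:04X}" for u in utf16_units(ch))
        print(f"U+{ord(ch):04X}  {units:<10} {table_b1_row(ch)}")
```

For U+1D11E (MUSICAL SYMBOL G CLEF), utf16_units returns the two units D834 and DD1E, so table_b1_row reports the Supplementary characters row.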

Table B-2 contains code ranges that have been allocated in Unicode for UTF-8 character codes.

Table B-2 Unicode Character Code Ranges for UTF-8 Character Codes

Types of Characters
    First Byte   Second Byte   Third Byte   Fourth Byte

ASCII
    00 - 7F      -             -            -

European (except ASCII), Arabic, Hebrew
    C2 - DF      80 - BF       -            -

Indic, Thai, certain symbols (such as the euro symbol), Chinese, Japanese, Korean
    E0           A0 - BF       80 - BF      -
    E1 - EC      80 - BF       80 - BF      -
    ED           80 - 9F       80 - BF      -
    EF           A4 - BF       80 - BF      -

Private Use Area #1
    EE           80 - BF       80 - BF      -
    EF           80 - A3       80 - BF      -

Supplementary characters: Additional Chinese, Japanese, and Korean characters; historic characters; musical symbols; mathematical symbols
    F0           90 - BF       80 - BF      80 - BF
    F1 - F2      80 - BF       80 - BF      80 - BF
    F3           80 - AF       80 - BF      80 - BF

Private Use Area #2
    F3           B0 - BF       80 - BF      80 - BF
    F4           80 - 8F       80 - BF      80 - BF


Note:

A hyphen (-) indicates that the character type does not use that code unit or byte. Character codes are shown in hexadecimal.
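
As a cross-check of Table B-2, this Python sketch encodes the first and last code point of each row with Python's built-in UTF-8 codec and prints the bytes in the table's hexadecimal notation. The helper utf8_hex and the BOUNDARIES list are hypothetical illustrations, and the row labels are abbreviated.

```python
def utf8_hex(ch: str) -> list[str]:
    """Return the UTF-8 bytes of ch as two-digit uppercase hex strings."""
    return [f"{b:02X}" for b in ch.encode("utf-8")]


# Hypothetical sample data: first and last code point of each row of Table B-2.
BOUNDARIES = [
    ("ASCII", "\u0000", "\u007f"),
    ("European (except ASCII), Arabic, Hebrew", "\u0080", "\u07ff"),
    ("Indic, Thai, symbols, CJK (E0 to ED rows)", "\u0800", "\ud7ff"),
    ("Indic, Thai, symbols, CJK (EF row)", "\uf900", "\uffff"),
    ("Private Use Area #1", "\ue000", "\uf8ff"),
    ("Supplementary characters", "\U00010000", "\U000effff"),
    ("Private Use Area #2", "\U000f0000", "\U0010ffff"),
]

if __name__ == "__main__":
    for label, first, last in BOUNDARIES:
        print(f"{label}: {' '.join(utf8_hex(first))} .. {' '.join(utf8_hex(last))}")
```

Each boundary falls inside the byte ranges listed for its row. For example, U+0800 encodes as E0 A0 80 and U+10FFFF encodes as F4 8F BF BF.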


UTF-16 Encoding

As shown in Table B-1, UTF-16 represents supplementary characters (for example, additional Chinese, Japanese, and Korean characters, as well as the characters in Private Use Area #2) with two 16-bit values instead of one. The first 16-bit value (the high surrogate) is in the range 0xD800 to 0xDBFF, and the second 16-bit value (the low surrogate) is in the range 0xDC00 to 0xDFFF. Each of the two values carries 10 bits of the character code, so supplementary characters add 1,048,576 (2^20) code points to the 65,536 values that a single 16-bit unit can express. As a result, UTF-16 can represent more than one million characters; without supplementary characters, it is limited to 65,536. Oracle's AL16UTF16 character set supports supplementary characters.
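
The surrogate-pair arithmetic described above can be sketched in Python as follows. The function names to_surrogate_pair and from_surrogate_pair are hypothetical; the arithmetic is the standard UTF-16 algorithm, and the example cross-checks the result against Python's built-in UTF-16 codec.

```python
from __future__ import annotations


def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into two UTF-16 units."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a supplementary code point")
    offset = cp - 0x10000            # 20-bit value: 0x00000..0xFFFFF
    high = 0xD800 + (offset >> 10)   # top 10 bits    -> 0xD800..0xDBFF
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits -> 0xDC00..0xDFFF
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Combine a high and a low surrogate back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    clef = 0x1D11E  # MUSICAL SYMBOL G CLEF
    high, low = to_surrogate_pair(clef)
    print(f"U+{clef:X} -> {high:04X} {low:04X}")
    # Cross-check against Python's built-in UTF-16 codec (big-endian, no BOM).
    assert chr(clef).encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
    assert from_surrogate_pair(high, low) == clef
```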

See Also:

"Supplementary Characters"

UTF-8 Encoding

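The UTF-8 character codes in Table B-2 show that the following conditions are true:

ASCII characters use 1 byte.

European (except ASCII), Arabic, and Hebrew characters require 2 bytes.

Indic, Thai, Chinese, Japanese, and Korean characters, as well as certain symbols such as the euro symbol, require 3 bytes.

Characters in Private Use Area #1 require 3 bytes.

Supplementary characters require 4 bytes.

Characters in Private Use Area #2 require 4 bytes.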

Oracle's AL32UTF8 character set supports 1-byte, 2-byte, 3-byte, and 4-byte values. Oracle's UTF8 character set supports 1-byte, 2-byte, and 3-byte values, but not 4-byte values.
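
Table B-2 also shows that the first byte of a UTF-8 sequence determines its length. The following Python sketch uses that rule to find the longest sequence in a string; a result of 4 means the string contains characters that need 4-byte values, which the text above says AL32UTF8 supports and UTF8 does not. The function names are hypothetical, and the sketch assumes well-formed UTF-8 produced by Python's own encoder.

```python
def utf8_seq_len(lead: int) -> int:
    """Length of a UTF-8 sequence, judged from its first byte (per Table B-2)."""
    if lead <= 0x7F:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    raise ValueError(f"{lead:02X} is not a valid UTF-8 first byte")


def max_utf8_seq_len(text: str) -> int:
    """Length of the longest UTF-8 sequence needed for text (0 if text is empty)."""
    data = text.encode("utf-8")
    i = longest = 0
    while i < len(data):
        n = utf8_seq_len(data[i])
        longest = max(longest, n)
        i += n  # skip over the continuation bytes (80 - BF)
    return longest


def needs_4_byte_utf8(text: str) -> bool:
    """True if text contains characters that UTF-8 encodes with 4 bytes."""
    return max_utf8_seq_len(text) == 4
```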