Oracle® Database Globalization Support Guide
10g Release 1 (10.1)

Part Number B10749-01

13 Customizing Locale

This chapter shows how to customize locale data. It includes the following topics:

Overview of the Oracle Locale Builder Utility

The Oracle Locale Builder offers an easy and efficient way to customize locale data. It provides a graphical user interface through which you can easily view, modify, and define locale-specific data. It extracts data from the text and binary definition files and presents them in a readable format so that you can process the information without worrying about the formats used in these files.

The Oracle Locale Builder manages four types of locale definitions: language, territory, character set, and linguistic sort. It also supports user-defined characters and customized linguistic rules. You can view definitions in existing text and binary definition files and make changes to them or create your own definitions.

This section contains the following topics:

Configuring Unicode Fonts for the Oracle Locale Builder

The Oracle Locale Builder uses Unicode characters in many of its functions. For example, it shows the mapping of local character code points to Unicode code points. Oracle Locale Builder depends on the local fonts that are available on the operating system where the characters are rendered. Therefore, Oracle Corporation recommends that you use a Unicode font to fully support the Oracle Locale Builder. If a character cannot be rendered with your local fonts, then it will probably be displayed as an empty box.

Font Configuration on Windows

There are many Windows TrueType and OpenType fonts that support Unicode. Oracle Corporation recommends using the Arial Unicode MS font from Microsoft, because it includes about 51,000 glyphs and supports most of the characters in Unicode 3.2.

After installing the Unicode font, add the font to the Java Runtime font.properties file so it can be used by the Oracle Locale Builder. The font.properties file is located in the $JAVAHOME/jre/lib directory. For example, for the Arial Unicode MS font, add the following entries to the font.properties file:

dialog.n=Arial Unicode MS, DEFAULT_CHARSET
dialoginput.n=Arial Unicode MS, DEFAULT_CHARSET
serif.n=Arial Unicode MS, DEFAULT_CHARSET
sansserif.n=Arial Unicode MS, DEFAULT_CHARSET

n is the next available sequence number to assign to the Arial Unicode MS font in the font list. Java Runtime searches the font mapping list for each virtual font and uses the first font available on your system.
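
The following Python sketch illustrates one way to determine the next available sequence number n for a virtual font. It is not part of Oracle Locale Builder or the Java Runtime. The sample file contents, the function name next_font_index, and the choice to start at 0 when no entries exist are assumptions made for illustration; a real font.properties file will differ.

```python
import re

def next_font_index(properties_text, virtual_font):
    """Return the next unused n for keys of the form '<virtual_font>.n'."""
    pattern = re.compile(rf"^{re.escape(virtual_font)}\.(\d+)\s*=", re.MULTILINE)
    used = [int(match.group(1)) for match in pattern.finditer(properties_text)]
    # Assumption: numbering starts at 0 when the virtual font has no entries yet.
    return max(used) + 1 if used else 0

# Hypothetical excerpt of a font.properties file (illustrative only).
SAMPLE = """\
dialog.0=Arial,ANSI_CHARSET
dialog.1=WingDings,SYMBOL_CHARSET
serif.0=Times New Roman,ANSI_CHARSET
"""

for font in ("dialog", "serif"):
    n = next_font_index(SAMPLE, font)
    print(f"{font}.{n}=Arial Unicode MS, DEFAULT_CHARSET")
```

The printed lines have the same form as the entries shown above, with n replaced by the computed number.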

After you edit the font.properties file, restart the Oracle Locale Builder.

See Also:

Sun's internationalization Web site for more information about the font.properties file

Font Configuration on Other Platforms

There are fewer choices of Unicode fonts for non-Windows platforms than for Windows platforms. If you cannot find a Unicode font with satisfactory character coverage, then use multiple fonts for different languages. Install each font and add the font entries into the font.properties file using the steps described for the Windows platform.

For example, to display Japanese characters on Sun Solaris using the font ricoh-hg mincho, add entries to the existing font.properties file in $JAVAHOME/lib in the dialog, dialoginput, serif, and sansserif sections. The following is a sample entry for the serif section:

serif.plain.3=-ricoh-hg mincho l-medium-r-normal--*-%d-*-*-m-*-jisx0201.1976-0

Note:

Depending on the operating system locale, the locale-specific font.properties file might be used. For example, if the current operating system locale is ja_JP.eucJP on Sun Solaris, then font.properties.ja may be used.


See Also:

Your operating system documentation for more information about available fonts

The Oracle Locale Builder User Interface

Ensure that the ORACLE_HOME environment variable is set before starting Oracle Locale Builder.

In the UNIX operating system, start the Oracle Locale Builder by changing into the $ORACLE_HOME/nls/lbuilder directory and issuing the following command:

% ./lbuilder

In a Windows operating system, start the Oracle Locale Builder from the Start menu as follows: Start > Programs > Oracle-OraHome10 > Configuration and Migration Tools > Locale Builder. You can also start it from the DOS prompt by changing into the %ORACLE_HOME%\nls\lbuilder directory and executing the lbuilder.bat command.

When you start the Oracle Locale Builder, the screen shown in Figure 13-1 appears.

Figure 13-1 Oracle Locale Builder Utility

Oracle Locale Builder Windows and Dialog Boxes

Before using Oracle Locale Builder for a specific task, you should become familiar with tab pages and dialog boxes that include the following:

Existing Definitions Dialog Box

When you choose New Language, New Territory, New Character Set, or New Linguistic Sort, the first tab page that you see is labelled General. Click Show Existing Definitions to see the Existing Definitions dialog box.

The Existing Definitions dialog box enables you to open locale objects by name. If you know a specific language, territory, linguistic sort (collation), or character set that you want to start with, then click its displayed name. For example, you can open the AMERICAN language definition file as shown in Figure 13-2.

Figure 13-2 Existing Definitions Dialog Box

Choosing AMERICAN opens the lx00001.nlb file. An NLB file is a binary file that contains the settings for a specific language, territory, character set, or linguistic sort.

Language and territory abbreviations are for reference only and cannot be opened.

Session Log Dialog Box

Choose Tools > View Log to see the Session Log dialog box. The Session Log dialog box shows what actions have been taken in the current session. Click Save Log to keep a record of all changes. Figure 13-3 shows an example of a session log.

Figure 13-3 Session Log Dialog Box

Preview NLT Tab Page

The NLT file is an XML file with the file extension .nlt that shows the settings for a specific language, territory, character set, or linguistic sort. The Preview NLT tab page presents a readable form of the file so that you can see whether the changes you have made look correct. You cannot modify the NLT file from the Preview NLT tab page. You must use the specific elements of the Oracle Locale Builder to modify the NLT file.

Figure 13-4 shows an example of the Preview NLT tab page for a user-defined language called AMERICAN FRENCH.

Figure 13-4 Previewing the NLT File

Open File Dialog Box

You can see the Open File dialog box by choosing File > Open > By File Name. Then choose the NLB file that you want to modify or use as a template. An NLB file is a binary file with the file extension .nlb that contains the binary equivalent of the information in the NLT file. Figure 13-5 shows the Open File dialog box with the lx00001.nlb file selected. The Preview pane shows that this NLB file is for the AMERICAN language.

Figure 13-5 Open File Dialog Box

Creating a New Language Definition with the Oracle Locale Builder

This section shows how to create a new language based on French. This new language is called AMERICAN FRENCH. First, open FRENCH from the Existing Definitions dialog box. Then change the language name to AMERICAN FRENCH and the Language Abbreviation to AF in the General tab page. Retain the default values for the other fields. Figure 13-6 shows the resulting General tab page.

Figure 13-6 Language General Information

The following restrictions apply when choosing names for locale objects such as languages:

The valid range for the Language ID field for a user-defined language is 1000 to 10000. You can accept the value provided by Oracle Locale Builder or you can specify a value within the range.


Note:

Only certain ID ranges are valid values for user-defined LANGUAGE, TERRITORY, CHARACTER SET, MONOLINGUAL COLLATION, and MULTILINGUAL COLLATION definitions. The ranges are specified in the sections of this chapter that concern each type of user-defined locale object.


Figure 13-7 shows how to set month names using the Month Names tab page.

Figure 13-7 Month Names Tab Page

All names are shown as they appear in the NLT file. If you choose Yes for capitalization, then the month names are capitalized in your application, but they do not appear capitalized in the Month Names tab page.

Figure 13-8 shows the Day Names tab page.

Figure 13-8 Day Names Tab Page

You can choose day names for your user-defined language. All names are shown as they appear in the NLT file. If you choose Yes for capitalization, then the day names are capitalized in your application, but they do not appear capitalized in the Day Names tab page.

Creating a New Territory Definition with the Oracle Locale Builder

This section shows how to create a new territory called REDWOOD SHORES and use RS as a territory abbreviation. The new territory is not based on an existing territory definition.

The basic tasks are as follows:

Figure 13-9 shows the General tab page with REDWOOD SHORES specified as the Territory Name, 1001 specified as the Territory ID, and RS specified as the Territory Abbreviation.

Figure 13-9 General Tab Page for Territories

The valid range for Territory ID for a user-defined territory is 1000 to 10000.

Figure 13-10 shows settings for calendar formats in the Calendar tab page.

Figure 13-10 Choosing Calendar Formats

Tuesday is set as the first day of the week, and the first week of the calendar year is set as an ISO week. The screen displays a sample calendar.
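
The ISO week numbering mentioned above can be illustrated with a short Python sketch. It is not Oracle code. Python's date.isocalendar() follows ISO 8601, in which weeks start on Monday and week 1 is the week that contains the first Thursday of the year, so the sketch shows only the week-numbering rule and not the Tuesday setting chosen in this example. The helper name iso_year_week is hypothetical.

```python
from datetime import date

def iso_year_week(d):
    """Return (ISO year, ISO week number) for a date."""
    iso = d.isocalendar()
    return iso[0], iso[1]

# 1 January 2021 is a Friday, so it still belongs to the last ISO week of 2020.
print(iso_year_week(date(2021, 1, 1)))   # (2020, 53)
# The first ISO week of 2021 starts on Monday, 4 January.
print(iso_year_week(date(2021, 1, 4)))   # (2021, 1)
```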

Figure 13-11 shows the Date&Time tab page.

Figure 13-11 Choosing Date and Time Formats

When you choose a format from a list, Oracle Locale Builder displays an example of the format. In this case, the Short Date Format is set to DD-MM-YY. The Short Time Format is set to HH24:MI:SS. The Oracle Date Format is set to DD-MM-YY. The Long Date Format is set to fmDay, Month dd, yyyy. The TimeStamp Timezone Format is not set.

You can also enter your own formats instead of using the selection from the drop-down menus.

Figure 13-12 shows the Number tab page.

Figure 13-12 Choosing Number Formats

A period has been chosen for the Decimal Symbol. The Negative Sign Location is specified to be on the left of the number. The Numeric Group Separator is a comma. The Number Grouping is specified as 3 digits. The List Separator is a comma. The Measurement System is metric. The Rounding Indicator is 4.

You can enter your own values instead of using values in the lists.

When you choose a format from a list, Oracle Locale Builder displays an example of the format.
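
As a rough illustration of the settings described above (period as the decimal symbol, comma as the group separator, groups of 3 digits, and the negative sign on the left), the following Python sketch formats two numbers. It uses Python's own format specification, not Oracle number formatting, and the function name format_number is hypothetical. The Rounding Indicator setting is not modeled.

```python
def format_number(value, decimals=2):
    """Format with '.' as decimal symbol and ',' separating groups of 3 digits."""
    return f"{value:,.{decimals}f}"

print(format_number(1234567.891))  # 1,234,567.89
print(format_number(-9876.5))      # -9,876.50
```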

See Also:

"Numeric Formats"

Figure 13-13 shows settings for currency formats in the Monetary tab page.

Figure 13-13 Choosing Currency Formats

The Local Currency Symbol is set to $. The Alternative Currency Symbol is the euro symbol. The Currency Presentation shows one of several possible sequences of the local currency symbol, the debit symbol, and the number. The Decimal Symbol is the period. The Group Separator is the comma. The Monetary Number Grouping is 3. The Monetary Precision, or number of digits after the decimal symbol, is 3. The Credit Symbol is +. The Debit Symbol is -. The International Currency Separator is a blank space, so it is not visible in the field. The International Currency Symbol (ISO currency symbol) is USD. Oracle Locale Builder displays examples of the currency formats you have selected.

You can enter your own values instead of using the lists.

See Also:

"Currency Formats"

The rest of this section contains the following topics:

Customizing Time Zone Data

The time zone files contain the valid time zone names. The following information is included for each time zone:

Two time zone files are included in the Oracle home directory. The default file is oracore/zoneinfo/timezone.dat. More time zones are included in oracore/zoneinfo/timezlrg.dat.

See Also:

"Choosing a Time Zone File" for more information about the contents of the time zone files and how to install the larger time zone file

Customizing Calendars with the NLS Calendar Utility

Oracle supports several calendars. All of them are defined with data derived from Oracle's globalization support, but some of them may require the addition of ruler eras or deviation days in the future. To add this information without waiting for a new release of the Oracle database server, you can use an external file that is automatically loaded when the calendar functions are executed.

Calendar data is first defined in a text file. The text definition file must be converted into binary format. You can use the NLS Calendar Utility (lxegen) to convert the text definition file into binary format.

The name of the text definition file and its location are hard-coded and depend on the platform. On UNIX platforms, the file name is lxecal.nlt. It is located in the $ORACLE_HOME/nls/demo directory. A sample text definition file is included in the directory.

The lxegen utility produces a binary file from the text definition file. The name of the binary file is also hard-coded and depends on the platform. On UNIX platforms, the name of the binary file is lxecal.nlb. The binary file is generated in the same directory as the text file and overwrites an existing binary file.

After the binary file has been generated, it is automatically loaded during system initialization. Do not move or rename the file.

Invoke the calendar utility from the command line as follows:

% lxegen

See Also:
  • Operating system documentation for the location of the files on your system
  • "Calendar Systems"

Displaying a Code Chart with the Oracle Locale Builder

You can display and print the code charts of character sets with the Oracle Locale Builder. From the opening screen for Oracle Locale Builder, choose File > New > Character Set. Figure 13-14 shows the resulting screen.

Figure 13-14 General Tab Page for Character Sets

Click Show Existing Definitions. Highlight the character set you wish to display. Figure 13-15 shows the Existing Definitions dialog box with US7ASCII highlighted.

Figure 13-15 Choosing US7ASCII in the Existing Definitions Dialog Box

Click Open to choose the character set. Figure 13-16 shows the General tab page when US7ASCII has been chosen.

Figure 13-16 General Tab Page When US7ASCII Has Been Chosen

Click the Character Data Mapping tab. Figure 13-17 shows the Character Data Mapping tab page for US7ASCII.

Figure 13-17 Character Data Mapping Tab Page for US7ASCII

Click View CodeChart. Figure 13-18 shows the code chart for US7ASCII.

Figure 13-18 US7ASCII Code Chart

The code chart shows the encoded value of each character in the local character set, the glyph associated with each character, and the Unicode value of each character in the local character set.

If you want to print the code chart, then click Print Page.

Creating a New Character Set Definition with the Oracle Locale Builder

You can customize a character set to meet specific user needs. You can extend an existing encoded character set definition. User-defined characters are often used to encode special characters that represent the following:

This section describes how Oracle supports user-defined characters. It includes the following topics:

Character Sets with User-Defined Characters

User-defined characters are typically supported within East Asian character sets. These East Asian character sets have at least one range of reserved code points for user-defined characters. For example, Japanese Shift-JIS reserves 1880 code points for user-defined characters. They are shown in Table 13-1.

Table 13-1 Shift JIS User-Defined Character Ranges

Japanese Shift JIS User-Defined Character Range    Number of Code Points
F040-F07E, F080-F0FC                               188
F140-F17E, F180-F1FC                               188
F240-F27E, F280-F2FC                               188
F340-F37E, F380-F3FC                               188
F440-F47E, F480-F4FC                               188
F540-F57E, F580-F5FC                               188
F640-F67E, F680-F6FC                               188
F740-F77E, F780-F7FC                               188
F840-F87E, F880-F8FC                               188
F940-F97E, F980-F9FC                               188
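
The counts in Table 13-1 can be checked with a short Python sketch. It is an illustration only: it enumerates the lead bytes F0 through F9 and the trail-byte ranges 40-7E and 80-FC listed in the table. The function names are hypothetical.

```python
def trail_bytes():
    """Trail bytes allowed in each user-defined row: 0x40-0x7E and 0x80-0xFC."""
    return list(range(0x40, 0x7F)) + list(range(0x80, 0xFD))

def udc_code_points():
    """All Shift JIS user-defined code points for lead bytes 0xF0-0xF9."""
    return [(lead << 8) | trail
            for lead in range(0xF0, 0xFA)
            for trail in trail_bytes()]

codes = udc_code_points()
print(len(trail_bytes()))                    # 188 code points per lead byte
print(len(codes))                            # 1880 code points in total
print(f"{codes[0]:04X}..{codes[-1]:04X}")    # F040..F9FC
```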

The Oracle character sets listed in Table 13-2 contain predefined ranges that support user-defined characters.

Table 13-2 Oracle Character Sets with User-Defined Character Ranges

Character Set Name    Number of Code Points Available for User-Defined Characters
JA16DBCS              4370
JA16EBCDIC930         4370
JA16SJIS              1880
JA16SJISYEN           1880
KO16DBCS              1880
KO16MSWIN949          1880
ZHS16DBCS             1880
ZHS16GBK              2149
ZHT16DBCS             6204
ZHT16MSWIN950         6217

Oracle Character Set Conversion Architecture

The code point value that represents a particular character can vary among different character sets. A Japanese kanji character is shown in Figure 13-19.

Figure 13-19 Japanese Kanji Character

The following table shows how the character is encoded in different character sets.

Unicode Encoding    JA16SJIS Encoding    JA16EUC Encoding    JA16DBCS Encoding
4E9C                889F                 B0A1                4867

In Oracle, all character sets are defined in terms of Unicode 3.2 code points. That is, each character is defined as a Unicode 3.2 code value. Character conversion takes place transparently to users by using Unicode as the intermediate form. For example, when a JA16SJIS client connects to a JA16EUC database, the character shown in Figure 13-19 has the code point value 889F when it is entered from the JA16SJIS client. It is internally converted to Unicode (with code point value 4E9C) and then converted to JA16EUC (code point value B0A1).
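
The idea of converting through Unicode as the intermediate form can be sketched in Python. This is not Oracle's conversion code. It uses Python's built-in shift_jis and euc_jp codecs as stand-ins for JA16SJIS and JA16EUC, which is an assumption that holds for this standard kanji character but is not guaranteed for every code point. The function name convert is hypothetical.

```python
def convert(data, source, target):
    """Convert bytes between two encodings by way of Unicode text."""
    text = data.decode(source)    # source encoding -> Unicode
    return text.encode(target)    # Unicode -> target encoding

sjis_bytes = bytes.fromhex("889F")          # value entered by the Shift JIS client
pivot = sjis_bytes.decode("shift_jis")      # intermediate Unicode character
euc_bytes = convert(sjis_bytes, "shift_jis", "euc_jp")

print(f"U+{ord(pivot):04X}")       # U+4E9C
print(euc_bytes.hex().upper())     # B0A1
```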

Unicode 3.2 Private Use Area

Unicode 3.2 reserves the range E000-F8FF for the Private Use Area (PUA). The PUA is intended for private use character definition by end users or vendors.

User-defined characters can be converted between two Oracle character sets by using Unicode 3.2 PUA as the intermediate form, the same as standard characters.
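
The following Python sketch checks whether a code point falls in the Private Use Area range given above. It is an illustration only, and the helper name is_private_use is hypothetical. The standard unicodedata module reports the general category Co (private use) for these code points.

```python
import unicodedata

PUA_FIRST = 0xE000
PUA_LAST = 0xF8FF

def is_private_use(code_point):
    """Return True if the code point lies in the range E000-F8FF."""
    return PUA_FIRST <= code_point <= PUA_LAST

print(PUA_LAST - PUA_FIRST + 1)              # 6400 code points in the range
print(is_private_use(0xE000))                # True
print(is_private_use(0x4E9C))                # False
print(unicodedata.category(chr(0xE000)))     # Co
```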

User-Defined Character Cross-References Between Character Sets

Cross-references between different character sets are required when registering user-defined characters across operating systems. Cross-references ensure that the user-defined characters can be converted correctly across the different character sets.

For example, when registering a user-defined character on both a Japanese Shift-JIS operating system and a Japanese IBM Host operating system, you may want to assign the F040 code point on the Shift-JIS operating system and the 6941 code point on the IBM Host operating system. Oracle can then map this character correctly between the character sets JA16SJIS and JA16DBCS.

User-defined character cross-reference information can be found by viewing the character set definitions using the Oracle Locale Builder. For example, you can determine that both the Shift-JIS UDC value F040 and the IBM Host UDC value 6941 are mapped to the same Unicode PUA value E000.
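
The cross-reference described above can be sketched in Python as two mapping tables joined on their shared Unicode PUA value. The tables contain only the single pair of values mentioned in this section. The variable and function names are hypothetical, and the sketch does not read real Oracle definition files.

```python
# Local code point -> Unicode PUA value, limited to the values named in the text.
SJIS_TO_UNICODE = {0xF040: 0xE000}   # Shift-JIS user-defined character
DBCS_TO_UNICODE = {0x6941: 0xE000}   # IBM Host user-defined character

def cross_reference(source_to_unicode, target_to_unicode):
    """Map source code points to target code points that share a Unicode value."""
    unicode_to_target = {u: code for code, u in target_to_unicode.items()}
    return {code: unicode_to_target[u]
            for code, u in source_to_unicode.items()
            if u in unicode_to_target}

xref = cross_reference(SJIS_TO_UNICODE, DBCS_TO_UNICODE)
print({f"{src:04X}": f"{dst:04X}" for src, dst in xref.items()})  # {'F040': '6941'}
```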

See Also:

Appendix B, "Unicode Character Code Assignments"

Guidelines for Creating a New Character Set from an Existing Character Set

By default, the Oracle Locale Builder generates the next available character set ID for you. You can also choose your own character set ID. Use the following format for naming character set definition NLT files:

lx2dddd.nlt

dddd is the 4-digit character set ID in hex.
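
As an illustration of this naming format, the following Python sketch derives the file name from a decimal character set ID. The function name nlt_file_name is hypothetical, and the use of lowercase hexadecimal digits is an assumption; this section states only that dddd is the 4-digit ID in hex.

```python
def nlt_file_name(charset_id):
    """Build an lx2dddd.nlt name, where dddd is the ID as 4 hex digits."""
    if not 0 <= charset_id <= 0xFFFF:
        raise ValueError("character set ID does not fit in 4 hex digits")
    # Assumption: hex digits are written in lowercase.
    return f"lx2{charset_id:04x}.nlt"

print(nlt_file_name(10001))  # lx22711.nlt
```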

When you modify a character set, observe the following guidelines:

If you derive a new character set from an existing Oracle character set, then Oracle Corporation recommends using the following character set naming convention:

<Oracle_character_set_name><organization_name>EXT<version>

For example, if a company such as Sun Microsystems adds user-defined characters to the JA16EUC character set, then the following character set name is appropriate:

JA16EUCSUNWEXT1

The character set name contains the following parts:

Example: Creating a New Character Set Definition with the Oracle Locale Builder

This section shows how to create a new character set called MYCHARSET with 10001 for its Character Set ID. The example uses the WE8ISO8859P1 character set and adds 10 Chinese characters.

Figure 13-20 shows the General tab page for MYCHARSET.

Figure 13-20 General Tab Page for MYCHARSET

Click Show Existing Definitions and choose the WE8ISO8859P1 character set from the Existing Definitions dialog box.

The ISO Character Set ID and Base Character Set ID fields are optional. The Base Character Set ID is used for inheriting values so that the properties of the base character set are used as a template. The Character Set ID is automatically generated, but you can override it. The valid range for a user-defined character set ID is 8000 to 8999 or 10000 to 20000.


Note:

If you are using Pro*COBOL, then choose a character set ID between 8000 and 8999.


The ISO Character Set ID field remains blank for user-defined character sets.

Figure 13-21 shows the Type Specification tab page.

Figure 13-21 Type Specification Tab Page

The Character Set Category is ASCII_BASED. The BYTE_UNIQUE button is checked.

When you have chosen an existing character set, the fields for the Type Specification tab page should already be set to appropriate values. You should keep these values unless you have a specific reason for changing them. If you need to change the settings, then use the following guidelines:

Figure 13-22 shows how to add user-defined characters.

Figure 13-22 Importing User-Defined Character Data

Open the Character Data Mapping tab page. Highlight the character after which you want to add the new characters. In this example, the 0xff local character value is highlighted.

You can add one character at a time or use a text file to import a large number of characters. In this example, a text file is imported. The first column is the local character value. The second column is the Unicode value. The file contains the following character values:

88a2 963f
88a3 54c0
88a4 611b
88a5 6328
88a6 59f6
88a7 9022
88a8 8475
88a9 831c
88aa 7a50
88ab 60aa
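
Before importing such a file, it can be useful to inspect its contents. The following Python sketch parses the two-column format described above (local character value, then Unicode value, both in hexadecimal) and prints each mapping. It is an illustration only and is not how Oracle Locale Builder reads the file. The function name parse_udc_data is hypothetical.

```python
# The same ten mappings listed in the import file above.
UDC_DATA = """\
88a2 963f
88a3 54c0
88a4 611b
88a5 6328
88a6 59f6
88a7 9022
88a8 8475
88a9 831c
88aa 7a50
88ab 60aa
"""

def parse_udc_data(text):
    """Return a list of (local value, Unicode value) integer pairs."""
    mappings = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        local_hex, unicode_hex = line.split()
        mappings.append((int(local_hex, 16), int(unicode_hex, 16)))
    return mappings

for local_value, unicode_value in parse_udc_data(UDC_DATA):
    print(f"0x{local_value:04x} -> U+{unicode_value:04X} {chr(unicode_value)}")
```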

Choose File > Import > User-Defined Characters Data.

Figure 13-23 shows that the imported characters are added after 0xff in the character set.

Figure 13-23 New Characters in the Character Set

Creating a New Linguistic Sort with the Oracle Locale Builder

This section shows how to create a new multilingual linguistic sort called MY_GENERIC_M with a collation ID of 10001. The GENERIC_M linguistic sort is used as the basis for the new linguistic sort. Figure 13-24 shows how to begin.

Figure 13-24 General Tab Page for Collation

Settings for the flags are automatically derived. SWAP_WITH_NEXT is relevant for Thai and Lao sorts. REVERSE_SECONDARY is for French sorts. CANONICAL_EQUIVALENCE determines whether canonical rules are used. In this example, CANONICAL_EQUIVALENCE is checked.

The valid range for Collation ID (sort ID) for a user-defined sort is 1000 to 2000 for monolingual collation and 10000 to 11000 for multilingual collation.
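
The ID ranges stated in this chapter for user-defined locale objects can be collected into a simple check, sketched here in Python. The dictionary keys and the function name is_valid_user_defined_id are hypothetical. The ranges themselves are the ones given in the sections on languages, territories, character sets, and linguistic sorts.

```python
# Inclusive ID ranges for user-defined locale objects, as stated in this chapter.
VALID_ID_RANGES = {
    "language": [(1000, 10000)],
    "territory": [(1000, 10000)],
    "character set": [(8000, 8999), (10000, 20000)],
    "monolingual collation": [(1000, 2000)],
    "multilingual collation": [(10000, 11000)],
}

def is_valid_user_defined_id(kind, value):
    """Return True if value lies in one of the valid ranges for this object type."""
    return any(low <= value <= high for low, high in VALID_ID_RANGES[kind])

print(is_valid_user_defined_id("character set", 10001))           # True
print(is_valid_user_defined_id("character set", 9000))            # False
print(is_valid_user_defined_id("multilingual collation", 10001))  # True
```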

Figure 13-25 shows the Unicode Collation Sequence tab page.

Figure 13-25 Unicode Collation Sequence Tab Page

This example customizes the linguistic sort by moving digits so that they sort after letters. Complete the following steps:

  1. Highlight the Unicode value that you want to move. In Figure 13-25, the \x0034 Unicode value is highlighted. Its location in the Unicode Collation Sequence is called a node.
  2. Click Cut. Select the location where you want to move the node.
  3. Click Paste. Clicking Paste opens the Paste Node dialog box, shown in Figure 13-26.
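
The effect of moving digits after letters can be pictured with a small Python sketch. It does not use Oracle linguistic sorts or the Unicode Collation Sequence. It only contrasts Python's default code point ordering, in which digits precede letters, with a hypothetical sort key named digits_after_letters that ranks digits last.

```python
def digits_after_letters(text):
    """Sort key that ranks every digit after every non-digit character."""
    return [(1 if ch.isdigit() else 0, ch) for ch in text]

words = ["b2", "4a", "a4", "aa"]

# Default ordering: digits sort before letters.
print(sorted(words))                            # ['4a', 'a4', 'aa', 'b2']
# Customized ordering: digits sort after letters.
print(sorted(words, key=digits_after_letters))  # ['aa', 'a4', 'b2', '4a']
```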

Figure 13-26 Paste Node Dialog Box
