
PL/SQL Packages and Types Reference
10g Release 1 (10.1)

Part Number B10802-01

157
UTL_I18N

UTL_I18N is a set of services that help developers build multilingual applications. It is one of the PL/SQL components of the Globalization Development Kit (GDK), which provides a set of tools designed to help developers with minimal experience in internationalization development write multilingual applications effectively.

See Also:

Oracle Database Globalization Support Guide



The chapter contains the following topics:

  • Using UTL_I18N
  • Summary of UTL_I18N Subprograms


Using UTL_I18N


Overview

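The UTL_I18N PL/SQL package consists of the following categories of services:

  • String conversion between character sets: RAW_TO_CHAR, RAW_TO_NCHAR, and STRING_TO_RAW
  • Escape and unescape sequences for HTML and XML documents: ESCAPE_REFERENCE and UNESCAPE_REFERENCE
  • Mapping between Oracle, IANA, ISO, and e-mail safe names for character sets, languages, and territories: MAP_CHARSET, MAP_LANGUAGE_FROM_ISO, MAP_LOCALE_TO_ISO, and MAP_TERRITORY_FROM_ISO
  • Default character set lookup for an Oracle language: GET_DEFAULT_CHARSET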


Constants

SHIFT_IN    CONSTANT PLS_INTEGER :=0;
SHIFT_OUT   CONSTANT PLS_INTEGER :=1;

Flags

ORACLE_TO_IANA  CONSTANT PLS_INTEGER :=0;
IANA_TO_ORACLE  CONSTANT PLS_INTEGER :=1;
MAIL_GENERIC    CONSTANT PLS_INTEGER :=0;
MAIL_WINDOWS    CONSTANT PLS_INTEGER :=1;
GENERIC_CONTEXT CONSTANT PLS_INTEGER :=0;
MAIL_CONTEXT    CONSTANT PLS_INTEGER :=1;

Summary of UTL_I18N Subprograms

Table 157-1  UTL_I18N Package Subprograms
Subprogram Description

ESCAPE_REFERENCE Function

Returns a string in which predefined characters, and multibyte characters that cannot be converted to the character set used by an HTML or XML document, are replaced by escape sequences

GET_DEFAULT_CHARSET Function

Returns the default Oracle character set name or the default e-mail safe character set name from an Oracle language name.

MAP_CHARSET Function

  • Maps an Oracle character set name to an IANA character set name
  • Maps an IANA character set name to an Oracle character set name
  • Maps an Oracle character set name to an e-mail safe character set name

MAP_LANGUAGE_FROM_ISO Function

Returns an Oracle language name from an ISO locale name

MAP_LOCALE_TO_ISO Function

Returns an ISO locale name from the Oracle language and territory name

MAP_TERRITORY_FROM_ISO Function

Returns an Oracle territory name from an ISO locale name

RAW_TO_CHAR Functions

Converts RAW data that is not encoded in the database character set into a VARCHAR2 string

RAW_TO_NCHAR Functions

Converts RAW data that is not encoded in the national character set into an NVARCHAR2 string

STRING_TO_RAW Function

Converts a VARCHAR2 or NVARCHAR2 string to another character set. The result is returned as a RAW datatype.

UNESCAPE_REFERENCE Function

Decodes the escape sequences in an input string and returns the resulting string


ESCAPE_REFERENCE Function

This function provides a way to specify an escape sequence for predefined characters and multibyte characters that cannot be converted to the character set used by an HTML or XML document.

For example, < (less than symbol) has a special meaning in HTML. To display < as a character, encode it as the escape sequence &lt;. In the same way, you can specify how multibyte characters are displayed when they are not part of the character set encoding of an HTML or XML document. For example, if you encode a page in the ZHT16BIG5 character set, then this function checks every character. If it finds a character that cannot be represented in ZHT16BIG5, then it replaces that character with an escape sequence in the returned string.

Syntax

UTL_I18N.ESCAPE_REFERENCE( str            IN VARCHAR2 CHARACTER SET ANY_CS,
                           page_cs_name   IN VARCHAR2 DEFAULT NULL)
RETURN VARCHAR2 CHARACTER SET str%CHARSET;

Parameters

Table 157-2  ESCAPE_REFERENCE Function Parameters  
Parameter Description

str

Specifies the input string

page_cs_name

Specifies the character set encoding of the HTML or XML document. If page_cs_name is NULL, then the database character set is used for CHAR data and the national character set is used for NCHAR data.

Usage Notes

If the user specifies an invalid character set or a NULL string, then the function returns a NULL string.

Examples

UTL_I18N.ESCAPE_REFERENCE('ab'||chr(170),'us7ascii')

This returns 'ab&#xaa;'.
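As a sketch of typical use, the following anonymous block escapes a piece of text before it is written into an HTML page encoded in US7ASCII. The page character set, the sample text, and the use of DBMS_OUTPUT in place of a real page writer (such as HTP.PRN) are assumptions for illustration. The input is declared as NVARCHAR2 and built with UNISTR so that the non-ASCII character does not depend on the database character set; because the return type follows str%CHARSET, the result is NVARCHAR2 as well.

```sql
SET SERVEROUTPUT ON
SET DEFINE OFF
DECLARE
  -- Assumption: the HTML page is encoded in US7ASCII.
  page_cs    VARCHAR2(30)   := 'US7ASCII';
  -- Hypothetical text containing HTML-special characters and U+00E9 (e-acute).
  user_text  NVARCHAR2(100) := N'Price < 10 & caf' || UNISTR('\00E9');
  safe_text  NVARCHAR2(400);
BEGIN
  -- Replace predefined characters, and characters US7ASCII cannot represent,
  -- with escape sequences.
  safe_text := UTL_I18N.ESCAPE_REFERENCE(user_text, page_cs);

  -- Declare the page encoding by its IANA name (default GENERIC_CONTEXT / ORACLE_TO_IANA).
  DBMS_OUTPUT.PUT_LINE('<meta charset="' || UTL_I18N.MAP_CHARSET(page_cs) || '">');
  -- The escaped text is pure ASCII, so converting it to VARCHAR2 loses nothing.
  DBMS_OUTPUT.PUT_LINE('<p>' || TO_CHAR(safe_text) || '</p>');
END;
/
```

SET DEFINE OFF keeps SQL*Plus from treating the & in the literal as a substitution variable.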


GET_DEFAULT_CHARSET Function

This function returns the default Oracle character set name or the default e-mail safe character set name from an Oracle language name.

See Also:

"MAP_CHARSET Function" for an explanation of an e-mail safe character set

Syntax

UTL_I18N.GET_DEFAULT_CHARSET( language  IN VARCHAR2,
                              context   IN PLS_INTEGER DEFAULT GENERIC_CONTEXT,
                              iswindows IN BOOLEAN DEFAULT FALSE)
RETURN VARCHAR2;

Parameters

Table 157-3  GET_DEFAULT_CHARSET Function Parameters  
Parameter Description

language

Specifies a valid Oracle language

context

GENERIC_CONTEXT | MAIL_CONTEXT

GENERIC_CONTEXT: Return the default character set for general cases

MAIL_CONTEXT: Return the default e-mail safe character set name

iswindows

If context is set to MAIL_CONTEXT, then iswindows should be set to TRUE if the platform is Windows and FALSE if the platform is not Windows. The default is FALSE.

iswindows has no effect if context is set to GENERIC_CONTEXT.

Usage Notes

If the user specifies an invalid language name or an invalid flag, then the function returns a NULL string.

Examples

GENERIC_CONTEXT, iswindows=FALSE

UTL_I18N.GET_DEFAULT_CHARSET('French', UTL_I18N.GENERIC_CONTEXT, FALSE)

This returns 'WE8ISO8859P1'.

MAIL_CONTEXT, iswindows=TRUE

UTL_I18N.GET_DEFAULT_CHARSET('French', UTL_I18N.MAIL_CONTEXT, TRUE)

This returns 'WE8MSWIN1252'.

MAIL_CONTEXT, iswindows=FALSE

UTL_I18N.GET_DEFAULT_CHARSET('French', UTL_I18N.MAIL_CONTEXT, FALSE)

This returns 'WE8ISO8859P1'.
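A minimal sketch that prints all three defaults for a few languages at once. The language list is an assumption; per the usage note above, an invalid language name would simply produce a NULL entry. For French, the printed values should correspond to the three results shown above.

```sql
SET SERVEROUTPUT ON
DECLARE
  TYPE lang_list IS TABLE OF VARCHAR2(30);
  -- Sample Oracle language names, assumed valid in this database.
  langs lang_list := lang_list('French', 'German', 'Japanese');
BEGIN
  FOR i IN 1 .. langs.COUNT LOOP
    DBMS_OUTPUT.PUT_LINE(
      langs(i)
      || ': generic='      || UTL_I18N.GET_DEFAULT_CHARSET(langs(i), UTL_I18N.GENERIC_CONTEXT, FALSE)
      || ', mail/other='   || UTL_I18N.GET_DEFAULT_CHARSET(langs(i), UTL_I18N.MAIL_CONTEXT, FALSE)
      || ', mail/windows=' || UTL_I18N.GET_DEFAULT_CHARSET(langs(i), UTL_I18N.MAIL_CONTEXT, TRUE));
  END LOOP;
END;
/
```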


MAP_CHARSET Function

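This function:

  • Maps an Oracle character set name to an IANA character set name
  • Maps an IANA character set name to an Oracle character set name
  • Maps an Oracle character set name to an e-mail safe character set name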

Syntax

UTL_I18N.MAP_CHARSET( charset   IN VARCHAR2,
                      context   IN PLS_INTEGER DEFAULT GENERIC_CONTEXT,
                      flag      IN PLS_INTEGER DEFAULT ORACLE_TO_IANA)
RETURN VARCHAR2;

Parameters

Table 157-4  MAP_CHARSET Function Parameters  
Parameter Description

charset

Specifies the character set name to be mapped. The mapping is case-insensitive.

context

GENERIC_CONTEXT | MAIL_CONTEXT

GENERIC_CONTEXT: The mapping is between an Oracle character set name and an IANA character set name. This is the default value.

MAIL_CONTEXT: The mapping is between an Oracle character set name and an e-mail safe character set name.

flag

  • ORACLE_TO_IANA | IANA_TO_ORACLE if GENERIC_CONTEXT is set

    ORACLE_TO_IANA: Map from an Oracle character set name to an IANA character set name. This is the default.

    IANA_TO_ORACLE: Map from an IANA character set name to an Oracle character set name.

  • MAIL_GENERIC | MAIL_WINDOWS if MAIL_CONTEXT is set

    MAIL_GENERIC: Map from an Oracle character set name to an e-mail safe character set name on a non-Windows platform

    MAIL_WINDOWS: Map from an Oracle character set name to an e-mail safe character set name on a Windows platform

Usage Notes

An e-mail safe character set is an Oracle character set that is commonly used by applications when they submit e-mail messages. Applications typically convert content from the database character set to the e-mail safe character set before sending it. To specify the character set name in the mail header, use the corresponding IANA character set name, which you obtain by calling the MAP_CHARSET function with the ORACLE_TO_IANA flag and the e-mail safe character set name as input.

For example, no e-mail client recognizes message contents in the WE8DEC character set, whose corresponding IANA name is DEC-MCS. If WE8DEC is passed to the MAP_CHARSET function with the MAIL_CONTEXT option, then the function returns WE8ISO8859P1. Its corresponding IANA name, ISO-8859-1, is recognized by most e-mail clients.

The steps in this example are as follows:

  1. Call the MAP_CHARSET function with the database character set name, WE8DEC, setting context to MAIL_CONTEXT and flag to MAIL_GENERIC. The result is WE8ISO8859P1.
  2. Convert the contents stored in the database to WE8ISO8859P1 (for example, with the STRING_TO_RAW function).
  3. Call the MAP_CHARSET function with the e-mail safe character set, WE8ISO8859P1, setting context to GENERIC_CONTEXT and flag to ORACLE_TO_IANA. The result is ISO-8859-1.
  4. Specify ISO-8859-1 in the mail header when the e-mail message is submitted.

The function returns a character set name if a match is found. If no match is found or if the flag is invalid, then it returns NULL.
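The four steps above can be sketched in one anonymous block. This is an illustration under stated assumptions: the message text is hypothetical, the platform is assumed to be non-Windows (MAIL_GENERIC), and the actual submission of the message (for example, through UTL_SMTP) is omitted; the header line is only printed. Because MAP_CHARSET returns NULL when no match is found, the block stops with an error in that case.

```sql
SET SERVEROUTPUT ON
DECLARE
  db_cs     VARCHAR2(30);
  mail_cs   VARCHAR2(30);
  mail_iana VARCHAR2(64);
  body      VARCHAR2(200) := 'Hello from the database';  -- hypothetical message text
  body_raw  RAW(800);
BEGIN
  -- Look up the database character set.
  SELECT value INTO db_cs
    FROM nls_database_parameters
   WHERE parameter = 'NLS_CHARACTERSET';

  -- Step 1: map it to an e-mail safe Oracle character set (non-Windows platform assumed).
  mail_cs := UTL_I18N.MAP_CHARSET(db_cs, UTL_I18N.MAIL_CONTEXT, UTL_I18N.MAIL_GENERIC);
  IF mail_cs IS NULL THEN
    RAISE_APPLICATION_ERROR(-20000, 'No e-mail safe character set for ' || db_cs);
  END IF;

  -- Step 2: convert the content to the e-mail safe character set.
  body_raw := UTL_I18N.STRING_TO_RAW(body, mail_cs);

  -- Step 3: get the IANA name of the e-mail safe character set.
  mail_iana := UTL_I18N.MAP_CHARSET(mail_cs, UTL_I18N.GENERIC_CONTEXT, UTL_I18N.ORACLE_TO_IANA);

  -- Step 4: use the IANA name in the mail header (printed here instead of sent).
  DBMS_OUTPUT.PUT_LINE('Content-Type: text/plain; charset=' || mail_iana);
  DBMS_OUTPUT.PUT_LINE('Encoded body length: ' || UTL_RAW.LENGTH(body_raw) || ' bytes');
END;
/
```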


Note:

Many Oracle character sets can map to one e-mail safe character set. There is no function that maps an e-mail safe character set to an Oracle character set name.


Examples

Generic Context

UTL_I18N.MAP_CHARSET('iso-8859-1', UTL_I18N.GENERIC_CONTEXT, UTL_I18N.IANA_TO_ORACLE)

This returns 'WE8ISO8859P1'.

Mail Context

UTL_I18N.MAP_CHARSET('WE8DEC', UTL_I18N.MAIL_CONTEXT, UTL_I18N.MAIL_GENERIC)

This returns 'WE8ISO8859P1'.

See Also:

Oracle Database Globalization Support Guide for a list of valid Oracle character sets


MAP_LANGUAGE_FROM_ISO Function

This function returns an Oracle language name from an ISO locale name.

Syntax

UTL_I18N.MAP_LANGUAGE_FROM_ISO( isolocale IN VARCHAR2)
RETURN VARCHAR2;

Parameters

Table 157-5  MAP_LANGUAGE_FROM_ISO Function Parameters  
Parameter Description

isolocale

Specifies the ISO locale. The mapping is case-insensitive.

Usage Notes

If the user specifies an invalid locale string, then the function returns a NULL string.

If the user specifies a locale string that includes only the language (for example, en_ instead of en_US), then the function returns the default language name for the specified language (for example, American).

Examples

UTL_I18N.MAP_LANGUAGE_FROM_ISO('en_US')

This returns 'American'.
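A minimal sketch of choosing an Oracle language from a list of client locale preferences, relying on the documented behavior that an unrecognized locale returns NULL. The preference list is hypothetical, and 'xx_XX' is assumed not to be a recognized locale; if so, the block would settle on the language for en_US, which the example above maps to American.

```sql
SET SERVEROUTPUT ON
DECLARE
  TYPE locale_list IS TABLE OF VARCHAR2(20);
  -- Hypothetical client preferences, most preferred first.
  prefs     locale_list := locale_list('xx_XX', 'en_US', 'fr_FR');
  ora_lang  VARCHAR2(30);
BEGIN
  FOR i IN 1 .. prefs.COUNT LOOP
    ora_lang := UTL_I18N.MAP_LANGUAGE_FROM_ISO(prefs(i));
    -- NULL means the locale was not recognized; try the next preference.
    EXIT WHEN ora_lang IS NOT NULL;
  END LOOP;
  DBMS_OUTPUT.PUT_LINE('Selected Oracle language: ' || NVL(ora_lang, '(none)'));
END;
/
```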

See Also:

Oracle Database Globalization Support Guide for a list of valid Oracle languages


MAP_LOCALE_TO_ISO Function

This function returns an ISO locale name from an Oracle language name and an Oracle territory name. The input must include at least one of the following: a valid Oracle language name or a valid Oracle territory name.

Syntax

UTL_I18N.MAP_LOCALE_TO_ISO( ora_language   IN VARCHAR2,
                            ora_territory  IN VARCHAR2)
RETURN VARCHAR2;

Parameters

Table 157-6  MAP_LOCALE_TO_ISO Function Parameters  
Parameter Description

ora_language

Specifies an Oracle language name. It is case-insensitive.

ora_territory

Specifies an Oracle territory name. It is case-insensitive.

Usage Notes

If the user specifies an invalid string, then the function returns a NULL string.

Examples

UTL_I18N.MAP_LOCALE_TO_ISO('American','America')

This returns 'en_US'.
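As a sketch, the current session settings can be turned into an ISO locale name, for example to label generated content with its locale. Reading the values from NLS_SESSION_PARAMETERS is an assumption about where the caller gets its language and territory; since the mapping is case-insensitive, the uppercase values stored there can be passed as-is. For a session using AMERICAN and AMERICA, the example above suggests the result would be en_US.

```sql
SET SERVEROUTPUT ON
DECLARE
  lang VARCHAR2(64);
  terr VARCHAR2(64);
BEGIN
  -- Read the current session language and territory.
  SELECT value INTO lang FROM nls_session_parameters WHERE parameter = 'NLS_LANGUAGE';
  SELECT value INTO terr FROM nls_session_parameters WHERE parameter = 'NLS_TERRITORY';

  -- An invalid combination returns NULL.
  DBMS_OUTPUT.PUT_LINE(lang || '/' || terr || ' -> '
                       || NVL(UTL_I18N.MAP_LOCALE_TO_ISO(lang, terr), '(no mapping)'));
END;
/
```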

See Also:

Oracle Database Globalization Support Guide for a list of valid Oracle languages and territories


MAP_TERRITORY_FROM_ISO Function

This function returns an Oracle territory name from an ISO locale.

Syntax

UTL_I18N.MAP_TERRITORY_FROM_ISO( isolocale IN VARCHAR2)
RETURN VARCHAR2;

Parameters

Table 157-7  MAP_TERRITORY_FROM_ISO Function Parameters  
Parameter Description

isolocale

Specifies the ISO locale. The mapping is case-insensitive.

Usage Notes

If the user specifies an invalid locale string, then the function returns a NULL string.

If the user specifies a locale string that includes only the territory (for example, _fr instead of fr_fr), then the function returns the Oracle territory name for that territory (for example, France).

Examples

UTL_I18N.MAP_TERRITORY_FROM_ISO('en_US')

This returns 'America'.
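A minimal sketch that combines MAP_LANGUAGE_FROM_ISO and MAP_TERRITORY_FROM_ISO to apply a client's ISO locale to the current session through DBMS_SESSION.SET_NLS. The client locale is hypothetical. Each part is applied only if it mapped to a non-NULL name, and the values are quoted because some Oracle language and territory names contain spaces.

```sql
DECLARE
  iso_locale VARCHAR2(20) := 'fr_FR';   -- hypothetical client locale
  lang       VARCHAR2(64);
  terr       VARCHAR2(64);
BEGIN
  lang := UTL_I18N.MAP_LANGUAGE_FROM_ISO(iso_locale);
  terr := UTL_I18N.MAP_TERRITORY_FROM_ISO(iso_locale);

  -- NULL means "not recognized", so leave that session setting unchanged.
  IF lang IS NOT NULL THEN
    DBMS_SESSION.SET_NLS('NLS_LANGUAGE', '''' || lang || '''');
  END IF;
  IF terr IS NOT NULL THEN
    DBMS_SESSION.SET_NLS('NLS_TERRITORY', '''' || terr || '''');
  END IF;
END;
/
```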

See Also:

Oracle Database Globalization Support Guide for a list of valid Oracle territories


RAW_TO_CHAR Functions

This function converts RAW data from a valid Oracle character set to a VARCHAR2 string in the database character set.

The function is overloaded. The different forms of functionality are described along with the syntax declarations.

Syntax

Buffer Conversion:

UTL_I18N.RAW_TO_CHAR( data          IN RAW,
                      src_charset   IN VARCHAR2 DEFAULT NULL)
RETURN VARCHAR2;

Piecewise conversion converts raw data into character data piece by piece:

UTL_I18N.RAW_TO_CHAR( data            IN RAW,
                      src_charset     IN VARCHAR2 DEFAULT NULL,
                      scanned_length  OUT PLS_INTEGER,
                      shift_status    IN OUT PLS_INTEGER)
RETURN VARCHAR2;

Parameters

Table 157-8  RAW_TO_CHAR Function Parameters  
Parameter Description

data

Specifies the RAW data to be converted to a VARCHAR2 string

src_charset

Specifies the character set that the RAW data was derived from. If src_charset is NULL, then the database character set is used.

scanned_length

Specifies the number of bytes of source data scanned

shift_status

Specifies the shift status at the end of the scan. The user must set it to SHIFT_IN the first time it is called in piecewise conversion.

Note: ISO 2022 character sets use escape sequences instead of shift characters to indicate the encoding method. shift_status cannot hold the encoding method information that is provided by the escape sequences for the next function call. As a result, this function cannot be used to reconstruct ISO 2022 characters from raw data in a piecewise way unless each unit of input can be guaranteed to be a closed string. A closed string begins and ends in a 7-bit escape state.

Usage Notes

If the user specifies an invalid character set, NULL data, or data whose length is 0, then the function returns a NULL string.

Examples

Buffer Conversion

UTL_I18N.RAW_TO_CHAR(hextoraw('616263646566C2AA'), 'utf8')

This returns the following string in the database character set:

'abcdef'||chr(170)

Piecewise Conversion

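UTL_I18N.RAW_TO_CHAR(hextoraw('616263646566C2AA'), 'utf8', slen, shf)

Here slen and shf are PLS_INTEGER variables, and shf must be set to UTL_I18N.SHIFT_IN before the first call. This expression returns the following string in the database character set:

'abcdef'||chr(170)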

It also sets shf to SHIFT_IN and slen to 8.

The following example converts data from the Internet piece by piece to the database character set.

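DECLARE
  rvalue            RAW(1050);
  nvalue            VARCHAR2(4000);   -- room for expansion in the database character set
  conversion_state  PLS_INTEGER := utl_i18n.SHIFT_IN;  -- must be SHIFT_IN before the first call
  converted_len     PLS_INTEGER;
  rtemp             RAW(10) := NULL;  -- bytes of an incomplete character carried to the next read
  conn              utl_tcp.connection;
  tlen              PLS_INTEGER;
BEGIN
  -- ...
  conn := utl_tcp.open_connection(remote_host => 'localhost',
                                  remote_port => 2000);
  LOOP
    -- Read up to 1024 bytes and prepend any bytes left over from the previous read.
    tlen   := utl_tcp.read_raw(conn, rvalue, 1024);
    rvalue := utl_raw.concat(rtemp, rvalue);
    nvalue := utl_i18n.raw_to_char(rvalue, 'JA16SJIS', converted_len,
                                   conversion_state);
    -- Keep the unconverted tail (a split multibyte character) for the next pass.
    IF converted_len < utl_raw.length(rvalue) THEN
      rtemp := utl_raw.substr(rvalue, converted_len + 1);
    ELSE
      rtemp := NULL;
    END IF;
    /* do anything you want with nvalue */
    /* e.g. htp.prn(nvalue); */
  END LOOP;
EXCEPTION
  -- read_raw raises end_of_input when the remote side has no more data
  WHEN utl_tcp.end_of_input THEN
    utl_tcp.close_connection(conn);
END;
/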
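The TCP example above needs a running server. The same carry-over technique can be tried without one: this sketch feeds the sample bytes 616263646566C2AA to the piecewise form of RAW_TO_CHAR in 7-byte pieces, so the two-byte UTF8 sequence C2AA is deliberately split across pieces. The chunk size is an arbitrary choice for illustration. If scanned_length behaves as the TCP example assumes (excluding an incomplete trailing character), the printed string should equal the buffer conversion result.

```sql
SET SERVEROUTPUT ON
DECLARE
  src       RAW(100) := HEXTORAW('616263646566C2AA');  -- sample bytes from the examples above
  chunk_sz  CONSTANT PLS_INTEGER := 7;   -- chosen so that C2AA is split across pieces
  pos       PLS_INTEGER := 1;
  carry     RAW(10);                     -- unconverted bytes from the previous piece
  piece     RAW(110);
  scanned   PLS_INTEGER;
  shf       PLS_INTEGER := UTL_I18N.SHIFT_IN;   -- must start as SHIFT_IN
  result    VARCHAR2(200);
BEGIN
  WHILE pos <= UTL_RAW.LENGTH(src) LOOP
    -- Next chunk of input, prefixed with any bytes carried over (NULL is ignored by CONCAT).
    piece := UTL_RAW.CONCAT(carry,
               UTL_RAW.SUBSTR(src, pos, LEAST(chunk_sz, UTL_RAW.LENGTH(src) - pos + 1)));
    pos := pos + chunk_sz;

    result := result || UTL_I18N.RAW_TO_CHAR(piece, 'UTF8', scanned, shf);

    -- Carry bytes that were not scanned (an incomplete multibyte character).
    IF scanned < UTL_RAW.LENGTH(piece) THEN
      carry := UTL_RAW.SUBSTR(piece, scanned + 1);
    ELSE
      carry := NULL;
    END IF;
  END LOOP;
  -- Any bytes still in carry here would be a truncated character; they are dropped.
  DBMS_OUTPUT.PUT_LINE(result);
END;
/
```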


RAW_TO_NCHAR Functions

This function converts RAW data from a valid Oracle character set to an NVARCHAR2 string in the national character set.

The function is overloaded. The different forms of functionality are described along with the syntax declarations.

Syntax

Buffer Conversion:

UTL_I18N.RAW_TO_NCHAR( data         IN RAW,
                       src_charset  IN VARCHAR2 DEFAULT NULL)
RETURN NVARCHAR2;

Piecewise conversion converts raw data into character data piece by piece:

UTL_I18N.RAW_TO_NCHAR( data            IN RAW,
                       src_charset     IN VARCHAR2 DEFAULT NULL,
                       scanned_length  OUT PLS_INTEGER,
                       shift_status    IN OUT PLS_INTEGER)
RETURN NVARCHAR2;

Parameters

Table 157-9  RAW_TO_NCHAR Function Parameters  
Parameter Description

data

Specifies the RAW data to be converted to an NVARCHAR2 string

src_charset

Specifies the character set that the RAW data was derived from. If src_charset is NULL, then the database character set is used.

scanned_length

Specifies the number of bytes of source data scanned

shift_status

Specifies the shift status at the end of the scan. The user must set it to SHIFT_IN the first time it is called in piecewise conversion.

Note: ISO 2022 character sets use escape sequences instead of shift characters to indicate the encoding method. shift_status cannot hold the encoding method information that is provided by the escape sequences for the next function call. As a result, this function cannot be used to reconstruct ISO 2022 characters from raw data in a piecewise way unless each unit of input can be guaranteed to be a closed string. A closed string begins and ends in a 7-bit escape state.

Usage Notes

If the user specifies an invalid character set, NULL data, or data whose length is 0, then the function returns a NULL string.

Examples

Buffer Conversion

UTL_I18N.RAW_TO_NCHAR(hextoraw('616263646566C2AA'),'utf8')

This returns the following string in the national character set:

'abcdef'||chr(170)

Piecewise Conversion

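UTL_I18N.RAW_TO_NCHAR(hextoraw('616263646566C2AA'), 'utf8', slen, shf)

Here slen and shf are PLS_INTEGER variables, and shf must be set to UTL_I18N.SHIFT_IN before the first call. This expression returns the following string in the national character set:

'abcdef'||chr(170)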

It also sets shf to SHIFT_IN and slen to 8.

The following example converts data from the Internet piece by piece to the national character set.

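DECLARE
  rvalue            RAW(1050);
  nvalue            NVARCHAR2(2000);  -- room for every character of a 1025-byte piece
  conversion_state  PLS_INTEGER := utl_i18n.SHIFT_IN;  -- must be SHIFT_IN before the first call
  converted_len     PLS_INTEGER;
  rtemp             RAW(10) := NULL;  -- bytes of an incomplete character carried to the next read
  conn              utl_tcp.connection;
  tlen              PLS_INTEGER;
BEGIN
  -- ...
  conn := utl_tcp.open_connection(remote_host => 'localhost',
                                  remote_port => 2000);
  LOOP
    -- Read up to 1024 bytes and prepend any bytes left over from the previous read.
    tlen   := utl_tcp.read_raw(conn, rvalue, 1024);
    rvalue := utl_raw.concat(rtemp, rvalue);
    nvalue := utl_i18n.raw_to_nchar(rvalue, 'JA16SJIS', converted_len,
                                    conversion_state);
    -- Keep the unconverted tail (a split multibyte character) for the next pass.
    IF converted_len < utl_raw.length(rvalue) THEN
      rtemp := utl_raw.substr(rvalue, converted_len + 1);
    ELSE
      rtemp := NULL;
    END IF;
    /* do anything you want with nvalue */
    /* e.g. htp.prn(nvalue); */
  END LOOP;
EXCEPTION
  -- read_raw raises end_of_input when the remote side has no more data
  WHEN utl_tcp.end_of_input THEN
    utl_tcp.close_connection(conn);
END;
/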
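Because the national character set in Oracle Database 10g is a Unicode encoding (AL16UTF16 or UTF8), converting the sample bytes to NVARCHAR2 and back to UTF8 with STRING_TO_RAW should not lose any characters. This sketch checks that round trip with UTL_RAW.COMPARE, which returns 0 when two RAW values are identical; it should report 7 characters and a matching round trip.

```sql
SET SERVEROUTPUT ON
DECLARE
  original  RAW(100) := HEXTORAW('616263646566C2AA');  -- UTF8 bytes from the examples above
  n         NVARCHAR2(100);
  back      RAW(100);
BEGIN
  -- UTF8 bytes -> NVARCHAR2 in the national character set
  n := UTL_I18N.RAW_TO_NCHAR(original, 'UTF8');

  -- NVARCHAR2 -> UTF8 bytes again
  back := UTL_I18N.STRING_TO_RAW(n, 'UTF8');

  DBMS_OUTPUT.PUT_LINE('chars=' || LENGTH(n) || ', round trip equal=' ||
                       CASE WHEN UTL_RAW.COMPARE(original, back) = 0 THEN 'yes' ELSE 'no' END);
END;
/
```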

STRING_TO_RAW Function

This function converts a VARCHAR2 or NVARCHAR2 string to another valid Oracle character set and returns the result as RAW data.

Syntax

UTL_I18N.STRING_TO_RAW( data          IN VARCHAR2 CHARACTER SET ANY_CS,
                        dst_charset   IN VARCHAR2 DEFAULT NULL)
RETURN RAW;

Parameters

Table 157-10  STRING_TO_RAW Function Parameters
Parameter Description

data

Specifies the VARCHAR2 or NVARCHAR2 string to convert

dst_charset

Specifies the destination character set. If dst_charset is NULL, then the database character set is used for CHAR data and the national character set is used for NCHAR data.

Usage Notes

If the user specifies an invalid character set, a NULL string, or a string whose length is 0, then the function returns a NULL string.

Examples

SET SERVEROUTPUT ON
DECLARE
  r RAW(50);
  s VARCHAR2(20);
BEGIN
  s := 'abcdef'||chr(170);
  r := utl_i18n.string_to_raw(s, 'utf8');
  dbms_output.put_line(rawtohex(r));
END;
/

This displays the hex value 616263646566C2AA.
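One practical use of STRING_TO_RAW is measuring how many bytes a string occupies in a target character set, which can differ from its character count. In this sketch the 7-byte limit stands in for a hypothetical external system that stores UTF8 bytes; the string is built with UNISTR so it does not depend on the database character set. Given the 8-byte result in the example above, it should report 7 characters, 8 bytes, and flag the string as too long.

```sql
SET SERVEROUTPUT ON
DECLARE
  -- Hypothetical byte limit of a downstream system that stores UTF8.
  max_bytes CONSTANT PLS_INTEGER := 7;
  s         NVARCHAR2(20) := N'abcdef' || UNISTR('\00AA');
  nbytes    PLS_INTEGER;
BEGIN
  -- Encoded size in the target character set, not the character count.
  nbytes := UTL_RAW.LENGTH(UTL_I18N.STRING_TO_RAW(s, 'UTF8'));
  DBMS_OUTPUT.PUT_LINE(LENGTH(s) || ' characters, ' || nbytes || ' bytes in UTF8'
                       || CASE WHEN nbytes > max_bytes THEN ' (too long)' END);
END;
/
```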


UNESCAPE_REFERENCE Function

This function returns a string from an input string that contains escape sequences. It decodes each escape sequence to the corresponding character value.

See Also:

"ESCAPE_REFERENCE Function" for more information about escape sequences

Syntax

UTL_I18N.UNESCAPE_REFERENCE( str IN VARCHAR2 CHARACTER SET ANY_CS)
RETURN VARCHAR2 CHARACTER SET str%CHARSET;

Parameters

Table 157-11  UNESCAPE_REFERENCE Function Parameters  
Parameter Description

str

Specifies the input string

Usage Notes

If the user specifies a NULL string or a string whose length is 0, then the function returns a NULL string. If the function fails, then it returns the original string.

Examples

UTL_I18N.UNESCAPE_REFERENCE('ab&#xaa;') 

This returns 'ab'||chr(170).
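ESCAPE_REFERENCE and UNESCAPE_REFERENCE are intended as inverses, as the paired examples ('ab'||chr(170) and 'ab&#xaa;') suggest. This sketch escapes a short NVARCHAR2 string for a US7ASCII page and decodes it again; the sample text is an assumption, and for this input the restored string is expected to match the original.

```sql
SET SERVEROUTPUT ON
DECLARE
  original  NVARCHAR2(50)  := N'a < b ' || UNISTR('\00AA');
  escaped   NVARCHAR2(200);
  restored  NVARCHAR2(50);
BEGIN
  -- Encode for a US7ASCII page, then decode the escape sequences again.
  escaped  := UTL_I18N.ESCAPE_REFERENCE(original, 'US7ASCII');
  restored := UTL_I18N.UNESCAPE_REFERENCE(escaped);

  DBMS_OUTPUT.PUT_LINE('escaped:  ' || TO_CHAR(escaped));
  DBMS_OUTPUT.PUT_LINE('restored matches original: ' ||
                       CASE WHEN restored = original THEN 'yes' ELSE 'no' END);
END;
/
```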