
Java Native Interface (JNI)

Overview

ICU4JNI is a subproject of ICU for Java™ (ICU4J). It provides full conformance with Unicode 3.1.1, enhanced functionality, increased performance, and better cross-language and cross-platform stability of results. ICU4JNI also provides greater flexibility, customization, and access to certain ICU4C native services from Java using the Java Native Interface (JNI). Currently, the following services are accessible through JNI:

  1. Character Conversion

  2. Collation

  3. Normalization

Character Conversion

Character conversion converts bytes from one charset specification to another. One problem in character conversion is that the mappings vary and are imprecise across platforms. For example, converting a Shift-JIS byte stream to Unicode on an IBM® platform does not give the same result as the same conversion on a Sun® Solaris platform. This service is useful when an application is multi-language and cannot afford differences in conversion output. It can also be used when an application requires a higher level of customization and flexibility in character conversion. To realize performance gains, the buffers passed to the converters must be large enough to offset the JNI overhead.

The conversion service can be accessed through the following APIs:

The CharToByteConverterICU and ByteToCharConverterICU classes in the com.ibm.icu4jni.converters package. These classes inherit from the CharToByteConverter and ByteToCharConverter classes in the com.sun.converters package. This interface is limited in its functionality because public conversion APIs such as String, InputStream, and OutputStream cannot access ICU's converters unless the converters are integrated into the Java Virtual Machine (JVM). That integration requires access to the JVM's source code (please refer to the Readme for more information). If the application can afford to operate on byte arrays and char arrays instead of relying on the Java API's conversion routines, then ICU's classes provide methods to instantiate converter objects and to perform the conversion. The following example shows this conversion:

try{
     CharToByteConverter cbConv = CharToByteConverterICU.createConverter("gb-18030");
     char[] source = { '\u9001','\u3005','\u6458'} ;
     // Size the output for the worst case
     byte[] result = new byte[source.length * cbConv.getMaxBytesPerChar()];
     cbConv.convert(source, 0, source.length, result, 0, result.length);
}catch(Exception e){
... //do something interesting
}

The Charset, CharsetEncoderICU, CharsetDecoderICU, and CharsetProviderICU classes in the com.ibm.icu4jni.charset package. Java 1.4 added a new public API for character conversion (java.nio.charset) that lets third-party implementers plug in their converters and enables the other public APIs to use them as well. ICU4JNI's classes are based on this character conversion API. The following example uses ICU4JNI's classes:

try{
     Charset cs = Charset.forName("gb-18030");
     char[] source = { '\u9001','\u3005','\u6458'} ;
     CharBuffer cb = CharBuffer.wrap(source);
     ByteBuffer result = cs.encode(cb);
}catch(Exception e){
... //do something interesting
}

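The following example drives a CharsetEncoder directly, exposing one more character of the input on each pass:

try{
     Charset cs = Charset.forName("gb-18030");
     CharsetEncoder encoder = cs.newEncoder();
     char[] source = { '\u9001','\u3005','\u6458'} ;
     CharBuffer cb = CharBuffer.wrap(source);
     // Size the output for the worst case; maxBytesPerChar() returns a float
     ByteBuffer bb = ByteBuffer.allocate((int)Math.ceil(source.length * encoder.maxBytesPerChar()));

     for (int i=0; i<=source.length; i++) {
         cb.limit(i);
         CoderResult result = encoder.encode(cb,bb,false);
     }
}catch(Exception e){
... //do something interesting
}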
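
When the output buffer is smaller than the encoded text, the encoder reports an overflow and the caller has to drain the buffer and call it again. The following is a minimal sketch of that loop, assuming only the standard java.nio.charset API from the JDK with UTF-8 rather than an ICU4JNI converter. The class name ChunkedEncodeSketch and its methods are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ICU4JNI: encodes text to UTF-8 through a
// fixed-size output buffer, draining the buffer whenever it fills up.
final class ChunkedEncodeSketch {

    static byte[] encode(String text, int bufferSize) {
        if (bufferSize < 4) {
            // A UTF-8 code point can need 4 bytes; a smaller buffer could never make progress.
            throw new IllegalArgumentException("bufferSize must be at least 4");
        }
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        CharBuffer in = CharBuffer.wrap(text);
        ByteBuffer out = ByteBuffer.allocate(bufferSize);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();

        // Encode until all input is consumed (underflow); overflow means "drain and retry".
        while (true) {
            CoderResult result = encoder.encode(in, out, true);
            drain(out, sink);
            if (result.isUnderflow()) {
                break;
            }
            if (result.isError()) {
                try {
                    result.throwException();
                } catch (CharacterCodingException e) {
                    throw new UncheckedIOException(e);
                }
            }
        }
        // Flush any state the encoder still holds.
        while (true) {
            CoderResult result = encoder.flush(out);
            drain(out, sink);
            if (result.isUnderflow()) {
                break;
            }
        }
        return sink.toByteArray();
    }

    // Copies the bytes written so far into the sink and resets the buffer for reuse.
    private static void drain(ByteBuffer out, ByteArrayOutputStream sink) {
        out.flip();
        sink.write(out.array(), out.arrayOffset() + out.position(), out.remaining());
        out.clear();
    }
}
```

A larger bufferSize means fewer calls into the encoder for the same input, which is the same trade-off that applies to offsetting the JNI overhead described earlier.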

For more information on character conversion, see the ICU Conversion chapter.

Collation

The collation service provided by ICU is fully compliant with the Unicode Collation Algorithm (UCA) and ISO 14651. The following lists some of the advantages of the ICU collation service over Java:

The following demonstrates how to create a collator:

try{
     Collator coll = Collator.createInstance(new Locale("en", "US"));
}catch(ParseException e){
... //do something interesting
}

The following demonstrates how to compare strings:

try{
     Collator coll = Collator.createInstance(new Locale("th", "TH"));
     String th1 = "\u0e01";
     String th2 = "\u0e01\u0e01";
     if(coll.compare(th1,th2)==Collator.RESULT_LESS){
            ...//th1 sorts before th2, do something
     }else{
            ...//th1 does not sort before th2, do something
     }
}catch(ParseException e){
... //do something interesting
}
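
For comparison, here is a minimal sketch of locale-sensitive comparison using the JDK's own java.text.Collator, not the ICU4JNI Collator shown above. It illustrates how the comparison strength changes whether two strings count as equal. The class name CollationSketch and its methods are hypothetical, and the JDK collator's results are not guaranteed to match ICU's.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper built on the JDK collator (java.text.Collator),
// used here only as a stand-in for the ICU4JNI collation service.
final class CollationSketch {

    // Negative, zero, or positive as a sorts before, equal to, or after b in the locale.
    static int compare(String a, String b, Locale locale) {
        return Collator.getInstance(locale).compare(a, b);
    }

    // True if the two strings are equal at the given strength,
    // for example Collator.PRIMARY (ignores case) or Collator.TERTIARY (case-sensitive).
    static boolean equalAt(String a, String b, Locale locale, int strength) {
        Collator coll = Collator.getInstance(locale);
        coll.setStrength(strength);
        return coll.compare(a, b) == 0;
    }
}
```

For example, CollationSketch.equalAt("a", "A", Locale.US, Collator.PRIMARY) treats the two strings as equal, while the same call with Collator.TERTIARY does not.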

Normalization

Normalization converts text into a unique, equivalent form. Systems can normalize Unicode-encoded text into one particular sequence, such as normalizing composite character sequences into pre-composed characters. The semantics and use are similar to those of the ICU4J Normalization service, except for the character iteration functionality.

The following demonstrates how to use a normalizer:

try{
     String source = "\u00e0ardvark";
     String decomposed = "a\u0300ardvark";
     String composed =   "\u00e0ardvark";
     if(Normalizer.normalize(source,Normalizer.UNORM_NFC).equals(composed)){
            ...// do something interesting
     }
     if(Normalizer.normalize(source,Normalizer.UNORM_NFD).equals(decomposed)){
            ...// do something interesting
     }
}catch(ParseException e){
... //do something interesting
}
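
The same composed and decomposed forms can be checked with the JDK's java.text.Normalizer, which is used in the sketch below as a stand-in for the ICU4JNI Normalizer. The class name NormalizationSketch and its methods are hypothetical names for illustration.

```java
import java.text.Normalizer;

// Hypothetical helper built on the JDK normalizer (java.text.Normalizer),
// not on the ICU4JNI Normalizer class.
final class NormalizationSketch {

    // Canonical composition: "a" + U+0300 becomes U+00E0.
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Canonical decomposition: U+00E0 becomes "a" + U+0300.
    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // Two strings are canonically equivalent if their NFD forms match.
    static boolean canonicallyEqual(String a, String b) {
        return nfd(a).equals(nfd(b));
    }
}
```

Comparing normalized forms, as canonicallyEqual does, avoids treating "\u00e0ardvark" and "a\u0300ardvark" as different strings.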
