public final class

UCharacter

extends Object

java.lang.Object
↳	sun.text.normalizer.UCharacter

Class Overview

The UCharacter class provides extensions to the java.lang.Character class. These extensions provide support for Unicode 3.2 properties and, together with the UTF16 class, provide support for supplementary characters (those with code points above U+FFFF).

Code points are represented in this API as ints. While a separate primitive datatype for them would be more convenient in Java, ints suffice in the meantime.
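As a minimal sketch of what representing code points as ints looks like in practice, the following walks a string one code point at a time. It uses only java.lang.Character and String, not UCharacter itself; the class name CodePointIntSketch and the method countCodePoints are hypothetical names chosen for illustration.

```java
// Sketch only: relies on java.lang.Character, not on UCharacter.
final class CodePointIntSketch {

    /** Counts code points (not chars) by reading one int code point per step. */
    static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);      // an int holds the full code point value
            i += Character.charCount(cp);   // 1 char for BMP, 2 for supplementary
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // U+1D11E (MUSICAL SYMBOL G CLEF) lies above U+FFFF, so it needs two chars.
        String s = "A" + new String(Character.toChars(0x1D11E));
        System.out.println(s.length());          // 3 chars
        System.out.println(countCodePoints(s));  // 2 code points
    }
}
```

A char can only hold one UTF-16 code unit, which is why an int is needed to carry a supplementary code point as a single value.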

To use this class, add the jar file icu4j.jar to the class path, since it contains the data files that supply the information used by this class.
For example, on Windows:
set CLASSPATH=%CLASSPATH%;$JAR_FILE_PATH/ucharacter.jar
Alternatively, copy the files uprops.dat and unames.icu from the icu4j source subdirectory $ICU4J_SRC/src/com.ibm.icu.impl.data to your class directory $ICU4J_CLASS/com.ibm.icu.impl.data.

Aside from the additions for UTF-16 support, and the updated Unicode 3.1 properties, the main differences between UCharacter and Character are:

UCharacter is not designed to be a char wrapper and does not have APIs that involve managing a single char.
These include:
- char charValue(),
- int compareTo(java.lang.Character, java.lang.Character), etc.
UCharacter does not include Character APIs that are deprecated, nor does it include the Java-specific character information, such as boolean isJavaIdentifierPart(char ch).
Character maps characters 'A' - 'Z' and 'a' - 'z' to the numeric values '10' - '35'. UCharacter also does this in digit and getNumericValue, to adhere to the Java semantics of these methods. The new methods unicodeDigit and getUnicodeNumericValue do not treat the above code points as having numeric values. This is a semantic change from ICU4J 1.3.1.

Further detailed differences can be determined using the program com.ibm.icu.dev.test.lang.UCharacterCompare.

This class is not subclassable.

Summary

Nested Classes
interface	UCharacter.ECharacterCategory	This interface is deprecated. This is a draft API and might change in a future release of ICU.
interface	UCharacter.HangulSyllableType	Hangul Syllable Type constants.
interface	UCharacter.NumericType	Numeric Type constants.

Constants
int	MAX_VALUE	The highest Unicode code point value (scalar value) according to the Unicode Standard.
int	MIN_VALUE	The lowest Unicode code point value.
double	NO_NUMERIC_VALUE	Special value that is returned by getUnicodeNumericValue(int) when no numeric value is defined for a code point.
int	SUPPLEMENTARY_MIN_VALUE	The minimum value for Supplementary code points

Public Methods
static int	digit(int ch, int radix) Retrieves the numeric value of a decimal digit code point.
static String	foldCase(String str, boolean defaultmapping) The given string is mapped to its case folding equivalent according to UnicodeData.txt and CaseFolding.txt; if any character has no case folding equivalent, the character itself is returned.
static VersionInfo	getAge(int ch) Get the "age" of the code point.
static int	getCodePoint(char lead, char trail) Returns a code point corresponding to the two UTF16 characters.
static int	getDirection(int ch) Returns the Bidirection property of a code point.
static int	getIntPropertyValue(int ch, int type) Gets the property value for a Unicode property type of a code point.
static int	getType(int ch) Returns a value indicating a code point's Unicode category.
static double	getUnicodeNumericValue(int ch) Get the numeric value for a Unicode code point as defined in the Unicode Character Database.

Inherited Methods

From class java.lang.Object

Object	clone() Creates and returns a copy of this object.
boolean	equals(Object obj) Indicates whether some other object is "equal to" this one.
void	finalize() Called by the garbage collector on an object when garbage collection determines that there are no more references to the object.
final Class<?>	getClass() Returns the runtime class of this `Object`.
int	hashCode() Returns a hash code value for the object.
final void	notify() Wakes up a single thread that is waiting on this object's monitor.
final void	notifyAll() Wakes up all threads that are waiting on this object's monitor.
String	toString() Returns a string representation of the object.
final void	wait() Causes the current thread to wait until another thread invokes the `notify()` method or the `notifyAll()` method for this object.
final void	wait(long timeout, int nanos) Causes the current thread to wait until another thread invokes the `notify()` method or the `notifyAll()` method for this object, or some other thread interrupts the current thread, or a certain amount of real time has elapsed.
final void	wait(long timeout) Causes the current thread to wait until either another thread invokes the `notify()` method or the `notifyAll()` method for this object, or a specified amount of time has elapsed.

Constants

public static final int MAX_VALUE

The highest Unicode code point value (scalar value) according to the Unicode Standard. This is a 21-bit value (21 bits, rounded up).
Up-to-date Unicode implementation of java.lang.Character.MAX_VALUE

Constant Value: 1114111 (0x0010ffff)

public static final int MIN_VALUE

The lowest Unicode code point value.

Constant Value: 0 (0x00000000)

public static final double NO_NUMERIC_VALUE

Special value that is returned by getUnicodeNumericValue(int) when no numeric value is defined for a code point.

public static final int SUPPLEMENTARY_MIN_VALUE

The minimum value for Supplementary code points

Constant Value: 65536 (0x00010000)
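The three integer constants above partition the code point space. The sketch below copies their documented values into a hypothetical class, CodePointRangeSketch, and shows how they could be used for range checks; the helper methods isValid and isSupplementary are illustrative assumptions and are not part of UCharacter.

```java
// Sketch only: the constants are copied from the documented values above.
final class CodePointRangeSketch {
    static final int MIN_VALUE = 0x000000;
    static final int MAX_VALUE = 0x10FFFF;
    static final int SUPPLEMENTARY_MIN_VALUE = 0x010000;

    /** True if cp lies inside the Unicode code point range. */
    static boolean isValid(int cp) {
        return cp >= MIN_VALUE && cp <= MAX_VALUE;
    }

    /** True if cp is a supplementary code point (above U+FFFF). */
    static boolean isSupplementary(int cp) {
        return cp >= SUPPLEMENTARY_MIN_VALUE && cp <= MAX_VALUE;
    }

    public static void main(String[] args) {
        // Number of bits needed to store MAX_VALUE.
        System.out.println(32 - Integer.numberOfLeadingZeros(MAX_VALUE)); // 21
        System.out.println(isSupplementary(0xFFFF));   // false
        System.out.println(isSupplementary(0x10000));  // true
    }
}
```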

Public Methods

public static int digit (int ch, int radix)

Retrieves the numeric value of a decimal digit code point.
This method observes the semantics of java.lang.Character.digit(). Note that this will return positive values for code points for which isDigit returns false, just like java.lang.Character.
Semantic Change: In release 1.3.1 and prior, this did not treat the European letters as having a digit value, and also treated numeric letters and other numbers as digits. This has been changed to conform to the Java semantics.
A code point is a valid digit if and only if:

ch is a decimal digit or one of the European letters, and
the value of ch is less than the specified radix.

Parameters

ch	the code point to query
radix	the radix

Returns

the numeric value represented by the code point in the specified radix, or -1 if the code point is not a decimal digit or if its value is too large for the radix
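Because this method is documented as following the semantics of java.lang.Character.digit(), the behavior can be illustrated with the standard library alone. The sketch below calls Character.digit, not UCharacter.digit; the class DigitSketch and its helper isValidDigit are hypothetical names.

```java
// Sketch only: demonstrates the java.lang.Character.digit semantics
// that UCharacter.digit is documented to follow.
final class DigitSketch {

    /** True if cp is a valid digit in the given radix under Character.digit rules. */
    static boolean isValidDigit(int cp, int radix) {
        return Character.digit(cp, radix) != -1;
    }

    public static void main(String[] args) {
        System.out.println(Character.digit('7', 10));  // 7
        System.out.println(Character.digit('A', 16));  // 10, a European letter
        System.out.println(Character.digit('z', 36));  // 35
        System.out.println(Character.digit('A', 10));  // -1, value too large for radix
        System.out.println(Character.isDigit('A'));    // false, yet digit('A', 16) is positive
    }
}
```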

public static String foldCase (String str, boolean defaultmapping)

The given string is mapped to its case folding equivalent according to UnicodeData.txt and CaseFolding.txt; if any character has no case folding equivalent, the character itself is returned. "Full", multiple-code point case folding mappings are returned here. For "simple" single-code point mappings use the API foldCase(int ch, boolean defaultmapping).

Parameters

str	the String to be converted
defaultmapping	Indicates whether all mappings defined in CaseFolding.txt are to be used; otherwise the mappings for dotted I and dotless i marked with 'I' in CaseFolding.txt will be skipped.

Returns

the case folding equivalent of the string, if any; otherwise the string itself.

public static VersionInfo getAge (int ch)

Get the "age" of the code point.

The "age" is the Unicode version when the code point was first designated (as a non-character or for Private Use) or assigned a character.

This can be useful to avoid emitting code points to receiving processes that do not accept newer characters.

The data is from the UCD file DerivedAge.txt.

Parameters

ch	The code point.

Returns

the Unicode version number

public static int getCodePoint (char lead, char trail)

Returns a code point corresponding to the two UTF16 characters.

Parameters

lead	the lead char
trail	the trail char

Returns

code point if surrogate characters are valid.

Throws

IllegalArgumentException	thrown when the argument characters do not form a valid code point
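The contract described above (combine a lead and trail surrogate, reject anything else) can be sketched with the standard UTF-16 decoding formula. This is an illustration under that assumption, not the actual implementation of getCodePoint; SurrogatePairSketch and codePointOf are hypothetical names.

```java
// Sketch only: standard UTF-16 surrogate decoding, mirroring the documented contract.
final class SurrogatePairSketch {

    /** Combines a lead and trail surrogate into one code point, or throws. */
    static int codePointOf(char lead, char trail) {
        if (!Character.isHighSurrogate(lead) || !Character.isLowSurrogate(trail)) {
            throw new IllegalArgumentException("Not a valid surrogate pair");
        }
        // Each surrogate carries 10 bits; the result is offset by 0x10000.
        return ((lead - 0xD800) << 10) + (trail - 0xDC00) + 0x10000;
    }

    public static void main(String[] args) {
        int cp = codePointOf('\uD834', '\uDD1E');
        System.out.println(Integer.toHexString(cp)); // 1d11e
    }
}
```

The result agrees with java.lang.Character.toCodePoint for valid pairs; the difference is that Character.toCodePoint does not validate its arguments, whereas this sketch throws as the documentation above describes.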

public static int getDirection (int ch)

Returns the Bidirection property of a code point. For example, 0x0041 (letter A) has the LEFT_TO_RIGHT directional property.
The returned result is a constant from the interface UCharacterDirection.

Parameters

ch	the code point whose direction is to be determined

Returns

direction constant from UCharacterDirection.
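For comparison, the standard library exposes a similar query through java.lang.Character.getDirectionality. The sketch below uses that method and its DIRECTIONALITY_* constants, which are an assumption standing in for UCharacterDirection; the numeric values of the two sets of constants are not guaranteed to match. DirectionSketch and isRightToLeft are hypothetical names.

```java
// Sketch only: uses java.lang.Character.getDirectionality as a stand-in
// for UCharacter.getDirection.
final class DirectionSketch {

    /** True if cp has a right-to-left directional class (Hebrew-style or Arabic). */
    static boolean isRightToLeft(int cp) {
        byte d = Character.getDirectionality(cp);
        return d == Character.DIRECTIONALITY_RIGHT_TO_LEFT
            || d == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC;
    }

    public static void main(String[] args) {
        // 0x0041 (letter A) is left-to-right, as in the description above.
        System.out.println(Character.getDirectionality(0x0041)
                == Character.DIRECTIONALITY_LEFT_TO_RIGHT); // true
        System.out.println(isRightToLeft(0x05D0));          // true, HEBREW LETTER ALEF
    }
}
```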

public static int getIntPropertyValue (int ch, int type)

Gets the property value for a Unicode property type of a code point. Also returns binary and mask property values.

Unicode, especially in version 3.2, defines many more properties than the original set in UnicodeData.txt.

The properties APIs are intended to reflect Unicode properties as defined in the Unicode Character Database (UCD) and Unicode Technical Reports (UTR). For details about the properties see http://www.unicode.org/.

For names of Unicode properties see the UCD file PropertyAliases.txt.

 Sample usage:
 // c is the code point to query
 int ea = UCharacter.getIntPropertyValue(c, UProperty.EAST_ASIAN_WIDTH);
 int ideo = UCharacter.getIntPropertyValue(c, UProperty.IDEOGRAPHIC);
 boolean b = (ideo == 1); // binary properties return 0 or 1

Parameters

ch	code point to test.
type	UProperty selector constant, identifies which property to check. Must be UProperty.BINARY_START <= type < UProperty.BINARY_LIMIT or UProperty.INT_START <= type < UProperty.INT_LIMIT or UProperty.MASK_START <= type < UProperty.MASK_LIMIT.

Returns

numeric value that is directly the property value or, for enumerated properties, corresponds to the numeric value of the enumerated constant of the respective property value enumeration type (cast to enum type if necessary). Returns 0 or 1 (for false / true) for binary Unicode properties. Returns a bit-mask for mask properties. Returns 0 if 'type' is out of bounds or if the Unicode version does not have data for the property at all, or not for this code point.

public static int getType (int ch)

Returns a value indicating a code point's Unicode category. Up-to-date Unicode implementation of java.lang.Character.getType() except for the above mentioned code points that had their category changed.
The returned values are constants from the interface UCharacterCategory.
NOTE: the UCharacterCategory values are not compatible with those returned by java.lang.Character.getType. UCharacterCategory values match the ones used in ICU4C, while java.lang.Character type values, though similar, skip the value 17.

Parameters

ch	code point whose type is to be determined

Returns

category which is a value of UCharacterCategory
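The note about java.lang.Character type values skipping 17 can be checked against the standard library. The sketch below queries only java.lang.Character, so it shows the Java side of the incompatibility and says nothing about the UCharacterCategory values themselves; TypeValueSketch and typeValueUsed are hypothetical names.

```java
// Sketch only: inspects java.lang.Character type values, not UCharacterCategory.
final class TypeValueSketch {

    /** True if Character.getType returns the given value for at least one code point. */
    static boolean typeValueUsed(int value) {
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (Character.getType(cp) == value) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(Character.FORMAT);       // 16
        System.out.println(Character.PRIVATE_USE);  // 18, so 17 is skipped
        System.out.println(typeValueUsed(17));      // false
    }
}
```

Code that moves category values between the two APIs therefore needs an explicit mapping rather than a direct numeric comparison.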

public static double getUnicodeNumericValue (int ch)

Get the numeric value for a Unicode code point as defined in the Unicode Character Database.

A "double" return type is necessary because some numeric values are fractions, negative, or too large for int.

For characters without any numeric values in the Unicode Character Database, this function will return NO_NUMERIC_VALUE.

API Change: In release 2.2 and prior, this API had a return type of int and returned -1 when the argument ch did not have a corresponding numeric value. This has been changed to synchronize with ICU4C.

This corresponds to the ICU4C function u_getNumericValue.

Parameters

ch	Code point to get the numeric value for.

Returns

numeric value of ch, or NO_NUMERIC_VALUE if none is defined.
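To see why an int return type is not enough, compare with java.lang.Character.getNumericValue, which returns an int and has to use sentinel values for fractions. The sketch below uses only that standard method; NumericValueSketch and needsDouble are hypothetical names, and the actual value of NO_NUMERIC_VALUE is deliberately not assumed here.

```java
// Sketch only: shows the limits of the int-returning java.lang.Character API.
final class NumericValueSketch {

    /** True if cp has a numeric value that Character.getNumericValue cannot express as an int. */
    static boolean needsDouble(int cp) {
        // -2 is the documented sentinel for "has a value, but not a nonnegative integer".
        return Character.getNumericValue(cp) == -2;
    }

    public static void main(String[] args) {
        System.out.println(Character.getNumericValue('7'));     // 7
        System.out.println(Character.getNumericValue('$'));     // -1, no numeric value
        System.out.println(Character.getNumericValue(0x00BD));  // -2, VULGAR FRACTION ONE HALF
        System.out.println(needsDouble(0x00BD));                // true
    }
}
```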
