public final class UnicodeUtil extends Object

java.lang.Object
   ↳ org.apache.lucene.util.UnicodeUtil

Class Overview

Class to encode Java's UTF-16 char[] into UTF-8 byte[] without always allocating a new byte[], as String.getBytes("UTF-8") does.

WARNING: This API is new and experimental and may change suddenly.

Summary

Nested Classes
class UnicodeUtil.UTF16Result  
class UnicodeUtil.UTF8Result  
Constants
int UNI_REPLACEMENT_CHAR
int UNI_SUR_HIGH_END
int UNI_SUR_HIGH_START
int UNI_SUR_LOW_END
int UNI_SUR_LOW_START
Public Constructors
UnicodeUtil()
Public Methods
static void UTF16toUTF8(String s, int offset, int length, UnicodeUtil.UTF8Result result)
Encode characters from this String, starting at offset for length characters.
static void UTF16toUTF8(char[] source, int offset, int length, UnicodeUtil.UTF8Result result)
Encode characters from a char[] source, starting at offset for length chars.
static void UTF16toUTF8(char[] source, int offset, UnicodeUtil.UTF8Result result)
Encode characters from a char[] source, starting at offset and stopping when the character 0xffff is seen.
static void UTF8toUTF16(byte[] utf8, int offset, int length, UnicodeUtil.UTF16Result result)
Convert UTF8 bytes into UTF16 characters.
Inherited Methods
From class java.lang.Object

Constants

public static final int UNI_REPLACEMENT_CHAR

Constant Value: 65533 (0x0000fffd)

public static final int UNI_SUR_HIGH_END

Constant Value: 56319 (0x0000dbff)

public static final int UNI_SUR_HIGH_START

Constant Value: 55296 (0x0000d800)

public static final int UNI_SUR_LOW_END

Constant Value: 57343 (0x0000dfff)

public static final int UNI_SUR_LOW_START

Constant Value: 56320 (0x0000dc00)
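
The four UNI_SUR_* values delimit the UTF-16 high and low surrogate ranges, and UNI_REPLACEMENT_CHAR is U+FFFD. The following is a minimal sketch of how such bounds are typically used to combine a surrogate pair into one code point. The class SurrogateSketch and its methods are hypothetical, the constants are local copies of the documented values, and substituting U+FFFD for an unpaired surrogate is a common convention assumed here, not behavior confirmed by this page.

```java
// Illustrative sketch only; not part of org.apache.lucene.util.UnicodeUtil.
final class SurrogateSketch {
    // Local copies of the documented constant values.
    static final int UNI_SUR_HIGH_START = 0xD800;
    static final int UNI_SUR_HIGH_END = 0xDBFF;
    static final int UNI_SUR_LOW_START = 0xDC00;
    static final int UNI_SUR_LOW_END = 0xDFFF;
    static final int UNI_REPLACEMENT_CHAR = 0xFFFD;

    static boolean isHigh(int c) {
        return c >= UNI_SUR_HIGH_START && c <= UNI_SUR_HIGH_END;
    }

    static boolean isLow(int c) {
        return c >= UNI_SUR_LOW_START && c <= UNI_SUR_LOW_END;
    }

    /**
     * Returns the code point starting at index i (end is exclusive).
     * An unpaired surrogate yields UNI_REPLACEMENT_CHAR (assumed convention).
     */
    static int codePointAt(char[] s, int i, int end) {
        int c = s[i];
        if (isHigh(c)) {
            if (i + 1 < end && isLow(s[i + 1])) {
                // Each surrogate carries 10 bits of the value above 0xFFFF.
                return 0x10000
                    + ((c - UNI_SUR_HIGH_START) << 10)
                    + (s[i + 1] - UNI_SUR_LOW_START);
            }
            return UNI_REPLACEMENT_CHAR; // high surrogate without a partner
        }
        if (isLow(c)) {
            return UNI_REPLACEMENT_CHAR; // low surrogate without a preceding high
        }
        return c; // ordinary BMP character
    }
}
```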

Public Constructors

public UnicodeUtil ()

Public Methods

public static void UTF16toUTF8 (String s, int offset, int length, UnicodeUtil.UTF8Result result)

Encode characters from the String s, starting at offset for length characters. The method returns nothing; the encoded bytes are written into result.

public static void UTF16toUTF8 (char[] source, int offset, int length, UnicodeUtil.UTF8Result result)

Encode characters from a char[] source, starting at offset for length chars. The method returns nothing; the encoded bytes are written into result.
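
To illustrate the idea of encoding into a caller-owned result object instead of allocating a fresh byte[] per call, here is a minimal, JDK-only sketch. It is not Lucene's implementation: the class Utf8BufferSketch and its fields bytes and length are hypothetical stand-ins, and writing U+FFFD for unpaired surrogates is an assumption.

```java
// Illustrative sketch only; names and fields are hypothetical.
final class Utf8BufferSketch {
    byte[] bytes = new byte[10]; // reused across calls, grown only when too small
    int length;                  // number of valid bytes after the last encode

    void encode(char[] source, int offset, int len) {
        int end = offset + len;
        // Worst case: 3 bytes per UTF-16 unit (a pair is 2 units for 4 bytes).
        int max = len * 3;
        if (bytes.length < max) {
            bytes = new byte[max];
        }
        int upto = 0;
        for (int i = offset; i < end; i++) {
            int c = source[i];
            if (c < 0x80) {
                bytes[upto++] = (byte) c;
            } else if (c < 0x800) {
                bytes[upto++] = (byte) (0xC0 | (c >> 6));
                bytes[upto++] = (byte) (0x80 | (c & 0x3F));
            } else if (c < 0xD800 || c > 0xDFFF) {
                bytes[upto++] = (byte) (0xE0 | (c >> 12));
                bytes[upto++] = (byte) (0x80 | ((c >> 6) & 0x3F));
                bytes[upto++] = (byte) (0x80 | (c & 0x3F));
            } else {
                // Surrogate: try to combine a high/low pair into one code point.
                if (c <= 0xDBFF && i + 1 < end) {
                    int low = source[i + 1];
                    if (low >= 0xDC00 && low <= 0xDFFF) {
                        int cp = 0x10000 + ((c - 0xD800) << 10) + (low - 0xDC00);
                        i++; // consume the low surrogate as well
                        bytes[upto++] = (byte) (0xF0 | (cp >> 18));
                        bytes[upto++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                        bytes[upto++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                        bytes[upto++] = (byte) (0x80 | (cp & 0x3F));
                        continue;
                    }
                }
                // Unpaired surrogate: emit U+FFFD (EF BF BD), an assumed policy.
                bytes[upto++] = (byte) 0xEF;
                bytes[upto++] = (byte) 0xBF;
                bytes[upto++] = (byte) 0xBD;
            }
        }
        length = upto;
    }
}
```

A caller would create one Utf8BufferSketch, call encode repeatedly, and read only the first length bytes of bytes after each call; the backing array is replaced only when an input is too large for it.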

public static void UTF16toUTF8 (char[] source, int offset, UnicodeUtil.UTF8Result result)

Encode characters from a char[] source, starting at offset and stopping when the character 0xffff is seen. The method returns nothing; the encoded bytes are written into result.
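
This overload takes no length, so the caller's buffer has to contain the 0xffff end marker (U+FFFF is a Unicode noncharacter, so it does not occur in valid text). The sketch below shows one way a caller might prepare such a buffer and how the implied length can be found. The class SentinelScanSketch and its methods are hypothetical helpers, not part of UnicodeUtil, and the sketch assumes the marker is always present.

```java
// Illustrative sketch only; helper names are hypothetical.
final class SentinelScanSketch {
    static final char END_MARKER = 0xFFFF;

    /** Counts the chars from offset up to, but not including, the first 0xffff. */
    static int lengthUntilMarker(char[] source, int offset) {
        int i = offset;
        // Assumes a marker exists; otherwise this runs past the end of the array.
        while (source[i] != END_MARKER) {
            i++;
        }
        return i - offset;
    }

    /** Copies text into a buffer with one extra slot holding the end marker. */
    static char[] withMarker(String text) {
        char[] buf = new char[text.length() + 1];
        text.getChars(0, text.length(), buf, 0);
        buf[text.length()] = END_MARKER;
        return buf;
    }
}
```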

public static void UTF8toUTF16 (byte[] utf8, int offset, int length, UnicodeUtil.UTF16Result result)

Convert UTF8 bytes into UTF16 characters. If offset is non-zero, conversion starts at that position in utf8, reusing the results of the previous call for the bytes before offset.
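
The incremental behavior can be pictured with a small JDK-only sketch: the result object remembers, for each input byte, which output char it mapped to, so a later call that shares a byte prefix can resume from offset. This is an assumption-laden illustration, not Lucene's code: the class IncrementalUtf16Sketch and its fields chars, offsets and length are hypothetical, and the input is assumed to be well-formed UTF-8.

```java
import java.util.Arrays;

// Illustrative sketch only; names and fields are hypothetical.
final class IncrementalUtf16Sketch {
    char[] chars = new char[16];
    // offsets[b] = index in chars where the code point starting at byte b begins,
    // or -1 if byte b is a continuation byte.
    int[] offsets = new int[16];
    int length;

    void decode(byte[] utf8, int offset, int len) {
        int end = offset + len;
        if (offsets.length <= end) {
            offsets = Arrays.copyOf(offsets, 2 * end);
        }
        // If offset falls inside a multi-byte sequence, back up to its first byte.
        int in = offset;
        while (offsets[in] == -1) {
            in--;
        }
        int out = offsets[in]; // chars before this index are kept from the last call
        int needed = out + (end - in); // at most one char per remaining byte
        if (chars.length < needed) {
            chars = Arrays.copyOf(chars, 2 * needed);
        }
        while (in < end) {
            int b = utf8[in] & 0xFF;
            offsets[in++] = out;
            if (b < 0x80) {
                chars[out++] = (char) b;
            } else if (b < 0xE0) {
                chars[out++] = (char) (((b & 0x1F) << 6) | (utf8[in] & 0x3F));
                offsets[in++] = -1;
            } else if (b < 0xF0) {
                chars[out++] = (char) (((b & 0x0F) << 12)
                    | ((utf8[in] & 0x3F) << 6) | (utf8[in + 1] & 0x3F));
                offsets[in++] = -1;
                offsets[in++] = -1;
            } else {
                int cp = ((b & 0x07) << 18) | ((utf8[in] & 0x3F) << 12)
                    | ((utf8[in + 1] & 0x3F) << 6) | (utf8[in + 2] & 0x3F);
                offsets[in++] = -1;
                offsets[in++] = -1;
                offsets[in++] = -1;
                cp -= 0x10000; // split into a surrogate pair
                chars[out++] = (char) (0xD800 + (cp >> 10));
                chars[out++] = (char) (0xDC00 + (cp & 0x3FF));
            }
        }
        offsets[in] = out; // lets a later call resume exactly at the end
        length = out;
    }
}
```

In this sketch, a non-zero offset is only meaningful when the bytes before it are identical to those of the previous call on the same object, since the chars decoded from them are kept rather than recomputed.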