java.lang.Object | |
↳ | org.apache.lucene.util.NumericUtils |
This helper class generates prefix-encoded representations of numerical values and supplies converters that represent float/double values as sortable integers/longs.
To quickly execute range queries in Apache Lucene, a range is divided recursively into multiple intervals for searching: The center of the range is searched only with the lowest possible precision in the trie, while the boundaries are matched more exactly. This reduces the number of terms dramatically.
This class generates the terms needed for this. First, the integer values are converted to strings: each value (32 bit or 64 bit) is made unsigned and its bits are packed into ASCII chars, 7 bits per char. The resulting string sorts in the same order as the original integer value. Each value is also prefixed (in the first char) by the shift value (the number of bits removed) used during encoding.
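The encoding described above can be sketched in plain Java as follows. This is a minimal illustration and not the Lucene source: the class name `PrefixCodingSketch`, the method name `encode`, and the prefix offset `0x60` are assumptions made for the example (the real offset is the constant SHIFT_START_INT).

```java
/** Illustrative sketch only; not the Lucene source. */
class PrefixCodingSketch {
    // Assumed value for illustration; the real constant is NumericUtils.SHIFT_START_INT.
    static final char SHIFT_START = 0x60;

    /** Encodes val after dropping the `shift` lowest bits: one prefix char, then 7 bits per char. */
    static String encode(int val, int shift) {
        if (shift < 0 || shift > 31) {
            throw new IllegalArgumentException("shift must be in 0..31");
        }
        int nChars = (31 - shift) / 7 + 1;       // chars needed for the remaining bits
        char[] buf = new char[nChars + 1];
        buf[0] = (char) (SHIFT_START + shift);   // first char records the shift
        // Flipping the sign bit makes signed order equal to unsigned order.
        int bits = (val ^ 0x80000000) >>> shift;
        for (int i = nChars; i >= 1; i--) {      // fill from the least significant group
            buf[i] = (char) (bits & 0x7f);
            bits >>>= 7;
        }
        return new String(buf);
    }

    public static void main(String[] args) {
        // Values 16..31 share one term once the 4 lowest bits are dropped.
        System.out.println(encode(16, 4).equals(encode(31, 4)));        // true
        // Full-precision terms compare like the ints they encode.
        System.out.println(encode(-1, 0).compareTo(encode(0, 0)) < 0);  // true
    }
}
```

Because every term with the same shift has the same length and the same first char, comparing two such terms char by char is equivalent to comparing the unsigned bit patterns.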
To index floating point numbers as well, this class supplies two methods that convert them to integer values by changing their bit layout: doubleToSortableLong(double) and floatToSortableInt(float). No precision is lost when converting floating point numbers to integers and back; the integer form is only meant for comparison and is not usable as a numeric value. Other data types such as dates can easily be converted to longs or ints (e.g. date to long: getTime()).
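One common way to obtain such a sortable form is to take the IEEE 754 bits and invert the non-sign bits of negative values. The sketch below assumes that approach; the class and method names are hypothetical, and it is not presented as the exact Lucene implementation.

```java
/** Illustrative sketch only; names are hypothetical. */
class SortableDoubleSketch {
    /** Maps a double to a long whose signed order matches the double's numeric order. */
    static long toSortableLong(double val) {
        long bits = Double.doubleToLongBits(val);
        // For negative doubles a larger magnitude means a smaller value,
        // so invert every bit except the sign bit.
        if (bits < 0) {
            bits ^= 0x7fffffffffffffffL;
        }
        return bits;
    }

    /** Reverses toSortableLong; the same bit inversion undoes itself. */
    static double fromSortableLong(long val) {
        if (val < 0) {
            val ^= 0x7fffffffffffffffL;
        }
        return Double.longBitsToDouble(val);
    }

    public static void main(String[] args) {
        System.out.println(toSortableLong(-2.0) < toSortableLong(-1.0));      // true
        System.out.println(fromSortableLong(toSortableLong(3.25)) == 3.25);   // true
    }
}
```

The round trip is lossless because only the bit layout changes; no bits are discarded.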
For ease of use, the indexing side of the trie algorithm is implemented in NumericTokenStream, which can index int, long, float, and double values. For querying, NumericRangeQuery and NumericRangeFilter implement the query part for the same data types.
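To make the indexing side more concrete, the sketch below lists the precision levels at which a single int would be represented, assuming one term per level with the shift growing by the precision step each time. The class name, the method name, and the example step of 4 are assumptions for illustration; the terms are shown as readable (shift, prefix) strings rather than in the prefix-coded form.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; names are hypothetical. */
class TrieTermsSketch {
    /** Lists one (shift, prefix) pair per precision level for a single int value. */
    static List<String> termsFor(int val, int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        List<String> terms = new ArrayList<>();
        int sortable = val ^ 0x80000000;  // sign bit flipped so the bits sort as unsigned
        for (int shift = 0; shift < 32; shift += precisionStep) {
            // Each level drops `shift` low bits, so nearby values share the same term.
            terms.add("shift=" + shift + " prefix=" + Integer.toHexString(sortable >>> shift));
        }
        return terms;
    }

    public static void main(String[] args) {
        for (String term : termsFor(42, 4)) {
            System.out.println(term);
        }
    }
}
```

With a step of 4, an int yields 8 levels. A smaller step produces more terms per value but lets a range query match wide parts of the range with fewer terms.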
This class can also be used to generate lexicographically sortable representations (ordered according to compareTo(String)) of numeric data types for other uses (e.g. sorting).
NOTE: This API is experimental and might change in incompatible ways in the next release.
Nested Classes | |
---|---|
NumericUtils.IntRangeBuilder | Expert: Callback for splitIntRange(NumericUtils.IntRangeBuilder, int, int, int). |
NumericUtils.LongRangeBuilder | Expert: Callback for splitLongRange(NumericUtils.LongRangeBuilder, int, long, long). |
Constants | | |
---|---|---|
int | BUF_SIZE_INT | Expert: The maximum term length (used for char[] buffer size) for encoding int values. |
int | BUF_SIZE_LONG | Expert: The maximum term length (used for char[] buffer size) for encoding long values. |
int | PRECISION_STEP_DEFAULT | The default precision step used by NumericField, NumericTokenStream, NumericRangeQuery, and NumericRangeFilter. |
char | SHIFT_START_INT | Expert: Integers are stored at lower precision by shifting off lower bits. |
char | SHIFT_START_LONG | Expert: Longs are stored at lower precision by shifting off lower bits. |
Public Methods |
---|
Convenience method: this just returns: longToPrefixCoded(doubleToSortableLong(val)) |
Converts a double value to a sortable signed long. |
Convenience method: this just returns: intToPrefixCoded(floatToSortableInt(val)) |
Converts a float value to a sortable signed int. |
Expert: Returns prefix coded bits after reducing the precision by shift bits. |
Expert: Returns prefix coded bits after reducing the precision by shift bits. |
This is a convenience method that returns prefix coded bits of an int without reducing the precision. |
Expert: Returns prefix coded bits after reducing the precision by shift bits. |
This is a convenience method that returns prefix coded bits of a long without reducing the precision. |
Expert: Returns prefix coded bits after reducing the precision by shift bits. |
Convenience method: this just returns: sortableLongToDouble(prefixCodedToLong(val)) |
Convenience method: this just returns: sortableIntToFloat(prefixCodedToInt(val)) |
Returns an int from prefixCoded characters. |
Returns a long from prefixCoded characters. |
Converts a sortable int back to a float. |
Converts a sortable long back to a double. |
Expert: Splits an int range recursively. |
Expert: Splits a long range recursively. |
Inherited Methods |
---|
From class java.lang.Object |
Expert: The maximum term length (used for char[] buffer size) for encoding int values.
Expert: The maximum term length (used for char[] buffer size) for encoding long values.
The default precision step used by NumericField, NumericTokenStream, NumericRangeQuery, and NumericRangeFilter.
Expert: Integers are stored at lower precision by shifting off lower bits. The shift count is stored as SHIFT_START_INT+shift in the first character.
Expert: Longs are stored at lower precision by shifting off lower bits. The shift count is stored as SHIFT_START_LONG+shift in the first character.
Convenience method: this just returns: longToPrefixCoded(doubleToSortableLong(val))
Converts a double value to a sortable signed long.
The value is converted by taking its IEEE 754 floating-point "double format" bit layout and then swapping some bits so that the result can be compared as a long.
Precision is not reduced by this, and the value can easily be used as a long.
Convenience method: this just returns: intToPrefixCoded(floatToSortableInt(val))
Converts a float value to a sortable signed int.
The value is converted by taking its IEEE 754 floating-point "float format" bit layout and then swapping some bits so that the result can be compared as an int.
Precision is not reduced by this, and the value can easily be used as an int.
Expert: Returns prefix coded bits after reducing the precision by shift bits.
This method is used by NumericUtils.IntRangeBuilder.
val | the numeric value |
---|---|
shift | how many bits to strip from the right |
Expert: Returns prefix coded bits after reducing the precision by shift bits.
This method is used by NumericTokenStream.
val | the numeric value |
---|---|
shift | how many bits to strip from the right |
buffer | receives the encoded chars; its length must be at least BUF_SIZE_INT |
This is a convenience method that returns prefix coded bits of an int without reducing the precision. It can be used to store the full precision value as a stored field in the index.
To decode, use prefixCodedToInt(String).
Expert: Returns prefix coded bits after reducing the precision by shift bits.
This method is used by NumericUtils.LongRangeBuilder.
val | the numeric value |
---|---|
shift | how many bits to strip from the right |
This is a convenience method that returns prefix coded bits of a long without reducing the precision. It can be used to store the full precision value as a stored field in the index.
To decode, use prefixCodedToLong(String).
Expert: Returns prefix coded bits after reducing the precision by shift bits.
This method is used by NumericTokenStream.
val | the numeric value |
---|---|
shift | how many bits to strip from the right |
buffer | receives the encoded chars; its length must be at least BUF_SIZE_LONG |
Convenience method: this just returns: sortableLongToDouble(prefixCodedToLong(val))
Convenience method: this just returns: sortableIntToFloat(prefixCodedToInt(val))
Returns an int from prefixCoded characters. Rightmost bits will be zero for lower precision codes. This method can be used to decode e.g. a stored field.
NumberFormatException | if the supplied string is not correctly prefix encoded. |
---|
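The decoding direction can be sketched as below. As with any sketch here, the class name, the method name, and the prefix offset `0x60` are assumptions for illustration (the real offset is SHIFT_START_INT), and the inputs are built by hand so the example stays self-contained.

```java
/** Illustrative sketch only; not the Lucene source. */
class PrefixDecodingSketch {
    // Assumed value for illustration; the real constant is NumericUtils.SHIFT_START_INT.
    static final char SHIFT_START = 0x60;

    /** Rebuilds the int from a prefix-coded term; the dropped low bits come back as zero. */
    static int decode(String prefixCoded) {
        if (prefixCoded.isEmpty()) {
            throw new NumberFormatException("empty term");
        }
        int shift = prefixCoded.charAt(0) - SHIFT_START;
        if (shift < 0 || shift > 31) {
            throw new NumberFormatException("invalid shift prefix: " + shift);
        }
        int bits = 0;
        for (int i = 1; i < prefixCoded.length(); i++) {
            char ch = prefixCoded.charAt(i);
            if (ch > 0x7f) {
                throw new NumberFormatException("char outside the 7-bit range at index " + i);
            }
            bits = (bits << 7) | ch;  // append the next 7-bit group
        }
        // Restore the dropped positions, then flip the sign bit back.
        return (bits << shift) ^ 0x80000000;
    }

    public static void main(String[] args) {
        String full = new String(new char[] {0x60, 0x08, 0, 0, 0, 0});
        String coarse = new String(new char[] {0x64, 0x40, 0, 0, 1});
        System.out.println(decode(full));    // 0
        System.out.println(decode(coarse));  // 16: the 4 lowest bits are zero
    }
}
```

The second term was encoded with a shift of 4, so it stands for every value from 16 to 31, and decoding yields the lowest of them.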
Returns a long from prefixCoded characters. Rightmost bits will be zero for lower precision codes. This method can be used to decode e.g. a stored field.
NumberFormatException | if the supplied string is not correctly prefix encoded. |
---|
Converts a sortable int back to a float.
Converts a sortable long back to a double.
Expert: Splits an int range recursively.
You may implement a builder that adds clauses to a BooleanQuery for each call to its addRange(String, String) method.
This method is used by NumericRangeQuery.
Expert: Splits a long range recursively.
You may implement a builder that adds clauses to a BooleanQuery for each call to its addRange(String, String) method.
This method is used by NumericRangeQuery.
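The splitting idea can be sketched in plain Java as follows: the edges of the range are matched at full precision, and the middle is matched at progressively lower precision. This is a simplified illustration under stated assumptions, not the Lucene source. The class and method names are hypothetical, and each sub-range is collected as a numeric triple instead of being reported to a builder as a pair of prefix-coded strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; names are hypothetical. */
class RangeSplitSketch {
    /**
     * Splits [min, max] into sub-ranges. Each result is {lower, upper, shift}:
     * all values from lower to upper, matched with the `shift` lowest bits ignored.
     */
    static List<long[]> split(int precisionStep, long min, long max) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        List<long[]> out = new ArrayList<>();
        if (min > max) {
            return out;
        }
        for (int shift = 0; ; shift += precisionStep) {
            long diff = 1L << (shift + precisionStep);
            long mask = ((1L << precisionStep) - 1L) << shift;
            boolean hasLower = (min & mask) != 0L;   // lower edge not aligned to the next level
            boolean hasUpper = (max & mask) != mask; // upper edge not aligned to the next level
            long nextMin = (hasLower ? min + diff : min) & ~mask;
            long nextMax = (hasUpper ? max - diff : max) & ~mask;
            boolean wrapped = nextMin < min || nextMax > max;
            if (shift + precisionStep >= 64 || nextMin > nextMax || wrapped) {
                // No lower precision is usable: emit what is left and stop.
                add(out, min, max, shift);
                break;
            }
            if (hasLower) add(out, min, min | mask, shift);
            if (hasUpper) add(out, max & ~mask, max, shift);
            min = nextMin;  // continue with the inner part at the next precision level
            max = nextMax;
        }
        return out;
    }

    private static void add(List<long[]> out, long lower, long upper, int shift) {
        // Widen the upper bound to cover every value sharing its prefix.
        out.add(new long[] {lower, upper | ((1L << shift) - 1L), shift});
    }

    public static void main(String[] args) {
        for (long[] r : split(4, 3L, 300L)) {
            System.out.println(r[0] + ".." + r[1] + " at shift " + r[2]);
        }
    }
}
```

For the range 3 to 300 with a step of 4, this yields 3..15 and 288..300 at shift 0 and 16..287 at shift 4, so the middle part is covered by 17 lower-precision terms instead of 272 full-precision ones.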