java.lang.Object | |
↳ | com.fasterxml.jackson.core.sym.BytesToNameCanonicalizer |
A caching symbol table implementation used for canonicalizing JSON field
names (as Names, which are constructed directly from a byte-based
input source).
Complications arise from trying to reuse and merge symbol tables efficiently,
so that subsequent parsing runs can take advantage of the vocabulary they
usually share.
Constants | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
int | DEFAULT_TABLE_SIZE | ||||||||||
int | MAX_TABLE_SIZE | Let's not expand symbol tables past some maximum size; this should protect against OOMEs caused by large documents with unique (~= random) names. |
Fields | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
_collCount | Total number of Names in collision buckets (included in _count along with primary entries) | ||||||||||
_collEnd | Index of the first unused collision bucket entry (== size of the used portion of collision list): less than or equal to 0xFF (255), since max number of entries is 255 (8-bit, minus 0 used as 'empty' marker) | ||||||||||
_collList | Array of heads of collision bucket chains; sized dynamically | ||||||||||
_count | Total number of Names in the symbol table; only used for child tables. | ||||||||||
_intern | Whether canonical symbol Strings are to be intern()ed before being added to the table or not | ||||||||||
_longestCollisionList | We need to keep track of the longest collision list; this is needed both to indicate problems with attacks and to allow flushing for other cases. | ||||||||||
_mainHash | Array of 2^N size, which contains a combination of 24 bits of hash (0 to indicate 'empty' slot) and an 8-bit collision bucket index (0 to indicate empty collision bucket chain; otherwise subtract one from index) | ||||||||||
_mainHashMask | Mask used to truncate 32-bit hash value to current hash array size; essentially, hash array size - 1 (since hash array sizes are 2^N). | ||||||||||
_mainNames | Array that contains Name instances matching entries in _mainHash. | ||||||||||
_parent | Reference to the root symbol table, for child tables, so that they can merge table information back as necessary. | ||||||||||
_tableInfo | Member that is only used by the root table instance: root passes immutable state into child instances, and children may return new state if they add entries to the table. |
Public Methods | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Method mostly needed by unit tests; calculates number of
entries that are in collision list.
| |||||||||||
Factory method to call to create a symbol table instance with a
randomized seed value.
| |||||||||||
Finds and returns name matching the specified symbol, if such
name already exists in the table.
| |||||||||||
Finds and returns name matching the specified symbol, if such
name already exists in the table.
| |||||||||||
Finds and returns name matching the specified symbol, if such
name already exists in the table; or if not, creates name object,
adds to the table, and returns it.
| |||||||||||
Factory method used to create actual symbol table instance to
use for parsing.
| |||||||||||
Method mostly needed by unit tests; calculates length of the
longest collision chain.
| |||||||||||
Method called to quickly check whether a child symbol table
may have gotten additional entries.
| |||||||||||
Method called by the using code to indicate it is done
with this instance.
| |||||||||||
Protected Methods | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Factory method that should only be called from unit tests, where seed
value should remain the same.
| |||||||||||
Inherited Methods | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
From class
java.lang.Object
|
Let's not expand symbol tables past some maximum size; this should protect against OOMEs caused by large documents with unique (~= random) names.
Total number of Names in collision buckets (included in _count along with primary entries)
Index of the first unused collision bucket entry (== size of the used portion of collision list): less than or equal to 0xFF (255), since max number of entries is 255 (8-bit, minus 0 used as 'empty' marker)
Array of heads of collision bucket chains; sized dynamically
Total number of Names in the symbol table; only used for child tables.
Whether canonical symbol Strings are to be intern()ed before being added to the table or not
We need to keep track of the longest collision list; this is needed both to indicate problems with attacks and to allow flushing for other cases.
Array of 2^N size, which contains a combination of 24 bits of hash (0 to indicate 'empty' slot) and an 8-bit collision bucket index (0 to indicate empty collision bucket chain; otherwise subtract one from index)
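The slot encoding described above can be sketched as follows. This is a minimal illustration, not the class's actual code: the class name MainHashSlot and its methods are hypothetical, and placing the hash in the upper 24 bits with the bucket index in the low 8 bits is an assumption, since the description does not say which bits hold which part.

```java
// Hypothetical helper illustrating one possible layout of a _mainHash slot.
// Assumption: hash occupies the upper 24 bits, bucket index + 1 the low 8 bits.
final class MainHashSlot {
    private MainHashSlot() { }

    /**
     * @param hash24      24-bit hash value (0 would mean 'empty' slot)
     * @param bucketIndex index into the collision list, or -1 for "no chain"
     */
    static int pack(int hash24, int bucketIndex) {
        if (bucketIndex < -1 || bucketIndex > 254) {
            throw new IllegalArgumentException("bucket index out of range: " + bucketIndex);
        }
        // Stored index is offset by one so that 0 can mean "empty chain"
        return ((hash24 & 0xFFFFFF) << 8) | (bucketIndex + 1);
    }

    /** Extracts the 24-bit hash part; 0 indicates an empty slot. */
    static int hashPart(int slot) {
        return slot >>> 8;
    }

    /** Extracts the collision bucket index; -1 indicates no collision chain. */
    static int bucketIndex(int slot) {
        return (slot & 0xFF) - 1;
    }
}
```

Reserving 0 as the 'empty' marker in the 8-bit part is what limits the collision list to 255 usable entries.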
Mask used to truncate 32-bit hash value to current hash array size; essentially, hash array size - 1 (since hash array sizes are 2^N).
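As an illustration of why the mask equals the array size minus one, here is a small sketch. The class and method names are hypothetical and are not part of this class.

```java
// Hypothetical helper: deriving a mask from a power-of-two table size
// and using it to truncate a 32-bit hash into a valid array index.
final class HashMaskDemo {
    private HashMaskDemo() { }

    /** Returns tableSize - 1; only meaningful when tableSize is 2^N. */
    static int maskFor(int tableSize) {
        if (tableSize <= 0 || Integer.bitCount(tableSize) != 1) {
            throw new IllegalArgumentException("table size must be a power of two: " + tableSize);
        }
        return tableSize - 1;
    }

    /** Truncates the hash to an index in [0, tableSize - 1], including for negative hashes. */
    static int indexFor(int hash, int mask) {
        return hash & mask;
    }
}
```

Because the size is 2^N, the mask has exactly N low bits set, so the bitwise AND acts as a cheap modulo that never yields a negative index.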
Array that contains Name instances matching entries in _mainHash. Contains nulls for unused entries.
Reference to the root symbol table, for child tables, so that they can merge table information back as necessary.
Member that is only used by the root table instance: root passes immutable state into child instances, and children may return new state if they add entries to the table. Child tables do NOT use the reference.
Method mostly needed by unit tests; calculates number of entries that are in collision list. Value can be at most (size() - 1), but should usually be much lower, ideally 0.
Factory method to call to create a symbol table instance with a randomized seed value.
Finds and returns the name matching the specified symbol, if such a name already exists in the table. If not, returns null.
Note: separate methods exist to optimize the common case of short element/attribute names (4 or fewer ASCII characters).
firstQuad | int32 containing first 4 bytes of the name; if the whole name is less than 4 bytes, padded with zero bytes in front (zero MSBs, i.e. right-aligned) |
---|---|
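The padding rule for firstQuad can be illustrated with a short sketch. The helper QuadPacking.packQuad is hypothetical, and the byte order within the quad (first byte of the name in the most significant used position) is an assumption consistent with the "right-aligned" wording above.

```java
// Hypothetical helper: packs 1 to 4 name bytes into a single int32,
// right-aligned, so that shorter names leave the most significant bytes zero.
final class QuadPacking {
    private QuadPacking() { }

    static int packQuad(byte[] bytes, int offset, int len) {
        if (len < 1 || len > 4) {
            throw new IllegalArgumentException("a quad holds 1 to 4 bytes, got " + len);
        }
        int quad = 0;
        for (int i = 0; i < len; i++) {
            // Shift previous bytes up and append the next one as an unsigned value
            quad = (quad << 8) | (bytes[offset + i] & 0xFF);
        }
        return quad;
    }
}
```

Under these assumptions the two-byte name "id" packs to 0x00006964, and the four-byte name "name" packs to 0x6E616D65.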
Finds and returns name matching the specified symbol, if such name already exists in the table. If not, will return null.
Note: separate methods exist to optimize the common case of relatively short element/attribute names (8 or fewer ASCII characters).
firstQuad | int32 containing first 4 bytes of the name. |
---|---|
secondQuad | int32 containing bytes 5 through 8 of the name; if less than 8 bytes, padded with up to 3 zero bytes in front (zero MSBs, ie. right aligned) |
Finds and returns name matching the specified symbol, if such name already exists in the table; or if not, creates name object, adds to the table, and returns it.
Note: this is the general purpose method that can be called for names of any length. However, if name is less than 9 bytes long, it is preferable to call the version optimized for short names.
quads | Array of int32s, each of which contains 4 bytes of encoded name |
---|---|
qlen | Number of int32s, starting from index 0, in quads parameter |
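For names of arbitrary length, the quads/qlen pair could be produced along these lines. This is a sketch under assumptions: the helper QuadArrays.toQuads is hypothetical, and it assumes that only the last quad is partially filled and that it is right-aligned in the same way as in the short-name variants.

```java
// Hypothetical helper: encodes a byte sequence as an array of int32 quads,
// 4 bytes per quad, with a possibly shorter (right-aligned) final quad.
final class QuadArrays {
    private QuadArrays() { }

    static int[] toQuads(byte[] name) {
        int qlen = (name.length + 3) >> 2; // number of quads, rounded up
        int[] quads = new int[qlen];
        for (int i = 0; i < name.length; i++) {
            int qi = i >> 2; // index of the quad this byte belongs to
            quads[qi] = (quads[qi] << 8) | (name[i] & 0xFF);
        }
        return quads;
    }
}
```

Here the returned array's length plays the role of qlen. A caller that reuses a larger scratch array would instead pass the count of filled quads separately, which is presumably why qlen is its own parameter.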
Factory method used to create actual symbol table instance to use for parsing.
intern | Whether canonical symbol Strings should be interned or not |
---|---|
Method mostly needed by unit tests; calculates length of the longest collision chain. This should typically be a low number, but may be up to size() - 1 in the pathological case.
Method called to quickly check whether a child symbol table may have gotten additional entries. Used for checking whether a child table should be merged into the shared table.
Method called by the using code to indicate it is done with this instance. This lets the instance merge accumulated changes into the parent (if need be), safely and efficiently, and without the calling code having to know about parent information.
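The root/child lifecycle described in this section can be modeled with a deliberately simplified sketch. Everything here is hypothetical: ToySymbolTable is not this class, it uses plain Strings and a HashMap instead of Names and hash arrays, and the merge policy (replace the shared state only if the child holds more entries) is an assumption made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Toy model of a root table that hands immutable state to children
// and accepts new state back when a dirty child is released.
final class ToySymbolTable {
    private final ToySymbolTable parent;                          // null for the root
    private final AtomicReference<Map<String, String>> tableInfo; // used by the root only
    private Map<String, String> names;                            // child's current view
    private boolean dirty;                                        // true once a child has added entries

    private ToySymbolTable(ToySymbolTable parent, Map<String, String> snapshot) {
        this.parent = parent;
        this.tableInfo = (parent == null) ? new AtomicReference<>(snapshot) : null;
        this.names = snapshot;
    }

    static ToySymbolTable createRoot() {
        return new ToySymbolTable(null, Collections.<String, String>emptyMap());
    }

    ToySymbolTable makeChild() {
        if (parent != null) throw new IllegalStateException("only the root creates children");
        return new ToySymbolTable(this, tableInfo.get());
    }

    String findOrAdd(String name) {
        String existing = names.get(name);
        if (existing != null) return existing;
        if (parent == null) throw new IllegalStateException("root is not used for parsing");
        if (!dirty) {                      // copy-on-write: never mutate shared state
            names = new HashMap<>(names);
            dirty = true;
        }
        names.put(name, name);
        return name;
    }

    boolean maybeDirty() { return dirty; }

    void release() {
        if (parent != null && dirty) {
            parent.mergeChild(Collections.unmodifiableMap(names));
            dirty = false;                 // a later add will copy again
        }
    }

    private void mergeChild(Map<String, String> childNames) {
        Map<String, String> current = tableInfo.get();
        if (childNames.size() > current.size()) {
            tableInfo.compareAndSet(current, childNames); // losing a race just drops this merge
        }
    }

    int size() { return (parent == null) ? tableInfo.get().size() : names.size(); }
}
```

In this model a parser would call makeChild() on the shared root, look names up while parsing, and call release() when done; children created afterwards start from the merged vocabulary.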
Factory method that should only be called from unit tests, where seed value should remain the same.