public final class BytesToNameCanonicalizer
extends Object
java.lang.Object
   ↳ com.fasterxml.jackson.core.sym.BytesToNameCanonicalizer

Class Overview

A caching symbol table implementation used for canonicalizing JSON field names (as Names which are constructed directly from a byte-based input source). Complications arise from trying to reuse and merge symbol tables efficiently, so that subsequent parsing runs can benefit from a vocabulary that is usually shared between them.

Summary

Constants
int DEFAULT_TABLE_SIZE
int MAX_TABLE_SIZE Let's not expand symbol tables past some maximum size; this should protect against OOMEs caused by large documents with unique (~= random) names.
Fields
protected int _collCount Total number of Names in collision buckets (included in _count along with primary entries)
protected int _collEnd Index of the first unused collision bucket entry (== size of the used portion of collision list): less than or equal to 0xFF (255), since max number of entries is 255 (8-bit, minus 0 used as 'empty' marker)
protected Bucket[] _collList Array of heads of collision bucket chains; sized dynamically
protected int _count Total number of Names in the symbol table; only used for child tables.
protected final boolean _intern Whether canonical symbol Strings are to be intern()ed before added to the table or not
protected int _longestCollisionList We need to keep track of the longest collision list; this is needed both to indicate problems with attacks and to allow flushing for other cases.
protected int[] _mainHash Array of 2^N size, which contains combination of 24-bits of hash (0 to indicate 'empty' slot), and 8-bit collision bucket index (0 to indicate empty collision bucket chain; otherwise subtract one from index)
protected int _mainHashMask Mask used to truncate 32-bit hash value to current hash array size; essentially, hash array size - 1 (since hash array sizes are 2^N).
protected Name[] _mainNames Array that contains Name instances matching entries in _mainHash.
protected final BytesToNameCanonicalizer _parent Reference to the root symbol table, for child tables, so that they can merge table information back as necessary.
protected final AtomicReference<BytesToNameCanonicalizer.TableInfo> _tableInfo Member that is only used by the root table instance: root passes immutable state into child instances, and children may return new state if they add entries to the table.
Public Methods
Name addName(String symbolStr, int[] quads, int qlen)
Name addName(String symbolStr, int q1, int q2)
int bucketCount()
final int calcHash(int firstQuad, int secondQuad)
final int calcHash(int[] quads, int qlen)
final int calcHash(int firstQuad)
int collisionCount()
Method mostly needed by unit tests; calculates number of entries that are in collision list.
static BytesToNameCanonicalizer createRoot()
Factory method to call to create a symbol table instance with a randomized seed value.
Name findName(int firstQuad)
Finds and returns name matching the specified symbol, if such name already exists in the table.
Name findName(int firstQuad, int secondQuad)
Finds and returns name matching the specified symbol, if such name already exists in the table.
Name findName(int[] quads, int qlen)
Finds and returns name matching the specified symbol, if such name already exists in the table; or if not, creates name object, adds to the table, and returns it.
static Name getEmptyName()
int hashSeed()
BytesToNameCanonicalizer makeChild(boolean canonicalize, boolean intern)
Factory method used to create actual symbol table instance to use for parsing.
int maxCollisionLength()
Method mostly needed by unit tests; calculates length of the longest collision chain.
boolean maybeDirty()
Method called to quickly check whether a child symbol table may have gotten additional entries.
void release()
Method called by the using code to indicate it is done with this instance.
int size()
Protected Methods
static int[] calcQuads(byte[] wordBytes)
static BytesToNameCanonicalizer createRoot(int hashSeed)
Factory method that should only be called from unit tests, where seed value should remain the same.
void reportTooManyCollisions(int maxLen)
Inherited Methods
From class java.lang.Object

Constants

protected static final int DEFAULT_TABLE_SIZE

Constant Value: 64 (0x00000040)

protected static final int MAX_TABLE_SIZE

Let's not expand symbol tables past some maximum size; this should protect against OOMEs caused by large documents with unique (~= random) names.

Constant Value: 65536 (0x00010000)

Fields

protected int _collCount

Total number of Names in collision buckets (included in _count along with primary entries)

protected int _collEnd

Index of the first unused collision bucket entry (== size of the used portion of collision list): less than or equal to 0xFF (255), since max number of entries is 255 (8-bit, minus 0 used as 'empty' marker)

protected Bucket[] _collList

Array of heads of collision bucket chains; sized dynamically

protected int _count

Total number of Names in the symbol table; only used for child tables.

protected final boolean _intern

Whether canonical symbol Strings are to be intern()ed before added to the table or not

protected int _longestCollisionList

We need to keep track of the longest collision list; this is needed both to indicate problems with attacks and to allow flushing for other cases.

protected int[] _mainHash

Array of 2^N size, which contains combination of 24-bits of hash (0 to indicate 'empty' slot), and 8-bit collision bucket index (0 to indicate empty collision bucket chain; otherwise subtract one from index)
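The slot layout described above can be illustrated with a small, stdlib-only sketch. It assumes the 24 hash bits sit in the upper bits of each int and the collision bucket index in the low 8 bits; the description does not state the bit order, so treat that as an assumption. The class and method names (`SlotLayout`, `pack`, `hashMatches`, `bucketIndex`) are hypothetical and are not part of this class.

```java
public class SlotLayout {
    /** Packs the low 24 bits of a hash with a 1-based bucket index (0 = no collision chain). */
    public static int pack(int hash, int bucketIndexPlusOne) {
        if (bucketIndexPlusOne < 0 || bucketIndexPlusOne > 0xFF) {
            throw new IllegalArgumentException("bucket index must fit in 8 bits");
        }
        // Assumed layout: hash bits in the upper 24 bits, bucket index in the low 8 bits.
        return (hash << 8) | bucketIndexPlusOne;
    }

    /** True if the 24 hash bits stored in the slot equal the low 24 bits of the given hash. */
    public static boolean hashMatches(int slot, int hash) {
        return (((slot >> 8) ^ hash) << 8) == 0;
    }

    /** Returns the 0-based collision bucket index, or -1 if the slot has no collision chain. */
    public static int bucketIndex(int slot) {
        return (slot & 0xFF) - 1;
    }

    public static void main(String[] args) {
        int slot = pack(0x12ABCDEF, 3);
        System.out.println(hashMatches(slot, 0x12ABCDEF)); // prints true
        System.out.println(bucketIndex(slot));             // prints 2
    }
}
```

Storing the index as "index plus one" is what lets 0 double as the "no chain" marker, at the cost of limiting the collision list to 255 entries.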

protected int _mainHashMask

Mask used to truncate 32-bit hash value to current hash array size; essentially, hash array size - 1 (since hash array sizes are 2^N).
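As a minimal illustration of why the mask equals the array size minus one, the sketch below truncates a 32-bit hash to an index for a power-of-two table. The names `HashMaskDemo`, `maskFor` and `indexFor` are hypothetical helpers, not members of this class.

```java
public class HashMaskDemo {
    /** Returns the mask for a power-of-two table size; size - 1 has all low bits set. */
    public static int maskFor(int tableSize) {
        if (tableSize <= 0 || Integer.bitCount(tableSize) != 1) {
            throw new IllegalArgumentException("table size must be a power of two");
        }
        return tableSize - 1;
    }

    /** Truncates a 32-bit hash to an index in [0, tableSize), also for negative hashes. */
    public static int indexFor(int hash, int mask) {
        return hash & mask;
    }

    public static void main(String[] args) {
        int mask = maskFor(64); // 64 is the documented DEFAULT_TABLE_SIZE
        System.out.println(mask);                       // prints 63
        System.out.println(indexFor(0xCAFEBABE, mask)); // prints 62
    }
}
```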

protected Name[] _mainNames

Array that contains Name instances matching entries in _mainHash. Contains nulls for unused entries.

protected final BytesToNameCanonicalizer _parent

Reference to the root symbol table, for child tables, so that they can merge table information back as necessary.

protected final AtomicReference<BytesToNameCanonicalizer.TableInfo> _tableInfo

Member that is only used by the root table instance: root passes immutable state into child instances, and children may return new state if they add entries to the table. Child tables do NOT use the reference.

Public Methods

public Name addName (String symbolStr, int[] quads, int qlen)

public Name addName (String symbolStr, int q1, int q2)

public int bucketCount ()

public final int calcHash (int firstQuad, int secondQuad)

public final int calcHash (int[] quads, int qlen)

public final int calcHash (int firstQuad)

public int collisionCount ()

Method mostly needed by unit tests; calculates number of entries that are in collision list. Value can be at most (size() - 1), but should usually be much lower, ideally 0.

public static BytesToNameCanonicalizer createRoot ()

Factory method to call to create a symbol table instance with a randomized seed value.

public Name findName (int firstQuad)

Finds and returns name matching the specified symbol, if such name already exists in the table. If not, will return null.

Note: separate methods to optimize the common case of short element/attribute names (4 or fewer ASCII characters)

Parameters
firstQuad int32 containing the first 4 bytes of the name; if the whole name is less than 4 bytes, padded with zero bytes in front (zero MSBs, i.e. right-aligned)
Returns
  • Name matching the symbol passed, or null if no such name exists in the table

public Name findName (int firstQuad, int secondQuad)

Finds and returns name matching the specified symbol, if such name already exists in the table. If not, will return null.

Note: separate methods to optimize the common case of relatively short element/attribute names (8 or fewer ASCII characters)

Parameters
firstQuad int32 containing first 4 bytes of the name.
secondQuad int32 containing bytes 5 through 8 of the name; if less than 8 bytes, padded with up to 3 zero bytes in front (zero MSBs, ie. right aligned)
Returns
  • Name matching the symbol passed, or null if no such name exists in the table

public Name findName (int[] quads, int qlen)

Finds and returns name matching the specified symbol, if such name already exists in the table; or if not, creates name object, adds to the table, and returns it.

Note: this is the general purpose method that can be called for names of any length. However, if name is less than 9 bytes long, it is preferable to call the version optimized for short names.

Parameters
quads Array of int32s, each of which contains 4 bytes of the encoded name
qlen Number of int32s, starting from index 0, in quads parameter
Returns
  • Name matching the symbol passed (or constructed for it)
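The quad encoding that the findName parameters refer to can be sketched as follows. This is an illustration only: it assumes the first byte of each group of four lands in the most significant position, which is consistent with the "right aligned, zero MSBs" wording above but is not stated outright. `QuadPacking` and `toQuads` are hypothetical names; this is not the class's own calcQuads implementation.

```java
import java.nio.charset.StandardCharsets;

public class QuadPacking {
    /** Packs bytes into int32 quads, 4 bytes each; a short final quad is right-aligned. */
    public static int[] toQuads(byte[] bytes) {
        int[] quads = new int[(bytes.length + 3) / 4];
        for (int i = 0; i < bytes.length; i++) {
            // Shift the bytes already in this quad left, then append the next byte.
            quads[i >> 2] = (quads[i >> 2] << 8) | (bytes[i] & 0xFF);
        }
        return quads;
    }

    public static void main(String[] args) {
        int[] full = toQuads("name".getBytes(StandardCharsets.UTF_8));
        System.out.println(Integer.toHexString(full[0]));  // prints 6e616d65
        int[] partial = toQuads("id".getBytes(StandardCharsets.UTF_8));
        System.out.println(Integer.toHexString(partial[0])); // prints 6964
    }
}
```

Under this assumption, a name of up to 4 bytes fits the single-quad lookup, a name of up to 8 bytes fits the two-quad lookup, and anything longer needs the array form together with qlen.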

public static Name getEmptyName ()

public int hashSeed ()

public BytesToNameCanonicalizer makeChild (boolean canonicalize, boolean intern)

Factory method used to create actual symbol table instance to use for parsing.

Parameters
intern Whether canonical symbol Strings should be interned or not

public int maxCollisionLength ()

Method mostly needed by unit tests; calculates length of the longest collision chain. This should typically be a low number, but may be up to size() - 1 in the pathological case.

public boolean maybeDirty ()

Method called to quickly check whether a child symbol table may have gotten additional entries. Used for checking whether a child table should be merged into the shared table.

public void release ()

Method called by the using code to indicate it is done with this instance. This lets the instance merge accumulated changes into its parent (if need be), safely and efficiently, without the calling code having to know about the parent.
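The root/child hand-off described for _tableInfo, makeChild and release can be pictured with a toy, stdlib-only model. It is a sketch under stated assumptions, not this class's implementation: it tracks plain strings instead of Names, and it assumes a released child replaces the shared state only when it holds more entries than the current snapshot. `SharedTableSketch`, `add` and `sharedSize` are hypothetical names.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

public class SharedTableSketch {
    // Root-only member: the current immutable snapshot of shared names (null in children).
    private final AtomicReference<Set<String>> rootState;
    private final SharedTableSketch parent; // null for the root
    private Set<String> names;              // shared snapshot until the first local add
    private boolean dirty;

    private SharedTableSketch(SharedTableSketch parent, Set<String> initial) {
        this.parent = parent;
        this.names = initial;
        this.rootState = (parent == null) ? new AtomicReference<>(initial) : null;
    }

    public static SharedTableSketch createRoot() {
        return new SharedTableSketch(null, Collections.<String>emptySet());
    }

    /** Children start from the root's current snapshot; only the root may create them. */
    public SharedTableSketch makeChild() {
        if (parent != null) {
            throw new IllegalStateException("only the root creates children");
        }
        return new SharedTableSketch(this, rootState.get());
    }

    public void add(String name) {
        if (names.contains(name)) {
            return;
        }
        if (!dirty) { // copy on first write so the shared snapshot is never mutated
            names = new HashSet<>(names);
            dirty = true;
        }
        names.add(name);
    }

    /** Hands local additions back to the root; losing a race simply drops this merge. */
    public void release() {
        if (parent == null || !dirty) {
            return;
        }
        Set<String> current = parent.rootState.get();
        if (names.size() > current.size()) { // assumed merge policy, see lead-in
            parent.rootState.compareAndSet(current, Collections.unmodifiableSet(names));
        }
        dirty = false;
    }

    public int sharedSize() {
        return (parent == null ? rootState : parent.rootState).get().size();
    }
}
```

In this model, a parser would take a child from the root, add names while parsing, and call release() when done; a later child then starts with the merged vocabulary.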

public int size ()

Protected Methods

protected static int[] calcQuads (byte[] wordBytes)

protected static BytesToNameCanonicalizer createRoot (int hashSeed)

Factory method that should only be called from unit tests, where seed value should remain the same.

protected void reportTooManyCollisions (int maxLen)