java.lang.Object
↳ org.apache.lucene.util.AttributeSource
↳ org.apache.lucene.analysis.TokenStream
↳ org.apache.lucene.analysis.Tokenizer
↳ org.apache.lucene.analysis.CharTokenizer
↳ org.apache.lucene.analysis.LetterTokenizer
Known Direct Subclasses
A LetterTokenizer is a tokenizer that divides text at non-letters. That is, it defines tokens as maximal strings of adjacent letters, where a letter is any character accepted by the java.lang.Character.isLetter() predicate. Note: this works reasonably well for most European languages but poorly for some Asian languages, in which words are not separated by spaces.
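
The splitting rule described above can be illustrated without Lucene on the classpath. The following is a minimal sketch in plain Java, not Lucene's implementation: the class `LetterRunSketch` and its method `letterRuns` are hypothetical names introduced here, and the sketch ignores buffering, offsets, attributes, and any token-length limit the real tokenizer may apply.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: it only emulates the splitting rule.
final class LetterRunSketch {

    // Returns the maximal runs of adjacent letters found in the input.
    static List<String> letterRuns(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                // A letter extends the token being collected.
                current.append(c);
            } else if (current.length() > 0) {
                // A non-letter ends the current token, if there is one.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            // Flush the final token when the text ends on a letter.
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Apostrophes and digits are not letters, so they act as separators.
        System.out.println(letterRuns("Don't panic: it's 42!"));
        // Four CJK ideographs with no spaces: all are letters, so one token results.
        System.out.println(letterRuns("\u4e2d\u6587\u5206\u8bcd").size());
    }
}
```

Running `main` prints `[Don, t, panic, it, s]` and then `1`. The second result shows why the rule suits space-separated languages better: an unspaced run of ideographs is never divided into words.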
| Inherited Fields |
|---|
| From class org.apache.lucene.analysis.Tokenizer |

| Public Constructors |
|---|
| Construct a new LetterTokenizer. |
| Construct a new LetterTokenizer using a given AttributeSource. |
| Construct a new LetterTokenizer using a given AttributeSource.AttributeFactory. |

| Protected Methods |
|---|
| Collects only characters which satisfy isLetter(char). |

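
The protected method acts as a hook: the base class walks the input, and the subclass only decides which characters belong inside a token. The sketch below shows that division of labour in plain Java under stated assumptions. `PredicateTokenizerSketch`, `LetterOnlySketch`, and `LetterOrDigitSketch` are hypothetical classes, not Lucene code, and the hook name `isTokenChar` is an assumption as well, since the method table on this page omits the name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a CharTokenizer-style base class (not Lucene code).
abstract class PredicateTokenizerSketch {

    // Hook (name assumed): subclasses decide which characters belong in a token.
    protected abstract boolean isTokenChar(char c);

    // Collects maximal runs of characters accepted by the hook.
    final List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // index where the current token began, or -1 if none
        for (int i = 0; i < text.length(); i++) {
            if (isTokenChar(text.charAt(i))) {
                if (start < 0) {
                    start = i;
                }
            } else if (start >= 0) {
                tokens.add(text.substring(start, i));
                start = -1;
            }
        }
        if (start >= 0) {
            tokens.add(text.substring(start));
        }
        return tokens;
    }
}

// Mirrors the rule documented above: only letters are collected.
final class LetterOnlySketch extends PredicateTokenizerSketch {
    @Override
    protected boolean isTokenChar(char c) {
        return Character.isLetter(c);
    }
}

// A variation for comparison only: digits are kept inside tokens too.
final class LetterOrDigitSketch extends PredicateTokenizerSketch {
    @Override
    protected boolean isTokenChar(char c) {
        return Character.isLetterOrDigit(c);
    }
}
```

With these classes, `new LetterOnlySketch().tokenize("lucene3 rocks")` yields `[lucene, rocks]`, while `new LetterOrDigitSketch().tokenize("lucene3 rocks")` yields `[lucene3, rocks]`. Changing the predicate is the only difference between the two.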
| Inherited Methods |
|---|
| From class org.apache.lucene.analysis.CharTokenizer |
| From class org.apache.lucene.analysis.Tokenizer |
| From class org.apache.lucene.analysis.TokenStream |
| From class org.apache.lucene.util.AttributeSource |
| From class java.lang.Object |
| From interface java.io.Closeable |

Construct a new LetterTokenizer using a given AttributeSource.

Construct a new LetterTokenizer using a given AttributeSource.AttributeFactory.