java.lang.Object |
↳ | org.apache.lucene.util.AttributeSource |
↳ | org.apache.lucene.analysis.TokenStream |
↳ | org.apache.lucene.analysis.Tokenizer |
↳ | org.apache.lucene.analysis.CharTokenizer |
↳ | org.apache.lucene.analysis.LetterTokenizer |
A LetterTokenizer is a tokenizer that divides text at non-letters. That is, it defines tokens as maximal strings of adjacent letters, where a letter is any character for which the java.lang.Character.isLetter() predicate returns true. Note: this works reasonably well for most European languages, but poorly for some Asian languages, where words are not separated by spaces.
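The splitting rule described above can be sketched in plain Java without Lucene on the classpath. This is only an illustration of "maximal strings of adjacent letters", not Lucene's implementation; the class `LetterSplitSketch` and the method `splitAtNonLetters` are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: illustrates the splitting rule only.
class LetterSplitSketch {

    // Returns the maximal runs of adjacent letters in the input, in order.
    static List<String> splitAtNonLetters(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                // Letters extend the token being collected.
                current.append(c);
            } else if (current.length() > 0) {
                // Any non-letter ends the current token (if one is open).
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        // Flush a token that runs to the end of the input.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [It, s, o, clock, e, mail, me]
        System.out.println(splitAtNonLetters("It's 2 o'clock, e-mail me!"));
    }
}
```

Apostrophes, digits, and hyphens all count as non-letters, so "It's" and "e-mail" are each split in two and "2" is dropped. Text written without spaces between words, where every character is a letter, comes back as a single token under this rule, which is the limitation the note above refers to.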
Public Constructors |
---|
Construct a new LetterTokenizer. |
Construct a new LetterTokenizer using a given AttributeSource. |
Construct a new LetterTokenizer using a given AttributeSource.AttributeFactory. |
Protected Methods |
---|
Collects only characters which satisfy Character.isLetter(char). |
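The protected method above is a per-character hook: the parent class drives the scan and the subclass only decides which characters belong in a token. A minimal sketch of that pattern follows, assuming a simplified string-based design. `CharRunSplitter`, `LetterRunSplitter`, `accept`, and `split` are hypothetical names for illustration and are not Lucene's classes or method signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class (not Lucene's CharTokenizer): scans text and
// asks a subclass hook whether each character belongs inside a token.
abstract class CharRunSplitter {

    // Hook: return true if c should be collected into the current token.
    protected abstract boolean accept(char c);

    // Collects maximal runs of accepted characters.
    final List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // index where the open token began, or -1 if none
        for (int i = 0; i < text.length(); i++) {
            if (accept(text.charAt(i))) {
                if (start < 0) {
                    start = i;
                }
            } else if (start >= 0) {
                tokens.add(text.substring(start, i));
                start = -1;
            }
        }
        if (start >= 0) {
            tokens.add(text.substring(start));
        }
        return tokens;
    }
}

// Hypothetical subclass: collects only characters that satisfy
// Character.isLetter(char).
class LetterRunSplitter extends CharRunSplitter {
    @Override
    protected boolean accept(char c) {
        return Character.isLetter(c);
    }
}
```

With this split of responsibilities, a different tokenization rule only needs a different predicate in the subclass; the scanning loop stays in one place.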
Construct a new LetterTokenizer using a given AttributeSource.
Construct a new LetterTokenizer using a given AttributeSource.AttributeFactory.