public class LetterTokenizer extends CharTokenizer
java.lang.Object
   ↳ org.apache.lucene.util.AttributeSource
     ↳ org.apache.lucene.analysis.TokenStream
       ↳ org.apache.lucene.analysis.Tokenizer
         ↳ org.apache.lucene.analysis.CharTokenizer
           ↳ org.apache.lucene.analysis.LetterTokenizer

Class Overview

A LetterTokenizer is a tokenizer that divides text at non-letters. That is, it defines tokens as maximal strings of adjacent letters, as defined by the java.lang.Character.isLetter() predicate. Note: this works reasonably well for most European languages, but poorly for some Asian languages, where words are not separated by spaces.
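
The splitting rule described above can be illustrated without Lucene. The following is a minimal sketch using only the JDK; the class LetterSplitSketch and its tokenize method are hypothetical names introduced for this example and are not part of the Lucene API. It is meant to show the "maximal runs of letters" rule and the limitation noted for text without spaces, not how LetterTokenizer is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Lucene class: mimics the rule
// "a token is a maximal run of adjacent letters".
final class LetterSplitSketch {

    /** Returns the maximal runs of characters accepted by Character.isLetter. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(c);              // extend the current run
            } else if (current.length() > 0) {
                tokens.add(current.toString()); // a non-letter ends the run
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());     // flush the final run
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Punctuation, digits, and the apostrophe all act as separators.
        System.out.println(tokenize("Hello, world! It's 2024."));
        // Four Chinese characters written as escapes; every one is a letter,
        // so the whole phrase comes back as a single token.
        System.out.println(tokenize("\u6211\u7231\u5317\u4eac"));
    }
}
```

In this sketch the English sentence yields the tokens Hello, world, It, and s, while the unspaced Chinese phrase stays as one token, which is the behavior the note above warns about.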

Summary

Inherited Fields
From class org.apache.lucene.analysis.Tokenizer
Public Constructors
LetterTokenizer(Reader in)
Construct a new LetterTokenizer.
LetterTokenizer(AttributeSource source, Reader in)
Construct a new LetterTokenizer using a given AttributeSource.
LetterTokenizer(AttributeSource.AttributeFactory factory, Reader in)
Construct a new LetterTokenizer using a given AttributeSource.AttributeFactory.
Protected Methods
boolean isTokenChar(char c)
Collects only characters which satisfy Character.isLetter(char).
Inherited Methods
From class org.apache.lucene.analysis.CharTokenizer
From class org.apache.lucene.analysis.Tokenizer
From class org.apache.lucene.analysis.TokenStream
From class org.apache.lucene.util.AttributeSource
From class java.lang.Object
From interface java.io.Closeable

Public Constructors

public LetterTokenizer (Reader in)

Construct a new LetterTokenizer.

public LetterTokenizer (AttributeSource source, Reader in)

Construct a new LetterTokenizer using a given AttributeSource.

public LetterTokenizer (AttributeSource.AttributeFactory factory, Reader in)

Construct a new LetterTokenizer using a given AttributeSource.AttributeFactory.

Protected Methods

protected boolean isTokenChar (char c)

Collects only characters which satisfy Character.isLetter(char).
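
To illustrate how a per-character predicate such as isTokenChar can drive tokenization, here is a rough, JDK-only sketch. CharRunSketch and LetterRunSketch are hypothetical stand-ins invented for this example; they only imitate the relationship between a base tokenizer and a subclass that supplies the predicate, and they do not reproduce Lucene's classes, attributes, or buffering.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical base class: groups input into runs of characters
// that the subclass-supplied predicate accepts.
abstract class CharRunSketch {
    private final Reader in;

    CharRunSketch(Reader in) {
        this.in = in;
    }

    /** Subclasses decide which characters belong to a token. */
    protected abstract boolean isTokenChar(char c);

    /** Returns the next token, or null once the input is exhausted. */
    String next() {
        StringBuilder token = new StringBuilder();
        try {
            int ch;
            while ((ch = in.read()) != -1) {
                char c = (char) ch;
                if (isTokenChar(c)) {
                    token.append(c);            // inside a token
                } else if (token.length() > 0) {
                    break;                      // first rejected char ends the token
                }                               // otherwise skip leading separators
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return token.length() > 0 ? token.toString() : null;
    }
}

// Hypothetical subclass: accepts only letters, like the predicate described above.
final class LetterRunSketch extends CharRunSketch {
    LetterRunSketch(Reader in) {
        super(in);
    }

    @Override
    protected boolean isTokenChar(char c) {
        return Character.isLetter(c);
    }

    public static void main(String[] args) {
        CharRunSketch tokens = new LetterRunSketch(new StringReader("ab1cd, ef"));
        for (String t = tokens.next(); t != null; t = tokens.next()) {
            System.out.println(t);
        }
    }
}
```

Under these assumptions, swapping in a different isTokenChar implementation (for example, one that also accepts digits) changes the token boundaries without touching the loop in the base class.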