public final class LowerCaseTokenizer extends LetterTokenizer

java.lang.Object
   ↳ org.apache.lucene.util.AttributeSource
     ↳ org.apache.lucene.analysis.TokenStream
       ↳ org.apache.lucene.analysis.Tokenizer
         ↳ org.apache.lucene.analysis.CharTokenizer
           ↳ org.apache.lucene.analysis.LetterTokenizer
             ↳ org.apache.lucene.analysis.LowerCaseTokenizer

Class Overview

LowerCaseTokenizer performs the function of LetterTokenizer and LowerCaseFilter together. It divides text at non-letters and converts the resulting tokens to lower case. Although it is functionally equivalent to chaining LetterTokenizer and LowerCaseFilter, doing both tasks in a single pass is faster, hence this (redundant) implementation.

Note: this works reasonably well for most European languages, but poorly for some Asian languages, where words are not separated by spaces.
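
The single-pass behavior described above can be illustrated with a minimal plain-Java sketch. It is not the Lucene implementation and uses no Lucene classes: the class name LowerCaseTokenizerSketch and the helper tokenize are hypothetical names introduced only for this example, and the sketch assumes that "letter" means any char accepted by Character.isLetter(char).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this class is not part of Lucene.
final class LowerCaseTokenizerSketch {

    // Splits text at non-letters and lower-cases each letter in the same pass,
    // so no second traversal of the tokens is needed.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                // Letter: normalize it and extend the current token.
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // Non-letter: close the current token, if any.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        // Emit a trailing token if the text ended on a letter.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [hello, world, it, s]
        System.out.println(tokenize("Hello, World! It's 2010."));
    }
}
```

Note that the apostrophe and the digits act as token boundaries, which is why "It's" yields two tokens and "2010" yields none. Text written without letter-delimiting characters between words would come out as one long token, which is the limitation the note above refers to.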

Summary

Inherited Fields
From class org.apache.lucene.analysis.Tokenizer
Public Constructors
LowerCaseTokenizer(Reader in)
Construct a new LowerCaseTokenizer.
LowerCaseTokenizer(AttributeSource source, Reader in)
Construct a new LowerCaseTokenizer using a given AttributeSource.
LowerCaseTokenizer(AttributeSource.AttributeFactory factory, Reader in)
Construct a new LowerCaseTokenizer using a given AttributeSource.AttributeFactory.
Protected Methods
char normalize(char c)
Converts char to lower case using Character.toLowerCase(char).
Inherited Methods
From class org.apache.lucene.analysis.LetterTokenizer
From class org.apache.lucene.analysis.CharTokenizer
From class org.apache.lucene.analysis.Tokenizer
From class org.apache.lucene.analysis.TokenStream
From class org.apache.lucene.util.AttributeSource
From class java.lang.Object
From interface java.io.Closeable

Public Constructors

public LowerCaseTokenizer (Reader in)

Construct a new LowerCaseTokenizer.

public LowerCaseTokenizer (AttributeSource source, Reader in)

Construct a new LowerCaseTokenizer using a given AttributeSource.

public LowerCaseTokenizer (AttributeSource.AttributeFactory factory, Reader in)

Construct a new LowerCaseTokenizer using a given AttributeSource.AttributeFactory.

Protected Methods

protected char normalize (char c)

Converts char to lower case using Character.toLowerCase(char).