public abstract class Tokenizer extends TokenStream

java.lang.Object
   ↳ org.apache.lucene.util.AttributeSource
     ↳ org.apache.lucene.analysis.TokenStream
       ↳ org.apache.lucene.analysis.Tokenizer

Class Overview

A Tokenizer is a TokenStream whose input is a Reader.

This is an abstract class; subclasses must override incrementToken().

NOTE: Subclasses overriding incrementToken() must call clearAttributes() before setting attributes.
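
The NOTE above matters most when an attribute is set only for some tokens. The following is a minimal sketch of that contract using plain JDK classes, not Lucene's API: ClearBeforeSetSketch, its term and type fields, and describe() are hypothetical stand-ins for a Tokenizer subclass and its attributes. Because incrementToken() resets everything first, a "num" type set for one token does not leak into the next.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Tokenizer subclass; plain JDK only, not a Lucene class.
class ClearBeforeSetSketch {
    // Stand-ins for per-token attributes.
    final StringBuilder term = new StringBuilder();
    String type = "word"; // default value

    private final Reader input;

    ClearBeforeSetSketch(Reader input) {
        this.input = input;
    }

    // Restores every attribute to its default value.
    void clearAttributes() {
        term.setLength(0);
        type = "word";
    }

    // Advances to the next whitespace-delimited token; returns false at end of input.
    boolean incrementToken() throws IOException {
        clearAttributes(); // first, before any attribute is set
        int c = input.read();
        while (c != -1 && Character.isWhitespace(c)) {
            c = input.read(); // skip leading whitespace
        }
        if (c == -1) {
            return false;
        }
        boolean allDigits = true;
        while (c != -1 && !Character.isWhitespace(c)) {
            term.append((char) c);
            allDigits &= Character.isDigit(c);
            c = input.read();
        }
        if (allDigits) {
            type = "num"; // set only for some tokens, so a stale value must not survive
        }
        return true;
    }

    // Tokenizes text and returns each token as "term/type".
    static List<String> describe(String text) {
        ClearBeforeSetSketch t = new ClearBeforeSetSketch(new StringReader(text));
        List<String> out = new ArrayList<>();
        try {
            while (t.incrementToken()) {
                out.add(t.term + "/" + t.type);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describe("a 42 b")); // prints [a/word, 42/num, b/word]
    }
}
```

If the clearAttributes() call were removed from incrementToken() in this sketch, the third token would still carry the "num" type left over from "42".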

Summary

Fields
protected Reader input
The text source for this Tokenizer.
Protected Constructors
Tokenizer()
Construct a tokenizer with null input.
Tokenizer(Reader input)
Construct a token stream processing the given input.
Tokenizer(AttributeSource.AttributeFactory factory)
Construct a tokenizer with null input using the given AttributeFactory.
Tokenizer(AttributeSource.AttributeFactory factory, Reader input)
Construct a token stream processing the given input using the given AttributeFactory.
Tokenizer(AttributeSource source)
Construct a tokenizer with null input using the given AttributeSource.
Tokenizer(AttributeSource source, Reader input)
Construct a token stream processing the given input using the given AttributeSource.
Public Methods
void close()
By default, closes the input Reader.
void reset(Reader input)
Expert: Reset the tokenizer to a new reader.
Protected Methods
final int correctOffset(int currentOff)
Return the corrected offset.
Inherited Methods
From class org.apache.lucene.analysis.TokenStream
From class org.apache.lucene.util.AttributeSource
From class java.lang.Object
From interface java.io.Closeable

Fields

protected Reader input

The text source for this Tokenizer.

Protected Constructors

protected Tokenizer ()

Construct a tokenizer with null input.

protected Tokenizer (Reader input)

Construct a token stream processing the given input.

protected Tokenizer (AttributeSource.AttributeFactory factory)

Construct a tokenizer with null input using the given AttributeFactory.

protected Tokenizer (AttributeSource.AttributeFactory factory, Reader input)

Construct a token stream processing the given input using the given AttributeFactory.

protected Tokenizer (AttributeSource source)

Construct a tokenizer with null input using the given AttributeSource.

protected Tokenizer (AttributeSource source, Reader input)

Construct a token stream processing the given input using the given AttributeSource.

Public Methods

public void close ()

By default, closes the input Reader.

Throws
IOException

public void reset (Reader input)

Expert: Reset the tokenizer to a new reader. Typically, an analyzer (in its reusableTokenStream method) will use this to re-use a previously created tokenizer.

Throws
IOException
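
The reuse pattern described for reset(Reader) can be sketched without Lucene on the classpath. This is an illustrative model only: ReusableSplitterSketch, next(), drain(), and analyzeAll() are hypothetical names, and the class merely mirrors the input field and reset(Reader) method documented here. One instance is constructed with null input and then pointed at each new Reader in turn.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that mirrors the input field and reset(Reader); not a Lucene class.
class ReusableSplitterSketch {
    protected Reader input; // the text source, may be null until reset(Reader) is called

    ReusableSplitterSketch(Reader input) {
        this.input = input;
    }

    // Points this instance at a new reader so it can be reused.
    void reset(Reader input) {
        this.input = input;
    }

    // Returns the next whitespace-delimited token, or null at end of input.
    String next() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = input.read()) != -1) {
            if (Character.isWhitespace(c)) {
                if (sb.length() > 0) {
                    break; // token complete
                }
            } else {
                sb.append((char) c);
            }
        }
        return sb.length() > 0 ? sb.toString() : null;
    }

    // Collects all remaining tokens from the current input.
    List<String> drain() throws IOException {
        List<String> tokens = new ArrayList<>();
        for (String t = next(); t != null; t = next()) {
            tokens.add(t);
        }
        return tokens;
    }

    // Tokenizes every document with a single, reused instance.
    static List<List<String>> analyzeAll(String... docs) {
        ReusableSplitterSketch splitter = new ReusableSplitterSketch(null);
        List<List<String>> result = new ArrayList<>();
        try {
            for (String doc : docs) {
                splitter.reset(new StringReader(doc));
                result.add(splitter.drain());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(analyzeAll("one two", "three")); // prints [[one, two], [three]]
    }
}
```

A subclass that buffers characters or tracks offsets would also need to clear that state when it is given a new reader; the sketch keeps no such state, so swapping the field is enough.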

Protected Methods

protected final int correctOffset (int currentOff)

Return the corrected offset. If input is a CharStream subclass, this method calls CharStream.correctOffset(int) on it; otherwise it returns currentOff unchanged.

Parameters
currentOff offset as seen in the output
Returns
  • corrected offset based on the input
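
The dispatch that correctOffset performs can be modeled with plain JDK types. This is a sketch under stated assumptions, not Lucene source: OffsetCorrectingReader is a hypothetical stand-in for CharStream, and PrefixStrippingReader is a hypothetical reader that hides a fixed-length prefix of the original text, so offsets seen in its output must be shifted back by that length.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-in for CharStream: a Reader that can map output offsets to input offsets.
abstract class OffsetCorrectingReader extends Reader {
    abstract int correctOffset(int currentOff);
}

// Hypothetical reader that drops the first `skip` characters of the original text.
class PrefixStrippingReader extends OffsetCorrectingReader {
    private final Reader delegate;
    private final int skip;

    PrefixStrippingReader(String text, int skip) {
        this.delegate = new StringReader(text.substring(skip));
        this.skip = skip;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        return delegate.read(cbuf, off, len);
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }

    @Override
    int correctOffset(int currentOff) {
        return currentOff + skip; // shift back to a position in the original text
    }
}

class CorrectOffsetSketch {
    // Delegates to the reader when it can correct offsets, else returns the offset as-is.
    static int correctOffset(Reader input, int currentOff) {
        return (input instanceof OffsetCorrectingReader)
                ? ((OffsetCorrectingReader) input).correctOffset(currentOff)
                : currentOff;
    }

    public static void main(String[] args) {
        System.out.println(correctOffset(new StringReader("hello"), 2));                  // prints 2
        System.out.println(correctOffset(new PrefixStrippingReader("<p>hello", 3), 2));   // prints 5
    }
}
```

In the second call, offset 2 in the stripped text "hello" corresponds to offset 5 in the original "<p>hello", which is the position a highlighter working on the original text would need.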