public abstract class CharStream extends Reader
java.lang.Object
   ↳ java.io.Reader
     ↳ org.apache.lucene.analysis.CharStream

Class Overview

CharStream adds correctOffset(int) functionality over Reader. All Tokenizers accept a CharStream instead of a Reader as input, which enables arbitrary character-based filtering before tokenization. The correctOffset(int) method fixes offsets to account for the removal or insertion of characters, so that the offsets reported in the tokens match the character offsets of the original Reader.
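To illustrate the contract described above, here is a minimal, self-contained sketch that does not depend on Lucene. `CharStreamSketch` is a hypothetical stand-in that mirrors only what this page documents (a `Reader` subclass with an abstract `correctOffset(int)`), and `DeletingCharStreamSketch` is a hypothetical filter that deletes one character and records, for each output character, its offset in the original input. For simplicity it buffers the entire input up front.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the documented contract (not Lucene's class):
// a Reader that can map an offset in its output back to an offset in its input.
abstract class CharStreamSketch extends Reader {
    abstract int correctOffset(int currentOff);
}

// Hypothetical filter: removes every occurrence of one character and keeps,
// for each output character, its offset in the original input.
class DeletingCharStreamSketch extends CharStreamSketch {
    private final String filtered;                          // text after deletion
    private final List<Integer> origin = new ArrayList<>(); // output index -> input offset
    private final int inputLength;
    private int pos = 0;                                    // next index of `filtered` to read

    DeletingCharStreamSketch(Reader in, char toDelete) {
        StringBuilder raw = new StringBuilder();
        char[] buf = new char[256];
        try {
            int n;
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                raw.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            if (raw.charAt(i) != toDelete) {
                out.append(raw.charAt(i));
                origin.add(i);
            }
        }
        filtered = out.toString();
        inputLength = raw.length();
    }

    @Override
    int correctOffset(int currentOff) {
        // An offset at or past the end of the output maps to the end of the input.
        if (currentOff >= origin.size()) {
            return inputLength;
        }
        return origin.get(currentOff);
    }

    @Override
    public int read(char[] cbuf, int off, int len) {
        if (pos >= filtered.length()) {
            return -1;
        }
        int n = Math.min(len, filtered.length() - pos);
        filtered.getChars(pos, pos + n, cbuf, off);
        pos += n;
        return n;
    }

    @Override
    public void close() {
        // Nothing to release: the input was fully buffered in the constructor.
    }
}
```

For example, filtering `"wi-fi ok"` with `'-'` yields `"wifi ok"`; the token `wifi` spans output offsets 0 to 4, which `correctOffset` maps back to 0 and 5, the span of `wi-fi` in the original text. Because each output offset maps to the input position of the character found at that output position, an end offset also absorbs any deleted characters that immediately follow a token.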

Summary

Inherited Fields
From class java.io.Reader
Public Constructors
CharStream()
Public Methods
abstract int correctOffset(int currentOff)
Called by CharFilters and Tokenizers to correct a token offset.
Inherited Methods
From class java.io.Reader
From class java.lang.Object
From interface java.io.Closeable
From interface java.lang.Readable

Public Constructors

public CharStream ()

Public Methods

public abstract int correctOffset (int currentOff)

Called by CharFilters and Tokenizers to correct a token offset.

Parameters
currentOff offset as seen in the output
Returns
corrected offset based on the input
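Because a filter may wrap another stream, a correction generally has to be undone layer by layer until the offset refers to the original Reader. The sketch below assumes that composition pattern: each layer applies its own local correction and then delegates to the stream it wraps. All class names and the `correct` helper are hypothetical stand-ins for illustration, not Lucene classes or methods.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-in for the documented contract (not Lucene's class).
abstract class CharStreamSketch extends Reader {
    abstract int correctOffset(int currentOff);
}

// Hypothetical innermost stream: wraps a plain Reader without changing any
// characters, so output offsets already equal input offsets.
class RootStreamSketch extends CharStreamSketch {
    private final Reader in;

    RootStreamSketch(Reader in) {
        this.in = in;
    }

    @Override
    int correctOffset(int currentOff) {
        return currentOff;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        return in.read(cbuf, off, len);
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}

// Hypothetical filter that drops the first `skip` characters of the stream it wraps.
class SkipPrefixFilterSketch extends CharStreamSketch {
    private final CharStreamSketch input;
    private final int skip;
    private int skipped = 0;

    SkipPrefixFilterSketch(CharStreamSketch input, int skip) {
        this.input = input;
        this.skip = skip;
    }

    // This layer's own correction: output offset k was offset k + skip in `input`.
    int correct(int currentOff) {
        return currentOff + skip;
    }

    @Override
    int correctOffset(int currentOff) {
        // Undo this layer's change first, then let the wrapped stream undo its own.
        return input.correctOffset(correct(currentOff));
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        // Consume and discard the prefix before passing characters through.
        while (skipped < skip) {
            if (input.read() == -1) {
                return -1;
            }
            skipped++;
        }
        return input.read(cbuf, off, len);
    }

    @Override
    public void close() throws IOException {
        input.close();
    }
}
```

With `">>$hello"` as input, an inner filter with `skip = 2` drops `>>` and an outer filter with `skip = 1` drops `$`, so output offset 0 (the `h`) is corrected to 0 + 1 + 2 = 3, its position in the original text.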