public final class StopFilter extends TokenFilter

java.lang.Object
   ↳ org.apache.lucene.util.AttributeSource
     ↳ org.apache.lucene.analysis.TokenStream
       ↳ org.apache.lucene.analysis.TokenFilter
         ↳ org.apache.lucene.analysis.StopFilter

Class Overview

Removes stop words from a token stream.

Summary

Inherited Fields
From class org.apache.lucene.analysis.TokenFilter
Public Constructors
StopFilter(boolean enablePositionIncrements, TokenStream input, Set<?> stopWords, boolean ignoreCase)
Construct a token stream filtering the given input.
StopFilter(boolean enablePositionIncrements, TokenStream in, Set<?> stopWords)
Constructs a filter which removes words from the input TokenStream that are named in the Set.
Public Methods
boolean getEnablePositionIncrements()
static boolean getEnablePositionIncrementsVersionDefault(Version matchVersion)
Returns version-dependent default for enablePositionIncrements.
final boolean incrementToken()
Returns the next input Token whose term() is not a stop word.
final static Set<Object> makeStopSet(String... stopWords)
Builds a Set from an array of stop words, appropriate for passing into the StopFilter constructor.
final static Set<Object> makeStopSet(List<?> stopWords, boolean ignoreCase)
final static Set<Object> makeStopSet(List<?> stopWords)
Builds a Set from a list of stop words, appropriate for passing into the StopFilter constructor.
final static Set<Object> makeStopSet(String[] stopWords, boolean ignoreCase)
void setEnablePositionIncrements(boolean enable)
If true, this StopFilter will preserve positions of the incoming tokens (i.e., accumulate and set position increments of the removed stop tokens).
Inherited Methods
From class org.apache.lucene.analysis.TokenFilter
From class org.apache.lucene.analysis.TokenStream
From class org.apache.lucene.util.AttributeSource
From class java.lang.Object
From interface java.io.Closeable

Public Constructors

public StopFilter (boolean enablePositionIncrements, TokenStream input, Set<?> stopWords, boolean ignoreCase)

Constructs a token stream filtering the given input. If stopWords is an instance of CharArraySet (which is the case if makeStopSet() was used to construct the set), it is used directly and ignoreCase is ignored, since CharArraySet itself controls case sensitivity.

If stopWords is not an instance of CharArraySet, a new CharArraySet will be constructed and ignoreCase will be used to specify the case sensitivity of that set.

Parameters
enablePositionIncrements true if token positions should record the removed stop words
input Input TokenStream
stopWords A Set of Strings or char[] or any other toString()-able set representing the stopwords
ignoreCase if true, all words are lower cased first

public StopFilter (boolean enablePositionIncrements, TokenStream in, Set<?> stopWords)

Constructs a filter which removes words from the input TokenStream that are named in the Set.

Parameters
enablePositionIncrements true if token positions should record the removed stop words
in Input stream
stopWords A Set of Strings or char[] or any other toString()-able set representing the stopwords

Public Methods

public boolean getEnablePositionIncrements ()

public static boolean getEnablePositionIncrementsVersionDefault (Version matchVersion)

Returns the version-dependent default for enablePositionIncrements. Analyzers that embed StopFilter use this method when creating the StopFilter. For a matchVersion prior to 2.9, this returns false; for 2.9 or later, it returns true.

public final boolean incrementToken ()

Returns the next input Token whose term() is not a stop word.

Returns
  • false for end of stream; true otherwise
Throws
IOException
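
The contract above (skip stop words, return true for each surviving token, return false at end of stream) can be illustrated with a minimal plain-Java model. This is a sketch only, not Lucene's implementation: the class StopLoopSketch is hypothetical, an Iterator<String> stands in for the wrapped TokenStream, and a String field stands in for the term attribute.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

/** Plain-Java model of the filtering loop; hypothetical, not Lucene code. */
public class StopLoopSketch {
    private final Iterator<String> input;  // stands in for the wrapped TokenStream
    private final Set<String> stopWords;   // stands in for the stop set
    private String current;                // stands in for the term attribute

    public StopLoopSketch(Iterator<String> input, Set<String> stopWords) {
        this.input = input;
        this.stopWords = stopWords;
    }

    /** Advances to the next term that is not a stop word; false at end of input. */
    public boolean incrementToken() {
        while (input.hasNext()) {
            String term = input.next();
            if (!stopWords.contains(term)) {
                current = term;
                return true;
            }
            // Stop word: drop it and keep reading.
        }
        return false;
    }

    /** The term the model is currently positioned on. */
    public String term() {
        return current;
    }

    public static void main(String[] args) {
        Set<String> stops = new HashSet<String>(Arrays.asList("the", "a"));
        StopLoopSketch filter = new StopLoopSketch(
                Arrays.asList("the", "quick", "a", "fox").iterator(), stops);
        while (filter.incrementToken()) {
            System.out.println(filter.term()); // prints "quick", then "fox"
        }
    }
}
```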

public static final Set<Object> makeStopSet (String... stopWords)

Builds a Set from an array of stop words, appropriate for passing into the StopFilter constructor. This permits this stopWords construction to be cached once when an Analyzer is constructed.

public static final Set<Object> makeStopSet (List<?> stopWords, boolean ignoreCase)

Parameters
stopWords A List of Strings or char[] or any other toString()-able list representing the stopwords
ignoreCase if true, all words are lower cased first
Returns
  • a Set containing the words

public static final Set<Object> makeStopSet (List<?> stopWords)

Builds a Set from a list of stop words, appropriate for passing into the StopFilter constructor. This permits the stopWords construction to be cached once when an Analyzer is constructed.

Parameters
stopWords A List of Strings or char[] or any other toString()-able list representing the stopwords
Returns
  • a Set containing the words

public static final Set<Object> makeStopSet (String[] stopWords, boolean ignoreCase)

Parameters
stopWords An array of stopwords
ignoreCase If true, all words are lower cased first.
Returns
  • a Set containing the words
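
As a rough illustration of what "all words are lower cased first" means for matching, the following plain-Java sketch builds a stop set once and reuses it for lookups. It is a simplified model under stated assumptions: StopSetSketch, buildStopSet and isStopWord are hypothetical names, and the case flag is passed to the lookup explicitly, whereas the documentation above says the real CharArraySet controls case sensitivity itself.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Plain-Java model of a case-insensitive stop set; hypothetical, not Lucene code. */
public class StopSetSketch {

    /** Builds the set once so it can be cached and shared, lower-casing if requested. */
    public static Set<String> buildStopSet(String[] stopWords, boolean ignoreCase) {
        Set<String> set = new HashSet<String>();
        for (String word : stopWords) {
            set.add(ignoreCase ? word.toLowerCase(Locale.ROOT) : word);
        }
        return set;
    }

    /** Looks a term up with the same normalization that was used to build the set. */
    public static boolean isStopWord(Set<String> set, String term, boolean ignoreCase) {
        return set.contains(ignoreCase ? term.toLowerCase(Locale.ROOT) : term);
    }

    public static void main(String[] args) {
        String[] words = {"The", "and"};
        Set<String> insensitive = buildStopSet(words, true);
        Set<String> sensitive = buildStopSet(words, false);
        System.out.println(isStopWord(insensitive, "THE", true)); // prints "true"
        System.out.println(isStopWord(sensitive, "THE", false));  // prints "false"
    }
}
```

Building the set once and passing the same instance to every filter is the caching benefit the makeStopSet descriptions refer to.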

public void setEnablePositionIncrements (boolean enable)

If true, this StopFilter will preserve positions of the incoming tokens (i.e., accumulate and set position increments of the removed stop tokens). Generally, true is best as it does not lose information (positions of the original tokens) during indexing.

When enabled, each time a token is stopped (omitted), the position increment of the following token is incremented.

NOTE: be sure to also call QueryParser.setEnablePositionIncrements(boolean) if you use QueryParser to create queries.
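
A small worked example may make the effect of this flag concrete. The sketch below is a plain-Java model, not Lucene's implementation: PositionIncrementSketch and its filter method are hypothetical, and every input token is assumed to arrive with a position increment of 1. With the flag on, each removed stop word adds 1 to the increment of the next surviving token; with it off, every surviving token keeps an increment of 1 and the gaps disappear.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Plain-Java model of position-increment accumulation; hypothetical, not Lucene code. */
public class PositionIncrementSketch {

    /** Returns "term+increment" for each token that survives filtering. */
    public static List<String> filter(List<String> terms, Set<String> stopWords,
                                      boolean enablePositionIncrements) {
        List<String> out = new ArrayList<String>();
        int skipped = 0; // positions of stop words removed since the last kept token
        for (String term : terms) {
            if (stopWords.contains(term)) {
                skipped += 1; // assumption: each input token has increment 1
                continue;
            }
            int increment = enablePositionIncrements ? 1 + skipped : 1;
            out.add(term + "+" + increment);
            skipped = 0;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("the", "quick", "brown", "of", "fox");
        Set<String> stops = new HashSet<String>(Arrays.asList("the", "of"));
        System.out.println(filter(terms, stops, true));  // prints [quick+2, brown+1, fox+2]
        System.out.println(filter(terms, stops, false)); // prints [quick+1, brown+1, fox+1]
    }
}
```

In the first output the increments of 2 record that a stop word was removed just before "quick" and "fox", which is the position information that is kept when the flag is true.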