public class StandardAnalyzer extends Analyzer

java.lang.Object
↳	org.apache.lucene.analysis.Analyzer
	↳	org.apache.lucene.analysis.standard.StandardAnalyzer

Class Overview

Filters StandardTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words.
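The filter chain described above can be pictured with a small plain-Java stand-in. This is only a sketch and not Lucene code: the class AnalyzerChainSketch and its analyze method are hypothetical names, a regular-expression split stands in for the grammar-based StandardTokenizer, and the normalization performed by StandardFilter is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the tokenizer -> lowercase -> stop-word chain.
// It does not use or reproduce any Lucene class.
class AnalyzerChainSketch {

    /** Splits text into terms, lowercases them, and drops stop words. */
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> terms = new ArrayList<String>();
        // Assumption: any run of non-letter, non-digit characters separates tokens.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // produced when the text starts with a separator
            }
            String lower = raw.toLowerCase(Locale.ROOT); // LowerCaseFilter step
            if (!stopWords.contains(lower)) {            // StopFilter step
                terms.add(lower);
            }
        }
        return terms;
    }
}
```

With the stop words "the" and "a", the input "The Quick, brown FOX saw a dog." yields the terms quick, brown, fox, saw, dog. Lowercasing happens before the stop-word check, which is why "The" is removed even though the stop set holds only lowercase entries.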

You must specify the required Version compatibility when creating StandardAnalyzer:

As of 2.9, StopFilter preserves position increments.
As of 2.4, tokens incorrectly identified as acronyms are corrected (see LUCENE-1608).
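To illustrate what preserving position increments means, the sketch below models stop-word removal in plain Java. It is an assumption-laden illustration, not Lucene's implementation: PositionIncrementSketch and its filter method are hypothetical, and each surviving token is rendered as "term+increment", where the increment is the distance from the previous surviving token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of stop-word removal with and without preserved
// position increments. Not Lucene code.
class PositionIncrementSketch {

    /** Returns "term+increment" for every token that is not a stop word. */
    static List<String> filter(List<String> tokens, Set<String> stopWords,
                               boolean preserveIncrements) {
        List<String> out = new ArrayList<String>();
        int skipped = 0; // stop words removed since the last emitted token
        for (String token : tokens) {
            if (stopWords.contains(token)) {
                skipped++;
                continue;
            }
            // When increments are preserved, the gap left by removed
            // stop words is added to the next token's increment.
            int increment = preserveIncrements ? 1 + skipped : 1;
            out.add(token + "+" + increment);
            skipped = 0;
        }
        return out;
    }
}
```

For the tokens quick, and, brown with "and" as a stop word, the preserving mode gives quick+1 and brown+2, while the non-preserving mode gives quick+1 and brown+1. The difference matters for phrase queries, which rely on the distance between terms.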

Summary

Constants
int	DEFAULT_MAX_TOKEN_LENGTH	Default maximum allowed token length

Fields
public static final Set<?>	STOP_WORDS_SET	An unmodifiable set containing some common English words that are usually not useful for searching.

Inherited Fields

From class org.apache.lucene.analysis.Analyzer

Public Constructors
	StandardAnalyzer(Version matchVersion) Builds an analyzer with the default stop words (`STOP_WORDS_SET`).
	StandardAnalyzer(Version matchVersion, Set<?> stopWords) Builds an analyzer with the given stop words.
	StandardAnalyzer(Version matchVersion, File stopwords) Builds an analyzer with the stop words from the given file.
	StandardAnalyzer(Version matchVersion, Reader stopwords) Builds an analyzer with the stop words from the given reader.

Public Methods
int	getMaxTokenLength()
TokenStream	reusableTokenStream(String fieldName, Reader reader) Creates a TokenStream that is allowed to be re-used from the previous time that the same thread called this method.
void	setMaxTokenLength(int length) Set maximum allowed token length.
TokenStream	tokenStream(String fieldName, Reader reader) Constructs a `StandardTokenizer` filtered by a `StandardFilter`, a `LowerCaseFilter` and a `StopFilter`.

Inherited Methods

From class org.apache.lucene.analysis.Analyzer

void	close() Frees persistent resources used by this Analyzer
int	getOffsetGap(Fieldable field) Just like `getPositionIncrementGap(String)`, except for Token offsets instead.
int	getPositionIncrementGap(String fieldName) Invoked before indexing a Fieldable instance if terms have already been added to that field.
Object	getPreviousTokenStream() Used by Analyzers that implement reusableTokenStream to retrieve previously saved TokenStreams for re-use by the same thread.
TokenStream	reusableTokenStream(String fieldName, Reader reader) Creates a TokenStream that is allowed to be re-used from the previous time that the same thread called this method.
void	setOverridesTokenStreamMethod(Class<? extends Analyzer> baseClass) This method is deprecated. It is present only to preserve backward compatibility for classes that subclass a core analyzer and override tokenStream but not reusableTokenStream.
void	setPreviousTokenStream(Object obj) Used by Analyzers that implement reusableTokenStream to save a TokenStream for later re-use by the same thread.
abstract TokenStream	tokenStream(String fieldName, Reader reader) Creates a TokenStream which tokenizes all the text in the provided Reader.

From class java.lang.Object

From interface java.io.Closeable

Constants

public static final int DEFAULT_MAX_TOKEN_LENGTH

Default maximum allowed token length

Constant Value: 255 (0x000000ff)

Fields

public static final Set<?> STOP_WORDS_SET

An unmodifiable set containing some common English words that are usually not useful for searching.

Public Constructors

public StandardAnalyzer (Version matchVersion)

Builds an analyzer with the default stop words (STOP_WORDS_SET).

Parameters

matchVersion	Lucene version to match; see the class overview above.

public StandardAnalyzer (Version matchVersion, Set<?> stopWords)

Builds an analyzer with the given stop words.

Parameters

matchVersion	Lucene version to match; see the class overview above.
stopWords	Set of stop words to use

public StandardAnalyzer (Version matchVersion, File stopwords)

Builds an analyzer with the stop words from the given file.

Parameters

matchVersion	Lucene version to match; see the class overview above.
stopwords	File to read stop words from

Throws

IOException

public StandardAnalyzer (Version matchVersion, Reader stopwords)

Builds an analyzer with the stop words from the given reader.

Parameters

matchVersion	Lucene version to match; see the class overview above.
stopwords	Reader to read stop words from

Throws

IOException
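As a rough picture of what reading stop words from a Reader involves, the following plain-Java sketch builds a word set from character input. The one-word-per-line layout is an assumption made for this illustration, and StopWordReaderSketch and load are hypothetical names; the sketch does not show how StandardAnalyzer itself parses the input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashSet;
import java.util.Set;

// Hypothetical loader: assumes one stop word per line. Not Lucene code.
class StopWordReaderSketch {

    /** Reads the reader to the end and returns the trimmed, non-empty lines. */
    static Set<String> load(Reader reader) throws IOException {
        Set<String> words = new HashSet<String>();
        BufferedReader lines = new BufferedReader(reader);
        try {
            String line;
            while ((line = lines.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) { // skip blank lines
                    words.add(word);
                }
            }
        } finally {
            lines.close(); // also closes the underlying reader
        }
        return words;
    }
}
```

Like the constructor documented above, the sketch lets an IOException from the underlying reader propagate to the caller instead of swallowing it.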

Public Methods

public int getMaxTokenLength ()

Returns the maximum allowed token length, as set by setMaxTokenLength.

public TokenStream reusableTokenStream (String fieldName, Reader reader)

Creates a TokenStream that is allowed to be re-used from the previous time that the same thread called this method. Callers that do not need to use more than one TokenStream at the same time from this analyzer should use this method for better performance.

Throws

IOException
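The per-thread reuse described for reusableTokenStream can be sketched with a ThreadLocal, in the spirit of the getPreviousTokenStream and setPreviousTokenStream methods inherited from Analyzer. Everything named here (ReusableStreamSketch, Stream, reusableStream, freshStream) is hypothetical and only models the idea; it is not how Lucene is implemented.

```java
import java.io.Reader;

// Hypothetical model of per-thread stream reuse. Not Lucene code.
class ReusableStreamSketch {

    /** Stand-in for a token stream that can be pointed at new input. */
    static final class Stream {
        Reader input;

        void reset(Reader reader) {
            this.input = reader;
        }
    }

    // Each thread sees only the stream it saved itself.
    private final ThreadLocal<Stream> previous = new ThreadLocal<Stream>();

    /** Returns this thread's saved stream, reset to read from the new reader. */
    Stream reusableStream(Reader reader) {
        Stream stream = previous.get();
        if (stream == null) {
            stream = new Stream(); // first call on this thread
            previous.set(stream);
        }
        stream.reset(reader);
        return stream;
    }

    /** Always allocates a new stream, like a non-reusing call would. */
    Stream freshStream(Reader reader) {
        Stream stream = new Stream();
        stream.reset(reader);
        return stream;
    }
}
```

Two consecutive calls to reusableStream on the same thread return the same object, so a caller must finish with one stream before requesting the next. A caller that needs two streams open at once would use the allocating variant instead.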

public void setMaxTokenLength (int length)

Set maximum allowed token length. If a token is seen that exceeds this length then it is discarded. This setting only takes effect the next time tokenStream or reusableTokenStream is called.
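The discard rule can be modeled in a few lines of plain Java. This is a sketch under stated assumptions rather than Lucene code: MaxTokenLengthSketch and its tokens method are hypothetical, whitespace splitting stands in for the real tokenizer, and "exceeds" is read as strictly longer than the limit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the maximum-token-length rule. Not Lucene code.
class MaxTokenLengthSketch {

    static final int DEFAULT_MAX_TOKEN_LENGTH = 255;

    private int maxTokenLength = DEFAULT_MAX_TOKEN_LENGTH;

    void setMaxTokenLength(int length) {
        this.maxTokenLength = length;
    }

    int getMaxTokenLength() {
        return maxTokenLength;
    }

    /** Splits on whitespace and discards tokens longer than the limit. */
    List<String> tokens(String text) {
        List<String> out = new ArrayList<String>();
        for (String token : text.trim().split("\\s+")) {
            // Over-long tokens are dropped entirely, not truncated.
            if (!token.isEmpty() && token.length() <= maxTokenLength) {
                out.add(token);
            }
        }
        return out;
    }
}
```

With the limit set to 5, the input "tiny enormous words" yields tiny and words; "enormous" is dropped because it is longer than five characters. The limit is read each time tokens is called, which loosely mirrors the note that a new setting applies only to streams created afterwards.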

public TokenStream tokenStream (String fieldName, Reader reader)

Constructs a StandardTokenizer filtered by a StandardFilter, a LowerCaseFilter and a StopFilter.
