Package org.apache.lucene.analysis.standard

A fast grammar-based tokenizer constructed with JFlex.

Classes

StandardAnalyzer     Filters StandardTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words.
StandardFilter       Normalizes tokens extracted with StandardTokenizer.
StandardTokenizer    A grammar-based tokenizer constructed with JFlex.
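The StandardAnalyzer chain (tokenize, normalize, lowercase, remove stop words) can be pictured with a small, dependency-free Java sketch. It does not use Lucene. The class AnalyzerChainSketch, its methods and the tiny stop-word set are hypothetical. The tokenizer stage is a plain regular-expression split rather than the JFlex grammar. The normalization stage, which strips surrounding apostrophes and a trailing possessive 's, is an assumed stand-in for StandardFilter and is not its exact behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch of the StandardAnalyzer stages; NOT Lucene's API.
 * Stages run in the same order as the analyzer's filter chain:
 * tokenizer -> normalization -> lowercasing -> stop-word removal.
 */
class AnalyzerChainSketch {

    // Illustrative stop-word subset (assumed; not Lucene's actual English list).
    static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "to", "is"));

    // Tokenizer stage (simplified stand-in for the JFlex grammar):
    // split at runs of characters that are not letters, digits or apostrophes,
    // discarding the punctuation itself.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}']+")) {
            if (!t.isEmpty()) {
                tokens.add(t); // leading punctuation yields an empty first piece
            }
        }
        return tokens;
    }

    // Normalization stage (assumed stand-in for StandardFilter):
    // trim surrounding apostrophes, then drop a trailing possessive "'s".
    static String normalize(String t) {
        int start = 0;
        int end = t.length();
        while (start < end && t.charAt(start) == '\'') start++;
        while (end > start && t.charAt(end - 1) == '\'') end--;
        t = t.substring(start, end);
        if (t.length() > 2 && t.regionMatches(true, t.length() - 2, "'s", 0, 2)) {
            t = t.substring(0, t.length() - 2);
        }
        return t;
    }

    // Full chain: tokenize, normalize, lowercase, then filter stop words.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String token : tokenize(text)) {
            String t = normalize(token);
            if (t.isEmpty()) continue;
            t = t.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [lucene, tokenizer, splits, words, removing, punctuation]
        System.out.println(analyze("Lucene's tokenizer splits the words, removing punctuation!"));
    }
}
```

Keeping each stage as a separate method mirrors the analyzer's design, where each filter wraps the previous token stream and can be added, removed or reordered on its own.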

StandardTokenizer should work well for most European-language documents:

  • Splits words at punctuation characters, removing punctuation.