public class WordlistLoader extends Object
java.lang.Object
   ↳ org.apache.lucene.analysis.WordlistLoader

Class Overview

Loader for text files that represent a list of stopwords.

Summary

Public Constructors
WordlistLoader()
Public Methods
static HashMap<String, String> getStemDict(File wordstemfile)
Reads a stem dictionary.
static HashSet<String> getWordSet(Reader reader, String comment)
Reads lines from a Reader and adds every non-comment line as an entry to a HashSet (omitting leading and trailing whitespace).
static HashSet<String> getWordSet(Reader reader)
Reads lines from a Reader and adds every line as an entry to a HashSet (omitting leading and trailing whitespace).
static HashSet<String> getWordSet(File wordfile)
Loads a text file and adds every line as an entry to a HashSet (omitting leading and trailing whitespace).
static HashSet<String> getWordSet(File wordfile, String comment)
Loads a text file and adds every non-comment line as an entry to a HashSet (omitting leading and trailing whitespace).
Inherited Methods
From class java.lang.Object

Public Constructors

public WordlistLoader ()

Public Methods

public static HashMap<String, String> getStemDict (File wordstemfile)

Reads a stem dictionary. Each line contains:

word\tstem
(i.e. two tab-separated words)

Returns
  • A stem dictionary whose entries override the stemming algorithm
Throws
IOException
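
To illustrate the word\tstem line format, here is a minimal sketch that uses only the Java standard library. It is not the Lucene implementation: the class StemDictSketch and the method parseStemDict are hypothetical names, the sketch reads from a Reader instead of a File so that it runs without a file on disk, and silently skipping lines that lack a tab is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;

// Hypothetical helper, not the Lucene source: illustrates the "word<TAB>stem" format.
final class StemDictSketch {

    // Parses one "word<TAB>stem" pair per line into a map from word to stem.
    static HashMap<String, String> parseStemDict(Reader reader) throws IOException {
        HashMap<String, String> result = new HashMap<String, String>();
        BufferedReader br = new BufferedReader(reader);
        String line;
        while ((line = br.readLine()) != null) {
            // Split on the first tab only: left part is the word, right part is the stem.
            String[] wordAndStem = line.split("\t", 2);
            if (wordAndStem.length == 2) { // assumption: skip lines without a tab
                result.put(wordAndStem[0], wordAndStem[1]);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        HashMap<String, String> dict =
                parseStemDict(new StringReader("mice\tmouse\ngeese\tgoose\n"));
        System.out.println(dict.get("mice")); // prints "mouse"
    }
}
```

A lookup in such a map could take precedence over an algorithmic stemmer, which matches the "overrules the stemming algorithm" description of the return value.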

public static HashSet<String> getWordSet (Reader reader, String comment)

Reads lines from a Reader and adds every non-comment line as an entry to a HashSet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

Parameters
reader Reader containing the wordlist
comment The string representing a comment.
Returns
  • A HashSet with the reader's words
Throws
IOException
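
The behaviour described above can be approximated with the standard library alone. The sketch below is an assumption about the semantics, not the Lucene source: it treats a line as a comment when it starts with the comment string, and the names WordSetSketch and readWordSet are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;

// Hypothetical helper, not the Lucene source: mimics the documented behaviour.
final class WordSetSketch {

    // Adds every line that does not start with `comment` to a set, trimmed.
    static HashSet<String> readWordSet(Reader reader, String comment) throws IOException {
        HashSet<String> result = new HashSet<String>();
        BufferedReader br = new BufferedReader(reader);
        String line;
        while ((line = br.readLine()) != null) {
            // Assumption: a comment line is one that begins with the comment string.
            if (comment != null && line.startsWith(comment)) {
                continue;
            }
            result.add(line.trim()); // drop leading and trailing whitespace
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        HashSet<String> words =
                readWordSet(new StringReader("# stopwords\n  the \nand\n"), "#");
        System.out.println(words.contains("the")); // prints "true"
        System.out.println(words.size());          // prints "2"
    }
}
```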

public static HashSet<String> getWordSet (Reader reader)

Reads lines from a Reader and adds every line as an entry to a HashSet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

Parameters
reader Reader containing the wordlist
Returns
  • A HashSet with the reader's words
Throws
IOException

public static HashSet<String> getWordSet (File wordfile)

Loads a text file and adds every line as an entry to a HashSet (omitting leading and trailing whitespace). Every line of the file should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

Parameters
wordfile File containing the wordlist
Returns
  • A HashSet with the file's words
Throws
IOException
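
The lowercase requirement can also be met when loading instead of in the file itself. The following sketch is one possible approach using only the standard library, not what WordlistLoader does: the names WordFileSketch and loadLowercased are hypothetical, and lowercasing with Locale.ROOT is an assumption that may not suit every language.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.HashSet;
import java.util.Locale;

// Hypothetical helper, not the Lucene source: loads a word file and lowercases entries.
final class WordFileSketch {

    // Reads one word per line, trims it, and stores it in lowercase.
    static HashSet<String> loadLowercased(File wordfile) throws IOException {
        HashSet<String> result = new HashSet<String>();
        try (BufferedReader br = new BufferedReader(new FileReader(wordfile))) {
            String line;
            while ((line = br.readLine()) != null) {
                // Assumption: Locale.ROOT lowercasing is acceptable for the word list.
                result.add(line.trim().toLowerCase(Locale.ROOT));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary word list so the example is self-contained.
        File tmp = File.createTempFile("stopwords", ".txt");
        tmp.deleteOnExit();
        try (FileWriter w = new FileWriter(tmp)) {
            w.write("The\n AND \nof\n");
        }
        System.out.println(loadLowercased(tmp).contains("the")); // prints "true"
    }
}
```

Normalizing on load keeps the set consistent with tokens produced by a lower-casing analyzer even when the word file contains mixed-case entries.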

public static HashSet<String> getWordSet (File wordfile, String comment)

Loads a text file and adds every non-comment line as an entry to a HashSet (omitting leading and trailing whitespace). Every line of the file should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

Parameters
wordfile File containing the wordlist
comment The comment string to ignore
Returns
  • A HashSet with the file's words
Throws
IOException