
public class WordlistLoader extends Object

java.lang.Object
↳	org.apache.lucene.analysis.WordlistLoader

Class Overview

Loader for text files that represent a list of stopwords.

Summary

Public Constructors
	WordlistLoader()

Public Methods
static HashMap<String, String>	getStemDict(File wordstemfile) Reads a stem dictionary.
static HashSet<String>	getWordSet(Reader reader, String comment) Reads lines from a Reader and adds every non-comment line as an entry to a HashSet (omitting leading and trailing whitespace).
static HashSet<String>	getWordSet(Reader reader) Reads lines from a Reader and adds every line as an entry to a HashSet (omitting leading and trailing whitespace).
static HashSet<String>	getWordSet(File wordfile) Loads a text file and adds every line as an entry to a HashSet (omitting leading and trailing whitespace).
static HashSet<String>	getWordSet(File wordfile, String comment) Loads a text file and adds every non-comment line as an entry to a HashSet (omitting leading and trailing whitespace).


Inherited Methods

From class java.lang.Object

Public Constructors

public WordlistLoader ()

Public Methods

public static HashMap<String, String> getStemDict (File wordstemfile)

Reads a stem dictionary. Each line contains:

word\tstem

(i.e. two tab-separated words)

Returns

stem dictionary that overrules the stemming algorithm

Throws

IOException
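
The word\tstem format described above can be illustrated with a small stand-alone sketch. This is not Lucene's implementation: the class StemDictSketch and its parse method are hypothetical names, the input is taken from a Reader rather than a File so the example runs without a file on disk, and skipping lines that contain no tab is an assumption made here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;

// Hypothetical helper, not part of Lucene: parses the "word\tstem" line format.
class StemDictSketch {

    static HashMap<String, String> parse(Reader in) throws IOException {
        HashMap<String, String> dict = new HashMap<>();
        try (BufferedReader br = new BufferedReader(in)) {
            String line;
            while ((line = br.readLine()) != null) {
                // Split at the first tab only: left side is the word, right side the stem.
                String[] parts = line.split("\t", 2);
                if (parts.length == 2) { // assumption: lines without a tab are skipped
                    dict.put(parts[0], parts[1]);
                }
            }
        }
        return dict;
    }

    public static void main(String[] args) throws IOException {
        HashMap<String, String> dict =
                parse(new StringReader("running\trun\nmice\tmouse\n"));
        System.out.println(dict.get("mice")); // prints "mouse"
    }
}
```

A map built this way could be consulted before a stemming algorithm runs, which matches the "overrules the stemming algorithm" description of the return value.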

public static HashSet<String> getWordSet (Reader reader, String comment)

Reads lines from a Reader and adds every non-comment line as an entry to a HashSet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

Parameters

reader	Reader containing the wordlist
comment	The string representing a comment.

Returns

A HashSet with the reader's words

Throws

IOException
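
As a rough illustration of the behaviour described for this overload, the sketch below reads lines, drops those that start with the comment string, and stores the rest trimmed. It is a stand-alone approximation, not the Lucene source: WordSetSketch and read are hypothetical names, and treating a line as a comment only when it starts with the comment string is an assumption. Blank lines are kept as an empty-string entry in this sketch, since the description says every non-comment line is added.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;

// Hypothetical helper, not part of Lucene: one word per line, comment lines skipped.
class WordSetSketch {

    static HashSet<String> read(Reader in, String comment) throws IOException {
        HashSet<String> words = new HashSet<>();
        try (BufferedReader br = new BufferedReader(in)) {
            String line;
            while ((line = br.readLine()) != null) {
                // Assumption: a comment line is one that starts with the comment string.
                if (comment != null && line.startsWith(comment)) {
                    continue;
                }
                // Leading and trailing whitespace is omitted, as documented.
                words.add(line.trim());
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        HashSet<String> words =
                read(new StringReader("  the \n# a comment\nand\n"), "#");
        System.out.println(words.contains("the")); // prints "true"
        System.out.println(words.size());          // prints "2"
    }
}
```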

public static HashSet<String> getWordSet (Reader reader)

Reads lines from a Reader and adds every line as an entry to a HashSet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

Parameters

reader	Reader containing the wordlist

Returns

A HashSet with the reader's words

Throws

IOException

public static HashSet<String> getWordSet (File wordfile)

Loads a text file and adds every line as an entry to a HashSet (omitting leading and trailing whitespace). Every line of the file should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

Parameters

wordfile	File containing the wordlist

Returns

A HashSet with the file's words

Throws

IOException
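
Because the words must be lowercase when the Analyzer applies LowerCaseFilter, a word file with mixed-case entries may need normalizing. The sketch below shows one possible way to do that while loading a file. It does not call Lucene: LowercaseWordFileSketch and load are hypothetical names, UTF-8 is an assumed file encoding, and Locale.ROOT is chosen here to keep lowercasing independent of the default locale.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: loads one word per line and lowercases it.
class LowercaseWordFileSketch {

    static HashSet<String> load(Path wordfile) throws IOException {
        HashSet<String> words = new HashSet<>();
        // Assumption: the file is UTF-8 encoded.
        for (String line : Files.readAllLines(wordfile, StandardCharsets.UTF_8)) {
            // Trim surrounding whitespace, then lowercase so entries match
            // tokens produced by a lowercasing analysis chain.
            words.add(line.trim().toLowerCase(Locale.ROOT));
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("stopwords", ".txt");
        try {
            Files.write(tmp, Arrays.asList("The", "  AND ", "of"), StandardCharsets.UTF_8);
            System.out.println(load(tmp).contains("the")); // prints "true"
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```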

public static HashSet<String> getWordSet (File wordfile, String comment)

Loads a text file and adds every non-comment line as an entry to a HashSet (omitting leading and trailing whitespace). Every line of the file should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

Parameters

wordfile	File containing the wordlist
comment	The comment string to ignore

Returns

A HashSet with the file's words

Throws

IOException
