public abstract class LogMergePolicy
extends MergePolicy
java.lang.Object
   ↳ org.apache.lucene.index.MergePolicy
     ↳ org.apache.lucene.index.LogMergePolicy
Known Direct Subclasses
LogByteSizeMergePolicy, LogDocMergePolicy

Class Overview

This class implements a MergePolicy that tries to merge segments into levels of exponentially increasing size, where each level has fewer segments than the value of the merge factor. Whenever extra segments (beyond the merge factor upper bound) are encountered, all segments within the level are merged. You can get or set the merge factor using getMergeFactor() and setMergeFactor(int) respectively.
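The level behavior described above can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions, not the policy's actual code: every flush is assumed to produce a segment of size 1, and two segments are treated as being on the same level only when their sizes are exactly equal. The class and method names (MergeFactorSketch, simulate, tailIsFullLevel) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class MergeFactorSketch {

    /** Returns the segment sizes left after the given number of flushes. */
    static List<Long> simulate(int flushes, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        List<Long> segments = new ArrayList<>();
        for (int f = 0; f < flushes; f++) {
            segments.add(1L); // assumption: each flush yields a size-1 segment
            // A level that reaches mergeFactor segments is merged into one
            // larger segment, which may in turn fill the next level up.
            while (tailIsFullLevel(segments, mergeFactor)) {
                long merged = 0;
                for (int i = 0; i < mergeFactor; i++) {
                    merged += segments.remove(segments.size() - 1);
                }
                segments.add(merged);
            }
        }
        return segments;
    }

    /** True if the last mergeFactor segments all have the same size. */
    static boolean tailIsFullLevel(List<Long> segments, int mergeFactor) {
        int n = segments.size();
        if (n < mergeFactor) {
            return false;
        }
        long last = segments.get(n - 1);
        for (int i = n - mergeFactor; i < n; i++) {
            if (segments.get(i) != last) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(simulate(25, 10));
    }
}
```

In this model the segment sizes grow as powers of the merge factor, and no level ever holds mergeFactor segments once merging has finished.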

This class is abstract and requires a subclass to define the size(SegmentInfo) method which specifies how a segment's size is determined. LogDocMergePolicy is one subclass that measures size by document count in the segment. LogByteSizeMergePolicy is another subclass that measures size as the total byte size of the file(s) for the segment.

Summary

Constants
int DEFAULT_MAX_MERGE_DOCS Default maximum segment size.
int DEFAULT_MERGE_FACTOR Default merge factor, which is how many segments are merged at a time.
double DEFAULT_NO_CFS_RATIO Default noCFSRatio.
double LEVEL_LOG_SPAN Defines the allowed range of log(size) for each level.
Fields
protected boolean calibrateSizeByDeletes
protected double noCFSRatio
Inherited Fields
From class org.apache.lucene.index.MergePolicy
Public Constructors
LogMergePolicy(IndexWriter writer)
Public Methods
void close()
Release all resources for the policy.
MergePolicy.MergeSpecification findMerges(SegmentInfos infos)
Checks if any merges are now necessary and returns a MergePolicy.MergeSpecification if so.
MergePolicy.MergeSpecification findMergesForOptimize(SegmentInfos infos, int maxNumSegments, Set<SegmentInfo> segmentsToOptimize)
Returns the merges necessary to optimize the index.
MergePolicy.MergeSpecification findMergesToExpungeDeletes(SegmentInfos segmentInfos)
Finds merges necessary to expunge all deletes from the index.
boolean getCalibrateSizeByDeletes()
Returns true if the segment size should be calibrated by the number of deletes when choosing segments for merge.
int getMaxMergeDocs()
Returns the largest segment (measured by document count) that may be merged with other segments.
int getMergeFactor()
Returns the number of segments that are merged at once and also controls the total number of segments allowed to accumulate in the index.
double getNoCFSRatio()
boolean getUseCompoundDocStore()
Returns true if newly flushed and newly merged doc store segment files (term vectors and stored fields) are written in compound file format.
boolean getUseCompoundFile()
Returns true if newly flushed and newly merged segments are written in compound file format.
void setCalibrateSizeByDeletes(boolean calibrateSizeByDeletes)
Sets whether the segment size should be calibrated by the number of deletes when choosing segments for merge.
void setMaxMergeDocs(int maxMergeDocs)
Determines the largest segment (measured by document count) that may be merged with other segments.
void setMergeFactor(int mergeFactor)
Determines how often segment indices are merged by addDocument().
void setNoCFSRatio(double noCFSRatio)
If a merged segment will be larger than this fraction of the total size of the index, leave the segment as non-compound file even if compound file is enabled.
void setUseCompoundDocStore(boolean useCompoundDocStore)
Sets whether compound file format should be used for newly flushed and newly merged doc store segment files (term vectors and stored fields).
void setUseCompoundFile(boolean useCompoundFile)
Sets whether compound file format should be used for newly flushed and newly merged segments.
boolean useCompoundDocStore(SegmentInfos infos)
Returns true if the doc store files should use the compound file format.
boolean useCompoundFile(SegmentInfos infos, SegmentInfo info)
Returns true if a newly flushed (not from merge) segment should use the compound file format.
Protected Methods
MergePolicy.OneMerge makeOneMerge(SegmentInfos infos, SegmentInfos infosToMerge)
abstract long size(SegmentInfo info)
long sizeBytes(SegmentInfo info)
long sizeDocs(SegmentInfo info)
boolean verbose()
Inherited Methods
From class org.apache.lucene.index.MergePolicy
From class java.lang.Object
From interface java.io.Closeable

Constants

public static final int DEFAULT_MAX_MERGE_DOCS

Default maximum segment size. A segment of this size or larger will never be merged. See setMaxMergeDocs(int).

Constant Value: 2147483647 (0x7fffffff)

public static final int DEFAULT_MERGE_FACTOR

Default merge factor, which is how many segments are merged at a time.

Constant Value: 10 (0x0000000a)

public static final double DEFAULT_NO_CFS_RATIO

Default noCFSRatio. If a merge's size is >= 10% of the index, then we disable compound file for it.

Constant Value: 0.1

public static final double LEVEL_LOG_SPAN

Defines the allowed range of log(size) for each level. A level is computed by taking the max segment log size, minus LEVEL_LOG_SPAN, and finding all segments falling within that range.

Constant Value: 0.75
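As a rough illustration of how a level could be derived from LEVEL_LOG_SPAN, the sketch below picks out the segments belonging to the top level. It assumes the logarithm is taken in base mergeFactor, which this page does not state, and the names LevelSketch, logSize, and topLevel are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class LevelSketch {
    static final double LEVEL_LOG_SPAN = 0.75;

    /** Assumption: log(size) is taken in base mergeFactor. */
    static double logSize(long size, int mergeFactor) {
        return Math.log(Math.max(1L, size)) / Math.log(mergeFactor);
    }

    /** Returns the indices of segments whose log size is within LEVEL_LOG_SPAN of the maximum. */
    static List<Integer> topLevel(long[] sizes, int mergeFactor) {
        double max = Double.NEGATIVE_INFINITY;
        for (long s : sizes) {
            max = Math.max(max, logSize(s, mergeFactor));
        }
        double floor = max - LEVEL_LOG_SPAN; // lower bound of the level
        List<Integer> level = new ArrayList<>();
        for (int i = 0; i < sizes.length; i++) {
            if (logSize(sizes[i], mergeFactor) >= floor) {
                level.add(i);
            }
        }
        return level;
    }

    public static void main(String[] args) {
        long[] sizes = {100_000, 30_000, 900, 50};
        System.out.println(topLevel(sizes, 10));
    }
}
```

With a merge factor of 10, a span of 0.75 means a level covers segments down to roughly 18% of the largest segment's size (10 raised to the power -0.75).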

Fields

protected boolean calibrateSizeByDeletes

protected double noCFSRatio

Public Constructors

public LogMergePolicy (IndexWriter writer)

Public Methods

public void close ()

Release all resources for the policy.

public MergePolicy.MergeSpecification findMerges (SegmentInfos infos)

Checks if any merges are now necessary and returns a MergePolicy.MergeSpecification if so. A merge is necessary when there are more than mergeFactor segments at a given level (see setMergeFactor(int)). When multiple levels have too many segments, this method will return multiple merges, allowing the MergeScheduler to use concurrency.

Parameters
infos the total set of segments in the index
Throws
IOException

public MergePolicy.MergeSpecification findMergesForOptimize (SegmentInfos infos, int maxNumSegments, Set<SegmentInfo> segmentsToOptimize)

Returns the merges necessary to optimize the index. This merge policy defines "optimized" to mean only one segment in the index, where that segment has neither pending deletions nor separate norms, and is in compound file format if the current useCompoundFile setting is true. This method returns multiple merges (mergeFactor at a time) so that the MergeScheduler in use can run them concurrently.

Parameters
infos the total set of segments in the index
maxNumSegments requested maximum number of segments in the index (currently this is always 1)
segmentsToOptimize contains the specific SegmentInfo instances that must be merged away. This may be a subset of all SegmentInfos.
Throws
IOException

public MergePolicy.MergeSpecification findMergesToExpungeDeletes (SegmentInfos segmentInfos)

Finds merges necessary to expunge all deletes from the index. We simply merge adjacent segments that have deletes, up to mergeFactor at a time.

Parameters
segmentInfos the total set of segments in the index
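
The grouping described for findMergesToExpungeDeletes can be sketched as follows. This is a minimal model, not the library's implementation: segments are reduced to a boolean "has deletes" flag, each planned merge is a half-open index range, and a run of a single segment is assumed to count as a merge of its own. The names ExpungeDeletesSketch and plan are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ExpungeDeletesSketch {

    /**
     * Groups adjacent segments that have deletes into merges of at most
     * mergeFactor segments. Each result is {startInclusive, endExclusive}.
     */
    static List<int[]> plan(boolean[] hasDeletes, int mergeFactor) {
        List<int[]> merges = new ArrayList<>();
        int start = -1; // start of the current run, or -1 if no run is open
        for (int i = 0; i < hasDeletes.length; i++) {
            if (hasDeletes[i]) {
                if (start == -1) {
                    start = i;
                } else if (i - start == mergeFactor) {
                    // The run is full: emit it and open a new one here.
                    merges.add(new int[] {start, i});
                    start = i;
                }
            } else if (start != -1) {
                // A segment without deletes closes the current run.
                merges.add(new int[] {start, i});
                start = -1;
            }
        }
        if (start != -1) {
            merges.add(new int[] {start, hasDeletes.length});
        }
        return merges;
    }

    public static void main(String[] args) {
        for (int[] m : plan(new boolean[] {true, true, true, false, true}, 2)) {
            System.out.println("[" + m[0] + ", " + m[1] + ")");
        }
    }
}
```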

public boolean getCalibrateSizeByDeletes ()

Returns true if the segment size should be calibrated by the number of deletes when choosing segments for merge.

public int getMaxMergeDocs ()

Returns the largest segment (measured by document count) that may be merged with other segments.

public int getMergeFactor ()

Returns the number of segments that are merged at once and also controls the total number of segments allowed to accumulate in the index.

public double getNoCFSRatio ()

public boolean getUseCompoundDocStore ()

Returns true if newly flushed and newly merged doc store segment files (term vectors and stored fields) are written in compound file format. See setUseCompoundDocStore(boolean).

public boolean getUseCompoundFile ()

Returns true if newly flushed and newly merged segments are written in compound file format. See setUseCompoundFile(boolean).

public void setCalibrateSizeByDeletes (boolean calibrateSizeByDeletes)

Sets whether the segment size should be calibrated by the number of deletes when choosing segments for merge.
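
This page does not spell out how the calibration is applied. One plausible reading, shown here purely as an assumption, is that deleted documents are discounted from a segment's size: subtracted from the document count, or used to scale the byte size proportionally. The Segment class and both helper methods below are hypothetical stand-ins, not Lucene types.

```java
final class CalibratedSizeSketch {

    /** Hypothetical stand-in for the per-segment numbers a policy would read. */
    static final class Segment {
        final int docCount;
        final int delCount;
        final long byteSize;

        Segment(int docCount, int delCount, long byteSize) {
            this.docCount = docCount;
            this.delCount = delCount;
            this.byteSize = byteSize;
        }
    }

    /** Document-count size, optionally excluding deleted documents. */
    static long sizeDocs(Segment s, boolean calibrateSizeByDeletes) {
        return calibrateSizeByDeletes ? (long) s.docCount - s.delCount : s.docCount;
    }

    /** Byte size, optionally scaled down by the fraction of deleted documents. */
    static long sizeBytes(Segment s, boolean calibrateSizeByDeletes) {
        if (!calibrateSizeByDeletes || s.docCount <= 0) {
            return s.byteSize;
        }
        double delRatio = (double) s.delCount / s.docCount;
        return (long) (s.byteSize * (1.0 - delRatio));
    }

    public static void main(String[] args) {
        Segment s = new Segment(1000, 250, 4000L);
        System.out.println(sizeDocs(s, true) + " docs, " + sizeBytes(s, true) + " bytes");
    }
}
```

Under this reading, a segment with many deletes looks smaller to the policy and so tends to land on a lower level, where it is more likely to be picked for merging.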

public void setMaxMergeDocs (int maxMergeDocs)

Determines the largest segment (measured by document count) that may be merged with other segments. Small values (e.g., less than 10,000) are best for interactive indexing, as this limits the length of pauses while indexing to a few seconds. Larger values are best for batched indexing and speedier searches.

The default value is Integer.MAX_VALUE.

The default merge policy (LogByteSizeMergePolicy) also allows you to set this limit by net size (in MB) of the segment, using setMaxMergeMB(double).

public void setMergeFactor (int mergeFactor)

Determines how often segment indices are merged by addDocument(). With smaller values, less RAM is used while indexing, and searches on unoptimized indices are faster, but indexing speed is slower. With larger values, more RAM is used during indexing, and while searches on unoptimized indices are slower, indexing is faster. Thus larger values (> 10) are best for batch index creation, and smaller values (< 10) for indices that are interactively maintained.

public void setNoCFSRatio (double noCFSRatio)

If a merged segment will be larger than this fraction of the total size of the index (for example, 0.1 means 10%), leave the segment as non-compound file even if compound file is enabled. Set to 1.0 to always use CFS regardless of merge size.
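
The decision this setting controls might look like the sketch below. It is a hedged model rather than the library's code: sizes are plain long values, the boundary case follows the ">=" wording given for DEFAULT_NO_CFS_RATIO, and the names NoCfsRatioSketch and useCompoundFileForMerge are hypothetical.

```java
final class NoCfsRatioSketch {
    static final double DEFAULT_NO_CFS_RATIO = 0.1;

    /** Decides whether a merged segment would be written in compound file format. */
    static boolean useCompoundFileForMerge(long mergedSize, long totalIndexSize,
                                           boolean useCompoundFile, double noCFSRatio) {
        if (!useCompoundFile) {
            return false; // compound file is disabled altogether
        }
        if (noCFSRatio >= 1.0) {
            return true; // 1.0 means: always use CFS regardless of merge size
        }
        // Large merges (at or above the ratio) stay in non-compound format.
        return mergedSize < noCFSRatio * totalIndexSize;
    }

    public static void main(String[] args) {
        System.out.println(useCompoundFileForMerge(50, 1000, true, DEFAULT_NO_CFS_RATIO));
        System.out.println(useCompoundFileForMerge(200, 1000, true, DEFAULT_NO_CFS_RATIO));
    }
}
```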

public void setUseCompoundDocStore (boolean useCompoundDocStore)

Sets whether compound file format should be used for newly flushed and newly merged doc store segment files (term vectors and stored fields).

public void setUseCompoundFile (boolean useCompoundFile)

Sets whether compound file format should be used for newly flushed and newly merged segments.

public boolean useCompoundDocStore (SegmentInfos infos)

Returns true if the doc store files should use the compound file format.

public boolean useCompoundFile (SegmentInfos infos, SegmentInfo info)

Returns true if a newly flushed (not from merge) segment should use the compound file format.

Protected Methods

protected MergePolicy.OneMerge makeOneMerge (SegmentInfos infos, SegmentInfos infosToMerge)

Throws
IOException

protected abstract long size (SegmentInfo info)

Throws
IOException

protected long sizeBytes (SegmentInfo info)

Throws
IOException

protected long sizeDocs (SegmentInfo info)

Throws
IOException

protected boolean verbose ()