
5.6.3 The Munch-list Command

The munch-list command reduces the size of a word list via affix compression. It reduces a list of words to a minimal (or close to minimal) set of roots and affixes that matches the same list of words. The list of words is read from standard input and the result, the “munched” list, is written to standard output. Its usage is:

aspell munch-list [keep] [single|multi] [simple] < infile > outfile

where ‘simple’, ‘single’, ‘multi’, and ‘keep’ are literal values.
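As a minimal sketch of the usage line above, the invocation can be wrapped in a small shell function. The function name `munch_wordlist` and the file names in the comments are hypothetical. The `--lang` option is used here on the assumption that a dictionary with affix data is installed for that language.

```shell
# munch_wordlist LANG INFILE [FLAGS...]
# Writes the munched list for INFILE to standard output.
# (Hypothetical helper; only the "aspell ... munch-list" call comes from the manual.)
munch_wordlist() {
    lang=$1
    infile=$2
    shift 2
    # Any remaining arguments (keep, single, multi, simple) are passed through unchanged.
    aspell --lang="$lang" munch-list "$@" < "$infile"
}

# Example calls, assuming words.txt holds one word per line:
#   munch_wordlist en words.txt > words.munched
#   munch_wordlist en words.txt keep multi > words.keep-multi.munched
```

Keeping the flags as pass-through arguments means the wrapper does not need to change if the set of accepted flags changes.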

The default algorithm should give near-optimal results. In some cases the set of words returned is provably the minimum number possible. In the typical case the number of words returned is within 1% of the optimum.
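Since the munched list is meant to match the same words as the input, one way to sanity-check a result is to expand it again and compare the two word sets. The sketch below assumes that `aspell expand` at its default level prints only the expanded words, separated by spaces or newlines. The helper name `roundtrip_check` is hypothetical.

```shell
# roundtrip_check LANG INFILE
# Returns 0 if expanding the munched list yields the same set of words as INFILE.
# (Hypothetical helper; assumes "aspell expand" prints plain words.)
roundtrip_check() {
    lang=$1
    infile=$2
    orig=$(mktemp) || return 2
    back=$(mktemp) || return 2
    # Normalise the input so that order and duplicates do not matter.
    sort -u "$infile" > "$orig"
    # Munch, expand again, put one word per line, and normalise the same way.
    aspell --lang="$lang" munch-list < "$infile" |
        aspell --lang="$lang" expand |
        tr -s ' ' '\n' |
        sort -u > "$back"
    cmp -s "$orig" "$back"
    status=$?
    rm -f "$orig" "$back"
    return $status
}

# Example, assuming words.txt holds one word per line:
#   roundtrip_check en words.txt && echo "round trip ok"
```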

By default Aspell removes redundant affix flags. The ‘keep’ flag retains them, which can be useful if you want to include all possible expansions for each base word.

When cross products are involved it may be beneficial to list a base word more than once. Unfortunately, the current version of Aspell cannot correctly handle multiple base words in a dictionary. Therefore, the current default behavior is to include only the one with the most expansions. All of them can be included via the ‘multi’ flag. Once Aspell is able to handle multiple base words the default will be to include them all. The ‘single’ flag can be used to include only one of them.
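To see how much the ‘single’ and ‘multi’ flags differ for a particular word list, the number of entries in each result can be compared. This sketch assumes the munched list has one entry per line. The helper name `count_entries` and the file name in the comment are hypothetical.

```shell
# count_entries LANG INFILE [FLAGS...]
# Prints the number of lines in the munched list.
# (Hypothetical helper; assumes one entry per output line.)
count_entries() {
    lang=$1
    infile=$2
    shift 2
    aspell --lang="$lang" munch-list "$@" < "$infile" | wc -l | tr -d ' '
}

# Example, assuming words.txt holds one word per line:
#   for mode in single multi; do
#       printf '%s: %s\n' "$mode" "$(count_entries en words.txt "$mode")"
#   done
```

A higher count under ‘multi’ would indicate base words that were listed more than once.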

The ‘simple’ flag selects an alternate, faster algorithm. This algorithm is very similar to the munch command distributed with MySpell (the Open Office spell checker); however, it doesn’t give nearly as good results. It does okay for the English word list but not for some other languages such as German: the normal algorithm reduced a list of 312,002 German words to 79,420 base words, while the simple algorithm only reduced it to 115,927 words. This algorithm may disappear in a future version of Aspell.
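To put the German figures in proportion, the remaining list size can be expressed as a percentage of the original. The sketch below only does that arithmetic with the numbers quoted above. The helper name `percent_of` is hypothetical.

```shell
# percent_of PART WHOLE
# Prints PART as a percentage of WHOLE, rounded to one decimal place.
percent_of() {
    awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.1f\n", 100 * part / whole }'
}

percent_of 79420 312002     # normal algorithm: prints 25.5
percent_of 115927 312002    # simple algorithm: prints 37.2
```

By this measure the normal algorithm leaves roughly a quarter of the original entries, while the simple algorithm leaves more than a third.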

© 2000-2018
Individual documents may contain additional copyright information.