manpagez: man pages & more
man patgen(1)
Home | html | info | man
patgen(1)                                                            patgen(1)




NAME

       patgen - generate patterns for TeX hyphenation


SYNOPSIS

       patgen dictionary_file pattern_file patout_file translate_file


DESCRIPTION

       This manual page is not meant to be exhaustive.  See also the Info file
       or manual Web2C: A TeX implementation available as part of the TeX Live
       distribution or at http://tug.org/web2c.

       The  patgen  program  reads  the  dictionary_file  containing a list of
       hyphenated words and the pattern_file  containing  previously-generated
       patterns  (if any) for a particular language (not a complete TeX source
       file; see below), and produces the patout_file with  (previously-  plus
       newly-generated)  hyphenation  patterns  for  that language. The trans-
       late_file  defines  language  specific  values   for   the   parameters
       left_hyphen_min  and  right_hyphen_min  used by TeX's hyphenation algo-
       rithm and the external representation of the lower and upper case  ver-
       sion(s)  of all `letters' of that language. Further details of the pat-
       tern generation process such as hyphenation levels and pattern  lengths
       are requested interactively from the user's terminal. Optionally patgen
       creates a new dictionary file pattmp.n showing the good and bad hyphens
       found  by  the  generated  patterns, where n is the highest hyphenation
       level.

       The patterns generated by patgen can be  read  by  initex  for  use  in
       hyphenating  words.  For  a  real-life  example of patgen's output, see
       $TEXMFMAIN/tex/generic/hyphen/hyphen.tex, which contains  the  patterns
       TeX  uses  for  English by default.  At some sites, patterns for (many)
       other languages may be available, and the local tex programs  may  have
       them preloaded.

       All filenames must be complete; no adding of default extensions or path
       searching is done.


FILE FORMATS

       Letters
           When initex digests hyphenation patterns, TeX first expands  macros
           and  the  result  must entirely consist of digits (hyphenation lev-
           els), dots (`.', edge of a word), and letters. In pattern files for
           non-English  languages  letters  are often represented by macros or
           other expandable constructs.  For the purpose of patgen  these  are
           just  character  sequences,  subject  to the condition that no such
           sequence is a prefix of another one.

       Dictionary file
           A dictionary file contains a weighted list of hyphenated words, one
           word per line starting in column 1. A digit in column 1 indicates a
           global word weight (initially =1) applicable to all following words
           up  to  the next global word weight. A digit at some intercharacter
           position indicates a weight for that position only.

           The hyphens in a word are indicated by `-', `*', or `.'  (or  their
           replacements  as  defined in the translate file) for hyphens yet to
           be found, `good' hyphens (correctly found  by  the  patterns),  and
           `bad'  hyphens  (erroneously  found  by the patterns) respectively;
           when reading a dictionary file `*' is treated like `-' and  `.'  is
           ignored.

       Pattern file
           A  pattern  file  contains only patterns in the format above, e.g.,
           from a previous run of patgen.  It may not contain any TeX comments
           or  control  sequences.   For instance, this is not a valid pattern
           file:

           % this is a pattern file read by TeX.
           \patterns{%
            ...
           }
           It can only contain the actual patterns, i.e., the `...'.

       Translate file
           A translate file starts  with  a  line  containing  the  values  of
           left_hyphen_min  in  columns  1-2, right_hyphen_min in columns 3-4,
           and either a blank or the replacement for one of the "hyphen" char-
           acters  `-',  `*', and `.' in columns 5, 6, and 7. (Input lines are
           padded with blanks as for many TeX related programs.)

           Each following line defines one `letter':  an  arbitrary  delimiter
           character in column 1, followed by one or more external representa-
           tions of that character (first the `lower' case one used  for  out-
           put),  each  one terminated by the delimiter and the whole sequence
           terminated by another delimiter.

           If the translate  file  is  empty,  the  values  left_hyphen_min=2,
           right_hyphen_min=3,  and the 26 lower case letters a...z with their
           upper case representations A...Z are assumed.

       Terminal input
           After reading the translate_file and any previously-generated  pat-
           terns from pattern_file, patgen requests input from the user's ter-
           minal.

           First the integer values of hyph_start and hyph_finish, the  lowest
           and  highest  hyphenation level for which patterns are to be gener-
           ated. The value of hyph_start should be larger than any hyphenation
           level already present in pattern_file.

           Then,  for  each hyphenation level, the integer values of pat_start
           and pat_finish, the smallest and largest pattern length to be  ana-
           lyzed,  as  well  as  good  weight,  bad weight, and threshold, the
           weights for good and bad hyphens and a weight threshold for  useful
           patterns.

           Finally  the decision (`y' or `Y' vs. anything else) whether or not
           to produce a hyphenated word list.


FILES

       $TEXMFMAIN/tex/generic/hyphen/hyphen.tex
           The original hyphenation patterns for English, by Donald Knuth  and
           Frank Liang.

       http://www.ctan.org/pkg/ushyph
           Additional  hyphenation  patterns  for  English, extended by Gerard
           Kuiken.

       http://www.ctan.org/pkg/hyph-utf8
           Collected hyphenation patterns for many languages in many  formats.

       http://www.ctan.org/tex-archive/language/
           General CTAN directory for patterns and support for many other lan-
           guages.


SEE ALSO

       Frank Liang and Peter Breitenlohner, patgen.web.

       Frank Liang, Word hy-phen-a-tion by com-puter, STAN-CS-83-977, Stanford
       University Ph.D. thesis, 1983, http://tug.org/docs/liang.

       Donald E. Knuth, The TeXbook, Addison-Wesley, 1986, ISBN 0-201-13447-0,
       Appendix H.


AUTHORS

       Frank Liang wrote the first version of this  program.   Peter  Breiten-
       lohner  made  a substantial revision in 1991 for TeX 3.  The first ver-
       sion was published as the appendix to  the  TeXware  technical  report.
       Howard Trickey originally ported it to Unix.



Web2C 2016                       16 June 2015                        patgen(1)

texlive-bin-extra 41101 - Generated Sat Jul 2 12:49:38 CDT 2016
© manpagez.com 2000-2025
Individual documents may contain additional copyright information.