PATGEN(1)                   General Commands Manual                  PATGEN(1)

NAME
       patgen - generate patterns for TeX hyphenation

SYNOPSIS
       patgen dictionary_file pattern_file patout_file translate_file

DESCRIPTION
       This manual page is not meant to be exhaustive.  See also the Info file
       or manual "Web2C: A TeX implementation", available as part of the TeX
       Live distribution or at http://tug.org/web2c.

       The patgen program reads the dictionary_file, which contains a list of
       hyphenated words, and the pattern_file, which contains any
       previously-generated patterns for a particular language (not a complete
       TeX source file; see below). It produces the patout_file with the
       previously- plus newly-generated hyphenation patterns for that
       language. The translate_file defines language-specific values for the
       parameters left_hyphen_min and right_hyphen_min used by TeX's
       hyphenation algorithm, as well as the external representation of the
       lower and upper case version(s) of all `letters' of that language.
       Further details of the pattern generation process, such as hyphenation
       levels and pattern lengths, are requested interactively from the user's
       terminal. Optionally, patgen creates a new dictionary file pattmp.n
       showing the good and bad hyphens found by the generated patterns, where
       n is the highest hyphenation level.

       The patterns generated by patgen can be read by initex for use  in  hy-
       phenating  words.  For  a  real-life  example  of  patgen's output, see
       $TEXMFMAIN/tex/generic/hyphen/hyphen.tex, which contains  the  patterns
       TeX  uses  for  English by default.  At some sites, patterns for (many)
       other languages may be available, and the local tex programs  may  have
       them preloaded.

       All filenames must be complete; no adding of default extensions or path
       searching is done.

FILE FORMATS
       Letters
           When initex digests hyphenation patterns, TeX first expands  macros
           and  the  result  must entirely consist of digits (hyphenation lev-
           els), dots (`.', edge of a word), and letters. In pattern files for
           non-English  languages  letters  are often represented by macros or
           other expandable constructs.  For the purpose of patgen  these  are
           just character sequences, subject to the condition that no such se-
           quence is a prefix of another one.
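
           The prefix condition above can be checked mechanically before
           running patgen. The following is a minimal sketch in Python; the
           function name prefix_conflicts and the sample representations are
           illustrative assumptions, not part of patgen.

```python
from typing import Iterable, List, Tuple


def prefix_conflicts(representations: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (shorter, longer) pairs where one representation is a prefix
    of another. An empty result means the set satisfies the condition."""
    reps = sorted(set(representations))
    # Quadratic comparison is fine here: alphabets have few 'letters'.
    return [(a, b) for a in reps for b in reps if a != b and b.startswith(a)]


if __name__ == "__main__":
    # Hypothetical letter representations for illustration only.
    print(prefix_conflicts(["a", "\\ae", '\\"a']))   # no conflicts
    print(prefix_conflicts(["\\a", "\\ae"]))          # one conflict
```

           A non-empty result lists the pairs that would have to be changed,
           for example by choosing a different macro name for one of them.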

       Dictionary file
           A dictionary file contains a weighted list of hyphenated words, one
           word per line starting in column 1. A digit in column 1 sets a
           global word weight (initially 1) that applies to all following
           words up to the next global word weight. A digit at some
           intercharacter position indicates a weight for that position only.

           The hyphens in a word are indicated by `-', `*', or `.'  (or  their
           replacements  as  defined in the translate file) for hyphens yet to
           be found, `good' hyphens (correctly found  by  the  patterns),  and
           `bad'  hyphens  (erroneously  found  by the patterns) respectively;
           when reading a dictionary file `*' is treated like `-' and  `.'  is
           ignored.
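
           The reading rules above can be illustrated with a small Python
           sketch. It is not patgen's implementation: the function name
           parse_entry is hypothetical, and the sketch assumes the default
           hyphen characters and letters that are single characters.

```python
from typing import Dict, List, Tuple

DIGITS = "0123456789"


def parse_entry(line: str, global_weight: int = 1
                ) -> Tuple[str, List[int], int, Dict[int, int]]:
    """Parse one dictionary line.

    Returns (word, hyphen_positions, global_weight, position_weights).
    A position p means "between letter p-1 and letter p" of the word.
    """
    letters: List[str] = []
    hyphens: List[int] = []
    weights: Dict[int, int] = {}
    for col, ch in enumerate(line.rstrip("\n")):
        if ch in DIGITS:
            if col == 0:
                global_weight = int(ch)          # digit in column 1
            else:
                weights[len(letters)] = int(ch)  # weight for this position
        elif ch in "-*":
            hyphens.append(len(letters))         # `*' is treated like `-'
        elif ch == ".":
            continue                             # `.' is ignored on input
        else:
            letters.append(ch)
    return "".join(letters), hyphens, global_weight, weights


if __name__ == "__main__":
    print(parse_entry("hy-phen*a.tion"))
    print(parse_entry("3ta-ble"))
```

           The returned global weight would be passed back in when parsing the
           next line, so that it stays in effect until another digit appears
           in column 1.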

       Pattern file
           A  pattern  file  contains only patterns in the format above, e.g.,
           from a previous run of patgen.  It may not contain any TeX comments
           or  control  sequences.   For instance, this is not a valid pattern
           file:

           % this is a pattern file read by TeX.
           \patterns{%
            ...
           }

           A valid pattern file contains only the actual patterns, i.e., the
           `...'.
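
           One way to obtain such a file from a TeX source is to strip the
           comments and keep only the text inside \patterns{...}. The
           following Python sketch shows the idea under simplifying
           assumptions: the function name extract_patterns is hypothetical,
           only the first \patterns block is read, and writing one pattern per
           line is an assumed, conservative layout.

```python
import re


def extract_patterns(tex_source: str) -> str:
    """Return the bare patterns from a TeX source, one per line."""
    # Drop TeX comments: an unescaped % through the end of the line.
    text = re.sub(r"(?<!\\)%.*", "", tex_source)
    marker = "\\patterns{"
    start = text.find(marker)
    if start < 0:
        raise ValueError("no \\patterns{...} block found")
    # Scan forward to the brace that closes \patterns{.
    i = body_start = start + len(marker)
    depth = 1
    while i < len(text) and depth > 0:
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
        i += 1
    if depth != 0:
        raise ValueError("unbalanced braces in \\patterns block")
    body = text[body_start:i - 1]
    return "\n".join(body.split()) + "\n"


if __name__ == "__main__":
    sample = "% read by TeX\n\\patterns{%\n.ach4 .ad4der\n.af1t\n}\n"
    print(extract_patterns(sample), end="")
```

           Any remaining control sequences inside the block would still have
           to be declared as `letters' in the translate file or removed by
           hand.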

       Translate file
           A translate file starts with a line containing the value of
           left_hyphen_min in columns 1-2, the value of right_hyphen_min in
           columns 3-4, and, in columns 5, 6, and 7, either a blank or the
           replacement for one of the "hyphen" characters `-', `*', and `.'.
           (Input lines are padded with blanks, as for many TeX-related
           programs.)

           Each following line defines one `letter':  an  arbitrary  delimiter
           character in column 1, followed by one or more external representa-
           tions of that character (first the `lower' case one used  for  out-
           put),  each  one terminated by the delimiter and the whole sequence
           terminated by another delimiter.

           If the translate  file  is  empty,  the  values  left_hyphen_min=2,
           right_hyphen_min=3,  and the 26 lower case letters a...z with their
           upper case representations A...Z are assumed.
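
           The layout described above can be produced with a few lines of
           Python. This is a sketch only: the helper names header_line and
           letter_line are hypothetical, and the caller supplies columns 5-7
           verbatim because their order is not spelled out here.

```python
from typing import Sequence


def header_line(left_hyphen_min: int, right_hyphen_min: int,
                columns_5_to_7: str = "   ") -> str:
    """First line: two right-aligned 2-column numbers, then columns 5-7."""
    if not (0 <= left_hyphen_min <= 99 and 0 <= right_hyphen_min <= 99):
        raise ValueError("values must fit in two columns")
    if len(columns_5_to_7) != 3:
        raise ValueError("columns 5-7 need exactly three characters")
    # Trailing blanks may be dropped because input lines are blank-padded.
    return f"{left_hyphen_min:2d}{right_hyphen_min:2d}{columns_5_to_7}".rstrip()


def letter_line(delimiter: str, representations: Sequence[str]) -> str:
    """One `letter': delimiter, each representation plus delimiter, then
    one more delimiter. The first representation is the lower case one."""
    if len(delimiter) != 1 or not representations:
        raise ValueError("need a one-character delimiter and a representation")
    if any(delimiter in rep or not rep for rep in representations):
        raise ValueError("representations must be non-empty and delimiter-free")
    return delimiter + "".join(rep + delimiter for rep in representations) + delimiter


if __name__ == "__main__":
    print(repr(header_line(2, 3)))
    print(repr(letter_line(" ", ["a", "A"])))
    print(repr(letter_line("/", ["\\ae", "\\AE"])))
```

           With a blank as delimiter, the closing delimiters are trailing
           blanks, which is why hand-written translate files often appear to
           end each line right after the last representation.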

       Terminal input
           After reading the translate_file and any previously-generated  pat-
           terns from pattern_file, patgen requests input from the user's ter-
           minal.

           First, patgen asks for the integer values of hyph_start and
           hyph_finish, the lowest and highest hyphenation levels for which
           patterns are to be generated. The value of hyph_start should be
           larger than any hyphenation level already present in pattern_file.

           Then, for each hyphenation level, it asks for the integer values of
           pat_start and pat_finish, the smallest and largest pattern lengths
           to be analyzed, as well as good weight, bad weight, and threshold:
           the weights for good and bad hyphens and a weight threshold for
           useful patterns.

           Finally, it asks whether or not to produce a hyphenated word list
           (`y' or `Y' means yes; anything else means no).
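
           For repeatable runs it can help to prepare these answers in
           advance. The Python sketch below only builds the answer text in the
           order described above. The function name terminal_answers is
           hypothetical, and both the grouping of values per line and the idea
           of supplying the text on standard input are assumptions that should
           be checked against the prompts of the installed patgen.

```python
from typing import Sequence, Tuple

# (pat_start, pat_finish, good weight, bad weight, threshold) for one level
Level = Tuple[int, int, int, int, int]


def terminal_answers(hyph_start: int, levels: Sequence[Level],
                     want_word_list: bool = False) -> str:
    """Build the answers for levels hyph_start .. hyph_start+len(levels)-1."""
    if not levels:
        raise ValueError("at least one hyphenation level is required")
    hyph_finish = hyph_start + len(levels) - 1
    lines = [f"{hyph_start} {hyph_finish}"]
    for pat_start, pat_finish, good, bad, threshold in levels:
        if pat_start > pat_finish:
            raise ValueError("pat_start must not exceed pat_finish")
        lines.append(f"{pat_start} {pat_finish}")
        lines.append(f"{good} {bad} {threshold}")
    # `y' or `Y' requests the hyphenated word list; anything else declines.
    lines.append("y" if want_word_list else "n")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative values only, not recommended settings.
    print(terminal_answers(1, [(2, 3, 1, 2, 20), (3, 4, 2, 1, 8)]), end="")
```

           Keeping the answers in a file next to the dictionary and translate
           files records how a given set of patterns was generated.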

FILES
       $TEXMFMAIN/tex/generic/hyphen/hyphen.tex
           The original hyphenation patterns for English, by Donald Knuth  and
           Frank Liang.

       http://www.ctan.org/pkg/ushyph
           Additional  hyphenation  patterns  for  English, extended by Gerard
           Kuiken.

       http://www.ctan.org/pkg/hyph-utf8
           Collected hyphenation patterns for many languages in many formats.

       http://www.ctan.org/tex-archive/language/
           General CTAN directory for patterns and support for many other lan-
           guages.

SEE ALSO
       Frank Liang and Peter Breitenlohner, patgen.web.

       Frank Liang, Word hy-phen-a-tion by com-puter, STAN-CS-83-977, Stanford
       University Ph.D. thesis, 1983, http://tug.org/docs/liang.

       Donald E. Knuth, The TeXbook, Addison-Wesley, 1986, ISBN 0-201-13447-0,
       Appendix H.

AUTHORS
       Frank  Liang  wrote  the first version of this program.  Peter Breiten-
       lohner made a substantial revision in 1991 for TeX 3.  The  first  ver-
       sion  was  published  as  the appendix to the TeXware technical report.
       Howard Trickey originally ported it to Unix.

Web2C 2019                       16 June 2015                        PATGEN(1)