MORPHY(7WN)                       WordNet(tm)                      MORPHY(7WN)

NAME
       morphy - discussion of WordNet's morphological processing

DESCRIPTION
       Although  only  base  forms  of  words  are  usually stored in WordNet,
       searches may be done on inflected forms.  A  set  of  morphology  func-
       tions,  Morphy, is applied to the search string to generate a form that
       is present in WordNet.

       Morphology in WordNet uses two types of processes to try to convert the
       string  passed  into  one  that  can  be found in the WordNet database.
       There are lists of inflectional endings, based on  syntactic  category,
       that can be detached from individual words in an attempt to find a form
       of the word that is in WordNet.  There are also exception  list  files,
       one  for  each  syntactic  category, in which a search for an inflected
       form is done.  Morphy tries to use these two processes in  an  intelli-
       gent  manner  to  translate the string passed to the base form found in
       WordNet.  Morphy first checks for exceptions, then uses  the  rules  of
       detachment.  The Morphy functions are not independent from WordNet. Af-
       ter each transformation, WordNet is searched for the  resulting  string
       in the syntactic category specified.

       The  Morphy  functions are passed a string and a syntactic category.  A
       string is either a single word or a collocation.  Some words, such as
       axes, have more than one base form (axe and axis), so Morphy returns
       them one per call.  The first time that Morphy is called
       with  a  specific  string, it returns a base form.  For each subsequent
       call to Morphy made with a NULL string argument, Morphy returns another
       base form.  Whenever Morphy cannot perform a transformation, whether on
       the first call for a word or subsequent calls,  NULL  is  returned.   A
       transformation  to  a valid English string will return NULL if the base
       form of the string is not in WordNet.
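
       The calling convention described above can be sketched in Python.  This
       is an illustration only: the class MorphyState and its base_forms table
       are hypothetical stand-ins for the library's internal state, and None
       plays the role of NULL.

```python
class MorphyState:
    """Toy model of the call protocol; not the WordNet library API."""

    def __init__(self, base_forms):
        # base_forms maps an inflected string to its base forms (assumed data).
        self._base_forms = base_forms
        self._pending = []

    def morphy(self, string):
        # A non-None argument starts a new search; None continues the last one.
        if string is not None:
            self._pending = list(self._base_forms.get(string, []))
        # Return the next base form, or None when nothing is left.
        return self._pending.pop(0) if self._pending else None


state = MorphyState({"axes": ["axe", "axis"]})
results = [state.morphy("axes"), state.morphy(None), state.morphy(None)]
print(results)  # ['axe', 'axis', None]
```

       A caller would typically loop until None is returned.  See morph(3WN)
       for the actual function signatures.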

       The morphological functions are found  in  the  WordNet  library.   See
       morph(3WN) for information on using these functions.

   Rules of Detachment
       The following table shows the rules of detachment used by Morphy.  If a
       word ends with one of the suffixes, the suffix is stripped from the
       word and the corresponding ending is added.  Then WordNet is searched
       for the resulting string.  No rules are applicable to adverbs.

                               POS  | Suffix | Ending
                               -----+--------+--------
                               NOUN | "s"    | ""
                               NOUN | "ses"  | "s"
                               NOUN | "xes"  | "x"
                               NOUN | "zes"  | "z"
                               NOUN | "ches" | "ch"
                               NOUN | "shes" | "sh"
                               NOUN | "men"  | "man"
                               NOUN | "ies"  | "y"
                               VERB | "s"    | ""
                               VERB | "ies"  | "y"
                               VERB | "es"   | "e"
                               VERB | "es"   | ""
                               VERB | "ed"   | "e"
                               VERB | "ed"   | ""
                               VERB | "ing"  | "e"
                               VERB | "ing"  | ""
                               ADJ  | "er"   | ""
                               ADJ  | "est"  | ""
                               ADJ  | "er"   | "e"
                               ADJ  | "est"  | "e"
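
       The table can be expressed as data and applied mechanically.  The
       following Python sketch is not the library's code: the names RULES and
       detach are hypothetical, and the guard that refuses to strip a suffix
       equal to the whole word is an added assumption.  It only generates
       candidates; each one would still have to be looked up in WordNet.

```python
# Rules of detachment from the table above: (suffix, replacement ending).
RULES = {
    "noun": [("s", ""), ("ses", "s"), ("xes", "x"), ("zes", "z"),
             ("ches", "ch"), ("shes", "sh"), ("men", "man"), ("ies", "y")],
    "verb": [("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
             ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")],
    "adj": [("er", ""), ("est", ""), ("er", "e"), ("est", "e")],
    "adv": [],  # no rules are applicable to adverbs
}


def detach(word, pos):
    """Return every candidate produced by a matching rule, in table order."""
    candidates = []
    for suffix, ending in RULES[pos]:
        # Assumption: never strip a suffix that is the entire word.
        if word.endswith(suffix) and len(word) > len(suffix):
            candidates.append(word[: -len(suffix)] + ending)
    return candidates


print(detach("churches", "noun"))  # ['churche', 'church']
print(detach("hoping", "verb"))    # ['hope', 'hop']
```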

   Exception Lists
       There is one exception list file for each syntactic category.  The  ex-
       ception  lists  contain  the  morphological transformations for strings
       that are not regular and therefore cannot be processed in an  algorith-
       mic  manner.  Each line of an exception list contains an inflected form
       of a word or collocation, followed by one or more base forms.  The list
       is kept in alphabetical order and a binary search is used to find words
       in these lists.  See wndb(5WN) for information on the format of the ex-
       ception list files.
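
       As a rough illustration of the lookup, the Python sketch below parses a
       few lines in the layout just described and searches them with the
       standard bisect module.  The sample entries and the name
       lookup_exception are assumptions for the example, and loading the list
       into memory may differ from how the library reads the file.

```python
import bisect

# Hypothetical excerpt: an inflected form followed by one or more base forms.
EXC_TEXT = """\
axes axe axis
children child
geese goose
mice mouse
"""

# One (inflected, [base forms]) entry per line; the lines are already sorted.
ENTRIES = [(parts[0], parts[1:])
           for parts in (line.split() for line in EXC_TEXT.splitlines())]
KEYS = [inflected for inflected, _ in ENTRIES]


def lookup_exception(word):
    """Binary-search the sorted keys; return the base forms or an empty list."""
    i = bisect.bisect_left(KEYS, word)
    if i < len(KEYS) and KEYS[i] == word:
        return ENTRIES[i][1]
    return []


print(lookup_exception("axes"))   # ['axe', 'axis']
print(lookup_exception("boxes"))  # []
```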

   Single Words
       In  general, single words are relatively easy to process.  Morphy first
       looks for the word in the exception list.  If it is found, the first
       base form is returned.  Subsequent calls with a NULL argument return
       additional base forms, if present.  A NULL is returned when  there  are
       no more base forms of the word.

       If the word is not found in the exception list corresponding to the
       syntactic category, an algorithmic process using the rules of
       detachment looks for a matching suffix.  If a matching suffix is found,
       the corresponding ending is applied (sometimes this ending is an empty
       string, so in effect the suffix is removed from the word), and WordNet
       is consulted to see if the resulting word is found in the desired part
       of speech.
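
       The order of operations for a single noun can be sketched as follows.
       The sets NOUNS and NOUN_EXCEPTIONS are toy stand-ins for the WordNet
       database and the noun exception list, and noun_base_forms is a
       hypothetical name; none of this is the library's own code.

```python
# Toy stand-ins for the WordNet noun data and noun exception list (assumed).
NOUNS = {"box", "goose", "fly"}
NOUN_EXCEPTIONS = {"geese": ["goose"]}
NOUN_RULES = [("s", ""), ("ses", "s"), ("xes", "x"), ("zes", "z"),
              ("ches", "ch"), ("shes", "sh"), ("men", "man"), ("ies", "y")]


def noun_base_forms(word):
    """Exception list first; otherwise detachment rules checked against NOUNS."""
    if word in NOUN_EXCEPTIONS:
        return list(NOUN_EXCEPTIONS[word])
    found = []
    for suffix, ending in NOUN_RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + ending
            # Keep a candidate only if it exists as a noun in the toy data.
            if candidate in NOUNS and candidate not in found:
                found.append(candidate)
    return found


print(noun_base_forms("geese"))   # ['goose']
print(noun_base_forms("boxes"))   # ['box']
print(noun_base_forms("blorps"))  # []
```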

   Collocations
       As  opposed  to  single  words,  collocations can be quite difficult to
       transform into a base form that is present  in  WordNet.   In  general,
       only  base  forms  of  words,  even  those comprising collocations, are
       stored in WordNet, such as attorney general.  Transforming the colloca-
       tion  attorneys general  is  then  simply  a matter of finding the base
       forms of the individual words comprising the collocation.  This usually
       works for nouns; non-conforming nouns, such as customs duty, are
       therefore entered in the noun exception list.

       Verb collocations that contain prepositions, such  as  ask for it,  are
       more  difficult.   As with single words, the exception list is searched
       first.  If the collocation is not found, special code in Morphy  deter-
       mines whether a verb collocation includes a preposition.  If it does, a
       function is called to try to find the base form in the  following  man-
       ner.   It  is  assumed that the first word in the collocation is a verb
       and that the last word is a noun.  The algorithm then builds  a  search
       string  with the base forms of the verb and noun, leaving the remainder
       of the collocation (usually just the preposition, but more words may be
       involved) in the middle.  For example, if passed asking for it, the
       database search is performed with ask for it, which is found in
       WordNet and is therefore returned from Morphy.  If a verb collocation
       does not contain a preposition, then the base form of each word in the
       collocation is found and WordNet is searched for the resulting string.
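
       A minimal Python sketch of the preposition case follows.  The
       preposition list, the toy morphology tables and the one-entry lexicon
       are all assumptions made for the example, as is the function name
       verb_collocation_base; the sketch does not cover the exception list or
       the case without a preposition.

```python
PREPOSITIONS = {"for", "to", "of", "in", "on", "at", "with"}  # assumed list
VERB_BASE = {"asking": "ask", "asked": "ask"}  # toy verb morphology
NOUN_BASE = {"favors": "favor"}                # toy noun morphology
LEXICON = {"ask for it"}                       # toy verb entries


def verb_collocation_base(collocation):
    """Rebuild a verb collocation from the base forms of its outer words."""
    words = collocation.split()
    if len(words) < 2 or not any(w in PREPOSITIONS for w in words[1:]):
        return None  # this sketch covers only the preposition case
    # The first word is assumed to be a verb and the last word a noun;
    # everything in between is left untouched.
    first = VERB_BASE.get(words[0], words[0])
    last = NOUN_BASE.get(words[-1], words[-1])
    candidate = " ".join([first] + words[1:-1] + [last])
    return candidate if candidate in LEXICON else None


print(verb_collocation_base("asking for it"))     # ask for it
print(verb_collocation_base("asking for money"))  # None
```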

   Hyphenation
       Hyphenation  also presents special difficulties when searching WordNet.
       It is often a subjective decision as to whether a word  is  hyphenated,
       joined  as one word, or is a collocation of several words, and which of
       the various forms are entered  into  WordNet.   When  Morphy  breaks  a
       string into "words", it looks for both spaces and hyphens as
       delimiters.  It also looks for periods in strings and removes them if
       an exact match is not found.  A search for an abbreviation like oct.
       returns the synset for { October, Oct }.  Not every pattern of
       hyphenated and collocated strings is searched for properly, so it may
       be advantageous to specify several search strings if the results of a
       search attempt seem incomplete.
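
       The two behaviors described here, splitting on spaces and hyphens and
       retrying without periods, might look like this in Python.  The function
       names and the toy LEXICON are hypothetical, and the sketch makes no
       claim about how the library combines these steps.

```python
import re

LEXICON = {"oct", "october", "mother-in-law"}  # toy entries (assumed)


def split_words(string):
    """Break a string into words, treating spaces and hyphens as delimiters."""
    return [w for w in re.split(r"[ -]+", string) if w]


def lookup_with_period_fallback(string):
    """Try an exact match first, then retry with periods removed."""
    if string in LEXICON:
        return string
    stripped = string.replace(".", "")
    return stripped if stripped in LEXICON else None


print(split_words("mothers-in-law"))        # ['mothers', 'in', 'law']
print(lookup_with_period_fallback("oct."))  # oct
```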

   Special Processing for nouns ending with 'ful'
       Morphy contains code that searches for nouns ending with ful and
       performs a transformation on the substring preceding it.  It then
       appends 'ful' back onto the resulting string and returns it.  For
       example, if passed the noun boxesful, it will return boxful.
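
       In outline, the 'ful' handling could be written as below.  Here base_of
       is a hypothetical callable standing in for the ordinary noun
       processing; whether the library checks the stem or the reassembled
       word against the database is not stated in this page, so the sketch
       leaves that decision to base_of.

```python
def noun_base_with_ful(word, base_of):
    """Strip a trailing 'ful', transform the stem, then re-append 'ful'.

    base_of is a hypothetical callable returning a base form or None.
    """
    if not word.endswith("ful") or len(word) <= 3:
        return base_of(word)
    stem = base_of(word[:-3])
    return stem + "ful" if stem is not None else None


toy_base_of = {"boxes": "box", "cups": "cup"}.get  # toy noun morphology
print(noun_base_with_ful("boxesful", toy_base_of))  # boxful
```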

BUGS
       Since many noun collocations contain prepositions, such as
       line of products, an algorithm similar to that used for verbs should be
       written for nouns.  In the present scheme, if Morphy is passed
       lines of products, the search string becomes line of product, which is
       not in WordNet.

       Morphy will allow non-words to be converted to words if they follow
       one of the rules described above.  For example, it will happily convert
       plantes to plant.

ENVIRONMENT VARIABLES (UNIX)
       WNHOME              Base directory for WordNet.  Default is
                           /usr/local/WordNet-3.0.

       WNSEARCHDIR         Directory  in  which  the WordNet database has been
                           installed.  Default is WNHOME/dict.
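
       The way the two variables combine can be sketched in Python.  The
       function name wordnet_search_dir is hypothetical; the defaults are the
       ones listed above.

```python
import os


def wordnet_search_dir(env=None):
    """Resolve the database directory from WNSEARCHDIR, WNHOME, or defaults."""
    env = os.environ if env is None else env
    home = env.get("WNHOME", "/usr/local/WordNet-3.0")
    # WNSEARCHDIR wins when set; otherwise fall back to WNHOME/dict.
    return env.get("WNSEARCHDIR", home + "/dict")


print(wordnet_search_dir({}))  # /usr/local/WordNet-3.0/dict
```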

REGISTRY (WINDOWS)
       HKEY_LOCAL_MACHINE\SOFTWARE\WordNet\3.0\WNHome
                           Base directory for WordNet.  Default is
                           C:\Program Files\WordNet\3.0.

FILES
       pos.exc             morphology exception lists (pos is noun, verb, adj,
                           or adv)

SEE ALSO
       wn(1WN), wnb(1WN), binsrch(3WN), morph(3WN), wndb(5WN), wninput(7WN).

WordNet 3.0                        Dec 2006                        MORPHY(7WN)