gvpr

       ( previously known as gpr )

SYNOPSIS
       gvpr  [-icV?]   [  -o  outfile ] [ -a args ] [ 'prog' | -f progfile ] [
       files ]

DESCRIPTION
       gvpr is a graph stream editor inspired by awk.  It copies input  graphs
       to  its  output,  possibly transforming their structure and attributes,
       creating new graphs, or  printing  arbitrary  information.   The  graph
       model  is that provided by libagraph(3).  In particular, gvpr reads and
       writes graphs using the dot language.

       Basically, gvpr traverses each input graph,  denoted  by  $G,  visiting
       each  node  and  edge, matching it with the predicate-action rules sup-
       plied in the input program.  The rules are  evaluated  in  order.   For
       each  predicate  evaluating  to  true, the corresponding action is per-
       formed.  During the traversal, the current node or edge  being  visited
       is denoted by $.

       For  each  input graph, there is a target subgraph, denoted by $T, ini-
       tially empty and used to accumulate  chosen  entities,  and  an  output
       graph,  $O,  used  for final processing and then written to output.  By
       default, the output graph is the target graph.  The output graph can be
       set in the program or, in a limited sense, on the command line.

OPTIONS
       The following options are supported:

       -a args
              The  string args is split into whitespace-separated tokens, with
              the individual tokens available as strings in the  gvpr  program
              as  ARGV[0],...,ARGV[ARGC-1].  Whitespace characters within sin-
              gle or double quoted substrings, or preceded by a backslash, are
              ignored  as separators.  In general, a backslash character turns
              off any special meaning of the following character.   Note  that
              the tokens derived from multiple -a flags are concatenated.

       -c     Use the source graph as the output graph.

       -i     Derive  the  node-induced subgraph extension of the output graph
              in the context of its root graph.

       -o outfile
              Causes the output stream to be written to the specified file; by
              default, output is written to stdout.

       -f progfile
              Use the contents of the specified file as the program to execute
              on the input. If progfile contains a slash character,  the  name
              is  taken  as the pathname of the file. Otherwise, gvpr will use
              the directories specified in the environment variable GPRPATH to
              look  for  the file. If -f is not given, gvpr will use the first
              non-option argument as the program.
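
       As a sketch of how -a interacts with ARGV, the command below passes
       two tokens, the second of which contains a space, and prints each
       token from the BEGIN clause.  The file name graph.gv is a
       placeholder for any input file.

              gvpr -a 'red "two words"' 'BEGIN {
                int i;
                for (i = 0; i < ARGC; i++)
                  printf("ARGV[%d] = %s\n", i, ARGV[i]);
              }' graph.gv

       Following the quoting rules given for -a, ARGV[0] should hold red
       and ARGV[1] should hold the three characters two, a space, and
       words as one token.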

PROGRAMS
       A  gvpr  program consists of a list of predicate-action clauses, having
       one of the forms:

              BEGIN { action }

              BEG_G { action }

              N [ predicate ] { action }

              E [ predicate ] { action }

              END_G { action }

              END { action }

       A program can contain at most one of each of the  BEGIN,  BEG_G,  END_G
       and  END  clauses.   There can be any number of N and E statements, the
       first applied to nodes, the second to edges.  The  top-level  semantics
       of a gvpr program are:

              Evaluate the BEGIN clause, if any.
              For each input graph G {
                  Set G as the current graph and current object.
                  Evaluate the BEG_G clause, if any.
                  For each node and edge in G {
                    Set the node or edge as the current object.
                    Evaluate the N or E clauses, as appropriate.
                  }
                  Set G as the current object.
                  Evaluate the END_G clause, if any.
              }
              Evaluate the END clause, if any.

       The  actions  of  the BEGIN, BEG_G, END_G and END clauses are performed
       when the clauses are evaluated.  For N or E clauses, either the  predi-
       cate  or  action  may  be  omitted.   If  there is no predicate with an
       action, the action is performed on every node or edge, as  appropriate.
       If  there is no action and the predicate evaluates to true, the associ-
       ated node or edge is added to the target graph.
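
       For instance, a predicate with no action is enough to extract a
       subgraph.  The sketch below assumes an input file graph.gv (a
       placeholder name) whose nodes carry a color attribute.  It collects
       the blue nodes in the target graph, and the -i option then adds the
       edges of the input graph that join them:

              gvpr -i 'N [color == "blue"]' graph.gv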

       Predicates and actions are sequences of statements in the C dialect
       supported by the libexpr(3) library.  The only difference between
       predicates and actions is that the former must have a type that may
       be interpreted as either true or false.  Here the usual C convention
       is followed, in which a non-zero value is considered true.  This
       would include non-empty strings and non-empty references to nodes,
       edges, etc.  However, if a string can be converted to an integer,
       this value is used.

       In addition to the usual C base types (void, int, char, float, long,
       unsigned and double), gvpr provides string as a synonym for char*, and
       the graph-based types node_t, edge_t, graph_t and obj_t.  The obj_t
       type can be viewed as a supertype of the other 3 concrete types.

       Array declarations have the form:

               type array [ type0 ]

       where  type0  is optional. If it is supplied, the parser  will  enforce
       that  all  array  subscripts have the specified type. If it is not sup-
       plied, objects of all types can be used as subscripts.  As in C,  vari-
       ables  and  arrays must be declared. In particular, an undeclared vari-
       able will be interpreted as the name of an attribute of a node, edge or
       graph, depending on the context.
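
       A minimal sketch of an associative array, using only the declaration
       form given above: cnt is indexed by strings and counts the edges
       leaving each node, keyed by the name of the tail node.  The names
       cnt and s are illustrative, and the loop uses the array form of the
       for statement.

              BEGIN { int cnt[string]; string s; }
              E { cnt[$.tail.name]++; }
              END {
                for (cnt[s])
                  printf("%s: %d\n", s, cnt[s]);
              }

       Because cnt is never reset, the counts accumulate over all input
       graphs.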

       Executable statements can be one of the following:
              { [ statement ... ] }
              expression              // commonly var = expression
              if( expression ) statement [ else statement ]
              for( expression ; expression ; expression ) statement
              for( array [ var ]) statement
              while( expression ) statement
              switch( expression ) case statements
              break [ expression ]
              continue [ expression ]
              return [ expression ]
       Items in brackets are optional.

       In  the  second  form  of the for statement, the variable var is set to
       each value used as an index in the specified array and then the associ-
       ated  statement  is  evaluated. Function definitions can only appear in
       the BEGIN clause.
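
       The following sketch combines a function definition with these
       statement forms.  The function larger and the variable best are
       hypothetical names chosen for the example.  The function is defined
       in BEGIN, as required, and ends with a plain return rather than an
       if statement whose branches both return.

              BEGIN {
                int best = 0;
                int larger(int a, int b) {
                  if (b > a) return b;
                  return a;
                }
              }
              N { best = larger(best, degreeOf($G, $)); }
              END { printf("largest degree: %d\n", best); }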

       Expressions include the usual C expressions.  String comparisons  using
       == and != treat the right hand operand as a pattern.  gvpr will attempt
       to use an expression as a string or numeric value as appropriate.
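
       Because the right-hand operand of == is a pattern, one comparison
       can select a family of names.  The sketch below assumes shell-style
       wildcard patterns, in which * matches any run of characters.  This
       page does not spell out the pattern syntax, so treat that as an
       assumption; the prefix in_ is arbitrary.

              N [$.name == "in_*"] { print($.name); }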

       Expressions of graphical type (i.e., graph_t,  node_t,  edge_t,  obj_t)
       may  be followed by a field reference in the form of .name. The result-
       ing value is the value of the attribute named name of the given object.
       In  addition,  in certain contexts an undeclared, unmodified identifier
       is taken to be an attribute name. Specifically, such identifiers denote
       attributes  of  the  current  node  or  edge,  respectively, in N and E
       clauses, and the current graph in BEG_G and END_G clauses.
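
       As an illustration, the two clauses below write attributes in both
       styles.  The first uses an explicit field reference on $; the second
       relies on the bare identifier color being taken as an attribute of
       the current edge.  The threshold 3 and the color values are
       arbitrary, and the sketch assumes gvpr is run with -c so that the
       modified input graph is written out.

              N [degreeOf($G, $) > 3] { $.color = "red"; }
              E { color = "gray"; }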

       As usual in the libagraph(3) model, attributes are  string-valued.   In
       addition, gvpr supports certain pseudo-attributes of graph objects, not
       necessarily string-valued. These reflect intrinsic  properties  of  the
       graph objects and cannot be set by the user.

       head : node_t
              the head of an edge.

       tail : node_t
              the tail of an edge.

       name : string
              the name of an edge, node or graph.

       degree : int
              the degree of a node.

       root : graph_t
              the root graph of an object. The root of a root graph is itself.

       parent : graph_t
              the  parent  graph  of a subgraph. The parent of a root graph is
              NULL

       n_edges : int
              the number of edges in the graph

       n_nodes : int
              the number of nodes in the graph

       directed : int
              true (non-zero) if the graph is directed

       strict : int
              true (non-zero) if the graph is strict

BUILT-IN FUNCTIONS
       The following functions are built into gvpr. Those functions  returning
       references to graph objects return NULL in case of failure.

   Graphs and subgraphs
       graph(s : string, t : string) : graph_t
              creates  a  graph whose name is s and whose type is specified by
              the string t. Ignoring case, the characters U, D, S, N have  the
              interpretation  undirected,  directed,  strict,  and non-strict,
              respectively. If t is empty, a  directed,  non-strict  graph  is
              generated.

       subg(g : graph_t, s : string) : graph_t
              creates  a  subgraph  in  graph  g  with name s. If the subgraph
              already exists, it is returned.

       isSubg(g : graph_t, s : string) : graph_t
              returns the subgraph in graph g with name s, if  it  exists,  or
              NULL otherwise.

       fstsubg(g : graph_t) : graph_t
              returns the first subgraph in graph g, or NULL if none exists.

       nxtsubg(sg : graph_t) : graph_t
              returns the next subgraph after sg, or NULL.

       isDirect(g : graph_t) : int
              returns true if and only if g is directed.

       isStrict(g : graph_t) : int
              returns true if and only if g is strict.

       nNodes(g : graph_t) : int
              returns the number of nodes in g.

   Nodes
       fstnode(g : graph_t) : node_t
              returns the first node in graph g, or NULL if none exists.

       nxtnode(n : node_t) : node_t
              returns the next node after n in the root graph, or NULL.

       nxtnode_sg(sg : graph_t, n : node_t) : node_t
              returns the next node after n in sg, or NULL.

       isNode(sg : graph_t, s : string) : node_t
              looks for a node in (sub)graph sg of name  s.  If  such  a  node
              exists, it is returned. Otherwise, NULL is returned.

       isSubnode(sg : graph_t, n : node_t) : int
              returns  non-zero  if node n is in (sub)graph sg, or zero other-
              wise.

       indegreeOf(sg : graph_t, n : node_t) : int
              returns the indegree of node n in (sub)graph sg.

       outdegreeOf(sg : graph_t, n : node_t) : int
              returns the outdegree of node n in (sub)graph sg.

       degreeOf(sg : graph_t, n : node_t) : int
              returns the degree of node n in (sub)graph sg.
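
       These functions can be combined to walk the nodes of a graph
       explicitly, rather than through N clauses.  A minimal sketch, with
       n an illustrative variable name; the loop stops when nxtnode returns
       NULL, which is treated as false.

              BEGIN { node_t n; }
              BEG_G {
                for (n = fstnode($G); n; n = nxtnode(n))
                  printf("%s: in %d, out %d\n", n.name,
                         indegreeOf($G, n), outdegreeOf($G, n));
              }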

   Edges
       edge(t : node_t, h : node_t, s : string) : edge_t
              creates an edge with tail node t, head node h and name s in  the
              root  graph. If the graph is undirected, the distinction between
              head and tail nodes is unimportant.  If  such  an  edge  already
              exists, it is returned.

       edge_sg(sg : graph_t, t : node_t, h : node_t, s : string) : edge_t
              creates  an  edge  with  tail  node t, head node h and name s in
              (sub)graph sg (and all parent graphs). If  the  graph  is  undi-
              rected,  the distinction between head and tail nodes is unimpor-
              tant.  If such an edge already exists, it is returned.

       subedge(g : graph_t, e : edge_t) : edge_t
              inserts the edge e into the subgraph g. Returns the edge.

       isEdge(t : node_t, h : node_t, s : string) : edge_t
              looks for an edge with tail node t, head node h and name  s.  If
              the  graph  is undirected, the distinction between head and tail
              nodes is unimportant.  If such an edge exists, it  is  returned.
              Otherwise, NULL is returned.

       isEdge_sg(sg : graph_t, t : node_t, h : node_t, s : string) : edge_t
              looks  for  an  edge with tail node t, head node h and name s in
              (sub)graph sg. If  the  graph  is  undirected,  the  distinction
              between  head  and  tail  nodes is unimportant.  If such an edge
              exists, it is returned. Otherwise, NULL is returned.

       nxtout(e : edge_t) : edge_t
              returns the next outedge after e in the root graph.

       nxtout_sg(sg : graph_t, e : edge_t) : edge_t
              returns the next outedge after e in graph sg.

       fstin(n : node_t) : edge_t
              returns the first inedge of node n in the root graph.

       fstin_sg(sg : graph_t, n : node_t) : edge_t
              returns the first inedge of node n in graph sg.

       nxtin(e : edge_t) : edge_t
              returns the next inedge after e in the root graph.

       nxtin_sg(sg : graph_t, e : edge_t) : edge_t
              returns the next inedge after e in graph sg.

       fstedge(n : node_t) : edge_t
              returns the first edge of node n in the root graph.

       fstedge_sg(sg : graph_t, n : node_t) : edge_t
              returns the first edge of node n in graph sg.

       nxtedge(e : edge_t, n : node_t) : edge_t
              returns the next edge of node n after e in the root graph.

       nxtedge_sg(sg : graph_t, e : edge_t, n : node_t) : edge_t
              returns the next edge of node n after e in the graph sg.
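
       The edge iterators follow the same first/next pattern as the node
       iterators.  As a sketch, the clause below lists the predecessors of
       every node by walking its inedges in the root graph; e is an
       illustrative variable name.

              BEGIN { edge_t e; }
              N {
                for (e = fstin($); e; e = nxtin(e))
                  printf("%s <- %s\n", $.name, e.tail.name);
              }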

   Graph I/O
       write(g : graph_t) : void
              prints g in dot format onto the output stream.

       writeG(g : graph_t, fname : string) : void
              prints g in dot format into the file fname.

       fwriteG(g : graph_t, fd : int) : void
              prints g in dot format onto the open stream denoted by the inte-
              ger fd.

       readG(fname : string) : graph_t
              returns a graph read from the file fname. The graph should be in
              dot format. If no graph can be read, NULL is returned.

       freadG(fd : int) : graph_t
              returns  the  next  graph read from the open stream fd.  Returns
              NULL at end of file.
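
       As a sketch of the file-based functions, the clause below saves each
       input graph to its own file.  It assumes that graph names are usable
       as file names and uses sprintf to build the name; the .gv suffix is
       only a convention.

              END_G { writeG($G, sprintf("%s.gv", $G.name)); }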

   Graph miscellany
       delete(g : graph_t, x : obj_t) : void
              deletes object x from graph g.  If g is NULL, the function  uses
              the  root graph of x.  If x is a graph or subgraph, it is closed
              unless x is locked.


       copy(g : graph_t, x : obj_t) : obj_t
              creates a copy of object x in graph g, where the new object  has
              the  same  name/value  attributes as the original object.  If an
              object with the same key as x already exists, its attributes are
              overlaid  by  those  of x and the object is returned.  Note that
              this is a shallow copy. If x is a  graph,  none  of  its  nodes,
              edges  or  subgraphs  are  copied into the new graph. If x is an
              edge, the endpoints are created if necessary, but they  are  not
              cloned.   If  x  is  a  graph,  g may be NULL, in which case the
              cloned object will be a new root graph.

       copyA(src : obj_t, tgt : obj_t) : int
              copies the attributes of object src to object  tgt,  overwriting
              any attribute values tgt may initially have.

       induce(g : graph_t) : void
              extends  g  to  its  node-induced subgraph extension in its root
              graph.

       aget(src : obj_t, name : string) : string
              returns the value of attribute name in object src. This is  use-
              ful for those cases when name conflicts with one of the keywords
              such as "head" or "root".  Returns NULL on  failure  or  if  the
              attribute is not defined.

       aset(src : obj_t, name : string, value : string) : int
              sets  the  value  of  attribute  name  in  object  src to value.
              Returns 0 on success, non-zero on failure. See aget above.

       getDflt(g : graph_t, kind : string, name : string) : string
              returns the default value of attribute name in objects in  g  of
              the  given  kind.  For  nodes, edges, and graphs, kind should be
              "N", "E", and "G", respectively.  Returns NULL on failure or  if
              the attribute is not defined.

       setDflt(g  :  graph_t,  kind : string, name : string, value : string) :
       int
              sets the default value of attribute name to value in objects  in
              g  of  the given kind. For nodes, edges, and graphs, kind should
              be "N", "E", and "G", respectively.  Returns 0 on success, non-
              zero on failure. See getDflt above.

       compOf(g : graph_t, n : node_t) : graph_t
              returns  the  connected component of the graph g containing node
              n, as a subgraph of g. The subgraph only contains the nodes. One
              can  use induce to add the edges. The function fails and returns
              NULL if n is not in g. Connectivity is based on  the  underlying
              undirected graph of g.

       kindOf(obj : obj_t) : string
              returns an indication of what kind of graph object is the
              argument.  For nodes, edges, and graphs, it returns "N", "E",
              and "G", respectively.

   Strings
       sprintf(fmt : string, ...) : string
              returns the string resulting from formatting the values of the
              expressions occurring after fmt according to the printf(3)
              format fmt.

       gsub(str : string, pat : string) : string

       gsub(str : string, pat : string, repl : string) : string
              returns str with all substrings matching pat deleted or replaced
              by repl, respectively.

       sub(str : string, pat : string) : string

       sub(str : string, pat : string, repl : string) : string
              returns  str with the leftmost substring matching pat deleted or
              replaced by repl, respectively. The characters '^' and  '$'  may
              be used at the beginning and end, respectively, of pat to anchor
              the pattern to the beginning or end of str.

       substr(str : string, idx : int) : string

       substr(str : string, idx : int, len : int) : string
              returns the substring of str starting at position idx to the end
              of  the  string or of length len, respectively.  Indexing starts
              at 0. If idx is negative or idx is greater than  the  length  of
              str, a fatal error occurs. Similarly, in the second case, if len
              is negative or idx + len is greater than the length  of  str,  a
              fatal error occurs.

       length(s : string) : int
              returns the length of the string s.

       index(s : string, t : string) : int
              returns  the  index of the character in string s where the left-
              most copy of string t can be found, or -1 if t  is  not  a  sub-
              string of s.

       match(s : string, p : string) : int
              returns  the  index of the character in string s where the left-
              most match of pattern p can be found, or -1 if no substring of s
              matches p.

       canon(s : string) : string
              returns  a  version of s appropriate to be used as an identifier
              in a dot file.

       xOf(s : string) : string
              returns the string "x" if s has the form "x,y", where both x and
              y are numeric.

       yOf(s : string) : string
              returns the string "y" if s has the form "x,y", where both x and
              y are numeric.

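       llOf(s : string) : string
              returns the string "llx,lly" if s has the form
              "llx,lly,urx,ury", where all of llx, lly, urx, and ury are
              numeric.

       sscanf(s : string, fmt : string, ...) : int
              scans the string s, extracting values according to the
              sscanf(3) format fmt.  The values are stored in the addresses
              following fmt, addresses having the form &v, where v is some
              declared variable of the correct type.  Returns the number of
              items successfully scanned.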
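
       A short sketch of the string functions, using an arbitrary sample
       string.  The comments on substr and index follow directly from the
       zero-based indexing described above.  The patterns passed to gsub
       and sub are assumed to contain no characters with special meaning
       other than the leading '^' anchor.

              BEGIN {
                string s = "cluster_main_01";
                printf("%s\n", gsub(s, "_", "-"));    /* every "_" becomes "-" */
                printf("%s\n", sub(s, "^cluster_"));  /* leading "cluster_" deleted */
                printf("%s\n", substr(s, 8, 4));      /* 4 characters from index 8: main */
                printf("%d\n", index(s, "main"));     /* 8 */
              }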

   I/O
       print(...) : void
              print( expr, ... ) prints a string representation of each  argu-
              ment in turn onto stdout, followed by a newline.

       printf(fmt : string, ...) : int

       printf(fd : int, fmt : string, ...) : int
              prints  the  string  resulting from formatting the values of the
              expressions following fmt according to the printf(3) format fmt.
              Returns  0 on success.  By default, it prints on stdout.  If the
              optional integer fd is given, output  is  written  on  the  open
              stream associated with fd.

       scanf(fmt : string, ...) : int

       scanf(fd : int, fmt : string, ...) : int
              scans  in  values from an input stream according to the scanf(3)
              format fmt.  The values are stored in  the  addresses  following
              fmt,  addresses  having  the  form  &v, where v is some declared
              variable of the correct type.  By default, it reads from  stdin.
              If the optional integer fd is given, input is read from the open
              stream associated with fd.  Returns the number of items success-
              fully scanned.

       openF(s : string, t : string) : int
              opens  the file s as an I/O stream. The string argument t speci-
              fies how the file is opened. The arguments are the same  as  for
              the  C  function  fopen(3).   It returns an integer denoting the
              stream, or -1 on error.

              As usual, streams 0, 1 and 2 are already open as stdin,  stdout,
              and  stderr,  respectively. Since gvpr may use stdin to read the
              input graphs, the user should avoid using this stream.

       closeF(fd : int) : int
              closes the open stream denoted by the integer fd.  Streams  0, 1
              and 2 cannot be closed.  Returns 0 on success.

       readL(fd : int) : string
              returns  the next line read from the input stream fd. It returns
              the empty string "" on end of file. Note that the newline  char-
              acter is left in the returned string.
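
       The stream functions can be sketched together as follows.  The file
       name names.txt is a placeholder.  The loop relies on readL returning
       the empty string at end of file, and printf uses no extra newline
       because readL leaves the newline in the returned string.

              BEGIN {
                int fd;
                string line;
                fd = openF("names.txt", "r");
                if (fd >= 0) {
                  line = readL(fd);
                  while (length(line) > 0) {
                    printf("%s", line);
                    line = readL(fd);
                  }
                  closeF(fd);
                }
              }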

   Math
       exp(d : double) : double
              returns e to the dth power.

       log(d : double) : double
              returns the natural log of d.

       atan2(y : double, x : double) : double
              returns the arctangent of y/x in the range -pi to pi.

   Miscellaneous
       exit() : void

       exit(v : int) : void
              causes  gvpr  to  exit with the exit code v.  v defaults to 0 if
              omitted.

       rand() : double
              returns a pseudo-random double between 0 and 1.

       srand() : int

       srand(v : int) : int
              sets a seed for the random number generator. The optional  argu-
              ment gives the seed; if it is omitted, the current time is used.
              The previous seed value is  returned.  srand  should  be  called
              before any calls to rand.
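
       As a sketch, the program below seeds the generator once and then
       colors roughly half of the nodes.  The seed 42, the threshold 0.5
       and the color are arbitrary, and the sketch assumes gvpr is run with
       -c so that the modified input graph is written out.

              BEGIN { srand(42); }
              N { if (rand() < 0.5) color = "red"; }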

BUILT-IN VARIABLES
       gvpr provides certain special, built-in variables, whose values are set
       automatically by gvpr depending on the context. Except  as  noted,  the
       user cannot modify their values.

       $ : obj_t
              denotes  the current object (node, edge, graph) depending on the
              context.  It is not available in BEGIN or END clauses.

       $F : string
              is the name of the current input file.

       $G : graph_t
              denotes the current graph being processed. It is  not  available
              in BEGIN or END clauses.

       $O : graph_t
              denotes the output graph. Before graph traversal, it is initial-
              ized to the target graph. After traversal and any END_G actions,
              if  it  refers  to a non-empty graph, that graph is printed onto
              the output stream.  It is only valid in N, E and END_G  clauses.
              The output graph may be set by the user.

       $T : graph_t
              denotes  the current target graph. It is a subgraph of $G and is
              available only in N, E and END_G clauses.

       $tgtname : string
              denotes the name of the target graph.  By default, it is set  to
              "gvpr_result".   If  used multiple times during the execution of
              gvpr, the name will be appended with an integer.  This  variable
              may be set by the user.

       ARGC : int
              denotes the number of arguments specified by the  -a  args  com-
              mand-line argument.

       ARGV : string array
              denotes the array of arguments specified by the -a args command-
              line argument. The ith argument is given by ARGV[i].
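
       A small sketch using these variables: the program names the target
       graph and fills it with the nodes that have no outedges.  The name
       sinks is arbitrary, and the sketch assumes that $tgtname may be
       assigned in the BEGIN clause.

              BEGIN { $tgtname = "sinks"; }
              N [outdegreeOf($G, $) == 0]

       Since the second clause has no action, each matching node is added
       to $T, which is also the default output graph.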

BUILT-IN CONSTANTS
       There are several symbolic constants defined by gvpr.

       NULL : obj_t
              a null object reference, equivalent to 0.

       TV_flat : tvtype_t
              a simple, flat traversal, with graph objects  visited  in  seem-
              ingly arbitrary order.

       TV_ne : tvtype_t
              a traversal which first visits all of the nodes, then all of the
              edges.

       TV_en : tvtype_t
              a traversal which first visits all of the edges, then all of the
              nodes.

       TV_dfs : tvtype_t
              a  traversal  of  the  graph  using  a depth-first search on the
              underlying undirected graph.  To do  the  traversal,  gvpr  will
              check  the  value of $tvroot. If this has the same value that it
              had previously (at the start, the previous value is  initialized
              to  NULL.),  gvpr  will  simply look for some unvisited node and
              traverse its connected component. On the other hand, if  $tvroot
              has changed, its connected component will be toured, assuming it
              has not been previously visited or, if $tvroot is NULL, the tra-
              versal will stop. Note that using TV_dfs and $tvroot, it is pos-
              sible to create an infinite loop.

       TV_fwd : tvtype_t
              a traversal of the graph using a depth-first search on the graph
              following only forward arcs.  In libagraph(3), edges in
              undirected graphs are given an arbitrary direction, which is
              used for this traversal.  The choice of roots for the traversal
              is the same as described for TV_dfs above.

       TV_bfs : tvtype_t
              a traversal of the graph using a breadth-first search on the
              graph ignoring edge directions.  See the item on TV_dfs above
              for the role of $tvroot.

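       TV_rev : tvtype_t
              a traversal of the graph using a depth-first search on the graph
              following only reverse arcs.  In libagraph(3), edges in
              undirected graphs are given an arbitrary direction, which is
              used for this traversal.  The choice of roots for the traversal
              is the same as described for TV_dfs above.

EXAMPLES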
              BEGIN { int n, e; int tot_n = 0; int tot_e = 0; }
              BEG_G {
                n = nNodes($G);
                e = nEdges($G);
                printf ("%d nodes %d edges %s\n", n, e, $G.name);
                tot_n += n;
                tot_e += e;
              }
              END { printf ("%d nodes %d edges total\n", tot_n, tot_e); }

       Version of the program gc.

              gvpr -c ""

       Equivalent to nop.

              BEG_G { graph_t g = graph ("merge", "S"); }
              E {
                node_t h = clone(g,$.head);
                node_t t = clone(g,$.tail);
                edge_t e = edge(t,h,"");
                e.weight = e.weight + 1;
              }
              END_G { $O = g; }

       Produces a  strict  version  of  the  input  graph,  where  the  weight
       attribute  of an edge indicates how many edges from the input graph the
       edge represents.

              BEGIN {node_t n; int deg[]}
              E{deg[head]++; deg[tail]++; }
              END_G {
                for (deg[n]) {
                  printf ("deg[%s] = %d\n", n.name, deg[n]);
                }
              }

       Computes the degrees of nodes with edges.

ENVIRONMENT
       GPRPATH
              Colon-separated list of directories to be searched to  find  the
              file specified by the -f option.

BUGS AND WARNINGS
       When  the  program is given as a command line argument, the usual shell
       interpretation takes place, which may affect some of the special  names
       in  gvpr.  To  avoid  this,  it  is  best to wrap the program in single
       quotes.
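
       For example, in the first command below the shell passes the program
       to gvpr untouched.  In the second, a POSIX shell would expand $G
       inside the double quotes before gvpr sees it, so gvpr receives a
       different program.  The file name graph.gv is a placeholder.

              gvpr 'BEG_G { print($G.name); }' graph.gv
              gvpr "BEG_G { print($G.name); }" graph.gv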

       As of 24 April 2008, gvpr switched to using a new, underlying graph
       library, which uses the simpler model that there is only one copy of a
       node, not one copy for each subgraph logically containing it.  This
       means that iterators such as nxtnode cannot traverse a subgraph using
       just a node argument; the functions ending in "_sg", which also take a
       subgraph argument, are needed for that.

       If a function ends with a complex statement, such as an if statement,
       with each branch doing a return, type checking may fail.  Functions
       should use a return at the end.

       The  expr  library  does  not  support string values of (char*)0.  This
       means we can't distinguish between "" and (char*)0 edge keys.  For  the
       purposes  of  looking  up  and  creating  edges,  we translate "" to be
       (char*)0, since this latter value is necessary in order to look up  any
       edge with a matching head and tail.

       Related  to this, strings converted to integers act like char pointers,
       getting the value 0 or 1  depending  on  whether  the  string  consists
       solely of zeroes or not. Thus, ((int)"2") evaluates to 1.

       The  language inherits the usual C problems such as dangling references
       and the confusion between '=' and '=='.

AUTHOR
       Emden R. Gansner <erg@research.att.com>

SEE ALSO
       awk(1), gc(1), dot(1), nop(1), libexpr(3), libagraph(3)



                                 24 April 2008                         GVPR(1)