re2c

RE2C(1)                                                                RE2C(1)

NAME
       re2c - convert regular expressions to C/C++ code

SYNOPSIS
       re2c [OPTIONS] FILE

DESCRIPTION
       re2c is a lexer generator for C/C++. It finds regular expression speci-
       fications inside of C/C++ comments and replaces them with a  hard-coded
       DFA.  The  user must supply some interface code in order to control and
       customize the generated DFA.

OPTIONS
       -? -h --help
              Show a short help message.

       -b --bit-vectors
              Implies -s. Use bit vectors as well in the attempt to coax  bet-
              ter  code  out  of  the compiler. Most useful for specifications
              with more than a few keywords (e.g. for  most  programming  lan-
              guages).
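              The idea behind bit vectors can be illustrated with a sketch: a
              256-entry table in which each bit records membership in one
              character class, so one lookup and one AND replace a chain of
              range comparisons. re2c emits such a table under the name yybm
              (see re2c:variable:yybm). The bit values and helper names below
              (class_bits, init_class_bits, in_class) are hypothetical and do
              not reproduce re2c's actual output.

```c
#include <stdbool.h>

/* Hypothetical bit assignments: one bit per character class. */
enum { CLASS_DIGIT = 0x80, CLASS_IDENT = 0x40 };

static unsigned char class_bits[256];

/* Build the table once; a generated scanner would emit it as static data. */
static void init_class_bits(void)
{
    for (int c = 0; c < 256; c++) {
        unsigned char bits = 0;
        if (c >= '0' && c <= '9')
            bits |= CLASS_DIGIT | CLASS_IDENT;
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_')
            bits |= CLASS_IDENT;
        class_bits[c] = bits;
    }
}

/* One table lookup plus one AND instead of several range comparisons. */
static bool in_class(unsigned char c, unsigned char mask)
{
    return (class_bits[c] & mask) != 0;
}
```

              Call init_class_bits() once and then test membership with, for
              example, in_class(c, CLASS_IDENT). A generated scanner would
              embed the table as static data instead of building it at run
              time.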

       -c --conditions
              Enable (f)lex-like condition support.

       -d --debug-output
              Create a parser that dumps information about the current
              position and the state the parser is in while parsing the
              input. This is useful for debugging parser issues and states.
              If you use this switch, you need to define a macro YYDEBUG that
              is called like a function with two parameters: void YYDEBUG
              (int state, char current). The first parameter receives the
              state or -1, and the second parameter receives the input at the
              current cursor.
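              A minimal sketch of the interface code this switch expects,
              assuming a hook that simply logs to stderr. Only the YYDEBUG
              signature comes from this page; scanner_debug and the two
              counters are hypothetical names.

```c
#include <stdio.h>

/* Hypothetical bookkeeping, handy when checking that the hook fires. */
static int debug_calls = 0;
static int last_state = 0;

/* Debug hook with the signature documented for -d:
 * state is the DFA state or -1, current is the input at the cursor. */
static void scanner_debug(int state, char current)
{
    debug_calls++;
    last_state = state;
    fprintf(stderr, "re2c debug: state %d, input 0x%02x\n",
            state, (unsigned char)current);
}

/* The generated code calls YYDEBUG(state, current). */
#define YYDEBUG(state, current) scanner_debug((state), (current))
```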

       -D --emit-dot
              Emit  Graphviz  dot data. It can then be processed with e.g. dot
              -Tpng input.dot > output.png. Please  note  that  scanners  with
              many states may crash dot.

       -e --ecb
              Generate  a  parser that supports EBCDIC. The generated code can
              deal with any character up to 0xFF. In this  mode  re2c  assumes
              that input character size is 1 byte. This switch is incompatible
              with -w, -x, -u and -8.

       -f --storable-state
              Generate a scanner with support for storable state.

       -F --flex-syntax
              Partial support for flex syntax. When this flag is active, named
              definitions must be surrounded by curly braces and can be
              defined without an equal sign and the terminating semicolon.
              Instead, bare names are treated as double-quoted strings.

       -g --computed-gotos
              Generate a scanner that utilizes GCC's computed goto feature.
              That is, re2c generates jump tables whenever a decision is of a
              certain complexity (e.g. when a lot of if conditions would
              otherwise be necessary). This is only usable with GCC (or a
              compiler that implements the same labels-as-values extension)
              and produces output that cannot be compiled by other compilers.
              Note that this implies -b and that the complexity threshold can
              be configured using the inplace configuration
              re2c:cgoto:threshold.
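              For readers unfamiliar with computed gotos, the following
              hand-written sketch builds a jump table of label addresses that
              is indexed directly by the input byte, so a single indirect jump
              replaces a chain of if statements. It assumes an ASCII character
              set and GCC's labels-as-values extension; classify, targets and
              the labels are hypothetical, and this is not re2c output.

```c
/* Classify one input byte with a jump table of label addresses.
 * Requires the labels-as-values extension (&&label, goto *ptr). */
static char classify(unsigned char c)
{
    static void *targets[256];   /* one entry per possible input byte */
    static int ready = 0;

    if (!ready) {                /* fill the table on first use */
        for (int i = 0; i < 256; i++)
            targets[i] = &&other;
        for (int i = '0'; i <= '9'; i++)
            targets[i] = &&digit;
        for (int i = 'a'; i <= 'z'; i++) {   /* assumes ASCII letters */
            targets[i] = &&alpha;
            targets[i - 'a' + 'A'] = &&alpha;
        }
        ready = 1;
    }
    goto *targets[c];            /* a single indirect jump, no if chain */

digit:
    return 'd';
alpha:
    return 'a';
other:
    return '?';
}
```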

       -i --no-debug-info
              Do not output #line information. This is useful when you keep
              the re2c output under version control (a CMS tool), for example
              so that users can build from your sources without having re2c
              installed.

       -o OUTPUT --output=OUTPUT
              Specify the OUTPUT file.

       -r --reusable
              Allows reuse of scanner definitions with /*!use:re2c */ after
              /*!rules:re2c */. In this mode, no /*!re2c */ block may be
              present and exactly one /*!rules:re2c */ block must be present.
              The rules are saved and used by every /*!use:re2c */ block that
              follows. These
              blocks   can   contain   inplace   configurations,    especially
              re2c:flags:e,   re2c:flags:w,   re2c:flags:x,  re2c:flags:u  and
              re2c:flags:8.  That way it is possible to create the same  scan-
              ner  multiple  times  for  different  character types, different
              input  mechanisms   or   different   output   mechanisms.    The
              /*!use:re2c  */  blocks  can  also contain additional rules that
              will be appended to the set of rules in /*!rules:re2c */.

       -s --nested-ifs
              Generate nested ifs for some switches. Many compilers need  this
              assist to generate better code.

       -t HEADER --type-header=HEADER
              Create  a  HEADER  file  that contains types for the (f)lex-like
              condition support. This can only be activated when -c is in use.

       -u --unicode
              Generate a parser that supports UTF-32. The generated  code  can
              deal  with  any  valid Unicode character up to 0x10FFFF. In this
              mode re2c assumes that input character size  is  4  bytes.  This
              switch is incompatible with -e, -w, -x and -8. This implies -s.

       -v --version
              Show version information.

       -V --vernum
              Show the version as a number XXYYZZ.
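              A sketch of decoding the XXYYZZ number in a build script or
              program, assuming the three two-digit fields are the major,
              minor and patch versions (so 1503 would read as 0.15.3).
              split_vernum and struct version are hypothetical names.

```c
/* Hypothetical result type for a decoded version number. */
struct version {
    int major;
    int minor;
    int patch;
};

/* Split an XXYYZZ number into its three two-digit fields. */
static struct version split_vernum(long n)
{
    struct version v;
    v.major = (int)(n / 10000);      /* XX */
    v.minor = (int)(n / 100 % 100);  /* YY */
    v.patch = (int)(n % 100);        /* ZZ */
    return v;
}
```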

       -w --wide-chars
              Generate  a  parser  that supports UCS-2. The generated code can
              deal with any valid Unicode character up  to  0xFFFF.   In  this
              mode  re2c  assumes  that  input character size is 2 bytes. This
              switch is incompatible with -e, -x, -u and -8. This implies -s.

       -x --utf-16
              Generate a parser that supports UTF-16. The generated  code  can
              deal  with  any  valid Unicode character up to 0x10FFFF. In this
              mode re2c assumes that input character size  is  2  bytes.  This
              switch is incompatible with -e, -w, -u and -8. This implies -s.
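              UTF-16 reaches 0x10FFFF with 2-byte code units by encoding code
              points above 0xFFFF as surrogate pairs; this is the practical
              difference from UCS-2 (-w). The sketch below shows the standard
              encoding step; utf16_encode is a hypothetical helper, not part of
              re2c.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-16 code units. Returns the number of
 * units written (1 or 2), or 0 for values above 0x10FFFF. */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                   /* BMP: a single unit */
        return 1;
    }
    if (cp > 0x10FFFF)
        return 0;
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));   /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```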

       -8 --utf-8
              Generate  a  parser  that supports UTF-8. The generated code can
              deal with any valid Unicode character up to  0x10FFFF.  In  this
              mode  re2c  assumes  that  input  character size is 1 byte. This
              switch is incompatible with -e, -w, -x and -u.

       --case-insensitive
              All strings are case insensitive, so double-quoted expressions
              are treated in the same way as single-quoted expressions.

       --case-inverted
              Invert  the  meaning  of  single and double quoted strings. With
              this switch single quotes are case sensitive and  double  quotes
              are case insensitive.

       --no-generation-date
              Suppress date output in the generated file.

       --no-version
              Suppress version output in the generated file.

       --encoding-policy POLICY
              Specify  how  re2c  must treat Unicode surrogates. POLICY can be
              one of the following: fail  (abort  with  error  when  surrogate
              encountered),  substitute  (silently  substitute  surrogate with
              error code point 0xFFFD), ignore  (treat  surrogates  as  normal
              code points). By default re2c ignores surrogates (for backward
              compatibility). The Unicode standard says that standalone
              surrogates are invalid code points, but different libraries and
              programs treat them differently.
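              A sketch of what the three policies mean for a single code
              point, assuming the surrogate range 0xD800-0xDFFF. The names are
              hypothetical, and re2c applies the policy while generating code,
              not through a runtime function like this one.

```c
#include <stdint.h>

/* Hypothetical names mirroring the three POLICY values. */
enum encoding_policy { POLICY_FAIL, POLICY_SUBSTITUTE, POLICY_IGNORE };

/* Surrogates occupy U+D800..U+DFFF. */
static int is_surrogate(uint32_t cp)
{
    return cp >= 0xD800 && cp <= 0xDFFF;
}

/* Returns the code point to use, or -1 where "fail" would abort. */
static int32_t apply_policy(uint32_t cp, enum encoding_policy policy)
{
    if (!is_surrogate(cp))
        return (int32_t)cp;        /* ordinary code points are unaffected */
    switch (policy) {
    case POLICY_FAIL:
        return -1;
    case POLICY_SUBSTITUTE:
        return 0xFFFD;             /* U+FFFD REPLACEMENT CHARACTER */
    case POLICY_IGNORE:
    default:
        return (int32_t)cp;        /* treat as a normal code point */
    }
}
```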

       --input INPUT
              Specify re2c input API. INPUT  can  be  one  of  the  following:
              default, custom.

       -S --skeleton
              Instead of embedding re2c-generated code into C/C++ source, gen-
              erate a self-contained program for the same DFA. Most useful for
              correctness and performance testing.

       --empty-class POLICY
              What to do if the user specifies an empty character class.
              POLICY can be one of the following: match-empty (match empty
              input: pretty illogical, but this is the default for backwards
              compatibility reasons), match-none (fail to match on any input),
              error (compilation error). Note that there are various ways to
              construct an empty class, e.g. [], [^\x00-\xFF],
              [\x00-\xFF]\[\x00-\xFF].

       --dfa-minimization <table | moore>
              Internal algorithm used by re2c to minimize the DFA (defaults to
              moore). Both the table-filling and Moore's algorithms should
              produce an identical DFA (up to state relabelling). The
              table-filling algorithm is much simpler but slower; it serves as
              a reference implementation.

       -1 --single-pass
              Deprecated and does nothing (single pass is by default now).

       -W     Turn on all warnings.

       -Werror
              Turn warnings into errors. Note that this option alone doesn't
              turn  on  any warnings, it only affects those warnings that have
              been turned on so far or will be turned on later.

       -W<warning>
              Turn on individual warning.

       -Wno-<warning>
              Turn off individual warning.

       -Werror-<warning>
              Turn on individual warning and treat it as error  (this  implies
              -W<warning>).

       -Wno-error-<warning>
              Don't  treat this particular warning as error. This doesn't turn
              off the warning itself.

       -Wcondition-order
              Warn if the generated program makes implicit  assumptions  about
              condition  numbering.  One  should  use either -t, --type-header
              option or /*!types:re2c*/ directive to generate mapping of  con-
              dition names to numbers and use autogenerated condition names.

       -Wempty-character-class
              Warn if a regular expression contains an empty character class.
              From a rational point of view, trying to match an empty
              character class makes no sense: it should always fail. However,
              for backwards compatibility reasons re2c allows empty character
              classes and treats them as an empty string. Use the
              --empty-class option to change the default behaviour.

       -Wmatch-empty-string
              Warn if a regular expression in a rule is nullable (matches the
              empty string). If the DFA runs in a loop and the empty match is
              unintentional (the input position is not advanced manually),
              the lexer may get stuck in an infinite loop.
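              The hazard can be sketched with a hand-written loop standing in
              for a generated scanner that has a nullable rule such as
              [a-z]*. The fallback that consumes one character plays the role
              of the default rule *, which wins by longest match whenever the
              nullable rule would match nothing. match_lower_run and
              count_tokens are hypothetical names.

```c
#include <stddef.h>

/* Models a nullable rule such as [a-z]*: the match length may be 0. */
static size_t match_lower_run(const char *p)
{
    size_t n = 0;
    while (p[n] >= 'a' && p[n] <= 'z')
        n++;
    return n;
}

/* Counts tokens in s. Without the fallback, a zero-length match would
 * leave the cursor in place and the loop would never terminate. */
static size_t count_tokens(const char *s)
{
    size_t tokens = 0;
    const char *cur = s;
    while (*cur != '\0') {
        size_t len = match_lower_run(cur);
        if (len == 0)
            len = 1;        /* like the default rule *: consume one unit */
        cur += len;
        tokens++;
    }
    return tokens;
}
```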

       -Wswapped-range
              Warn if a range's lower bound is greater than its upper bound.
              The default re2c behaviour is to silently swap range bounds.

       -Wundefined-control-flow
              Warn if some input strings cause undefined control flow in lexer
              (the  faulty  patterns are reported). This is the most dangerous
              and common mistake. It can be easily  fixed  by  adding  default
              rule * (this rule has the lowest priority, matches any code unit
              and consumes exactly one code unit).

       -Wuseless-escape
              Warn if a symbol is escaped when it shouldn't  be.   By  default
              re2c  silently  ignores  escape, but this may as well indicate a
              typo or an error in escape sequence.

INTERFACE CODE
       The user must supply interface code either in the form  of  C/C++  code
       (macros, functions, variables, etc.) or in the form of INPLACE CONFIGU-
       RATIONS.  Which symbols must be defined and which are optional  depends
       on a particular use case.

       YYCONDTYPE
              In  -c  mode you can use -t to generate a file that contains the
              enumeration used as conditions. Each of the values refers  to  a
              condition of a rule set.

       YYCTXMARKER
              l-value  of  type  YYCTYPE *.  The generated code saves trailing
              context backtracking information in YYCTXMARKER. The  user  only
              needs  to  define  this  macro  if  a scanner specification uses
              trailing context in one or more of its regular expressions.

       YYCTYPE
              Type used to hold an input symbol (code unit). Usually  char  or
              unsigned  char  for  ASCII, EBCDIC and UTF-8, unsigned short for
              UTF-16 or UCS-2 and unsigned int for UTF-32.

       YYCURSOR
              l-value of type YYCTYPE * that points to the current input  sym-
              bol.  The  generated  code  advances  YYCURSOR  as  symbols  are
              matched. On entry, YYCURSOR is assumed to  point  to  the  first
              character  of the current token. On exit, YYCURSOR will point to
              the first character of the following token.

       YYDEBUG (state, current)
              This is only needed if the -d flag was specified. It allows  one
              to  easily  debug the generated parser by calling a user defined
              function for every state. The function should have the following
              signature:  void  YYDEBUG  (int  state, char current). The first
              parameter receives the state or  -1  and  the  second  parameter
              receives the input at the current cursor.

       YYFILL (n)
              The generated code "calls" YYFILL (n) when the buffer needs
              (re)filling: at least n additional  characters  should  be  pro-
              vided.  YYFILL (n) should adjust YYCURSOR, YYLIMIT, YYMARKER and
              YYCTXMARKER as needed. Note that for  typical  programming  lan-
              guages n will be the length of the longest keyword plus one. The
              user can place a comment of the  form  /*!max:re2c*/  to  insert
              YYMAXFILL definition that is set to the maximum length value.
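              A minimal sketch of a refill routine, assuming a hypothetical
              struct scanner whose input comes from an in-memory source so the
              example stays self-contained. It shifts the unfinished token to
              the front of the buffer, adjusts YYCURSOR, YYLIMIT, YYMARKER and
              YYCTXMARKER, appends fresh input and, at end of input, pads with
              NUL characters so that n characters can always be read. The
              generated code typically guards the call with a check of the
              form if ((YYLIMIT - YYCURSOR) < n) YYFILL(n);.

```c
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 64   /* hypothetical capacity: longest token plus n */

/* Hypothetical scanner state; src/src_len/src_pos stand in for a file. */
struct scanner {
    char buf[BUF_SIZE];
    char *tok;        /* start of the current token */
    char *cursor;     /* YYCURSOR */
    char *limit;      /* YYLIMIT */
    char *marker;     /* YYMARKER (assumed to lie in [tok, limit]) */
    char *ctxmarker;  /* YYCTXMARKER (assumed to lie in [tok, limit]) */
    const char *src;
    size_t src_len;
    size_t src_pos;
};

/* Make at least n characters readable from cursor.
 * Returns 0 on success, -1 if the current token cannot fit. */
static int fill(struct scanner *s, size_t n)
{
    size_t shift = (size_t)(s->tok - s->buf);
    size_t used = (size_t)(s->limit - s->tok);

    if ((size_t)(s->cursor - s->tok) + n > BUF_SIZE)
        return -1;

    /* 1. Drop consumed text; move the unfinished token to the front. */
    memmove(s->buf, s->tok, used);
    s->tok -= shift;
    s->cursor -= shift;
    s->limit -= shift;
    s->marker -= shift;
    s->ctxmarker -= shift;

    /* 2. Append as much fresh input as fits. */
    size_t room = BUF_SIZE - used;
    size_t avail = s->src_len - s->src_pos;
    size_t take = avail < room ? avail : room;
    memcpy(s->limit, s->src + s->src_pos, take);
    s->src_pos += take;
    s->limit += take;

    /* 3. At end of input, pad with NULs so that n characters are readable. */
    while ((size_t)(s->limit - s->cursor) < n)
        *s->limit++ = '\0';
    return 0;
}
```

              Hooking it up might look like #define YYFILL(n) do { if
              (fill(&s, (n)) != 0) return -1; } while (0), with YYCURSOR,
              YYLIMIT, YYMARKER and YYCTXMARKER defined as s.cursor, s.limit,
              s.marker and s.ctxmarker. The rules must then treat a NUL
              character as the end of input.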

       YYGETCONDITION ()
              This define is used to get the condition prior to entering the
              scanner code when using the -c switch. The value must be one of
              the values of the enumeration type YYCONDTYPE.

       YYGETSTATE ()
              The  user  only  needs  to  define this macro if the -f flag was
              specified. In that case, the generated code  "calls"  YYGETSTATE
              ()  at  the very beginning of the scanner in order to obtain the
              saved state. YYGETSTATE () must return  a  signed  integer.  The
              value  must be either -1, indicating that the scanner is entered
              for the first time, or a value previously  saved  by  YYSETSTATE
              (s).  In  the  second  case,  the scanner will resume operations
              right after where the last YYFILL (n) was called.

       YYLIMIT
              Expression of type YYCTYPE * that marks the end of the buffer
              (YYLIMIT[-1] is the last character in the buffer). The generated
              code repeatedly compares YYCURSOR to YYLIMIT to  determine  when
              the buffer needs (re)filling.

       YYMARKER
              l-value  of type YYCTYPE *.  The generated code saves backtrack-
              ing information in YYMARKER. Some easy scanners  might  not  use
              this.
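              The role of YYMARKER can be sketched with a hand-written model of
              the DFA for two hypothetical rules, "a" and "abc". After reading
              a, the scanner already has a match but keeps reading in search
              of a longer one, so it records the position in marker and
              backtracks there if abc does not follow. The function scan is
              illustrative and is not re2c output.

```c
/* Returns 3 for "abc", 1 for "a", 0 for no match; advances *cursor_p
 * past the accepted text (left unchanged when nothing matches). */
static int scan(const char **cursor_p)
{
    const char *cursor = *cursor_p;  /* plays the role of YYCURSOR */
    const char *marker;              /* plays the role of YYMARKER */

    if (*cursor != 'a')
        return 0;
    cursor++;
    marker = cursor;                 /* "a" matched: remember where it ends */
    if (*cursor == 'b') {
        cursor++;
        if (*cursor == 'c') {
            *cursor_p = cursor + 1;  /* longest match "abc" */
            return 3;
        }
    }
    *cursor_p = marker;              /* backtrack to the end of "a" */
    return 1;
}
```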

       YYMAXFILL
              This  will  be  automatically defined by /*!max:re2c*/ blocks as
              explained above.

       YYSETCONDITION (c)
              This define is used to set the condition in transition rules.
              It is only used when -c is active and transition rules are
              used.
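              A sketch of condition-related interface code, assuming the lexer
              keeps its current condition in a struct field. The enumeration
              imitates what -t might write to the header for two hypothetical
              conditions, INITIAL and COMMENT, using the default yyc prefix
              (see re2c:condenumprefix); the actual generated contents may
              differ.

```c
/* Imitation of a -t header for two hypothetical conditions. */
enum YYCONDTYPE { yycINITIAL, yycCOMMENT };

/* Hypothetical lexer state holding the current condition. */
struct lexer {
    enum YYCONDTYPE cond;
};

/* Both macros assume a variable named lx points at the lexer. */
#define YYGETCONDITION()  (lx->cond)
#define YYSETCONDITION(c) (lx->cond = (c))

/* What a transition rule such as  => COMMENT  boils down to. */
static enum YYCONDTYPE enter_comment(struct lexer *lx)
{
    YYSETCONDITION(yycCOMMENT);
    return YYGETCONDITION();
}
```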

       YYSETSTATE (s)
              The user only needs to define this macro  if  the  -f  flag  was
              specified.  In  that case, the generated code "calls" YYSETSTATE
              just before calling YYFILL (n). The parameter to YYSETSTATE is a
              signed integer that uniquely identifies the specific instance of
              YYFILL (n) that is about to be called. Should the user  wish  to
              save  the state of the scanner and have YYFILL (n) return to the
              caller, all they have to do is store that unique identifier in
              a variable. Later, when the scanner is called again, it will
              call YYGETSTATE () and resume execution right where it left off.
              The generated code will contain both YYSETSTATE (s) and
              YYGETSTATE even if YYFILL (n) is disabled.
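              A hand-written model of a resumable scanner, assuming YYFILL
              returns a hypothetical NEED_MORE status to the caller. It matches
              the literal "ab" even when the two characters arrive in separate
              calls: before asking for input it records its position with
              YYSETSTATE, and on the next call YYGETSTATE selects the label at
              which to resume. The label names imitate yyFillLabel (see
              re2c:label:yyFillLabel), but the code is not re2c output.

```c
enum { NEED_MORE = -2 };    /* hypothetical "call me again" status */

/* Hypothetical lexer state for a scanner built with -f. */
struct lexer {
    int state;              /* -1 on first entry, else the saved id */
    const char *cursor;     /* YYCURSOR */
    const char *limit;      /* YYLIMIT */
};

#define YYGETSTATE()   (lx->state)
#define YYSETSTATE(s)  (lx->state = (s))
#define YYFILL(n)      return NEED_MORE   /* hand control back to the caller */

/* Matches the literal "ab": returns 1 on a match, 0 on a mismatch and
 * NEED_MORE when the current chunk is exhausted. */
static int scan_ab(struct lexer *lx)
{
    switch (YYGETSTATE()) {       /* resume where the last YYFILL left off */
    case 0: goto yyFillLabel0;
    case 1: goto yyFillLabel1;
    default: break;               /* -1: first entry */
    }
yyFillLabel0:
    if (lx->cursor >= lx->limit) { YYSETSTATE(0); YYFILL(1); }
    if (*lx->cursor++ != 'a') { YYSETSTATE(-1); return 0; }
yyFillLabel1:
    if (lx->cursor >= lx->limit) { YYSETSTATE(1); YYFILL(1); }
    if (*lx->cursor++ != 'b') { YYSETSTATE(-1); return 0; }
    YYSETSTATE(-1);               /* token complete: next call starts afresh */
    return 1;
}
```

              The caller starts with state set to -1, points cursor and limit
              at each new chunk of input and calls scan_ab again for as long as
              it returns NEED_MORE.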

SYNTAX
       Code for re2c consists of a set of RULES, NAMED DEFINITIONS and INPLACE
       CONFIGURATIONS.

   RULES
       Rules  consist  of a regular expression (see REGULAR EXPRESSIONS) along
       with a block of C/C++ code that is to be executed when  the  associated
       regular  expression  is  matched. You can either start the code with an
       opening curly brace or the sequence :=. When the code starts with a
       curly brace, re2c counts the brace depth and stops looking for code
       automatically. Otherwise curly braces are not allowed and re2c stops
       looking for code at the first line that does not begin with
       whitespace. The longest match wins; if two or more rules match the
       same input of the same length, the first rule is preferred.
          regular-expression { C/C++ code }

          regular-expression := C/C++ code

       There is one special rule: default rule *
          * { C/C++ code }

          * := C/C++ code

       Note that default rule * differs from [^]: default rule has the  lowest
       priority,  matches  any  code unit (either valid or invalid) and always
       consumes one code unit, while [^] matches any valid code point (not
       code  unit)  and  can  consume multiple code units. In fact, when vari-
       able-length encoding is used, * is  the  only  possible  way  to  match
       invalid input character (see ENCODINGS for details).
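       The difference can be sketched for UTF-8 input with a scanner that
       has a [^] rule followed by the default rule *: a well-formed sequence
       is consumed as one code point, while anything else falls through to *
       and consumes a single code unit. The helpers are hypothetical, and
       utf8_seq_len is deliberately simplified: it checks lead and
       continuation bytes but not the full ranges of three- and four-byte
       sequences, so some overlong forms, surrogates and values above
       0x10FFFF pass.

```c
#include <stddef.h>

/* Length (1..4) of the UTF-8 sequence at p, or 0 if it is malformed or
 * truncated. Simplified: three- and four-byte sequences are not fully
 * range-checked. */
static size_t utf8_seq_len(const unsigned char *p, size_t avail)
{
    size_t len;

    if (avail == 0)
        return 0;
    if (p[0] < 0x80)
        return 1;                       /* ASCII */
    else if (p[0] >= 0xC2 && p[0] <= 0xDF)
        len = 2;
    else if (p[0] >= 0xE0 && p[0] <= 0xEF)
        len = 3;
    else if (p[0] >= 0xF0 && p[0] <= 0xF4)
        len = 4;
    else
        return 0;                       /* cannot start a sequence */
    if (len > avail)
        return 0;                       /* truncated */
    for (size_t i = 1; i < len; i++)
        if ((p[i] & 0xC0) != 0x80)
            return 0;                   /* missing continuation byte */
    return len;
}

/* Cursor advance for a scanner with rules  [^] { ... }  and  * { ... } :
 * [^] takes a whole valid code point; otherwise * takes one code unit.
 * avail is assumed to be at least 1. */
static size_t scan_step(const unsigned char *p, size_t avail)
{
    size_t len = utf8_seq_len(p, avail);
    return len != 0 ? len : 1;
}
```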

       If  -c  is active then each regular expression is preceded by a list of
       comma separated condition names. Besides normal naming rules there  are
       two special cases: <*> (such rules are merged into all conditions) and
       <> (such a rule cannot have an associated regular expression; its code
       is merged into all actions). Non-empty rules may furthermore specify the
       new condition. In that case re2c will generate the  necessary  code  to
       change  the condition automatically. Rules can use :=> as a shortcut to
       automatically generate code that not only sets the new condition  state
       but also continues execution with the new state. A shortcut rule should
       not be used in a loop where there is code between the start of the loop
       and  the  re2c  block  unless re2c:cond:goto is changed to continue. If
       code is necessary before all rules (though not simple  jumps)  you  can
       do so by using <!> pseudo-rules.
          <condition-list> regular-expression { C/C++ code }

          <condition-list> regular-expression := C/C++ code

          <condition-list> * { C/C++ code }

          <condition-list> * := C/C++ code

          <condition-list> regular-expression => condition { C/C++ code }

          <condition-list> regular-expression => condition := C/C++ code

          <condition-list> * => condition { C/C++ code }

          <condition-list> * => condition := C/C++ code

          <condition-list> regular-expression :=> condition

          <*> regular-expression { C/C++ code }

          <*> regular-expression := C/C++ code

          <*> * { C/C++ code }

          <*> * := C/C++ code

          <*> regular-expression => condition { C/C++ code }

          <*> regular-expression => condition := C/C++ code

          <*> * => condition { C/C++ code }

          <*> * => condition := C/C++ code

          <*> regular-expression :=> condition

          <> { C/C++ code }

          <> := C/C++ code

          <> => condition { C/C++ code }

          <> => condition := C/C++ code

          <> :=> condition

          <! condition-list> { C/C++ code }

          <! condition-list> := C/C++ code

          <!> { C/C++ code }

          <!> := C/C++ code

   NAMED DEFINITIONS
       Named definitions are of the form:
          name = regular-expression;

       If -F is active, then named definitions are also of the form:
          name { regular-expression }

   INPLACE CONFIGURATIONS
       re2c:condprefix = yyc;
              Allows one to specify the prefix used for condition labels. That
              is this text is prepended to any condition label in  the  gener-
              ated output file.

       re2c:condenumprefix = yyc;
              Allows one to specify the prefix used for condition values. That
              is this text is prepended to any condition  enum  value  in  the
              generated output file.

       re2c:cond:divider = /* *********************************** */ ;
              Allows one to customize the divider for condition blocks. You
              can use @@ to put the name of the  condition  or  customize  the
              placeholder using re2c:cond:divider@cond.

       re2c:cond:divider@cond = @@;
              Specifies  the placeholder that will be replaced with the condi-
              tion name in re2c:cond:divider.

       re2c:cond:goto = goto @@; ;
              Allows one to customize the condition goto statements used  with
              :=> style rules. You can use @@ to put the name of the condition
              or customize the placeholder using re2c:cond:goto@cond. You can
              also change this to continue;, which would allow you to continue
              with the next loop cycle including any code between  loop  start
              and re2c block.

       re2c:cond:goto@cond = @@;
              Specifies the placeholder that will be replaced with the
              condition label in re2c:cond:goto.

       re2c:indent:top = 0;
              Specifies the minimum level of indentation to use. Requires a
              numeric value greater than or equal to zero.

       re2c:indent:string = \t ;
              Specifies  the  string to use for indentation. Requires a string
              that should contain only whitespace unless  you  need  this  for
              external tools. The easiest way to specify spaces is to enclose
              them in single or double quotes.  If you do not want any  inden-
              tation at all you can simply set this to "".

       re2c:yych:conversion = 0;
              When this setting is non zero, then re2c automatically generates
              conversion code whenever yych gets read. In this case  the  type
              must be defined using re2c:define:YYCTYPE.

       re2c:yych:emit = 1;
              Generation of yych can be suppressed by setting this to 0.

       re2c:yybm:hex = 0;
              If set to zero, a decimal table is used; otherwise a hexadecimal
              table is generated.

       re2c:yyfill:enable = 1;
              Set this to zero to suppress  generation  of  YYFILL  (n).  When
              using this, be sure to verify that the generated scanner does
              not read past the end of the input. Allowing this behavior
              might introduce severe security issues into your programs.

       re2c:yyfill:check = 1;
              This can be set to 0 to suppress output of the precondition that
              compares YYCURSOR and YYLIMIT, which is useful when YYLIMIT +
              YYMAXFILL is always accessible.

       re2c:define:YYFILL = YYFILL ;
              Substitution for YYFILL. Note that by default re2c generates the
              argument in parentheses and a semicolon after YYFILL. If you need
              to make YYFILL an arbitrary statement rather than a call, set
              re2c:define:YYFILL:naked to non-zero and use
              re2c:define:YYFILL@len to denote the formal parameter inside of
              the YYFILL body.

       re2c:define:YYFILL@len = @@ ;
              Any occurrence of this text inside of YYFILL  will  be  replaced
              with the actual argument.

       re2c:yyfill:parameter = 1;
              Controls the argument in parentheses after YYFILL. If zero, the
              argument is omitted. If non-zero, the argument is generated
              unless re2c:define:YYFILL:naked is set to non-zero.

       re2c:define:YYFILL:naked = 0;
              Controls the argument in parentheses and the semicolon after
              YYFILL. If zero, the argument is generated (unless
              re2c:yyfill:parameter is set to zero) and the semicolon is
              generated unconditionally. If non-zero, both the argument and
              the semicolon are omitted.

       re2c:startlabel = 0;
              If set to a non-zero integer, then the start label of the next
              scanner block will be generated even if it is not used by the
              scanner itself. Otherwise the normal start label (such as yy0)
              is only generated if needed. If set to a text value, then a
              label with that text will be generated regardless of whether
              the normal start label is being used or not. This setting is
              reset to 0 after a start label has been generated.

       re2c:labelprefix = yy ;
              Allows one to change the prefix of numbered labels. The  default
              is yy and can be set to any string that is a valid label.

       re2c:state:abort = 0;
              When  not zero and switch -f is active then the YYGETSTATE block
              will contain a default case that aborts and a -1  case  is  used
              for initialization.

       re2c:state:nextlabel = 0;
              Used  when  -f is active to control whether the YYGETSTATE block
              is followed by a yyNext: label line.  Instead  of  using  yyNext
              you  can  usually  also  use configuration startlabel to force a
              specific start label or default to yy0 as start  label.  Instead
              of  using  a  dedicated label it is often better to separate the
              YYGETSTATE code from  the  actual  scanner  code  by  placing  a
              /*!getstate:re2c*/ comment.

       re2c:cgoto:threshold = 9;
              When  -g is active this value specifies the complexity threshold
              that triggers generation of jump tables rather than using nested
              if's and decision bitfields. The threshold is compared against a
              calculated estimation of if-s needed  where  every  used  bitmap
              divides the threshold by 2.

       re2c:yych:conversion = 0;
              When the input uses signed characters and the -s or -b switches
              are in effect, re2c allows one to automatically convert to the
              unsigned character type that it then needs internally for the
              current character. When this setting is zero or an empty string,
              the conversion is disabled. With a non-zero number, the
              conversion is taken from YYCTYPE. If that is given by an inplace
              configuration, that value is used. Otherwise it will be
              (YYCTYPE) and changes to that configuration are no longer
              possible. When this setting is a string, the parentheses must be
              specified. Now, assuming your input is a char * buffer and you
              are using the above-mentioned switches, you can set YYCTYPE to
              unsigned char and this setting to either 1 or (unsigned char).

       re2c:define:YYCONDTYPE = YYCONDTYPE ;
              Enumeration used for condition support with -c mode.

       re2c:define:YYCTXMARKER = YYCTXMARKER ;
              Allows one to overwrite the define YYCTXMARKER, and thus avoid
              defining it, by setting the value to the actual code needed.

       re2c:define:YYCTYPE = YYCTYPE ;
              Allows one to overwrite the define YYCTYPE, and thus avoid
              defining it, by setting the value to the actual code needed.

       re2c:define:YYCURSOR = YYCURSOR ;
              Allows one to overwrite the define YYCURSOR, and thus avoid
              defining it, by setting the value to the actual code needed.

       re2c:define:YYDEBUG = YYDEBUG ;
              Allows one to overwrite the define YYDEBUG, and thus avoid
              defining it, by setting the value to the actual code needed.

       re2c:define:YYGETCONDITION = YYGETCONDITION ;
              Substitution for YYGETCONDITION. Note that by default re2c
              generates parentheses after YYGETCONDITION. Set
              re2c:define:YYGETCONDITION:naked to non-zero to omit them.

       re2c:define:YYGETCONDITION:naked = 0;
              Controls the parentheses after YYGETCONDITION. If zero, the
              parentheses are generated. If non-zero, they are omitted.

       re2c:define:YYSETCONDITION = YYSETCONDITION ;
              Substitution for YYSETCONDITION. Note that by default re2c
              generates the argument in parentheses and a semicolon after
              YYSETCONDITION. If you need to make YYSETCONDITION an arbitrary
              statement rather than a call, set
              re2c:define:YYSETCONDITION:naked to non-zero and use
              re2c:define:YYSETCONDITION@cond to denote the formal parameter
              inside of the YYSETCONDITION body.

       re2c:define:YYSETCONDITION@cond = @@ ;
              Any occurrence of this text inside  of  YYSETCONDITION  will  be
              replaced with the actual argument.

       re2c:define:YYSETCONDITION:naked = 0;
              Controls the argument in parentheses and the semicolon after
              YYSETCONDITION. If zero, both the argument and the semicolon are
              generated. If non-zero, both are omitted.

       re2c:define:YYGETSTATE = YYGETSTATE ;
              Substitution for YYGETSTATE. Note that by default re2c generates
              parentheses after YYGETSTATE. Set re2c:define:YYGETSTATE:naked
              to non-zero to omit them.

       re2c:define:YYGETSTATE:naked = 0;
              Controls the parentheses after YYGETSTATE. If zero, the
              parentheses are generated. If non-zero, they are omitted.

       re2c:define:YYSETSTATE = YYSETSTATE ;
              Substitution for YYSETSTATE. Note that by default re2c generates
              the argument in parentheses and a semicolon after YYSETSTATE. If
              you need to make YYSETSTATE an arbitrary statement rather than a
              call, set re2c:define:YYSETSTATE:naked to non-zero and use
              re2c:define:YYSETSTATE@state to denote the formal parameter
              inside of the YYSETSTATE body.

       re2c:define:YYSETSTATE@state = @@ ;
              Any  occurrence  of  this  text  inside  of  YYSETSTATE  will be
              replaced with the actual argument.

       re2c:define:YYSETSTATE:naked = 0;
              Controls the argument in parentheses and the semicolon after
              YYSETSTATE. If zero, both the argument and the semicolon are
              generated. If non-zero, both are omitted.

       re2c:define:YYLIMIT = YYLIMIT ;
              Allows one to overwrite the define YYLIMIT, and thus avoid
              defining it, by setting the value to the actual code needed.

       re2c:define:YYMARKER = YYMARKER ;
              Allows one to overwrite the define YYMARKER, and thus avoid
              defining it, by setting the value to the actual code needed.

       re2c:label:yyFillLabel = yyFillLabel ;
              Allows one to overwrite the name of the label yyFillLabel.

       re2c:label:yyNext = yyNext ;
              Allows one to overwrite the name of the label yyNext.

       re2c:variable:yyaccept = yyaccept;
              Allows one to overwrite the name of the variable yyaccept.

       re2c:variable:yybm = yybm ;
              Allows one to overwrite the name of the variable yybm.

       re2c:variable:yych = yych ;
              Allows one to overwrite the name of the variable yych.

       re2c:variable:yyctable = yyctable ;
              When both -c and -g are active then re2c uses this  variable  to
              generate a static jump table for YYGETCONDITION.

       re2c:variable:yystable = yystable ;
              Deprecated.

       re2c:variable:yytarget = yytarget ;
              Allows one to override the name of the variable yytarget.

   REGULAR EXPRESSIONS
       "foo"  literal string "foo". ANSI-C escape sequences can be used.

       'foo'  literal string "foo" (characters [a-zA-Z] are treated
              case-insensitively). ANSI-C escape sequences can be used.

       [xyz]  character class; in this case, regular expression matches either
              x, y, or z.

       [abj-oZ]
              character  class  with  a  range in it; matches a, b, any letter
              from j through o or Z.

       [^class]
              inverted character class.

       r \ s  match any r which isn't s. r and s must be  regular  expressions
              which can be expressed as character classes.

       r*     zero or more occurrences of r.

       r+     one or more occurrences of r.

       r?     optional r.

       (r)    r; parentheses are used to override precedence.

       r s    r followed by s (concatenation).

       r | s  either r or s (alternative).

       r / s  r  but  only  if it is followed by s. Note that s is not part of
              the matched text. This type  of  regular  expression  is  called
              "trailing  context".  Trailing  context can only be the end of a
              rule and not part of a named definition.

       r{n}   matches r exactly n times.

       r{n,}  matches r at least n times.

       r{n,m} matches r at least n times, but not more than m times.

       .      match any character except newline.

       name   matches the named definition name, but only if -F is off. If -F
              is active, this behaves as if name were enclosed in double
              quotes and matches the string "name".

       Character classes and string literals may contain octal or hexadecimal
       character definitions and the following set of escape sequences: \a,
       \b, \f, \n, \r, \t, \v, \\. An octal character is defined by a
       backslash followed by its three octal digits (e.g. \377). Hexadecimal
       characters from 0 to 0xFF are defined by a backslash, a lower-case x
       and two hexadecimal digits (e.g. \x12). Hexadecimal characters from
       0x100 to 0xFFFF are defined by a backslash, a lower-case u or an
       upper-case X and four hexadecimal digits (e.g. \u1234). Hexadecimal
       characters from 0x10000 to 0xFFFFffff are defined by a backslash, an
       upper-case U and eight hexadecimal digits (e.g. \U12345678).

       The only portable "any" rule is the default rule *.
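       The following is a sketch of an re2c input file that combines several
       of the constructs above: named definitions, a character class
       difference, a case-insensitive literal, bounded repetition, trailing
       context and the default rule. It must be processed by re2c before it
       can be compiled; the token names are hypothetical, and the input is
       assumed to be NUL-terminated because YYFILL is disabled.

```c
/* re2c input sketch (not plain C): generate with `re2c -o lex.c lex.re`.
   Token codes are hypothetical; the input must be NUL-terminated. */
enum { TOK_EOF, TOK_KW_IF, TOK_NUM, TOK_CALL, TOK_ID, TOK_ERR };

static int lex(const unsigned char **cursor)
{
    const unsigned char *YYCURSOR = *cursor;
    const unsigned char *YYMARKER;     /* used for backtracking */
    const unsigned char *YYCTXMARKER;  /* used by the trailing-context rule */
    int tok;
    /*!re2c
        re2c:define:YYCTYPE = "unsigned char";
        re2c:yyfill:enable  = 0;

        digit   = [0-9];
        nzdigit = digit \ [0];
        alpha   = [a-zA-Z_];
        id      = alpha (alpha | digit)*;

        "\x00"                    { tok = TOK_EOF;   goto done; }
        'if'                      { tok = TOK_KW_IF; goto done; }
        "0" | nzdigit digit{0,8}  { tok = TOK_NUM;   goto done; }
        id / [ \t]* "("           { tok = TOK_CALL;  goto done; }
        id                        { tok = TOK_ID;    goto done; }
        *                         { tok = TOK_ERR;   goto done; }
    */
done:
    *cursor = YYCURSOR;   /* trailing context is not consumed */
    return tok;
}
```

       Since 'if' is listed before id, a bare "if" or "IF" is reported as a
       keyword rather than an identifier.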

SCANNER WITH STORABLE STATES
       When  the -f flag is specified, re2c generates a scanner that can store
       its current state, return to the caller, and  later  resume  operations
       exactly where it left off.

       The default operation of re2c is a "pull" model, where the scanner asks
       for extra input whenever it needs it. However, this mode of operation
       assumes that the scanner is the "owner" of the parsing loop, and that
       may not always be convenient.

       Typically, if there is a preprocessor ahead of the scanner in the
       stream, or for that matter any other procedural source of data, the
       scanner cannot "ask" for more data unless both scanner and source live
       in separate threads.

       The -f flag is useful for just this situation: it lets users design
       scanners that work in a "push" model, i.e. where data is fed to the
       scanner chunk by chunk. When the scanner runs out of data to consume,
       it stores its state and returns to the caller. When more input data is
       fed to the scanner, it resumes operations exactly where it left off.

       Changes needed compared to the "pull" model:

       o User has to supply macros YYSETSTATE (state) and YYGETSTATE ().

       o The -f option inhibits the declaration of yych and yyaccept, so the
         user has to declare these variables and also save and restore them.
         In the example examples/push_model/push.re they are declared as
         fields of the (C++) class of which the scanner is a method, so they
         do not need to be saved and restored explicitly. In C they could,
         for example, be macros that select fields from a structure passed in
         as a parameter. Alternatively, they could be declared as local
         variables, saved by YYFILL (n) when it decides to return, and
         restored at entry to the function. It may also be more efficient to
         save the state from YYFILL (n), because YYSETSTATE (state) is called
         unconditionally. However, YYFILL (n) does not get the state as a
         parameter, so the state would have to be stored in a local variable
         by YYSETSTATE (state).

       o Modify  YYFILL  (n)  to return (from the function calling it) if more
         input is needed.

       o Modify caller to recognise if more input is needed and respond appro-
         priately.

       o The generated code will contain a switch block that is used to
         restore the last state by jumping to the point just after the
         corresponding YYFILL (n) call. This code is automatically generated
         in the epilog of the first /*!re2c */ block. It is possible to
         trigger generation of the YYGETSTATE () block earlier by placing a
         /*!getstate:re2c*/ comment. This is especially useful when the
         scanner code should be wrapped inside a loop.

       Please see examples/push_model/push.re for an example of a "push" model
       scanner. The generated code can be tweaked using the inplace
       configurations state:abort and state:nextlabel.
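       The push mechanism itself can be illustrated without re2c. The
       following hand-written C sketch is not re2c output, and all of its
       names (push_scanner, push_feed, NEED_MORE) are hypothetical. It keeps
       its scanner state in a structure, returns NEED_MORE when a chunk is
       exhausted, and resumes in the saved state on the next call, which is
       the role the generated YYGETSTATE () switch plays in a -f scanner.

```c
#include <stddef.h>

/* Returned when the current chunk is exhausted and more input is needed. */
#define NEED_MORE (-1)

/* State that survives between calls; in a -f scanner this is what the user
   keeps via YYSETSTATE/YYGETSTATE (plus yych and yyaccept). */
typedef struct {
    int  state;  /* 0: between numbers, 1: inside a number */
    long value;  /* digits of the current number seen so far */
    int  count;  /* numbers recognised */
    long sum;    /* sum of recognised numbers */
} push_scanner;

/* Record the pending number and return to the start state. */
static void finish_number(push_scanner *s)
{
    s->count++;
    s->sum += s->value;
    s->state = 0;
}

/* Feed one chunk; a zero-length chunk signals end of input.
   Returns NEED_MORE after an ordinary chunk, or the number count at EOF. */
static int push_feed(push_scanner *s, const char *p, size_t len)
{
    const char *end = p + len;
    if (len == 0) {                    /* end of input: flush pending number */
        if (s->state == 1)
            finish_number(s);
        return s->count;
    }
    for (; p < end; ++p) {
        int is_digit = *p >= '0' && *p <= '9';
        switch (s->state) {            /* resume in the saved state */
        case 0:
            if (is_digit) { s->value = *p - '0'; s->state = 1; }
            break;
        case 1:
            if (is_digit) s->value = s->value * 10 + (*p - '0');
            else finish_number(s);
            break;
        }
    }
    return NEED_MORE;                  /* state is saved in *s; ask for more */
}
```

       A caller would call push_feed once per chunk as data arrives, and once
       more with an empty chunk at end of input. Feeding "12 3" and then
       "4 5" yields the numbers 12, 34 and 5: the number split across the
       chunk boundary is recognised correctly because the scanner resumed
       inside it.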

SCANNER WITH CONDITION SUPPORT
       You can precede regular expressions with a list of condition names
       when using the -c switch. In this case re2c generates a scanner block
       for each condition, and each of the generated blocks has its own
       precondition. The precondition is given by the interface define
       YYGETCONDITION() and must be of type YYCONDTYPE.

       There are two special rule types. First, the rules of the condition <*>
       are merged into all conditions (note that they have lower priority than
       other rules of that condition). Second, the empty condition list allows
       one to provide a code block that does not have a scanner part, meaning
       that it does not allow any regular expression. The condition value
       referring to this special block is always the one with the enumeration
       value 0. This way the code of this special rule can be used to
       initialize a scanner. These rules are in no way necessary, but
       sometimes it is helpful to have a dedicated uninitialized condition
       state.

       Non-empty rules allow one to specify the new condition, which makes
       them transition rules. Apart from generating calls to the define
       YYSETCONDITION, no other special code is generated.

       There is another kind of special rule that allows one to prepend code
       to the code block of every rule in a certain set of conditions, or to
       the code blocks of all rules. This can be helpful when some operation
       is common among rules; for instance, it can be used to store the
       length of the scanned string. These special setup rules start with an
       exclamation mark followed by either a list of conditions
       <! condition, ... > or a star <!*>. When re2c generates the code for a
       rule whose state does not have a setup rule and a starred setup rule
       is present, then that code will be used as setup code.
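       A sketch of a condition-based scanner that skips C-style comments. It
       assumes re2c is run with -c; the YYCONDTYPE enumeration is declared by
       hand here using re2c's default yyc prefix (re2c can also generate it
       with -t), and the lexer structure and return codes are hypothetical.
       The comment delimiters are written as two concatenated one-character
       strings so that they cannot terminate the enclosing /*!re2c */ comment.

```c
/* re2c input sketch (not plain C): generate with `re2c -c -o lex.c lex.re`.
   Declared by hand here; `re2c -t` can generate this enum instead. */
enum YYCONDTYPE { yycINITIAL, yycCOMMENT };

typedef struct {
    const unsigned char *cur, *mar;  /* NUL-terminated input */
    enum YYCONDTYPE cond;            /* current condition */
} lexer;

#define YYGETCONDITION()  l->cond
#define YYSETCONDITION(c) l->cond = (c)

/* Returns 1 for a word, 0 at end of input, -1 on error; comments are skipped. */
static int lex(lexer *l)
{
    for (;;) {
    /*!re2c
        re2c:define:YYCTYPE  = "unsigned char";
        re2c:define:YYCURSOR = "l->cur";
        re2c:define:YYMARKER = "l->mar";
        re2c:yyfill:enable   = 0;

        <INITIAL> "/" "*" => COMMENT  { continue; }
        <INITIAL> [a-z]+              { return 1; }
        <INITIAL> [ \t\n]+            { continue; }
        <COMMENT> "*" "/" => INITIAL  { continue; }
        <COMMENT> [^\x00]             { continue; }
        <*> "\x00"                    { return 0; }
        <*> *                         { return -1; }
    */
    }
}
```

       The <*> rules apply in both conditions, so the NUL terminator ends
       the scan even inside an unterminated comment.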

ENCODINGS
       re2c  supports  the  following encodings: ASCII (default), EBCDIC (-e),
       UCS-2 (-w), UTF-16 (-x), UTF-32 (-u) and UTF-8 (-8).  See also  inplace
       configuration re2c:flags.

       The following concepts should be clarified when talking about
       encodings. A code point is an abstract number that represents a single
       encoding symbol. A code unit is the smallest unit of memory used in the
       encoded text (it corresponds to one character in the input stream).
       One or more code units may be needed to represent a single code point,
       depending on the encoding. In a fixed-length encoding, each code point
       is represented with an equal number of code units. In a
       variable-length encoding, different code points can be represented
       with different numbers of code units.

       ASCII  is  a  fixed-length encoding. Its code space includes 0x100 code
              points, from 0 to 0xFF.  One  code  point  is  represented  with
              exactly  one  1-byte  code unit, which has the same value as the
              code point. Size of YYCTYPE must be 1 byte.

       EBCDIC is a fixed-length encoding. Its code space includes  0x100  code
              points,  from  0  to  0xFF.  One  code point is represented with
              exactly one 1-byte code unit, which has the same  value  as  the
              code point. Size of YYCTYPE must be 1 byte.

       UCS-2  is a fixed-length encoding. Its code space includes 0x10000 code
              points, from 0 to 0xFFFF. One code  point  is  represented  with
              exactly  one  2-byte  code unit, which has the same value as the
              code point. Size of YYCTYPE must be 2 bytes.

       UTF-16 is a variable-length encoding. Its code space includes all  Uni-
              code  code points, from 0 to 0xD7FF and from 0xE000 to 0x10FFFF.
              One code point is represented with one or two 2-byte code units.
              Size of YYCTYPE must be 2 bytes.

       UTF-32 is  a fixed-length encoding. Its code space includes all Unicode
              code points, from 0 to 0xD7FF and from 0xE000 to  0x10FFFF.  One
              code  point  is  represented  with exactly one 4-byte code unit.
              Size of YYCTYPE must be 4 bytes.

       UTF-8  is a variable-length encoding. Its code space includes all
              Unicode code points, from 0 to 0xD7FF and from 0xE000 to
              0x10FFFF. One code point is represented with a sequence of one,
              two, three or four 1-byte code units. Size of YYCTYPE must be 1
              byte.
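       To make the difference between code points and code units concrete,
       here is a small standalone C sketch (not part of re2c) that encodes a
       single code point as UTF-8 and as UTF-16 and reports how many code
       units each encoding needs.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode cp as UTF-8; returns the number of 1-byte code units written to
   out (1 to 4), or 0 if cp is a surrogate or above 0x10FFFF. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* surrogate range */
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                                     /* beyond Unicode */
}

/* Encode cp as UTF-16; returns the number of 2-byte code units (1 or 2),
   or 0 for surrogates and values above 0x10FFFF. */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) return 0;
    if (cp < 0x10000) { out[0] = (uint16_t)cp; return 1; }
    cp -= 0x10000;                                /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));     /* high surrogate unit */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));   /* low surrogate unit */
    return 2;
}
```

       For example, U+20AC takes three UTF-8 code units (E2 82 AC) but a
       single UTF-16 code unit, while U+1F600 takes four UTF-8 code units
       (F0 9F 98 80) and a UTF-16 surrogate pair (D83D DE00).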

       In Unicode, values in the range 0xD800 to 0xDFFF (surrogates) are not
       valid Unicode code points, so any encoded sequence of code units that
       would map to code points in the range 0xD800-0xDFFF is ill-formed. The
       user can control how re2c treats such ill-formed sequences with the
       --encoding-policy <policy> flag (see OPTIONS for a full explanation).

       For some encodings, there are code units that never occur in a valid
       encoded stream (e.g. the 0xFF byte in UTF-8). If the generated scanner
       must check for invalid input, the only reliable way to do so is to use
       the default rule *. Note that the full-range rule [^] won't catch
       invalid code units when a variable-length encoding is used ([^] means
       "all valid code points", while the default rule * means "all possible
       code units").
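       The following standalone C sketch (independent of re2c and of
       --encoding-policy) checks whether a byte sequence is well-formed UTF-8.
       It rejects both kinds of invalid input discussed above: bytes such as
       0xC0, 0xC1 and 0xF5 to 0xFF, which can never start a sequence, and
       sequences that would decode to a surrogate. It also rejects overlong
       and truncated sequences.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if s[0..n) is well-formed UTF-8, 0 otherwise. */
static int utf8_valid(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char b = s[i];
        uint32_t cp;
        size_t len, k;
        if (b < 0x80) {                       /* ASCII: one code unit */
            i++;
            continue;
        }
        if (b >= 0xC2 && b <= 0xDF)      { len = 2; cp = b & 0x1F; }
        else if (b >= 0xE0 && b <= 0xEF) { len = 3; cp = b & 0x0F; }
        else if (b >= 0xF0 && b <= 0xF4) { len = 4; cp = b & 0x07; }
        else return 0;    /* 0x80-0xC1 and 0xF5-0xFF never start a sequence */
        if (n - i < len)
            return 0;                         /* truncated sequence */
        for (k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;                     /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000))
            return 0;                         /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                         /* encoded surrogate */
        if (cp > 0x10FFFF)
            return 0;                         /* beyond Unicode */
        i += len;
    }
    return 1;
}
```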

GENERIC INPUT API
       re2c  usually operates on input using pointer-like primitives YYCURSOR,
       YYMARKER, YYCTXMARKER and YYLIMIT.

       The generic input API (enabled with the --input custom switch) allows
       one to customize input operations. In this mode, re2c will express all
       operations on input in terms of the following primitives:

                      +----------------+--------------------------------------------+
                      |YYPEEK ()       | get current input character                |
                      +----------------+--------------------------------------------+
                      |YYSKIP ()       | advance to the next character              |
                      +----------------+--------------------------------------------+
                      |YYBACKUP ()     | back up current input position             |
                      +----------------+--------------------------------------------+
                      |YYBACKUPCTX ()  | back up current input position for         |
                      |                | trailing context                           |
                      +----------------+--------------------------------------------+
                      |YYRESTORE ()    | restore current input position             |
                      +----------------+--------------------------------------------+
                      |YYRESTORECTX () | restore current input position for         |
                      |                | trailing context                           |
                      +----------------+--------------------------------------------+
                      |YYLESSTHAN (n)  | check if less than n input characters      |
                      |                | are left                                   |
                      +----------------+--------------------------------------------+
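       A minimal sketch of these primitives for a hypothetical input object
       that tracks positions as indices into a buffer of known length. The
       macros refer to a variable named in, which is an assumption of this
       sketch: whatever names the user's macros mention must be in scope where
       the generated code is placed. The hand-written function imitates how a
       generated DFA for "a" | "abc" could use YYBACKUP () and YYRESTORE () to
       fall back to the shorter match; it is not actual re2c output.

```c
#include <stddef.h>

/* Hypothetical input object: positions are indices, not pointers. */
typedef struct {
    const char *buf;
    size_t      len;
    size_t      pos;     /* current position (plays the role of YYCURSOR) */
    size_t      marker;  /* backup position (plays the role of YYMARKER) */
    size_t      ctx;     /* trailing-context backup (YYCTXMARKER) */
} input;

/* YYPEEK returns '\0' past the end so the DFA sees a sentinel. */
#define YYPEEK()        (in->pos < in->len ? in->buf[in->pos] : '\0')
#define YYSKIP()        (++in->pos)
#define YYBACKUP()      (in->marker = in->pos)
#define YYBACKUPCTX()   (in->ctx = in->pos)
#define YYRESTORE()     (in->pos = in->marker)
#define YYRESTORECTX()  (in->pos = in->ctx)
#define YYLESSTHAN(n)   (in->len - in->pos < (size_t)(n))

/* Hand-written imitation of a DFA for  "a" | "abc" : remember the last
   accepting position, restore it if the longer match fails.
   Returns the match length, or 0 if nothing matched. */
static size_t match_a_or_abc(input *in)
{
    size_t start = in->pos;
    if (YYPEEK() != 'a') return 0;
    YYSKIP();
    YYBACKUP();                  /* "a" alone is already a match */
    if (YYPEEK() == 'b') {
        YYSKIP();
        if (YYPEEK() == 'c') { YYSKIP(); return in->pos - start; }
    }
    YYRESTORE();                 /* fall back to the shorter match */
    return in->pos - start;
}
```

       On "abc" the function consumes all three characters; on "abx" it
       consumes "ab", fails on "x", and restores the position saved after
       "a", returning a match of length 1.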

       A couple of useful links that provide some examples:

       1. http://skvadrik.github.io/aleph_null/posts/re2c/2015-01-13-input_model.html

       2. http://skvadrik.github.io/aleph_null/posts/re2c/2015-01-15-input_model_custom.html

SEE ALSO
       You can find more information about re2c on the website:
       http://re2c.org. See also: flex(1), lex(1), quex
       (http://quex.sourceforge.net).

AUTHORS
       Peter Bumbulis   peter@csg.uwaterloo.ca

       Brian Young      bayoung@acm.org

       Dan Nuffer       nuffer@users.sourceforge.net

       Marcus Boerger   helly@users.sourceforge.net

       Hartmut Kaiser   hkaiser@users.sourceforge.net

       Emmanuel Mogenet mgix@mgix.com

       Ulya Trofimovich skvadrik@gmail.com

VERSION INFORMATION
       This manpage describes re2c version 0.16, package date 21 Jan 2016.

                                                                       RE2C(1)