Tcl_RegExpMatch(3)          Tcl Library Procedures          Tcl_RegExpMatch(3)




NAME

       Tcl_RegExpMatch, Tcl_RegExpCompile, Tcl_RegExpExec, Tcl_RegExpRange,
       Tcl_GetRegExpFromObj, Tcl_RegExpMatchObj, Tcl_RegExpExecObj,
       Tcl_RegExpGetInfo - Pattern matching with regular expressions


SYNOPSIS

       #include <tcl.h>

       int
       Tcl_RegExpMatchObj(interp, textObj, patObj)

       int
       Tcl_RegExpMatch(interp, text, pattern)

       Tcl_RegExp
       Tcl_RegExpCompile(interp, pattern)

       int
       Tcl_RegExpExec(interp, regexp, text, start)

       void
       Tcl_RegExpRange(regexp, index, startPtr, endPtr)

       Tcl_RegExp
       Tcl_GetRegExpFromObj(interp, patObj, cflags)

       int
       Tcl_RegExpExecObj(interp, regexp, textObj, offset, nmatches, eflags)

       void
       Tcl_RegExpGetInfo(regexp, infoPtr)


ARGUMENTS

         Tcl_Interp *interp (in)
                Tcl interpreter to use for error reporting.  The interpreter
                may be NULL if no error reporting is desired.

         Tcl_Obj *textObj (in/out)
                Refers to the object from which to get the text to search.
                The internal representation of the object may be converted to
                a form that can be efficiently searched.

         Tcl_Obj *patObj (in/out)
                Refers to the object from which to get a regular expression.
                The compiled regular expression is cached in the object.

         const char *text (in)
                Text to search for a match with a regular expression.

         const char *pattern (in)
                String in the form of a regular expression pattern.

         Tcl_RegExp regexp (in)
                Compiled regular expression.  Must have been returned
                previously by Tcl_GetRegExpFromObj or Tcl_RegExpCompile.

         const char *start (in)
                If text is just a portion of some other string, this argument
                identifies the beginning of the larger string.  If it is not
                the same as text, then no ^ matches will be allowed.

         int index (in)
                Specifies which range is desired: 0 means the range of the
                entire match, 1 or greater means the range that matched a
                parenthesized subexpression.

         const char **startPtr (out)
                The address of the first character in the range is stored
                here, or NULL if there is no such range.

         const char **endPtr (out)
                The address of the character just after the last one in the
                range is stored here, or NULL if there is no such range.

         int cflags (in)
                OR-ed combination of the compilation flags TCL_REG_ADVANCED,
                TCL_REG_EXTENDED, TCL_REG_BASIC, TCL_REG_EXPANDED,
                TCL_REG_QUOTE, TCL_REG_NOCASE, TCL_REG_NEWLINE,
                TCL_REG_NLSTOP, TCL_REG_NLANCH, TCL_REG_NOSUB, and
                TCL_REG_CANMATCH.  See below for more information.

         int offset (in)
                The character offset into the text where matching should
                begin.  The value of the offset has no impact on ^ matches;
                that behavior is controlled by eflags.

         int nmatches (in)
                The number of matching subexpressions that should be
                remembered for later use.  If this value is 0, then no
                subexpression match information will be computed.  If the
                value is -1, then all of the matching subexpressions will be
                remembered.  Any other value will be taken as the maximum
                number of subexpressions to remember.

         int eflags (in)
                OR-ed combination of the execution flags TCL_REG_NOTBOL and
                TCL_REG_NOTEOL.  See below for more information.

         Tcl_RegExpInfo *infoPtr (out)
                The address of the location where information about a
                previous match should be stored by Tcl_RegExpGetInfo.


DESCRIPTION

       Tcl_RegExpMatch determines whether its text argument matches pattern,
       where pattern is interpreted as a regular expression using the rules in
       the re_syntax reference page.  If there is a match then Tcl_RegExpMatch
       returns 1.  If there is no match then Tcl_RegExpMatch returns 0.  If an
       error occurs in the matching process (e.g. pattern is not a valid
       regular expression) then Tcl_RegExpMatch returns -1 and leaves an error
       message  in  the  interpreter result.  Tcl_RegExpMatchObj is similar to
       Tcl_RegExpMatch except it operates  on  the  Tcl  objects  textObj  and
       patObj  instead  of  UTF strings.  Tcl_RegExpMatchObj is generally more
       efficient than Tcl_RegExpMatch, so it is the preferred interface.

       Tcl_RegExpCompile, Tcl_RegExpExec, and Tcl_RegExpRange provide
       lower-level access to the regular expression pattern matcher.
       Tcl_RegExpCompile compiles a regular expression string into the
       internal form used for efficient pattern matching.  The return value is
       a token for this compiled form, which can be used in subsequent calls
       to Tcl_RegExpExec or Tcl_RegExpRange.  If an error occurs while
       compiling the regular expression then Tcl_RegExpCompile returns NULL
       and leaves an error message in the interpreter result.  Note: the
       return value from Tcl_RegExpCompile is only valid up to the next call
       to Tcl_RegExpCompile; it is not safe to retain these values for long
       periods of time.

       Tcl_RegExpExec  executes  the  regular  expression pattern matcher.  It
       returns 1 if text contains a range of characters that match  regexp,  0
       if  no  match  is  found, and -1 if an error occurs.  In the case of an
       error, Tcl_RegExpExec  leaves  an  error  message  in  the  interpreter
       result.   When searching a string for multiple matches of a pattern, it
       is important to distinguish between the start of  the  original  string
       and  the  start of the current search.  For example, when searching for
       the second occurrence of a match, the text argument might point to  the
       character just after the first match;  however, it is important for the
       pattern matcher to know that this  is  not  the  start  of  the  entire
       string, so that it does not allow ^ atoms in the pattern to match.  The
       start argument provides this information by pointing to  the  start  of
       the  overall  string containing text.  Start will be less than or equal
       to text;  if it is less than text then no ^ matches will be allowed.
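
       The start/text distinction matters most in a loop that finds every
       match.  A minimal sketch of the bookkeeping, in plain C so it compiles
       without Tcl: the caller keeps start fixed at the beginning of the whole
       string and moves text forward after each match.  NextSearchText is a
       hypothetical helper, not part of the Tcl API; it assumes the match
       bounds were obtained with Tcl_RegExpRange (index 0) and that the string
       is UTF-8.  Stepping over one character after an empty match is a common
       convention for avoiding an endless loop, not something this page
       specifies.

```c
#include <stddef.h>

/*
 * Hypothetical helper: given the bounds [matchStart, matchEnd) of the match
 * just found, return the pointer to pass as the next `text` argument, or
 * NULL when there is nothing left to search.  The `start` argument of the
 * next search stays unchanged.
 */
static const char *
NextSearchText(const char *matchStart, const char *matchEnd)
{
    const char *p = matchEnd;

    if (matchEnd != matchStart) {
        /* Non-empty match: resume right after it. */
        return p;
    }
    if (*p == '\0') {
        /* Empty match at the end of the string: stop searching. */
        return NULL;
    }
    /*
     * Empty match: step over one whole UTF-8 character (the lead byte plus
     * any continuation bytes of the form 10xxxxxx) so the same empty match
     * is not found again.
     */
    p++;
    while (((unsigned char) *p & 0xC0) == 0x80) {
        p++;
    }
    return p;
}
```

       In a loop, each iteration would call Tcl_RegExpExec(interp, regexp,
       text, start) with the original start, fetch the bounds with
       Tcl_RegExpRange(regexp, 0, &matchStart, &matchEnd), and continue with
       text = NextSearchText(matchStart, matchEnd), stopping when
       Tcl_RegExpExec returns 0 or -1 or the helper returns NULL.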

       Tcl_RegExpRange may be invoked after Tcl_RegExpExec returns; it
       provides detailed information about what ranges of the string matched
       what parts of the pattern.  Tcl_RegExpRange stores a pair of pointers
       in *startPtr and *endPtr that identify a range of characters in the
       source string for the most recent call to Tcl_RegExpExec.  Index
       indicates which of several ranges is desired: if index is 0,
       information is returned about the overall range of characters that
       matched the entire pattern; otherwise, information is returned about
       the range of characters that matched the index'th parenthesized
       subexpression within the pattern.  If there is no range corresponding
       to index then NULL is stored in *startPtr and *endPtr.
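
       Because the pointers refer into the searched string itself, a caller
       that needs a NUL-terminated substring has to copy it out.  The
       hypothetical CopyRange helper below, plain C with no Tcl calls, is a
       sketch of the two conventions described above: endPtr points just past
       the last character, and NULL pointers mean there is no range for that
       index.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the range [startPtr, endPtr) reported by
 * Tcl_RegExpRange into buf as a NUL-terminated string.
 * Returns the number of bytes copied, -1 if there is no such range, or
 * -2 if buf is too small.
 */
static long
CopyRange(const char *startPtr, const char *endPtr, char *buf, size_t bufSize)
{
    size_t len;

    if (startPtr == NULL || endPtr == NULL) {
        return -1;  /* no range corresponds to this index */
    }
    len = (size_t) (endPtr - startPtr);  /* endPtr is one past the last character */
    if (len + 1 > bufSize) {
        return -2;  /* no room for the text plus the terminating NUL */
    }
    memcpy(buf, startPtr, len);
    buf[len] = '\0';
    return (long) len;
}
```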

       Tcl_GetRegExpFromObj, Tcl_RegExpExecObj, and Tcl_RegExpGetInfo are
       object interfaces that provide the most direct control of Henry
       Spencer's regular expression library.  Users who need to modify
       compilation and execution options directly should use these interfaces
       instead of calling the internal regexp functions.  These interfaces
       handle the details of UTF to Unicode translations as well as providing
       improved performance through caching in the pattern and string objects.

       Tcl_GetRegExpFromObj attempts to return a compiled regular expression
       from patObj.  If the object does not already contain a compiled regular
       expression, it will attempt to create one from the string in the object
       and assign it to the internal representation of patObj.  The return
       value, of type Tcl_RegExp, is a token for this compiled form, which can
       be used in subsequent calls to Tcl_RegExpExecObj or Tcl_RegExpGetInfo.
       If an error occurs while compiling the regular expression then
       Tcl_GetRegExpFromObj returns NULL and leaves an error message in the
       interpreter result.  The regular expression token can be used as long
       as the internal representation of patObj refers to the compiled form.
       The cflags argument is a bit-wise OR of zero or more of the following
       flags that control the compilation of patObj:

         TCL_REG_ADVANCED
                Compile advanced regular expressions.  This mode corresponds to
                the normal regular expression syntax accepted by the Tcl regexp
                and regsub commands.

         TCL_REG_EXTENDED
                Compile extended regular expressions.  This mode corresponds to
                the regular expression syntax recognized by Tcl 8.0 and earlier
                versions.

         TCL_REG_BASIC
                Compile basic regular expressions.  This mode corresponds to
                the regular expression syntax recognized by common Unix
                utilities like sed and grep.  This is the default if no flags
                are specified.

         TCL_REG_EXPANDED
                Compile the regular expression (basic, extended, or  advanced)
                using  an expanded syntax that allows comments and whitespace.
                This mode causes non-backslashed non-bracket-expression  white
                space and #-to-end-of-line comments to be ignored.

         TCL_REG_QUOTE
                Compile a literal string, with all characters treated as
                ordinary characters.

         TCL_REG_NOCASE
                Compile for matching that ignores upper/lower case
                distinctions.

         TCL_REG_NEWLINE
                Compile for newline-sensitive matching.  By default, newline
                is a completely ordinary character with no special meaning in
                either regular expressions or strings.  With this flag, "[^"
                bracket expressions and "." never match newline, "^" matches
                an empty string after any newline in addition to its normal
                function, and "$" matches an empty string before any newline
                in addition to its normal function.  TCL_REG_NEWLINE is the
                bit-wise OR of TCL_REG_NLSTOP and TCL_REG_NLANCH.

         TCL_REG_NLSTOP
                Compile for partial newline-sensitive matching, with the
                behavior of "[^" bracket expressions and "." affected, but not
                the behavior of "^" and "$".  In this mode, "[^" bracket
                expressions and "." never match newline.

         TCL_REG_NLANCH
                Compile for inverse partial newline-sensitive matching, with
                the behavior of "^" and "$" (the anchors) affected, but not
                the behavior of "[^" bracket expressions and ".".  In this mode
                "^" matches an empty string after any newline in addition to
                its normal function, and "$" matches an empty string before
                any newline in addition to its normal function.

         TCL_REG_NOSUB
                Compile for matching that reports only success or failure, not
                what  was  matched.   This  reduces  compile  overhead and may
                improve performance.  Subsequent calls to Tcl_RegExpGetInfo or
                Tcl_RegExpRange will not report any match information.

         TCL_REG_CANMATCH
                Compile  for matching that reports the potential to complete a
                partial match given more text (see below).

       Only one  of  TCL_REG_EXTENDED,  TCL_REG_ADVANCED,  TCL_REG_BASIC,  and
       TCL_REG_QUOTE may be specified.

       Tcl_RegExpExecObj executes the regular expression pattern matcher.  It
       returns 1 if textObj contains a range of characters that match regexp,
       0 if no match is found, and -1 if an error occurs.  In the case of an
       error, Tcl_RegExpExecObj leaves an error message in the interpreter
       result.  The nmatches value indicates to the matcher how many
       subexpressions are of interest.  If nmatches is 0, then no
       subexpression match information is recorded, which may allow the
       matcher to make various optimizations.  If the value is -1, then all of
       the subexpressions in the pattern are remembered.  If the value is a
       positive integer, then only that number of subexpressions will be
       remembered.  Matching begins at the specified Unicode character index
       given by offset.  Unlike Tcl_RegExpExec, the behavior of anchors is not
       affected by the offset value.  Instead the behavior of the anchors is
       explicitly controlled by the eflags argument, which is a bit-wise OR of
       zero or more of the following flags:

         TCL_REG_NOTBOL
                The starting character will not be treated as the beginning of
                a line or the beginning of the string, so "^" will not match
                there.  Note that this flag has no effect on how "\A" matches.

         TCL_REG_NOTEOL
                The last character in the string will not be treated as the
                end of a line or the end of the string, so "$" will not match
                there.  Note that this flag has no effect on how "\Z" matches.
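
       Because offset has no effect on the anchors, a caller that starts
       partway through textObj, or that searches one chunk of a larger input,
       has to set these flags itself.  The sketch below is one hypothetical
       way to choose them; ChunkEflags and its parameters are illustrative
       names, not part of the Tcl API, and the fallback flag values are
       placeholders used only when <tcl.h> is not included.  It does not
       handle newline-sensitive matching, where a ^ just after a preceding
       newline might still be wanted.

```c
/* Placeholder values, used only if <tcl.h> has not been included. */
#ifndef TCL_REG_NOTBOL
#define TCL_REG_NOTBOL 0001
#endif
#ifndef TCL_REG_NOTEOL
#define TCL_REG_NOTEOL 0002
#endif

/*
 * Hypothetical helper: choose eflags for Tcl_RegExpExecObj.
 *   offset      - character index where matching begins
 *   atTextStart - nonzero if index 0 of textObj is the true start of input
 *   moreFollows - nonzero if more input will later be appended to textObj
 */
static int
ChunkEflags(int offset, int atTextStart, int moreFollows)
{
    int eflags = 0;

    /* Anchors ignore offset, so state explicitly that the first character
     * searched is not the beginning of the input. */
    if (offset > 0 || !atTextStart) {
        eflags |= TCL_REG_NOTBOL;
    }
    /* The end of textObj is not the end of the input, so "$" must not
     * match there. */
    if (moreFollows) {
        eflags |= TCL_REG_NOTEOL;
    }
    return eflags;
}
```

       The result is meant to be passed as the eflags argument of
       Tcl_RegExpExecObj together with the same offset.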

       Tcl_RegExpGetInfo  retrieves information about the last match performed
       with a given regular expression regexp.  The infoPtr argument  contains
       a pointer to a structure that is defined as follows:

       typedef struct Tcl_RegExpInfo {
               int nsubs;
               Tcl_RegExpIndices *matches;
               long extendStart;
       } Tcl_RegExpInfo;

       The nsubs field contains a count of the number of parenthesized
       subexpressions within the regular expression.  If the TCL_REG_NOSUB
       flag was used, then this value will be zero.  The matches field points
       to an array of nsubs+1 values that indicate the bounds of each
       subexpression matched.  The first element in the array refers to the
       range matched by the entire regular expression, and subsequent elements
       refer to the parenthesized subexpressions in the order that they appear
       in the pattern.  Each element is a structure that is defined as
       follows:

       typedef struct Tcl_RegExpIndices {
               long start;
               long end;
       } Tcl_RegExpIndices;

       The start and end values are Unicode character indices relative to  the
       offset location within textObj where matching began.  The start index
       identifies the first character of the matched subexpression.   The  end
       index  identifies  the first character after the matched subexpression.
       If the subexpression matched the empty string, then start and end  will
       be  equal.  If the subexpression did not participate in the match, then
       start and end will be set to -1.
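
       Putting these rules together, a caller that wants positions within the
       whole of textObj must add offset back, and must check for -1 before
       using an element.  Since element 0 is the whole match and nsubs counts
       only the parenthesized subexpressions, valid elements run from 0
       through nsubs.  The sketch below assumes the Tcl_RegExpInfo has already
       been filled in by Tcl_RegExpGetInfo; to compile without <tcl.h> it uses
       a stand-in MatchIndices type with the same fields as
       Tcl_RegExpIndices, and AbsoluteRange is a hypothetical name.

```c
/*
 * Stand-in for Tcl_RegExpIndices so this sketch compiles without <tcl.h>;
 * it has the same two fields as the structure documented above.
 */
typedef struct {
    long start;
    long end;
} MatchIndices;

/*
 * Hypothetical helper: translate element i of the matches array into
 * character indices relative to the beginning of textObj.
 * Returns 1 on success, 0 if that subexpression did not participate in the
 * match, and -1 if i is outside 0..nsubs.
 */
static int
AbsoluteRange(const MatchIndices *matches, int nsubs, int i, long offset,
              long *startPtr, long *endPtr)
{
    if (i < 0 || i > nsubs) {
        return -1;
    }
    if (matches[i].start == -1) {
        return 0;  /* start and end are both -1 for a non-participating group */
    }
    /* Reported indices are relative to offset, where matching began. */
    *startPtr = offset + matches[i].start;
    *endPtr = offset + matches[i].end;  /* first character after the match */
    return 1;
}
```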

       The extendStart field in Tcl_RegExpInfo is only set if the
       TCL_REG_CANMATCH flag was used.  It indicates the first character in
       the string where a match could occur.  If a match was found, this will
       be the same as the beginning of the current match.  If no match was
       found, then it indicates the earliest point at which a match might
       occur if additional text is appended to the string.  If no match is
       possible even with further text, this field will be set to -1.
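
       TCL_REG_CANMATCH is useful when text arrives incrementally: the
       extendStart value tells the caller how much of its buffer can be
       dropped before more text is appended.  A minimal sketch, assuming the
       search was run with offset 0 so that extendStart is an index into the
       buffered text; DiscardableChars is a hypothetical helper, not part of
       the Tcl API.

```c
/*
 * Hypothetical helper for incremental matching with TCL_REG_CANMATCH:
 * given the extendStart value from Tcl_RegExpGetInfo and the number of
 * characters currently buffered, return how many leading characters can be
 * discarded before more text is appended.
 */
static long
DiscardableChars(long extendStart, long bufferedChars)
{
    if (extendStart < 0) {
        /* No match is possible even with more text: drop everything. */
        return bufferedChars;
    }
    /* A match could still begin at extendStart, so keep it and what follows. */
    return extendStart;
}
```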


SEE ALSO

       re_syntax(n)


KEYWORDS

       match, pattern, regular expression, string, subexpression,
       Tcl_RegExpIndices, Tcl_RegExpInfo



Tcl                                   8.1                   Tcl_RegExpMatch(3)

RegExp 8.5.4 - Generated Wed Aug 20 18:47:20 CDT 2008