
gawk: Standard Regexp Constants

 
 6.1.2.1 Standard Regular Expression Constants
 .............................................
 
 When used on the right-hand side of the '~' or '!~' operators, a regexp
 constant merely stands for the regexp that is to be matched.  However,
 regexp constants (such as '/foo/') may be used like simple expressions.
 When a regexp constant appears by itself, it has the same meaning as if
 it appeared in a pattern (i.e., '($0 ~ /foo/)').  (d.c.)  See
 Expression Patterns.  This means that the following two code segments:
 
      if ($0 ~ /barfly/ || $0 ~ /camelot/)
          print "found"
 
 and:
 
      if (/barfly/ || /camelot/)
          print "found"
 
 are exactly equivalent.  One rather bizarre consequence of this rule is
 that the following Boolean expression is valid, but does not do what its
 author probably intended:
 
      # Note that /foo/ is on the left of the ~
      if (/foo/ ~ $1) print "found foo"
 
 This code is "obviously" testing '$1' for a match against the regexp
 '/foo/'.  But in fact, the expression '/foo/ ~ $1' really means '($0 ~
 /foo/) ~ $1'.  In other words, first match the input record against the
 regexp '/foo/'.  The result is one if the match succeeds and zero if it
 fails.  That result is then matched against the first field in the
 record.  Because it is unlikely that you would
 ever really want to make this kind of test, 'gawk' issues a warning when
 it sees this construct in a program.  Another consequence of this rule
 is that the assignment statement:
 
      matches = /foo/
 
 assigns either zero or one to the variable 'matches', depending upon the
 contents of the current input record.
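
    As a rough illustration of the rules above, the following sketch runs
 two small 'awk' programs from a POSIX shell.  It is not part of the
 original text: the sample input records and the variable names
 'explicit', 'bare', 'out1', and 'out2' are assumptions chosen only for
 demonstration.

```shell
# Hypothetical sample input: three records chosen to exercise each case.
input=$(printf '%s\n' '1 foo' 'foo 1' 'barfly')

# Bare regexp constants versus the explicit "$0 ~" form, plus assignment.
out1=$(printf '%s\n' "$input" | awk '
{
    explicit = ($0 ~ /barfly/ || $0 ~ /camelot/)
    bare     = (/barfly/ || /camelot/)   # shorthand for the line above
    matches  = /foo/                     # stores 1 or 0, not the regexp
    print explicit, bare, matches
}')

# The reversed test: parsed as (($0 ~ /foo/) ~ $1).
# gawk may print a warning on standard error here.
out2=$(printf '%s\n' "$input" | awk '
{
    if (/foo/ ~ $1) print "found foo: " $0
}')

printf '%s\n' "$out1" "$out2"
```

    With this input, the first program prints '0 0 1', '0 0 1', and
 '1 1 0', so the explicit and bare forms agree on every record.  The
 second program prints only 'found foo: 1 foo'.  That is the one record
 whose first field is '1', which matches the truth value one; the record
 whose first field really is 'foo' is not reported.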
 
    Constant regular expressions are also used as the first argument for
 the 'gensub()', 'sub()', and 'gsub()' functions, as the second argument
 of the 'match()' function, and as the third argument of the 'split()'
 and 'patsplit()' functions (see String Functions).  Modern
 implementations of 'awk', including 'gawk', allow the third argument of
 'split()' to be a regexp constant, but some older implementations do
 not.  (d.c.)  Because some built-in functions accept regexp constants as
 arguments, confusion can arise when attempting to use regexp constants
 as arguments to user-defined functions (see User-defined).  For
 example:
 
      function mysub(pat, repl, str, global)
      {
          if (global)
              gsub(pat, repl, str)
          else
              sub(pat, repl, str)
          return str
      }
 
      {
          ...
          text = "hi! hi yourself!"
          mysub(/hi/, "howdy", text, 1)
          ...
      }
 
    In this example, the programmer wants to pass a regexp constant to
 the user-defined function 'mysub()', which in turn passes it on to
 either 'sub()' or 'gsub()'.  However, what really happens is that the
 'pat' parameter is assigned a value of either one or zero, depending
 upon whether or not '$0' matches '/hi/'.  'gawk' issues a warning when
 it sees a regexp constant used as a parameter to a user-defined
 function, because passing a truth value in this way is probably not what
 was intended.
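
    One possible workaround, sketched here under stated assumptions
 rather than taken from the text above, is to pass the pattern as a
 string, which 'sub()' and 'gsub()' then treat as a dynamic regexp.  The
 function 'mysub()' is copied from the example; the 'BEGIN' rule, the
 use of the return value, and the shell variable 'out' are illustrative
 additions.  Note that a string pattern undergoes string escape
 processing first, so any backslashes in it would need to be doubled.

```shell
# Sketch of a workaround: pass the pattern as a string (a dynamic regexp).
out=$(awk '
function mysub(pat, repl, str, global)
{
    if (global)
        gsub(pat, repl, str)
    else
        sub(pat, repl, str)
    return str
}

BEGIN {
    text = "hi! hi yourself!"
    # "hi" is a string, so mysub() receives the pattern text itself
    # rather than the 1-or-0 result of matching $0 against /hi/.
    print mysub("hi", "howdy", text, 1)
    print mysub("hi", "howdy", text, 0)
    # str is passed by value, so text itself is unchanged.
    print text
}')
printf '%s\n' "$out"
```

    This prints 'howdy! howdy yourself!' for the global call,
 'howdy! hi yourself!' for the single substitution, and then the
 unmodified 'hi! hi yourself!', which shows that the caller has to use
 the value returned by 'mysub()'.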
 