gawk: POSIX String Comparison

 
 6.3.2.3 String Comparison Based on Locale Collating Order
 .........................................................
 
 The POSIX standard used to say that all string comparisons are performed
 based on the locale's "collating order".  This is the order in which
 characters sort, as defined by the locale (for more discussion, see
 Locales).  Comparing by this order usually gives results very different
 from those of a straight byte-by-byte comparison.(1)
 
    Because this behavior differs considerably from existing practice,
 'gawk' implemented it only in POSIX mode (see Options).  Here is an
 example to illustrate the difference, in an 'en_US.UTF-8' locale:
 
      $ gawk 'BEGIN { printf("ABC < abc = %s\n",
      >                     ("ABC" < "abc" ? "TRUE" : "FALSE")) }'
      -| ABC < abc = TRUE
      $ gawk --posix 'BEGIN { printf("ABC < abc = %s\n",
      >                             ("ABC" < "abc" ? "TRUE" : "FALSE")) }'
      -| ABC < abc = FALSE
 
    Fortunately, as of August 2016, comparison based on locale collating
 order is no longer required for the '==' and '!=' operators.(2)
 However, comparison based on locales is still required for '<', '<=',
 '>', and '>='.  POSIX thus recommends as follows:
 
      Since the '==' operator checks whether strings are identical, not
      whether they collate equally, applications needing to check whether
      strings collate equally can use:
 
           a <= b && a >= b
 
    As of version 4.2, 'gawk' continues to use locale collating order for
 '<', '<=', '>', and '>=' only in POSIX mode.
 
    ---------- Footnotes ----------
 
    (1) Technically, string comparison is supposed to behave the same way
 as if the strings were compared with the C 'strcoll()' function.
 
    (2) See the Austin Group website
 (http://austingroupbugs.net/view.php?id=1070).
 