
gawk: Locale influences conversions

 
 6.1.4.2 Locales Can Influence Conversion
 ........................................
 
 Where you are can matter when it comes to converting between numbers and
 strings.  The local character set and language--the "locale"--can affect
 numeric formats.  In particular, for 'awk' programs, it affects the
 decimal point character and the thousands-separator character.  The
 '"C"' locale, and most English-language locales, use the period
 character ('.') as the decimal point and don't have a thousands
 separator.  However, many (if not most) European and non-English locales
 use the comma (',') as the decimal point character.  European locales
 often use either a space or a period as the thousands separator, if they
 have one.
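
 To check which characters a given locale actually uses, the POSIX
 'locale' utility can print the numeric keywords directly.  The
 following is a minimal sketch; the function name 'show_numeric' is made
 up for illustration, and any locale other than '"C"' is assumed to be
 installed on the system.

```shell
# show_numeric LOCALE: print the decimal point and thousands separator
# that LOCALE defines.  An empty thousands_sep means the locale has no
# thousands separator.  (show_numeric is a hypothetical helper name.)
show_numeric() {
    printf 'decimal_point=[%s] thousands_sep=[%s]\n' \
        "$(LC_ALL="$1" locale decimal_point)" \
        "$(LC_ALL="$1" locale thousands_sep)"
}

show_numeric C
# Other locales can be inspected the same way if they are installed,
# for example: show_numeric en_DK.utf-8
```

 In the '"C"' locale this reports a period as the decimal point and an
 empty thousands separator.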
 
    The POSIX standard says that 'awk' always uses the period as the
 decimal point when reading the 'awk' program source code, and for
 command-line variable assignments (⇒Other Arguments).  However,
 when interpreting input data, for 'print' and 'printf' output, and for
 number-to-string conversion, the local decimal point character is used.
 (d.c.)  In all cases, numbers in source code and in input data cannot
 have a thousands separator.  Here are some examples indicating the
 difference in behavior, on a GNU/Linux system:
 
      $ export POSIXLY_CORRECT=1                        Force POSIX behavior
      $ gawk 'BEGIN { printf "%g\n", 3.1415927 }'
      -| 3.14159
      $ LC_ALL=en_DK.utf-8 gawk 'BEGIN { printf "%g\n", 3.1415927 }'
      -| 3,14159
      $ echo 4,321 | gawk '{ print $1 + 1 }'
      -| 5
      $ echo 4,321 | LC_ALL=en_DK.utf-8 gawk '{ print $1 + 1 }'
      -| 5,321
 
 The 'en_DK.utf-8' locale is for English in Denmark, where the comma acts
 as the decimal point.  In the normal '"C"' locale, 'gawk' stops
 converting at the comma, which cannot be part of a number there, and so
 treats '4,321' as 4.  In the Danish locale, with POSIX mode in effect,
 the comma is the decimal point, so the value is the full number
 including the fractional part, 4.321.
 
    Some earlier versions of 'gawk' fully complied with this aspect of
 the standard.  However, many users in non-English locales complained
 about this behavior, because their data used a period as the decimal
 point.  The default was therefore changed back, so that 'gawk' uses a
 period as the decimal point character unless told otherwise.  You can
 use the '--use-lc-numeric' option
 (⇒Options) to force 'gawk' to use the locale's decimal point
 character.  ('gawk' also uses the locale's decimal point character when
 in POSIX mode, either via '--posix' or the 'POSIXLY_CORRECT' environment
 variable, as shown previously.)
 
    Table 6.1 describes the cases in which the locale's decimal point
 character is used and the cases in which a period is used.  Some of
 these features have not been described yet.
 
Feature        Default        '--posix' or
                              '--use-lc-numeric'
------------------------------------------------------------
'%'g'          Use locale     Use locale
'%g'           Use period     Use locale
Input          Use period     Use locale
'strtonum()'   Use period     Use locale
 
 Table 6.1: Locale decimal point versus a period
 
    Finally, modern-day formal standards and the IEEE standard
 floating-point representation can have an unusual but important effect
 on the way 'gawk' converts some special string values to numbers.  The
 details are presented in ⇒POSIX Floating Point Problems.
 