File: gawk.info,  Node: Locale influences conversions,  Prev: Strings And Numbers,  Up: Conversion

6.1.4.2 Locales Can Influence Conversion
........................................

Where you are can matter when it comes to converting between numbers and
strings.  The local character set and language--the "locale"--can affect
numeric formats.  In particular, for 'awk' programs, it affects the
decimal point character and the thousands-separator character.  The
'"C"' locale, and most English-language locales, use the period
character ('.') as the decimal point and don't have a thousands
separator.  However, many (if not most) European and non-English locales
use the comma (',') as the decimal point character.  European locales
often use either a space or a period as the thousands separator, if they
have one.

   The POSIX standard says that 'awk' always uses the period as the
decimal point when reading the 'awk' program source code, and for
command-line variable assignments (*note Other Arguments::).  However,
when interpreting input data, for 'print' and 'printf' output, and for
number-to-string conversion, the local decimal point character is used.
(d.c.)  In all cases, numbers in source code and in input data cannot
have a thousands separator.  Here are some examples indicating the
difference in behavior, on a GNU/Linux system:

     $ export POSIXLY_CORRECT=1                        Force POSIX behavior
     $ gawk 'BEGIN { printf "%g\n", 3.1415927 }'
     -| 3.14159
     $ LC_ALL=en_DK.utf-8 gawk 'BEGIN { printf "%g\n", 3.1415927 }'
     -| 3,14159
     $ echo 4,321 | gawk '{ print $1 + 1 }'
     -| 5
     $ echo 4,321 | LC_ALL=en_DK.utf-8 gawk '{ print $1 + 1 }'
     -| 5,321

The 'en_DK.utf-8' locale is for English in Denmark, where the comma acts
as the decimal point.  In the normal '"C"' locale, 'gawk' treats '4,321'
as 4, because conversion stops at the comma.  In the Danish locale, the
whole string is read as the number 4.321, including the fractional part.

   Some earlier versions of 'gawk' fully complied with this aspect of
the standard.  However, many users in non-English locales complained
about this behavior, because their data used a period as the decimal
point, so the default behavior was restored to use a period as the
decimal point character.  You can use the '--use-lc-numeric' option
(*note Options::) to force 'gawk' to use the locale's decimal point
character.  ('gawk' also uses the locale's decimal point character when
in POSIX mode, either via '--posix' or the 'POSIXLY_CORRECT' environment
variable, as shown previously.)

   *note Table 6.1: table-locale-affects. shows when the locale's decimal
point character is used and when a period is used.  Some of these
features have not been described yet.


Feature        Default        '--posix' or
                              '--use-lc-numeric'
------------------------------------------------------------
'%'g'          Use locale     Use locale
'%g'           Use period     Use locale
Input          Use period     Use locale
'strtonum()'   Use period     Use locale

Table 6.1: Locale decimal point versus a period

   Finally, modern-day formal standards and the IEEE standard
floating-point representation can have an unusual but important effect
on the way 'gawk' converts some special string values to numbers.  The
details are presented in *note POSIX Floating Point Problems::.
