File: gawk.info,  Node: Locale influences conversions,  Prev: Strings And Numbers,  Up: Conversion

6.1.4.2 Locales Can Influence Conversion
........................................

Where you are can matter when it comes to converting between numbers and
strings.  The local character set and language--the "locale"--can affect
numeric formats.  In particular, for 'awk' programs, it affects the
decimal point character and the thousands-separator character.  The
'"C"' locale, and most English-language locales, use the period
character ('.') as the decimal point and don't have a thousands
separator.  However, many (if not most) European and non-English locales
use the comma (',') as the decimal point character.  European locales
often use either a space or a period as the thousands separator, if they
have one.

   The POSIX standard says that 'awk' always uses the period as the
decimal point when reading the 'awk' program source code, and for
command-line variable assignments (*note Other Arguments::).  However,
when interpreting input data, for 'print' and 'printf' output, and for
number-to-string conversion, the local decimal point character is used.
(d.c.)  In all cases, numbers in source code and in input data cannot
have a thousands separator.  Here are some examples indicating the
difference in behavior, on a GNU/Linux system:

     $ export POSIXLY_CORRECT=1                        Force POSIX behavior
     $ gawk 'BEGIN { printf "%g\n", 3.1415927 }'
     -| 3.14159
     $ LC_ALL=en_DK.utf-8 gawk 'BEGIN { printf "%g\n", 3.1415927 }'
     -| 3,14159
     $ echo 4,321 | gawk '{ print $1 + 1 }'
     -| 5
     $ echo 4,321 | LC_ALL=en_DK.utf-8 gawk '{ print $1 + 1 }'
     -| 5,321

   The 'en_DK.utf-8' locale is for English in Denmark, where the comma
acts as the decimal point separator.  In the normal '"C"' locale, 'gawk'
treats '4,321' as 4, while in the Danish locale, it's treated as the
full number including the fractional part, 4.321.

   Some earlier versions of 'gawk' fully complied with this aspect of the
standard.
However, many users in non-English locales complained about this
behavior, because their data used a period as the decimal point, so the
default behavior was restored to use a period as the decimal point
character.  You can use the '--use-lc-numeric' option (*note Options::)
to force 'gawk' to use the locale's decimal point character.  ('gawk'
also uses the locale's decimal point character when in POSIX mode,
either via '--posix' or the 'POSIXLY_CORRECT' environment variable, as
shown previously.)

   *note Table 6.1: table-locale-affects. describes the cases in which
the locale's decimal point character is used and when a period is used.
Some of these features have not been described yet.

Feature        Default        '--posix' or
                              '--use-lc-numeric'
------------------------------------------------------------
'%'g'          Use locale     Use locale
'%g'           Use period     Use locale
Input          Use period     Use locale
'strtonum()'   Use period     Use locale

Table 6.1: Locale decimal point versus a period

   Finally, modern-day formal standards and the IEEE standard
floating-point representation can have an unusual but important effect
on the way 'gawk' converts some special string values to numbers.  The
details are presented in *note POSIX Floating Point Problems::.
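
   As the "Input" row of the table indicates, by default a field such
as '4,321' is read only up to the comma.  The following is a minimal
sketch, not part of the original text, of one way to handle
comma-decimal input without depending on which locales are installed:
replace the comma with a period before doing arithmetic.  The function
name 'add_one' is hypothetical, and 'LC_ALL=C' is set on the assumption
that you want the period as the decimal point regardless of the
caller's environment.  It should behave the same with 'gawk' or any
other POSIX 'awk'.

```shell
# Hypothetical helper: add 1 to a number that may use a comma
# as its decimal point.
add_one() {
    # LC_ALL=C pins the decimal point to a period for input and output.
    # sub() rewrites the first comma in $1 to a period before the addition.
    printf '%s\n' "$1" | LC_ALL=C awk '{ sub(/,/, ".", $1); print $1 + 1 }'
}

add_one 4,321    # prints 5.321
add_one 4.321    # prints 5.321
```

   This sketch assumes the input has no thousands separator; it would
misread a value such as '1.234,5'.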