info gawk

6.1.4 Conversion of Strings and Numbers

Strings are converted to numbers and numbers are converted to strings, if the context of the awk program demands it. For example, if the value of either foo or bar in the expression ‘foo + bar’ happens to be a string, it is converted to a number before the addition is performed. If numeric values appear in string concatenation, they are converted to strings. Consider the following:

two = 2; three = 3
print (two three) + 4

This prints the (numeric) value 27. The numeric values of the variables two and three are converted to strings and concatenated together. The resulting string is converted back to the number 23, to which 4 is then added.

If, for some reason, you need to force a number to be converted to a string, concatenate that number with the empty string, "". To force a string to be converted to a number, add zero to that string. A string is converted to a number by interpreting any numeric prefix of the string as numerals: "2.5" converts to 2.5, "1e3" converts to 1000, and "25fix" has a numeric value of 25. Strings that can’t be interpreted as valid numbers convert to zero.
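The two forcing idioms and the numeric-prefix rule can be tried directly. The following is a minimal sketch, assuming a POSIX-compatible awk is on the PATH as awk (gawk should behave the same way here); the names conv_out, n, s, and is_less are illustrative only:

```shell
# Force string-to-number conversion by adding zero, and
# number-to-string conversion by concatenating "".
conv_out=$(awk 'BEGIN {
    print "2.5" + 0        # the whole string is a valid number
    print "1e3" + 0        # exponent notation is recognized
    print "25fix" + 0      # only the leading "25" is used
    print "fix25" + 0      # no numeric prefix, so the value is zero

    n = 10
    s = n ""               # s now holds the string "10"
    is_less = (s < 9)      # string comparison: "10" sorts before "9"
    print is_less
}')
printf '%s\n' "$conv_out"
```

The last line of output is 1: once n has been forced to a string, the comparison with 9 is done on strings rather than on numbers.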

The exact manner in which numbers are converted into strings is controlled by the awk built-in variable CONVFMT (see section Built-in Variables). Numbers are converted using the sprintf() function with CONVFMT as the format specifier (see section String-Manipulation Functions).

CONVFMT’s default value is "%.6g", which prints a value with at most six significant digits. For some applications, you might want to change it to specify more precision. On most modern machines, 17 digits is usually enough to capture a floating-point number’s value exactly.
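As an illustration of the effect of CONVFMT, the sketch below converts the same number to a string twice, once with the default format and once after raising the precision. It assumes a POSIX-compatible awk on the PATH; the value of x and the "%.10g" setting are arbitrary choices made for the example:

```shell
# Show how CONVFMT changes the string form of the same number.
convfmt_out=$(awk 'BEGIN {
    x = 3.14159265358979
    a = x ""               # converted with the default "%.6g"
    CONVFMT = "%.10g"
    b = x ""               # converted with ten significant digits
    print a
    print b
}')
printf '%s\n' "$convfmt_out"
```

The first line printed is 3.14159 and the second is 3.141592654. Because a and b are already strings, print emits them unchanged.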

Strange results can occur if you set CONVFMT to a string that doesn’t tell sprintf() how to format floating-point numbers in a useful way. For example, if you forget the ‘%’ in the format, awk converts all numbers to the same constant string.

As a special case, if a number is an integer, then the result of converting it to a string is always an integer, no matter what the value of CONVFMT may be. Given the following code fragment:

CONVFMT = "%2.2f"
a = 12
b = a ""

b has the value "12", not "12.00". (d.c.)
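The fragment can be run as a complete program and contrasted with a non-integer value, to which CONVFMT does apply. This is a sketch under the assumption of a POSIX-compatible awk on the PATH; the variables h and c are additions made for the comparison and are not part of the fragment above:

```shell
# Integer values ignore CONVFMT; non-integer values use it.
int_out=$(awk 'BEGIN {
    CONVFMT = "%2.2f"
    a = 12
    b = a ""               # integer value: CONVFMT is not applied
    h = 12.5
    c = h ""               # non-integer value: CONVFMT is applied
    print b
    print c
}')
printf '%s\n' "$int_out"
```

This prints 12 followed by 12.50.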

Prior to the POSIX standard, awk used the value of OFMT for converting numbers to strings. OFMT specifies the output format to use when printing numbers with print. CONVFMT was introduced in order to separate the semantics of conversion from the semantics of printing. Both CONVFMT and OFMT have the same default value: "%.6g". In the vast majority of cases, old awk programs do not change their behavior. However, these semantics for OFMT are something to keep in mind if you must port your new-style program to older implementations of awk. We recommend that, instead of changing your programs, you just port gawk itself. See section The print Statement, for more information on the print statement.
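The separation between the two variables can be observed by giving them different values. The sketch below assumes a POSIX-compatible awk on the PATH; the two format strings are arbitrary and were chosen only so that the results are easy to tell apart:

```shell
# OFMT governs print; CONVFMT governs conversion to a string.
fmt_out=$(awk 'BEGIN {
    OFMT = "%.2f"
    CONVFMT = "%.4f"
    x = 3.14159265
    print x                # a number printed by print uses OFMT
    s = x ""               # concatenation converts with CONVFMT
    print s                # s is already a string, printed as is
}')
printf '%s\n' "$fmt_out"
```

The output is 3.14 followed by 3.1416. An implementation that predates CONVFMT would be expected to use OFMT in both places.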

And, once again, where you are can matter when it comes to converting between numbers and strings. In Where You Are Makes A Difference, we mentioned that the local character set and language (the locale) can affect how gawk matches characters. The locale also affects numeric formats. In particular, for awk programs, it affects the decimal point character. The "C" locale, and most English-language locales, use the period character (‘.’) as the decimal point. However, many (if not most) European and non-English locales use the comma (‘,’) as the decimal point character.

The POSIX standard says that awk always uses the period as the decimal point when reading the awk program source code, and for command-line variable assignments (see section Other Command-Line Arguments). However, when interpreting input data, for print and printf output, and for number-to-string conversion, the local decimal point character is used. Here are some examples indicating the difference in behavior, on a GNU/Linux system:

$ gawk 'BEGIN { printf "%g\n", 3.1415927 }'
-| 3.14159
$ LC_ALL=en_DK gawk 'BEGIN { printf "%g\n", 3.1415927 }'
-| 3,14159
$ echo 4,321 | gawk '{ print $1 + 1 }'
-| 5
$ echo 4,321 | LC_ALL=en_DK gawk '{ print $1 + 1 }'
-| 5,321

The ‘en_DK’ locale is for English in Denmark, where the comma acts as the decimal point separator. In the normal "C" locale, gawk treats ‘4,321’ as ‘4’, while in the Danish locale, it’s treated as the full number, 4.321.
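When comma-decimal input has to be processed without depending on which locales are installed, one possible workaround is to rewrite the comma before doing arithmetic. This technique is not described in this section; it is a sketch that assumes a POSIX-compatible awk on the PATH, pins the "C" locale explicitly, and assumes the field contains at most one comma:

```shell
# Normalize a comma decimal point to a period, then do arithmetic.
norm_out=$(echo 4,321 | LC_ALL=C awk '{
    sub(/,/, ".", $1)      # rewrite "4,321" as "4.321"
    print $1 + 1
}')
printf '%s\n' "$norm_out"
```

This prints 5.321, using a period because the "C" locale is in effect. Input that also uses the comma as a thousands separator would need different handling.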

Some earlier versions of gawk fully complied with this aspect of the standard. However, many users in non-English locales complained about this behavior, since their data used a period as the decimal point, so the default behavior was restored to use a period as the decimal point character. You can use the ‘--use-lc-numeric’ option (see section Command-Line Options) to force gawk to use the locale’s decimal point character. (gawk also uses the locale’s decimal point character when in POSIX mode, either via ‘--posix’, or the POSIXLY_CORRECT environment variable.)

Table 6.1 describes the cases in which the locale’s decimal point character is used and when a period is used. Some of these features have not been described yet.

Feature       Default       ‘--posix’ or ‘--use-lc-numeric’
%'g           Use locale    Use locale
%g            Use period    Use locale
Input         Use period    Use locale
strtonum()    Use period    Use locale

Table 6.1: Locale Decimal Point versus A Period

Finally, modern day formal standards and IEEE standard floating point representation can have an unusual but important effect on the way gawk converts some special string values to numbers. The details are presented in Standards Versus Existing Practice.



This document was generated on March 30, 2012 using texi2html 5.0.