*gawk:* Basic Data Typing

D.2 Data Values in a Computer ============================= In a program, you keep track of information and values in things called "variables". A variable is just a name for a given value, such as 'first_name', 'last_name', 'address', and so on. 'awk' has several predefined variables, and it has special names to refer to the current input record and the fields of the record. You may also group multiple associated values under one name, as an array. Data, particularly in 'awk', consists of either numeric values, such as 42 or 3.1415927, or string values. String values are essentially anything that's not a number, such as a name. Strings are sometimes referred to as "character data", since they store the individual characters that comprise them. Individual variables, as well as numeric and string variables, are referred to as "scalar" values. Groups of values, such as arrays, are not scalars. ⇒Computer Arithmetic, provided a basic introduction to numeric types (integer and floating-point) and how they are used in a computer. Please review that information, including a number of caveats that were presented. While you are probably used to the idea of a number without a value (i.e., zero), it takes a bit more getting used to the idea of zero-length character data. Nevertheless, such a thing exists. It is called the "null string". The null string is character data that has no value. In other words, it is empty. It is written in 'awk' programs like this: '""'. Humans are used to working in decimal; i.e., base 10. In base 10, numbers go from 0 to 9, and then "roll over" into the next column. (Remember grade school? 42 = 4 x 10 + 2.) There are other number bases though. Computers commonly use base 2 or "binary", base 8 or "octal", and base 16 or "hexadecimal". In binary, each column represents two times the value in the column to its right. Each column may contain either a 0 or a 1. Thus, binary 1010 represents (1 x 8) + (0 x 4) + (1 x 2) + (0 x 1), or decimal 10. Octal and hexadecimal are discussed more in ⇒Nondecimal-numbers. At the very lowest level, computers store values as groups of binary digits, or "bits". Modern computers group bits into groups of eight, called "bytes". Advanced applications sometimes have to manipulate bits directly, and 'gawk' provides functions for doing so. Programs are written in programming languages. Hundreds, if not thousands, of programming languages exist. One of the most popular is the C programming language. The C language had a very strong influence on the design of the 'awk' language. There have been several versions of C. The first is often referred to as "K&R" C, after the initials of Brian Kernighan and Dennis Ritchie, the authors of the first book on C. (Dennis Ritchie created the language, and Brian Kernighan was one of the creators of 'awk'.) In the mid-1980s, an effort began to produce an international standard for C. This work culminated in 1989, with the production of the ANSI standard for C. This standard became an ISO standard in 1990. In 1999, a revised ISO C standard was approved and released. Where it makes sense, POSIX 'awk' is compatible with 1999 ISO C.