gawk: Glossary

 Action
      A series of 'awk' statements attached to a rule.  If the rule's
      pattern matches an input record, 'awk' executes the rule's action.
      Actions are always enclosed in braces.  (⇒Action Overview.)
 Ada
      A programming language originally defined by the U.S. Department of
      Defense for embedded programming.  It was designed to enforce good
      Software Engineering practices.
 Amazing 'awk' Assembler
      Henry Spencer at the University of Toronto wrote a retargetable
      assembler completely as 'sed' and 'awk' scripts.  It is thousands
      of lines long, including machine descriptions for several eight-bit
      microcomputers.  It is a good example of a program that would have
      been better written in another language.
 Amazingly Workable Formatter ('awf')
      Henry Spencer at the University of Toronto wrote a formatter that
      accepts a large subset of the 'nroff -ms' and 'nroff -man'
      formatting commands, using 'awk' and 'sh'.
 Anchor
      The regexp metacharacters '^' and '$', which force the match to the
      beginning or end of the string, respectively.
 ANSI
      The American National Standards Institute.  This organization
      produces many standards, among them the standards for the C and C++
      programming languages.  These standards often become international
      standards as well.  See also "ISO."
 Argument
      An argument can be two different things.  It can be an option or a
      file name passed to a command while invoking it from the command
      line, or it can be something passed to a "function" inside a
      program, e.g., inside 'awk'.
      In the latter case, an argument can be passed to a function in two
      ways.  Either it is given to the called function by value, i.e., a
      copy of the value of the variable is made available to the called
      function, but the original variable cannot be modified by the
      function itself; or it is given by reference, i.e., a pointer to
      the variable in question is passed to the function, which can then
      directly modify it.  In 'awk' scalars are passed by value, and
      arrays are passed by reference.  See "Pass By Value/Reference."
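      As a minimal sketch of this distinction (the function name
      'change' and the variable names are made up for the illustration),
      the following shell command runs an 'awk' program that passes one
      scalar and one array to the same function:

```shell
pass_demo() {
    awk '
    function change(s, arr) {
        s = "modified"            # assigns to the local copy only
        arr["key"] = "modified"   # assigns into the array of the caller
    }
    BEGIN {
        str = "original"
        a["key"] = "original"
        change(str, a)
        print str, a["key"]       # prints: original modified
    }'
}
pass_demo
```

      The scalar keeps its value after the call, while the array element
      does not.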
 Array
      A grouping of multiple values under the same name.  Most languages
      just provide sequential arrays.  'awk' provides associative arrays.
 Assertion
      A statement in a program that a condition is true at this point in
      the program.  Useful for reasoning about how a program is supposed
      to behave.
 Assignment
      An 'awk' expression that changes the value of some 'awk' variable
      or data object.  An object that you can assign to is called an
      "lvalue".  The assigned values are called "rvalues".  ⇒
      Assignment Ops.
 Associative Array
      Arrays in which the indices may be numbers or strings, not just
      sequential integers in a fixed range.
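      A small sketch of string indices in use; the input words and the
      array name 'count' are assumptions chosen for the example:

```shell
count_demo() {
    printf 'apple\npear\napple\n' | awk '
    { count[$1]++ }                            # index by the word itself
    END { print count["apple"], count["pear"] }'
}
count_demo
```

      The elements are looked up by name in the 'END' rule because the
      order in which 'for (key in count)' visits indices is not defined.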
 'awk' Language
      The language in which 'awk' programs are written.
 'awk' Program
      An 'awk' program consists of a series of "patterns" and "actions",
      collectively known as "rules".  For each input record given to the
      program, the program's rules are all processed in turn.  'awk'
      programs may also contain function definitions.
 'awk' Script
      Another name for an 'awk' program.
 Bash
      The GNU version of the standard shell (the Bourne-Again SHell).
      See also "Bourne Shell."
 Binary
      Base-two notation, where the digits are '0'-'1'.  Since electronic
      circuitry works "naturally" in base 2 (just think of Off/On),
      everything inside a computer is calculated using base 2.  Each
      digit represents the presence (or absence) of a power of 2 and is
      called a "bit".  So, for example, the base-two number '10101' is
      the same as decimal 21, ((1 x 16) + (1 x 4) + (1 x 1)).
      Since base-two numbers quickly become very long to read and write,
      they are usually grouped by 3 (i.e., they are read as octal
      numbers), or by 4 (i.e., they are read as hexadecimal numbers).
      There is no direct way to insert base 2 numbers in a C program.  If
      need arises, such numbers are usually inserted as octal or
      hexadecimal numbers.  The number of base-two digits that fit into
      registers used for representing integer numbers in computers is a
      rough indication of the computing power of the computer itself.
      Most computers nowadays use 64 bits for representing integer
      numbers in their registers, but 32-bit, 16-bit and 8-bit registers
      have been widely used in the past.  ⇒Nondecimal-numbers.
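      The positional arithmetic above can be sketched in portable 'awk'.
      The shell function name 'bin2dec' and the variable 'bits' are
      hypothetical; the loop simply doubles the running total and adds
      each digit in turn:

```shell
bin2dec() {
    awk -v bits="$1" '
    BEGIN {
        n = 0
        for (i = 1; i <= length(bits); i++)
            n = n * 2 + substr(bits, i, 1)   # shift left, add next digit
        print n
    }'
}
bin2dec 10101
```

      For the input '10101' this yields 21, matching the worked sum in
      the entry.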
 Bit
      Short for "Binary Digit."  All values in computer memory ultimately
      reduce to binary digits: values that are either zero or one.
      Groups of bits may be interpreted differently--as integers,
      floating-point numbers, character data, addresses of other memory
      objects, or other data.  'awk' lets you work with floating-point
      numbers and strings.  'gawk' lets you manipulate bit values with
      the built-in functions described in ⇒Bitwise Functions.
      Computers are often defined by how many bits they use to represent
      integer values.  Typical systems are 32-bit systems, but 64-bit
      systems are becoming increasingly popular, and 16-bit systems have
      essentially disappeared.
 Boolean Expression
      Named after the English mathematician Boole.  See also "Logical
      Expression."
 Bourne Shell
      The standard shell ('/bin/sh') on Unix and Unix-like systems,
      originally written by Steven R. Bourne at Bell Laboratories.  Many
      shells (Bash, 'ksh', 'pdksh', 'zsh') are generally upwardly
      compatible with the Bourne shell.
 Braces
      The characters '{' and '}'.  Braces are used in 'awk' for
      delimiting actions, compound statements, and function bodies.
 Bracket Expression
      Inside a "regular expression", an expression included in square
      brackets, meant to designate a single character as belonging to a
      specified character class.  A bracket expression can contain a list
      of one or more characters, like '[abc]', a range of characters,
      like '[A-Z]', or a name, delimited by ':', that designates a known
      set of characters, like '[:digit:]'.  The form of bracket
      expression enclosed between ':' is independent of the underlying
      representation of the characters themselves, which could utilize the
      ASCII, EBCDIC, or Unicode codesets, depending on the architecture
      of the computer system, and on localization.  See also "Regular
      Expression."
 Built-in Function
      The 'awk' language provides built-in functions that perform various
      numerical, I/O-related, and string computations.  Examples are
      'sqrt()' (for the square root of a number) and 'substr()' (for a
      substring of a string).  'gawk' provides functions for timestamp
      management, bit manipulation, array sorting, type checking, and
      runtime string translation.  (⇒Built-in.)
 Built-in Variable
      'NF', 'NR', 'OFMT', 'OFS', 'ORS', 'RLENGTH', 'RSTART', 'RS', and
      'SUBSEP' are the variables that have special meaning to 'awk'.  In
      addition, 'ARGIND', 'BINMODE', 'ERRNO', 'FIELDWIDTHS', 'FPAT',
      'IGNORECASE', 'LINT', 'PROCINFO', 'RT', and 'TEXTDOMAIN' are the
      variables that have special meaning to 'gawk'.  Changing some of
      them affects 'awk''s running environment.  (⇒Built-in
      Variables.)
 C
      The system programming language that most GNU software is written
      in.  The 'awk' programming language has C-like syntax, and this
      Info file points out similarities between 'awk' and C when
      appropriate.
      In general, 'gawk' attempts to be as similar to the 1990 version of
      ISO C as makes sense.
 C Shell
      The C Shell ('csh' or its improved version, 'tcsh') is a Unix shell
      that was created by Bill Joy in the late 1970s.  The C shell was
      differentiated from other shells by its interactive features and
      overall style, which looks more like C. The C Shell is not backward
      compatible with the Bourne Shell, so special attention is required
      when converting scripts written for other Unix shells to the C
      shell, especially with regard to the management of shell variables.
      See also "Bourne Shell."
 C++
      A popular object-oriented programming language derived from C.
 Character Class
      See "Bracket Expression."
 Character List
      See "Bracket Expression."
 Character Set
      The set of numeric codes used by a computer system to represent the
      characters (letters, numbers, punctuation, etc.)  of a particular
      country or place.  The most common character set in use today is
      ASCII (American Standard Code for Information Interchange).  Many
      European countries use an extension of ASCII known as ISO-8859-1
      (ISO Latin-1).  The Unicode character set
      is increasingly popular and standard, and is particularly widely
      used on GNU/Linux systems.
 Chem
      A preprocessor for 'pic' that reads descriptions of molecules and
      produces 'pic' input for drawing them.  It was written in 'awk' by
      Brian Kernighan and Jon Bentley, and is available from
 Comparison Expression
      A relation that is either true or false, such as 'a < b'.
      Comparison expressions are used in 'if', 'while', 'do', and 'for'
      statements, and in patterns to select which input records to
      process.  (⇒Typing and Comparison.)
 Compiler
      A program that translates human-readable source code into
      machine-executable object code.  The object code is then executed
      directly by the computer.  See also "Interpreter."
 Complemented Bracket Expression
      The negation of a "bracket expression".  All that is _not_
      described by a given bracket expression.  The symbol '^' precedes
      the negated bracket expression.  E.g.: '[^[:digit:]]' designates
      whatever character is not a digit.  '[^bad]' designates whatever
      character is not one of the letters 'b', 'a', or 'd'.  See "Bracket
      Expression."
 Compound Statement
      A series of 'awk' statements, enclosed in curly braces.  Compound
      statements may be nested.  (⇒Statements.)
 Computed Regexps
      See "Dynamic Regular Expressions."
 Concatenation
      Concatenating two strings means sticking them together, one after
      another, producing a new string.  For example, the string 'foo'
      concatenated with the string 'bar' gives the string 'foobar'.
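      In 'awk', concatenation has no explicit operator; writing two
      expressions side by side is enough.  A one-line sketch using the
      strings from the example above:

```shell
concat_demo() {
    awk 'BEGIN { s = "foo" "bar"; print s }'
}
concat_demo
```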
 Conditional Expression
      An expression using the '?:' ternary operator, such as 'EXPR1 ?
      EXPR2 : EXPR3'.  The expression EXPR1 is evaluated; if the result
      is true, the value of the whole expression is the value of EXPR2;
      otherwise the value is EXPR3.  In either case, only one of EXPR2
      and EXPR3 is evaluated.  (⇒Conditional Exp.)
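      A short sketch of the operator applied to each input record; the
      input values and the output labels are invented for the example:

```shell
sign_demo() {
    printf '5\n-3\n' |
    awk '{ print ($1 < 0 ? "negative" : "non-negative") }'
}
sign_demo
```

      The parentheses around the whole conditional expression are not
      required here, but they keep the 'print' statement easy to read.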
 Control Statement
      A control statement is an instruction to perform a given operation
      or a set of operations inside an 'awk' program, if a given
      condition is true.  Control statements are: 'if', 'for', 'while',
      and 'do' (⇒Statements).
 Cookie
      A peculiar goodie, token, saying or remembrance produced by or
      presented to a program.  (With thanks to Professor Doug McIlroy.)
 Coprocess
      A subordinate program with which two-way communication is
      possible.
 Curly Braces
      See "Braces."
 Dark Corner
      An area in the language where specifications often were (or still
      are) not clear, leading to unexpected or undesirable behavior.
      Such areas are marked in this Info file with "(d.c.)"  in the text
      and are indexed under the heading "dark corner."
 Data Driven
      A description of 'awk' programs, where you specify the data you are
      interested in processing, and what to do when that data is seen.
 Data Objects
      These are numbers and strings of characters.  Numbers are converted
      into strings and vice versa, as needed.  (⇒Conversion.)
 Deadlock
      The situation in which two communicating processes are each waiting
      for the other to perform an action.
 Debugger
      A program used to help developers remove "bugs" from (de-bug) their
      programs.
 Double Precision
      An internal representation of numbers that can have fractional
      parts.  Double precision numbers keep track of more digits than do
      single precision numbers, but operations on them are sometimes more
      expensive.  This is the way 'awk' stores numeric values.  It is the
      C type 'double'.
 Dynamic Regular Expression
      A dynamic regular expression is a regular expression written as an
      ordinary expression.  It could be a string constant, such as
      '"foo"', but it may also be an expression whose value can vary.
      (⇒Computed Regexps.)
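      As an illustration, the regexp below is held in a variable rather
      than written between slashes.  The variable name 'pat' and the
      sample input are assumptions made for this sketch:

```shell
dyn_demo() {
    printf 'foo\nbar\n' | awk -v pat='^b' '$0 ~ pat { print }'
}
dyn_demo
```

      Because 'pat' is an ordinary variable, a different value can be
      supplied on each run without editing the program text.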
 Empty String
      See "Null String."
 Environment
      A collection of strings, of the form 'NAME=VAL', that each program
      has available to it.  Users generally place values into the
      environment in order to provide information to various programs.
      Typical examples are the environment variables 'HOME' and 'PATH'.
 Epoch
      The date used as the "beginning of time" for timestamps.  Time
      values in most systems are represented as seconds since the epoch,
      with library functions available for converting these values into
      standard date and time formats.
      The epoch on Unix and POSIX systems is 1970-01-01 00:00:00 UTC. See
      also "GMT" and "UTC."
 Escape Sequences
      A special sequence of characters used for describing nonprinting
      characters, such as '\n' for newline or '\033' for the ASCII ESC
      (Escape) character.  (⇒Escape Sequences.)
 Extension
      An additional feature or change to a programming language or
      utility not defined by that language's or utility's standard.
      'gawk' has (too) many extensions over POSIX 'awk'.
 FDL
      See "Free Documentation License."
 Field
      When 'awk' reads an input record, it splits the record into pieces
      separated by whitespace (or by a separator regexp that you can
      change by setting the predefined variable 'FS').  Such pieces are
      called fields.  If the pieces are of fixed length, you can use the
      built-in variable 'FIELDWIDTHS' to describe their lengths.  If you
      wish to specify the contents of fields instead of the field
      separator, you can use the predefined variable 'FPAT' to do so.
      (⇒Field Separators, ⇒Constant Size, and ⇒
      Splitting By Content.)
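      A brief sketch of changing the field separator.  The colon
      separator is set with the '-F' option, which assigns 'FS', and the
      sample record is invented for the example:

```shell
field_demo() {
    printf 'root:x:0\n' | awk -F: '{ print $1, NF }'
}
field_demo
```

      The program prints the first field followed by 'NF', the number of
      fields found in the record.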
 Flag
      A variable whose truth value indicates the existence or
      nonexistence of some condition.
 Floating-Point Number
      Often referred to in mathematical terms as a "rational" or real
      number, this is just a number that can have a fractional part.  See
      also "Double Precision" and "Single Precision."
 Format
      Format strings control the appearance of output in the 'strftime()'
      and 'sprintf()' functions, and in the 'printf' statement as well.
      Also, data conversions from numbers to strings are controlled by
      the format strings contained in the predefined variables 'CONVFMT'
      and 'OFMT'.  (⇒Control Letters.)
 Fortran
      Shorthand for FORmula TRANslator, one of the first programming
      languages available for scientific calculations.  It was created by
      John Backus, and has been available since 1957.  It is still in use
      today.
 Free Documentation License
      This document describes the terms under which this Info file is
      published and may be copied.  (⇒GNU Free Documentation
      License.)
 Free Software Foundation
      A nonprofit organization dedicated to the production and
      distribution of freely distributable software.  It was founded by
      Richard M. Stallman, the author of the original Emacs editor.  GNU
      Emacs is the most widely used version of Emacs today.
 FSF
      See "Free Software Foundation."
 Function
      A part of an 'awk' program that can be invoked from every point of
      the program, to perform a task.  'awk' has several built-in
      functions.  Users can define their own functions in every part of
      the program.  Functions can be recursive, i.e., they may invoke
      themselves.  ⇒Functions.  In 'gawk' it is also possible to
      have functions shared among different programs, and included where
      required using the '@include' directive (⇒Include Files).
      In 'gawk' the name of the function that should be invoked can be
      generated at run time, i.e., dynamically.  The 'gawk' extension API
      provides constructor functions (⇒Constructor Functions).
 'gawk'
      The GNU implementation of 'awk'.
 General Public License
      This document describes the terms under which 'gawk' and its source
      code may be distributed.  (⇒Copying.)
 GMT
      "Greenwich Mean Time."  This is the old term for UTC. It is the
      time of day used internally for Unix and POSIX systems.  See also
      "Epoch" and "UTC."
 GNU
      "GNU's not Unix".  An on-going project of the Free Software
      Foundation to create a complete, freely distributable,
      POSIX-compliant computing environment.
 GNU/Linux
      A variant of the GNU system using the Linux kernel, instead of the
      Free Software Foundation's Hurd kernel.  The Linux kernel is a
      stable, efficient, full-featured clone of Unix that has been ported
      to a variety of architectures.  It is most popular on PC-class
      systems, but runs well on a variety of other systems too.  The
      Linux kernel source code is available under the terms of the GNU
      General Public License, which is perhaps its most important aspect.
 GPL
      See "General Public License."
 Hexadecimal
      Base 16 notation, where the digits are '0'-'9' and 'A'-'F', with
      'A' representing 10, 'B' representing 11, and so on, up to 'F' for
      15.  Hexadecimal numbers are written in C using a leading '0x', to
      indicate their base.  Thus, '0x12' is 18 ((1 x 16) + 2).  ⇒
      Nondecimal-numbers.
 I/O
      Abbreviation for "Input/Output," the act of moving data into and/or
      out of a running program.
 Input Record
      A single chunk of data that is read in by 'awk'.  Usually, an 'awk'
      input record consists of one line of text.  (⇒Records.)
 Integer
      A whole number, i.e., a number that does not have a fractional
      part.
 Internationalization
      The process of writing or modifying a program so that it can use
      multiple languages without requiring further source code changes.
 Interpreter
      A program that reads human-readable source code directly, and uses
      the instructions in it to process data and produce results.  'awk'
      is typically (but not always) implemented as an interpreter.  See
      also "Compiler."
 Interval Expression
      A component of a regular expression that lets you specify repeated
      matches of some part of the regexp.  Interval expressions were not
      originally available in 'awk' programs.
 ISO
      The International Organization for Standardization.  This
      organization produces international standards for many things,
      including programming languages, such as C and C++.  In the
      computer arena, important standards like those for C, C++, and
      POSIX become both American national and ISO international standards
      simultaneously.  This Info file refers to Standard C as "ISO C"
      throughout.  See the ISO website for more information about
      the name of the organization and its language-independent
      three-letter acronym.
 Java
      A modern programming language originally developed by Sun
      Microsystems (now Oracle) supporting Object-Oriented programming.
      Although usually implemented by compiling to the instructions for a
      standard virtual machine (the JVM), the language can be compiled to
      native code.
 Keyword
      In the 'awk' language, a keyword is a word that has special
      meaning.  Keywords are reserved and may not be used as variable
      names.
      'gawk''s keywords are: 'BEGIN', 'BEGINFILE', 'END', 'ENDFILE',
      'break', 'case', 'continue', 'default', 'delete', 'do...while',
      'else', 'exit', 'for...in', 'for', 'function', 'func', 'if',
      'next', 'nextfile', 'switch', and 'while'.
 Korn Shell
      The Korn Shell ('ksh') is a Unix shell which was developed by David
      Korn at Bell Laboratories in the early 1980s.  The Korn Shell is
      backward-compatible with the Bourne shell and includes many
      features of the C shell.  See also "Bourne Shell."
 Lesser General Public License
      This document describes the terms under which binary library
      archives or shared objects, and their source code may be
      distributed.
 LGPL
      See "Lesser General Public License."
 Linux
      See "GNU/Linux."
 Localization
      The process of providing the data necessary for an
      internationalized program to work in a particular language.
 Logical Expression
      An expression using the operators for logic, AND, OR, and NOT,
      written '&&', '||', and '!' in 'awk'.  Often called Boolean
      expressions, after the mathematician who pioneered this kind of
      mathematical logic.
 Lvalue
      An expression that can appear on the left side of an assignment
      operator.  In most languages, lvalues can be variables or array
      elements.  In 'awk', a field designator can also be used as an
      lvalue.
 Matching
      The act of testing a string against a regular expression.  If the
      regexp describes the contents of the string, it is said to "match"
      it.
 Metacharacters
      Characters used within a regexp that do not stand for themselves.
      Instead, they denote regular expression operations, such as
      repetition, grouping, or alternation.
 Nesting
      Nesting is where information is organized in layers, or where
      objects contain other similar objects.  In 'gawk' the '@include'
      directive can be nested.  The "natural" nesting of arithmetic and
      logical operations can be changed using parentheses (⇒
      Precedence).
 No-op
      An operation that does nothing.
 Null String
      A string with no characters in it.  It is represented explicitly in
      'awk' programs by placing two double quote characters next to each
      other ('""').  It can appear in input data by having two successive
      occurrences of the field separator appear next to each other.
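      A sketch of the second case, using a comma as the field separator
      and a made-up input record with two adjacent commas:

```shell
null_demo() {
    printf 'a,,c\n' |
    awk -F, '$2 == "" { print "field 2 is empty, length " length($2) }'
}
null_demo
```

      The second field compares equal to '""' and has length zero.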
 Number
      A numeric-valued data object.  Modern 'awk' implementations use
      double precision floating-point to represent numbers.  Ancient
      'awk' implementations used single precision floating-point.
 Octal
      Base-eight notation, where the digits are '0'-'7'.  Octal numbers
      are written in C using a leading '0', to indicate their base.
      Thus, '013' is 11 ((1 x 8) + 3).  ⇒Nondecimal-numbers.
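      The octal and hexadecimal examples in this glossary can be checked
      in the other direction with the '%o' and '%x' conversions of
      'printf'.  This sketch formats decimal 18 in base 16 and decimal 11
      in base 8:

```shell
base_demo() {
    awk 'BEGIN { printf "%x %o\n", 18, 11 }'
}
base_demo
```

      The output shows the digits only; the leading '0x' and '0' are
      notation used when writing such constants in source code.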
 Output Record
      A single chunk of data that is written out by 'awk'.  Usually, an
      'awk' output record consists of one or more lines of text.  ⇒
      Printing.
 Pattern
      Patterns tell 'awk' which input records are interesting to which
      rules.
      A pattern is an arbitrary conditional expression against which
      input is tested.  If the condition is satisfied, the pattern is
      said to "match" the input record.  A typical pattern might compare
      the input record against a regular expression.  (⇒Pattern
      Overview.)
 PEBKAC
      An acronym describing what is possibly the most frequent source of
      computer usage problems.  (Problem Exists Between Keyboard And
      Chair.)
 Plug-in
      See "Extensions."
 POSIX
      The name for a series of standards that specify a Portable
      Operating System interface.  The "IX" denotes the Unix heritage of
      these standards.  The main standard of interest for 'awk' users is
      'IEEE Standard for Information Technology, Standard 1003.1-2008'.
      The 2008 POSIX standard can be found online at
 Precedence
      The order in which operations are performed when operators are used
      without explicit parentheses.
 Private
      Variables and/or functions that are meant for use exclusively by
      library functions and not for the main 'awk' program.  Special care
      must be taken when naming such variables and functions.  (⇒
      Library Names.)
 Range (of input lines)
      A sequence of consecutive lines from the input file(s).  A pattern
      can specify ranges of input lines for 'awk' to process or it can
      specify single lines.  (⇒Pattern Overview.)
 Record
      See "Input record" and "Output record."
 Recursion
      When a function calls itself, either directly or indirectly.  If
      this is clear, stop, and proceed to the next entry.  Otherwise,
      refer to the entry for "recursion."
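      For readers who prefer code, here is a sketch of direct recursion
      in 'awk'.  The function name 'fact' is chosen for the example; it
      computes a factorial by calling itself on a smaller argument until
      it reaches 1:

```shell
fact_demo() {
    awk '
    function fact(n) {
        return (n <= 1) ? 1 : n * fact(n - 1)   # base case, then self-call
    }
    BEGIN { print fact(5) }'
}
fact_demo
```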
 Redirection
      Redirection means performing input from something other than the
      standard input stream, or performing output to something other than
      the standard output stream.
      You can redirect input to the 'getline' statement using the '<',
      '|', and '|&' operators.  You can redirect the output of the
      'print' and 'printf' statements to a file or a system command,
      using the '>', '>>', '|', and '|&' operators.  (⇒Getline,
      and ⇒Redirection.)
 Reference Counts
      An internal mechanism in 'gawk' to minimize the amount of memory
      needed to store the value of string variables.  If the value
      assumed by a variable is used in more than one place, only one copy
      of the value itself is kept, and the associated reference count is
      increased when the same value is used by an additional variable,
      and decreased when the related variable is no longer in use.  When
      the reference count goes to zero, the memory space used to store
      the value of the variable is freed.
 Regexp
      See "Regular Expression."
 Regular Expression
      A regular expression ("regexp" for short) is a pattern that denotes
      a set of strings, possibly an infinite set.  For example, the
      regular expression 'R.*xp' matches any string starting with the
      letter 'R' and ending with the letters 'xp'.  In 'awk', regular
      expressions are used in patterns and in conditional expressions.
      Regular expressions may contain escape sequences.  (⇒
      Regexp.)
 Regular Expression Constant
      A regular expression constant is a regular expression written
      within slashes, such as '/foo/'.  This regular expression is chosen
      when you write the 'awk' program and cannot be changed during its
      execution.  (⇒Regexp Usage.)
 Regular Expression Operators
      See "Metacharacters."
 Rounding
      Rounding the result of an arithmetic operation can be tricky.  More
      than one way of rounding exists, and in 'gawk' it is possible to
      choose which method should be used in a program.  ⇒Setting the
      rounding mode.
 Rule
      A segment of an 'awk' program that specifies how to process single
      input records.  A rule consists of a "pattern" and an "action".
      'awk' reads an input record; then, for each rule, if the input
      record satisfies the rule's pattern, 'awk' executes the rule's
      action.  Otherwise, the rule does nothing for that input record.
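      A sketch with two rules applied to three invented input records.
      Each record is tested against both patterns in turn, and an action
      runs only where its pattern matches:

```shell
rule_demo() {
    printf 'alpha 1\nbeta 20\ngamma 3\n' | awk '
    $2 > 10 { print $1 }               # rule 1: comparison pattern
    /^g/    { print "g-line:", $0 }    # rule 2: regexp pattern
    '
}
rule_demo
```

      The first record matches neither rule and so produces no output.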
 Rvalue
      A value that can appear on the right side of an assignment
      operator.  In 'awk', essentially every expression has a value.
      These values are rvalues.
 Scalar
      A single value, be it a number or a string.  Regular variables are
      scalars; arrays and functions are not.
 Search Path
      In 'gawk', a list of directories to search for 'awk' program source
      files.  In the shell, a list of directories to search for
      executable programs.
 'sed'
      See "Stream Editor."
 Seed
      The initial value, or starting point, for a sequence of random
      numbers.
 Shell
      The command interpreter for Unix and POSIX-compliant systems.  The
      shell works both interactively, and as a programming language for
      batch files, or shell scripts.
 Short-Circuit
      The nature of the 'awk' logical operators '&&' and '||'.  If the
      value of the entire expression is determinable from evaluating just
      the lefthand side of these operators, the righthand side is not
      evaluated.  (⇒Boolean Ops.)
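      One way to observe this behavior is to put a function with a
      visible side effect on the righthand side.  In this sketch the
      function 'noisy' and the counter 'calls' are hypothetical names;
      the counter stays at zero because the function is never invoked:

```shell
short_demo() {
    awk '
    function noisy() { calls++; return 1 }
    BEGIN {
        if (0 && noisy()) print "unreachable"   # left side false: skip right
        if (1 || noisy()) print "calls = " (calls + 0)
    }'
}
short_demo
```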
 Side Effect
      A side effect occurs when an expression has an effect aside from
      merely producing a value.  Assignment expressions, increment and
      decrement expressions, and function calls have side effects.
      (⇒Assignment Ops.)
 Single Precision
      An internal representation of numbers that can have fractional
      parts.  Single precision numbers keep track of fewer digits than do
      double precision numbers, but operations on them are sometimes less
      expensive in terms of CPU time.  This is the type used by some
      ancient versions of 'awk' to store numeric values.  It is the C
      type 'float'.
 Space
      The character generated by hitting the space bar on the keyboard.
 Special File
      A file name interpreted internally by 'gawk', instead of being
      handed directly to the underlying operating system--for example,
      '/dev/stderr'.  (⇒Special Files.)
 Statement
      An expression inside an 'awk' program in the action part of a
      pattern-action rule, or inside an 'awk' function.  A statement can
      be a variable assignment, an array operation, a loop, etc.
 Stream Editor
      A program that reads records from an input stream and processes
      them one or more at a time.  This is in contrast with batch
      programs, which may expect to read their input files in entirety
      before starting to do anything, as well as with interactive
      programs which require input from the user.
 String
      A datum consisting of a sequence of characters, such as 'I am a
      string'.  Constant strings are written with double quotes in the
      'awk' language and may contain escape sequences.  (⇒Escape
      Sequences.)
 Tab
      The character generated by hitting the 'TAB' key on the keyboard.
      It usually expands to up to eight spaces upon output.
 Text Domain
      A unique name that identifies an application.  Used for grouping
      messages that are translated at runtime into the local language.
 Timestamp
      A value in the "seconds since the epoch" format used by Unix and
      POSIX systems.  Used for the 'gawk' functions 'mktime()',
      'strftime()', and 'systime()'.  See also "Epoch," "GMT," and "UTC."
 Unix
      A computer operating system originally developed in the early
      1970's at AT&T Bell Laboratories.  It initially became popular in
      universities around the world and later moved into commercial
      environments as a software development system and network server
      system.  There are many commercial versions of Unix, as well as
      several work-alike systems whose source code is freely available
      (such as GNU/Linux, NetBSD, FreeBSD, and OpenBSD).
 UTC
      The accepted abbreviation for "Coordinated Universal Time."  This
      is standard time in Greenwich, England, which is used as a
      reference time for day and date calculations.  See also "Epoch" and
      "GMT."
 Variable
      A name for a value.  In 'awk', variables may be either scalars or
      arrays.
 Whitespace
      A sequence of space, TAB, or newline characters occurring inside an
      input record or a string.