File: gawk.info,  Node: Controlling Scanning,  Prev: Scanning an Array,  Up: Array Basics

8.1.6 Using Predefined Array Scanning Orders with 'gawk'
--------------------------------------------------------

This node describes a feature that is specific to 'gawk'.

   By default, when a 'for' loop traverses an array, the order is
undefined, meaning that the 'awk' implementation determines the order in
which the array is traversed.  This order is usually based on the
internal implementation of arrays and will vary from one version of
'awk' to the next.

   Often, though, you may wish to do something simple, such as "traverse
the array by comparing the indices in ascending order," or "traverse the
array by comparing the values in descending order."  'gawk' provides two
mechanisms that give you this control:

   * Set 'PROCINFO["sorted_in"]' to one of a set of predefined values.
     We describe this now.

   * Set 'PROCINFO["sorted_in"]' to the name of a user-defined function
     to use for comparison of array elements.  This advanced feature is
     described later in *note Array Sorting::.

   The following special values for 'PROCINFO["sorted_in"]' are
available:

'"@unsorted"'
     Array elements are processed in arbitrary order, which is the
     default 'awk' behavior.

'"@ind_str_asc"'
     Order by indices in ascending order compared as strings; this is
     the most basic sort.  (Internally, array indices are always
     strings, so with 'a[2*5] = 1' the index is '"10"' rather than
     numeric 10.)

'"@ind_num_asc"'
     Order by indices in ascending order but force them to be treated as
     numbers in the process.  Any index with a non-numeric value will
     end up positioned as if it were zero.

'"@val_type_asc"'
     Order by element values in ascending order (rather than by
     indices).  Ordering is by the type assigned to the element (*note
     Typing and Comparison::).  All numeric values come before all
     string values, which in turn come before all subarrays.  (Subarrays
     have not been described yet; *note Arrays of Arrays::.)

     If you choose to use this feature in traversing 'FUNCTAB' (*note
     Auto-set::), then the order is built-in functions first (*note
     Built-in::), then user-defined functions (*note User-defined::)
     next, and finally functions loaded from an extension (*note Dynamic
     Extensions::).

'"@val_str_asc"'
     Order by element values in ascending order (rather than by
     indices).  Scalar values are compared as strings.  If the string
     values are identical, the index string values are compared instead.
     When comparing non-scalar values, '"@val_type_asc"' sort ordering
     is used, so subarrays, if present, come out last.

'"@val_num_asc"'
     Order by element values in ascending order (rather than by
     indices).  Scalar values are compared as numbers.  Non-scalar
     values are compared using '"@val_type_asc"' sort ordering, so
     subarrays, if present, come out last.  When numeric values are
     equal, the string values are used to provide an ordering: this
     guarantees consistent results across different versions of the C
     'qsort()' function,(1) which 'gawk' uses internally to perform the
     sorting.  If the string values are also identical, the index string
     values are compared instead.

'"@ind_str_desc"'
     Like '"@ind_str_asc"', but the string indices are ordered from high
     to low.

'"@ind_num_desc"'
     Like '"@ind_num_asc"', but the numeric indices are ordered from
     high to low.

'"@val_type_desc"'
     Like '"@val_type_asc"', but the element values, based on type, are
     ordered from high to low.  Subarrays, if present, come out first.

'"@val_str_desc"'
     Like '"@val_str_asc"', but the element values, treated as strings,
     are ordered from high to low.  If the string values are identical,
     the index string values are compared instead.  When comparing
     non-scalar values, '"@val_type_desc"' sort ordering is used, so
     subarrays, if present, come out first.

'"@val_num_desc"'
     Like '"@val_num_asc"', but the element values, treated as numbers,
     are ordered from high to low.  If the numeric values are equal, the
     string values are compared instead.  If they are also identical,
     the index string values are compared instead.  Non-scalar values
     are compared using '"@val_type_desc"' sort ordering, so subarrays,
     if present, come out first.

   The array traversal order is determined before the 'for' loop starts
to run.  Changing 'PROCINFO["sorted_in"]' in the loop body does not
affect the loop.  The following example compares the default order
(which may differ on your system) with '"@ind_str_asc"':

     $ gawk '
     > BEGIN {
     >    a[4] = 4
     >    a[3] = 3
     >    for (i in a)
     >        print i, a[i]
     > }'
     -| 4 4
     -| 3 3
     $ gawk '
     > BEGIN {
     >    PROCINFO["sorted_in"] = "@ind_str_asc"
     >    a[4] = 4
     >    a[3] = 3
     >    for (i in a)
     >        print i, a[i]
     > }'
     -| 3 3
     -| 4 4

   When sorting an array by element values, if a value happens to be a
subarray then it is considered to be greater than any string or numeric
value, regardless of what the subarray itself contains, and all
subarrays are treated as being equal to each other.  Their order
relative to each other is determined by their index strings.

   Here are some additional things to bear in mind about sorted array
traversal:

   * The value of 'PROCINFO["sorted_in"]' is global.  That is, it
     affects all array traversal 'for' loops.  If you need to change it
     within your own code, you should see if it's defined and save and
     restore the value:

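          ...
          # Remember the current setting, if there is one
          if ("sorted_in" in PROCINFO)
              save_sorted = PROCINFO["sorted_in"]

          PROCINFO["sorted_in"] = "@val_str_desc" # or whatever
          ...
          if (save_sorted)
              PROCINFO["sorted_in"] = save_sorted
          else
              delete PROCINFO["sorted_in"]  # was unset or null: default order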

   * As already mentioned, the default array traversal order is
     represented by '"@unsorted"'.  You can also get the default
     behavior by assigning the null string to 'PROCINFO["sorted_in"]' or
     by just deleting the '"sorted_in"' element from the 'PROCINFO'
     array with the 'delete' statement.  (The 'delete' statement hasn't
     been described yet; *note Delete::.)

   In addition, 'gawk' provides built-in functions for sorting arrays;
see *note Array Sorting Functions::.

   ---------- Footnotes ----------

   (1) When two elements compare as equal, the C 'qsort()' function does
not guarantee that they will maintain their original relative order
after sorting.  Using the string value to provide a unique ordering when
the numeric values are equal ensures that 'gawk' behaves consistently
across different environments.
