File: gawk.info,  Node: General Data Types,  Next: Memory Allocation Functions,  Prev: Extension API Functions Introduction,  Up: Extension API Description

17.4.2 General-Purpose Data Types
---------------------------------

     I have a true love/hate relationship with unions.
                          -- _Arnold Robbins_

     That's the thing about unions: the compiler will arrange things so
     they can accommodate both love and hate.
                            -- _Chet Ramey_

   The extension API defines a number of simple types and structures for
general-purpose use.  Additional, more specialized, data structures are
introduced in subsequent minor nodes, together with the functions that
use them.

   The general-purpose types and structures are as follows:

'typedef void *awk_ext_id_t;'
     A value of this type is received from 'gawk' when an extension is
     loaded.  That value must then be passed back to 'gawk' as the first
     parameter of each API function.

'#define awk_const ...'
     This macro expands to 'const' when compiling an extension, and to
     nothing when compiling 'gawk' itself.  This makes certain fields in
     the API data structures unwritable from extension code, while
     allowing 'gawk' to use them as it needs to.

'typedef enum awk_bool {'
'    awk_false = 0,'
'    awk_true'
'} awk_bool_t;'
     A simple Boolean type.

'typedef struct awk_string {'
'    char *str;      /* data */'
'    size_t len;     /* length thereof, in chars */'
'} awk_string_t;'
     This represents a mutable string.  'gawk' owns the memory pointed
     to if it supplied the value.  Otherwise, it takes ownership of the
     memory pointed to.  _Such memory must come from calling one of the
     'gawk_malloc()', 'gawk_calloc()', or 'gawk_realloc()' functions!_

     As mentioned earlier, strings are maintained using the current
     multibyte encoding.

'typedef enum {'
'    AWK_UNDEFINED,'
'    AWK_NUMBER,'
'    AWK_STRING,'
'    AWK_REGEX,'
'    AWK_STRNUM,'
'    AWK_ARRAY,'
'    AWK_SCALAR,         /* opaque access to a variable */'
'    AWK_VALUE_COOKIE,   /* for updating a previously created value */'
'    AWK_BOOL'
'} awk_valtype_t;'
     This 'enum' indicates the type of a value.  It is used in the
     following 'struct'.

'typedef struct awk_value {'
'    awk_valtype_t val_type;'
'    union {'
'        awk_string_t       s;'
'        awk_number_t       n;'
'        awk_array_t        a;'
'        awk_scalar_t       scl;'
'        awk_value_cookie_t vc;'
'        awk_bool_t         b;'
'    } u;'
'} awk_value_t;'
     An "'awk' value."  The 'val_type' member indicates what kind of
     value the 'union' holds, and each member is of the appropriate
     type.

'#define str_value      u.s'
'#define strnum_value   str_value'
'#define regex_value    str_value'
'#define num_value      u.n.d'
'#define num_type       u.n.type'
'#define num_ptr        u.n.ptr'
'#define array_cookie   u.a'
'#define scalar_cookie  u.scl'
'#define value_cookie   u.vc'
'#define bool_value     u.b'
     Using these macros makes accessing the fields of the 'awk_value_t'
     more readable.

'enum AWK_NUMBER_TYPE {'
'    AWK_NUMBER_TYPE_DOUBLE,'
'    AWK_NUMBER_TYPE_MPFR,'
'    AWK_NUMBER_TYPE_MPZ'
'};'
     This 'enum' is used in the following structure for defining the
     type of numeric value that is being worked with.  It is declared at
     the top level of the file so that it works correctly for C++ as
     well as for C.

'typedef struct awk_number {'
'    double d;'
'    enum AWK_NUMBER_TYPE type;'
'    void *ptr;'
'} awk_number_t;'
     This represents a numeric value.  Internally, 'gawk' stores every
     number as either a C 'double', a GMP integer, or an MPFR
     arbitrary-precision floating-point value.  In order to allow
     extensions to also support GMP and MPFR values, numeric values are
     passed in this structure.

     The double-precision 'd' element is always populated in data
     received from 'gawk'.  In addition, by examining the 'type' member,
     an extension can determine whether the 'ptr' member points to a GMP
     integer (type 'mpz_ptr') or to an MPFR floating-point value (type
     'mpfr_ptr'), and cast it appropriately.

          CAUTION: Any MPFR or MPZ values that you create and pass to
          'gawk' to save are _copied_.  This means you are responsible
          for releasing the storage once you're done with it.  See the
          sample 'intdiv' extension for some example code.

'typedef void *awk_scalar_t;'
     Scalars can be represented as an opaque type.  These values are
     obtained from 'gawk' and then passed back into it.  This is
     discussed in a general fashion in the text following this list, and
     in more detail in *note Symbol table by cookie::.

'typedef void *awk_value_cookie_t;'
     A "value cookie" is an opaque type representing a cached value.
     This is also discussed in a general fashion in the text following
     this list, and in more detail in *note Cached values::.
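
   As a rough illustration of how the 'type' member of 'awk_number_t'
guides the use of 'ptr', consider the following sketch.  The type
declarations are copied from the list above so that the fragment stands
alone; a real extension would include 'gawkapi.h' instead.  The helper
'number_uses_ptr()' is a hypothetical name, not part of the API.

```c
#include <assert.h>
#include <stddef.h>

/* Local copies of the declarations shown above (assumption: a real
   extension gets these from gawkapi.h). */
enum AWK_NUMBER_TYPE {
    AWK_NUMBER_TYPE_DOUBLE,
    AWK_NUMBER_TYPE_MPFR,
    AWK_NUMBER_TYPE_MPZ
};

typedef struct awk_number {
    double d;
    enum AWK_NUMBER_TYPE type;
    void *ptr;
} awk_number_t;

/* Hypothetical helper: return nonzero if 'ptr' refers to a GMP or
   MPFR value, zero if only the 'd' member is meaningful. */
static int
number_uses_ptr(const awk_number_t *num)
{
    assert(num != NULL);
    return num->type == AWK_NUMBER_TYPE_MPFR
        || num->type == AWK_NUMBER_TYPE_MPZ;
}
```

   When the helper returns zero, only 'd' should be consulted.  When it
returns nonzero, 'ptr' may be cast according to 'type'.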

   Scalar values in 'awk' are numbers, strings, strnums, or typed
regexps.  The 'awk_value_t' struct represents values.  The 'val_type'
member indicates what is in the 'union'.
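
   The role of 'val_type' as a tag for the 'union' might be sketched as
follows.  The types are simplified local stand-ins for the real
declarations (only a subset of the 'enum', and a plain 'double' in place
of 'awk_number_t'), and 'get_text()' is a hypothetical helper, not an
API function.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; a real extension would include gawkapi.h. */
typedef enum {
    AWK_UNDEFINED,
    AWK_NUMBER,
    AWK_STRING,
    AWK_REGEX,
    AWK_STRNUM
} awk_valtype_t;               /* subset of the real enum */

typedef struct awk_string {
    char *str;
    size_t len;
} awk_string_t;

typedef struct awk_value {
    awk_valtype_t val_type;
    union {
        awk_string_t s;
        double       n;        /* simplified: really awk_number_t */
    } u;
} awk_value_t;

/* Hypothetical helper: if *v holds a string, strnum, or typed regexp,
   store its text in *out and return 1; otherwise return 0. */
static int
get_text(const awk_value_t *v, awk_string_t *out)
{
    assert(v != NULL && out != NULL);
    switch (v->val_type) {
    case AWK_STRING:
    case AWK_STRNUM:
    case AWK_REGEX:
        *out = v->u.s;         /* all three share the string member */
        return 1;
    default:
        return 0;              /* u.s is not the active member */
    }
}
```

   The point of the 'switch' is that the string member is read only
after the tag confirms that it is the active one.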

   Representing numbers is easy--the API uses a C 'double'.  Strings
require more work.  Because 'gawk' allows embedded NUL bytes in string
values, a string must be represented as a pair containing a data pointer
and length.  This is the 'awk_string_t' type.
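
   The following sketch shows why the length must travel with the
pointer.  The 'awk_string_t' declaration is repeated locally so that the
fragment stands alone, and 'counted_equal()' is a hypothetical helper
used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local copy of the declaration shown earlier. */
typedef struct awk_string {
    char *str;      /* data */
    size_t len;     /* length thereof, in chars */
} awk_string_t;

/* Hypothetical helper: compare two counted strings byte for byte,
   so that embedded NUL bytes take part in the comparison.
   Assumes both 'str' members are non-NULL. */
static int
counted_equal(const awk_string_t *a, const awk_string_t *b)
{
    assert(a != NULL && b != NULL);
    return a->len == b->len
        && memcmp(a->str, b->str, a->len) == 0;
}
```

   Given the five-byte values "ab\0cd" and "ab\0ce", 'strcmp()' stops
at the embedded NUL and reports them equal, whereas a comparison that
honors 'len' distinguishes them.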

   A strnum (numeric string) value is represented as a string and
consists of user input data that appears to be numeric.  When an
extension creates a strnum value, the result is a string flagged as user
input.  Subsequent parsing by 'gawk' then determines whether it looks
like a number and should be treated as a strnum, or as a regular string.

   This is useful in cases where an extension function would like to do
something comparable to the 'split()' function which sets the strnum
attribute on the array elements it creates.  For example, an extension
that implements CSV splitting would want to use this feature.  This is
also useful for a function that retrieves a data item from a database.
The PostgreSQL 'PQgetvalue()' function, for example, returns a string
that may be numeric or textual depending on the contents.

   Typed regexp values (*note Strong Regexp Constants::) are not of much
use to extension functions.  Extension functions can tell that they've
received them, and create them for scalar values.  Otherwise, they can
examine the text of the regexp through 'regex_value.str' and
'regex_value.len'.

   Identifiers (i.e., the names of global variables) can be associated
with either scalar values or with arrays.  In addition, 'gawk' provides
true arrays of arrays, where any given array element can itself be an
array.  Discussion of arrays is delayed until *note Array
Manipulation::.

   The various macros listed earlier make it easier to use the elements
of the 'union' as if they were fields in a 'struct'; this is a common
coding practice in C.  Such code is easier to write and to read, but it
remains _your_ responsibility to make sure that the 'val_type' member
correctly reflects the type of the value in the 'awk_value_t' struct.
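
   A minimal sketch of using the macros while keeping 'val_type' in
step with the 'union' appears below.  The declarations are trimmed local
copies of those listed earlier, and 'fill_number()' and 'fill_string()'
are hypothetical helpers written for this illustration; they are not
claimed to match anything in 'gawkapi.h'.

```c
#include <assert.h>
#include <stddef.h>

enum AWK_NUMBER_TYPE {
    AWK_NUMBER_TYPE_DOUBLE,
    AWK_NUMBER_TYPE_MPFR,
    AWK_NUMBER_TYPE_MPZ
};
typedef struct awk_number {
    double d;
    enum AWK_NUMBER_TYPE type;
    void *ptr;
} awk_number_t;
typedef struct awk_string { char *str; size_t len; } awk_string_t;
typedef enum { AWK_UNDEFINED, AWK_NUMBER, AWK_STRING } awk_valtype_t;
typedef struct awk_value {
    awk_valtype_t val_type;
    union {
        awk_string_t s;
        awk_number_t n;
    } u;
} awk_value_t;

/* Accessor macros, as listed earlier. */
#define str_value   u.s
#define num_value   u.n.d
#define num_type    u.n.type
#define num_ptr     u.n.ptr

/* Hypothetical helper: store a C double and set the tag to match. */
static awk_value_t *
fill_number(awk_value_t *v, double d)
{
    assert(v != NULL);
    v->val_type = AWK_NUMBER;
    v->num_value = d;
    v->num_type = AWK_NUMBER_TYPE_DOUBLE;
    v->num_ptr = NULL;
    return v;
}

/* Hypothetical helper: store a counted string and set the tag.
   In a real extension, memory handed to gawk must come from
   gawk_malloc(), gawk_calloc(), or gawk_realloc(). */
static awk_value_t *
fill_string(awk_value_t *v, char *data, size_t len)
{
    assert(v != NULL);
    v->val_type = AWK_STRING;
    v->str_value.str = data;
    v->str_value.len = len;
    return v;
}
```

   Each helper writes the tag and the matching member together, which
is one simple way to keep the two from drifting apart.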

   Conceptually, the first three members of the 'union' (number, string,
and array) are all that is needed for working with 'awk' values.
However, because the API provides routines for accessing and changing
the value of a global scalar variable only by using the variable's name,
there is a performance penalty: 'gawk' must find the variable each time
it is accessed and changed.  This turns out to be a real issue, not just
a theoretical one.

   Thus, if you know that your extension will spend considerable time
reading and/or changing the value of one or more scalar variables, you
can obtain a "scalar cookie"(1) object for that variable, and then use
the cookie for getting the variable's value or for changing the
variable's value.  The 'awk_scalar_t' type holds a scalar cookie, and
the 'scalar_cookie' macro provides access to the value of that type in
the 'awk_value_t' struct.  Given a scalar cookie, 'gawk' can directly
retrieve or modify the value, as required, without having to find it
first.
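
   The cost difference can be pictured with a toy model.  The sketch
below is _not_ how 'gawk' stores variables: the table, the 'toy_'
functions, and the comparison counter are all invented for illustration.
It only shows the general idea that a lookup by name repeats work that
an opaque handle avoids.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *awk_scalar_t;    /* opaque handle, as declared earlier */

/* Toy symbol table (assumption: purely illustrative). */
struct toy_var {
    const char *name;
    double value;
};

static struct toy_var toy_table[] = {
    { "NR", 0.0 }, { "NF", 0.0 }, { "counter", 0.0 },
};

static size_t toy_name_compares;   /* counts strcmp() calls */

/* Find a variable by name with a linear search. */
static struct toy_var *
toy_find(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(toy_table) / sizeof(toy_table[0]); i++) {
        toy_name_compares++;
        if (strcmp(toy_table[i].name, name) == 0)
            return &toy_table[i];
    }
    return NULL;
}

/* Update by name: searches on every call.  Returns 1 on success. */
static int
toy_update_by_name(const char *name, double d)
{
    struct toy_var *var = toy_find(name);

    if (var == NULL)
        return 0;
    var->value = d;
    return 1;
}

/* Obtain a cookie once; NULL means no such variable. */
static awk_scalar_t
toy_get_cookie(const char *name)
{
    return toy_find(name);
}

/* Update through the cookie: no search at all. */
static void
toy_update_by_cookie(awk_scalar_t cookie, double d)
{
    assert(cookie != NULL);
    ((struct toy_var *) cookie)->value = d;
}
```

   In this model, ten updates of 'counter' by name cost thirty name
comparisons, while obtaining a cookie once and then updating ten times
costs three.  The functions that actually work with scalar cookies are
described in *note Symbol table by cookie::.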

   The 'awk_value_cookie_t' type and 'value_cookie' macro are similar.
If you know that you wish to use the same numeric or string _value_ for
one or more variables, you can create the value once, retaining a "value
cookie" for it, and then pass in that value cookie whenever you wish to
set the value of a variable.  This saves storage space within the
running 'gawk' process and reduces the time needed to create the value.

   ---------- Footnotes ----------

   (1) See the "cookie" entry in the Jargon file
(http://catb.org/jargon/html/C/cookie.html) for a definition of
"cookie", and the "magic cookie" entry in the Jargon file
(http://catb.org/jargon/html/M/magic-cookie.html) for a nice example.
See also the entry for "Cookie" in the *note Glossary::.
