
File: gawk.info,  Node: Performance bugs,  Next: Asking for help,  Prev: Usenet,  Up: Bugs

B.4.4 What To Do If You Think There Is A Performance Issue
----------------------------------------------------------

If you think that 'gawk' is too slow at doing a particular task, you
should investigate before sending in a bug report.  Here are the steps
to follow:

  1. Run 'gawk' with the '--profile' option (*note Options::) to see
     what your program is doing.  It may be that you have written it in
     an inefficient manner.  For example, you may be doing something for
     every record that could be done just once per file.  (Use a
     'BEGINFILE' rule; *note BEGINFILE/ENDFILE::.)  Or you may be doing
     something for every file that only needs to be done once per run of
     the program.  (Use a 'BEGIN' rule; *note BEGIN/END::.)

  2. If profiling at the 'awk' level doesn't help, then you will need to
     compile 'gawk' itself for profiling at the C language level.

     To do that, start with the latest released version of 'gawk'.
     Unpack the source code in a new directory, and configure it:

          $ tar -xpzvf gawk-X.Y.Z.tar.gz
          -| ...                                Output omitted
          $ cd gawk-X.Y.Z
          $ ./configure
          -| ...                                Output omitted

  3. Edit the files 'Makefile' and 'support/Makefile'.  Change every
     instance of '-O2' or '-O' to '-pg'.  This causes 'gawk' to be
     compiled for profiling.

  4. Compile the program by running the 'make' command:

          $ make
          -| ...                                Output omitted

  5. Run the freshly compiled 'gawk' on a _real_ program, using _real_
     data.  Using an artificial program to try to time one particular
     feature of 'gawk' is useless; real 'awk' programs generally spend
     most of their time doing I/O, not computing.  If you want to prove
     that something is slow, it _must_ be done using a real program and
     real data.

     Use a data file that is large enough for the statistical profiling
     to measure where 'gawk' spends its time.  It should be at least 100
     megabytes in size.

          $ ./gawk -f realprogram.awk realdata > /dev/null

  6. When done, you should have a file in the current directory named
     'gmon.out'.  Run the command 'gprof gawk gmon.out > gprof.out'.

  7. Submit a bug report explaining what you think is slow.  Include the
     'gprof.out' file with it.

     Preferably, you should also submit the program and the data, or
     else indicate where to get the data if the file is large.

  8. If you have not submitted your program and data, be prepared to
     apply patches and rerun the profiling in order to see if the
     patches were effective.
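
   As an illustration of step 1, the following sketch compares a rule
that repeats work for every record with one that moves the same work
into a 'BEGINFILE' rule, running both under '--profile'.  The file
names ('data1.txt', 'per_record.awk', and so on) and the tiny inputs
are assumptions made up for this example; they are not part of 'gawk'.

```shell
#!/bin/sh
# Sketch only: every file name below is hypothetical.
# Skip quietly when gawk is not installed.
command -v gawk >/dev/null 2>&1 || { echo "gawk not found; skipping" >&2; exit 0; }

workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Tiny inputs, for illustration only.
printf 'a\nb\nc\n' > data1.txt
printf 'd\ne\n'    > data2.txt

# Version 1: the label is recomputed for every record.
cat > per_record.awk <<'EOF'
{
    label = FILENAME
    sub(/\.txt$/, "", label)
    n++
}
ENDFILE { print label, n; n = 0 }
EOF

# Version 2: the label is computed once per file, in BEGINFILE.
cat > per_file.awk <<'EOF'
BEGINFILE {
    label = FILENAME
    sub(/\.txt$/, "", label)
}
{ n++ }
ENDFILE { print label, n; n = 0 }
EOF

# --profile=FILE writes the annotated program, with execution counts, to FILE.
out_record=$(gawk --profile=per_record.prof -f per_record.awk data1.txt data2.txt)
out_file=$(gawk --profile=per_file.prof -f per_file.awk data1.txt data2.txt)

printf '%s\n' "$out_record"
```

   Both versions should print the same per-file record counts.  The
difference shows up in the two '.prof' files: the execution count next
to the label assignment should match the number of records in the
first profile and the number of files in the second.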

   If you are unable or unwilling to follow the steps listed above, then
you will just have to live with 'gawk' as it is.
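
   The edit described in step 3 can be scripted instead of done by
hand.  The following is a minimal sketch, assuming the optimization
flags appear in the makefiles as standalone words such as '-O2' or
'-O'.  The function names 'to_profiling_flags' and 'patch_makefile' are
hypothetical helpers, not part of the 'gawk' distribution, and the
demonstration runs on a temporary file rather than a real 'Makefile'.

```shell
#!/bin/sh
# Sketch only: the helper names are hypothetical.

# Filter: rewrite standalone -O and -O2 flags to -pg.
# The loop repeats the substitution so that adjacent flags are all caught.
# Flags such as -O3 or -Ofast are deliberately left alone.
to_profiling_flags() {
    sed -E -e ':again' \
        -e 's/(^|[[:space:]=])-O2?([[:space:]]|$)/\1-pg\2/' \
        -e 't again'
}

# Apply the filter in place to each file named on the command line.
patch_makefile() {
    for f in "$@"; do
        to_profiling_flags < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}

# Demonstration on a throwaway file standing in for a generated Makefile.
demo=$(mktemp) || exit 1
printf 'CFLAGS = -g -O2\nLDFLAGS = -O\nOTHER = -O3 -Ofast\n' > "$demo"
patch_makefile "$demo"
cat "$demo"
```

   In a configured source tree, the same helper would be run as
'patch_makefile Makefile support/Makefile' before step 4.  Review the
result before running 'make', since a makefile that spells its flags
differently would need a different pattern.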
