manpagez: man pages & more
man pt_astree(n)
Home | html | info | man
pt::ast(n)                       Parser Tools                       pt::ast(n)



______________________________________________________________________________


NAME

       pt::ast - Abstract Syntax Tree Serialization


SYNOPSIS

       package require Tcl  8.5

       package require pt::ast  ?1.1?

       ::pt::ast verify serial ?canonvar?

       ::pt::ast verify-as-canonical serial

       ::pt::ast canonicalize serial

       ::pt::ast print serial

       ::pt::ast bottomup cmdprefix ast

       cmdprefix ast

       ::pt::ast topdown cmdprefix pe

       ::pt::ast equal seriala serialb

       ::pt::ast terminal loc

       ::pt::ast nonterminal s start end ?child...?

_________________________________________________________________


DESCRIPTION

       Are  you  lost ?  Do you have trouble understanding this document ?  In
       that case please read the overview  provided  by  the  Introduction  to
       Parser  Tools.  This document is the entrypoint to the whole system the
       current package is a part of.

       This package provides commands  to  work  with  the  serializations  of
       abstract  syntax trees as managed by the Parser Tools, and specified in
       section AST serialization format.

       This is a supporting package in the Core Layer of Parser Tools.

       IMAGE: arch_core_support



API

       ::pt::ast verify serial ?canonvar?
              This command verifies that the content  of  serial  is  a  valid
              serialization of an abstract syntax tree and will throw an error
              if that is not the case. The result of the command is the  empty
              string.

              If  the  argument canonvar is specified it is interpreted as the
              name of a variable in the calling context. This variable will be
              written  to  if and only if serial is a valid regular serializa-
              tion. Its value will be a boolean, with True indicating that the
              serialization  is not only valid, but also canonical. False will
              be written for a valid, but non-canonical serialization.

              For the specification of  serializations  see  the  section  AST
              serialization format.

       ::pt::ast verify-as-canonical serial
              This  command  verifies  that  the  content of serial is a valid
              canonical serialization of an  abstract  syntax  tree  and  will
              throw  an  error if that is not the case. The result of the com-
              mand is the empty string.

              For the specification of canonical serializations see  the  sec-
              tion AST serialization format.

       ::pt::ast canonicalize serial
              This command assumes that the content of serial is a valid regu-
              lar serialization of an abstract syntax and will throw an  error
              if that is not the case.

              It  will then convert the input into the canonical serialization
              of the contained tree and return it as its result. If the  input
              is already canonical it will be returned unchanged.

              For  the  specification  of regular and canonical serializations
              see the section AST serialization format.

       ::pt::ast print serial
              This command assumes that the argument serial contains  a  valid
              serialization  of  an  abstract syntax tree and returns a string
              containing that tree in a human readable form.

              The exact format of this form is not  specified  and  cannot  be
              relied on for parsing or other machine-based activities.

              For  the  specification  of  serializations  see the section AST
              serialization format.

       ::pt::ast bottomup cmdprefix ast
              This command walks the abstract syntax tree ast from the  bottom
              up  to  the root, invoking the command prefix cmdprefix for each
              node. This implies that the children of a  node  N  are  handled
              before N.

              The command prefix has the signature

              cmdprefix ast
                     I.e.  it  is  invoked  with the ast node the walk is cur-
                     rently at.

                     The result returned by the command prefix replaces ast in
                     the  node  it was a child of, allowing transformations of
                     the tree.

                     This also means that for all inner node the  contents  of
                     the children elements are the results of the command pre-
                     fix invoked for the children of this node.

       ::pt::ast topdown cmdprefix pe
              This command walks the abstract syntax tree ast  from  the  root
              down  to  the  leaves, invoking the command prefix cmdprefix for
              each node. This implies that the children of a node N  are  han-
              dled after N.

              The  command  prefix has the same signature as for bottomup, see
              above.

              The result returned by the command prefix is ignored.

       ::pt::ast equal seriala serialb
              This command tests the two sbstract  syntax  trees  seriala  and
              serialb  for structural equality. The result of the command is a
              boolean value. It will be set to true if the trees  are  identi-
              cal, and false otherwise.

              String  equality  is  usable  only if we can assume that the two
              trees are pure Tcl lists.

       ::pt::ast terminal loc
              This command command constructs the  ast  for  a  terminal  node
              refering to the position loc in the input, and returns it as the
              result of the command.

       ::pt::ast nonterminal s start end ?child...?
              This command command constructs the ast for a  nonterminal  node
              refering  to  the symbol s covering the range of positions start
              to end in the input, and the set of child nodes child ...,  from
              left  right.  The  latter  may be empty. The constructed node is
              returned as the result of the command.



AST SERIALIZATION FORMAT

       Here we specify the format  used  by  the  Parser  Tools  to  serialize
       Abstract Syntax Trees (ASTs) as immutable values for transport, compar-
       ison, etc.

       Each node in an AST represents a nonterminal symbol of a  grammar,  and
       the  range of tokens/characters in the input covered by it. ASTs do not
       contain terminal symbols, i.e. tokens/characters. These can  be  recov-
       ered from the input given a symbol's location.

       We  distinguish  between regular and canonical serializations.  While a
       tree may have more than one regular serialization only exactly  one  of
       them will be canonical.

       Regular serialization

              [1]    The  serialization of any AST is the serialization of its
                     root node.

              [2]    The serialization of any node is a Tcl list containing at
                     least three elements.

                     [1]    The  first  element is the name of the nonterminal
                            symbol stored in the node.

                     [2]    The second and third element are the locations  of
                            the  first  and last token in the token stream the
                            node represents (covers).

                            [1]    Locations  are  provided  as   non-negative
                                   integer  offsets  from the beginning of the
                                   token stream, with the first token found in
                                   the stream located at offset 0 (zero).

                            [2]    The  end  location  has  to  be equal to or
                                   larger than the start location.

                     [3]    All elements after the first three  represent  the
                            children  of the node, which are themselves nodes.
                            This means that the serializations of nodes  with-
                            out  children, i.e. leaf nodes, have exactly three
                            elements.  The children are  stored  in  the  list
                            with  the  leftmost child first, and the rightmost
                            child last.

       Canonical serialization
              The canonical serialization of an abstract syntax tree  has  the
              format  as specified in the previous item, and then additionally
              satisfies the constraints below, which make it unique among  all
              the possible serializations of this tree.

              [1]    The  string  representation of the value is the canonical
                     representation of a pure Tcl list. I.e. it does not  con-
                     tain superfluous whitespace.



   EXAMPLE
       Assuming the parsing expression grammar below


       PEG calculator (Expression)
           Digit      <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9'   ;
           Sign       <- '-' / '+'                       ;
           Number     <- Sign? Digit+                         ;
           Expression <- '(' Expression ')' / (Factor (MulOp Factor)*)  ;
           MulOp      <- '*' / '/'                       ;
           Factor     <- Term (AddOp Term)*                   ;
           AddOp      <- '+'/'-'                         ;
           Term       <- Number                     ;
       END;


       and the input string
        120+5
       then a parser should deliver the abstract syntax tree below (except for
       whitespace)


       set ast {Expression 0 4
           {Factor 0 4
               {Term 0 2
                   {Number 0 2
                       {Digit 0 0}
                       {Digit 1 1}
                       {Digit 2 2}
                   }
               }
               {AddOp 3 3}
               {Term 4 4
                   {Number 4 4
                       {Digit 4 4}
                   }
               }
           }
       }


       Or, more graphical

       IMAGE: expr_ast



BUGS, IDEAS, FEEDBACK

       This document, and the package it describes, will  undoubtedly  contain
       bugs  and other problems.  Please report such in the category pt of the
       Tcllib  SF  Trackers  [http://sourceforge.net/tracker/?group_id=12883].
       Please  also  report any ideas for enhancements you may have for either
       package and/or documentation.


KEYWORDS

       EBNF, LL(k), PEG, TDPL, context-free  languages,  expression,  grammar,
       matching,  parser, parsing expression, parsing expression grammar, push
       down automaton, recursive descent, state, top-down  parsing  languages,
       transducer


CATEGORY

       Parsing and Grammars


COPYRIGHT

       Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>




pt                                    1.1                           pt::ast(n)

Mac OS X 10.8 - Generated Tue Sep 11 05:46:08 CDT 2012
© manpagez.com 2000-2025
Individual documents may contain additional copyright information.