djvuxml(1)                    DjVuLibre XML Tools                   djvuxml(1)




NAME

       djvutoxml, djvuxmlparser - DjVuLibre XML Tools.



SYNOPSIS

       djvutoxml [options] inputdjvufile [outputxmlfile]
       djvuxmlparser [ -o djvufile ] inputxmlfile




DESCRIPTION

       The DjVuLibre XML Tools provide for editing the metadata, hyperlinks,
       and hidden text associated with DjVu files.  Unlike djvused(1), the
       DjVuLibre XML Tools rely on XML and can take advantage of XML editors
       and validators.



DJVUTOXML

       Program djvutoxml creates an XML file outputxmlfile containing a
       reference to the original DjVu document inputdjvufile as well as tags
       describing the metadata, hyperlinks, and hidden text associated with
       the DjVu file.

       The following options are supported:

       --page pagenum
              Select  a  page  in a multi-page document.  Without this option,
              djvutoxml outputs the XML corresponding to all pages of the doc-
              ument.

       --with-text
              Specifies that the HIDDENTEXT element for each page should be
              included in the output.  If specified without the --with-anno
              flag, then the --without-anno flag is implied.  If none of the
              --with-text, --without-text, --with-anno, or --without-anno
              flags are specified, then the --with-text and --with-anno flags
              are implied.

       --without-text
              Specifies that the HIDDENTEXT element for each page should not
              be included in the output.  If specified without the
              --without-anno flag, then the --with-anno flag is implied.

       --with-anno
              Specifies that the area MAP element for each page should be
              included in the output.  If specified without the --with-text
              flag, then the --without-text flag is implied.

       --without-anno
              Specifies that the area MAP element for each page should not be
              included in the output.  If specified without the
              --without-text flag, then the --with-text flag is implied.
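
       The implication rules for the four flags can be summarized in a short
       Python sketch.  It is not taken from the djvutoxml sources: the
       function name resolve_output_flags is hypothetical, and rejecting a
       contradictory pair such as --with-text together with --without-text
       is an assumption, since this page does not say how the program
       handles that case.

```python
KNOWN_FLAGS = {"--with-text", "--without-text", "--with-anno", "--without-anno"}


def resolve_output_flags(flags):
    """Return (include_text, include_anno) for a collection of flags.

    Hypothetical helper that models the rules documented on this page.
    """
    flags = set(flags)
    unknown = flags - KNOWN_FLAGS
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")

    # Assumption: a contradictory pair is treated as an error here.
    for kind in ("text", "anno"):
        if {f"--with-{kind}", f"--without-{kind}"} <= flags:
            raise ValueError(f"conflicting flags for {kind}")

    # No flags at all: --with-text and --with-anno are both implied.
    if not flags:
        return True, True

    def explicit(kind):
        if f"--with-{kind}" in flags:
            return True
        if f"--without-{kind}" in flags:
            return False
        return None  # nothing said about this kind

    text, anno = explicit("text"), explicit("anno")
    # A flag given for only one kind implies the opposite for the other.
    if text is None:
        text = not anno
    if anno is None:
        anno = not text
    return text, anno
```

       For example, resolve_output_flags(["--with-text"]) yields
       (True, False): hidden text is written and the MAP elements are
       omitted.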




DJVUXMLPARSER

       Files produced by djvutoxml can then be modified using either a text
       editor or an XML editor.  Program djvuxmlparser parses the XML file
       inputxmlfile in order to modify the metadata of the corresponding DjVu
       file.

       -o djvufile
              By default, the target DjVu file is the file referenced by the
              OBJECT element of the XML file.  This option overrides the
              filename specified in the OBJECT element.
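
       A typical round trip is to export with djvutoxml, edit the XML, and
       apply it with djvuxmlparser.  The Python sketch below only assembles
       the two command lines from the SYNOPSIS; the helper names and the
       file names book.djvu, book.xml, and copy.djvu are hypothetical.  The
       run_if_available helper executes a command only when the program is
       found on PATH.

```python
import shutil
import subprocess


def djvutoxml_argv(djvu_file, xml_file=None, page=None):
    """Build: djvutoxml [options] inputdjvufile [outputxmlfile]."""
    argv = ["djvutoxml"]
    if page is not None:
        argv += ["--page", str(page)]  # restrict the export to one page
    argv.append(djvu_file)
    if xml_file is not None:
        argv.append(xml_file)
    return argv


def djvuxmlparser_argv(xml_file, target_djvu=None):
    """Build: djvuxmlparser [ -o djvufile ] inputxmlfile."""
    argv = ["djvuxmlparser"]
    if target_djvu is not None:
        argv += ["-o", target_djvu]  # override the file named in OBJECT
    argv.append(xml_file)
    return argv


def run_if_available(argv):
    """Run argv if its program is on PATH; return True if it was run."""
    if shutil.which(argv[0]) is None:
        return False
    subprocess.run(argv, check=True)
    return True


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(djvutoxml_argv("book.djvu", "book.xml", page=3))
    print(djvuxmlparser_argv("book.xml", target_djvu="copy.djvu"))
```

       Passing the commands as argument lists, rather than as a single shell
       string, avoids quoting problems with file names that contain spaces.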



DJVUXML DOCUMENT TYPE DEFINITION

       The document type definition file (DTD)

         /opt/local/share/djvu/pubtext/DjVuXML-s.dtd

       defines the input and output of the DjVu XML tools.

       The DjVuXML-s DTD is a simplification of the HTML DTD:

         http://www.w3.org/TR/1998/REC-html40-19980424/sgml/dtd.html

       with a few DjVu-specific attributes added.  Each of the specified
       pages of a DjVu document is represented as an OBJECT element within
       the BODY element of the XML file.  Each OBJECT element may contain
       multiple PARAM elements to specify attributes like page name,
       resolution, and gamma factor.  Each OBJECT element may also contain
       one HIDDENTEXT element to specify the hidden text (usually generated
       with an OCR engine) within the DjVu page.  In addition, each OBJECT
       element may reference a single area MAP element, which contains
       multiple AREA elements to represent all the hyperlink and highlight
       areas within the DjVu document.


   PARAM Elements
       Legal PARAM elements of a DjVu OBJECT include, but are not limited to,
       PAGE for specifying the page name, GAMMA for specifying the gamma
       correction factor (normally 2.2), and DPI for specifying the page
       resolution.
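
       As an illustration, the Python sketch below reads the PARAM elements
       of each OBJECT into a dictionary.  The sample document is hand
       written, not real djvutoxml output: the DjVuXML root element, the
       name and value attributes (modeled on the HTML 4 PARAM element), the
       OBJECT attributes, and the file names are assumptions.  The function
       name page_params is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; element and attribute spellings are assumptions.
SAMPLE = """\
<DjVuXML>
  <BODY>
    <OBJECT data="example.djvu" type="image/x.djvu" width="2550" height="3300">
      <PARAM name="PAGE" value="p0001.djvu"/>
      <PARAM name="GAMMA" value="2.2"/>
      <PARAM name="DPI" value="300"/>
    </OBJECT>
  </BODY>
</DjVuXML>
"""


def page_params(xml_text):
    """Return one {name: value} dictionary per OBJECT element."""
    root = ET.fromstring(xml_text)
    pages = []
    for obj in root.iter("OBJECT"):
        # Only direct PARAM children of this OBJECT are collected.
        params = {p.get("name"): p.get("value") for p in obj.findall("PARAM")}
        pages.append(params)
    return pages


if __name__ == "__main__":
    for number, params in enumerate(page_params(SAMPLE), start=1):
        print(number, params)
```

       Values are kept as strings; a caller that needs numbers would convert
       GAMMA and DPI explicitly.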


   HIDDENTEXT Elements
       The HIDDENTEXT element consists of nested PAGECOLUMN, REGION,
       PARAGRAPH, LINE, and WORD elements.  The most deeply nested element
       specified should give the bounding coordinates of the element in
       top-down orientation.  The body of the most deeply nested element
       should contain the text.  Most DjVu documents use either LINE or WORD
       as the lowest level element, but any element is legal as the lowest
       level element.  White space is always added between WORD elements, and
       a line feed is always added between LINE elements.  Since languages
       such as Japanese do not use spaces between words, it is quite common
       for Asian OCR engines to use WORD elements for individual characters
       instead.
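
       The separator rules (a space between WORD elements, a line feed
       between LINE elements) can be sketched in Python as follows.  The
       sample is hand written: the element name PAGECOLUMN and the coords
       attribute with its values are assumptions, and the code does not
       interpret the coordinates.  The function name hidden_text is
       hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; the coords attribute and its values are assumptions.
SAMPLE = """\
<HIDDENTEXT>
  <PAGECOLUMN><REGION><PARAGRAPH>
    <LINE>
      <WORD coords="100,200,180,170">Hello</WORD>
      <WORD coords="190,200,270,170">DjVu</WORD>
    </LINE>
    <LINE>
      <WORD coords="100,250,200,220">world</WORD>
    </LINE>
  </PARAGRAPH></REGION></PAGECOLUMN>
</HIDDENTEXT>
"""


def hidden_text(xml_text):
    """Rebuild plain text from LINE and WORD elements."""
    root = ET.fromstring(xml_text)
    lines = []
    for line in root.iter("LINE"):
        words = [(w.text or "").strip() for w in line.iter("WORD")]
        if words:
            # A space is added between WORD elements.
            lines.append(" ".join(words))
        else:
            # LINE is the lowest level element: its body holds the text.
            lines.append((line.text or "").strip())
    # A line feed is added between LINE elements.
    return "\n".join(lines)


if __name__ == "__main__":
    print(hidden_text(SAMPLE))
```

       The sketch covers only documents whose lowest level element is LINE
       or WORD; text stored directly in a PARAGRAPH or higher element would
       need additional handling.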


   MAP Elements
       The body of a MAP element consists of AREA elements.  In addition to
       the attributes listed in

         http://www.w3.org/TR/1998/REC-html40-19980424/struct/objects.html#edef-AREA,

       the attributes bordertype, bordercolor, border, and highlight have been
       added to specify border type, border color, border width, and highlight
       color, respectively.  Legal values for each of these attributes are
       listed in the DjVuXML-s DTD.  In addition, the shape oval has been
       added to the legal list of shapes.  An oval uses a rectangular bounding
       box.
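
       Because an oval is described by its bounding box, its center and radii
       follow directly from the four coordinates.  The Python sketch below
       assumes that the coords attribute of an oval lists two opposite
       corners as x1,y1,x2,y2, as for the HTML 4 rect shape.  The sample MAP
       is hand written, and its attribute values are placeholders that have
       not been checked against the DTD.  The function names are
       hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; attribute values are illustrative placeholders.
SAMPLE = """\
<MAP name="p0001.djvu">
  <AREA shape="rect" coords="10,20,110,60" href="http://www.example.org/"/>
  <AREA shape="oval" coords="200,300,400,380" href="#2" highlight="#FFFF00"/>
</MAP>
"""


def oval_geometry(coords):
    """Return (center_x, center_y, radius_x, radius_y) of an oval.

    Assumes coords is "x1,y1,x2,y2", two opposite bounding-box corners.
    """
    x1, y1, x2, y2 = (int(v) for v in coords.split(","))
    left, right = sorted((x1, x2))
    low, high = sorted((y1, y2))
    return ((left + right) / 2, (low + high) / 2,
            (right - left) / 2, (high - low) / 2)


def ovals(xml_text):
    """Return the geometry of every AREA element whose shape is oval."""
    root = ET.fromstring(xml_text)
    return [oval_geometry(area.get("coords"))
            for area in root.iter("AREA")
            if area.get("shape") == "oval"]


if __name__ == "__main__":
    print(ovals(SAMPLE))
```

       Sorting each coordinate pair makes the result independent of which
       corner is listed first.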



BUGS

       Perhaps it would have been better to use CSS2 style sheets with
       standard HTML elements instead of defining the HIDDENTEXT element.



CREDITS

       The DjVu XML tools and DTD were written by Bill C. Riemers
       <docbill@sourceforge.net> and Fred Crary.



SEE ALSO

       djvu(1), djvused(1), and utf8(7).



DjVuLibre XML Tools               11/15/2002                        djvuxml(1)

djvulibre 3.5.27 - Generated Sat Mar 14 18:31:42 CDT 2015