
22.4.4 HTML Cross-reference 8-bit Character Expansion

Usually, characters other than plain 7-bit ASCII are transformed into the corresponding Unicode code point(s) in Normalization Form C, which uses precomposed characters where available. (This is the normalization form recommended by the W3C and other bodies.) This holds when that code point is 0xffff or less, as it almost always is.

These code points are then further transformed by the rules above into the string ‘_xxxx’, where xxxx is the code point written as four hex digits.
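The two steps above (NFC normalization, then ‘_xxxx’ escaping) can be sketched in Python. This is an illustrative sketch, not makeinfo's implementation: the function name escape_non_ascii is hypothetical, ASCII characters are passed through unchanged rather than run through the earlier rules, and code points above 0xffff are rejected.

```python
import unicodedata


def escape_non_ascii(text: str) -> str:
    """Escape non-ASCII characters as '_xxxx' after NFC normalization.

    Hypothetical helper for illustration only. ASCII characters are
    copied unchanged here; the rules for spaces and punctuation from
    the earlier sections are not implemented.
    """
    out = []
    for ch in unicodedata.normalize("NFC", text):
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)            # plain 7-bit ASCII: left alone in this sketch
        elif cp <= 0xFFFF:
            out.append(f"_{cp:04x}")  # four lowercase hex digits
        else:
            raise ValueError("code point above 0xffff is not handled in this sketch")
    return "".join(out)


# 'e' followed by a combining acute accent composes to U+00E9 under NFC.
print(escape_non_ascii("e\u0301"))  # _00e9
print(escape_non_ascii("\u2605"))   # _2605
```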

For example, combining this rule and the previous section:

 
@node @b{A} @TeX{} @u{B} @point{}@enddots{}
⇒ A-TeX-B_0306-_2605_002e_002e_002e

Notice: 1) @enddots expands to three periods, each of which in turn expands to ‘_002e’; 2) @u{B} is a ‘B’ with a breve accent, which does not exist as a precomposed Unicode character and therefore expands to ‘B_0306’ (B with combining breve).
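The point about the breve can be checked with Python's standard unicodedata module. The sketch below uses a hypothetical helper, nfc_code_points, to compare ‘B’ plus combining breve, which has no precomposed form, with ‘u’ plus combining breve, which composes to U+016D.

```python
import unicodedata


def nfc_code_points(text: str) -> list[str]:
    """Return the code points of text in NFC as four-digit hex strings.

    Hypothetical helper for illustration; not part of makeinfo.
    """
    return [f"{ord(ch):04x}" for ch in unicodedata.normalize("NFC", text)]


# No precomposed 'B with breve' exists, so the combining mark survives.
print(nfc_code_points("B\u0306"))  # ['0042', '0306']

# 'u with breve' does exist as U+016D, so NFC composes the pair.
print(nfc_code_points("u\u0306"))  # ['016d']
```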

When the Unicode code point is above 0xffff, the transformation is ‘__xxxxxx’, that is, two leading underscores followed by six hex digits. Since Unicode has declared that its highest code point is 0x10ffff, this is sufficient. (We felt it was better to define this extra escape than to always use six hex digits, since the first two would nearly always be zeros.)
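As a minimal sketch of how the two escape forms divide the code space, the following Python function picks the form from the numeric code point alone. The name escape_code_point is hypothetical, and the function assumes the input is already a non-ASCII code point in NFC.

```python
def escape_code_point(cp: int) -> str:
    """Return the escape for one code point.

    Hypothetical helper for illustration: '_xxxx' up to 0xffff,
    '__xxxxxx' above that, as described in the text.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if cp <= 0xFFFF:
        return f"_{cp:04x}"   # one underscore, four hex digits
    return f"__{cp:06x}"      # two underscores, six hex digits


print(escape_code_point(0x2605))    # _2605
print(escape_code_point(0x1F600))   # __01f600
print(escape_code_point(0x10FFFF))  # __10ffff
```

Because six hex digits cover everything up to 0xffffff, the longer form has room for the highest code point, 0x10ffff.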

This method works fine if the node name consists mostly of ASCII characters and contains only a few 8-bit ones. If the document is written in a language whose script is not based on the Latin alphabet (such as Ukrainian), it will create file names consisting entirely of ‘_xxxx’ notations, which is inconvenient.
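To make the inconvenience concrete, the short Python sketch below applies the ‘_xxxx’ escape to a five-letter Ukrainian word. The helper name escaped_name and the sample word are chosen for illustration only, and the sketch assumes every character is non-ASCII and at or below 0xffff.

```python
import unicodedata


def escaped_name(word: str) -> str:
    """Escape every character of word as '_xxxx' (illustrative only).

    Assumes each character is non-ASCII and at or below 0xffff.
    """
    return "".join(f"_{ord(ch):04x}" for ch in unicodedata.normalize("NFC", word))


# "Вступ" ("Introduction") spelled with explicit escapes: В с т у п
word = "\u0412\u0441\u0442\u0443\u043f"
print(escaped_name(word))  # _0412_0441_0442_0443_043f
```

Five letters become twenty-five characters, none of which is recognizable as the original word.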

To handle such cases, makeinfo offers the ‘--transliterate-file-names’ command-line option. This option enables transliteration of node names into ASCII characters for the purposes of file name creation and referencing. The transliteration is based on phonetic principles, which makes the produced file names easily readable.

For the definition of Unicode Normalization Form C, see Unicode report UAX#15, http://www.unicode.org/reports/tr15/. Many related documents and implementations are available elsewhere on the web.

