
libidn2: Invoking idn2

 
 5 Invoking idn2
 ***************
 
 ‘idn2’ translates internationalized domain names to the IDNA2008 encoded
 format, either for lookup or registration.
 
    If strings are specified on the command line, they are used as input
 and the computed output is printed to standard output ‘stdout’.  If no
 strings are specified on the command line, the program reads data, line
 by line, from the standard input ‘stdin’, and prints the computed output
 to standard output.  What processing is performed (e.g., lookup or
 register) is indicated by options.  If any errors are encountered, the
 execution of the application is aborted.
 
    All strings are expected to be encoded in the preferred charset used
 by your locale.  Use ‘--debug’ to find out what this charset is.  On
 POSIX systems you may use the ‘LANG’ environment variable to specify a
 different locale.
 
    To process a string that starts with ‘-’, for example ‘-foo’, use
 ‘--’ to signal the end of parameters, as in ‘idn2 -r -- -foo’.
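 
    Because an error aborts execution, a list of names fed through
 standard input stops at the first name that fails.  One way around
 this, sketched here on the assumption that ‘idn2’ is on your ‘PATH’, is
 to invoke it once per name, with ‘--’ so that names starting with ‘-’
 are not taken as options.  The helper name ‘convert_each’ is
 hypothetical and not part of Libidn2.

```shell
# Hypothetical helper: read names from stdin, one per line, and run idn2
# once per name so that a failure on one name does not stop the others.
convert_each() {
    # The "|| [ -n ... ]" part keeps a final line that lacks a newline.
    while IFS= read -r name || [ -n "$name" ]; do
        # Skip empty lines.
        [ -n "$name" ] || continue
        # '--' ends option parsing, so a name such as '-foo' is safe.
        # stdin is redirected so idn2 cannot consume the remaining lines.
        if out=$(idn2 --quiet -- "$name" 2>/dev/null </dev/null); then
            printf '%s\t%s\n' "$name" "$out"
        else
            printf '%s\tERROR\n' "$name" >&2
        fi
    done
}
```

    It could be used as ‘convert_each < names.txt’, where ‘names.txt’ is
 a placeholder for a file with one name per line.  Successful
 conversions go to standard output as tab-separated pairs, and failures
 are reported on standard error.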
 
 5.1 Options
 ===========
 
 ‘idn2’ recognizes these options:
 
   -h, --help               Print help and exit
 
   -V, --version            Print version and exit
 
   -d, --decode             Decode (punycode) domain name
 
   -l, --lookup             Lookup domain name (default)
 
   -r, --register           Register label
 
   -T, --tr46t              Enable TR46 transitional processing
 
   -N, --tr46nt             Enable TR46 non-transitional processing
 
       --debug              Print debugging information
 
       --quiet              Silent operation
 
 5.2 Environment Variables
 =========================
 
 On POSIX systems the LANG environment variable can be used to override
 the system locale for the command being invoked.  The system locale may
 influence what character set is used to decode data (i.e., strings on
 the command line or data read from the standard input stream), and to
 encode data to the standard output.  If your system is set up correctly,
 however, the application will use the correct locale and character set
 automatically.  Example usage:
 
      $ LANG=en_US.UTF-8 idn2
      ...
 
 5.3 Examples
 ============
 
 Standard usage, reading input from standard input and suppressing the
 license and usage instructions:
 
      jas@latte:~$ idn2 --quiet
      räksmörgås.se
      xn--rksmrgs-5wao1o.se
      ...
 
    Reading input from the command line:
 
      jas@latte:~$ idn2 räksmörgås.se blåbærgrød.no
      xn--rksmrgs-5wao1o.se
      xn--blbrgrd-fxak7p.no
      jas@latte:~$
 
    Testing the IDNA2008 Register function:
 
      jas@latte:~$ idn2 --register fußball
      xn--fuball-cta
      jas@latte:~$
 
 5.4 Troubleshooting
 ===================
 
 Getting character data encoded right, and making sure Libidn2 uses the
 same encoding, can be difficult.  The reason for this is that most
 systems may encode character data in more than one character encoding,
 e.g., using ‘UTF-8’ together with ‘ISO-8859-1’ or ‘ISO-2022-JP’.  This
 problem is likely to continue to exist until only one character encoding
 comes out as the evolutionary winner, or (more likely, at least to some
 extent) forever.
 
    The first step to troubleshooting character encoding problems with
 Libidn2 is to use the ‘--debug’ parameter to find out which character
 set encoding ‘idn2’ believes your locale uses.
 
      jas@latte:~$ idn2 --debug --quiet ""
      Charset: UTF-8
 
      jas@latte:~$
 
    If it prints ‘ANSI_X3.4-1968’ (i.e., ‘US-ASCII’), this indicates you
 have not configured your locale properly.  To configure the locale, you
 can, for example, use ‘LANG=sv_SE.UTF-8; export LANG’ at a ‘/bin/sh’
 prompt, to set up your locale for a Swedish environment using ‘UTF-8’ as
 the encoding.
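 
    As a cross-check that does not involve ‘idn2’, the POSIX ‘locale’
 utility can report the character map of the current locale.  This is
 only a sketch: the helper name ‘show_charset’ is hypothetical, and the
 name it prints is not guaranteed to be spelled exactly as in the
 ‘Charset:’ line printed by ‘--debug’.

```shell
# Hypothetical helper: print the character map of the current locale,
# or "unknown" when the POSIX `locale` utility is not installed.
show_charset() {
    if command -v locale >/dev/null 2>&1; then
        locale charmap
    else
        echo "unknown"
    fi
}

# Show the variables that select the character set.  LC_ALL overrides
# LC_CTYPE, which in turn overrides LANG.
printf 'LC_ALL=%s LC_CTYPE=%s LANG=%s\n' "${LC_ALL-}" "${LC_CTYPE-}" "${LANG-}"
show_charset
```

    If ‘LC_ALL’ or ‘LC_CTYPE’ is set, changing ‘LANG’ alone has no
 effect on the character set, which is worth ruling out first.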
 
    Sometimes ‘idn2’ appears to be unable to translate from your system
 locale into ‘UTF-8’ (which is used internally), and you will get an
 error message like this:
 
      idn2: lookup: could not convert string to UTF-8
 
    One explanation is that you didn’t install the ‘iconv’ conversion
 tools.  ‘iconv’ is available as a standalone library in GNU Libiconv
 (<https://www.gnu.org/software/libiconv/>).  On many GNU/Linux systems,
 this library is part of the system, but you may have to install
 additional packages to be able to use it.
 
    Another explanation is that the error is correct and you are feeding
 ‘idn2’ invalid data.  This can happen inadvertently if you are not
 careful with the character set encoding you use.  For example, if your
 shell runs in an ‘ISO-8859-1’ environment, and you invoke ‘idn2’ with the
 ‘LANG’ environment variable as follows, you will feed it ‘ISO-8859-1’
 characters but force it to believe they are ‘UTF-8’.  Naturally this
 will lead to an error, unless the byte sequences happen to be valid
 ‘UTF-8’.  Note that even if you don’t get an error, the output may be
 incorrect in this situation, because ‘ISO-8859-1’ and ‘UTF-8’ do not
 in general encode the same characters as the same byte sequences.
 
      jas@latte:~$ idn2 --quiet --debug ""
      Charset: ISO-8859-1
 
      jas@latte:~$ LANG=sv_SE.UTF-8 idn2 --debug räksmörgås
      Charset: UTF-8
      input[0] = 0x72
      input[1] = 0xc3
      input[2] = 0xa4
      input[3] = 0xc3
      input[4] = 0xa4
      input[5] = 0x6b
      input[6] = 0x73
      input[7] = 0x6d
      input[8] = 0xc3
      input[9] = 0xb6
      input[10] = 0x72
      input[11] = 0x67
      input[12] = 0xc3
      input[13] = 0xa5
      input[14] = 0x73
      UCS-4 input[0] = U+0072
      UCS-4 input[1] = U+00e4
      UCS-4 input[2] = U+00e4
      UCS-4 input[3] = U+006b
      UCS-4 input[4] = U+0073
      UCS-4 input[5] = U+006d
      UCS-4 input[6] = U+00f6
      UCS-4 input[7] = U+0072
      UCS-4 input[8] = U+0067
      UCS-4 input[9] = U+00e5
      UCS-4 input[10] = U+0073
      output[0] = 0x72
      output[1] = 0xc3
      output[2] = 0xa4
      output[3] = 0xc3
      output[4] = 0xa4
      output[5] = 0x6b
      output[6] = 0x73
      output[7] = 0x6d
      output[8] = 0xc3
      output[9] = 0xb6
      output[10] = 0x72
      output[11] = 0x67
      output[12] = 0xc3
      output[13] = 0xa5
      output[14] = 0x73
      UCS-4 output[0] = U+0072
      UCS-4 output[1] = U+00e4
      UCS-4 output[2] = U+00e4
      UCS-4 output[3] = U+006b
      UCS-4 output[4] = U+0073
      UCS-4 output[5] = U+006d
      UCS-4 output[6] = U+00f6
      UCS-4 output[7] = U+0072
      UCS-4 output[8] = U+0067
      UCS-4 output[9] = U+00e5
      UCS-4 output[10] = U+0073
      xn--rksmrgs-5waap8p
      jas@latte:~$
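 
    To see why mislabeled input can fail, it helps to compare the raw
 bytes of the two encodings.  The sketch below writes ‘räksmörgås’ in
 both encodings using octal escapes and asks ‘iconv’ whether each byte
 sequence is valid ‘UTF-8’.  The helper names are hypothetical, and the
 sketch assumes that ‘od’, ‘tr’, and an ‘iconv’ command are installed.

```shell
# Hypothetical helpers; none of these names come from Libidn2.

# Print stdin as one line of lowercase hex digits.
hexbytes() {
    od -An -tx1 | tr -d ' \n'
    echo
}

# "räksmörgås" encoded as ISO-8859-1 (one byte per character).
latin1() { printf 'r\344ksm\366rg\345s'; }

# The same text encoded as UTF-8 (two bytes for each of ä, ö, å).
utf8() { printf 'r\303\244ksm\303\266rg\303\245s'; }

# Report whether stdin is a valid UTF-8 byte sequence according to iconv.
is_utf8() {
    if iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
        echo valid
    else
        echo invalid
    fi
}

latin1 | hexbytes    # 72e46b736df67267e573
utf8 | hexbytes      # 72c3a46b736dc3b67267c3a573
latin1 | is_utf8     # invalid: 0xe4 is not followed by continuation bytes
utf8 | is_utf8       # valid
```

    A program told to expect ‘UTF-8’ that receives the ‘ISO-8859-1’
 bytes sees ‘0xe4’ followed by ‘0x6b’, which is not a valid ‘UTF-8’
 sequence, so the conversion is rejected.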
 
    The moral here is to forget about ‘LANG’ (instead, configure
 your system locale properly) unless you know what you are doing, and if
 you want to use ‘LANG’, do it carefully and after verifying with
 ‘--debug’ that you get the desired results.
 