Copyright © 2002 Free Software Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, with the Front-Cover Texts being "A GNU Manual," and with the Back-Cover Texts as in (a) below. A copy of the license is included in the section entitled "GNU Free Documentation License."(a) The FSF's Back-Cover Text is: "You have freedom to copy and modify this GNU Manual, like GNU software. Copies published by the Free Software Foundation raise funds for GNU development."
GNU Libidn is an implementation of the Stringprep (with the Nameprep and Kerberos 5 profiles), Punycode and IDNA specifications defined by the IETF Internationalized Domain Names (IDN) working group, used for internationalized domain names. It is licensed under the GNU Lesser General Public License.
The library contains a generic Stringprep implementation (including Unicode 3.2 NFKC normalization, table mapping of characters, and Bidirectional Character handling), two Stringprep profiles; Nameprep (for IDN) and Kerberos5, as well as a Punycode and IDNA.
The Stringprep API consists of two main functions, one for converting data from the system's native representation into UTF-8, and one function to perform the Stringprep processing. Each stringprep profile has a corresponding CPP macro. Adding a new Stringprep profile for your application within the API is straightforward. The Punycode API consists of one encoding function and one decoding function. The IDNA API consists of the ToASCII and ToUnicode functions.
The library is currently used by forthcoming SASL and Kerberos libraries to process user names and passwords before they are input to cryptographic operations. It is also used by experimental code for domain name related purposes.
GNU Libidn is developed for the GNU/Linux system, but runs on over 20 Unix platforms (including Solaris, IRIX, AIX, and Tru64) and Windows.
This manual documents the library programming interface. All functions and data types provided by the library are explained.
The reader is assumed to possess basic familiarity with internationalization concepts and network programming in C or C++.
This manual can be used in several ways. If read from the beginning to the end, it gives a good introduction into the library and how it can be used in an application. Forward references are included where necessary. Later on, the manual can be used as a reference manual to get just the information needed about any particular interface of the library. Experienced programmers might want to start looking at the examples at the end of the manual (see Examples), and then only read up those parts of the interface which are unclear.
This library might have a couple of advantages over other libraries doing a similar job.
Libidn has at some point in time been tested on the following platforms.
GCC 2.95.4 and GNU Make. This is the main development platform. alphaev67-unknown-linux-gnu, alphaev6-unknown-linux-gnu, hppa64-unknown-linux-gnu, i686-pc-linux-gnu, ia64-unknown-linux-gnu.
Tru64 UNIX C compiler and Tru64 Make. alphaev68-dec-osf5.1.
GCC 2.96 and GNU Make. alphaev67-unknown-linux-gnu.
GCC 3.0 and GNU Make. ia64-unknown-linux-gnu.
GCC 2.96 and GNU Make. i686-pc-linux-gnu.
GCC 3.2 and GNU Make. i686-pc-linux-gnu.
mips-sgi-irix6.5, MIPS C compiler, IRIX Make.
rs6000-ibm-aix4.3.2.0, IBM C for AIX compiler, AIX Make.
GCC 3.2, GNU make. i686-pc-cygwin
HP-UX C compiler and HP Make. hppa2.0w-hp-hpux11.11.
Sun WorkShop Compiler C 6.0 and SUN Make. sparc-sun-solaris2.8.
GCC 2.95.3 and GNU Make. alpha-unknown-netbsd1.6, i386-unknown-netbsdelf1.6.
GCC 2.95.3 and GNU Make. i386-unknown-openbsd3.1.
GCC 2.95.4 and GNU Make. alpha-unknown-freebsd4.7, i386-unknown-freebsd4.7.
If you use Libidn on, or port Libidn to, a new platform please report it to the author.
If you think you have found a bug in Libidn, please investigate it and report it.
Please make an effort to produce a self-contained report, with something definite that can be tested or debugged. Vague queries or piecemeal messages are difficult to act on and don't help the development effort.
If your bug report is good, we will do our best to help you to get a corrected version of the software; if the bug report is poor, we won't do anything about it (apart from asking you to send better bug reports).
If you think something in this manual is unclear, or downright incorrect, or if the language needs to be improved, please also send a note.
Send your bug report to:
bug-libidn@gnu.org
To use `Libidn, you have to perform some changes to your sources and the build system. The necessary changes are small and explained in the following sections. At the end of this chapter, it is described how the library is initialized, and how the requirements of the library are verified.
A faster way to find out how to adapt your application for use with `Libidn' may be to look at the examples at the end of this manual (see Examples).
The library contains a few independent parts, and each part export the interfaces (data types and functions) in a header file. You must include the appropriate header files in all programs using the library, either directly or through some other header file, like this:
#include <stringprep.h>
The header files and the functions they define are categorized as follows:
The name space of the stringprep part of Libidn is stringprep_*
for function names, Stringprep*
for data types and
STRINGPREP_*
for other symbols. In addition the same name
prefixes with one prepended underscore are reserved for internal use
and should never be used by an application.
This header file uses the same namespace as the main stringprep.h
header file.
This header file uses the same namespace as the main stringprep.h
header file.
This header file uses the same namespace as the main stringprep.h
header file.
The name space of the punycode part of Libidn is punycode_*
for
function names, Punycode*
for data types and PUNYCODE_*
for other symbols. In addition the same name prefixes with one
prepended underscore are reserved for internal use and should never be
used by an application.
The name space of the IDNA part of Libidn is idna_*
for
function names, Idna*
for data types and IDNA_*
for
other symbols. In addition the same name prefixes with one prepended
underscore are reserved for internal use and should never be used by
an application.
Libidn is stateless and does not need any initialization.
It is often desirable to check that the version of `Libidn' used is indeed one which fits all requirements. Even with binary compatibility new features may have been introduced but due to problem with the dynamic linker an old version is actually used. So you may want to check that the version is okay right after program startup.
const char * stringprep_check_version (const char * req_version) | Function |
req_version: Required version number, or NULL. Check that the the version of the library is at minimum the requested one and return the version string; return NULL if the condition is not satisfied. If a NULL is passed to this function, no check is done, but the version string is simply returned. See STRINGPREP_VERSION for a suitable Version string of run-time library, or NULL if the run-time library does not meet the required version number. |
The normal way to use the function is to put something similar to the
following first in your main()
:
if (!stringprep_check_version (STRINGPREP_VERSION)) { printf ("stringprep_check_version() failed:\n" "Header file incompatible with shared library.\n"); exit(1); }
If you want to compile a source file including e.g. the `idna.h' header
file, you must make sure that the compiler can find it in the
directory hierarchy. This is accomplished by adding the path to the
directory in which the header file is located to the compilers include
file search path (via the -I
option).
However, the path to the include file is determined at the time the
source is configured. To solve this problem, `Libidn' uses the
external package pkg-config
that knows the path to the
include file and other configuration options. The options that need
to be added to the compiler invocation at compile time are output by
the --cflags
option to pkg-config libidn
. The
following example shows how it can be used at the command line:
gcc -c foo.c `pkg-config libidn --cflags`
Adding the output of pkg-config libidn --cflags
to the
compilers command line will ensure that the compiler can find e.g. the
idna.h header file.
A similar problem occurs when linking the program with the library.
Again, the compiler has to find the library files. For this to work,
the path to the library files has to be added to the library search
path (via the -L
option). For this, the option
--libs
to pkg-config libidn
can be used. For
convenience, this option also outputs all other options that are
required to link the program with the `libidn' libarary. The example
shows how to link foo.o
with the `libidn' library to a program
foo
.
gcc -o foo foo.o `pkg-config libidn --libs`
Of course you can also combine both examples to a single command by
specifying both options to pkg-config
:
gcc -o foo foo.c `pkg-config libidn --cflags --libs`
Stringprep describes a framework for preparing Unicode text strings in order to increase the likelihood that string input and string comparison work in ways that make sense for typical users throughout the world. The stringprep protocol is useful for protocol identifier values, company and personal names, internationalized domain names, and other text strings.
int stringprep (char * in, int maxlen, int flags, Stringprep_profile * profile) | Function |
in: input/ouput array with string to prepare. maxlen: maximum length of input/output array. flags: optional stringprep profile flags. profile: pointer to stringprep profile to use. Prepare the input UTF-8 string according to the stringprep profile.
Normally application programmers use stringprep profile macros such
as Returns 0 iff successful, or an error code. |
unsigned long stringprep_utf8_to_unichar (const char * p) | Function |
p: a pointer to Unicode character encoded as UTF-8 Converts a sequence of bytes encoded as UTF-8 to a Unicode character.
If the resulting character |
int stringprep_unichar_to_utf8 (unsigned long c, char * outbuf) | Function |
c: a ISO10646 character code outbuf: output buffer, must have at least 6 bytes of space.
If NULL, the length will be computed and returned
and nothing will be written to Converts a single character to UTF-8. number of bytes written |
unsigned long * stringprep_utf8_to_ucs4 (const char * str, int len, int * items_written) | Function |
str: a UTF-8 encoded string len: the maximum length of items_written: location to store the number of characters in the result, or NULL. Convert a string from UTF-8 to a 32-bit fixed width representation as UCS-4, assuming valid UTF-8 input. This function does no error checking on the input. a pointer to a newly allocated UCS-4 string.
This value must be freed with |
char * stringprep_ucs4_to_utf8 (const unsigned long * str, int len, int * items_read, int * items_written) | Function |
str: a UCS-4 encoded string len: the maximum length of items_read: location to store number of characters read read, or NULL. items_written: location to store number of bytes written or NULL. The value here stored does not include the trailing 0 byte. Convert a string from a 32-bit fixed width representation as UCS-4. to UTF-8. The result will be terminated with a 0 byte. a pointer to a newly allocated UTF-8 string.
This value must be freed with |
char * stringprep_utf8_nfkc_normalize (const char * str, int len) | Function |
str: a UTF-8 encoded string. len: length of Converts a string into canonical form, standardizing
such issues as whether a character with an accent
is represented as a base character and combining
accent or as a single precomposed character. You
should generally call The normalization mode is NFKC (ALL COMPOSE). It standardizes differences that do not affect the text content, such as the above-mentioned accent representation. It standardizes the "compatibility" characters in Unicode, such as SUPERSCRIPT THREE to the standard forms (in this case DIGIT THREE). Formatting information may be lost but for most text operations such characters should be considered the same. It returns a result with composed forms rather than a maximally decomposed form. a newly allocated string, that is the
NFKC normalized form of |
unsigned long * stringprep_ucs4_nfkc_normalize (unsigned long * str, int len) | Function |
str: a Unicode string. len: length of Converts UCS4 string into UTF-8 and runs
a newly allocated Unicode string, that is the NFKC
normalized form of |
const char * stringprep_locale_charset ( void) | Function |
Return the character set used by the system locale. It will never return NULL, but use "ASCII" as a fallback. |
char * stringprep_convert (const char * str, const char * to_codeset, const char * from_codeset) | Function |
str: input zero-terminated string. to_codeset: name of destination character set. from_codeset: name of origin character set, as used by Convert the string from one character set to another using the
system's Returns newly allocated zero-terminated string which
is |
char * stringprep_locale_to_utf8 (const char * str) | Function |
str: input zero terminated string. Convert string encoded in the locale's character set into UTF-8 by
using Returns newly allocated zero-terminated string which
is |
char * stringprep_utf8_to_locale (const char * str) | Function |
str: input zero terminated string. Convert string encoded in UTF-8 into the locale's character set by
using Returns newly allocated zero-terminated string which
is |
Punycode is a simple and efficient transfer encoding syntax designed for use with Internationalized Domain Names in Applications. It uniquely and reversibly transforms a Unicode string into an ASCII string. ASCII characters in the Unicode string are represented literally, and non-ASCII characters are represented by ASCII characters that are allowed in host name labels (letters, digits, and hyphens). This document defines a general algorithm called Bootstring that allows a string of basic code points to uniquely represent any string of code points drawn from a larger set. Punycode is an instance of Bootstring that uses particular parameter values specified by this document, appropriate for IDNA.
int punycode_encode (size_t input_length, const unsigned long input[], const unsigned char case_flags[], size_t * output_length, char output[]) | Function |
input_length: The input_length is the number of code points in the input. output_length: The output_length is an in/out argument: the caller passes in the maximum number of code points that it can receive, and on successful return it will contain the number of code points actually output. Converts Unicode to Punycode. The return value can be any of the punycode_status values defined above except punycode_bad_input; if not punycode_success, then output_size and output might contain garbage. |
int punycode_decode (size_t input_length, const char input[], size_t * output_length, unsigned long output[], unsigned char case_flags[]) | Function |
input_length: The input_length is the number of code points in the input. output_length: The output_length is an in/out argument: the caller passes in the maximum number of code points that it can receive, and on successful return it will contain the actual number of code points output. Converts Punycode to Unicode. The return value can be any of the punycode_status values defined above; if not punycode_success, then output_length, output, and case_flags might contain garbage. On success, the decoder will never need to write an output_length greater than input_length, because of how the encoding is defined. |
Until now, there has been no standard method for domain names to use characters outside the ASCII repertoire. The IDNA document defines internationalized domain names (IDNs) and a mechanism called IDNA for handling them in a standard fashion. IDNs use characters drawn from a large repertoire (Unicode), but IDNA allows the non-ASCII characters to be represented using only the ASCII characters already allowed in so-called host names today. This backward-compatible representation is required in existing protocols like DNS, so that IDNs can be introduced with no changes to the existing infrastructure. IDNA is only meant for processing domain names, not free text.
int idna_to_ascii (const unsigned long * in, size_t inlen, char * out, int allowunassigned, int usestd3asciirules) | Function |
in: input array with unicode code points. inlen: length of input array with unicode code points. out: output zero terminated string that must have room for at least 63 characters plus the terminating zero. allowunassigned: boolean value as per IDNA specification. usestd3asciirules: boolean value as per IDNA specification. The ToASCII operation takes a sequence of Unicode code points that make up one label and transforms it into a sequence of code points in the ASCII range (0..7F). If ToASCII succeeds, the original sequence and the resulting sequence are equivalent labels. It is important to note that the ToASCII operation can fail. ToASCII fails if any step of it fails. If any step of the ToASCII operation fails on any label in a domain name, that domain name MUST NOT be used as an internationalized domain name. The method for deadling with this failure is application-specific. The inputs to ToASCII are a sequence of code points, the AllowUnassigned flag, and the UseSTD3ASCIIRules flag. The output of ToASCII is either a sequence of ASCII code points or a failure condition. ToASCII never alters a sequence of code points that are all in the ASCII range to begin with (although it could fail). Applying the ToASCII operation multiple times has exactly the same effect as applying it just once. Returns 0 on success, or an error code. |
int idna_to_unicode (const unsigned long * in, size_t inlen, unsigned long * out, size_t * outlen, int allowunassigned, int usestd3asciirules) | Function |
in: input array with unicode code points. inlen: length of input array with unicode code points. out: output array with unicode code points. outlen: on input, maximum size of output array with unicode code points, on exit, actual size of output array with unicode code points. allowunassigned: boolean value as per IDNA specification. usestd3asciirules: boolean value as per IDNA specification. The ToUnicode operation takes a sequence of Unicode code points that make up one label and returns a sequence of Unicode code points. If the input sequence is a label in ACE form, then the result is an equivalent internationalized label that is not in ACE form, otherwise the original sequence is returned unaltered. ToUnicode never fails. If any step fails, then the original input sequence is returned immediately in that step. The ToUnicode output never contains more code points than its input. Note that the number of octets needed to represent a sequence of code points depends on the particular character encoding used. The inputs to ToUnicode are a sequence of code points, the AllowUnassigned flag, and the UseSTD3ASCIIRules flag. The output of ToUnicode is always a sequence of Unicode code points. Returns error condition, but it must only be used for debugging purposes. The output buffer is always guaranteed to contain the correct data according to the specification (sans malloc induced errors). NB! This means that you normally ignore the return code from this function, as checking it means breaking the standard. |
int idna_ucs4_to_ace (const unsigned long * input, char ** output) | Function |
input: zero terminated input Unicode string. output: pointer to newly allocated output string. Convert UCS-4 domain name to ASCII string. The AllowUnassigned flag is false and std3asciirules flag is false. The domain name may contain several labels, separated by dots. The output buffer must be deallocated by the caller. Returns IDNA_SUCCESS on success, or error code. |
int idna_utf8_to_ace (const char * input, char ** output) | Function |
input: zero terminated input UTF-8 string. output: pointer to newly allocated output string. Convert UTF-8 domain name to ASCII string. The AllowUnassigned flag is false and std3asciirules flag is false. The domain name may contain several labels, separated by dots. The output buffer must be deallocated by the caller. Returns IDNA_SUCCESS on success, or error code. |
int idna_locale_to_ace (const char * input, char ** output) | Function |
input: zero terminated input UTF-8 string. output: pointer to newly allocated output string. Convert domain name in the locale's encoding to ASCII string. The AllowUnassigned flag is false and std3asciirules flag is false. The domain name may contain several labels, separated by dots. The output buffer must be deallocated by the caller. Returns IDNA_SUCCESS on success, or error code. |
int idna_ucs4ace_to_ucs4 (const unsigned long * input, unsigned long ** output) | Function |
input: zero-terminated Unicode string. output: pointer to newly allocated output Unicode string. Convert possibly ACE encoded domain name in UCS-4 format into a UCS-4 string. The AllowUnassigned flag is false and std3asciirules flag is false. The domain name may contain several labels, separated by dots. The output buffer must be deallocated by the caller. Returns IDNA_SUCCESS on success, or error code. |
int idna_utf8ace_to_ucs4 (const char * input, unsigned long ** output) | Function |
input: zero-terminated UTF-8 string. output: pointer to newly allocated output Unicode string. Convert possibly ACE encoded domain name in UTF-8 format into a UCS-4 string. The AllowUnassigned flag is false and std3asciirules flag is false. The domain name may contain several labels, separated by dots. The output buffer must be deallocated by the caller. Returns IDNA_SUCCESS on success, or error code. |
int idna_utf8ace_to_utf8 (const char * input, char ** output) | Function |
input: zero-terminated UTF-8 string. output: pointer to newly allocated output UTF-8 string. Convert possibly ACE encoded domain name in UTF-8 format into a UTF-8 string. The AllowUnassigned flag is false and std3asciirules flag is false. The domain name may contain several labels, separated by dots. The output buffer must be deallocated by the caller. Returns IDNA_SUCCESS on success, or error code. |
int idna_utf8ace_to_locale (const char * input, char ** output) | Function |
input: zero-terminated UTF-8 string. output: pointer to newly allocated output string encoded in the current locale's character set. Convert possibly ACE encoded domain name in UTF-8 format into a string encoded in the current locale's character set. The AllowUnassigned flag is false and std3asciirules flag is false. The domain name may contain several labels, separated by dots. The output buffer must be deallocated by the caller. Returns IDNA_SUCCESS on success, or error code. |
int idna_localeace_to_locale (const char * input, char ** output) | Function |
input: zero-terminated string encoded in the current locale's character set. output: pointer to newly allocated output string encoded in the current locale's character set. Convert possibly ACE encoded domain name in the locale's character set into a string encoded in the current locale's character set. The AllowUnassigned flag is false and std3asciirules flag is false. The domain name may contain several labels, separated by dots. The output buffer must be deallocated by the caller. Returns IDNA_SUCCESS on success, or error code. |
This chapter contains example code which illustrate how `Libidn' can be used when writing your own application.
This example demonstrates how the stringprep functions are used.
/* example.c Example code showing how to use stringprep(). * Copyright (C) 2002 Simon Josefsson * * This file is part of Libstringprep. * * Libstringprep is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * Libstringprep is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with Libstringprep; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA * */ #include <stdio.h> #include <stdlib.h> #include <string.h> #include <stringprep_nameprep.h> /* * Compiling using libtool and pkg-config is recommended: * * $ libtool cc -o example example.c `pkg-config --cflags --libs libstringprep` * $ ./example * Input string encoded as `ISO-8859-1': ª * Before locale2utf8 (length 2): aa 0a * Before stringprep (length 3): c2 aa 0a * After stringprep (length 2): 61 0a * $ * */ int main (int argc, char *argv[]) { char buf[BUFSIZ]; char *p; int rc, i; printf ("Input string encoded as `%s': ", stringprep_locale_charset ()); fflush (stdout); fgets (buf, BUFSIZ, stdin); printf ("Before locale2utf8 (length %d): ", strlen (buf)); for (i = 0; i < strlen (buf); i++) printf ("%02x ", buf[i] & 0xFF); printf ("\n"); p = stringprep_locale_to_utf8 (buf); if (p) { strcpy (buf, p); free (p); } else printf ("Could not convert string to UTF-8, continuing anyway...\n"); printf ("Before stringprep (length %d): ", strlen (buf)); for (i = 0; i < strlen (buf); i++) printf ("%02x ", buf[i] & 0xFF); printf ("\n"); rc = stringprep (buf, BUFSIZ, 0, stringprep_nameprep); if (rc != STRINGPREP_OK) printf ("Stringprep failed with rc %d...\n", rc); else { printf ("After stringprep (length %d): ", strlen (buf)); for (i = 0; i < strlen (buf); i++) printf ("%02x ", buf[i] & 0xFF); printf ("\n"); } return 0; }
This example demonstrates how the punycode functions are used.
/* example2.c Example code showing how to use punycode. * Copyright (C) 2002 Adam M. Costello * Copyright (C) 2002 Simon Josefsson * * This file is part of Libstringprep. * * Libstringprep is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * Libstringprep is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with Libstringprep; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA * */ /* * This file is derived from from draft-ietf-idn-punycode-03.txt by * Adam M. Costello. * * Disclaimer and license: Regarding this entire document or any * portion of it (including the pseudocode and C code), the author * makes no guarantees and is not responsible for any damage resulting * from its use. The author grants irrevocable permission to anyone * to use, modify, and distribute it in any way that does not diminish * the rights of anyone else to use, modify, and distribute it, * provided that redistributed derivative works do not contain * misleading author or version information. Derivative works need * not be licensed under similar terms. * */ #include <assert.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include "punycode.h" /* For testing, we'll just set some compile-time limits rather than */ /* use malloc(), and set a compile-time option rather than using a */ /* command-line option. */ enum { unicode_max_length = 256, ace_max_length = 256 }; static void usage (char **argv) { fprintf (stderr, "\n" "%s -e reads code points and writes a Punycode string.\n" "%s -d reads a Punycode string and writes code points.\n" "\n" "Input and output are plain text in the native character set.\n" "Code points are in the form u+hex separated by whitespace.\n" "Although the specification allows Punycode strings to contain\n" "any characters from the ASCII repertoire, this test code\n" "supports only the printable characters, and needs the Punycode\n" "string to be followed by a newline.\n" "The case of the u in u+hex is the force-to-uppercase flag.\n", argv[0], argv[0]); exit (EXIT_FAILURE); } static void fail (const char *msg) { fputs (msg, stderr); exit (EXIT_FAILURE); } static const char too_big[] = "input or output is too large, recompile with larger limits\n"; static const char invalid_input[] = "invalid input\n"; static const char overflow[] = "arithmetic overflow\n"; static const char io_error[] = "I/O error\n"; /* The following string is used to convert printable */ /* characters between ASCII and the native charset: */ static const char print_ascii[] = "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n" "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n" " !\"#$%&'()*+,-./" "0123456789:;<=>?" "\x40ABCDEFGHIJKLMNO" "PQRSTUVWXYZ[\\]^_" "`abcdefghijklmno" "pqrstuvwxyz{|}~\n"; int main (int argc, char **argv) { int status; int r; unsigned int input_length, output_length, j; unsigned char case_flags[unicode_max_length]; if (argc != 2) usage (argv); if (argv[1][0] != '-') usage (argv); if (argv[1][2] != 0) usage (argv); if (argv[1][1] == 'e') { unsigned long input[unicode_max_length]; unsigned long codept; char output[ace_max_length + 1], uplus[3]; int c; /* Read the input code points: */ input_length = 0; for (;;) { r = scanf ("%2s%lx", uplus, &codept); if (ferror (stdin)) fail (io_error); if (r == EOF || r == 0) break; if (r != 2 || uplus[1] != '+' || codept > (unsigned long) -1) { fail (invalid_input); } if (input_length == unicode_max_length) fail (too_big); if (uplus[0] == 'u') case_flags[input_length] = 0; else if (uplus[0] == 'U') case_flags[input_length] = 1; else fail (invalid_input); input[input_length++] = codept; } /* Encode: */ output_length = ace_max_length; status = punycode_encode (input_length, input, case_flags, &output_length, output); if (status == PUNYCODE_BAD_INPUT) fail (invalid_input); if (status == PUNYCODE_BIG_OUTPUT) fail (too_big); if (status == PUNYCODE_OVERFLOW) fail (overflow); assert (status == PUNYCODE_SUCCESS); /* Convert to native charset and output: */ for (j = 0; j < output_length; ++j) { c = output[j]; assert (c >= 0 && c <= 127); if (print_ascii[c] == 0) fail (invalid_input); output[j] = print_ascii[c]; } output[j] = 0; r = puts (output); if (r == EOF) fail (io_error); return EXIT_SUCCESS; } if (argv[1][1] == 'd') { char input[ace_max_length + 2], *p, *pp; unsigned long output[unicode_max_length]; /* Read the Punycode input string and convert to ASCII: */ fgets (input, ace_max_length + 2, stdin); if (ferror (stdin)) fail (io_error); if (feof (stdin)) fail (invalid_input); input_length = strlen (input) - 1; if (input[input_length] != '\n') fail (too_big); input[input_length] = 0; for (p = input; *p != 0; ++p) { pp = strchr (print_ascii, *p); if (pp == 0) fail (invalid_input); *p = pp - print_ascii; } /* Decode: */ output_length = unicode_max_length; status = punycode_decode (input_length, input, &output_length, output, case_flags); if (status == PUNYCODE_BAD_INPUT) fail (invalid_input); if (status == PUNYCODE_BIG_OUTPUT) fail (too_big); if (status == PUNYCODE_OVERFLOW) fail (overflow); assert (status == PUNYCODE_SUCCESS); /* Output the result: */ for (j = 0; j < output_length; ++j) { r = printf ("%s+%04lX\n", case_flags[j] ? "U" : "u", (unsigned long) output[j]); if (r < 0) fail (io_error); } return EXIT_SUCCESS; } usage (argv); return EXIT_SUCCESS; /* not reached, but quiets compiler warning */ }
This example demonstrates how the library is used to convert internationalized domain names into ASCII compatible names.
/* example3.c Example code showing how to use Libidn. * Copyright (C) 2002 Simon Josefsson * * This file is part of GNU Libidn. * * GNU Libidn is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * GNU Libidn is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with GNU Libidn; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA * */ #include <stdio.h> #include <stdlib.h> #include <string.h> #include <stringprep.h> /* stringprep_locale_charset() */ #include <idna.h> /* idna_locale_to_ace() */ /* * Compiling using libtool and pkg-config is recommended: * * $ libtool cc -o example3 example3.c `pkg-config --cflags --libs libidn` * $ ./example3 * Input domain encoded as `ISO-8859-1': www.räksmörgåsª.example * Read string (length 23): 77 77 77 2e 72 e4 6b 73 6d f6 72 67 e5 73 aa 2e 65 78 61 6d 70 6c 65 * ACE label (length 33): 'www.iesg--rksmrgsa-0zap8p.example' * 77 77 77 2e 69 65 73 67 2d 2d 72 6b 73 6d 72 67 73 61 2d 30 7a 61 70 38 70 2e 65 78 61 6d 70 6c 65 * $ * */ int main (int argc, char *argv[]) { char buf[BUFSIZ]; char *p; int rc, i; printf ("Input domain encoded as `%s': ", stringprep_locale_charset ()); fflush (stdout); fgets (buf, BUFSIZ, stdin); buf[strlen(buf)-1] = '\0'; printf ("Read string (length %d): ", strlen (buf)); for (i = 0; i < strlen (buf); i++) printf ("%02x ", buf[i] & 0xFF); printf ("\n"); rc = idna_locale_to_ace (buf, &p); if (rc != IDNA_SUCCESS) { printf("ToASCII() failed... %d\n", rc); exit(1); } printf ("ACE label (length %d): '%s'\n", strlen (p), p); for (i = 0; i < strlen (p); i++) printf ("%02x ", p[i] & 0xFF); printf ("\n"); free(p); return 0; }
This example demonstrates how the library is used to convert ASCII compatible names to internationalized domain names.
/* example4.c Example code showing how to use Libidn. * Copyright (C) 2002 Simon Josefsson * * This file is part of GNU Libidn. * * GNU Libidn is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * GNU Libidn is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with GNU Libidn; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA * */ #include <stdio.h> #include <stdlib.h> #include <string.h> #include <stringprep.h> /* stringprep_locale_charset() */ #include <idna.h> /* idna_localeace_to_locale() */ /* * Compiling using libtool and pkg-config is recommended: * * $ libtool cc -o example4 example4.c `pkg-config --cflags --libs libidn` * $ ./example4 * Input domain encoded as `ISO-8859-1': www.iesg--rksmrgsa-0zap8p.example * Read string (length 33): 77 77 77 2e 69 65 73 67 2d 2d 72 6b 73 6d 72 67 73 61 2d 30 7a 61 70 38 70 2e 65 78 61 6d 70 6c 65 * ACE label (length 23): 'www.räksmörgåsa.example' * 77 77 77 2e 72 e4 6b 73 6d f6 72 67 e5 73 61 2e 65 78 61 6d 70 6c 65 * $ * */ int main (int argc, char *argv[]) { char buf[BUFSIZ]; char *p; int rc, i; printf ("Input domain encoded as `%s': ", stringprep_locale_charset ()); fflush (stdout); fgets (buf, BUFSIZ, stdin); buf[strlen(buf)-1] = '\0'; printf ("Read string (length %d): ", strlen (buf)); for (i = 0; i < strlen (buf); i++) printf ("%02x ", buf[i] & 0xFF); printf ("\n"); rc = idna_localeace_to_locale (buf, &p); if (rc != IDNA_SUCCESS) { printf("ToUnicode() failed... %d\n", rc); exit(1); } printf ("ACE label (length %d): '%s'\n", strlen (p), p); for (i = 0; i < strlen (p); i++) printf ("%02x ", p[i] & 0xFF); printf ("\n"); free(p); return 0; }
Copyright © 2000 Free Software Foundation, Inc. 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.
The purpose of this License is to make a manual, textbook, or other written document free in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others.
This License is a kind of "copyleft", which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software.
We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference.
This License applies to any manual or other work that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. The "Document", below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as "you".
A "Modified Version" of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language.
A "Secondary Section" is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. (For example, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them.
The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License.
The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License.
A "Transparent" copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, whose contents can be viewed and edited directly and straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup has been designed to thwart or discourage subsequent modification by readers is not Transparent. A copy that is not "Transparent" is called "Opaque".
Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML designed for human modification. Opaque formats include PostScript, PDF, proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML produced by some word processors for output purposes only.
The "Title Page" means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, "Title Page" means the text near the most prominent appearance of the work's title, preceding the beginning of the body of the text.
You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3.
You may also lend copies, under the same conditions stated above, and you may publicly display copies.
If you publish printed copies of the Document numbering more than 100, and the Document's license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects.
If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages.
If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a publicly-accessible computer-network location containing a complete Transparent copy of the Document, free of added material, which the general network-using public has access to download anonymously at no charge using public-standard network protocols. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public.
It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document.
You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:
If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version's license notice. These titles must be distinct from any other section titles.
You may add a section entitled "Endorsements", provided it contains nothing but endorsements of your Modified Version by various parties--for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard.
You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one.
The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version.
You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice.
The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work.
In the combination, you must combine any sections entitled "History" in the various original documents, forming one section entitled "History"; likewise combine any sections entitled "Acknowledgments", and any sections entitled "Dedications". You must delete all sections entitled "Endorsements."
You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects.
You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document.
A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, does not as a whole count as a Modified Version of the Document, provided no compilation copyright is claimed for the compilation. Such a compilation is called an "aggregate", and this License does not apply to the other self-contained works thus compiled with the Document, on account of their being thus compiled, if they are not themselves derivative works of the Document.
If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one quarter of the entire aggregate, the Document's Cover Texts may be placed on covers that surround only the Document within the aggregate. Otherwise they must appear on covers around the whole aggregate.
Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may include a translation of this License provided that you also include the original English version of this License. In case of a disagreement between the translation and the original English version of this License, the original English version will prevail.
You may not copy, modify, sublicense, or distribute the Document except as expressly provided for under this License. Any other attempt to copy, modify, sublicense or distribute the Document is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance.
The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See http://www.gnu.org/copyleft/.
Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License "or any later version" applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation.
To use this License in a document you have written, include a copy of the License in the document and put the following copyright and license notices just after the title page:
Copyright (C) year your name. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with the Invariant Sections being list their titles, with the Front-Cover Texts being list, and with the Back-Cover Texts being list. A copy of the license is included in the section entitled ``GNU Free Documentation License''.
If you have no Invariant Sections, write "with no Invariant Sections" instead of saying which ones are invariant. If you have no Front-Cover Texts, write "no Front-Cover Texts" instead of "Front-Cover Texts being list"; likewise for Back-Cover Texts.
If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software license, such as the GNU General Public License, to permit their use in free software.