punycode

Name

punycode -- Punycode: A Bootstring encoding of Unicode for IDNA.

Synopsis



int         punycode_encode                 (size_t input_length,
                                             unsigned long input[],
                                             unsigned char case_flags[],
                                             size_t *output_length,
                                             char output[]);
int         punycode_decode                 (size_t input_length,
                                             const char input[],
                                             size_t *output_length,
                                             unsigned long output[],
                                             unsigned char case_flags[]);

Description

Punycode is a simple and efficient transfer encoding syntax designed for use with Internationalized Domain Names in Applications (IDNA). It uniquely and reversibly transforms a Unicode string into an ASCII string. ASCII characters in the Unicode string are represented literally, and non-ASCII characters are represented by ASCII characters that are allowed in host name labels (letters, digits, and hyphens). RFC 3492 defines a general algorithm called Bootstring that allows a string of basic code points to uniquely represent any string of code points drawn from a larger set. Punycode is an instance of Bootstring that uses the particular parameter values specified in that RFC, chosen to be appropriate for IDNA.
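
As a rough illustration of how non-ASCII data ends up as letters and digits, the sketch below shows the base-36 digit alphabet that RFC 3492 specifies for Punycode. The function names sketch_encode_digit and sketch_decode_digit are hypothetical and are not part of this API.

```c
#include <assert.h>

/* Punycode is a base-36 encoding: digit values 0..25 are written as
   'a'..'z' and digit values 26..35 as '0'..'9'. */
enum { BASE = 36 };

/* Hypothetical helper: digit value (0..35) to its lowercase ASCII form. */
char sketch_encode_digit(unsigned d)
{
    assert(d < BASE);
    return (char)(d < 26 ? 'a' + d : '0' + (d - 26));
}

/* Hypothetical helper: ASCII character to digit value, or BASE if the
   character is not a Punycode digit. Letters are accepted in either case. */
unsigned sketch_decode_digit(int c)
{
    if (c >= 'a' && c <= 'z') return (unsigned)(c - 'a');
    if (c >= 'A' && c <= 'Z') return (unsigned)(c - 'A');
    if (c >= '0' && c <= '9') return (unsigned)(c - '0' + 26);
    return BASE;
}
```

The hyphen is not a digit; it serves only as the delimiter between the literal ASCII part of a label and the encoded part.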

Details

punycode_encode ()

int         punycode_encode                 (size_t input_length,
                                             unsigned long input[],
                                             unsigned char case_flags[],
                                             size_t *output_length,
                                             char output[]);

Converts Unicode to Punycode.

input_length :

The input_length is the number of code points in the input.

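input :

The input is represented as an array of Unicode code points (not code units; surrogate pairs are not allowed).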

case_flags :

The case_flags array holds input_length boolean values, where nonzero suggests that the corresponding Unicode character be forced to uppercase after being decoded (if possible), and zero suggests that it be forced to lowercase (if possible). ASCII code points are encoded literally, except that ASCII letters are forced to uppercase or lowercase according to the corresponding case flags. If case_flags is a null pointer then ASCII letters are left as they are, and other code points are treated as if their case flags were zero.

output_length :

The output_length is an in/out argument: the caller passes in the maximum number of code points that it can receive, and on successful return it will contain the number of code points actually output.

output :

The output will be represented as an array of ASCII code points. The output string is *not* null-terminated; it will contain zeros if and only if the input contains zeros. (Of course the caller can leave room for a terminator and add one if needed.)

Returns :

The return value can be any of the punycode_status values except punycode_bad_input; if it is not punycode_success, then output_length and output might contain garbage.
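
The calling convention described above can be sketched with a simplified, self-contained encoder that follows the encoding procedure in RFC 3492 section 6.3. It is not this library's implementation: the name sketch_encode and the SKETCH_* status codes are hypothetical, the case_flags argument is omitted, the input is taken as const, and code points are not range-checked.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bootstring parameters that RFC 3492 fixes for Punycode. */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700,
       INITIAL_BIAS = 72, INITIAL_N = 128, DELIMITER = '-' };

/* Hypothetical status codes for this sketch (not the library's values). */
enum { SKETCH_OK = 0, SKETCH_BIG_OUTPUT = 1, SKETCH_OVERFLOW = 2 };

/* Bias adaptation, RFC 3492 section 6.1. */
static unsigned long adapt(unsigned long delta, unsigned long numpoints, int first)
{
    unsigned long k = 0;
    delta = first ? delta / DAMP : delta / 2;
    delta += delta / numpoints;
    while (delta > ((BASE - TMIN) * TMAX) / 2) {
        delta /= BASE - TMIN;
        k += BASE;
    }
    return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

int sketch_encode(size_t input_length, const unsigned long input[],
                  size_t *output_length, char output[])
{
    static const char digits[] = "abcdefghijklmnopqrstuvwxyz0123456789";
    unsigned long n = INITIAL_N, delta = 0, bias = INITIAL_BIAS, m, q, k, t;
    size_t out = 0, max_out, h, b, j;

    assert(output_length != NULL);
    max_out = *output_length;          /* in: capacity of output[] */

    /* Copy the basic (ASCII) code points literally. */
    for (j = 0; j < input_length; ++j) {
        if (input[j] < 0x80) {
            if (out >= max_out) return SKETCH_BIG_OUTPUT;
            output[out++] = (char)input[j];
        }
    }
    h = b = out;                       /* h: code points handled so far */
    if (b > 0) {
        if (out >= max_out) return SKETCH_BIG_OUTPUT;
        output[out++] = DELIMITER;
    }

    while (h < input_length) {
        /* Find the smallest code point that has not been handled yet. */
        for (m = ULONG_MAX, j = 0; j < input_length; ++j)
            if (input[j] >= n && input[j] < m) m = input[j];
        if (m - n > (ULONG_MAX - delta) / (h + 1)) return SKETCH_OVERFLOW;
        delta += (m - n) * (h + 1);
        n = m;

        for (j = 0; j < input_length; ++j) {
            if (input[j] < n && ++delta == 0) return SKETCH_OVERFLOW;
            if (input[j] == n) {
                /* Emit delta as a variable-length base-36 integer. */
                for (q = delta, k = BASE; ; k += BASE) {
                    if (out >= max_out) return SKETCH_BIG_OUTPUT;
                    t = k <= bias ? TMIN : k >= bias + TMAX ? TMAX : k - bias;
                    if (q < t) break;
                    output[out++] = digits[t + (q - t) % (BASE - t)];
                    q = (q - t) / (BASE - t);
                }
                output[out++] = digits[q];
                bias = adapt(delta, (unsigned long)(h + 1), h == b);
                delta = 0;
                ++h;
            }
        }
        ++delta;
        ++n;
    }
    *output_length = out;              /* out: characters actually written */
    return SKETCH_OK;
}
```

Given the six code points of "bücher" (0x62, 0xFC, 0x63, 0x68, 0x65, 0x72) and room for at least nine characters, this sketch writes bcher-kva and sets *output_length to 9. No terminator is written, so a caller that wants a C string must reserve one extra byte and add the terminator itself.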


punycode_decode ()

int         punycode_decode                 (size_t input_length,
                                             const char input[],
                                             size_t *output_length,
                                             unsigned long output[],
                                             unsigned char case_flags[]);

Converts Punycode to Unicode.

input_length :

The input_length is the number of code points in the input.

input :

The input is represented as an array of ASCII code points.

output_length :

The output_length is an in/out argument: the caller passes in the maximum number of code points that it can receive, and on successful return it will contain the actual number of code points output.

output :

The output will be represented as an array of Unicode code points.

case_flags :

The case_flags array needs room for at least output_length values, or it can be a null pointer if the case information is not needed. A nonzero flag suggests that the corresponding Unicode character be forced to uppercase by the caller (if possible), while zero suggests that it be forced to lowercase (if possible). ASCII code points are output already in the proper case, but their flags will be set appropriately so that applying the flags would be harmless.

Returns :

The return value can be any of the punycode_status values; if it is not punycode_success, then output_length, output, and case_flags might contain garbage. On success, the decoder will never need to write an output_length greater than input_length, because of how the encoding is defined.
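
The decoding side can be sketched in the same way, following the decoding procedure in RFC 3492 section 6.2. Again this is a simplified illustration rather than this library's implementation: sketch_decode and the SKETCH_* status codes are hypothetical names, the case_flags argument is omitted, and decoded code points are not range-checked.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Bootstring parameters that RFC 3492 fixes for Punycode. */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700,
       INITIAL_BIAS = 72, INITIAL_N = 128, DELIMITER = '-' };

/* Hypothetical status codes for this sketch (not the library's values). */
enum { SKETCH_OK = 0, SKETCH_BAD_INPUT = 1, SKETCH_BIG_OUTPUT = 2,
       SKETCH_OVERFLOW = 3 };

/* Bias adaptation, RFC 3492 section 6.1. */
static unsigned long adapt(unsigned long delta, unsigned long numpoints, int first)
{
    unsigned long k = 0;
    delta = first ? delta / DAMP : delta / 2;
    delta += delta / numpoints;
    while (delta > ((BASE - TMIN) * TMAX) / 2) {
        delta /= BASE - TMIN;
        k += BASE;
    }
    return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

/* ASCII character to digit value 0..35, or BASE if it is not a digit. */
static unsigned long digit_value(char c)
{
    if (c >= 'a' && c <= 'z') return (unsigned long)(c - 'a');
    if (c >= 'A' && c <= 'Z') return (unsigned long)(c - 'A');
    if (c >= '0' && c <= '9') return (unsigned long)(c - '0') + 26;
    return BASE;
}

int sketch_decode(size_t input_length, const char input[],
                  size_t *output_length, unsigned long output[])
{
    unsigned long n = INITIAL_N, i = 0, bias = INITIAL_BIAS;
    unsigned long oldi, w, k, t, digit;
    size_t out = 0, max_out, b = 0, j, in;

    assert(output_length != NULL);
    max_out = *output_length;          /* in: capacity of output[] */

    /* Everything before the last delimiter is literal ASCII. */
    for (j = 0; j < input_length; ++j)
        if (input[j] == DELIMITER) b = j;
    if (b > max_out) return SKETCH_BIG_OUTPUT;
    for (j = 0; j < b; ++j) {
        if ((unsigned char)input[j] >= 0x80) return SKETCH_BAD_INPUT;
        output[out++] = (unsigned char)input[j];
    }

    for (in = b > 0 ? b + 1 : 0; in < input_length; ++out) {
        /* Decode one variable-length integer and add it to i. */
        for (oldi = i, w = 1, k = BASE; ; k += BASE) {
            if (in >= input_length) return SKETCH_BAD_INPUT;
            digit = digit_value(input[in++]);
            if (digit >= BASE) return SKETCH_BAD_INPUT;
            if (digit > (ULONG_MAX - i) / w) return SKETCH_OVERFLOW;
            i += digit * w;
            t = k <= bias ? TMIN : k >= bias + TMAX ? TMAX : k - bias;
            if (digit < t) break;
            if (w > ULONG_MAX / (BASE - t)) return SKETCH_OVERFLOW;
            w *= BASE - t;
        }
        bias = adapt(i - oldi, (unsigned long)(out + 1), oldi == 0);

        /* i wraps around the current length; each wrap advances n. */
        if (i / (out + 1) > ULONG_MAX - n) return SKETCH_OVERFLOW;
        n += i / (out + 1);
        i %= (out + 1);
        if (out >= max_out) return SKETCH_BIG_OUTPUT;

        /* Insert code point n at position i. */
        memmove(output + i + 1, output + i, (out - i) * sizeof *output);
        output[i++] = n;
    }
    *output_length = out;              /* out: code points actually written */
    return SKETCH_OK;
}
```

Given the nine characters bcher-kva, the sketch yields the six code points of "bücher". Since output_length never needs to exceed input_length, an output array with input_length elements is always large enough.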