Monday, April 23, 2012

Unicode character in PHP string


This question looks embarrassingly simple, but I haven't been able to find an answer.



What is the PHP equivalent to the following C# line of code?




string str="\u1000";



(That creates a string with a single Unicode character whose "Unicode numeric value", i.e. its code point, is 0x1000, decimal 4096.)



That is, in PHP, how can I create a string with a single Unicode character whose "Unicode numeric value" is one that I know?



Thank you.


Source: Tips4all

2 comments:

  1. Because JSON directly supports the \uxxxx syntax, the first thing that comes to mind is:

    $unicodeChar = '\u1000';
    echo json_decode('"'.$unicodeChar.'"');


    Another option would be to use mb_convert_encoding() with a numeric HTML entity:

    echo mb_convert_encoding('&#x1000;', 'UTF-8', 'HTML-ENTITIES');


    or make use of the direct mapping between UTF-16BE (big endian) and the Unicode code point, which holds for code points up to U+FFFF:

    echo mb_convert_encoding("\x10\x00", 'UTF-8', 'UTF-16BE');
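
    When the code point is known as a number, the same idea can be wrapped in a small helper. The following is only a sketch: the name codePointToUtf8 is hypothetical and not part of PHP, and it assumes the mbstring extension is loaded. It goes through UTF-32BE rather than UTF-16BE so that code points above U+FFFF work as well:

    ```php
    <?php
    // Hypothetical helper (name chosen for illustration only): build a UTF-8
    // string from a single Unicode code point given as an integer.
    function codePointToUtf8($codePoint) {
        // pack('N', ...) yields a 4-byte big-endian unsigned integer, which is
        // the UTF-32BE encoding of the code point.
        return mb_convert_encoding(pack('N', $codePoint), 'UTF-8', 'UTF-32BE');
    }

    echo codePointToUtf8(0x1000);   // U+1000, the character from the question
    echo codePointToUtf8(0x1F600);  // a code point outside the Basic Multilingual Plane
    ```

    Passing the number itself (0x1000, or 4096) avoids building an escape string first.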

  2. PHP does not recognize these \uXXXX escape sequences. But because unknown escape sequences are left unchanged in a string, you can write your own function that converts them:

    function unicodeString($str, $encoding=null) {
        // Fall back to mbstring's current internal encoding.
        if (is_null($encoding)) $encoding = mb_internal_encoding();
        // Note: create_function() was deprecated in PHP 7.2 and removed in PHP 8.0.
        return preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/u', create_function('$match', 'return mb_convert_encoding(pack("H*", $match[1]), '.var_export($encoding, true).', "UTF-16BE");'), $str);
    }


    Or with an anonymous function expression instead of create_function:

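    function unicodeString($str, $encoding=null) {
        // Fall back to mbstring's current internal encoding.
        if (is_null($encoding)) $encoding = mb_internal_encoding();
        // Replace each literal \uXXXX with the character it names.
        return preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/u', function($match) use ($encoding) {
            // The four hex digits are the code point's UTF-16BE bytes.
            return mb_convert_encoding(pack('H*', $match[1]), $encoding, 'UTF-16BE');
        }, $str);
    }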


    Its usage:

    $str = unicodeString("\u1000");
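
    On newer PHP versions a custom function may not be needed. The sketch below assumes PHP 7.0 or later for the braced "\u{...}" escape in double-quoted strings, and PHP 7.2 or later with mbstring for mb_chr(); the variable names are only for illustration:

    ```php
    <?php
    // Assumes PHP 7.0+ for the "\u{...}" escape and PHP 7.2+ (mbstring) for mb_chr().
    $fromEscape = "\u{1000}";               // code point escape, stored as UTF-8
    $fromNumber = mb_chr(0x1000, 'UTF-8');  // same character from an integer code point

    // Both variables hold the same UTF-8 bytes (E1 80 80).
    var_dump($fromEscape === $fromNumber);
    ```

    Note that the braces are required: "\u1000" without them is still left as the literal text \u1000.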
