UTF-8 safe equivalent of ord or charCodeAt() in PHP
✔ Recommended Answer
ord() works byte by byte (as do most, if not all, of PHP's standard string functions). You would need to do the conversion yourself, for example with the help of the multibyte string extension:
$utf8Character = 'Ą';
list(, $ord) = unpack('N', mb_convert_encoding($utf8Character, 'UCS-4BE', 'UTF-8'));
echo $ord; # 260
Source: stackoverflow.com
Answered By: hakre
In PHP, the ord function returns the value (0 to 255) of the first byte of a string. For plain ASCII text that byte is the character code, but for a multibyte UTF-8 character it is only the lead byte of the encoding, not the Unicode code point.
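To make the byte-level behaviour concrete, here is a small sketch. It writes the Euro sign as explicit byte escapes so the result does not depend on how the script file itself is encoded; the variable names are illustrative only.

```php
<?php
// "\xE2\x82\xAC" is the three-byte UTF-8 encoding of the Euro sign (U+20AC).
$euro = "\xE2\x82\xAC";

$firstByte  = ord($euro);    // ord() looks only at the first byte
$byteLength = strlen($euro); // strlen() counts bytes, not characters

echo $firstByte, "\n";  // 226 (0xE2), not the code point 8364
echo $byteLength, "\n"; // 3
```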
To safely get the Unicode code point of a character in a UTF-8 encoded string in PHP, you can use the mb_ord function from the mbstring extension, which is available from PHP 7.2 onwards. Here's an example:
$string = '€'; // UTF-8 encoded Euro symbol
$code_point = mb_ord($string, 'UTF-8');
echo $code_point; // Output: 8364
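In this example, we use the mb_ord function to get the Unicode code point of the Euro symbol (€) in a UTF-8 encoded string. The function takes two parameters: the string to get the code point from, and the encoding of the string (in this case, UTF-8). The encoding is optional and defaults to the internal encoding, but passing it explicitly avoids depending on configuration. Only the first character of the string is examined.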
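Because mb_ord only looks at the first character, getting the code point of every character in a string needs a splitting step first. The following is a minimal sketch, assuming PHP 7.2+ with mbstring; the helper name utf8_code_points is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper: returns the code point of every character in a UTF-8 string.
function utf8_code_points(string $text): array
{
    // The 'u' modifier makes PCRE treat the subject as UTF-8, so an empty
    // pattern splits between characters rather than between bytes.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return []; // preg_split fails on invalid UTF-8 input
    }
    return array_map(function ($char) {
        return mb_ord($char, 'UTF-8');
    }, $chars);
}

// "A", "Ą" (U+0104) and "€" (U+20AC), written as explicit UTF-8 bytes.
$points = utf8_code_points("A\xC4\x84\xE2\x82\xAC");
echo implode(' ', $points), "\n"; // 65 260 8364
```

Returning an empty array for invalid input is a choice made for this sketch; throwing an exception may suit stricter code better.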
If you're using an older version of PHP that doesn't have the mb_ord function, you can convert the character to UCS-4BE (a fixed four bytes per character, big-endian) and then use the unpack function with the 'N' format code, which reads an unsigned 32-bit big-endian integer from a binary string. Here's an example:
$string = '€'; // UTF-8 encoded Euro symbol
$code_point = unpack('N', mb_convert_encoding($string, 'UCS-4BE', 'UTF-8'))[1];
echo $code_point; // Output: 8364
In this example, we first convert the UTF-8 encoded string to UCS-4BE encoding using the mb_convert_encoding function. We then use the unpack function with the 'N' format code to unpack a 32-bit unsigned integer from the binary string. unpack returns an array whose keys start at 1, so [1] selects the code point. Indexing a function call result directly like this requires PHP 5.4 or later; on older versions, assign the array to a variable first or use list().
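The two approaches can be combined into one function that prefers mb_ord when it exists and falls back to the conversion otherwise. This is only a sketch: the name utf8_ord is hypothetical, and it assumes the input is a non-empty, valid UTF-8 string and that the mbstring extension is loaded.

```php
<?php
// Hypothetical wrapper name; not part of PHP or the mbstring extension.
function utf8_ord($char)
{
    if (function_exists('mb_ord')) {
        return mb_ord($char, 'UTF-8'); // PHP 7.2+
    }
    // Older PHP: convert to four-byte big-endian and read it as one integer.
    $ucs4 = mb_convert_encoding($char, 'UCS-4BE', 'UTF-8');
    $unpacked = unpack('N', $ucs4); // keys of the result start at 1
    return $unpacked[1];
}

$aOgonekPoint = utf8_ord("\xC4\x84");     // "Ą", U+0104
$euroPoint    = utf8_ord("\xE2\x82\xAC"); // "€", U+20AC
echo $aOgonekPoint, "\n"; // 260
echo $euroPoint, "\n";    // 8364
```

The intermediate $unpacked variable keeps the fallback branch usable on PHP versions that cannot index a function call result directly.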