RFC 3986 states that new URI schemes should encode characters as UTF-8 before percent-encoding them. However, this does not apply to URI schemes that predate it.
Is it safe to assume that every percent-encoded URI containing multibyte characters becomes a UTF-8 encoded string after being passed through urldecode()?
For example, suppose $_SERVER['REQUEST_URI'] contains this percent-encoded path:
/b%C3%BCch/w%C3%B6rterb%C3%BCch
After I pass this string to urldecode(), I should have a multibyte string. But how do I know what encoding the string is in? In the above example, it's UTF-8, but is it safe to always assume so?
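To make the concern concrete, here is a minimal sketch of what I mean. The second path is a made-up example (my assumption of how a client using ISO-8859-1 might encode the same "ü"), not something taken from a real request. As far as I can tell, urldecode() only turns the %XX sequences back into raw bytes and attaches no charset information, so the two results differ:

```php
<?php
// First path is the example from above; the second is a hypothetical
// ISO-8859-1 encoding of "büch" (0xFC is "ü" in that charset).
$utf8Path   = '/b%C3%BCch/w%C3%B6rterb%C3%BCch';
$latin1Path = '/b%FCch';

foreach ([$utf8Path, $latin1Path] as $raw) {
    // urldecode() yields raw bytes; it does not know or record a charset.
    $decoded = urldecode($raw);

    // mb_check_encoding() only reports whether the bytes are well-formed UTF-8.
    $isUtf8 = mb_check_encoding($decoded, 'UTF-8');

    printf(
        "%s -> bytes %s (well-formed UTF-8: %s)\n",
        $raw,
        bin2hex($decoded),
        $isUtf8 ? 'yes' : 'no'
    );
}
```

A check like this only tells me whether the decoded bytes happen to be well-formed UTF-8, not which encoding the client actually used, which is why I'm asking whether the assumption is safe.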
If it's not safe to assume so, is there a way (other than mb_detect_encoding) to detect the encoding of the string? I've checked the request headers, and they don't seem to contain anything helpful.
Source: Tips4all