Thursday, April 19, 2012

Is it safe to assume decoded percent-encoded URIs turn into UTF-8?


RFC 3986 states that new URI schemes should encode textual data as UTF-8 before percent-encoding it. However, this does not apply to URI schemes that predate it.



Is it safe to assume that every multibyte, percent-encoded URI becomes a UTF-8 encoded string after being passed through urldecode()?



For example, suppose the contents of $_SERVER['REQUEST_URI'] are percent-encoded like this:




/b%C3%BCch/w%C3%B6rterb%C3%BCch



After I pass this string to urldecode(), I should have a multibyte string. But how do I know which encoding the string is in? In the above example it's UTF-8, but is it safe to always assume so?



If it's not safe to assume so, is there a way (other than mb_detect_encoding()) to detect the encoding of the string? I've checked the request headers, and they don't seem to contain anything helpful.
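
One possible approach, sketched here rather than offered as a definitive answer, is to validate instead of detect: decode the URI, check whether the resulting bytes are well-formed UTF-8, and only otherwise fall back to a legacy encoding. The helper name decode_uri_path() and the ISO-8859-1 fallback are assumptions made for illustration. Nothing in the request states which encoding the client actually used, so the fallback is a guess. The sketch also assumes the mbstring extension is available.

```php
<?php
// Hypothetical helper (not from the original post): decode a percent-encoded
// URI and return it as UTF-8, guessing a legacy encoding only when needed.
function decode_uri_path(string $raw, string $fallback = 'ISO-8859-1'): string
{
    // urldecode() only turns %XX sequences back into raw bytes;
    // it knows nothing about character encodings.
    $decoded = urldecode($raw);

    // mb_check_encoding() validates the byte sequence against one encoding.
    // This is a yes/no check, not detection.
    if (mb_check_encoding($decoded, 'UTF-8')) {
        return $decoded;
    }

    // Assumption: bytes that are not valid UTF-8 came from a client using
    // $fallback. This cannot be verified from the request itself.
    return mb_convert_encoding($decoded, 'UTF-8', $fallback);
}

// The example URI from the question is already UTF-8 and passes through.
echo decode_uri_path('/b%C3%BCch/w%C3%B6rterb%C3%BCch'), "\n";

// A Latin-1 style request (%FC for "ü") is not valid UTF-8, so it is converted.
echo decode_uri_path('/b%FCch'), "\n";
```

A validity check is not proof: a string in another encoding can, in rare cases, also happen to be well-formed UTF-8. In practice, though, non-ASCII text in a legacy single-byte encoding seldom validates as UTF-8, which is why this check tends to be more dependable than open-ended detection.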


Source: Tips4all
