Tuesday, February 14, 2012

How to make the simplest servlet filter respect the character encoding that was set


Gentlemen, it feels like I'm stuck. I'm trying to write the simplest servlet Filter (and deploy it to Tomcat). The code is Groovy, but I'm relying heavily on Java approaches here, so it is almost copy-paste; that's the reason I've added the java tag as well.



My question is: how can I write a UTF-8 string to the response from the filter? Here is the code:



`public class SimpleFilter implements javax.servlet.Filter {

    ...

    public void doFilter(ServletRequest request, ServletResponse response,
                         FilterChain chain)
            throws java.io.IOException, javax.servlet.ServletException
    {
        PrintWriter out = response.getWriter()
        chain.doFilter(request, wrapResponse((HttpServletResponse) response))

        response.setCharacterEncoding('UTF-8')
        response.setContentType('text/plain')

        def saw = 'АБВГДЕЙКА ЭТО НЕПРОСТАЯ ПЕРЕДАЧА ABCDEFGHIJKLMNOP!!!'
        def bytes = saw.getBytes('UTF-8')
        def content = new String(bytes, 'UTF-8')

        response.setContentLength(content.length())
        out.write(content);
        out.close();
    }

    private static HttpServletResponse wrapResponse(HttpServletResponse response) {
        return new HttpServletResponseWrapper(response) {
            @Override
            public PrintWriter getWriter() {
                def writer = new OutputStreamWriter(new ByteArrayOutputStream(), 'UTF-8')
                return new PrintWriter(writer)
            }
        }
    }

}`



The Content-Type of the filtered page is text/plain;charset=ISO-8859-1. So the content type has changed, but the charset is ignored.



As you can see, I've taken some measures (quite naive ones, I guess) to make sure the content is UTF-8, but none of these steps actually helped.



I've also tried adding the URIEncoding="UTF-8" or useBodyEncodingForUri="true" attributes to the Connector in Tomcat's conf/server.xml.



It would be nice if somebody explained to me what I'm doing wrong.



UPD: just a bit of explanation: I'm writing an XSLT-applying filter; that is the real reason I'm trying to discard the whole request.
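
A note on the wrapper, since an XSLT-applying filter presumably needs to read back what the rest of the chain wrote: in wrapResponse above, every getWriter() call creates a new ByteArrayOutputStream that nothing holds a reference to, so the captured output cannot be retrieved. Below is a minimal sketch in plain Java of keeping the buffer and the writer together. The names CapturingWriterDemo and captured() are hypothetical, and the servlet API is deliberately left out so the snippet runs on its own; in a real filter the same fields would presumably live inside the HttpServletResponseWrapper subclass.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: holds one buffer and one UTF-8 writer on top of it.
class CapturingWriterDemo {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final PrintWriter writer =
            new PrintWriter(new OutputStreamWriter(buffer, StandardCharsets.UTF_8));

    // Always returns the same writer, so all output ends up in one buffer.
    PrintWriter getWriter() {
        return writer;
    }

    // Flushes pending characters and decodes the buffered bytes as UTF-8.
    String captured() {
        writer.flush();
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        CapturingWriterDemo capture = new CapturingWriterDemo();
        // Unicode escapes stand for three Cyrillic letters.
        capture.getWriter().write("<page>\u0410\u0411\u0412</page>");
        System.out.println(capture.captured().length());
    }
}
```

The flush before reading matters: PrintWriter and OutputStreamWriter buffer characters, so reading the byte array without flushing can return incomplete data.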

3 comments:

  1. def saw = 'АБВГДЕЙКА ЭТО НЕПРОСТАЯ ПЕРЕДАЧА ABCDEFGHIJKLMNOP!!!'
    def bytes = saw.getBytes('UTF-8')
    def content = new String(bytes, 'UTF-8')


    This does not change anything: saw and content are the same string. What you want to do is the following (using the output stream and not the writer; obtaining the writer is why the charset is reset to ISO-8859-1, see the Tomcat documentation):

    out.write(saw.getBytes("UTF-8"));


    Your code for setting the charset to UTF-8 looks okay.

    I don't understand what you are doing with HttpServletResponseWrapper.

    To make it clear, this will work:

    public void doFilter(ServletRequest request, ServletResponse response,
            FilterChain chain)
            throws java.io.IOException, javax.servlet.ServletException
    {
        OutputStream out = response.getOutputStream()

        response.setCharacterEncoding('UTF-8')
        response.setContentType('text/plain')

        def saw = 'АБВГДЕЙКА ЭТО НЕПРОСТАЯ ПЕРЕДАЧА ABCDEFGHIJKLMNOP!!!'
        // Encode once; Content-Length must be the number of bytes, not characters
        def bytes = saw.getBytes('UTF-8')

        response.setContentLength(bytes.length)
        out.write(bytes)
    }

  2. This might be the problem you're having, or at least it's one part of the problem. As the documentation of setCharacterEncoding() says:


    This method has no effect if it is called after getWriter has been
    called or after the response has been committed.


    You should set the encoding first, and only then get the writer.

    You are trying to set the character encoding after calling getWriter, by which point the writer's encoding is already fixed.
    See the documentation on getWriter and setCharacterEncoding for details.

    To fix your code, just move the setting of the content type and encoding a few lines earlier:

    response.setCharacterEncoding('UTF-8')
    response.setContentType('text/plain')
    PrintWriter out = response.getWriter()

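
A side note on the snippets in this thread, as a minimal sketch in plain Java using only the standard library. Content-Length counts bytes, while String.length() counts UTF-16 chars, so the two differ for Cyrillic text encoded as UTF-8. The same sketch shows roughly what happens to such text when it passes through an ISO-8859-1 encoder, which is the charset the response falls back to in the question. The class EncodingDemo and its two helper methods are made up for illustration; the sample string uses Unicode escapes so the result does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the servlet API.
class EncodingDemo {
    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Encodes with ISO-8859-1 and decodes again; characters that the
    // charset cannot represent are replaced by '?'.
    static String throughLatin1(String s) {
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Three Cyrillic letters, a space, and "ABC".
        String sample = "\u0410\u0411\u0412 ABC";
        System.out.println(sample.length());       // 7 chars
        System.out.println(utf8Length(sample));    // 10 bytes
        System.out.println(throughLatin1(sample)); // ??? ABC
    }
}
```

Each Cyrillic letter here takes two bytes in UTF-8, so passing the character count to setContentLength would announce fewer bytes than are actually written.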