Monday, May 7, 2012

Can't understand why Zend_Mail::addHeader() strips newlines


(Since this is my first SO question, let me just say I hope it's not too Zend-specific. As far as I can tell this shouldn't be a problem. Although I could have posted it in a Zend-specific forum, I feel like I'm at least as likely to get a good answer here, especially since the answer might involve MIME-related issues that transcend Zend Framework. I'm basically trying to understand whether the issue I'm facing should be considered a ZF bug, or if I'm misunderstanding something or misusing it.)



I've been using Zend_Mail to build up a MIME message that gets sent through SendGrid, an email distribution service. Their platform allows you to send emails through their SMTP server, but gives added features when you use a special header (X-SMTPAPI) whose value is a JSON-encoded string of proprietary parameters, which can get quite long.



Eventually, the header I was passing got too long (I think >1000 chars), and I got errors. I was confused because I knew that it was getting passed through PHP's native wordwrap() function before I passed the value to Zend_Mail::addHeader(), so I thought line length should never be a problem.



It turns out that addHeader() strips newlines very deliberately, and with no particular explanation by way of comments.




// In Zend_Mail::addHeader()
$value = $this->_filterOther($value);


// In Zend_Mail::_filterOther()
$rule = array("\r" => '',
              "\n" => '',
              "\t" => '',
              );
return strtr($data, $rule);
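
To see the effect in isolation, here is a minimal standalone sketch that applies the same strtr() rule outside of Zend Framework. The function name filterOtherSketch() and the sample payload are made up for illustration; only the replacement rule comes from the excerpt above.

```php
<?php
// Hypothetical stand-in for Zend_Mail::_filterOther(), using the rule quoted above.
function filterOtherSketch($data)
{
    $rule = array("\r" => '', "\n" => '', "\t" => '');
    return strtr($data, $rule);
}

// Made-up payload: 200 short space-separated tokens, roughly 1,400 characters.
$value = implode(' ', array_fill(0, 200, '"abc",'));

// Wrap at 72 columns, as the question describes doing before addHeader().
$wrapped = wordwrap($value, 72, "\n", true);

// The filter removes every newline that wordwrap() inserted.
$filtered = filterOtherSketch($wrapped);

echo strlen($filtered), "\n";       // length of the single remaining line
var_dump(strpos($filtered, "\n"));  // bool(false): no line breaks are left
```

Note that wordwrap() replaces the space at each break point with the newline, so stripping the newlines also glues the neighbouring tokens together.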



Ok, this seemed reasonable at first -- maybe ZF wants full control of formatting and line-wrapping. The next method called in Zend_Mail::addHeader() is




$value = $this->_encodeHeader($value);



This method encodes the value (either quoted-printable or base64 as appropriate) and chunks it into lines of appropriate length, but only if it contains "non-printable characters", as determined by Zend_Mime::isPrintable($value).



Looking into that method, newlines (\n) are indeed considered non-printable characters! So if only they hadn't been stripped out of the string in the previous method call, the long header would get encoded as QP and chunked into 72-char lines, and everything would work fine. In fact, I did a test where I commented out the call to _filterOther(), and the long header gets encoded and goes through with no problem. But now I've just made a careless hack to ZF without really understanding the purpose behind the line I removed, so this can't be a long-term solution.
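
The interaction between the two steps can be sketched without ZF. The check below is only an approximation of Zend_Mime::isPrintable(): it assumes "printable" means every byte lies in the range 0x20 to 0x7E, and the real method may differ in detail. The name isPrintableSketch() and the sample value are hypothetical.

```php
<?php
// Approximation only: a string counts as printable when every byte is in 0x20..0x7E.
function isPrintableSketch($str)
{
    return preg_match('/[^\x20-\x7E]/', $str) === 0;
}

// A wrapped value, and the same value after the newline-stripping rule.
$wrapped  = "{\"category\":\n\"newsletter\"}";
$stripped = strtr($wrapped, array("\r" => '', "\n" => '', "\t" => ''));

var_dump(isPrintableSketch($wrapped));   // bool(false): the newline would trigger encoding
var_dump(isPrintableSketch($stripped));  // bool(true): after stripping, encoding is skipped
```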



My medium-term solution has been to extend Zend_Mail and create a new method, addHeaderForceEncode(), which will always encode the value of the header, and thus always chunk it into short lines. But I'm still not satisfied because I don't understand why that _filterOther() call was necessary in the first place -- maybe I shouldn't be working around it at all.
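
For illustration, one possible shape for the "always encode" idea is sketched below as a standalone function. It is not the addHeaderForceEncode() method described above and not Zend's own encoder: the name forceEncodeHeaderValue(), the choice of base64 encoded-words, and the 45-byte chunk size are all assumptions. It also assumes the value is plain ASCII, as json_encode() output is with default flags, and that the receiving service decodes encoded-words in this header.

```php
<?php
// Hypothetical helper: always emit the value as base64 encoded-words, one per
// physical line, so that no line grows with the size of the payload.
// Assumes $value is plain ASCII, so splitting on byte boundaries cannot cut
// a multibyte character in half.
function forceEncodeHeaderValue($value, $charset = 'UTF-8')
{
    $words = array();
    // 45 raw bytes become 60 base64 characters; with the default charset the
    // "=?UTF-8?B?" and "?=" wrappers bring each word to at most 72 characters.
    foreach (str_split($value, 45) as $chunk) {
        $words[] = '=?' . $charset . '?B?' . base64_encode($chunk) . '?=';
    }
    // CRLF followed by one space folds the header between encoded-words.
    return implode("\r\n ", $words);
}

// Made-up payload that is well over 1000 characters long.
$json = json_encode(array(
    'category' => 'newsletter',
    'to'       => array_fill(0, 100, 'user@example.com'),
));
$encoded = forceEncodeHeaderValue($json);

echo $encoded, "\n";
```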



Can anyone explain why addHeader() strips newlines? It seems to inevitably lead to situations where a header gets too long if it contains no "non-printable characters" other than newlines.



I've done a bunch of different searches on this subject and looked through some ZF bug reports, but haven't seen anyone talking about this. Surprisingly it seems to be a really obscure issue. FYI I'm working with ZF 1.11.11.





Update: In case anyone wants to follow the ZF issue I opened about this, here it is: Zend_Mail::addHeader() UNfolds long headers, then throws exception


Source: Tips4all

1 comment:

  1. You're probably running into a few things. Per RFC 2821, text lines in SMTP can't exceed 1000 characters:


    text line

    The maximum total length of a text line including the <CRLF> is
    1000 characters (not counting the leading dot duplicated for
    transparency). This number may be increased by the use of SMTP
    Service Extensions.


    A header can't contain newlines, so that's probably why Zend is stripping them. For long headers, it's common to insert a line break (CRLF in SMTP) and a tab to "wrap" them.

    Excerpt from RFC 822:


    Each header field can be viewed as a single, logical line of
    ASCII characters, comprising a field-name and a field-body.
    For convenience, the field-body portion of this conceptual
    entity can be split into a multiple-line representation; this
    is called "folding". The general rule is that wherever there
    may be linear-white-space (NOT simply LWSP-chars), a CRLF
    immediately followed by AT LEAST one LWSP-char may instead be
    inserted.


    I would say that the _encodeHeader() function should possibly look at line length, and if the header is longer than some magic value, do the "wrap and tab" to have it span multiple lines.
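
The "wrap and tab" idea can be sketched as a small standalone function. This is a hypothetical helper, not part of Zend_Mail; the name foldHeaderLine(), the default limit of 76, and the sample header are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper: fold a header line at existing spaces so that no
// physical line is longer than $maxLength characters, as long as every
// token fits on a line by itself.
function foldHeaderLine($name, $value, $maxLength = 76)
{
    $line = $name . ': ' . $value;
    if (strlen($line) <= $maxLength) {
        return $line;
    }
    // wordwrap() replaces the space at each break point with CRLF + tab.
    // Wrapping one column early leaves room for the tab that starts each
    // continuation line. Tokens longer than the limit are left unbroken,
    // because a fold is only allowed where whitespace already exists.
    return wordwrap($line, $maxLength - 1, "\r\n\t", false);
}

// Made-up value: 200 short tokens separated by spaces.
$value  = implode(' ', array_fill(0, 200, '"abc",'));
$folded = foldHeaderLine('X-Example', $value);

echo $folded, "\n";
```

A value with no whitespace at all, such as compact JSON, offers no fold points, so this approach only helps if spaces are present or are added in places where the consumer tolerates them.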
