Thursday, May 24, 2012

Cached, PHP generated Thumbnails load slow: How to find problem/solution from Waterfall graphs?


Question Part A ▉ (100-point bounty, awarded)

The main question was how to make this site load faster. First we needed to read these waterfalls. Thanks, all, for your suggestions on analyzing the waterfall readouts. The various waterfall graphs shown here make the main bottleneck evident: the PHP-generated thumbnails. David's advice to load jQuery protocol-less from a CDN got my bounty, although it made my site only 3% faster overall and did not address the site's main bottleneck. Time for a clarification of my question, and another bounty:



Question Part B ▉ (100-point bounty, awarded)

The new focus was to solve the problem with the 6 JPG images, which cause most of the loading delay. These 6 images are PHP-generated thumbnails, tiny at only 3~5 KB, yet they load relatively slowly. Notice the "time to first byte" on the various graphs. The problem remained unsolved, but a bounty went to James, who fixed the header error that RedBot flagged: "An If-Modified-Since conditional request returned the full content unchanged."



Question Part C ▉ (my last bounty: 250 points)

Unfortunately, even after the redbot.org header error was fixed, the delay caused by the PHP-generated images remained untouched. What on earth are these tiny, puny 3~5 KB thumbnails thinking? All that header information could send a rocket to the moon and back. Any suggestions on this bottleneck are much appreciated and will be treated as possible answers, since I have been stuck on this bottleneck for seven months now. My thanks in advance.



[Some background info on my site: CSS is at the top, JS at the bottom (jQuery, jQuery UI, a purchased menu engine awm/menu.js, a tabs JS engine, video swfobject.js). The black lines on the second image show what initiates what to load. The angry robot is my pet "ZAM". He is harmless and often happier.]





Load Waterfall: Chronological | http://webpagetest.org





Parallel Domains Grouped | http://webpagetest.org





Site-Perf Waterfall | http://site-perf.com





Pingdom Tools Waterfall | http://tools.pingdom.com








GTmetrix Waterfall | http://gtmetrix.com







Source: Tips4all

18 comments:

  1. First, using those multiple domains requires several DNS lookups. You'd be better off combining many of those images into a sprite instead of spreading the requests.

    Second, when I load your page, I see most of the blocking (~1.25s) on all.js, which begins with (an old version of) jQuery. You should reference that from the Google CDN, which not only decreases load time but can potentially avoid an HTTP request for it entirely.

    Specifically, the most current jQuery and jQuery UI libraries can be referenced at these URLs (see this post if you're interested why I omitted the http:):

    //ajax.googleapis.com/ajax/libs/jquery/1.4.4/jquery.min.js

    //ajax.googleapis.com/ajax/libs/jqueryui/1.8.9/jquery-ui.min.js


    If you're using one of the default jQuery UI themes, you can also pull its CSS and images off the Google CDN.

    With the jQuery hosting optimized, you should also combine awmlib2.js and tooltiplib.js into a single file.

    If you address those things, you should see a significant improvement.

  2. I had a similar problem a few days ago and found head.js.
    It's a JavaScript plugin that lets you load all JS files in parallel.
    Hope that helps.

  3. I am far from an expert but...

    In regards to this:
    "An If-Modified-Since conditional request returned the full content unchanged."
    and my comments.

    The code used to generate the thumbnails should be checking for the following:

    - Is there a cached version of the thumbnail?
    - Is the cached version newer than the original image?

    If either of these is false, the thumbnail should be generated and returned no matter what. If both are true, the following checks should be made:

    - Is there an HTTP_IF_MODIFIED_SINCE header?
    - Is the cached version's last-modified time the same as the HTTP_IF_MODIFIED_SINCE value?

    If either of these is false, the cached thumbnail should be returned.

    If both of these are true, a 304 HTTP status should be returned. I'm not sure if it's required, but I personally also return the Cache-Control, Expires and Last-Modified headers along with the 304.
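The checks described above can be sketched as a single decision function. This is only an illustration under assumptions: the name `thumbnail_action()` and its three return values are made up here, and the function takes timestamps as arguments instead of reading files or `$_SERVER`, so it says nothing about how the asker's generator actually works.

```php
<?php
// Hypothetical helper (not from the original post): decides what the
// thumbnail script should do for one request.
//   $cacheMtime      mtime of the cached thumbnail, or null if none exists
//   $sourceMtime     mtime of the original image
//   $ifModifiedSince raw If-Modified-Since header value, or null if absent
function thumbnail_action(?int $cacheMtime, int $sourceMtime, ?string $ifModifiedSince): string
{
    // No cached thumbnail, or it is not newer than the original: regenerate.
    if ($cacheMtime === null || $cacheMtime <= $sourceMtime) {
        return 'generate';
    }
    // No conditional header: send the cached thumbnail.
    if ($ifModifiedSince === null) {
        return 'serve_cached';
    }
    // Header present and matching the cached file's mtime: answer with a 304.
    $since = strtotime($ifModifiedSince);
    if ($since !== false && $since === $cacheMtime) {
        return 'not_modified';
    }
    return 'serve_cached';
}
```

A caller would map `'not_modified'` to a 304 response with no body, and the other two values to sending image data.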

    In regards to GZipping, I've been informed that there is no need to GZip images so ignore that part of my comment.

    Edit: I didn't notice your addition to your post.

    session_cache_limiter('public');
    header("Content-type: " . $this->_mime);
    header("Expires: " . gmdate("D, d M Y H:i:s", time() + 2419200) . " GMT");
    // I'm sure Last-Modified should be a static value. not dynamic as you have it here.
    header("Last-Modified: " . gmdate("D, d M Y H:i:s",time() - 404800000) . " GMT");


    I'm also sure that your code needs to check for the HTTP_IF_MODIFIED_SINCE header and react to it. Just setting these headers and your .htaccess file won't provide the required result.

    I think you need something like this:

    // $filename (the cached thumbnail) and $mime are assumed to be set earlier.
    $date = 'D, d M Y H:i:s \G\M\T'; // HTTP-date format (RFC 1123), always in GMT
    $modified = filemtime($filename);
    $expires = strtotime('+1 year'); // 1 year from now

    header(sprintf('Cache-Control: %s, max-age=%d', 'public', $expires - time()));
    header(sprintf('Expires: %s', gmdate($date, $expires)));
    header(sprintf('Last-Modified: %s', gmdate($date, $modified)));
    header(sprintf('Content-Type: %s', $mime));

    if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])) {
        if (strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) === $modified) {
            header('HTTP/1.1 304 Not Modified', true, 304);
            // Should have been an exit, not a return. After sending the
            // Not Modified status, the script should end and return no content.
            exit();
        }
    }
    // Render image data

  4. Wow, it's hard to explain things using that image, but here are some attempts:

    - Files 33-36 load that late because they are loaded dynamically from within the SWF, and the SWF (25) is loaded completely before it loads any additional content.
    - Files 20 & 21 may be (I don't know, because I don't know your code) libraries that are loaded by all.js (11), but before 11 executes, it waits for the whole page (and its assets) to load (you should change that to domready).
    - Files 22-32 are loaded by those two libraries, again only after those are completely loaded.

  5. Just a simple guess, because this kind of analysis requires a lot of A/B testing: your .ch domain seems to be hard to reach (long green bands before the first byte arrives).

    This would mean that either the .ch website is poorly hosted or your ISP does not have a good route to it.

    Given the diagrams, this could explain a big performance hit.

    On a side note, there is a cool tool called Cuzillion that could help you sort things out depending on the order in which your resources load.

  6. Try running YSlow and Page Speed tests on your site/page, and follow the guidelines to sort out possible performance bottlenecks. You should see performance gains as your YSlow or Page Speed score improves.

    These tests will tell you what's wrong and what to change.

  7. So your PHP script is generating the thumbnails on every page load? First off, if the images being thumbnailed do not change that often, could you set up a cache so that they don't have to be processed each time the page loads? Secondly, is your PHP script using something like imagecopyresampled() to create the thumbnails? That's a non-trivial downsample, and the PHP script won't return anything until it's done shrinking things down. Using imagecopyresized() instead will reduce the quality of the image but speed up the process. And how much of a reduction are you doing? Are these thumbnails 5% of the size of the original image or 50%? A large original image likely contributes to the slowdown, since the PHP script has to load the original into memory before it can shrink it and output a smaller thumbnail.
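A minimal sketch of the caching idea from this comment, assuming the expensive resize is passed in as a callback. The function name `cached_thumbnail()`, the cache path argument and the callback signature are hypothetical; the asker's generator is not shown, so this is not its actual logic.

```php
<?php
// Hypothetical wrapper: run the expensive resize only when the cached
// thumbnail is missing or older than the original image.
// $generate is assumed to take the source path and return the thumbnail bytes.
function cached_thumbnail(string $sourcePath, string $cachePath, callable $generate): string
{
    clearstatcache(); // make sure filemtime() reflects the current state on disk
    $fresh = is_file($cachePath)
        && filemtime($cachePath) >= filemtime($sourcePath);

    if ($fresh) {
        return file_get_contents($cachePath);
    }

    $bytes = $generate($sourcePath);
    // Write to a temporary file and rename it, so parallel requests never
    // read a half-written thumbnail.
    $tmp = $cachePath . '.' . uniqid('tmp', true);
    file_put_contents($tmp, $bytes);
    rename($tmp, $cachePath);
    return $bytes;
}
```

With a wrapper like this, the resize callback runs once per change of the original, and every other request only costs two `filemtime()` calls and a file read.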

  8. Investigate PHP's usage of session data. Maybe (just maybe), the image-generating PHP script is waiting to get a lock on the session data, which is locked by the still-rendering main page or other image-rendering scripts. This would make all the JavaScript/browser optimizations almost irrelevant, since the browser's waiting for the server.

    PHP locks the session data for every script running, from the moment the session handling starts, to the moment the script finishes, or when session_write_close() is called. This effectively serializes things. Check out the PHP page on sessions, especially the comments, like this one.
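If the thumbnail script does use sessions, one way to avoid the serialization described here is to copy what the request needs and release the lock immediately. The helper name `session_snapshot()` is hypothetical, and whether the asker's generator starts a session at all is an assumption.

```php
<?php
// Hypothetical helper: read what is needed from the session, then release
// the session lock right away so parallel requests (for example the six
// thumbnail requests) are not processed one after another.
function session_snapshot(): array
{
    if (session_status() === PHP_SESSION_NONE) {
        session_start();
    }
    $data = $_SESSION ?? [];   // copy the values this request needs
    session_write_close();     // lock released; later changes to $_SESSION are not saved
    return $data;
}
```

A script that never reads session data is better off not calling `session_start()` in the first place.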

  9. I've found the URL of your website and checked an individual jpg file from the homepage.
    While the loading time is reasonable now (161ms), it's waiting for 126ms, which is far too much.

    Your last-modified headers are all set to Sat, 01 Jan 2011 12:00:00 GMT, which looks too "round" to be the real date of generation ;-)

    Since Cache-Control is "public, max-age=14515200", arbitrary Last-Modified headers could cause problems after 168 days.

    Anyway, this is not the real reason for delays.

    You have to check what your thumbnail generator does when the thumbnail already exists, and what could consume so much time checking and delivering the picture.

    You could install xdebug to profile the script and see where the bottlenecks are.

    Maybe the whole thing uses a framework or connects to some database for nothing. I've seen very slow mysql_connect() on some servers, mostly because they were connecting using TCP and not socket, sometimes with some DNS issues.

    I understand you can't post your paid generator here but I'm afraid there are too many possible issues...

  10. Unless there is a really good reason (usually there isn't), your images shouldn't invoke the PHP interpreter.

    Create a rewrite rule for your web server that serves the image directly if it is found on the file system. If it's not, redirect to your PHP script to generate the image. When you edit the image, change the image's filename to force users who have a cached version to fetch the newly edited image.

    If it doesn't work, at least you will know that the delay has nothing to do with the way the images are created and checked.

  11. This is just a wild guess since I haven't looked at your code, but I suspect sessions may be playing a role here. The following is from the PHP Manual entry on session_write_close():


    "Session data is usually stored after your script terminated without the need to call session_write_close(), but as session data is locked to prevent concurrent writes only one script may operate on a session at any time. When using framesets together with sessions you will experience the frames loading one by one due to this locking. You can reduce the time needed to load all the frames by ending the session as soon as all changes to session variables are done."


    Like I said, I don't know what your code is doing, but those graphs look suspicious. I had a similar issue when I coded a multipart file-serving function: when serving a large file, I couldn't get the multipart functionality to work, nor could I open another page until the download was completed. Calling session_write_close() fixed both problems.

  12. Instead of using that thumbnail-generator script, I think you should give TinySRC a try for fast, cloud-hosted thumbnail generation.
    It has a very simple and easy-to-use API, which you can use like this:

    http://i.tinysrc.mobi/ [height] / [width] /http://domain.tld/path_to_img.jpg

    [width] (optional):-
    This is a width in pixels (which overrides the adaptive- or family-sizing). If prefixed with ‘-’ or ‘x’, it will subtract from, or shrink to a percentage of, the determined size.

    [height] (optional):-
    This is a height in pixels, if width is also present. It also overrides adaptive- or family-sizing and can be prefixed with ‘-’ or ‘x’.

    You can check the API summary here



    FAQ

    What does tinySrc cost me?

    Nothing.

    When can I start using tinySrc?

    Now.

    How reliable is the service?

    We make no guarantees about the tinySrc service. However, it runs on a major, distributed cloud infrastructure, so it provides high availability worldwide. It should be sufficient for all your needs.

    How fast is it?

    tinySrc caches resized images in memory and in our datastore for up to 24 hours, and it will not fetch your original image each time. This makes the services blazingly fast from the user’s perspective. (And reduces your server load as a nice side-effect.)



    Good luck. Just a suggestion, since you aren't showing us the code :p

  13. As some browsers only allow 2 parallel downloads per domain, could you add additional domains to shard the requests over two or three different hostnames, e.g. 1.imagecdn.com and 2.imagecdn.com?

  14. First of all, you need to handle If-Modified-Since requests and such appropriately, as James said. That error states that: "When I ask your server if that image is modified since the last time, it sends the whole image instead of a simple yes/no".

    The time between the connection and the first byte is generally the time your PHP script takes to run. It is apparent that something is happening when that script starts to run.


    Have you considered profiling it? It may have some issues.
    Combined with the above issue, your script may be running many more times than needed. Ideally, it should generate thumbs only if the original image is modified and send cached thumbs for every other request. Have you checked that the script is generating the images unnecessarily (e.g. for each request)?
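Before reaching for a full profiler, a rough first pass is to time each stage of the script by hand. The helper `timed()` below is a hypothetical illustration, not part of any library, and the step being timed is a stand-in for the real stages (cache lookup, image load, resize, output).

```php
<?php
// Hypothetical timing helper: runs $step, records how long it took under
// $label in $timings (milliseconds), and passes the step's result through.
function timed(string $label, callable $step, array &$timings)
{
    $start = hrtime(true);                             // monotonic clock, nanoseconds (PHP 7.3+)
    $result = $step();
    $timings[$label] = (hrtime(true) - $start) / 1e6;  // convert to milliseconds
    return $result;
}

// Usage sketch: wrap each stage, then inspect or log $timings at the end.
$timings = [];
$sum = timed('example step', function () {
    return array_sum(range(1, 1000));                  // stand-in for real work
}, $timings);
```

Writing `$timings` to the error log for a few requests shows whether the time goes into generating the image or into something that happens before the generator even starts.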


    Generating proper headers through the application is a bit tricky, plus they may get overwritten by the server. And you are exposed to abuse as anyone sending some no-cache request headers will cause your thumbnail generator to run continuously (and raise loads). So, if possible, try to save those generated thumbs, call the saved images directly from your pages and manage headers from .htaccess. In this case, you wouldn't even need anything in your .htaccess if your server is configured properly.

    Other than these, you can apply some of the bright optimization ideas from the performance parts of this overall nice SO question on how to do websites the right way, like splitting your resources into cookieless subdomains, etc. But at any rate, a 3k image shouldn't take a second to load, this is apparent when compared to other items in the graphs. You should try to spot the problem before optimizing.

  15. Have you tried to set up several subdomains under NGINX webserver specially for serving static data like images and stylesheets? Something helpful could be already found in this topic.

  16. Have you tried replacing the PHP-generated thumbnails with regular images to see if there is any difference?
    The problem could be:
    - a bug in your PHP code that regenerates the thumbnail on each server invocation
    - a delay in your code (sleep()?) associated with a clock problem
    - a hard drive issue causing a very bad race condition, since all the thumbnails get loaded/generated at the same time.

  17. Regarding the delayed thumbnails, try putting a call to flush() immediately after the last call to header() in your thumbnail generation script. Once done, regenerate your waterfall graph and see if the delay is now on the body instead of the headers. If so you need to take a long look at the logic that generates and/or outputs the image data.

    The script that handles the thumbnails should hopefully use some sort of caching so that whatever actions it takes on the images you're serving will only happen when absolutely necessary. It looks like some expensive operation is taking place every time you serve the thumbnails which is delaying any output (including the headers) from the script.

  18. Sorry to say, you provide too little data. And you already had some good suggestions.

    How are you serving those images? If you're streaming them via PHP, you're doing a very bad thing, even if they are already generated.

    NEVER STREAM IMAGES WITH PHP. It will slow down your server, no matter the way you use it.

    Put them in an accessible folder, with a meaningful URI. Then call them directly by their real URI.
    If you need on-the-fly generation, you should put an .htaccess in the images directory that redirects to a generator PHP script only if the requested image is missing (this is called a cache-on-request strategy).
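For this strategy to work, the generator has to save its output at the exact location the web server will look in next time. A minimal sketch of that mapping follows; the function name `cache_target()`, the flat images directory and the allowed extensions are assumptions made for illustration, not details from the linked project.

```php
<?php
// Hypothetical helper for a cache-on-request generator: maps the requested
// URI to the file path where the generated image should be saved, so the
// web server can serve it directly on the next request.
// Returns null when the request does not look like a plain image file name.
function cache_target(string $imagesDir, string $requestUri): ?string
{
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }
    // Keep only the file name; this assumes a flat images directory.
    $name = basename(rawurldecode($path));
    // Accept only simple names with an image extension (blocks path traversal).
    if (!preg_match('/^[A-Za-z0-9_-]+\.(jpe?g|png|gif)$/i', $name)) {
        return null;
    }
    return rtrim($imagesDir, '/') . '/' . $name;
}
```

The generator would write the thumbnail to the returned path and also send it in the current response; every later request for the same URI then never reaches PHP.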

    Doing that will fix PHP session, browser/proxy caching and ETag issues all at once.

    WP-Supercache uses this strategy, if properly configured.

    I wrote this some time ago ( http://code.google.com/p/cache-on-request/source/detail?r=8 ). The last revisions are broken, but I guess revision 8 or earlier should work, and you can grab the .htaccess as an example just to test things out (although there are better ways to configure the .htaccess than the way I used to).

    I described that strategy in this blog post ( http://www.stefanoforenza.com/need-for-cache/ ). It is probably badly written, but it may help clear things up.

    Further reading: http://meta.wikimedia.org/wiki/404_handler_caching
