Thursday, April 19, 2012

In PHP, what happens in memory when we use mysql_query


I used to fetch large amounts of data using mysql_query and then iterate through the result row by row to process the data. For example:




$mysql_result = mysql_query("select * from user");
while ($row = mysql_fetch_array($mysql_result)) {
    echo $row['email'] . "\n";
}



Recently I looked at a few frameworks and realized that they fetch all the data into an array in memory and return the array.




$large_array = $db->fetchAll("select * from user");
foreach ($large_array as $user) {
    echo $user['email'] . "\n";
}



I would like to know the pros and cons of each method. It appears to me that loading everything into memory is a recipe for disaster if you have a very long list of items. But then again, a coworker told me that the MySQL driver has to put the result set in memory anyway. I'd like to get the opinion of someone who understands that the question is about performance. Please don't comment on the code; I just made it up as an example for the post.



Thanks


Source: Tips4all

4 comments:

  1. You're mixing two matters:


    usability, which makes your code WAY smoother with arrays,
    and unoptimized algorithms, where inexperienced programmers tend to load ALL the data into the script instead of making the database do all the calculations or fetch the data in portions.


    So: frameworks do not fetch all the data. They fetch just what the programmer wrote.
    A good programmer would therefore not fetch large amounts of data into an array. In the few cases where it is really needed, one would use the old line-by-line fetching (every framework provides a method for this; see the sketch at the end of this comment). In all other cases the smooth already-in-array fetching should be used.

    Please also note that frameworks never do things like echoing data right inside the database loop.
    Every good framework uses a template to output things, and in that case an array comes in extremely handy.
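
    A minimal line-by-line sketch using PDO rather than any particular framework's API (the DSN, the credentials and the "user" table from the question are assumptions):

    $pdo = new PDO('mysql:host=localhost;dbname=test', 'dbuser', 'dbpass');
    $stmt = $pdo->query("select email from user");

    // fetch() returns one row at a time, so no large PHP array is built
    // (though the driver may still buffer the result set client-side,
    // as a later comment notes).
    while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
        echo $row['email'] . "\n";
    }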

  2. When working with large result sets, I usually go through them in batches, like this:

    $current = 0;
    $batchSize = 1000;

    while (true) {
        // Fetch the next batch of rows.
        $large_array = $db->fetchAll(sprintf("select * from user limit %d, %d", $current, $batchSize));
        if (sizeof($large_array) == 0) {
            break; // No rows left.
        }

        $current += sizeof($large_array);
        foreach ($large_array as $user) {
            echo $user['email'] . "\n";
        }
    }


    As long as your code doesn't leak memory, you won't have to worry about which method uses more memory. In terms of raw speed, loading the entire result set in one query is probably faster, but you're likely to hit PHP's memory limit fairly soon that way, at which point performance isn't really your problem anymore.

    By the way, it's relatively easy to test this yourself by setting up a script that measures the time (and peak memory) of both snippets, along the lines of the sketch below. I'd wager they won't differ much in time.
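
    A rough measurement sketch; runFetchAll() and runRowByRow() are hypothetical wrappers around the two snippets from the question. Because memory_get_peak_usage() reports the peak for the whole process, measure each snippet in a separate run:

    $start = microtime(true);

    runFetchAll(); // or runRowByRow()

    printf("time: %.3f s, peak memory: %.1f MB\n",
        microtime(true) - $start,
        memory_get_peak_usage(true) / 1048576);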

  3. It is true that the MySQL client library "normally" fetches all the data into client memory; it usually does so by calling mysql_store_result(). You can split up overly large queries with the LIMIT keyword, as shown above, but there is a risk that the data becomes inconsistent because it might change in between batches. You can guard against that with locks.

    Another approach is to use mysql_use_result(), which ties up resources on the server side for longer and requires you to complete the fetching job as soon as possible.
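
    In PHP's old mysql extension, that behaviour is exposed as mysql_unbuffered_query(); a minimal sketch, reusing the query from the question:

    // Rows are streamed from the server instead of being buffered in client
    // memory; the connection stays busy until every row has been fetched
    // (or the result is freed), so finish reading quickly.
    $result = mysql_unbuffered_query("select * from user");

    while ($row = mysql_fetch_assoc($result)) {
        echo $row['email'] . "\n";
    }

    mysql_free_result($result);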

  4. Just something I learned when it comes to performance: foreach is generally faster than a while loop. Perhaps you should benchmark each approach yourself and see which one is faster and less memory-intensive. IMHO, I like the latter approach better. But do you really need every single column in the user table? If not, then just list the columns you need instead of using * to grab them all, since that will also help with memory and speed.
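
    For instance, since the loops above only use the email column, the query can be narrowed to just that column (a sketch based on the snippets from the question):

    // Fetch only the column the loop actually uses, so less data travels
    // over the wire and sits in client memory.
    $result = mysql_query("select email from user");

    while ($row = mysql_fetch_assoc($result)) {
        echo $row['email'] . "\n";
    }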
