Friday, May 18, 2012

isset() vs strlen() - a fast/clear string length calculation


I came across this code...




if (isset($string[255])) {
    // too long
}



isset() is between 6 and 40 times faster than




if (strlen($string) > 255) {
    // too long
}



The only drawback to isset() is that the code is unclear: we cannot tell right away what is being done (see pekka's answer). We can wrap isset() in a function, e.g. strlt($string,255), but we then lose the speed benefit of isset().



How can we use the faster isset() function while retaining readability of the code?



EDIT : test to show the speed http://codepad.org/ztYF0bE3




strlen() over 1000000 iterations 7.5193998813629
isset() over 1000000 iterations 0.29940009117126
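
Timings like the ones above can be produced with a loop along these lines. This is only a sketch and not the script behind the codepad link: the 300-byte test string and the variable names are assumptions, and the measured ratio will vary with the PHP version and with extensions such as XDebug.

```php
<?php
// Hypothetical micro-benchmark; NOT the script behind the codepad link.
$string = str_repeat('x', 300); // assumed test input, longer than 255 bytes
$iterations = 1000000;

// Time the strlen() variant.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    if (strlen($string) > 255) {
        // too long
    }
}
$strlenTime = microtime(true) - $start;

// Time the isset() variant.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    if (isset($string[255])) {
        // too long
    }
}
$issetTime = microtime(true) - $start;

echo "strlen() over $iterations iterations $strlenTime\n";
echo "isset() over $iterations iterations $issetTime\n";
```

Running each variant several times and comparing the fastest runs gives a steadier picture than a single pass.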



EDIT2 : here's why isset() is faster




$string = 'abcdefg';
var_dump($string[2]);
// Output: string(1) "c"

$string = 'abcdefg';
if (isset($string[7])) {
    echo $string[7] . ' found!';
} else {
    echo 'No character found at position 7!';
}



This is faster than using strlen() because, “… calling a function is more expensive than using a language construct.” http://www.phpreferencebook.com/tips/use-isset-instead-of-strlen/
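
The reason the two checks are interchangeable can be shown with a short sketch: for a string, offset n exists exactly when the string holds more than n bytes. The variable names below are illustrative only, and the last part assumes a UTF-8 encoded value to show that both checks count bytes rather than characters.

```php
<?php
// Illustrative check: for a string $s, isset($s[$n]) matches strlen($s) > $n.
$string = 'abcdefg'; // 7 bytes, valid offsets 0..6

var_dump(isset($string[6]));   // bool(true): offset 6 holds 'g'
var_dump(strlen($string) > 6); // bool(true)

var_dump(isset($string[7]));   // bool(false): nothing at offset 7
var_dump(strlen($string) > 7); // bool(false)

// Both checks work on bytes, not characters.
$utf8 = "\xC3\xA9"; // one accented character, two bytes in UTF-8
var_dump(strlen($utf8));    // int(2)
var_dump(isset($utf8[1]));  // bool(true): the second byte exists
```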



EDIT3 : I was always taught to be interested in micro-optimisation, probably because I was taught at a time when resources on computers were tiny. I'm open to the idea that it may not be important; there are some good arguments against it in the answers. I've started a new question exploring this... Is micro-optimisation important when coding?


Source: Tips4all

7 comments:

  1. OK so I ran the tests since I could hardly believe that the isset() method is faster, but yes it is, and considerably so. The isset() method is consistently about 6 times faster.

    I have tried strings of various sizes and varying numbers of iterations; the ratios remain the same, and so does the total running time (for strings of varying sizes), because both isset() and strlen() are O(1). That makes sense: isset() only needs to do a lookup in a C array, and strlen() only returns the length that is stored with the string.

    I looked it up in the php source, and I think I roughly understand why. isset(), because it is not a function but a language construct, has its own opcode in the Zend VM. Therefore, it doesn't need to be looked up in the function table and it can do more specialized parameter parsing. Code is in zend_builtin_functions.c for strlen() and zend_compile.c for isset(), for those interested.

    To tie this back to the original question, I don't see any issues with the isset() method from a technical point of view, but imo it is harder to read for people who are not used to the idiom. Furthermore, the isset() method will be constant in time, while the strlen() method will be O(n) in the number of functions that are built into PHP. That is, if you build PHP and statically compile in many functions, all function calls (including strlen()) will be slower, but isset() will stay constant. In practice this difference will be negligible. I also don't know how many function pointer tables are maintained, so I can't say whether user-defined functions have an influence too. I seem to remember they are in a different table and are therefore irrelevant for this case, but it's been a while since I last really worked with this.

    For the rest I don't see any drawbacks to the isset() method. I don't know of other ways to get the length of a string, when not considering purposefully convoluted ones like explode+count and things like that.

    Finally, I also tested your suggestion above of wrapping isset() into a function. This is slower than even the strlen() method because you need another function call, and therefore another hash table lookup. The overhead of the extra parameter (for the size to check against) is negligible; as is the copying of the string when not passed by reference.

  2. Any speed difference here is of absolutely no consequence. It will be a few milliseconds at most.

    Use whatever style is most readable to you and anybody else working on the code. I personally would strongly vote for the second example because, unlike the first one, it makes the intention (checking the length of a string) absolutely clear.

  3. Your code is incomplete.

    Here, I fixed it for you:

    if (isset($string[255])) {
        // something taking 1 millisecond
    }


    vs

    if (strlen($string) > 255) {
        // something taking 1 millisecond
    }


    Now you don't have an empty loop, but a realistic one.
    Let's assume it takes 1 millisecond to do something.

    A modern CPU can do a lot of things in 1 millisecond - that is given. But things like a random hard drive access or a database request take multiple milliseconds - also a realistic scenario.

    Now let's calculate the timings again:

    realistic routine + strlen() over 1000000 iterations 1007.5193998813629
    realistic routine + isset() over 1000000 iterations 1000.29940009117126


    See the difference?

  4. Firstly, I want to point towards an answer by Artefacto explaining why function calls carry an overhead over language constructs.

    Secondly, I want to make you aware that XDebug greatly decreases the performance of function calls, so if you are running XDebug you may get distorted numbers. Reference (Second section of question). So, in production (where you hopefully do not have XDebug installed) the difference is even smaller: it goes down from 6x to 2x.

    Thirdly, you should know that, even though there is a measurable difference, it only shows up if this code runs in a tight loop with millions of iterations. In a normal web application the difference will not be measurable; it will be lost in the noise.

    Fourthly, please note that nowadays development time is much more expensive than server load. A developer spending even half a second more on understanding what the isset code does costs far more than the CPU load it saves. Furthermore, server load can be reduced far more effectively by applying optimizations that actually make a difference (like caching).

  5. The drawback is that isset is not explicit at all, while strlen makes your intention really clear. If someone reads your code and has to understand what you're doing, it might trip them up and not be clear at all.

    Unless you are running Facebook, I doubt that strlen will be where your server spends most of its resources, so you should keep using strlen.

    I just tested it: isset is indeed far faster than strlen.

    0.01 seconds for 100000 iterations with isset

    0.04 seconds for 100000 iterations with strlen

    But that doesn't change what I said just now.

    The script, since some people asked for it:

    $string = 'xdfksdjhfsdljkfhsdjklfhsdlkjfhsdjklfhsdkljfhsdkljfhsdljkfsdhlkfjshfljkhfsdljkfhsdkljfhsdkljfhsdklfhlkjfhkljfsdhfkljsdhfkljsdhfkljhsdfjklhsdjklfhsdkljfhklsdhfkljsdfhdjkshfjlhdskljfhsdkljfhsdjkfhsjkldhfklsdjhfkjlsfhdjkflsdhfjklfsdljfsdlkdlfkjflfkjsdfkl';

    for ($i = 0; $i < 100000; $i++) {
        if (strlen($string) > 255) {
        // if (isset($string[255])) { // swap this line in to time isset() instead
            // do nothing
        }
    }

  6. If you want to keep clarity you could do something like:

    // Returns true when $str is longer than $len bytes.
    function checklength(&$str, $len)
    {
        return isset($str[$len]);
    }

  7. If isset is used that way for performance, shouldn't there be a comment in the code explaining why it's there? That way others will understand the performance benefit of isset over strlen.

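
The two readability suggestions from the comments (a named wrapper, or an explanatory comment next to the inline isset()) might look like the sketch below. The helper name isLongerThan is hypothetical and does not come from the original post, and the type declarations assume PHP 7 or later. As comment 1 notes, the wrapper adds a function call and so gives up the speed advantage.

```php
<?php
/**
 * Hypothetical helper (the name is an assumption, not from the original post).
 * Returns true when $str is longer than $len bytes.
 */
function isLongerThan(string $str, int $len): bool
{
    return isset($str[$len]);
}

$string = str_repeat('x', 256); // assumed sample input, 256 bytes

// Option 1: named helper. Readable, but costs an extra function call.
var_dump(isLongerThan($string, 255)); // bool(true)

// Option 2: inline isset() with a comment that states the intent.
// isset($string[255]) is true only if the string has more than 255 bytes;
// it is used instead of strlen($string) > 255 purely as a micro-optimisation.
if (isset($string[255])) {
    echo "too long\n";
}
```

Option 2 keeps the speed of the language construct and leaves the reasoning in the code for the next reader.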