Benchmarking Array Indexes for String Comparison
I’ve been doing some work recently on a custom search engine and, as you might imagine, that involves a fair amount of string comparison and manipulation. When the first iteration of the search module was returning results after eleven minutes I started looking seriously at PHP and MySQL optimisations; queries are down to a few seconds now, but there are still savings to be made, and this week I’ve been testing the various ways to compare parts of strings.
Reducing Function Calls
My basic goal was to reduce the number of function calls, particularly those within large loops. Everyone knows tricks like setting the length of a loop outside of the actual check statement, and if you’ve read up on any PHP benchmarking you’ll know that some forms of loops are quicker than others:
// This is bad
for ($x = 0; $x < count($myArray); $i++)
{
// Do something
}
// This is better
for ($x = 0, $size = count($myArray); $x < $size; $i++)
{
// Do something
}
// This is best
$x = 0, $size = count($myArray);
while ($x < $size) {
// Do sometthing
$x++;
}
So my assumption was that removing function calls equals faster code. My loops were pretty well optimised—what about my string comparisons?
String Comparison Functions
I needed to compare subsections of strings, so I was using a lot of substr(), but I realised that where I knew the position of the start and end of the substring, I could use array indexes instead! So this:
if (substr($str, 0, 2) === $test) { ... }
Became this:
if ($str[0] . $str[1] === $test) { ... }
After congratulating myself for discovering this handy shortcut and optimisation trick—getting rid of that function call had to be faster, right?—I decided I should probably check that it was actually the improvement I assumed it was.
Benchmarking Results
The results are actually surprising. In tests, the substr() call is actually faster for strings of three or more letters; using pseudo-array indexes is only really worth doing if you’re comparing at most one or two characters.
It surprised me that the concatenation operations were so much slower than the substr() function call (at least 50% slower)—I was really expecting it to be a much faster way of comparing strings.
So, the lesson is that array indexes are useful for strings, but only really when you’re comparing a single character (where you can see a drop in execution time of around two-thirds)—otherwise, stick to substr().

I'd love to hear what you think - please use the form below to leave your comments. Some HTML is permitted:
b,i,em,del,ins,strong,pre,code,blockquote,abbr. URLs or email addresses will be automatically converted into links.bohemian at 1:23pm on 28th May, 2009 #
ekşi at 12:19pm on 9th September, 2009 #