May 08

Benchmarking Array Indexes for String Comparison

Categories:
Tags:
, ,
Published:
2:59pm on Friday 8th May, 2009

I’ve been doing some work recently on a custom search engine and, as you might imagine, that involves a fair amount of string comparison and manipulation. When the first iteration of the search module was returning results after eleven minutes I started looking seriously at PHP and MySQL optimisations; queries are down to a few seconds now, but there are still savings to be made, and this week I’ve been testing the various ways to compare parts of strings.

Reducing Function Calls

My basic goal was to reduce the number of function calls, particularly those within large loops. Everyone knows tricks like setting the length of a loop outside of the actual check statement, and if you’ve read up on any PHP benchmarking you’ll know that some forms of loops are quicker than others:

 // This is bad
for ($x = 0; $x < count($myArray); $i++)
{
    // Do something
}

 // This is better
for ($x = 0, $size = count($myArray); $x < $size; $i++)
{
    // Do something
}

// This is best
$x = 0, $size = count($myArray);
while ($x < $size) {
    // Do sometthing
    $x++;
}

So my assumption was that removing function calls equals faster code. My loops were pretty well optimised—what about my string comparisons?

String Comparison Functions

I needed to compare subsections of strings, so I was using a lot of substr(), but I realised that where I knew the position of the start and end of the substring, I could use array indexes instead! So this:

if (substr($str, 0, 2) === $test) { ... }

Became this:

if ($str[0] . $str[1] === $test) { ... }

After congratulating myself for discovering this handy shortcut and optimisation trick—getting rid of that function call had to be faster, right?—I decided I should probably check that it was actually the improvement I assumed it was.

Benchmarking Results

The results are actually surprising. In tests, the substr() call is actually faster for strings of three or more letters; using pseudo-array indexes is only really worth doing if you’re comparing at most one or two characters.

It surprised me that the concatenation operations were so much slower than the substr() function call (at least 50% slower)—I was really expecting it to be a much faster way of comparing strings.

So, the lesson is that array indexes are useful for strings, but only really when you’re comparing a single character (where you can see a drop in execution time of around two-thirds)—otherwise, stick to substr().

I'd love to hear what you think - please use the form below to leave your comments. Some HTML is permitted: b, i, em, del, ins, strong, pre, code, blockquote, abbr. URLs or email addresses will be automatically converted into links.

Comment Form

  1. bohemian's Gravatar

    bohemian at 1:23pm on 28th May, 2009 #

    Nice article there…i never knew that such small changes can optimize d code to execute in few seconds than that for 11 minutes!!
    for my 3rd year engineering project i m doing an e-library project which includes a search-engine to search inside documents stored in the server. this article will help a lot to optimize my code!
    if its possible…can you write an article about indexing and searching through the index in this site! i’ve done it myself but m not sure if its gud enough so it’ll b a great help!!

    great article anyways…i’ll b visiting this site a lot now heheh!

  2. ekşi's Gravatar

    ekşi at 12:19pm on 9th September, 2009 #

    Good post. I try to optimize my code after the bechmark test, too. I use CodeIgniter for this. It has got very useful library.