Google-Style "Did You Mean...?" in PHP
There are actually quite a few PHP functions for finding similar text. The most obvious one?
similar_text()
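It takes two required parameters plus an optional third. The first two are
the strings to compare; the optional third is a variable, passed by
reference, that receives the similarity as a percentage. The function itself
returns the number of matching characters. It is quite useful, but it is
too slow for huge database searches (the PHP manual gives its complexity as
O(N^3), where N is the length of the longest string), so I wouldn't
recommend it there.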
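To make the by-reference third argument concrete, here is a minimal sketch. The variable names ($common, $percent) are just illustrative choices.

```php
<?php
$input = 'carrrot';
$candidate = 'carrot';

// $percent is filled in by similar_text() through the by-reference argument
$percent = 0.0;

// the return value is the number of matching characters
$common = similar_text($input, $candidate, $percent);

printf("%d matching characters, %.1f%% similar\n", $common, $percent);
```

You would then compare $percent against whatever threshold you consider "close enough"; the function does not apply a threshold for you.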
There are two other methods that may suit some cases, and one function
that works best for this job. I'll start with the best one:
levenshtein(), which implements the Levenshtein distance:
the minimum number of characters you must insert, replace, or
delete in one string to turn it into another. At first it doesn't
sound too useful, but take a look at this example:
<?php
// input misspelled word
$input = 'carrrot';

// array of words to check against
$words = array('apple', 'pineapple', 'banana', 'orange',
               'radish', 'carrot', 'pea', 'bean', 'potato');

// no shortest distance found yet
$shortest = -1;

// loop through words to find the closest
foreach ($words as $word) {
    // calculate the distance between the input word
    // and the current word
    $lev = levenshtein($input, $word);

    // check for an exact match
    if ($lev == 0) {
        // closest word is this one (exact match)
        $closest = $word;
        $shortest = 0;

        // break out of the loop; we've found an exact match
        break;
    }

    // if this distance is no greater than the shortest distance
    // found so far, OR if no distance has been recorded yet
    if ($lev <= $shortest || $shortest < 0) {
        // set the closest match and the shortest distance
        $closest = $word;
        $shortest = $lev;
    }
}

echo "Input word: $input\n";
if ($shortest == 0) {
    echo "Exact match found: $closest\n";
} else {
    echo "Did you mean: $closest?\n";
}
?>
This example finds a match even for a misspelled word: it uses the Levenshtein distance to pick the word most similar to the input and suggests it. This is the output of the code above:
Input word: carrrot
Did you mean: carrot?
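One thing to watch: the loop above always suggests something, even when the input is nothing like any word in the list. A minimal sketch of a cutoff follows; suggest() and $maxDistance are hypothetical names of my own, and the default of 2 is an arbitrary assumption you would tune for your data.

```php
<?php
/**
 * Return the closest word, or null when no word is within
 * $maxDistance edits of the input.
 * suggest() and $maxDistance are illustrative, not part of PHP.
 */
function suggest($input, array $words, $maxDistance = 2)
{
    $closest = null;
    // start just above the cutoff so only words within it can win
    $shortest = $maxDistance + 1;

    foreach ($words as $word) {
        $lev = levenshtein($input, $word);
        if ($lev < $shortest) {
            $closest = $word;
            $shortest = $lev;
        }
    }

    return $closest;
}

$words = array('apple', 'pineapple', 'banana', 'orange',
               'radish', 'carrot', 'pea', 'bean', 'potato');

var_dump(suggest('carrrot', $words));   // string(6) "carrot"
var_dump(suggest('xylophone', $words)); // NULL
```

Returning null lets the caller skip the "Did you mean" line entirely instead of showing a far-fetched guess.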
The function is simple to use, and it also accepts three optional parameters that set the cost of insertions, replacements, and deletions for finer control. See the php.net reference for this function.
The other two methods I mentioned are soundex and metaphone, although they can be more awkward to use for this kind of suggestion.
Soundex creates a key that is the same for words that sound alike in English. For example, the following code:
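As a rough illustration of the optional parameters: they are cost weights, given in the order insertion, replacement, deletion. The weights below (deletions cost 5) are made up for the example, not a recommendation.

```php
<?php
// default costs: every insertion, replacement, and deletion counts as 1
$default = levenshtein('carrrot', 'carrot');

// assumed weights: insertion 1, replacement 1, deletion 5
// turning 'carrrot' into 'carrot' needs one deletion, so it now costs 5
$costlyDelete = levenshtein('carrrot', 'carrot', 1, 1, 5);

// the reverse direction needs one insertion, which still costs 1
$cheapInsert = levenshtein('carrot', 'carrrot', 1, 1, 5);

echo "$default $costlyDelete $cheapInsert\n"; // 1 5 1
```

Note that with unequal costs the distance is no longer symmetric, so the order of the two strings matters.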
<?php
echo soundex('beard') . "\n";
echo soundex('bird') . "\n";
echo soundex('bear');
?>
Will produce this output:
B630
B630
B600
Here beard and bird share the same key. This could make suggestions fast if you have already created a column in your MySQL
tables holding the soundex key of each tag, for example, so that you could
search not only for the string but also for its soundex key.
UPDATE: You can use MySQL's built-in SOUNDEX() function to search for the string as-is and by its soundex key too, so that misspelled words also match. Note that MySQL's SOUNDEX() can return keys longer than the four characters PHP's soundex() produces, so compare keys generated by the same function.
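The precomputed-key idea can be sketched without a database. Here a plain PHP array stands in for the soundex column; buildSoundexIndex() is a hypothetical helper of mine, not a built-in.

```php
<?php
// Hypothetical helper: group words by their soundex key, the same idea
// as storing the key in an extra indexed column next to each tag.
function buildSoundexIndex(array $words)
{
    $index = array();
    foreach ($words as $word) {
        $index[soundex($word)][] = $word;
    }
    return $index;
}

$words = array('apple', 'pineapple', 'banana', 'orange',
               'radish', 'carrot', 'pea', 'bean', 'potato');

// built once, ahead of time
$index = buildSoundexIndex($words);

// at search time: one key computation and one lookup
$input = 'carrrot';
$key = soundex($input);
$candidates = isset($index[$key]) ? $index[$key] : array();

print_r($candidates); // contains 'carrot' (both words have the key C630)
```

The lookup cost no longer depends on how many words there are, which is the point of storing the key up front. If several candidates share a key, you could rank them with levenshtein().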
And finally, the metaphone function
is a variation on soundex: it also produces a key that is the
same for words pronounced alike, but more accurately than
soundex, since metaphone applies a set of rules of English
pronunciation.
The use would be exactly the same as soundex, and if you are going to
use something of the sort I would recommend metaphone over soundex for
its improved accuracy.
But bear in mind that both soundex and metaphone probably won't work
well in most other languages, or at least in languages with phonemes
that don't exist in English.
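For completeness, a minimal sketch of metaphone used the same way as soundex, by comparing keys. The word list is the one from the Levenshtein example; the $matches variable is just an illustrative name.

```php
<?php
$input = 'carrrot';
$words = array('apple', 'pineapple', 'banana', 'orange',
               'radish', 'carrot', 'pea', 'bean', 'potato');

// compute the key of the input once
$inputKey = metaphone($input);

// keep every word whose metaphone key equals the input's key
$matches = array();
foreach ($words as $word) {
    if (metaphone($word) === $inputKey) {
        $matches[] = $word;
    }
}

print_r($matches);
```

As with soundex, in a real application you would probably store the key alongside each word rather than recompute it on every search.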
Hope you found this useful,
Alex






