Guide to using soundex code in PHP
(PHP 4, PHP 5, PHP 7, PHP 8)

soundex - Calculate the soundex key of a string

Description

soundex(string $string): string

Soundex keys have the property that words pronounced similarly produce the same soundex key, and can thus be used to simplify searches in databases where you know the pronunciation but not the spelling.

This particular soundex function is one described by Donald Knuth in "The Art Of Computer Programming, vol. 3: Sorting And Searching", Addison-Wesley (1973), pp. 391-392.

Parameters

string
    The input string.

Return Values

Returns the soundex key as a string with four characters. If at least one letter is contained in string, the returned string starts with a letter. Otherwise "0000" is returned.
Examples

Example #1 Soundex Examples

soundex("Euler") == soundex("Ellery"); // E460
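The database-search use case from the description can be sketched in plain PHP. This is only a minimal illustration: the name list is made-up sample data and findSoundAlikes() is a hypothetical helper, not part of PHP.

```php
<?php
// Hypothetical sample data; any list of names would do.
$names = ["Ellery", "Gauss", "Kant", "Lloyd"];

/**
 * Return the entries of $candidates whose soundex key equals that of $query.
 * findSoundAlikes() is an illustrative helper name, not a built-in function.
 */
function findSoundAlikes(string $query, array $candidates): array
{
    $key = soundex($query);
    $matches = array_filter($candidates, function ($candidate) use ($key) {
        return soundex($candidate) === $key;
    });
    // Re-index so the result is a plain list.
    return array_values($matches);
}

var_dump(soundex("Euler"));                // string(4) "E460"
print_r(findSoundAlikes("Euler", $names)); // contains only "Ellery"
print_r(findSoundAlikes("Knuth", $names)); // contains only "Kant"
```

In a real application the key would more likely be stored in an indexed column and compared there, rather than recomputed for every candidate.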
nicolas dot zimmer at einfachmarke dot de ¶ 14 years ago
Since soundex() does not produce optimal results for the German language, please find the code below in the hope that it might be useful:

<?php
// Function name assumed; the original declaration is missing from this copy.
function cologne_phonetic($word)
{
    // Prepare for processing
    $word = strtolower($word);
    $substitution = array(
        "ä" => "a",
        "ö" => "o",
        "ü" => "u",
        "ß" => "ss",
        "ph" => "f"
    );
    foreach ($substitution as $letter => $replacement) {
        $word = str_replace($letter, $replacement, $word);
    }
    $len = strlen($word);

    // Rules for exceptions
    $exceptionsLeading = array(
        4 => array("ca", "ch", "ck", "cl", "co", "cq", "cu", "cx"),
        8 => array("dc", "ds", "dz", "tc", "ts", "tz")
    );
    $exceptionsFollowing = array("sc", "zc", "cx", "kx", "qx");

    // Table for coding
    $codingTable = array(
        0  => array("a", "e", "i", "j", "o", "u", "y"),
        1  => array("b", "p"),
        2  => array("d", "t"),
        3  => array("f", "v", "w"),
        4  => array("c", "g", "k", "q"),
        48 => array("x"),
        5  => array("l"),
        6  => array("m", "n"),
        7  => array("r"),
        8  => array("c", "s", "z"),
    );

    $value = array();
    for ($i = 0; $i < $len; $i++) {
        $value[$i] = "";
        // Current letter plus the next one (only one letter at the end of the word)
        $pair = substr($word, $i, 2);

        // Exceptions
        if ($i == 0 AND $pair == "cr") {
            $value[$i] = 4;
        }
        foreach ($exceptionsLeading as $code => $letters) {
            if (in_array($pair, $letters)) {
                $value[$i] = $code;
            }
        }
        if ($i != 0 AND in_array($word[$i - 1] . $word[$i], $exceptionsFollowing)) {
            $value[$i] = 8;
        }

        // Normal encoding
        if ($value[$i] === "") {
            foreach ($codingTable as $code => $letters) {
                if (in_array($word[$i], $letters)) {
                    $value[$i] = $code;
                }
            }
        }
    }

    // Delete double values
    $len = count($value);
    for ($i = 1; $i < $len; $i++) {
        if ($value[$i] == $value[$i - 1]) {
            $value[$i] = "";
        }
    }

    // Delete vowels, omitting the first character code and h
    for ($i = 1; $i < $len; $i++) {
        if ($value[$i] === 0) {
            $value[$i] = "";
        }
    }

    $value = array_filter($value);
    $value = implode("", $value);
    return $value;
}
?>

Dirk Hoeschen - Feenders de ¶ 8 years ago
I made some improvements to the "Cologne Phonetic" function of Nicolas Zimmer. Keys and values of the arrays are inverted so that simple arrays can be used instead of multidimensional arrays. Therefore the loops and iterations are no longer necessary to find the matching value for a character. The result is more reliable and five times faster than the original.

<?php
class CologneHash {

    static $eLeading = array(
        "ca" => 4, "ch" => 4, "ck" => 4, "cl" => 4, "co" => 4, "cq" => 4, "cu" => 4, "cx" => 4,
        "dc" => 8, "ds" => 8, "dz" => 8, "tc" => 8, "ts" => 8, "tz" => 8
    );

    static $eFollow = array("sc", "zc", "cx", "kx", "qx");

    // "c" was listed twice in the original (4 and 8); PHP keeps the last value,
    // so "c" defaults to 8 and the leading exceptions above map it to 4.
    static $codingTable = array(
        "a" => 0, "e" => 0, "i" => 0, "j" => 0, "o" => 0, "u" => 0, "y" => 0,
        "b" => 1, "p" => 1,
        "d" => 2, "t" => 2,
        "f" => 3, "v" => 3, "w" => 3,
        "g" => 4, "k" => 4, "q" => 4,
        "x" => 48,
        "l" => 5,
        "m" => 6, "n" => 6,
        "r" => 7,
        "c" => 8, "s" => 8, "z" => 8
    );

    public static function getCologneHash($word)
    {
        if (empty($word)) return false;
        $len = strlen($word);
        $value = array();

        for ($i = 0; $i < $len; $i++) {
            $value[$i] = "";
            // Current letter plus the next one (only one letter at the end of the word)
            $pair = substr($word, $i, 2);

            // Exceptions
            if ($i == 0 && $pair == "cr") {
                $value[$i] = 4;
            }
            if (isset(self::$eLeading[$pair])) {
                $value[$i] = self::$eLeading[$pair];
            }
            if ($i != 0 && in_array($word[$i - 1] . $word[$i], self::$eFollow)) {
                $value[$i] = 8;
            }

            // Normal encoding
            if ($value[$i] === "") {
                if (isset(self::$codingTable[$word[$i]])) {
                    $value[$i] = self::$codingTable[$word[$i]];
                }
            }
        }

        // Delete double values
        $len = count($value);
        for ($i = 1; $i < $len; $i++) {
            if ($value[$i] == $value[$i - 1]) {
                $value[$i] = "";
            }
        }

        // Delete vowels, omitting the first character code and h
        for ($i = 1; $i < $len; $i++) {
            if ($value[$i] === 0) {
                $value[$i] = "";
            }
        }

        $value = array_filter($value);
        $value = implode("", $value);
        return $value;
    }
}
?>

synnus at gmail dot com ¶ 7 years ago
<?php
// https://github.com/Fruneau/Fruneau.github.io/blob/master/assets/soundex_fr.php
    if ( $sIn === '' ) return '    '; // four spaces: keys are padded to 4 characters
    $sIn = strtr( $sIn, $accents);
    $sIn = strtoupper( $sIn );
    $sIn = preg_replace( '`[^A-Z]`', '', $sIn );
    if ( strlen( $sIn ) === 1 ) return $sIn . '   ';
    $sIn = str_replace( $convGuIn, $convGuOut, $sIn );
    $sIn = preg_replace( '`(.)\1`', '$1', $sIn );
    $sIn = preg_replace( $convVIn, $convVOut, $sIn);
    $sIn = preg_replace( '`L?[TDX]?S?$`', '', $sIn );
    $sIn = preg_replace( '`(?!^)Y([^AEOU]|$)`', '\1', $sIn);
    $sIn = preg_replace( '`(?!^)[EA]`', '', $sIn);
    return substr( $sIn . '    ', 0, 4);
}
?>

cap at capsi dot cx ¶ 22 years ago
soundex() is unfortunately very sensitive to the first character. It is not possible to use it and have Clansy and Klansy return the same value. If you want to do a phonetic search on such names, you will still need to write a routine that treats C452 as similar to K452.

dcallaghan at linuxmail dot org ¶ 20 years ago
Although the standard soundex string is 4 characters long, and this is what the PHP function returns, some database programs return a string of arbitrary length. MySQL, for instance. The MySQL documentation covers this, recommending that you use a substring to output the standard 4 characters. Let's take 'Dostoyevski' as an example.

select soundex("Dostoyevski")

PHP will return the value as 'D231'.

So, to use the soundex function to generate a WHERE parameter in a MySQL SELECT statement, you might try this:

Or, if you want to bypass the PHP function

witold4249 at rogers dot com ¶ 20 years ago
A MUCH easier way to check for similarity between words and avoid the problems that come up with Klancy/Clancy would be to simply add any letter in front of the string, i.e. OKlancy/OClancy.

justin at NO dot blukrew dot SPAM dot com ¶ 17 years ago
I originally looked at soundex() because I wanted to compare how individual letters sounded.
So, when pronouncing a string of generated characters, it would be easy to distinguish them from each other (i.e., TGDE is hard to distinguish, whereas RFQA is easier to understand). The goal was to generate IDs that could be easily understood with a high degree of accuracy over a radio of varying quality. I quickly figured out that soundex and metaphone wouldn't do this (they work for words), so I wrote the following to help out. The ID generation function iteratively calls chrSoundAlike() to compare each new character with the preceding characters. I'd be interested in receiving any feedback on this. Thanks.

<?php
function chrSoundAlike($char1, $char2, $opts = FALSE) {
    // Pick the sets of letters that sound alike when spoken.
    switch ($opts) {
        case 'STRICT':
            $sets = array(
                0 => array('A', 'J', 'K'),
                1 => array('B', 'C', 'D', 'E', 'G', 'P', 'T', 'V', 'Z'),
                2 => array('F', 'S', 'X'),
                3 => array('I', 'Y'),
                4 => array('M', 'N'),
                5 => array('Q', 'U', 'W')
            );
            break;
        case 'BOTH':
            $sets = array(
                0 => array('A', 'J', 'K'),
                1 => array('B', 'C', 'D', 'E', 'G', 'P', 'T', 'V', 'Z', '3'),
                2 => array('F', 'S', 'X'),
                3 => array('I', 'Y'),
                4 => array('M', 'N'),
                5 => array('Q', 'U', 'W')
            );
            break;
        default:
            $sets = array(
                0 => array('A', 'J', 'K'),
                1 => array('B', 'C', 'D', 'E', 'G', 'P', 'T', 'V', 'Z'),
                2 => array('F', 'S', 'X'),
                3 => array('I', 'Y'),
                4 => array('M', 'N'),
                5 => array('Q', 'U')
            );
            break;
    }

    // See if $char1 is in a set.
    $matchset = array();
    for ($i = 0; $i < count($sets); $i++) {
        if (in_array($char1, $sets[$i])) {
            $matchset = $sets[$i];
        }
    }

    // If $char2 is in the same set as $char1, or if $char1 and $char2 are the same, return true.
    if (in_array($char2, $matchset) OR $char1 == $char2) {
        return TRUE;
    } else {
        return FALSE;
    }
}
?>

administrator at zinious dot com ¶ 20 years ago
I wrote this function a long time ago in CGI-perl and then translated (if you can call it that) into PHP.
A little clunky to say the least, but should handle true soundex specs 100%:

// ---begin code---
function MakeSoundEx($stringtomakesoundexof)
$temp_Name = strtoupper($temp_Name);
$n = 1;
for ($n = 1; $n < strlen($temp_Name); $n++)
while (strlen($temp_Soundex) < 4)
return $temp_Soundex;
// --- end code---

crchafer-php at c2se dot com ¶ 16 years ago
Rewritten, maybe -- but the algorithm has some obvious

function text__soundex( $text ) {

// Notes:

(Code has suffered only basic tests, though it appears to

C

fie at myrealbox dot com ¶ 19 years ago
administrator at zinious dot com: Sorry, but your code wasn't soundex compliant.

string: rest
string: reset

I don't know why the default, every once in a while, will for some reason be 9.xxx. Very odd, I think.

dalibor dot toth at podravka dot hr: Yes, it is perhaps sad that it gives you the same code.

Code at: http://star-shine.net/~functionifelse/cg_soundex.php

Or, if you wanted to just use the default soundex function:

$str = soundex($str).cg_sylc($str);

Revolutionary, more or less... probably less.

soundex("string",SYL); which would return the number of syllables at the end of the string

syllables
string: rest
string: reset

The default function is a tad bit faster.

SILENT WIND OF DOOM WOOSH!

synnus at gmail dot com ¶ 2 years ago
/* SOUNDEX FRENCH

Anonymous ¶ 16 years ago
Since the first letter is included in the phonetic representation in the output, it is worth pointing out that if you want a soundex key to work without the problem of Klansy and Clansy sounding different, take the substring that follows the first letter, as the first letter is the main constant of the word, and the numerical value is that of the phonetic structure of the word.

mail at gettheeawayspam dot iaindooley dot com ¶ 19 years ago
The soundex 'different letter in front' problem can be solved by using levenshtein() on the soundex codes. In my application, which searches a database of album names for entries that match a particular user-provided string, I do the following:

1. Search the database for the exact name.
2. Calculate the levenshtein distance (levenshtein()) between the user search term and each of the entries in the database, as a percentage of the length of the user search term.
3. Calculate the levenshtein distance between the metaphone codes of the user search term and each field in the database, as a percentage of the length of the metaphone code of the user search term.
4. Calculate the levenshtein distance between the soundex codes of the user search term and each field in the database, as a percentage of the length of the soundex code of the user search term.

If any of these percentages is less than 50 (which means that two soundex codes with different first letters will be accepted!), the entry is accepted as a possible match.

fie at myrealbox dot com ¶ 19 years ago
eek... hosting got taken down on that server..
here's the code for the previous

function cg_sylc($nos){
$before = strlen($nos);
if($nos[strlen($nos)-1] == "E")
$syllables --;
$before = $after;
return $syllables;

function cg_SoundEx($SExStr){
for($i = 1, $ii = 2,print $SExStr[0]; ;$ii++){
if(($SExStr[$i] != $SExStr[$ii])){
$tsstr = str_replace(array('A', 'E', 'H', 'I', 'O', 'U', 'W', 'Y'), "", $tsstr);
while($iii < 3){

info at nederlandsch dot net ¶ 19 years ago
MySQL soundex (3.23.49) doesn't examine the first character at all to see whether it should be skipped. Therefore the Dutch name of The Hague, the country's seat of government, 's-Gravenhage, will give a soundex value of '261 in MySQL and S615 in PHP.

Anonymous ¶ 20 years ago
A MUCH easier way to do the above search would be to simply add any letter in front of the string and then compare them, i.e. Klancy => LKlancy.

jr ¶ 19 years ago
A workaround for the MySQL/PHP differences in the implementation of soundex is to do the soundex comparison entirely within MySQL. For example:

pee whitt at dental dot ufl dor edu ¶ 19 years ago
fie at myrealbox dot com - regarding your soundex syllable request: I think counting vowel clusters in the word will result in an accurate count of syllables. So no soundex feature is necessary; just count through the chars in the word, and every time you run from vowel to consonant, increment the syllable count.

Using this logic, this sentence is categorized as follows, where (#) marks a word that is incorrectly categorized. I'm sure that with a little thinking one could figure out the logic in those cases that would result in an accurate count.

Counting changes from vowel to consonant would yield-

Taking the average and then the ceiling of the two types would fix most of the errors.

shortcut ¶ 15 years ago
The answer to whether soundex works except for the first letter in klancy vs clancy is to always prefix words with the same letter. aklancy will match aclancy. soundex seems to only check the first 2 syllables?? Just a thought if you rely on soundex.
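The two first-letter workarounds suggested in the notes above (prefixing both words with the same letter, and comparing the soundex keys with levenshtein()) can be sketched as follows. This is a minimal sketch: soundsAlikePrefixed() and soundexDistance() are illustrative names, not built-in functions, and the default prefix "a" is an arbitrary choice.

```php
<?php
/**
 * Compare two words by soundex after prefixing both with the same letter,
 * so that the words' own first letters are encoded as digits instead of
 * being kept verbatim. Illustrative helper, not part of PHP.
 */
function soundsAlikePrefixed(string $a, string $b, string $prefix = "a"): bool
{
    return soundex($prefix . $a) === soundex($prefix . $b);
}

/**
 * Levenshtein distance between the two soundex keys (0 means identical keys).
 * Illustrative helper, not part of PHP.
 */
function soundexDistance(string $a, string $b): int
{
    return levenshtein(soundex($a), soundex($b));
}

var_dump(soundex("Klancy") === soundex("Clancy")); // bool(false): K452 vs C452
var_dump(soundsAlikePrefixed("Klancy", "Clancy")); // bool(true): both become A245
var_dump(soundexDistance("Klancy", "Clancy"));     // int(1): only the first letter differs
```

Note that the prefix takes up one of the four key characters, so the prefixed key still covers only the first three encoded consonants of the word. The distance-based variant keeps the full key but needs a threshold, such as the percentage rule described in the levenshtein note.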