Guide to using spelling suggestions in PHP

If you are trying to build something like Google's 'Did you mean:' suggestions by taking the first word returned by the pspell_suggest() function, it will not work well with custom dictionaries and replacements. Take the following code for example:

    <?php
    // $query is assumed to hold the user's search string.
    $pspell_config = pspell_config_create("en");
    pspell_config_personal($pspell_config, "/home/user/public_html/custom.pws");
    pspell_config_repl($pspell_config, "/home/user/public_html/custom.repl");
    $pspell_link = pspell_new_config($pspell_config);

    $words = preg_split("/\s+/", $query);
    $ii = count($words);

    global $spellchecked;
    $spellchecked = "";
    $erroneous = false; // initialised so the check after the loop never reads an undefined variable

    for ($i = 0; $i < $ii; $i++) {
        if (pspell_check($pspell_link, $words[$i]))
        {
            $spellchecked .= $words[$i] . " ";
        }
        else
        {
            $erroneous = true;
            $suggestions = pspell_suggest($pspell_link, $words[$i]);
            // Take the first suggestion; keep the original word if there are none.
            $spellchecked .= (empty($suggestions) ? $words[$i] : $suggestions[0]) . " ";
        }
    }

    if ($erroneous)
    {
        echo "Did you mean: " . $spellchecked . "?";
    }
    else
    {
        echo $spellchecked . " is a valid word/phrase";
    }
    ?>

This works fine most of the time and suggests what you meant for most misspelt inputs. However, if you define a custom replacement and then search for the misspelt word you defined, the replacement only shows up in the 'Did you mean' result when it happens to be the first suggestion returned. To work around this, open the custom dictionary using fopen and fread, and then check each suggested word against it. If a suggested word is in the custom dictionary, use it in the 'Did you mean' part; if not, discard it and try the next. Hope this helps anyone who comes across this problem when trying to get more accurate suggestions.
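The workaround described above could look roughly like the sketch below. The helper names loadCustomWords() and pickSuggestion() are made up for illustration. The sketch assumes the custom dictionary is a plain word list with one word per line and an Aspell-style "personal_ws-..." header line, and it reads the file with file() instead of fopen and fread for brevity. Falling back to the first suggestion when nothing matches is a design choice of this sketch, not part of the original description.

```php
<?php
// Hypothetical helper: read a personal word list into a lookup table.
// Assumes one word per line, optionally preceded by an Aspell-style
// "personal_ws-..." header line (an assumption about the file format).
function loadCustomWords(string $path): array
{
    $lines = @file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return []; // unreadable file: behave as if there were no custom words
    }

    $words = [];
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '' || strpos($line, 'personal_ws-') === 0) {
            continue; // skip blank lines and the header
        }
        $words[strtolower($line)] = true;
    }
    return $words;
}

// Hypothetical helper: prefer the first suggestion that is in the custom
// word list; otherwise fall back to the first suggestion, or null if the
// suggestion list is empty.
function pickSuggestion(array $suggestions, array $customWords): ?string
{
    foreach ($suggestions as $suggestion) {
        if (isset($customWords[strtolower($suggestion)])) {
            return $suggestion;
        }
    }
    return $suggestions[0] ?? null;
}
```

In the script above, the result of pickSuggestion($suggestions, $customWords) would take the place of $suggestions[0], with $customWords loaded once from the same custom.pws file before the loop.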

I attempted to create a class that takes a list of phrases and compares it to the user's input. What I was trying to do is get things like Porshre Ceyman to correct to Porsche Cayman, for example.

This class requires an array of correct terms, $this->full_model_list, and an array of the user input, $search_terms. I took out the constructor, so you will need to pass in the full_model_list. Note that this didn't fully work, so I decided to scrap it; it was adapted from someone looking to correct large sentences ...

You would call it like so:

    $sth = new SearchTermHelper;
    $resArr = $sth->spellCheckModelKeywords($search_terms);

Code (VERY BETA):

    // ... searchAgainst when compared to $this->input
    // --------------------------------------------------------------------------------------------------------------

    public function findBestMatchReturnString($searchAgainst, $input, $max_tolerance = 200, $max_length_diff = 200, $min_str = 3, $lower_case = true, $search_in_phrases = true)
    {
        if (empty($searchAgainst) || empty($input)) return "";

        //weed out strings we think are too small for this
        if (strlen($input) <= $min_str) return $input;

        $foundbestmatch = -1;
        if ($lower_case) $input = strtolower($input);

        //sort the list by length, or else worse matches may be found first
        $counts = array();
        foreach ($searchAgainst as $s) {
            $counts[] = strlen($s);
        }
        array_multisort($counts, $searchAgainst);

        //get the metaphone equivalent for the input phrase
        $tempInput = implode(" ", $this->getMetaPhone($input));
        $list = array();

        foreach ($searchAgainst as $phrase) {

            if ($lower_case) $phrase = strtolower($phrase);

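            //split the phrase into words, or compare it as a whole (so $phraseArr is always defined)
            $phraseArr = $search_in_phrases ? explode(" ", $phrase) : array($phrase);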

            foreach ($phraseArr as $word) {
                //get the metaphone equivalent for each word we're searching against
                $tempSearchAgainst = implode(' ', $this->getMetaPhone($word));
                $similarity = levenshtein($tempInput, $tempSearchAgainst);

                if ($similarity == 0) // we found an exact match
                {
                    $closest = $word;
                    $foundbestmatch = 0;
                    echo $closest . "(" . $foundbestmatch . ")\n"; // debug output
                    break;
                }

                if ($similarity <= $foundbestmatch || $foundbestmatch < 0) {
                    $closest = $word;
                    $foundbestmatch = $similarity;

                    //keep score
                    if (array_key_exists($closest, $list)) {
                        //echo $closest . "(" . $foundbestmatch . ")\n";
                        $list[$closest] += 1;
                    } else {
                        $list[$closest] = 1;
                    }
                }
            }

            if ($similarity == 0 || $similarity <= $max_tolerance) break;
        }

        // if we find a bunch of a value, assume it to be what we wanted
        // (the original also assigned an unused $most_occuring here; the test reduces to this)
        if (!empty($list)) {
            if (max($list) > 10) {
                return $closest;
            }
        }

        //echo "input:".$input."(".$foundbestmatch.") match: ".$closest."\n";

        // disallow results to be all that much different in char length (if you want)
        if (abs(strlen($closest) - strlen($input)) > $max_length_diff) return "";

        // based on tolerance of difference, return if match meets this requirement (0 = exact only, 1 = close, 20+ = far)
        return ((int)$foundbestmatch <= (int)$max_tolerance) ? $closest : "";
    }

    // --------------------------------------------------------------------------------------------------------------
    // -- Handles passing arrays instead of a string above ( could have done this in the func above )
    // --------------------------------------------------------------------------------------------------------------

    public function findBestMatchReturnArray($searchAgainst, $inputArray, $max_tolerance = 200, $max_length_diff = 200, $min_str = 3, $lower_case = true, $search_in_phrases = true)
    {
        $results = array();
        foreach ($inputArray as $item) {
            // the last two options are now forwarded; the original silently dropped them
            if ($tmpStr = $this->findBestMatchReturnString($searchAgainst, $item, $max_tolerance, $max_length_diff, $min_str, $lower_case, $search_in_phrases)) $results[] = $tmpStr;
        }
        return $results;
    }

    // --------------------------------------------------------------------------------------------------------------
    // -- Build combos of search terms -- So we can check Cayman S or S Cayman etc.
    // careful, this is very labor intensive ( O(n^k) )
    // --------------------------------------------------------------------------------------------------------------

    public function buildSearchCombinations(&$set, &$results)
    {
        for ($i = 0; $i < count($set); $i++) {
            $results[] = $set[$i];
            $tempset = $set;
            array_splice($tempset, $i, 1);
            $tempresults = array();
            $this->buildSearchCombinations($tempset, $tempresults);
            foreach ($tempresults as $res) {
                $results[] = trim($set[$i]) . " " . trim($res);
            }
        }
    }

    // --------------------------------------------------------------------------------------------------------------
    // -- Model match function -- Get best model match from user input.
    // --------------------------------------------------------------------------------------------------------------

    public function findBestSearchMatches($model_type, $search_terms, $models_list)
    {
        $partial_search_phrases = array();
        if (count($search_terms) > 1) {
            $this->buildSearchCombinations($search_terms, $partial_search_phrases); // careful, this is very labor intensive ( O(n^k) )
            $partial_search_phrases = array_diff($partial_search_phrases, $search_terms);
            for ($i = 0; $i < count($search_terms); $i++) $partial_search_phrases[] = $search_terms[$i];
            $partial_search_phrases = array_values($partial_search_phrases);
        } else {
            $partial_search_phrases = $search_terms;
        }

        //sort the list by length, or else worse matches may be found first
        $counts = array();
        foreach ($models_list as $m) {
            $counts[] = strlen($m);
        }
        array_multisort($counts, SORT_DESC, $models_list);

        //same for the search phrases
        $counts = array();
        foreach ($partial_search_phrases as $p) {
            $counts[] = strlen($p);
        }
        array_multisort($counts, SORT_DESC, $partial_search_phrases);

        $results = array("exact_match" => '', "partial_match" => '');
        foreach ($partial_search_phrases as $term) {
            foreach ($models_list as $model) {
                foreach ($model_type as $mt) {
                    if (strpos(strtolower($model), strtolower($mt)) !== false) {
                        if (strtolower($model) == strtolower($term) || strtolower($model) == strtolower($mt . " " . $term)) {
                            // echo " " . $model . " === " . $term . "\n";
                            if (strlen($model) > strlen($results['exact_match']) /*|| strtolower($term) != strtolower($mt)*/) {
                                $results['exact_match'] = strtolower($model);
                                return $results;
                            }
                        } else if (strpos(strtolower($model), strtolower($term)) !== false) {
                            if (strlen($term) > strlen($results['partial_match']) || strtolower($term) != strtolower($mt)) {
                                $results['partial_match'] = $term;
                                //return $results;
                            }
                        }
                    }
                }
            }
        }
        return $results;
    }

    // --------------------------------------------------------------------------------------------------------------
    // -- Get all models in DB for Make (e.g. porsche) (could include multiple makes)
    // --------------------------------------------------------------------------------------------------------------

    public function initializeFullModelList($make)
    {
        $this->full_model_list = array();
        $modelsDB = $this->inv->getAllModelsForMakeAndCounts($make);
        foreach ($modelsDB as $m) {
            $this->full_model_list[] = $m['model'];
        }
    }

    // --------------------------------------------------------------------------------------------------------------
    // -- spell checker -- use algorithm to check model spelling (could expand to include english words)
    // --------------------------------------------------------------------------------------------------------------

    public function spellCheckModelKeywords($search_terms)
    {
        // INPUTS: findBestMatchReturnArray($searchList, $inputArray, $tolerance, $lenTolerance, $ignoreStrLessEq, $useLowerCase, $searchInPhrases);
        //
        // $searchList,      - The list of items you want to get a match from
        // $inputArray,      - The user input value or value array
        // $tolerance,       - How close do we want the match to be: 0 = exact, 1 = close, 2 = less close, etc. 20 = find a match 100% of the time
        // $lenTolerance,    - the number of characters between input and match allowed, i.e. 3 would mean match can be +- 3 in length diff
        // $ignoreStrLessEq, - min number of chars that must be present before checking (i.e. if 3, ignore anything 3 or less in length)
        // $useLowerCase     - puts the phrases in lower case for easier matching ( not needed per se )
        // $searchInPhrases  - compare against every word in searchList (items may be groups of words, so search every word passed to the function)

        $tolerance = 0; // 1-2 recommended
        $lenTolerance = 1; // 1-3 recommended
        $ignoreStrLessEq = 3; // may not want to correct tiny words, 3-4 recommended
        $useLowercase = true; // convert to lowercase matching = true
        $searchInPhrases = true; //match words not phrases, true recommended

        $spell_checked_search_terms = $this->findBestMatchReturnArray($this->full_model_list, $search_terms, $tolerance, $lenTolerance, $ignoreStrLessEq, $useLowercase, $searchInPhrases);
        $spell_checked_search_terms = array_values($spell_checked_search_terms);

        // return spell checked terms
        if (!empty($spell_checked_search_terms)) {
            if (strpos(strtolower(implode(" ", $spell_checked_search_terms)), strtolower(implode(" ", $search_terms))) === false
                //&& strlen(implode(" ", $spell_checked_search_terms)) > 4
            ) {
                return $spell_checked_search_terms;
            }
        }

        // or just return search terms as is
        return $search_terms;
    }
}
?>
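
For comparison, the core idea of the class above (pick the closest known term for each word the user typed) can be sketched in a few lines. This is a simplified illustration, not the class's actual logic: the helper name nearestWord() and the $models list are made up, and it compares the lowercased words directly with levenshtein() rather than their metaphone keys.

```php
<?php
// Hypothetical helper, not part of the class above: return the entry of
// $candidates with the smallest Levenshtein distance to $word, or $word
// itself when no candidate is within $maxDistance edits.
function nearestWord(string $word, array $candidates, int $maxDistance = 2): string
{
    $best = $word;
    $bestDistance = $maxDistance + 1;

    foreach ($candidates as $candidate) {
        $distance = levenshtein(strtolower($word), strtolower($candidate));
        if ($distance < $bestDistance) {
            $best = $candidate;
            $bestDistance = $distance;
        }
    }
    return $best;
}

// Hypothetical vocabulary of correct terms.
$models = ['Porsche', 'Cayman', 'Boxster', 'Carrera'];

// Correct each word of the user input independently.
$terms = preg_split('/\s+/', trim('Porshre Ceyman'));
$corrected = array_map(
    function ($term) use ($models) {
        return nearestWord($term, $models);
    },
    $terms
);

echo implode(' ', $corrected), "\n"; // prints "Porsche Cayman"
```

The $maxDistance cut-off plays the same role as $tolerance in the class: a word that is too far from every known term is returned unchanged instead of being forced onto a poor match.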