Guide: spaCy synonyms in Python
If you have a look at the semantic relatedness produced by this model: http://sense2vec.spacy.io, would these results be sufficient for you? We don't have this integrated into spaCy yet, but that's the plan. For now you could use the built-in word vectors. The following function is relatively slow; you should probably iterate over the vocab once and cache all the results. (The session below assumes nlp is an already-loaded English model with word vectors.)

    >>> def most_similar(word):
    ...     by_similarity = sorted(word.vocab, key=lambda w: word.similarity(w), reverse=True)
    ...     return [w.orth_ for w in by_similarity[:10]]
    ...
    >>> most_similar(nlp.vocab[u'dog'])
    [u'dog', u'Dog', u'DOG', u'DoG', u'doG', u'cat', u'Cat', u'CAT', u'dogs', u'Dogs']
    >>> most_similar(nlp.vocab[u'scrape'])
    [u'scrape', u'Scrape', u'SCRAPE', u'rustle', u'Rustle', u'RUSTLE', u'gouge', u'Gouge', u'GOUGE', u'gnaw']

Looking at these results, it'd be nice to take case into account, so that only words with the same casing as the query are compared. We should also exclude rare terms:

    >>> def most_similar(word):
    ...     queries = [w for w in word.vocab if w.is_lower == word.is_lower and w.prob >= -15]
    ...     by_similarity = sorted(queries, key=lambda w: word.similarity(w), reverse=True)
    ...     return by_similarity[:10]
    ...
    >>> [w.lower_ for w in most_similar(nlp.vocab[u'dog'])]
    [u'dog', u'cat', u'dogs', u'dachshund', u'pig', u'hamster', u'goat', u'rabbit', u'chimp', u'llama']

Finally, you can also consider the Brown cluster as a way to speed up the search:

    >>> nlp.vocab[u'dog'].cluster
    37
    >>> nlp.vocab[u'cat'].cluster
    37
    >>> nlp.vocab[u'imagination'].cluster
    1893
    >>> nlp.vocab[u'always'].cluster
    15994
    >>> nlp.vocab[u'goat'].cluster
    57
    >>> nlp.vocab[u'pig'].cluster
    121

Try restricting the candidates to words whose Brown cluster is within some distance of the word you're looking for. I haven't tried this, but it should work pretty well.

I want to find synonyms of words. If word is

I used spaCy.
I can't use this because it takes a lot of time, and I can't use PhraseMatcher for this either. Please help me, thanks in advance.
asked May 3, 2020 at 11:52
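The most_similar examples above recommend caching instead of re-sorting the whole vocabulary on every call, and speed is exactly what the question is stuck on. Below is a minimal sketch of that idea in plain NumPy: normalise every vector once, after which each query is a single matrix-vector product. It assumes you have already pulled the words and their vectors out of your model into a list and a matrix. The toy vectors are made up, the answer's w.prob >= -15 rarity filter is left out, and the Brown-cluster bit strings in paths are hypothetical inputs; converting spaCy's integer cluster attribute back into such a path is not shown. Sharing a bit-string prefix is used here as one possible notion of "within some distance".

```python
import numpy as np

def build_index(vectors):
    """Normalise every vector once, so each later query is one matrix product."""
    matrix = np.asarray(vectors, dtype=np.float32)
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero vectors as zeros instead of dividing by 0
    return matrix / norms

def most_similar(query, words, unit_vectors, n=10, paths=None, prefix_len=0):
    """Rank words by cosine similarity to query.

    Like the answer's is_lower filter, only words with the same casing are kept.
    If paths (word -> Brown-cluster bit string, a hypothetical input) is given,
    candidates must also share the first prefix_len bits with the query.
    """
    idx = words.index(query)
    keep = np.array([w.islower() == query.islower() for w in words])
    if paths is not None:
        q_prefix = paths[query][:prefix_len]
        keep &= np.array([paths[w][:prefix_len] == q_prefix for w in words])
    scores = unit_vectors @ unit_vectors[idx]  # cosine similarity with every word
    scores[~keep] = -np.inf                    # excluded words sort to the end
    order = np.argsort(-scores)[:n]
    return [words[i] for i in order if np.isfinite(scores[i])]

# Toy 3-d vectors, made up for illustration; real ones would come from a model.
words = ["dog", "Dog", "cat", "dogs", "imagination", "always"]
vectors = [[1.0, 0.1, 0.0], [1.0, 0.1, 0.0], [0.9, 0.2, 0.0],
           [0.95, 0.1, 0.05], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
unit_vectors = build_index(vectors)
print(most_similar("dog", words, unit_vectors, n=3))  # ['dog', 'dogs', 'cat']

# Hypothetical Brown-cluster bit strings: keep only words sharing 4 leading bits.
paths = {"dog": "0110", "Dog": "0110", "cat": "0110", "dogs": "0111",
         "imagination": "1010", "always": "1100"}
print(most_similar("dog", words, unit_vectors, paths=paths, prefix_len=4))  # ['dog', 'cat']
```

The expensive part (normalising the whole matrix) happens once in build_index, so repeated queries only pay for one dot product per word. Recent spaCy versions also offer Vectors.most_similar for a batched search over nlp.vocab.vectors; check the API docs for your version before relying on it.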
You could try using Beautiful Soup to parse data from an online thesaurus, or use a Python module such as py-thesaurus (https://pypi.org/project/py-thesaurus/).
answered May 3, 2020 at 12:31
steve2020

So it's a little hard to tell from your example, but it looks like you're creating a new spaCy doc in every iteration of your loop, which will be slow. You should do something like this instead:
This way spaCy only has to create the query doc once. If you want to make repeated queries, you should put the vector for each doc in Annoy or a similar nearest-neighbour index so you can find the most similar doc quickly.

Also, I generally wouldn't call this finding "synonyms", since every example you gave is multiple words; you're really looking for similar phrases. "Synonyms" would usually imply single words, like you'd find in a thesaurus, but that won't help you here.
answered May 4, 2020 at 7:43
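The advice in this answer (build the query vector once, and for repeated queries keep the candidate vectors around) can be sketched in plain NumPy. This is a minimal sketch, not the answer's own code: it assumes embed is any function from text to a 1-D vector. With spaCy that would typically be lambda text: nlp(text).vector, since a Doc's default vector is the average of its token vectors. PhraseIndex, toy_embed and the toy vectors are hypothetical names and data for illustration.

```python
import numpy as np

def _normalise(matrix):
    """Scale each row to unit length (all-zero rows are left as zeros)."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return matrix / norms

class PhraseIndex:
    """Hypothetical helper: embed candidate phrases once, then answer many queries."""

    def __init__(self, phrases, embed):
        self.phrases = list(phrases)
        self.embed = embed
        # Each candidate is embedded exactly once, at construction time.
        self.unit = _normalise(np.array([embed(p) for p in self.phrases], dtype=np.float32))

    def most_similar(self, query, n=5):
        # The query is embedded once per call, not once per candidate.
        q = _normalise(np.asarray(self.embed(query), dtype=np.float32).reshape(1, -1))[0]
        scores = self.unit @ q  # cosine similarity with every candidate
        order = np.argsort(-scores)[:n]
        return [self.phrases[i] for i in order]

# Toy 2-d word vectors, made up for illustration.
TOY_VECTORS = {"big": [1.0, 0.0], "large": [0.9, 0.1], "dog": [0.0, 1.0],
               "hound": [0.1, 0.9], "car": [0.7, -0.7]}

def toy_embed(text):
    """Stand-in for nlp(text).vector: the average of the known word vectors."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    return np.mean(vecs, axis=0) if vecs else np.zeros(2)

index = PhraseIndex(["large dog", "big car", "hound"], toy_embed)
print(index.most_similar("big dog", n=2))  # ['large dog', 'hound']
```

This is an exact, brute-force search over all candidates. For a large collection, the answer's suggestion of Annoy (the annoy package builds an approximate nearest-neighbour index over such vectors) would replace the matrix product with an index lookup. When you do need many docs from spaCy, nlp.pipe over the texts is also faster than calling nlp in a Python loop.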
polm23