Find all words in a string in Python

Hello, I am new to regex and I'm starting out with Python. I'm stuck at extracting all words from an English sentence. So far I have:

import re

shop = "hello seattle what have you got"
regex = r'(\w*) '
list1 = re.findall(regex, shop)
print(list1)

This gives the output:

['hello', 'seattle', 'what', 'have', 'you']

If I replace the regex with

regex = r'(\w*)\W*'

then the output is:

['hello', 'seattle', 'what', 'have', 'you', 'got', '']

whereas I want this output:

['hello', 'seattle', 'what', 'have', 'you', 'got']

Please point out where I am going wrong.
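A short sketch of why the two patterns misbehave, assuming Python 3.7 or later (where re.findall() keeps an empty match that directly follows a non-empty one). The first pattern needs a space after every word, so the last word, got, which has no trailing space, is never matched. The second lets \w* match zero characters, which yields an extra empty string at the end. Requiring at least one word character with \w+ avoids both problems:

```python
import re

shop = "hello seattle what have you got"

# Pattern 1: every word must be followed by a space,
# so the final word "got" is skipped.
print(re.findall(r'(\w*) ', shop))

# Pattern 2: \w* may match zero characters, so an extra
# empty match is produced at the end of the string.
print(re.findall(r'(\w*)\W*', shop))

# Fix: \w+ requires at least one word character per match.
words = re.findall(r'\w+', shop)
print(words)  # ['hello', 'seattle', 'what', 'have', 'you', 'got']
```

The \w+ pattern is the same one used in Method #2 of the article that follows.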

Sometimes we need to get all the words present in a string, which can be tedious to do by hand. Shorthands for this task are therefore useful. This article also covers cases in which punctuation marks have to be ignored.
Method #1: Using split()
Using the split() method, we can split the string into a list of words. This is the most generic and recommended method for this task. Its drawback is that it fails when the string contains punctuation marks, because the punctuation stays attached to the words.

test_string = "Geeksforgeeks is best Computer Science Portal"

print("The original string is : " + test_string)

# split() with no arguments splits on runs of whitespace
res = test_string.split()

print("The list of words is : " + str(res))

Output:
The original string is : Geeksforgeeks is best Computer Science Portal
The list of words is : ['Geeksforgeeks', 'is', 'best', 'Computer', 'Science', 'Portal']
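A quick check of the drawback described above: applying split() to the punctuated string used in Methods #2 and #3 keeps the punctuation attached to the words and even returns the @# run as a "word" of its own.

```python
test_string = "Geeksforgeeks,    is best @# Computer Science Portal.!!!"

# split() with no arguments only splits on whitespace,
# so punctuation stays attached to the neighbouring words.
res = test_string.split()
print(res)
# ['Geeksforgeeks,', 'is', 'best', '@#', 'Computer', 'Science', 'Portal.!!!']
```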

Method #2: Using regex (findall())
When the string contains special characters and punctuation marks, the conventional split() approach fails, as discussed above, so regular expressions are needed. The findall() function returns a list of all non-overlapping matches of a pattern; with the pattern \w+, it extracts the words and ignores the punctuation marks.

import re

test_string = "Geeksforgeeks,    is best @# Computer Science Portal.!!!"

print("The original string is : " + test_string)

# \w+ matches each run of word characters, skipping spaces and punctuation
res = re.findall(r'\w+', test_string)

print("The list of words is : " + str(res))

Output:
The original string is : Geeksforgeeks,    is best @# Computer Science Portal.!!!
The list of words is : ['Geeksforgeeks', 'is', 'best', 'Computer', 'Science', 'Portal']
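One caveat worth knowing: in Python 3, \w matches letters (including non-ASCII letters), digits and the underscore. So \w+ splits contractions at the apostrophe and keeps tokens such as 3_words intact. The sketch below shows this on a made-up sentence; the second pattern, [A-Za-z']+, is only an illustrative alternative for ASCII English text, not part of the method above.

```python
import re

text = "Don't stop at 3_words"

# \w matches letters, digits and the underscore, so the apostrophe
# splits "Don't" in two, while "3_words" stays a single token.
print(re.findall(r'\w+', text))  # ['Don', 't', 'stop', 'at', '3_words']

# Illustrative alternative (assumption): ASCII letters and apostrophes only.
print(re.findall(r"[A-Za-z']+", text))  # ["Don't", 'stop', 'at', 'words']
```

Which pattern is right depends on what counts as a "word" in your data.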

Method #3: Using regex + string.punctuation
This method also uses regular expressions, but combines them with string.punctuation, a string containing all ASCII punctuation characters: every punctuation mark is deleted from the string, and the cleaned result is then split into words.

import re

import string

test_string = "Geeksforgeeks,    is best @# Computer Science Portal.!!!"

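print("The original string is : " + test_string)

# Delete every punctuation character, then split on whitespace;
# re.escape() keeps characters such as ] and \ literal inside the [...] class
res = re.sub('[' + re.escape(string.punctuation) + ']', '', test_string).split()

print("The list of words is : " + str(res))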

Output:
The original string is : Geeksforgeeks,    is best @# Computer Science Portal.!!!
The list of words is : ['Geeksforgeeks', 'is', 'best', 'Computer', 'Science', 'Portal']
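A non-regex variant of the same idea, swapped in here for comparison: str.maketrans() can build a table that deletes every character in string.punctuation, and str.translate() applies it. This is a sketch under the same assumption as Method #3, namely that only ASCII punctuation needs removing.

```python
import string

test_string = "Geeksforgeeks,    is best @# Computer Science Portal.!!!"

# Translation table that maps every ASCII punctuation character to None
# (i.e. deletes it); then split the cleaned string on whitespace.
table = str.maketrans('', '', string.punctuation)
res = test_string.translate(table).split()
print(res)  # ['Geeksforgeeks', 'is', 'best', 'Computer', 'Science', 'Portal']
```

Like Method #3, this joins contractions (Don't becomes Dont) instead of splitting them, and it leaves non-ASCII punctuation such as curly quotes untouched, since string.punctuation is ASCII-only.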

In this article, we’ll take a look at how we can find a string in a list in Python.

There are various approaches to this problem, which differ in ease of use and efficiency.

Using the ‘in’ operator

We can use Python’s in operator to find a string in a list. It takes two operands, a and b, and has the form:

ret_value = a in b

Here, ret_value is a boolean that evaluates to True if a lies inside b, and False otherwise.

We can directly use this operator in the following way:

a = [1, 2, 3]

b = 4

if b in a:
    print('4 is present!')
else:
    print('4 is not present')

Output

4 is not present

We can also convert this into a function, for ease of use.

def check_if_exists(x, ls):
    if x in ls:
        print(str(x) + ' is inside the list')
    else:
        print(str(x) + ' is not present in the list')


ls = [1, 2, 3, 4, 'Hello', 'from', 'AskPython']

check_if_exists(2, ls)
check_if_exists('Hello', ls)
check_if_exists('Hi', ls)

Output

2 is inside the list
Hello is inside the list
Hi is not present in the list

This is the most commonly used and recommended way to search for a string in a list. For illustration, however, we’ll show you other methods as well.
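One point worth making explicit before moving on: on a list, in tests whether some item is equal to the value, not whether the value appears inside an item. On a single string, in performs a substring search. The sketch uses the same list as the examples that follow.

```python
ls = ['Hello from AskPython', 'Hello', 'Hello boy!', 'Hi']

# On a list, `in` compares each item for equality...
print('Hello' in ls)         # True: 'Hello' is an item of the list
print('AskPython' in ls)     # False: it is only part of an item
# ...whereas on a single string, `in` performs a substring search.
print('AskPython' in ls[0])  # True
```

This is why the next sections loop over the items when the goal is a substring match.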

Using List Comprehension

Let’s take another case, where you only wish to check whether the string is part of an item in the list, and return every item that contains your word as a substring.

Consider the list below:

ls = ['Hello from AskPython', 'Hello', 'Hello boy!', 'Hi']

To search for the substring Hello in all elements of the list, we can use a list comprehension:

ls = ['Hello from AskPython', 'Hello', 'Hello boy!', 'Hi']

matches = [match for match in ls if "Hello" in match]

print(matches)

This is equivalent to the code below, which uses a plain for loop and checks the condition for each item.

ls = ['Hello from AskPython', 'Hello', 'Hello boy!', 'Hi']

matches = []

for match in ls:
    if "Hello" in match:
        matches.append(match)

print(matches)

In both cases, the output will be:

['Hello from AskPython', 'Hello', 'Hello boy!']

As you can see, every match contains the string Hello as part of the item. Simple, isn’t it?
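The substring test is case-sensitive. A minimal sketch of a case-insensitive variant, assuming that normalising both sides with str.casefold() is acceptable for your data (the query variable is a hypothetical name):

```python
ls = ['Hello from AskPython', 'Hello', 'Hello boy!', 'Hi']
query = "hello"  # hypothetical lowercase search term

# Plain `in` is case-sensitive, so the lowercase query finds nothing.
print([item for item in ls if query in item])  # []

# casefold() normalises case on both sides before comparing.
matches = [item for item in ls if query.casefold() in item.casefold()]
print(matches)  # ['Hello from AskPython', 'Hello', 'Hello boy!']
```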

Using the ‘any()’ function

If you want to check whether the input string exists in any item of the list, you can use the built-in any() function.

For example, if you wish to test whether ‘AskPython’ is a part of any of the items of the list, we can do the following:

ls = ['Hello from AskPython', 'Hello', 'Hello boy!', 'Hi']

if any("AskPython" in word for word in ls):
    print('\'AskPython\' is there inside the list!')
else:
    print('\'AskPython\' is not there inside the list')

Output

'AskPython' is there inside the list!
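any() only tells you whether a match exists, and it stops at the first matching item. If you also need that item, a common companion is next() over the same generator expression with a default value. This is a sketch; the 'Python3' search is just an example that matches nothing.

```python
ls = ['Hello from AskPython', 'Hello', 'Hello boy!', 'Hi']

# any() answers yes/no and stops at the first matching item.
found = any("AskPython" in word for word in ls)

# next() over the same kind of generator returns the first matching item,
# or the default (None here) when nothing matches.
first = next((word for word in ls if "AskPython" in word), None)
missing = next((word for word in ls if "Python3" in word), None)
print(found, first, missing)  # True Hello from AskPython None
```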

Using filter and lambdas

We can also use the built-in filter() function with a lambda, which is a small anonymous function defined inline, on the line where it is used. Think of a lambda as a mini function that, unless you assign it to a name, cannot be reused after the call.

ls = ['Hello from AskPython', 'Hello', 'Hello boy!', 'Hi']

# filter() applies the lambda to each item of the iterable
# (the second argument) and keeps only the items for which
# the lambda returns True
filter_object = filter(lambda a: 'AskPython' in a, ls)

# filter() returns a lazy filter object; convert it to a list
print(list(filter_object))

Output

['Hello from AskPython']

We get what we expected: only one string matches our filter function.
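One detail behind the "convert it to a list" step: in Python 3, filter() returns a lazy iterator, which can be consumed only once. This short sketch converts the same filter object twice:

```python
ls = ['Hello from AskPython', 'Hello', 'Hello boy!', 'Hi']

filter_object = filter(lambda a: 'AskPython' in a, ls)

# A filter object is a lazy iterator: it is exhausted after one pass.
first_pass = list(filter_object)
second_pass = list(filter_object)
print(first_pass)   # ['Hello from AskPython']
print(second_pass)  # []
```

If you need the results more than once, store the list, or use the list comprehension shown earlier.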

Conclusion

In this article, we learned how to find a string in a list using different approaches. We hope this helped you with your problem!

How do I get a list of words in a string in Python?

Use the split() method. It splits a string into a list in which each item is one of the words that make up the string; with no arguments, it splits on runs of whitespace.

How do I get only words from a string in Python?

Use split() when the words are separated only by whitespace. If the string contains punctuation marks, split() leaves them attached to the words, so use re.findall(r'\w+', text) instead, or remove the characters in string.punctuation before splitting.

How do you find all occurrences of string in a string Python?

Use the str.count() method to count the occurrences of a substring in a string. It is a built-in string method that returns the number of non-overlapping occurrences of the substring in the given string; it does not return their positions.
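str.count() only gives the number of occurrences. A minimal sketch that also collects their starting positions by calling str.find() in a loop; the sample text is made up for illustration, and, like count(), this loop skips overlapping matches.

```python
text = "the cat sat on the mat with the hat"
sub = "the"

# str.count() returns the number of non-overlapping occurrences.
print(text.count(sub))  # 3

# To get the positions as well, call str.find() repeatedly,
# starting each search just after the previous match.
positions = []
start = text.find(sub)
while start != -1:
    positions.append(start)
    start = text.find(sub, start + len(sub))
print(positions)  # [0, 15, 28]
```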

How do you find if a string contains a word in Python?

The simplest way to check if a string contains a substring in Python is to use the in operator, which returns True or False depending on whether the substring is found. For example:

sentence = 'There are more trees on Earth than stars in the Milky Way galaxy'
word = 'galaxy'
if word in sentence:
    print('Word found.')
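Note that in on a string is a substring test, so it also matches inside longer words. A sketch of two whole-word checks: the split() version assumes words are separated by whitespace, while the \b version relies on regex word boundaries instead.

```python
import re

sentence = 'There are more trees on Earth than stars in the Milky Way galaxy'

# `in` on a string is a substring test, so it also matches inside
# longer words: "tar" is found because of "stars".
print('tar' in sentence)  # True

# Whole-word check 1: compare against the whitespace-separated words.
print('tar' in sentence.split())     # False
print('galaxy' in sentence.split())  # True

# Whole-word check 2: \b marks a word boundary in a regex;
# re.escape() guards against special characters in the word.
word = 'galaxy'
print(bool(re.search(r'\b' + re.escape(word) + r'\b', sentence)))  # True
```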
