I am iterating over pieces of text and want to keep only the items that do not contain certain filter terms.
The filter terms are either single words or pairs of words.
So far I have:
def check_data(text):
    filter_words = ['subscribe','entertaining']
    filter_bigrams = [{'free', 'ticket'}, {'current', 'price'}]
    for filter in filter_words:
        if filter in text:
            return(0)
    for filter in filter_bigrams:
        if filter in text:
            return(0)
    return(1)
mytext = 'free xubscribes tickets now'
found = check_data(mytext)
print(found)
and the error I get is:
TypeError: 'in <string>' requires string as left operand, not set
The filter_bigrams check in the code above does not work. Please help.
Thanks
asked Dec 14, 2017 at 7:30
You can use this solution. You don't have to iterate over filter_words to check whether a word from text is a member of filter_words. However, you do have to iterate over filter_bigrams, since it is a list of sets. Note that this solution returns at the first match.
import re
def check_data(text):
    # Split the text into whole words
    all_words = re.findall(r'\b\w+\b', text)
    filter_words = ['subscribe','entertaining']
    filter_bigrams = [{'free', 'ticket'}, {'current', 'price'}]
    for word in all_words:
        if word in filter_words:
            return(0)
        # Matches as soon as one word of a pair is present
        for filter in filter_bigrams:
            if word in filter:
                return(0)
    return(1)
mytext = 'free xubscribes tickets now'
found = check_data(mytext)
print(found)
answered Dec 14, 2017 at 7:50
Sohaib Farooqi
I assume that for bigrams you only care whether both words are in "text", regardless of their order.
This will work:
import re
def check_data(text):
    # Split the text into whole words
    all_words = re.findall(r'\b\w+\b', text)
    filter_words = ['subscribe', 'entertaining']
    filter_bigrams = [['free', 'tickets'], ['current', 'price']]
    for word in all_words:
        if word in filter_words:
            return(0)
    # A pair matches only if both of its words occur in the text
    for filter_list in filter_bigrams:
        if (filter_list[0] in all_words and filter_list[1] in all_words):
            return(0)
    return(1)
Note: In filter_bigrams I have changed "ticket" to "tickets"; otherwise it won't match "mytext".
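The same order-insensitive check can also be sketched with set operations, which avoids indexing each pair by hand. This is only a minimal sketch under stated assumptions, not the answer's code: the name check_data_sets and the module-level constants FILTER_WORDS and FILTER_BIGRAMS are hypothetical, and the 0/1 return convention is carried over from the question.

```python
import re

# Hypothetical names; the filter terms mirror the answer above.
FILTER_WORDS = {'subscribe', 'entertaining'}
FILTER_BIGRAMS = [{'free', 'tickets'}, {'current', 'price'}]


def check_data_sets(text):
    """Return 0 if the text contains a filtered word, or both words of a
    filtered pair (any order, not necessarily adjacent); otherwise 1."""
    words = set(re.findall(r'\b\w+\b', text))
    # Is any single filtered word present?
    if not FILTER_WORDS.isdisjoint(words):
        return 0
    # Are both words of any pair present?
    if any(pair.issubset(words) for pair in FILTER_BIGRAMS):
        return 0
    return 1


print(check_data_sets('free xubscribes tickets now'))  # 0
print(check_data_sets('free xubscribes now'))          # 1
```

Testing a set against a set of words is what the original `filter in text` line was reaching for; `in` on a string only accepts a string on the left, which is where the TypeError came from.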
answered Dec 14, 2017 at 8:22
Muku
In this article we will discuss when and how to use Python's filter() function with lambda.
Python provides a method to filter out contents from a given sequence, which can be a list, string, tuple, etc.
filter(function, iterable)
Arguments: a function that accepts one element and returns True or False, and the iterable sequence to be filtered.
Returns: in Python 3, an iterator over the elements that were kept; wrap it in list() to get a list.
Logic: filter() iterates over all elements in the sequence and calls the given callback function for each element. If the function returns False, that element is skipped; elements for which it returns True are included in the result.
Let's understand by examples.
Filter a list of strings in Python using filter()
Suppose we have a list of strings, i.e.
# List of string
listOfStr = ['hi', 'this', 'is', 'a', 'very', 'simple', 'string', 'for', 'us']
Now let's filter the contents of the list and keep only the strings of length 2 using filter(), i.e.
filteredList = list(filter(isOfLengthFour, listOfStr))
print('Filtered List : ', filteredList)
Output:
Filtered List :  ['hi', 'is', 'us']
So, filter() iterated over all the strings in the given list and called isOfLengthFour() for each string element. Despite its name, isOfLengthFour() (defined in the complete example) returns True for strings of length 2. The string elements for which it returned True were kept and returned.
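One detail worth seeing in practice is that, in Python 3, filter() produces a lazy iterator rather than a list, so it can be consumed only once. The following is a small sketch of that behaviour; the helper name has_length_two is hypothetical and is used here only to avoid the misleading isOfLengthFour name.

```python
list_of_str = ['hi', 'this', 'is', 'a', 'very', 'simple', 'string', 'for', 'us']


def has_length_two(s):
    """Hypothetical helper: True for strings of exactly two characters."""
    return len(s) == 2


# filter() returns a lazy filter object, not a list.
filtered = filter(has_length_two, list_of_str)
print(type(filtered).__name__)  # filter

first_pass = list(filtered)   # consumes the iterator
second_pass = list(filtered)  # nothing is left to yield

print(first_pass)   # ['hi', 'is', 'us']
print(second_pass)  # []
```

This is why the examples in this article wrap the call in list() before printing.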
Using filter() with Lambda function
As you can see, we created a separate function isOfLengthFour() and passed it to filter(). We can avoid creating this kind of small one-off function by using a lambda function.
Let's pass a lambda function to filter() to select only the strings of length 2 from the list, i.e.
filteredList = list(filter(lambda x : len(x) == 2 , listOfStr))
print('Filtered List : ', filteredList)
Output:
Filtered List :  ['hi', 'is', 'us']
It works the same as the previous example, but we avoided creating an extra function by using a lambda function.
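For comparison, the same selection can be sketched as a list comprehension, which many Python programmers find easier to read than filter() with a lambda. This is an alternative added for illustration, not part of the original walkthrough; the variable names are assumptions.

```python
list_of_str = ['hi', 'this', 'is', 'a', 'very', 'simple', 'string', 'for', 'us']

# filter() with a lambda, as in the article
with_filter = list(filter(lambda x: len(x) == 2, list_of_str))

# Equivalent list comprehension
with_comprehension = [x for x in list_of_str if len(x) == 2]

print(with_comprehension)                 # ['hi', 'is', 'us']
print(with_filter == with_comprehension)  # True
```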
Filter characters from a string in Python using filter()
We can also use filter() with a string as the iterable sequence and filter out characters from it.
Suppose we have a string, i.e.
strObj = 'Hi this is a sample string, a very sample string'
Now let's use filter() to remove all occurrences of the characters 's' and 'a' from the above string, i.e.
filteredChars = ''.join((filter(lambda x: x not in ['a', 's'], strObj)))
print('Filtered Characters : ', filteredChars)
Output:
Filtered Characters :  Hi thi i  mple tring,  very mple tring
filter() returned an iterator over the characters of the above string with all occurrences of 's' and 'a' filtered out. Then, using join(), we joined the remaining characters into a single string.
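As an aside, removing a fixed set of characters can also be sketched with str.translate() and str.maketrans(), which are built into Python's str type. This is an alternative technique rather than the article's approach; the variable names below are assumptions.

```python
str_obj = 'Hi this is a sample string, a very sample string'

# The article's approach: filter() plus join()
with_filter = ''.join(filter(lambda ch: ch not in ['a', 's'], str_obj))

# Alternative: the third argument of str.maketrans() lists characters to delete
delete_table = str.maketrans('', '', 'as')
with_translate = str_obj.translate(delete_table)

print(with_filter == with_translate)  # True
```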
Filter an array in Python using filter()
Suppose we have two arrays, i.e.
array1 = [1,3,4,5,21,33,45,66,77,88,99,5,3,32,55,66,77,22,3,4,5]
array2 = [5,3,66]
Now we want to filter the contents of array1 by removing the numbers that also appear in array2. For example, the new array should be
[1, 4, 21, 33, 45, 77, 88, 99, 32, 55, 77, 22, 4]
Now let's see how to do that using filter() and a lambda function:
filteredArray = list(filter(lambda x : x not in array2, array1))
print('Filtered Array : ', filteredArray)
Output:
Filtered Array :  [1, 4, 21, 33, 45, 77, 88, 99, 32, 55, 77, 22, 4]
It filtered out the elements of array1 that were present in array2.
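When the exclusion list grows large, each `x not in array2` test scans the whole list. A common refinement, sketched here as an assumption rather than something the article prescribes, is to convert the exclusion list to a set first so that each membership test takes roughly constant time.

```python
array1 = [1, 3, 4, 5, 21, 33, 45, 66, 77, 88, 99, 5, 3, 32, 55, 66, 77, 22, 3, 4, 5]
array2 = [5, 3, 66]

# Build the set once; membership tests on a set do not scan every element.
excluded = set(array2)

filtered_array = list(filter(lambda x: x not in excluded, array1))
print(filtered_array)
# [1, 4, 21, 33, 45, 77, 88, 99, 32, 55, 77, 22, 4]
```

The order and duplicates of array1 are preserved, since only the lookup structure changes.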
The complete example is as follows:
'''
Check if given string's length is 2
'''
def isOfLengthFour(strObj):
    if len(strObj) == 2:
        return True
    else:
        return False

def main():
    # List of string
    listOfStr = ['hi', 'this' , 'is', 'a', 'very', 'simple', 'string' , 'for', 'us']
    print('Original List : ', listOfStr)

    print('*** Filter list using filter() and a function ***')
    filteredList = list(filter(isOfLengthFour , listOfStr))
    print('Filtered List : ', filteredList)

    print('*** Filter list using filter() and a Lambda Function ***')
    filteredList = list(filter(lambda x : len(x) == 2 , listOfStr))
    print('Filtered List : ', filteredList)

    print('*** Filter characters from a string using filter() ***')
    strObj = 'Hi this is a sample string, a very sample string'
    filteredChars = ''.join((filter(lambda x: x not in ['a', 's'], strObj)))
    print('Filtered Characters : ', filteredChars)

    print('*** Filter an array in Python using filter() ***')
    array1 = [1,3,4,5,21,33,45,66,77,88,99,5,3,32,55,66,77,22,3,4,5]
    array2 = [5,3,66]
    filteredArray = list(filter(lambda x : x not in array2, array1))
    print('Filtered Array : ', filteredArray)

if __name__ == '__main__':
    main()
Output:
Original List :  ['hi', 'this', 'is', 'a', 'very', 'simple', 'string', 'for', 'us']
*** Filter list using filter() and a function ***
Filtered List :  ['hi', 'is', 'us']
*** Filter list using filter() and a Lambda Function ***
Filtered List :  ['hi', 'is', 'us']
*** Filter characters from a string using filter() ***
Filtered Characters :  Hi thi i  mple tring,  very mple tring
*** Filter an array in Python using filter() ***
Filtered Array :  [1, 4, 21, 33, 45, 77, 88, 99, 32, 55, 77, 22, 4]