How do you filter a text string in Python?

I am looping over pieces of text and want to keep only the items that do not contain certain terms.

The terms to filter on are either single words or pairs of words (bigrams).

So far I have:

def check_data(text):
    filter_words = ['subscribe','entertaining']
    filter_bigrams = [{'free', 'ticket'}, {'current', 'price'}]

    for filter in filter_words:
        if filter in text:
            return(0)

    for filter in filter_bigrams:
        if filter in text:
            return(0)

    return(1)

mytext = 'free xubscribes tickets now'
found = check_data(mytext)
print(found)

and the error I get is:

TypeError: 'in <string>' requires string as left operand, not set

In the code above, the filter_bigrams check does not work. Please help.

Thanks

asked Dec 14, 2017 at 7:30

You can use this solution. It splits the text into words and tests each word for membership in filter_words directly, so no inner loop over filter_words is needed. You still have to loop over filter_bigrams, because it is a list of sets. Note that this solution returns as soon as it finds the first match.

import re

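def check_data(text):
    # Split the text into whole words
    all_words = re.findall(r'\b\w+\b', text)
    filter_words = ['subscribe','entertaining']
    filter_bigrams = [{'free', 'ticket'}, {'current', 'price'}]

    for word in all_words:
        # Single-word filter: direct membership test
        if word in filter_words:
            return(0)

        # Bigram filter: matches if the word is a member of any bigram set
        for bigram in filter_bigrams:
            if word in bigram:
                return(0)

    return(1)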

mytext = 'free xubscribes tickets now'
found = check_data(mytext)
print(found)

answered Dec 14, 2017 at 7:50

Sohaib Farooqi


I assume that for bigrams you only care whether both words are in "text", and are not concerned with their order.

This will work:

import re

def check_data(text):
    all_words = re.findall(r'\b\w+\b', text)
    filter_words = ['subscribe', 'entertaining']
    filter_bigrams = [['free', 'tickets'], ['current', 'price']]

    for word in all_words:
        if word in filter_words:
            return(0)


    for filter_list in filter_bigrams:
        if (filter_list[0] in all_words and filter_list[1] in all_words):
            return(0)


    return(1)

Note: In filter_bigrams I have changed 'ticket' to 'tickets'; otherwise it won't match "mytext".
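If word order really does not matter, the same check can also be written with set operations. This is only a sketch and not either answerer's code: the name check_data_sets, the module-level constants, and the decision to lowercase the text are all assumptions made for illustration.

```python
import re

# Hypothetical filter lists, mirroring the ones used in the answers above
FILTER_WORDS = {'subscribe', 'entertaining'}
FILTER_BIGRAMS = [{'free', 'tickets'}, {'current', 'price'}]

def check_data_sets(text):
    """Return 0 if the text hits a filter, otherwise 1."""
    # Whole words only, lowercased (an assumption), collected into a set
    words = set(re.findall(r'\w+', text.lower()))

    # Is any single filter word present?
    if words & FILTER_WORDS:
        return 0

    # Are both words of any bigram present, in any order?
    if any(bigram <= words for bigram in FILTER_BIGRAMS):
        return 0

    return 1

print(check_data_sets('free xubscribes tickets now'))
print(check_data_sets('nothing to see here'))
```

Because whole words are compared, 'xubscribes' does not trigger the 'subscribe' filter here; the first text is rejected only because both 'free' and 'tickets' are present.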

answered Dec 14, 2017 at 8:22

Muku


In this article we will discuss when and how to use Python’s filter() function with lambda.

Python provides a built-in function, filter(), that selects elements from an iterable such as a list, string, or tuple.

filter(function, iterable)

Arguments:

  • function: a function that accepts one argument and returns a bool, i.e. True or False, based on its logic.
  • iterable: the sequence to be filtered.

Returns:

  • In Python 3, an iterator over the filtered contents (wrap it in list() to get a list).

Logic:
filter() iterates over the elements of the sequence and calls the given callback function for each one. Elements for which the function returns False are skipped; elements for which it returns True are yielded. The result is an iterator over the filtered contents, which the examples below convert to a list with list().
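In Python 3 the object that filter() returns is a lazy iterator, which is why the examples in this article wrap it in list(). A small sketch of that behaviour follows; the names numbers and is_even are made up for illustration.

```python
numbers = [1, 2, 3, 4, 5, 6]

def is_even(n):
    # Predicate: True for even numbers
    return n % 2 == 0

evens = filter(is_even, numbers)

# The result is a filter object (an iterator), not a list
is_list = isinstance(evens, list)

first_pass = list(evens)    # consumes the iterator
second_pass = list(evens)   # empty, because the iterator is already exhausted

print(is_list, first_pass, second_pass)
```

If the filtered values are needed more than once, store the list rather than the filter object.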

Let’s understand by examples

Filter a list of strings in Python using filter()

Suppose we have a list of strings i.e.

# List of string
listOfStr = ['hi', 'this' , 'is', 'a', 'very', 'simple', 'string' , 'for', 'us']

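Now let’s filter the contents of the list and keep only the strings of length 2 using filter(). The helper isOfLengthFour(), defined in the complete example at the end, returns True for strings of length 2 despite its name: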

filteredList = list(filter(isOfLengthFour , listOfStr))

print('Filtered List : ', filteredList)

Output:

Filtered List :  ['hi', 'is', 'us']

So, filter() iterated over all the strings in the given list and called isOfLengthFour() for each one. The strings for which isOfLengthFour() returned True were collected and returned.

Using filter() with Lambda function

As you can see, we created a separate function, isOfLengthFour(), and passed it to filter(). We can avoid creating this kind of small one-off function by using a lambda function instead.
Let’s pass a lambda function to filter() to select only the strings of length 2 from the list:

filteredList = list(filter(lambda x : len(x) == 2 , listOfStr))

print('Filtered List : ', filteredList)

Output:

Filtered List :  ['hi', 'is', 'us']

It worked the same as the previous example, but we avoided creating an extra function by using a lambda function.
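For a simple condition like this one, a list comprehension is a common alternative to filter() with a lambda. The sketch below is not part of the original example; it only shows that both forms select the same elements. The names with_filter and with_comprehension are illustrative.

```python
listOfStr = ['hi', 'this', 'is', 'a', 'very', 'simple', 'string', 'for', 'us']

# filter() with a lambda, as in the example above
with_filter = list(filter(lambda x: len(x) == 2, listOfStr))

# Equivalent list comprehension
with_comprehension = [x for x in listOfStr if len(x) == 2]

print(with_comprehension)
print(with_filter == with_comprehension)
```

Which form reads better is largely a matter of taste; filter() tends to be handy when the predicate already exists as a named function.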

Filter characters from a string in Python using filter()

We can also use filter() with a string as an iterable sequence and can filter out characters from it.

Suppose we have a string i.e.

strObj = 'Hi this is a sample string, a very sample string'

Now let’s use filter() to remove or filter all occurrences of characters ‘s’ and ‘a’ from the above string i.e.

filteredChars = ''.join((filter(lambda x: x not in ['a', 's'], strObj)))

print('Filtered Characters  : ', filteredChars)

Output:

Filtered Characters  :  Hi thi i  mple tring,  very mple tring

filter() returned an iterator over the characters of the string, skipping every occurrence of ‘s’ and ‘a’. We then used join() to combine the remaining characters into a single string.
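When the goal is only to delete a fixed set of characters, str.translate() offers another route. The following sketch compares it with the filter() approach; the variable names viaFilter and viaTranslate are assumptions for illustration.

```python
strObj = 'Hi this is a sample string, a very sample string'

# filter() keeps every character that is not 'a' or 's'
viaFilter = ''.join(filter(lambda ch: ch not in 'as', strObj))

# The third argument of str.maketrans() lists characters to delete
viaTranslate = strObj.translate(str.maketrans('', '', 'as'))

print(viaTranslate)
print(viaFilter == viaTranslate)
```

filter() remains the more flexible choice when the condition is anything other than "is one of these characters", for example ch.isalpha().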

Filter an array in Python using filter()

Suppose we have two arrays (Python lists), i.e.

array1 = [1,3,4,5,21,33,45,66,77,88,99,5,3,32,55,66,77,22,3,4,5]

array2 = [5,3,66]

Now we want to filter array1 by removing every number that also appears in array2. The new array should be:

[1, 4, 21, 33, 45, 77, 88, 99, 32, 55, 77, 22, 4]

Now let’s see how to do that using filter() and lambda function

filteredArray = list(filter(lambda x : x not in array2, array1))

print('Filtered Array  : ', filteredArray)

Output:

Filtered Array  :  [1, 4, 21, 33, 45, 77, 88, 99, 32, 55, 77, 22, 4]

It basically filtered out the elements from array1 which were present in array2.
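Each x not in array2 test scans array2 from the start, which can become slow if the exclusion list is long. One possible refinement, sketched here under the assumption that the elements are hashable, is to convert the exclusion list to a set first. The name excluded is made up for this sketch.

```python
array1 = [1, 3, 4, 5, 21, 33, 45, 66, 77, 88, 99, 5, 3, 32, 55, 66, 77, 22, 3, 4, 5]
array2 = [5, 3, 66]

# Set membership tests take roughly constant time on average
excluded = set(array2)

filteredArray = list(filter(lambda x: x not in excluded, array1))

print('Filtered Array  : ', filteredArray)
```

The order of array1 and any duplicates in it are preserved, because only the exclusion list is turned into a set.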

Complete example is as follows,

# Check if the given string's length is 2
# (the name says "Four", but the test is for length 2)
def isOfLengthFour(strObj):
    return len(strObj) == 2

def main():
    # List of string
    listOfStr = ['hi', 'this' , 'is', 'a', 'very', 'simple', 'string' , 'for', 'us']

    print('Original List : ', listOfStr)

    print('*** Filter list using filter() and a function ***')

    filteredList = list(filter(isOfLengthFour , listOfStr))

    print('Filtered List : ', filteredList)

    print('*** Filter list using filter() and a Lambda Function ***')

    filteredList = list(filter(lambda x : len(x) == 2 , listOfStr))

    print('Filtered List : ', filteredList)

    print('*** Filter characters from a string using filter() ***')

    strObj = 'Hi this is a sample string, a very sample string'

    filteredChars = ''.join((filter(lambda x: x not in ['a', 's'], strObj)))

    print('Filtered Characters  : ', filteredChars)

    print('*** Filter an array in Python using filter() ***')

    array1 = [1,3,4,5,21,33,45,66,77,88,99,5,3,32,55,66,77,22,3,4,5]

    array2 = [5,3,66]

    filteredArray = list(filter(lambda x : x not in array2, array1))

    print('Filtered Array  : ', filteredArray)


if __name__ == '__main__':
    main()

Output:

Original List :  ['hi', 'this', 'is', 'a', 'very', 'simple', 'string', 'for', 'us']
*** Filter list using filter() and a function ***
Filtered List :  ['hi', 'is', 'us']
*** Filter list using filter() and a Lambda Function ***
Filtered List :  ['hi', 'is', 'us']
*** Filter characters from a string using filter() ***
Filtered Characters  :  Hi thi i  mple tring,  very mple tring
*** Filter an array in Python using filter() ***
Filtered Array  :  [1, 4, 21, 33, 45, 77, 88, 99, 32, 55, 77, 22, 4]

How do you filter text in a list in Python?

Python has a built-in function called filter() that lets you filter a list (or a tuple) concisely. filter() iterates over the elements of the list and applies the function you pass, fn(), to each element. It returns an iterator over the elements for which fn() returns True.
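As a rough illustration, the sketch below keeps only the strings in a list that contain none of a set of blocked terms. The data and the names messages, blocked, and is_clean are invented for this example, and the case-insensitive substring match is an assumption.

```python
messages = [
    'subscribe to our channel',
    'meeting at noon',
    'Current price list attached',
    'lunch?',
]
blocked = ['subscribe', 'price']

def is_clean(message):
    # True when none of the blocked terms occurs in the message
    lowered = message.lower()
    return not any(term in lowered for term in blocked)

kept = list(filter(is_clean, messages))
print(kept)
```

Note that this is substring matching, so 'price' would also match 'prices'; splitting the message into words first would give whole-word matching instead.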

Can you use filter on a string?

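In Python, yes: a string is an iterable of characters, so filter() accepts it directly and yields the characters that pass the test; use ''.join() to turn the result back into a string. In JavaScript, by contrast, filter() is an Array method, so you can't call it on a string without converting the string to an array first.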

How do you filter strings?

In Java 8 and above, use chars() or codePoints() method of String class to get an IntStream of char values from the given sequence. Then call the filter() method of Stream for restricting the char values to match the given predicate.

How do you filter a specific character in Python?

Using str.replace(), we can replace a specific character. To remove that character, replace it with an empty string. By default, str.replace() replaces all occurrences of the given character.
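A short sketch of str.replace() used for removal follows; the sample string and variable names are made up for illustration. The optional third argument limits how many occurrences are replaced.

```python
text = 'banana bandana'

# Remove every 'a' by replacing it with an empty string
no_a = text.replace('a', '')

# Remove only the first 'a' by passing a count of 1
first_only = text.replace('a', '', 1)

print(no_a)
print(first_only)
print(text)  # the original string is unchanged; replace() returns a new one
```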