How do I find the most repeated words in a text file in Python?


    Python provides built-in functions for creating, writing, and reading files. Python can handle two types of files: normal text files and binary files (written in binary language, 0s and 1s).

    • Text files: In this type of file, each line of text is terminated with a special character called EOL (End of Line), which is the newline character ('\n') in Python by default.
    • Binary files: In this type of file, there is no line terminator, and the data is stored after being converted into machine-understandable binary language.

    Here we operate on a .txt file in Python. This program finds the most repeated word in a file.

    Approach:

    • Take the content of the file as input.
    • Save each word in a list after removing spaces and punctuation from the input string.
    • Find the frequency of each word.
    • Print the word that has the maximum frequency.

    Input File:

    Below is the implementation of the above approach:

    Python3

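    # Open the input file in read mode
    file = open("gfg.txt", "r")

    frequent_word = ""
    frequency = 0
    words = []

    # Lowercase each line, strip commas and periods, and collect the words.
    # split() with no argument also discards the trailing newline.
    for line in file:
        line_word = line.lower().replace(',', '').replace('.', '').split()
        for w in line_word:
            words.append(w)

    # For each word, count its occurrences from its position onward
    for i in range(0, len(words)):
        count = 1
        for j in range(i + 1, len(words)):
            if words[i] == words[j]:
                count = count + 1
        # Keep the word with the highest count seen so far
        if count > frequency:
            frequency = count
            frequent_word = words[i]

    print("Most repeated word: " + frequent_word)
    print("Frequency: " + str(frequency))

    file.close()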

    Output:

    Most repeated word: well
    Frequency: 3

    Explanation

    In this program, we need to find the most repeated word in a given text file. Open the file in read mode and read it line by line. Split each line into words and store them in an array. Iterate through the array, count the frequency of each word, and compare it with maxCount. If the frequency is greater than maxCount, store the frequency in maxCount and the corresponding word in the variable word. The content of the data.txt file used in the program is shown below.

    A computer program is a collection of instructions that performs specific task when executed by a computer.

    Computer requires programs to function.

    Computer program is usually written by a computer programmer in programming language.

    A collection of computer programs, libraries, and related data are referred to as software.

    Computer programs may be categorized along functional lines, such as application software and system software.

    Algorithm

    1. Variable maxCount will store the count of the most repeated word.
    2. Open the file in read mode.
    3. Read a line from the file. Convert each line to lowercase and remove the punctuation marks.
    4. Split the line into words and store them in an array.
    5. Use two loops to iterate through the array. The outer loop selects the word to be counted. The inner loop compares the selected word with the rest of the array. If a match is found, increment count by 1.
    6. If count is greater than maxCount, store the value of count in maxCount and the corresponding word in the variable word.
    7. At the end, maxCount will hold the maximum count and the variable word will hold the most repeated word.
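
The steps above can be sketched in Python roughly as follows. This is a minimal sketch, not a definitive implementation: the function name `most_repeated_word` and the list `sample_lines` are hypothetical names chosen for illustration, and the sample text is passed in as a list of strings instead of being read from data.txt so that the example runs on its own.

```python
def most_repeated_word(lines):
    """Return (word, max_count) using the two-loop approach (hypothetical helper)."""
    words = []
    for line in lines:
        # Step 3: lowercase the line and drop commas and periods
        cleaned = line.lower().replace(",", "").replace(".", "")
        # Step 4: split the line into words and store them
        words.extend(cleaned.split())

    word = ""
    max_count = 0
    # Step 5: the outer loop picks a word, the inner loop counts later matches
    for i in range(len(words)):
        count = 1
        for j in range(i + 1, len(words)):
            if words[i] == words[j]:
                count += 1
        # Step 6: remember the highest count and its word
        if count > max_count:
            max_count = count
            word = words[i]
    return word, max_count


# Sample text from this article, used in place of reading data.txt (assumption)
sample_lines = [
    "A computer program is a collection of instructions that performs specific task when executed by a computer.",
    "Computer requires programs to function.",
    "Computer program is usually written by a computer programmer in programming language.",
    "A collection of computer programs, libraries, and related data are referred to as software.",
    "Computer programs may be categorized along functional lines, such as application software and system software.",
]

word, max_count = most_repeated_word(sample_lines)
print("Most repeated word: " + word)
```

To read from a real file, a file object opened with `open("data.txt", "r")` could be passed in place of the list, since iterating over a file yields its lines. The nested loops compare every pair of words, so the running time grows quadratically with the number of words.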

    Solution

    Python

    Output:

    Most repeated word: computer
    


    How do I find the most frequent words in a file in Python?

    Approach:
    • Import the Counter class from the collections module.
    • Split the string into a list using split(); it returns a list of words.
    • Pass the list to an instance of the Counter class.
    • The most_common() method of Counter returns a list of the most frequent words together with their counts.
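
As a rough illustration of the Counter approach, the sketch below counts words in a string. The helper name `top_words` and the sample sentence are assumptions made for this example. Only commas and periods are stripped, mirroring the earlier program.

```python
from collections import Counter


def top_words(text, n=3):
    """Return the n most frequent words as (word, count) pairs (hypothetical helper)."""
    # Lowercase, strip commas and periods, then split on whitespace
    words = text.lower().replace(",", "").replace(".", "").split()
    # Counter tallies each word; most_common(n) sorts by count, highest first
    return Counter(words).most_common(n)


print(top_words("the cat and the hat and the bat", 2))
# → [('the', 3), ('and', 2)]
```

For a text file, the string could come from `file.read()`. Counter makes a single pass over the words, so it avoids the pairwise comparisons of the two-loop approach.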

    How do you count occurrences of a word in a text file in Python?

    Using the count() method: The "standard" way (no external libraries) to count word occurrences in a list is to use the list object's count() method. count() is a built-in list method that takes an element as its only argument and returns the number of times that element appears in the list.
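
A small sketch of this idea follows. The helper name `count_word` and the sample sentence are hypothetical, and the text is given as a string here instead of being read from a file.

```python
def count_word(text, target):
    """Count how often target appears as a whole word in text (hypothetical helper)."""
    # Normalize case and strip commas and periods so "Well" and "well." both match
    words = text.lower().replace(",", "").replace(".", "").split()
    # list.count returns the number of elements equal to its argument
    return words.count(target.lower())


print(count_word("Well done is better than well said.", "well"))
# → 2
```

Each call to count() scans the whole list, so this works well for a single word. To count many different words at once, a single tally pass is usually the better fit.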

    How do I find the most repeated words in a string?

    Program:
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.ArrayList;
    public class MostRepeatedWord {
        public static void main(String[] args) throws Exception {
            String line, word = "";
            int count = 0, maxCount = 0;
            ArrayList<String> words = new ArrayList<String>();

    What is used for finding the frequency of words in some given text sample?

    Using FreqDist(): The Natural Language Toolkit (NLTK) provides the FreqDist class, which shows the number of words in the string as well as the number of distinct words. Calling most_common() on it gives the frequency of each word.
