Python provides built-in functions for creating, writing, and reading files. Two types of files can be handled in Python: normal text files and binary files (written in binary language, 0s and 1s).
- Text files: In this type of file, each line of text is terminated with a special character called EOL (End of Line), which is the newline character ('\n') in Python by default.
- Binary files: In this type of file, there is no terminator for a line, and the data is stored after converting it into machine-understandable binary language.
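The following small sketch shows the difference. It assumes a scratch file named `example.txt` in the current directory (the name is only illustrative). The file is written in text mode and read back line by line, then read again in binary mode, where the same data arrives as raw bytes.

```python
# Write two lines in text mode; newline="\n" keeps '\n' untranslated on every platform
with open("example.txt", "w", encoding="utf-8", newline="\n") as f:
    f.write("first line\nsecond line\n")

# Text mode: each line comes back as a str terminated by the EOL character '\n'
with open("example.txt", "r", encoding="utf-8") as f:
    text_lines = f.readlines()

# Binary mode: the same content comes back as bytes, with no decoding or newline translation
with open("example.txt", "rb") as f:
    raw = f.read()

print(text_lines)  # ['first line\n', 'second line\n']
print(raw)         # b'first line\nsecond line\n'
```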
Here we operate on a .txt file in Python and write a program that finds the most repeated word in the file.
Approach:
- We will read the content of the file as input.
- We will convert each line to lowercase, remove punctuation, split it on whitespace, and save each word in a list.
- Find the frequency of each word.
- Print the word that has the maximum frequency.
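The same four steps can also be sketched with `collections.Counter` from the standard library, which counts every word in a single pass instead of comparing each pair of words. This is a swapped-in technique, not the implementation used below. It assumes that all characters in `string.punctuation` should be stripped. The function name `most_repeated_word` and the sample lines are illustrative.

```python
import string
from collections import Counter


def most_repeated_word(lines):
    """Return (word, frequency) for the most common word in an iterable of lines."""
    # Translation table that deletes every ASCII punctuation character
    strip_punct = str.maketrans("", "", string.punctuation)
    counts = Counter()
    for line in lines:
        # Lowercase, remove punctuation, split on any whitespace (this also drops '\n')
        counts.update(line.lower().translate(strip_punct).split())
    # most_common(1) returns a list holding the single most frequent (word, count) pair
    top = counts.most_common(1)
    return top[0] if top else ("", 0)


sample = ["Well, well, well.\n", "That went well enough.\n"]
print(most_repeated_word(sample))  # ('well', 4)
```

A file object can be passed directly, for example `with open("gfg.txt") as f: print(most_repeated_word(f))`. Ties go to the word seen first, as in the nested-loop version. This is because `most_common` keeps words with equal counts in the order they were first encountered.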
Input File: gfg.txt
Below is the implementation of the above approach:
Python3
```python
# Read gfg.txt and find the word that occurs most often
frequent_word = ""
frequency = 0
words = []

with open("gfg.txt", "r") as file:
    for line in file:
        # Lowercase the line, drop commas and periods, then split on whitespace
        # (split() with no argument also discards the trailing '\n')
        line_word = line.lower().replace(",", "").replace(".", "").split()
        words.extend(line_word)

# Compare every word with the words that follow it
for i in range(0, len(words)):
    count = 1
    for j in range(i + 1, len(words)):
        if words[i] == words[j]:
            count = count + 1
    # Keep the word with the highest count seen so far
    if count > frequency:
        frequency = count
        frequent_word = words[i]

print("Most repeated word: " + frequent_word)
print("Frequency: " + str(frequency))
```
Output:
Most repeated word: well
Frequency: 3
Explanation
In this program, we need to find the most repeated word in a given text file. This is done by opening the file in read mode using a file pointer and reading it line by line. Each line is split into words, which are stored in an array. We then iterate through the array, find the frequency of each word, and compare it with maxCount. If the frequency is greater than maxCount, we store the frequency in maxCount and the corresponding word in the variable word. The content of the data.txt file used in the program is shown below.
A computer program is a collection of instructions that performs specific task when executed by a computer.
Computer requires programs to function.
Computer program is usually written by a computer programmer in programming language.
A collection of computer programs, libraries, and related data are referred to as software.
Computer programs may be categorized along functional lines, such as application software and system software.
Algorithm
- Variable maxCount will store the count of the most repeated word.
- Open the file in read mode using a file pointer.
- Read a line from the file. Convert it to lowercase and remove the punctuation marks.
- Split the line into words and store them in an array. Repeat these steps for every line in the file.
- Use two loops to iterate through the array. The outer loop selects a word to be counted. The inner loop compares the selected word with the rest of the array; if a match is found, increment count by 1.
- If count is greater than maxCount, store the value of count in maxCount and the corresponding word in the variable word.
- At the end, maxCount holds the maximum count and the variable word holds the most repeated word.
Solution
Python
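The following is a minimal Python sketch of the algorithm above. It assumes the sample text is saved as `data.txt` in the working directory. It keeps the two-loop structure and the maxCount/word variables from the steps (written `max_count` and `word` here). The function name `find_most_repeated` is illustrative. Removing the characters in `string.punctuation` is an assumption about which punctuation marks to strip.

```python
import string


def find_most_repeated(path):
    """Return (word, max_count) for the most repeated word in the file at path."""
    words = []
    # Translation table that deletes every ASCII punctuation character
    strip_punct = str.maketrans("", "", string.punctuation)
    # Open the file in read mode and process it one line at a time
    with open(path, "r") as file:
        for line in file:
            # Lowercase the line, remove punctuation, and split it into words
            words.extend(line.lower().translate(strip_punct).split())

    max_count = 0
    word = ""
    # Outer loop selects a word; inner loop counts its matches in the rest of the list
    for i in range(len(words)):
        count = 1
        for j in range(i + 1, len(words)):
            if words[i] == words[j]:
                count += 1
        # Keep the word whose count beats the best seen so far
        if count > max_count:
            max_count = count
            word = words[i]
    return word, max_count
```

Calling `word, max_count = find_most_repeated("data.txt")` on the five sample lines should give `("computer", 7)`. Printing `"Most repeated word: " + word` then produces the output shown below. The nested loops make this O(n²) in the number of words, which is fine for a short file.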
Output:
Most repeated word: computer