PHP program to get most repeated word in a file

By IncludeHelp. Last updated: January 27, 2024

Problem statement

Given a text file, write a PHP program to get the most repeated word in it.

Getting most repeated word in a file

To get the most repeated word in a file, first read the file's content with the file_get_contents() function, remove punctuation and convert the text to lowercase, tokenize the content into words with the str_word_count() function, count the occurrences of each word with the array_count_values() function, and finally loop over those counts to find the word with the highest count.
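To see what each of these steps produces, here is a minimal sketch that runs them on a short inline string instead of a file. The sample string and the variable names are illustrative assumptions, not part of the original program; the comments show the intermediate values.

```php
<?php
// Sketch: the same steps applied to an illustrative inline string
// (an assumption) instead of content read with file_get_contents().
$text = "The cat saw the dog. The dog ran!";

// 1. Strip punctuation and lowercase: "the cat saw the dog the dog ran"
$clean = strtolower(preg_replace("/[^\p{L}\p{N}\s]/u", "", $text));

// 2. Tokenize; format 1 makes str_word_count() return the words themselves:
//    ["the", "cat", "saw", "the", "dog", "the", "dog", "ran"]
$words = str_word_count($clean, 1);

// 3. Count occurrences: ["the" => 3, "cat" => 1, "saw" => 1, "dog" => 2, "ran" => 1]
$counts = array_count_values($words);

// 4. Loop over the counts and keep the word with the highest count
$best = "";
$max = 0;
foreach ($counts as $word => $count) {
    if ($count > $max) {
        $best = $word;
        $max = $count;
    }
}

echo "$best ($max times)\n"; // prints "the (3 times)"
```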

PHP program to get most repeated word in a file

The following PHP script reads a file (file.txt), then finds and prints its most repeated word.

<?php

function find_most_repeated_word($filename)
{
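    // Read the whole file into a string
    $file_data = file_get_contents($filename);

    // file_get_contents() returns false on failure (e.g. the file does not exist)
    if ($file_data === false) {
        return "";
    }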

    // Remove punctuation (anything that is not a letter, a digit, or
    // whitespace), then convert the content to lowercase
    $file_data = strtolower(preg_replace("/[^\p{L}\p{N}\s]/u", "", $file_data));

    // Tokenize the content into words; with format 1,
    // str_word_count() returns an array of the words
    $words_arr = str_word_count($file_data, 1);

    // Count the occurrences of each word (word => count)
    $word_occurrences_arr = array_count_values($words_arr);

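    // Most repeated word found so far and its count
    $most_repeated_word = "";
    $max = 0;

    // Keep the word with the highest count; because of the strict ">",
    // the first word to reach that count wins any tie
    foreach ($word_occurrences_arr as $word => $count) {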
        if ($count > $max) {
            $most_repeated_word = $word;
            $max = $count;
        }
    }

    // Return the result
    return $most_repeated_word;
}

// Main code
// Path of the input file
$filename = "file.txt";

// Call the function to get the most repeated word
$frequent = find_most_repeated_word($filename);

// Print the result, or a message if the file had no words
if (!empty($frequent)) {
    echo "Most repeated word is: $frequent\n";
} else {
    echo "No words found.\n";
}

?>

Output

Content of file.txt:

Lorem Ipsum is simply dummy text of the printing and typesetting industry.
Lorem Ipsum has been the industry's standard dummy text ever since the 1500s,
when an unknown printer took a galley of type and scrambled it to make a type specimen book.
It has survived not only five centuries, but also the leap into electronic typesetting, 
remaining essentially unchanged. It was popularised in the 1960s with the release
of Letraset sheets containing Lorem Ipsum passages, 
and more recently with desktop publishing software like 
Aldus PageMaker including versions of Lorem Ipsum.

The output of the above program is:

Most repeated word is: the
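The loop in find_most_repeated_word() uses a strict ">" comparison, so when several words share the highest count it returns only the one whose first occurrence comes earliest in the text. If all tied words are wanted, a minimal sketch like the following could be used. It is a hypothetical variant, not part of the original program: the name find_most_repeated_words() is made up, it takes the text itself rather than a filename so it can run without a file, and it swaps the loop for max() plus array_keys().

```php
<?php
// Hypothetical variant: returns every word that shares the highest count.
// It takes the text directly (not a filename), an assumption made so the
// sketch runs without file.txt.
function find_most_repeated_words(string $text): array
{
    // Same preprocessing and tokenization as find_most_repeated_word()
    $clean = strtolower(preg_replace("/[^\p{L}\p{N}\s]/u", "", $text));
    $words = str_word_count($clean, 1);

    // max() fails on an empty array (a ValueError in PHP 8), so return early
    if (count($words) === 0) {
        return [];
    }

    $counts = array_count_values($words);
    $max = max($counts);

    // With a search value, array_keys() returns every key whose value is $max
    return array_keys($counts, $max, true);
}

// To use it with a file: find_most_repeated_words(file_get_contents("file.txt"))
print_r(find_most_repeated_words("one two two three three"));
```

For "one two two three three" this returns ["two", "three"], whereas the single-word loop in the original program would return only "two".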

Copyright © 2024 www.includehelp.com. All rights reserved.