Certain words in the English language occur more often than others. Your objective is to write a program to find out which words occur most commonly in the English language.
Your program must read a text file called doc.txt and record the frequency of every word found in it. You can assume that no line in doc.txt exceeds 100 characters and that no word exceeds 30 characters. Assume doc.txt always exists. The file doc.txt may contain multiple lines, and some of these lines may be blank. Suppose doc.txt contains:
Why Play Bridge? Why not? It is easy to learn! It is fun! Through duplicate play, opportunities abound to meet new people, travel to exciting cities, and experience the psychic rewards of being recognized as a winner! The ABA, through its local chapters and clubs, offers many levels of instruction and you may learn at your own pace. Your instructor will introduce you to one or more of the many easy-to-read books on bridge, some of which may be available at your local library. And, since bridge players love to play, you can always find a more experienced player to help you. Therefore, even if you do not play cards now, you can learn to play and enjoy bridge. If you are an experienced card player, you already know how much fun card playing can be. So, consider joining the world of competitive bridge through a membership in the ABA. Chances are that at least one of our over 5000 members lives in your city or town. He or she may be young or more experienced, a college student, a retiree, a government employee or official, a business person, or the teenage son or daughter of a bridge player. Our diverse membership adds to the interest of duplicate bridge. Membership and game fees are nominal.
Your program must read in doc.txt, convert every word to lowercase, count the frequency of every word that appears in the document, and write the output to the text file freq.txt. Assume that the list of delimiters is given by ",.?! :;()-" (do not forget the space). Overwrite the output file if it already exists.
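The splitting, lowercasing, and counting steps described above can be sketched as follows. This is only one possible approach, not a required design: the names DELIMS, MAX_WORD, struct entry, record_word, and count_line are all hypothetical, the table is a plain array searched linearly, and newline characters are added to the delimiter list on the assumption that lines are read with fgets, which keeps the trailing '\n'.

```c
#include <ctype.h>
#include <string.h>

/* Delimiters from the assignment, plus "\r\n" (an assumption: lines read
   with fgets still carry their line ending). */
#define DELIMS ",.?! :;()-\r\n"
#define MAX_WORD 30   /* longest word, per the assignment */

/* Hypothetical table entry: one distinct word and its frequency. */
struct entry {
    char word[MAX_WORD + 1];
    int count;
};

/* Adds one occurrence of word to table using a linear search.
   Returns the new number of distinct words. If the table is full,
   a previously unseen word is silently dropped. */
static size_t record_word(struct entry *table, size_t n, size_t cap,
                          const char *word)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].word, word) == 0) {
            table[i].count++;
            return n;
        }
    }
    if (n < cap) {
        strcpy(table[n].word, word);   /* caller passes at most MAX_WORD chars */
        table[n].count = 1;
        n++;
    }
    return n;
}

/* Splits line in place on DELIMS, lowercases each word, and records it.
   Returns the updated number of distinct words. A blank line yields no
   tokens, so the table is left unchanged. */
size_t count_line(char *line, struct entry *table, size_t n, size_t cap)
{
    for (char *tok = strtok(line, DELIMS); tok != NULL;
         tok = strtok(NULL, DELIMS)) {
        char word[MAX_WORD + 1];
        size_t i;
        for (i = 0; tok[i] != '\0' && i < MAX_WORD; i++)
            word[i] = (char)tolower((unsigned char)tok[i]);
        word[i] = '\0';
        n = record_word(table, n, cap, word);
    }
    return n;
}
```

A caller would typically start with n = 0, read each line of doc.txt into a buffer large enough for 100 characters plus the line ending and terminator, and pass it to count_line in turn. Note that because "-" is a delimiter, "easy-to-read" counts as the three words easy, to, and read.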
The format of the output file freq.txt is:
The first ten lines of the output file freq.txt (based on the example doc.txt given above) are shown as follows:
to 9
a 8
of 8
you 8
bridge 7
or 7
the 7
and 6
play 5
your 4
This is because the word to occurs nine times, the words a, of, and you occur eight times, and so on. Notice that since a, of, and you occur the same number of times, they are sorted into alphabetical order.
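The ordering rule just described (higher frequency first, ties broken alphabetically) can be expressed as a comparison function for the standard library's qsort. The sketch below is illustrative only: struct entry, by_count_then_word, and sort_entries are hypothetical names, and the table layout is an assumption rather than something the assignment prescribes.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_WORD 30   /* longest word, per the assignment */

/* Hypothetical table entry: one distinct word and its frequency. */
struct entry {
    char word[MAX_WORD + 1];
    int count;
};

/* qsort comparator: larger counts sort first; equal counts fall back
   to alphabetical order of the (already lowercased) words. */
static int by_count_then_word(const void *a, const void *b)
{
    const struct entry *x = a;
    const struct entry *y = b;
    if (x->count != y->count)
        return (x->count < y->count) ? 1 : -1;
    return strcmp(x->word, y->word);
}

/* Sorts the first n entries of table into output order. */
void sort_entries(struct entry *table, size_t n)
{
    qsort(table, n, sizeof table[0], by_count_then_word);
}
```

Comparing the counts explicitly, rather than returning their difference, avoids any risk of integer overflow. Since qsort is not guaranteed to be stable, the alphabetical tie-break has to live in the comparator itself.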
The full contents of the above output file can be found in the file freq_correct.txt in your directory.
A sample doc.txt file has been placed in your directory by the pesetup program.
We will test your program with other doc.txt files.
All the best!
Some useful UNIX commands (in case you forgot what you did in Lab 0):