Assignment 2 (100 Points) - Due W October 05, 11:59PM

Announcements and Clarification

Word Ranking Program

Create a wordrank program capable of determining and reporting the top 10 most frequently used words within an input text file.

Commandline Arguments

Your program must be capable of utilizing a commandline argument to specify the input file.

wordrank inputFile


Your program must ensure the user has correctly provided the required commandline argument and display a usage statement if the provided arguments are incorrect.
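One way to perform this check is sketched below. It assumes the program takes exactly one argument; the helper name input_file_from_args and the exact wording of the usage statement are illustrative and not required by the assignment.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: returns the input file name when exactly one
 * argument was supplied; otherwise prints a usage statement to `err`
 * and returns NULL. */
const char *input_file_from_args(int argc, char *argv[], FILE *err)
{
    if (argc != 2) {
        /* argv[0] is normally the program name; fall back to "wordrank". */
        const char *prog = (argc > 0 && argv[0] != NULL) ? argv[0] : "wordrank";
        fprintf(err, "Usage: %s inputFile\n", prog);
        return NULL;
    }
    return argv[1];
}
```

A main function could call input_file_from_args(argc, argv, stderr) first and return a nonzero exit status when the result is NULL.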

Input Text File

For this assignment, the input text file will consist only of uppercase and lowercase letters ('a' to 'z', 'A' to 'Z'), commas (','), and periods ('.'). Words within the text file are separated by at least one whitespace character (' ', '\t', '\n'). A single comma or period may appear at the end of a word, but it is not considered part of the word.
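The input rules above could be handled roughly as follows. This is a minimal sketch, not a required design: the helper names strip_trailing_punct and next_word are hypothetical, and the 255-character limit on word length is an assumption, since the assignment does not state a maximum.

```c
#include <stdio.h>
#include <string.h>

/* Assumed upper bound on word length; not specified by the assignment.
 * Must agree with the field width in the fscanf format string below. */
#define MAX_WORD_LEN 255

/* Removes a single trailing ',' or '.' from word, in place. */
void strip_trailing_punct(char *word)
{
    size_t len = strlen(word);
    if (len > 0 && (word[len - 1] == ',' || word[len - 1] == '.')) {
        word[len - 1] = '\0';
    }
}

/* Reads the next whitespace-delimited token from fp into buf, which must
 * hold at least MAX_WORD_LEN + 1 bytes, then strips trailing punctuation.
 * Returns 1 if a word was read and 0 once no more input is available. */
int next_word(FILE *fp, char *buf)
{
    /* %s skips leading whitespace (' ', '\t', '\n') before each token. */
    if (fscanf(fp, "%255s", buf) != 1) {
        return 0;
    }
    strip_trailing_punct(buf);
    return 1;
}
```

Because the field width caps each read, a token longer than the assumed limit would be split into several words; a dynamically growing buffer would avoid that.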

Note: Text files on Windows-based computers use a carriage return ('\r') followed by a newline ('\n') at the end of each line. On Unix machines (such as the ece3 server), only the newline is used. For testing your program, you should use Unix-formatted text files without the carriage return. If you edit your files using a Windows-based program (such as Notepad), you may want to familiarize yourself with the dos2unix command available on the ece3 server.

List Data Structure

To keep track of the individual words and how many times they are used within the input text file, you must implement a singly-linked or doubly-linked list data structure. The struct and typedef definitions that should be used for this assignment are defined below. Note that you only need to implement one of the two following options.

  • Singly-Linked List
    typedef struct ListElmt_ {
        char *word;
        int word_count;
        struct ListElmt_ *next;
    } ListElmt;

    typedef struct List_ {
        int size;
        ListElmt *head;
        ListElmt *tail;
    } List;
     
  • Doubly-Linked List
    typedef struct ListElmt_ {
        char *word;
        int word_count;
        struct ListElmt_ *next;
        struct ListElmt_ *prev;
    } ListElmt;

    typedef struct List_ {
        int size;
        ListElmt *head;
        ListElmt *tail;
    } List;

The functionality specific to each of these structures should be implemented within their own set of C source and header files. Specifically, the functionality for ListElmt should be implemented within listelmt.h and listelmt.c files, and the functionality for List should be implemented within list.h and list.c files.
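As an illustration of how the work might be divided between the two modules, the sketch below uses the singly-linked option. All function names (listelmt_create, listelmt_destroy, list_init, list_find, list_append, list_destroy) are hypothetical, since the assignment specifies only the struct definitions and the file names. The code is shown as one block for brevity; in a submission the first two functions would belong in listelmt.c and the rest in list.c, with prototypes in the matching headers.

```c
#include <stdlib.h>
#include <string.h>

typedef struct ListElmt_ {
    char *word;
    int word_count;
    struct ListElmt_ *next;
} ListElmt;

typedef struct List_ {
    int size;
    ListElmt *head;
    ListElmt *tail;
} List;

/* listelmt.c: allocate an element that owns a copy of word, count 1. */
ListElmt *listelmt_create(const char *word)
{
    ListElmt *elmt = malloc(sizeof *elmt);
    if (elmt == NULL) return NULL;
    elmt->word = malloc(strlen(word) + 1);
    if (elmt->word == NULL) { free(elmt); return NULL; }
    strcpy(elmt->word, word);
    elmt->word_count = 1;
    elmt->next = NULL;
    return elmt;
}

/* listelmt.c: release an element and the word it owns. */
void listelmt_destroy(ListElmt *elmt)
{
    if (elmt != NULL) { free(elmt->word); free(elmt); }
}

/* list.c: put a list into the empty state. */
void list_init(List *list)
{
    list->size = 0;
    list->head = NULL;
    list->tail = NULL;
}

/* list.c: return the element storing word, or NULL if absent. */
ListElmt *list_find(const List *list, const char *word)
{
    for (ListElmt *e = list->head; e != NULL; e = e->next) {
        if (strcmp(e->word, word) == 0) return e;
    }
    return NULL;
}

/* list.c: append elmt at the tail; returns 0 on success, -1 if elmt is NULL. */
int list_append(List *list, ListElmt *elmt)
{
    if (elmt == NULL) return -1;
    elmt->next = NULL;
    if (list->tail == NULL) list->head = elmt;
    else list->tail->next = elmt;
    list->tail = elmt;
    list->size++;
    return 0;
}

/* list.c: free every element and reset the list to empty. */
void list_destroy(List *list)
{
    ListElmt *e = list->head;
    while (e != NULL) {
        ListElmt *next = e->next;
        listelmt_destroy(e);
        e = next;
    }
    list_init(list);
}
```

Keeping a tail pointer makes appending a new word a constant-time operation, which is presumably why the List struct includes one.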

Word Ranking

As each word is read from the input text file, your program should keep track of the number of times each word is utilized. While the input file may contain both uppercase and lowercase characters, the identification of unique words is case insensitive. For example, "Party", "party", and "PARty" are all considered the same word.
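A simple way to get case-insensitive matching is to convert every word to lowercase before it is looked up or stored. The sketch below shows one possible helper; the name to_lowercase is hypothetical.

```c
#include <ctype.h>

/* Converts word to lowercase in place. The cast to unsigned char avoids
 * undefined behavior if char is signed and holds a negative value. */
void to_lowercase(char *word)
{
    for (char *p = word; *p != '\0'; p++) {
        *p = (char)tolower((unsigned char)*p);
    }
}
```

After this normalization, "Party", "party", and "PARty" all become "party", so an ordinary strcmp comparison is enough to decide whether a word is already in the list. It also means the stored words are already in the lowercase form required for the output.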

Once the input file has been completely read, your program should output the top ten most frequently used words, ordered from the 10th to the 1st most frequently used. Each of the top ten words should be output with its rank, the word displayed in all lowercase characters, and the number of times that word was used in parentheses. The following provides an example output:

Ten Most Frequently Used Words:

10 binary (101)
9  programming (102)
8  typso (312)
7  typoes (312)
6  typos (400)
5  futurama (664)
4  bender (1000)
3  awesome (1001)
2  space (2001)
1  the (5001)
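The following sketch shows one possible way to produce output in this form from the singly-linked list. The names select_top and print_top are hypothetical. Two behaviors are assumptions rather than stated requirements: words with equal counts keep their list order, and a file with fewer than ten distinct words prints only the words it has.

```c
#include <stdio.h>

typedef struct ListElmt_ {
    char *word;
    int word_count;
    struct ListElmt_ *next;
} ListElmt;

typedef struct List_ {
    int size;
    ListElmt *head;
    ListElmt *tail;
} List;

#define TOP_N 10

/* Fills top[0..n-1] with the most frequent elements, highest count first,
 * and returns n (at most TOP_N). Uses insertion into a small sorted array,
 * so the list itself is left unmodified. */
int select_top(const List *list, const ListElmt *top[TOP_N])
{
    int n = 0;
    for (const ListElmt *e = list->head; e != NULL; e = e->next) {
        /* Move left past every entry with a strictly lower count. */
        int pos = n;
        while (pos > 0 && top[pos - 1]->word_count < e->word_count) {
            pos--;
        }
        if (pos >= TOP_N) continue;  /* not frequent enough to rank */

        /* Shift lower-ranked entries down; the last one drops off if full. */
        int last = (n < TOP_N) ? n : TOP_N - 1;
        for (int i = last; i > pos; i--) {
            top[i] = top[i - 1];
        }
        top[pos] = e;
        if (n < TOP_N) n++;
    }
    return n;
}

/* Prints the ranking from the lowest rank found down to rank 1. */
void print_top(FILE *out, const List *list)
{
    const ListElmt *top[TOP_N];
    int n = select_top(list, top);

    fprintf(out, "Ten Most Frequently Used Words:\n\n");
    for (int i = n - 1; i >= 0; i--) {
        /* %-2d left-justifies the rank so single-digit ranks stay aligned. */
        fprintf(out, "%-2d %s (%d)\n", i + 1, top[i]->word, top[i]->word_count);
    }
}
```

Selecting into a fixed ten-slot array avoids sorting the whole list; sorting the list in place by word_count would be an equally reasonable approach.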