CS1101C Lab 3 (Even Week)

Can You Read This?

The deadline for this lab question is Wednesday 23 March 2005, 23:59:59 hours.

The name of your C program file must be called scramble.c, files with any other name will not be marked.

Background

Aricodcng to a reeracsh at Cagimdrbe Ursviintey, it need not mttear in waht oredr the letters in a word are, the olny iapronmtt thing is that the frist and last lteetr be at the right pacle. The rset can be a toatl mess and you can still read it whiuott a pbelorm. This is baucese the hmaun mind neevr redas every leettr by iestlf, but the wrod as a wlohe. Azniamg huh?

Most of you will be able to make sense of the above even if the words have been terribly misspelled. This illustrates the phenomenal power of the human mind. Our task for this lab is simply to test the validity of the above hypothesis by generating scrambled text from the original text and determining if all of them are indeed readable.

Word

The following are definitions concerning words:

  1. A word is defined as a continuous sequence of one or more letters.
  2. Words are separated by one or more non-letters (digits, spaces, newlines, punctuation, etc.).
  3. A word that is separated by a hyphen or apostrophe is considered as two separate words. Examples:
  4. Assume that all letters in between the first and last letters of the word are always in lowercase.
  5. Assume that a word has between one to twenty letters inclusive; no word has more than twenty letters.

Scrambling

You will be given an input file text1.txt containing words in proper spelling. You are to write a program to generate scrambled text and print it to the screen. Each word is read from the input file and all the letters (except for the first and the last letters) of each word are scrambled. Each word is scrambled independently. For example, the first word “According” can be scrambled to form the word “Aniocdcrg” or “Ardccinog”. Scrambling does not mean that all the letters must be out of position; in the first scrambled word, the letters “o” and “d” are in their original positions, while in the second scrambled word, all the middle letters are out of position.

A word can be scrambled by repeatedly generating a pair of random numbers denoting the position of letters in which a swap is to be made. The limits for the pair of random numbers must be carefully chosen to fulfill the following criteria.

Consider the word “what”. It has four letters, of which we scramble only the middle two letters. There are thus two (2! = 2*1 = 2) possible permutations of the scrambled word:

  1. what
  2. waht

Your scrambling algorithm must ensure that each of the two permutations above have an equally likely chance of occurring. Note that permutation 1 is also a possible “scrambling” of the original word.

Now consider the word “right”. It has five letters, of which we scramble only the middle three letters. There are thus six (3! = 3*2*1 = 6) possible permutations of the scrambled word:

  1. right
  2. rihgt
  3. rhigt
  4. rhgit
  5. rgiht
  6. rghit

Your scrambling algorithm must ensure that each of the six permutations above have an equally likely chance of occurring. Note that permutation 1 is also a possible “scrambling” of the original word.

In general, for a word containing n letters where n > 3, there are (n-2)! possible permutations of the scrambled word, each of which must have an equally likely chance of occurring. If a word has three letters or less, then there is no need to scramble the word.

Differences

After scrambling a word, we are interested in the number of letters that are not in their original positions compared to the original word. We call this the “number of differences”. Consider the following examples which show the original word and the scrambled word:

  1. right” and “right”: all the middle letters are in their original positions, so number of differences = 0.
  2. right” and “rhgit”: “h” and “i” are not in their original positions, so number of differences = 2.
  3. right” and “rhigt”: all three middle letters are not in their original positions, so number of differences = 3.

Note the following:

  1. A word which has three letters or less always has zero differences.
  2. A word in which all the middle letters are identical always has zero differences. Example: “need”.
  3. For any word, it is impossible to have number of differences = 1. (Why?)

Statistics

In addition to printing the scrambled text, we want our program to print the following statistics:

  1. Total number of words in the text file.
  2. Total number of letters in the text file.
  3. Average word length.
  4. Number of words of length 1, 2, ..., 20.
  5. Total number of differences created by the scrambling.
  6. Average number of differences per word.
  7. Number of words with 0, 2, 3, 4, ..., 20 differences.

Sample Run

Assume that the text input file is always called text1.txt. In addition, another sample text input file called text2.txt is provided for your testing. You may copy both files into your current directory by typing the following cp (copy) commands at the Unix prompt. Highlight both lines, right-click on the mouse and choose “Copy”, then switching to your Secure Shell Window, right-click on the mouse and choose “Paste”. Press Enter to execute the commands.

cp ~cs1101cl/www/lab3/evenweek/text1.txt .
cp ~cs1101cl/www/lab3/evenweek/text2.txt .

The following shows three sample runs. Your program must work with other text1.txt files that contain different text. Note that $ indicates the Unix prompt.

Your result may differ since random numbers generated are system dependent. You are reminded to follow the sample output exactly, else marks will be deducted.

Sample Run 1

The following shows a sample run for the sample input file text1.txt with seed 7070.
$ gcc -Wall scramble.c -o scramble
$ ./scramble
Enter seed:
7070

----- Start of scrambled text -----
Aricodcng to a reeracsh at Cagimdrbe Ursviintey, it need not mttear in
waht oredr the letters in a word are, the olny iapronmtt thing is that
the frist and last lteetr be at the right pacle.  The rset can be a
toatl mess and you can still read it whiuott a pbelorm.  This is baucese
the hmaun mind neevr redas every leettr by iestlf, but the wrod as a
wlohe.  Azniamg huh?
----- End of scrambled text -----

Total number of words: 72
Total number of letters: 291
Average word length: 4.041667
Number of words of length 1: 5
Number of words of length 2: 13
Number of words of length 3: 16
Number of words of length 4: 12
Number of words of length 5: 12
Number of words of length 6: 4
Number of words of length 7: 5
Number of words of length 8: 1
Number of words of length 9: 3
Number of words of length 10: 1
Number of words of length 11: 0
Number of words of length 12: 0
Number of words of length 13: 0
Number of words of length 14: 0
Number of words of length 15: 0
Number of words of length 16: 0
Number of words of length 17: 0
Number of words of length 18: 0
Number of words of length 19: 0
Number of words of length 20: 0

Total number of differences: 83
Average number of differences per word: 1.152778
Number of words with 0 differences: 47
Number of words with 2 differences: 12
Number of words with 3 differences: 3
Number of words with 4 differences: 4
Number of words with 5 differences: 3
Number of words with 6 differences: 2
Number of words with 7 differences: 1
Number of words with 8 differences: 0
Number of words with 9 differences: 0
Number of words with 10 differences: 0
Number of words with 11 differences: 0
Number of words with 12 differences: 0
Number of words with 13 differences: 0
Number of words with 14 differences: 0
Number of words with 15 differences: 0
Number of words with 16 differences: 0
Number of words with 17 differences: 0
Number of words with 18 differences: 0
Number of words with 19 differences: 0
Number of words with 20 differences: 0
$

Things To Notice

  1. The scrambled text is printed in exactly the same format as the original text. All non-letters (digits, spaces, newlines, punctuation, etc.) are preserved. Only the middle letters of a word are scrambled.
  2. We indicate clearly the start and end of the scrambled text by printing appropriate text.
  3. Average word length = (1*5 + 2*13 + 3*16 + ... + 20*0)/72 = 291/72 = 4.041667.
  4. Average number of differences per word = (0*47 + 2*12 + 3*3 + ... + 20*0)/72 = 83/72 = 1.152778.
  5. Do not assume that the number of words is always 72 since we may test your program with different text files.

Sample Run 2

The following shows a sample run for the sample input file text1.txt with seed 7071.
$ gcc -Wall scramble.c -o scramble
$ ./scramble
Enter scramble seed:
7071

----- Start of scrambled text -----
Anccrodig to a rcreeash at Cbdgirmae Uvesirtniy, it need not mttear in
waht oerdr the ltteers in a word are, the olny inorpmatt tinhg is taht
the fsirt and lsat leettr be at the rghit palce.  The rset can be a
ttoal mses and you can stlil read it whoitut a pelobrm.  This is bcaseue
the huamn mind neevr rdeas eevry ltteer by itslef, but the wrod as a
whloe.  Amiazng huh?
----- End of scrambled text -----

Total number of words: 72
Total number of letters: 291
Average word length: 4.041667
Number of words of length 1: 5
Number of words of length 2: 13
Number of words of length 3: 16
Number of words of length 4: 12
Number of words of length 5: 12
Number of words of length 6: 4
Number of words of length 7: 5
Number of words of length 8: 1
Number of words of length 9: 3
Number of words of length 10: 1
Number of words of length 11: 0
Number of words of length 12: 0
Number of words of length 13: 0
Number of words of length 14: 0
Number of words of length 15: 0
Number of words of length 16: 0
Number of words of length 17: 0
Number of words of length 18: 0
Number of words of length 19: 0
Number of words of length 20: 0

Total number of differences: 101
Average number of differences per word: 1.402778
Number of words with 0 differences: 39
Number of words with 2 differences: 18
Number of words with 3 differences: 7
Number of words with 4 differences: 1
Number of words with 5 differences: 4
Number of words with 6 differences: 1
Number of words with 7 differences: 2
Number of words with 8 differences: 0
Number of words with 9 differences: 0
Number of words with 10 differences: 0
Number of words with 11 differences: 0
Number of words with 12 differences: 0
Number of words with 13 differences: 0
Number of words with 14 differences: 0
Number of words with 15 differences: 0
Number of words with 16 differences: 0
Number of words with 17 differences: 0
Number of words with 18 differences: 0
Number of words with 19 differences: 0
Number of words with 20 differences: 0
$

Sample Run 3

The following shows a sample run for the sample input file text2.txt with seed 3644.
Enter scramble seed:
3644

----- Start of scrambled text -----
http://www.rpibgrde.net/8t69.htm

Borad 1

Nroth Deals   S 10 9 3
None Vul      H 9 7 5 4
              D K 10 5
              C 9 8 6

S Q                       S J 7 6 5 4
H A K Q J 10 3            H -
D 8 7 6 4                 D A Q J 9 3
C 10 2                    C A K 5
              S A K 8 2
              H 8 6 2
              D 2
              C Q J 7 4 3

Off to a red-hot sartt! Esat-Wset can mkae a salm in ehteir red siut,
but it's hldray badiblde. A common aotiucn suolhd be:

Wset Ntroh East Sutoh
     Pass  1S   Psas
2H   Psas  3D   Psas
4H   All Psas

Some Sotuhs may snurcgoe a 2C orlvecal, wichh suohld not cngahe the
East-West biddnig. Wset mghit sense a salm atefr Esat's 3D bid, but
the fuor-crad fit is less apnpelaig whoiutt an honor. The key to slam is
Esat hivang aolsmt all his pnotis odtiuse of spaeds -- hard to
vliiuszae, ulness the Bigdre Lwas aoellwd an oeipnng of "one no
sdeaps."

In hraets, 12 tkrics are eaisly made by tiknag two dnoaimd fseinses.
Afetr a culb lead, it is tmnitpeg to try for all 13: Ruff the trihd
club, draw trmpus, fsniese dmoinads once and hope tehy silpt. Oops; the
pirce of geerd is tehn olny 11 tirkcs. Been trehe; done taht.

In didmnoas, the same 12 tircks are allbiavae wtih cueafrl play (ruff
two blcak crads, win trhee htraes, fisense tprums twice) but it is
dufutobl many wlil play terhe. Even if West sehettrcs to slam, it wulod
be qtuie a posiiton to cohsoe ddmoians oevr hrtaes at mnphtoicats.
----- End of scrambled text -----

Total number of words: 242
Total number of letters: 904
Average word length: 3.735537
Number of words of length 1: 48
Number of words of length 2: 29
Number of words of length 3: 37
Number of words of length 4: 53
Number of words of length 5: 24
Number of words of length 6: 25
Number of words of length 7: 10
Number of words of length 8: 11
Number of words of length 9: 4
Number of words of length 10: 0
Number of words of length 11: 1
Number of words of length 12: 0
Number of words of length 13: 0
Number of words of length 14: 0
Number of words of length 15: 0
Number of words of length 16: 0
Number of words of length 17: 0
Number of words of length 18: 0
Number of words of length 19: 0
Number of words of length 20: 0

Total number of differences: 296
Average number of differences per word: 1.223140
Number of words with 0 differences: 147
Number of words with 2 differences: 48
Number of words with 3 differences: 14
Number of words with 4 differences: 16
Number of words with 5 differences: 11
Number of words with 6 differences: 4
Number of words with 7 differences: 1
Number of words with 8 differences: 1
Number of words with 9 differences: 0
Number of words with 10 differences: 0
Number of words with 11 differences: 0
Number of words with 12 differences: 0
Number of words with 13 differences: 0
Number of words with 14 differences: 0
Number of words with 15 differences: 0
Number of words with 16 differences: 0
Number of words with 17 differences: 0
Number of words with 18 differences: 0
Number of words with 19 differences: 0
Number of words with 20 differences: 0
$

Tip

  1. You may wish to refer to Tutorial 8 Question 34.


This document, index.html, has been accessed 16 times since 25-Jun-24 11:57:13 +08. This is the 1st time it has been accessed today.

A total of 10 different hosts have accessed this document in the last 445 days; your host, nsrp-source.comp.nus.edu.sg, has accessed it 5 times.

If you're interested, complete statistics for this document are also available, including breakdowns by top-level domain, host name, and date.