CS1101C Lab 3 (Odd Week)

Birthday Paradox

The deadline for this lab question is Wednesday 16 March 2005, 23:59:59 hours.

The name of your C program file must be called paradox.c, files with any other name will not be marked.

Background

The birthday paradox is a classic elementary probability problem stated as follows:
"In a group of N randomly selected people, how large must N be so that there is more than 50% chance that at least two of them have the same birthday?"

Since the chance of any two person having the same birthday is remote, many of us would expect N to be rather large. However, it turns out that this is not the case, and hence the paradox. Before we proceed further, try to make a rough guess of the value of N and see if you are anywhere near the answer.

To make the problem simpler, we shall assume that nobody is born on the 29th of February. Your task is to write a program to simulate the birthday paradox and find the empirical value of N. To make the problem more interesting and challenging, let us generalize our simulation to the following scenario:

"In a group of N randomly selected people, how large must N be so that the probability of K or more person having the same birthday is at least P?"

Implementation

You will be provided with an input file consisting of different simulation scenarios. A typical line from the input file consist of six numbers separated by blank spaces as shown below:
1234 2 1000 5 3 0.75
The explanation for each number is given as follows:

1234Simulation ID
2Number of matches to be checked
1000Number of trials
5Starting group size
3Group size increment
0.75Terminating probability

We illustrate how a simulation is performed for the above. Beginning with group size (i.e. N) equal to 5, we initialize an array with the random birthdays of five people. We compare every pairwise combination of people (since the number of matches K is 2) in the group of five and check the existence of any two person having the same birthday. This will be repeated one thousand times with different groups of five people. If the average occurrence of two person having the same birthday in these one thousand trials exceeds 0.75 (i.e. the probability P), the simulation terminates. Otherwise, N is incremented by three, and the entire simulation of one thousand trials is repeated with randomized groups of eight people. Kindly note the following points.

As another example, consider the following line of input.

7588 3 500 20 5 0.5

Since the number of matches is three, we need to enumerate possible combinations of three persons, and check if all of them have the same birthday. We do this starting with a group size of 20 for 500 trials, and repeating the 500 trials with five more persons until the probability exceeds 0.5.

Assume that the text input file is always called paradoxin1.txt. You may copy the file paradoxin1.txt into your current directory by typing the following cp (copy) command at the Unix prompt. Highlight the entire line, right-click on the mouse and choose “Copy”, then switching to your Secure Shell Window, right-click on the mouse and choose “Paste”. Press Enter to execute the command.

cp ~cs1101cl/www/lab3/oddweek/paradoxin1.txt .

Sample Run

The following shows a sample run for the input file paradoxin1.txt. Note that $ indicates the Unix prompt.

Each output line within the [...] shows the group size, as well as the probability of birthday matches for this particular group size. Your result may differ since random numbers generated are system dependent. You are reminded to follow the sample outcome exactly, else marks will be deducted.

$ gcc -Wall paradox.c -o paradox
$ ./paradox

Simulation #1234: 1000 trials for 2 matches
[   5, 0.031000]
[   8, 0.067000]
[  11, 0.148000]
[  14, 0.214000]
[  17, 0.301000]
[  20, 0.425000]
[  23, 0.494000]
[  26, 0.595000]
[  29, 0.656000]
[  32, 0.750000]
[  35, 0.820000]

Simulation #8049: 1000 trials for 2 matches
[   2, 0.001000]
[   3, 0.011000]
[   4, 0.019000]
[   5, 0.024000]
[   6, 0.041000]
[   7, 0.053000]
[   8, 0.087000]
[   9, 0.093000]
[  10, 0.132000]
[  11, 0.148000]
[  12, 0.167000]
[  13, 0.201000]
[  14, 0.216000]
[  15, 0.255000]
[  16, 0.284000]
[  17, 0.321000]
[  18, 0.342000]
[  19, 0.389000]
[  20, 0.430000]
[  21, 0.467000]
[  22, 0.465000]
[  23, 0.518000]

Simulation #7588: 500 trials for 3 matches
[  20, 0.012000]
[  25, 0.012000]
[  30, 0.026000]
[  35, 0.056000]
[  40, 0.066000]
[  45, 0.096000]
[  50, 0.150000]
[  55, 0.142000]
[  60, 0.222000]
[  65, 0.272000]
[  70, 0.284000]
[  75, 0.374000]
[  80, 0.420000]
[  85, 0.516000]

Simulation #6047: 1000 trials for 3 matches
[  80, 0.386000]
[  81, 0.409000]
[  82, 0.459000]
[  83, 0.484000]
[  84, 0.456000]
[  85, 0.466000]
[  86, 0.476000]
[  87, 0.473000]
[  88, 0.497000]
[  89, 0.515000]

Simulation #3465: 500 trials for 4 matches
[ 100, 0.050000]
[ 105, 0.080000]
[ 110, 0.080000]
[ 115, 0.098000]
[ 120, 0.124000]
[ 125, 0.140000]
[ 130, 0.204000]
[ 135, 0.200000]
[ 140, 0.200000]
[ 145, 0.246000]
[ 150, 0.288000]
[ 155, 0.284000]
[ 160, 0.322000]
[ 165, 0.364000]
[ 170, 0.388000]
[ 175, 0.410000]
[ 180, 0.506000]

Simulation #4250: 1000 trials for 4 matches
[ 180, 0.488000]
[ 181, 0.478000]
[ 182, 0.490000]
[ 183, 0.486000]
[ 184, 0.504000]

Simulation #4317: 500 trials for 5 matches
[ 200, 0.084000]
[ 210, 0.118000]
[ 220, 0.154000]
[ 230, 0.142000]
[ 240, 0.186000]
[ 250, 0.224000]
[ 260, 0.250000]
[ 270, 0.324000]
[ 280, 0.318000]
[ 290, 0.370000]
[ 300, 0.456000]
[ 310, 0.492000]
[ 320, 0.522000]

Simulation #5016: 500 trials for 5 matches
[ 300, 0.478000]
[ 302, 0.448000]
[ 304, 0.478000]
[ 306, 0.512000]

Simulation #5207: 1000 trials for 5 matches
[ 310, 0.476000]
[ 311, 0.502000]
$

Tips


This document, index.html, has been accessed 46 times since 25-Jun-24 11:57:13 +08. This is the 1st time it has been accessed today.

A total of 31 different hosts have accessed this document in the last 445 days; your host, nsrp-source.comp.nus.edu.sg, has accessed it 10 times.

If you're interested, complete statistics for this document are also available, including breakdowns by top-level domain, host name, and date.