Birthday paradox
|
The birthday paradox states that if there are 23 people in a room then there is a slightly more than 50:50 chance that at least two of them will have the same birthday. This means that a higher probability applies to a typical school class size of thirty, where the 'paradox' is often cited. For 60 or more people, the probability is greater than 99%. This is not a paradox in the sense of leading to a logical contradiction; it is a paradox in the sense that it is a mathematical truth that contradicts common intuition. Most people estimate that the chance is much lower than 50:50. Calculating this probability (and related ones) is the birthday problem. The mathematics behind it has been used to devise a well-known cryptographical attack named the birthday attack.
Contents |
Understanding the paradox
The key to understanding the birthday paradox is to realize that there are many possible pairs of people whose birthdays could match. Specifically, among 23 people, there are 23 × 22/2 = 253 pairs, each of which being a potential candidate for a match. Looked at it this way, it doesn't seem that unlikely that one of these 253 pairs yields a match.
To emphasize the point, consider a different scenario: if you enter a room with 22 people, the chance that somebody there has the same birthday as you is not 50:50, but much lower. This is because now there are only 22 possible pairs that could yield a match. The actual birthday problem is asking if any of the 23 people have a matching birthday with any of the others.
Estimating the probability
To compute the approximate probability that in a room of n people, at least two have the same birthday, we disregard distribution variations, such as leap years, twins, seasonal or weekday variations, and assume that 365 possible birthdays are equally likely. Real-life birthday distribution are not uniform since not all dates are equally likely.
The trick is to first calculate the probability p that the n birthdays are different. Assuming n ≤ 365, this probability is given by
- <math>\frac{364}{365} \cdot \frac{363}{365} \cdot \frac{362}{365} \cdots \frac{365-n+1}{365}<math>
Birthday_paradox.png
because the second person cannot have the same birthday as the first (364/365), the third cannot have the same birthday as the first two (363/365), etc. Using factorial notation, this expression can also be written as
- <math>{ 365! \over 365^n (365-n)! }<math>
The probability p(n) that at least two persons of the n have the same birthday is then given by
- <math> p(n) = 1 - { 365! \over 365^n (365-n)! }<math>
for n ≤ 365, and 1 for n > 365.
For n = 23, one obtains a probability of about 0.507. Some of these probabilities, numerically computed using the formula above, are shown here:
n | p(n) |
---|---|
10 | 12% |
20 | 41% |
30 | 70% |
50 | 97% |
100 | 99.99996% |
200 | 99.9999999999999999999999999998% |
300 | 1 − (7 × 10−73) |
350 | 1 − (3 × 10−131) |
≥366 | 100% |
050329-birthday1.png
Note that neither of the two people is chosen in advance: by way of contrast, the probability q(n) that someone in a room of n other people has the same birthday as a particular person (for example, you) is given by
- <math> q(n) = 1- \left( \frac{364}{365} \right)^n <math>
which for n = 22 gives only about 0.059, or slightly better than 1 chance in 17. For a greater than 50:50 chance that one person in a roomful of n people has the same birthday as you, n would need to be at least 253 . Note that this number is significantly higher than 365/2 = 182.5: the reason is that there are likely some birthday matches among the people in the room.
A mathematical (as opposed to numerical) view
In his autobiography, Paul Halmos deplored the fact that the birthday paradox is often presented in terms of mere numerical computation rather than conceptual mathematics. The argument given below is adapted from Halmos' argument, which contained a small error. The product
- <math>\prod_{k=1}^{n-1}\left(1-{k \over 365}\right)<math>
equals 1 − p(n), and we are therefore interested in the first n such that the product is smaller than 1/2. We have
- <math>\sqrt[n-1]{\prod_{k=1}^{n-1}\left(1-{k \over 365}\right)}
<{1 \over n-1}\sum_{k=1}^{n-1}\left(1-{k \over 365}\right)<math>
because of the inequality of arithmetic and geometric means. Therefore:
- <math>\prod_{k=1}^{n-1}\left(1-{k \over 365}\right)
<\left({1 \over n-1}\sum_{k=1}^{n-1}\left(1-{k \over 365}\right)\right)^{n-1}<math>
- <math>=\left(1-{n \over 730}\right)^{n-1}<\left(e^{-n/730}\right)^{n-1}=e^{-(n^2-n)/730}.
<math>
(Here we first used the fact that the sum over all integers from 1 to n − 1 equals n(n − 1)/2, and then the inequality 1 − x < e−x.)
The value of the last expression is less than 1/2 if and only if
- <math>n^2-n>730\log_e 2\cong 505.997\dots<math>
where "loge" means the natural logarithm. This falls just barely short of 506, and as luck would have it, 506 is one of the values of n2 − n, attained when n = 23.
Of this argument, Halmos wrote:
- "The reasoning is based on important tools that all students of mathematics should have ready access to. The birthday problem used to be a splendid illustration of the advantages of pure thought over mechanical manipulation; the inequalities can be obtained in a minute or two, whereas the multiplications would take much longer, and be much more subject to error, whether the instrument is a pencil or an old-fashioned desk computer. What calculators do not yield is understanding, or mathematical facility, or a solid basis for more advanced, generalized theories."
However, Halmos' derivation only shows that at most 23 people are needed to ensure a birthday match with even chance; since we don't know how sharp the given inequalities are, the argument leaves open the possibility that n = 22 could also work.
Generalization and approximation
The birthday paradox may be generalized: there are n persons; each one of them randomly chooses a number between 1 and fixed N (where N may be, e.g., 365 or any other integer > 0).
What is the probability p(n) that at least two persons choose the same number?
An approximation to the answer is given by
- <math>p(n)\sim 1-1/\exp(n^2/(2N)).\,<math>
Results for N = 365
Missing image
050329-birthday2.png
image:050329-birthday2.png
Reverse problem
An alternate question may be:
- For fixed probability p ...
- ... find the greatest n(p) for which the probability p(n) is smaller than the given p, or
- ... find the smallest n(p) for which the probability p(n) is greater than the given p.
An approximation for this is given by:
- <math>n(p)\sim \left(2N\ln\left({1 \over 1-p}\right)\right)^{1/2}.<math>
Example
approximation | computation for N := 365 | |||||
p | n generalized | n for N := 365 | n↓ | p(n↓) | n↑ | p(n↑) |
0.01 | 0.14178 √N | 2.70864 | 2 | 0.00274 | 3 | 0.00820 |
0.05 | 0.32029 √N | 6.11916 | 6 | 0.04046 | 7 | 0.05624 |
0.1 | 0.45904 √N | 8.77002 | 8 | 0.07434 | 9 | 0.09462 |
0.2 | 0.66805 √N | 12.76302 | 12 | 0.16702 | 13 | 0.19441 |
0.3 | 0.84460 √N | 16.13607 | 16 | 0.28360 | 17 | 0.31501 |
0.5 | 1.17741 √N | 22.49439 | 22 | 0.47570 | 23 | 0.50730 |
0.7 | 1.55176 √N | 29.64625 | 29 | 0.68097 | 30 | 0.70632 |
0.8 | 1.79412 √N | 34.27666 | 34 | 0.79532 | 35 | 0.81438 |
0.9 | 2.14597 √N | 40.99862 | 40 | 0.89123 | 41 | 0.90315 |
0.95 | 2.44775 √N | 46.76414 | 46 | 0.94825 | 47 | 0.95477 |
0.99 | 3.03485 √N | 57.98081 | 57 | 0.99012 | 58 | 0.99166 |
Note: some values are coloured showing that the approximation is not always exact.
Empirical test
The birthday paradox can be simulated empirically using computer code.
days := 365 numPeople := 1 prob := 0.0 while prob < 0.5 { numPeople := numPeople + 1 prob := 1 - ((1-prob) * (days-(numPeople-1)) / days) print "Number of people: " + numPeople print "Prob. of same birthday: " + prob }
Applications
The birthday paradox in its more generic sense applies to hash functions: the number of N-bit hashes that can be generated before probably getting a collision is not 2N, but rather only 2N/2. This is exploited by birthday attacks on cryptographic hash functions.
The theory behind the birthday problem was used in [Schnabel 1938] under the name of capture-recapture statistics to estimate the size of fish population in lakes.
Unequal probabilities
As mentioned above, real-world birthday data are not equally distributed. The birthday problem for such non-constant birthday probabilities was tackled in [Klamkin 1967].
Near matches
Another generalization is to ask how many people are needed in order to have a better than 50% chance that two people have a birthday within one day of each other, or within two, three, etc., days of each other. This is a more difficult problem and requires use of the inclusion-exclusion principle. The results (assuming an equal distribution for birthdays) are just as surprising as in the standard birthday problem:
within k days | #people required |
---|---|
0 | 23 |
1 | 14 |
2 | 11 |
3 | 9 |
4 | 8 |
5 | 7 |
7 | 6 |
Thus in a family with six members, it is more likely than not that two members will have a birthday within a week of each other.
References
- Zoe Emily Schnabel: "The estimation of the total fish population of a lake", American Mathematical Monthly 45 (1938), pages 348-352
- M. Klamkin and D. Newman: "Extensions of the birthday surprise", Journal of Combinatorial Theory 3 (1967), pages 279-282.
External links
- http://www.efgh.com/math/birthday.htm
- http://www.teamten.com/lawrence/puzzles/birthday_paradox.html
- http://science.howstuffworks.com/question261.htm
- http://mathworld.wolfram.com/BirthdayProblem.html
- http://www.atriumtech.com/pongskorn/birthdayparadox/birthdayparadox.htmde:Geburtstagsproblem
es:Paradoja del cumpleaños it:Paradosso del compleanno nl:Verjaardagenparadox ja:誕生日のパラドックス pl:Paradoks dnia urodzin