Finding the Expected Value of the Maximum of n Random Variables

My friend Ryan, who is also a math tutor at UW, and I are working our way through several math resources including Larry Wasserman’s famous All of Statistics. Here is a math problem:

Suppose we have n random variables X_1, ...X_n all distributed uniformly, X_i \sim Uniform(0,1) . We want to find the expected value of \mathbb{E}[Y_n] where Y_n = \max\{X_1,..., X_n\} .

First, we need to find the Probability Density Function (PDF) f_Y(y) and we do so in the usual way, by first finding the Cumulative Distribution Function (CDF) and taking the derivative:

F_Y(y) = P(Y < y)
F_Y(y) = P(\max\{X_1, ..., X_n\} < y)
F_Y(y) = P(X_1,..., X_n < y)

We want to be able to get this step:
F_Y(y) = P(X_1 < y)P(X_2 < y) \cdots P(X_n < y)

But must show independence and we are not give that our X_i ‘s are in fact independent. Thanks to Ryan for helping me see that by definition:

F_Y(y) = \underset{A}{\idotsint} f_{X_1, \dots, X_n}(y) \,dx_1 \dots dx_n

However, note that in this case f_{X_1, \dots, X_n}(y) is a unit n-cube with area A equal to 1 . In other words f_{X_1, \dots, X_n}(y) = 1 . Our equation then simplifies:

F_Y(y) = \idotsint 1 dx_1 \dots dx_n
F_Y(y) = \int dx_1 \dots \int dx_n = [F_X(y)]^n where X here is a generic random variable, by symmetry (all X_i ‘s are identically distributed). This is the same answer we would’ve gotten if we made the iid assumption earlier and obtained F_Y(y) = P(X_1 < y)P(X_2 < y) \cdots P(X_n < y). Originally, I had made this assumption by way of wishful thinking — and a bit of intuition, it does seem that n uniformly distributed random variables would be independent — but Ryan corrected my mistake.

Now that we have F_Y(y) we can find f_Y(y) the PDF.

f_Y(y) = \frac{d}{dy}F_Y(y) = \frac{d}{dy}[F_X(y)]^n
f_Y(y) = n[F_X(y)]^{n-1}f_X(y) by the chain rule.

Recall that the PDF f_X(x) of a X \sim Uniform(0,1) is \frac{1}{b-a} = \frac{1}{1-0} = 1 for x \in [0,1] . And by extension the CDF  F_X(x) for a X \sim Uniform(0,1) is:
\int_a^x f(t)dt = \int_a^x \frac{1}{b-a}dt = t\frac{1}{b-a} \bigm|_a^x = \frac{x}{b-a} - \frac{a}{b-a} = \frac{x-a}{b-a} = \frac{x-0}{1-0} = x .

Plugging these values into our equation above (and noting we have F_X(y) not F_X(x) meaning we simply replace the x we just derived with y as we would in any normal function) we have:

f_Y(y) = ny^{n-1} \cdot 1

Finally, we are ready to take our expectation:

\mathbb{E}[Y] = \int_{y\in A}yf_Y(y)dy = \int_0^1 yny^{n-1}dy = n\int_0^1 y^{n}dy = n\bigg[\frac{1}{n+1}y^{n+1}\bigg]_0^1 = \frac{n}{n+1}

Let’s take a moment and make sure this answer seems reasonable. First, note that if we have the trival case of Y = \max\{X_1\} (which is simply Y = X_1 ; n = 1 in this case) we get \frac{1}{1+1} = \frac{1}{2} . This makes sense! If Y = X_1 then Y is just a uniform random variable on the interval 0 to 1 . And the expected value of that random variable is \frac{1}{2} which is exactly what we got.

Also notice that \lim_{n\to\infty} \frac{n}{n+1} = 1 . This also makes sense! If we take the maximum of 1 or 2 or 3 X_i ‘s each randomly drawn from the interval 0 to 1, we would expect the largest of them to be a bit above \frac{1}{2} , the expected value for a single uniform random variable, but we wouldn’t expect to get values that are extremely close to 1 like .9. However, if we took the maximum of, say, 100 X_i ‘s we would expect that at least one of them is going to be pretty close to 1 (and since we’re choosing the maximum that’s the one we would select). This doesn’t guarantee our math is correct (although it is), but it does give a gut check that what we derived is reasonable.

We can further verify our answer by simulation in R, for example by choosing n = 5 (thanks to the fantastic Markup.su syntax highlighter):

################################################################
# R Simulation
################################################################
X = 5
Y = replicate(100000, max(runif(X)))
empirical = mean(Y)
theoretical = (X/(X+1)) #5/6 = 8.33 in this case
percent_diff = abs((empirical-theoretical)/empirical)*100

# print to console
empirical
theoretical
percent_diff

We can see from our results that our theoretical and empirical results differ by just 0.05% after 100,000 runs of our simulation.

> empirical
[1] 0.8337853
> theoretical
[1] 0.8333333
> percent_diff
[1] 0.0542087
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s