My friend Ryan, who is also a math tutor at UW, and I are working our way through several math resources including Larry Wasserman’s famous *All of Statistics.* Here is a math problem:

Suppose we have $n$ random variables, all distributed uniformly: $X_1, X_2, \ldots, X_n \sim \text{Uniform}(0, 1)$. We want to find the expected value $E[Y]$ where $Y = \max(X_1, X_2, \ldots, X_n)$.

First, we need to find the Probability Density Function (PDF) and we do so in the usual way, by first finding the Cumulative Distribution Function (CDF) and taking the derivative:

$$F_Y(y) = P(Y \le y) = P(\max(X_1, X_2, \ldots, X_n) \le y) = P(X_1 \le y, X_2 \le y, \ldots, X_n \le y)$$

We want to be able to get to this step:

$$P(X_1 \le y, X_2 \le y, \ldots, X_n \le y) = P(X_1 \le y) \, P(X_2 \le y) \cdots P(X_n \le y)$$

But that factorization requires independence, and we are not given that our $X_i$'s are in fact independent. Thanks to Ryan for helping me see that by definition:

$$P(X_1 \le y, \ldots, X_n \le y) = \int_0^y \cdots \int_0^y f_{X_1, \ldots, X_n}(x_1, \ldots, x_n) \, dx_1 \cdots dx_n$$

However, note that in this case the joint density is uniform over the unit hypercube, which has volume equal to $1$. In other words, $f_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = 1$. Our equation then simplifies:

$$F_Y(y) = \int_0^y \cdots \int_0^y 1 \, dx_1 \cdots dx_n = y^n = \left[P(X \le y)\right]^n$$

where $X$ here is a generic random variable, by symmetry (all $X_i$'s are identically distributed). This is the same answer we would've gotten if we had made the iid assumption earlier and obtained $P(X_1 \le y) \, P(X_2 \le y) \cdots P(X_n \le y) = \left[P(X \le y)\right]^n = y^n$. Originally, I had made this assumption by way of wishful thinking (and a bit of intuition: it does seem that these uniformly distributed random variables would be independent), but Ryan corrected my mistake.
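As a quick numerical gut check of $F_Y(y) = y^n$ (my own addition, sketched in Python rather than the R used later in this post; the names `n`, `y`, and `trials` are mine), we can estimate $P(\max(X_1, \ldots, X_n) \le y)$ from simulated iid draws and compare it to $y^n$:

```python
import random

random.seed(42)

n = 5          # number of uniform random variables
y = 0.7        # point at which to evaluate the CDF
trials = 200_000

# Empirical P(max(X_1, ..., X_n) <= y) from iid Uniform(0, 1) draws
hits = sum(max(random.random() for _ in range(n)) <= y
           for _ in range(trials))
empirical = hits / trials
theoretical = y ** n   # the derived CDF, F_Y(y) = y^n

print(empirical, theoretical)
```

With 200,000 trials the two values should agree to roughly two decimal places.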

Now that we have $F_Y(y) = \left[F_X(y)\right]^n$ we can find the PDF.

$$f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{d}{dy} \left[F_X(y)\right]^n = n \left[F_X(y)\right]^{n-1} f_X(y)$$

by the chain rule.

Recall that the PDF of a $\text{Uniform}(0, 1)$ is $f_X(x) = 1$ for $0 \le x \le 1$. And by extension the CDF for a $\text{Uniform}(0, 1)$ is:

$$F_X(x) = \begin{cases} 0 & x < 0 \\ x & 0 \le x \le 1 \\ 1 & x > 1 \end{cases}$$

Plugging these values into our equation above (and noting we have $f_Y(y)$, not $f_X(x)$, meaning we simply replace the $x$ we just derived with $y$ as we would in any normal function) we have:

$$f_Y(y) = n \left[F_X(y)\right]^{n-1} f_X(y) = n y^{n-1} \cdot 1 = n y^{n-1}, \quad 0 \le y \le 1$$
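One way to double-check the chain-rule step is to differentiate the CDF $F_Y(y) = y^n$ numerically and compare against the claimed $n y^{n-1}$. A minimal sketch (Python, with my own variable names):

```python
n = 5
y = 0.7
h = 1e-6   # small step for the finite difference

def F(t):
    # The CDF derived above: F_Y(t) = t^n on [0, 1]
    return t ** n

claimed_pdf = n * y ** (n - 1)                  # f_Y(y) = n * y^(n-1)
numeric_pdf = (F(y + h) - F(y - h)) / (2 * h)   # central difference

print(numeric_pdf, claimed_pdf)
```

The two printed values should match to several decimal places.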

Finally, we are ready to take our expectation:

$$E[Y] = \int_0^1 y \, f_Y(y) \, dy = \int_0^1 y \cdot n y^{n-1} \, dy = n \int_0^1 y^n \, dy = n \left[\frac{y^{n+1}}{n+1}\right]_0^1 = \frac{n}{n+1}$$

Let’s take a moment and make sure this answer seems reasonable. First, note that in the trivial case of $n = 1$ (which is simply $Y = \max(X_1) = X_1$ in this case) we get $E[Y] = \frac{1}{1+1} = \frac{1}{2}$. This makes sense! If $n = 1$ then $Y$ is just a uniform random variable on the interval $0$ to $1$. And the expected value of that random variable is $\frac{1}{2}$, which is exactly what we got.

Also notice that $\frac{n}{n+1} \to 1$ as $n \to \infty$. This also makes sense! If we take the maximum of 1 or 2 or 3 $X_i$’s each randomly drawn from the interval $0$ to $1$, we would expect the largest of them to be a bit above $\frac{1}{2}$, the expected value for a single uniform random variable, but we wouldn’t expect to get values that are extremely close to $1$ like $0.9$. However, if we took the maximum of, say, 100 $X_i$’s we would expect that at least one of them is going to be pretty close to $1$ (and since we’re choosing the maximum that’s the one we would select). This doesn’t guarantee our math is correct (although it is), but it does give a gut check that what we derived is reasonable.
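That limiting behavior is easy to see by tabulating $\frac{n}{n+1}$ for a few values of $n$ and spot-checking one large case by simulation. A quick Python sketch (my own addition, not from the post; the names are mine):

```python
import random

random.seed(1)

# E[max of n uniforms] = n / (n + 1) climbs toward 1 as n grows
for n in (1, 2, 3, 100):
    print(n, n / (n + 1))

# Monte Carlo spot check of the n = 100 case
n = 100
trials = 50_000
estimate = sum(max(random.random() for _ in range(n))
               for _ in range(trials)) / trials
print(estimate)   # should land near 100/101
```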

We can further verify our answer by simulation in R, for example by choosing $n = 5$ (thanks to the fantastic Markup.su syntax highlighter):

```r
################################################################
# R Simulation
################################################################
n = 5
Y = replicate(100000, max(runif(n)))

empirical = mean(Y)
theoretical = n / (n + 1)  # 5/6 ≈ 0.833 in this case
percent_diff = abs((empirical - theoretical) / empirical) * 100

# print to console
empirical
theoretical
percent_diff
```

We can see from our results that our theoretical and empirical results differ by just 0.05% after 100,000 runs of our simulation.

```
> empirical
[1] 0.8337853
> theoretical
[1] 0.8333333
> percent_diff
[1] 0.0542087
```