My friend Ryan, who is also a math tutor at UW, and I are working our way through several math resources including Larry Wasserman’s famous All of Statistics. Here is a math problem:
Suppose we have random variables $X_1, X_2, \ldots, X_n$, all distributed uniformly, $X_i \sim \text{Uniform}(0, 1)$. We want to find the expected value $E(Y)$ where $Y = \max(X_1, X_2, \ldots, X_n)$.
First, we need to find the Probability Density Function (PDF), and we do so in the usual way: by first finding the Cumulative Distribution Function (CDF) and taking the derivative:

$$F_Y(y) = P(Y \le y) = P(\max(X_1, \ldots, X_n) \le y) = P(X_1 \le y, X_2 \le y, \ldots, X_n \le y)$$
We want to be able to take this step:

$$P(X_1 \le y, \ldots, X_n \le y) = P(X_1 \le y) \, P(X_2 \le y) \cdots P(X_n \le y)$$

But that step requires independence, and we are not given that our $X_i$'s are in fact independent. Thanks to Ryan for helping me see that by definition:

$$P(X_1 \le y, \ldots, X_n \le y) = \int_0^y \cdots \int_0^y f(x_1, \ldots, x_n) \, dx_1 \cdots dx_n$$
However, note that in this case the support of the joint density is a unit hypercube with volume equal to $1$. In other words, $f(x_1, \ldots, x_n) = 1$. Our equation then simplifies:

$$F_Y(y) = \int_0^y \cdots \int_0^y 1 \, dx_1 \cdots dx_n = y^n = [F_X(y)]^n$$
where $X$ here is a generic random variable, by symmetry (all $X_i$'s are identically distributed). This is the same answer we would've gotten if we had made the iid assumption earlier and obtained $F_Y(y) = P(X_1 \le y) \cdots P(X_n \le y) = [F_X(y)]^n$. Originally, I had made this assumption by way of wishful thinking and a bit of intuition (it does seem plausible that uniformly distributed random variables would be independent), but Ryan corrected my mistake.
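As a quick numerical gut check on this CDF, here is a short R sketch (the choice of $n = 3$ and the handful of test values of $y$ are arbitrary) that estimates $P(Y \le y)$ by simulation and compares it to $y^n$:

# Sanity check: the CDF of Y = max(X_1, ..., X_n) should be y^n
n = 3
sims = replicate(100000, max(runif(n)))  # simulated draws of Y
for (y in c(0.25, 0.5, 0.9)) {
  cat("y =", y,
      "| empirical:", mean(sims <= y),  # estimated P(Y <= y)
      "| theoretical:", y^n, "\n")      # our derived CDF
}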
Now that we have $F_Y(y)$ we can find $f_Y(y)$, the PDF:

$$f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{d}{dy} [F_X(y)]^n = n [F_X(y)]^{n-1} f_X(y)$$

by the chain rule.
Recall that the PDF of a $\text{Uniform}(a, b)$ random variable is $f(x) = \frac{1}{b - a}$ for $a \le x \le b$. And by extension the CDF for a $\text{Uniform}(a, b)$ random variable is $F(x) = \frac{x - a}{b - a}$ for $a \le x \le b$. For our $\text{Uniform}(0, 1)$ variables these reduce to $f(x) = 1$ and $F(x) = x$.
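Base R has these distributions built in, so we can spot-check the formulas directly (the point $x = 0.7$ is just an arbitrary test value):

# dunif/punif give the Uniform(a, b) PDF and CDF; here a = 0, b = 1
dunif(0.7, min = 0, max = 1)  # 1, matching 1/(b - a)
punif(0.7, min = 0, max = 1)  # 0.7, matching (x - a)/(b - a)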
Plugging these values into our equation above (and noting we have not $x$ but $y$, meaning we simply replace the $x$ in the formulas we just recalled with $y$, as we would in any normal function) we have:

$$f_Y(y) = n [F_X(y)]^{n-1} f_X(y) = n y^{n-1} \cdot 1 = n y^{n-1} \quad \text{for } 0 \le y \le 1$$
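To convince ourselves this density looks right, we can overlay $n y^{n-1}$ on a histogram of simulated maxima; this is just an illustrative sketch, again with an arbitrary $n = 3$:

# Compare simulated draws of Y against the derived density n * y^(n - 1)
n = 3
sims = replicate(100000, max(runif(n)))
hist(sims, breaks = 50, probability = TRUE,
     main = "Y = max of 3 Uniform(0, 1) draws")
curve(n * x^(n - 1), from = 0, to = 1, add = TRUE)  # derived PDF on top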
Finally, we are ready to take our expectation:

$$E(Y) = \int_0^1 y \, f_Y(y) \, dy = \int_0^1 y \cdot n y^{n-1} \, dy = n \int_0^1 y^n \, dy = \frac{n}{n+1} y^{n+1} \Big|_0^1 = \frac{n}{n+1}$$
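If you don't trust the calculus, R's integrate() will do the integral numerically; here with $n = 5$ to match the simulation further down:

# Numerically integrate y * f_Y(y) over [0, 1] and compare to n/(n + 1)
n = 5
integrate(function(y) y * n * y^(n - 1), lower = 0, upper = 1)  # ~0.8333
n / (n + 1)                                                     # exactly 5/6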
Let’s take a moment and make sure this answer seems reasonable. First, note that in the trivial case of $n = 1$ (which is simply $Y = \max(X_1) = X_1$ in this case) we get $E(Y) = \frac{1}{2}$. This makes sense! If $n = 1$ then $Y$ is just a uniform random variable on the interval $0$ to $1$. And the expected value of that random variable is $\frac{1}{2}$, which is exactly what we got.
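We can see the same thing in a one-liner (the 100,000 draws are an arbitrary sample size):

# The n = 1 case: the max of a single uniform is just that uniform
mean(replicate(100000, max(runif(1))))  # should be close to 0.5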
Also notice that $\lim_{n \to \infty} \frac{n}{n+1} = 1$. This also makes sense! If we take the maximum of 1 or 2 or 3 $X_i$'s, each randomly drawn from the interval 0 to 1, we would expect the largest of them to be a bit above $\frac{1}{2}$, the expected value of a single uniform random variable, but we wouldn't expect to get values that are extremely close to 1, like 0.9. However, if we took the maximum of, say, 100 $X_i$'s, we would expect that at least one of them is going to be pretty close to 1 (and since we're choosing the maximum, that's the one we would select). This doesn't guarantee our math is correct (although it is), but it does give a gut check that what we derived is reasonable.
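Tabulating $\frac{n}{n+1}$ for a few values of $n$ (an arbitrary selection) shows exactly this behavior:

# E(Y) = n/(n + 1) climbs toward 1 as n grows
n = c(1, 2, 3, 10, 100, 1000)
round(n / (n + 1), 4)
# [1] 0.5000 0.6667 0.7500 0.9091 0.9901 0.9990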
We can further verify our answer by simulation in R, for example by choosing $n = 5$ (thanks to the fantastic Markup.su syntax highlighter):
################################################################
# R Simulation
################################################################

X = 5  # number of uniform random variables, i.e. our n
Y = replicate(100000, max(runif(X)))

empirical = mean(Y)
theoretical = X/(X + 1)  # 5/6 ≈ 0.8333 in this case
percent_diff = abs((empirical - theoretical)/empirical)*100

# print to console
empirical
theoretical
percent_diff
We can see that the theoretical and empirical results differ by just 0.05% after 100,000 runs of our simulation.
> empirical
[1] 0.8337853
> theoretical
[1] 0.8333333
> percent_diff
[1] 0.0542087