N-th Derivative of Special Function

MIT has a cute little problem in their calculus problem set, which you can see here. I like this problem because it takes something complicated and shows that it actually reduces down to something very simple.

To start this problem off let’s suppose we have a function that we can write as the product of two other functions, $y = u(x)v(x)$ . There are many such functions. For example, $y(x) = x^2sin(x)$ .

To find the derivative we would need to use the power rule. To get warmed up let’s take two derivatives of $y = u(x)v(x)$ .

$y' = u'v + uv'$ (that’s just the product rule definition).

Applying the rule again we get four terms, two from the first product and two from the second:

$y'' = u''v +u'v' + u'v' + uv''$

We can combine terms to get:

$y'' = u''v + 2u'v' + uv''$

This process can continue and indeed the problem set gives us Libniz’s formula for the n-th derivative, which is:

$y^{(n)} = u^{(n)}v + \binom{n}{1}u^{(n-1)}v^{(1)} + \binom{n}{2}u^{(n-2)}v^{(2)} + \cdots + uv^{(n)}$

This formula may look daunting at first, but it’s really pretty simple. Note that we are not raising our functions to powers, $u^{(n-1)}$ means the $(n - 1)$ -st derivative. Let’s apply Libniz’s formula to the second derivative of $y(x)$ above.

$y^{(2)} = u^{(2)}v + \binom{2}{1}u^{(2-1)}v^{(1)} + uv^{(2)}$

$y^{(2)} = u^{(2)}v + \frac{2!}{1!(2-1)!}u^{(2-1)}v^{(1)} + uv^{(2)}$

$y^{(2)} = u^{(2)}v + 2u^{(1)}v^{(1)} + uv^{(2)}$

This is exactly what we got above, the only difference is the notation.

The problem now asks us to calculate $y^{(p+q)}$ if $y = x^p(x+a)^q$ .

Before we begin let’s make a few observations that will help us along the way. First, in this example our $u(x)$ is $x^p$ and $v(x)$ is $(x+a)^q$ .

Next, let me remind you about two well known facts of $x^p$ . As an example, I’ll take $p$ to be $3$ . Then we just have the following, where I will adopt the same notation used in Libniz’s formula.

$y = x^3$

$y ^{(1)}= 3 \cdot x^2$

$y ^{(2)}= 3 \cdot 2 \cdot x^1$

$y ^{(3)}= 3 \cdot 2 \cdot 1$

$y ^{(4)}= 0$

So what have we learned? Well, first we know that all derivatives higher than order $p$ are 0. In our example, $p$ was 3, and by the fourth derivative we got 0. Of course the derivative of 0 is 0, so from here on out (for all derivatives of order greater than 3), we’re going to get 0.

The second thing to notice is that the p-th derivative is actually $p!$ . Again, in this case $p$ was 3 and for $y ^{(3)}$ our answer was $3 \cdot 2 \cdot 1$ , otherwise known as $3!$ .

These two rules also hold for $(x+a)^q$ . True, we have to use the chain rule, but of course the derivative of $x+a$ is just 1.

Given what we’ve learned let’s move forward writing out the first few terms of Leibniz’s formula with our given $y(x)$ .

$y^{(p+q)} =x^{p^{(p+q)}}(x+a)^q + \binom{p+q}{1}x^{p^{(p+q-1)}}(x+a)^{q^{(1)}} + \binom{p+q}{2}x^{p^{(p+q-2)}}(x+a)^{q^{(2)}} + \cdots$

So far all of these terms are 0 since the derivatives of $x^p$ are of higher order than $p$ . We can ask when they will stop being 0. This occurs on the $(q+1)$ -st term when the derivative is $p + q - q$ , which is just $p$ . So we know that’s the first non-zero term. Let’s start with that term and write out the next few terms.

$y^{(p+q)} =\binom{p+q}{q}x^{p^{(p+q-q)}}(x+a)^{q^{(q)}} + \binom{p+q}{q+1}x^{p^{(p+q-(q+1))}}(x+a)^{q^{(q+1)}} +\binom{p+q}{q+2}x^{p^{(p+q-(q+2))}}(x+a)^{q^{(q+2)}} + \cdots$

Notice what happens here. For derivatives of order higher than $q$ the $(x+a)^q$ term is 0. So we are left with only a single term! That term is:

$y^{(p+q)} =\binom{p+q}{q}x^{p^{(p+q-q)}}(x+a)^{q^{(q)}}$

But this can be simplified using our rules from above. We can replace $x^{p^{(p+q-q)}}$ with $p!$ since this is the p-th derivative of $x^p$ and $(x+a)^{q^{(q)}}$ with $q!$ since it’s the q-th derivative of $(x+a)^q$ .

$y^{(p+q)} =\binom{p+q}{q}p!q!$

And now we can expand the binomial term. Recall that $\binom{n}{k} = \frac{n!}{k!(n-k)!}$ .

$y^{(p+q)} = \frac{(p+q)!}{q!(p+q-q)!}p!q!$

$y^{(p+q)} = \frac{(p+q)!}{q!p!}p!q!$

$y^{(p+q)} =(p+q)!$

See what I mean! The derivatives of the product of two fairly general functions just turns into a few multiplications.

Distribution Convergence

Let’s do a problem from Chapter 5 of All of Statistics.

Suppose $X_1, \dots X_n \sim \text{Uniform(0,1)}$ . Let $Y_n = \bar{X_n}^2$ . Find the limiting distribution of $Y_n$ .

Note that we have $Y_n = \bar{X_n}\bar{X_n}$

Recall from Theorem 5.5(e) that if $X_n \rightsquigarrow X$ and $Y_n \rightsquigarrow c$ then $X_n Y_n \rightsquigarrow cX$ .

So the question becomes does $X_n \rightsquigarrow c$ so that we can use this theorem? The answer is yes. Recall that from Theorem 5.4(b) $X_n \overset{P}{\longrightarrow} X$ implies that $X_n \rightsquigarrow X$ . So if we can show that we converge to a constant in probability we know that we converge to the constant in distribution. Let’s show that $\bar{X}_n \overset{P}{\longrightarrow} c$ . That’s easy. The law of large numbers tells us that the sample average converges in probability to the expectation. In other words $\bar{X}_n \overset{P}{\longrightarrow} \mathbb{E}[X]$ . Since we are told that $X_i$ is i.i.d from a Uniform(0,1) we know the expectation is $\mathbb{E}[X] = .5$ .

Putting it all together we have that:

$Y_n = \bar{X_n}^2$
$Y_n = \bar{X_n}\bar{X_n}$
$Y_n \rightsquigarrow \mathbb{E}[X]\mathbb{E}[X]$ (through the argument above)
$Y_n \rightsquigarrow (.5)(.5)$
$Y_n \rightsquigarrow .25$

We can also show this by simulation in R, which produces this chart:

y_convergence

Indeed we also get the answer 0.25. Here is the R code used to produce the chart above:

# Load plotting libraries
library(ggplot2)
library(ggthemes)

# Create Y = g(x_n)
g = function(n) {
  return(mean(runif(n))^2)
}

# Define variables
n = 1:10000
Y = sapply(n, g)

# Plot
set.seed(10)
df = data.frame(n,Y)
ggplot(df, aes(n,Y)) +
  geom_line(color='#3498DB') +
  theme_fivethirtyeight() +
  ggtitle('Distribution Convergence of Y as n Increases')

A Quick Little Proof

Prove that $mn$ is odd if and only if both $m$ and $n$ are odd. We can write this in mathamatical notation as $mn \in \mathbb O \Longleftrightarrow m,n \in \mathbb O$

Let’s first prove that if both $m$ and $n$ are odd then their product is odd. An odd number is of the form two times and integer plus one. So let $m = 2r + 1$ and $2q + 1$ . Then $mn = (2r + 1)(2q + 1) = 4rq + 2r + 2q + 1 = 2(2rq + r + q) + 1$ . Which is odd, by definition, it’s two time an integer — in this case the integer is $2(2rq + r + q)$ — plus one.

Now we have to prove things in the other direction. If $mn$ is odd then so are both $m$ and $n$ . Let’s do a proof by contradiction. We must assume that either $m$ or $n$ is even and then prove that $mn$ is even. Assume $m$ is even. Then $mn = (2r)(2q + 1) = 4rq + 2r = 2(2rq + r)$ , which is even. The same holds if n is even. We’ve completed our proof by contradiction and so have proven our original statement.

A Quick Little Proof

Let’s prove that $\sqrt{3}$ is irrational. An irrational number is one that cannot be written in the form $\frac{a}{b}$ , where $a$ and $b$ are both integers; in other words there is no repeating pattern in it’s decimal.

First, let’s prove that if $n^2$ is a multiple of $3$ then so is $n$ . Assume that $n$ is not a multiple of $3$ . Then $n$ can be written as: $n = 3m + \ell$ where $\ell \in \{1,2\}$ . Then $n^2 = 9m^2 + 6m\ell + \ell^2$ . We can factor out a $3$ , so $n^2 = 3(3m^2 + 2m\ell) + \ell^2$ . This implies $n^2$ is not a multiple of $3$ since it’s an integer times $3$ plus a number that isn’t a multiple of $3$ (that’s the $\ell$ part). We have proved the contrapositive which implies our original proposition ( $P \Rightarrow Q$ is the same as $\neg Q \Rightarrow \neg P$ ).

Now let’s move on to the main proof. Assume, by way of contradiction that the square root of $3$ is rational. If the square root of $3$ is rational then we can write $\sqrt{3} = \frac{a}{b}$ with $a$ and $b$ not sharing any common factors. We can do some math: $3 = \frac{a^2}{b^2}$ which means $3b^2 = a^2$ . But this means $a^2$ is a multiple of $3$ since it’s an integer times $3$ . We just proved that if $n^2$ is a multiple of $3$ then so is $n$ . This means that $a$ is a multiple of 3. But if $a$ is a multiple of $3$ it can be written as $3m$ . Let’s plug this in to get $3b^2 = (3m)^2$ , simplifying we get that $3b^2 = 9m^2$ . Let’s divide through by $3$ to get $b^2 = 3m^2$ . This means that $b^2$ is a multiple of $3$ since it’s $3$ times an integer. But again, if $b^2$ is a multiple of $3$ then so is $b$ . We started off by assuming that $a$ and $b$ had no common factors and we just showed that they share a common factor of $3$ . We’ve derived a contradiction to our proposition that $\sqrt{3}$ is rational, therefore it must be irrational.

Stock Price Simulation

Larry Wasserman presents an interesting simulation in Problem 11, Chapter 3 of All of Statistics. The problem asks you to simulate the stock market by modeling a simple random walk. With probability 0.5 the price of the stock goes down $1 and with probability 0.5 the stock prices goes up $1. You may recognize this as the same setup in our two simple random walk examples modeling a particle on the real line.

This simulation is interesting because Wasserman notes that even with an equal probability of the stock moving up and down we’re likely to see patterns in the data. I ran some simulations that modeled the change in stock price over the course of 1,000 days and grabbed a couple of graphs to illustrate this point. For example, look at the graph below. It sure looks like this is a stock that’s tanking! However, it’s generated with the random walk I just described.

stock_plot1

Even stocks that generally hover around the origin seem to have noticeable dips and peaks that look like patterns to the human eye even though they are not.

stock_plot2

If we run the simulation multiple times it’s easy to see that if you consider any single stock it’s not so unlikely to get large variations in price (the light purple lines). However, when you consider the average price of all stocks, there is very little change over time as we would expect (the dark purple line).

stock_plot3

Here is the R code to calculate the random walk and generate the last plot:

################################################################
# R Simulation
# James McCammon
# 2/20/2017
################################################################
# This script goes through the simulation of changes in stock
# price data.

# Load plotting libraries
library(ggplot2)
library(ggthemes)
library(reshape2)

#
# Simulate stock price data with random walk
#
    n = 1000 # Walk n steps
    p = .5 # Probability of moving left
    trials = 100 # Num times to repeate sim
    # Run simulation
    rand_walk = replicate(trials, cumsum(sample(c(-1,1), size=n, replace=TRUE, prob=c(p,1-p))))

#
# Prepare data for plotting
#
    all_walks = melt(rand_walk)
    avg_walk = cbind.data.frame(
      'x' = seq(from=1, to=n, by=1),
      'y' = apply(rand_walk, 1, mean)
    )

#
# Plot data
#
    ggplot() + 
      geom_line(data=all_walks, aes(x=Var1, y=value, group=Var2), color='#BCADDC', alpha=.5) +
      geom_line(data=avg_walk, aes(x=x, y=y), size = 1.3, color='#937EBF') +
      theme_fivethirtyeight() +
      theme(axis.title = element_text()) +
      xlab('Days') +
      ylab('Change in Stock Price (in $)') +
      ggtitle("Simulated Stock Prices")

Simple Random Walk – Method 2

Suppose we consider a simple random walk. A particle starts at initial position $z_i$ and moves one unit to the left with probability $p$ and moves one unit to the right with probability $1-p$ . What is the expected position $\mathbb{E}[X_n]$ of the particle after $n$ steps?

In a previous post I presented a method to solve this by iteration. I’d like to present a different method to solve the problem here.

Recall we have:

$X_i = \bigg\{ \begin{tabular}{cc} p & x = -1 \\ 1-p & x = 1 \end{tabular}$

We give it some thought we can see that after $n$ steps we’ve simply added up our random variables to get our final location. We could define a new random variable:

$Y = \sum_{i=1}^{n} X_i$

So now all we need to do is take the expected value of $Y$ . We can use rules of expected value to bring the expectation inside the summation and take the expectation of each individual $X_i$ as we normally would.

$\mathbb{E}[Y] = \mathbb{E}\big[\sum_{i=1}^{n} X_i \big]$
$\mathbb{E}[Y] = \sum_{i=1}^{n} \mathbb{E}[X_i]$
$\mathbb{E}[Y] = \sum_{i=1}^{n} -1 \cdot p + 1 \cdot (1-p)$
$\mathbb{E}[Y] = \sum_{i=1}^{n} 1 - 2p$
$\mathbb{E}[Y] = n(1 - 2p)$

This is the same answer we got via the iteration method. This also makes our R simulation easier. Recall we has this setup:

################################################################
# R Simulation
################################################################
# Generate random walk
rand_walk = function (n, p, z) {
  walk = sample(c(-1,1), size=n, replace=TRUE, prob=c(p,1-p))
  for (i in 1:n) {
    z = z + walk[i]
  }
  return(z)
}

n = 1000 # Walk n steps
p = .3 # Probability of moving left
z = 0 # Set initial position to 0
trials = 10000 # Num times to repeate sim
# Run simulation
X = replicate(trials, rand_walk(n,p,z))

Our code now becomes:

################################################################
# R Simulation
################################################################
n = 1000 # Walk n steps
p = .3 # Probability of moving left
trials = 10000 # Num times to repeate sim
# Run simulation
X = replicate(trials, sum(sample(c(-1,1), size=n, replace=TRUE, prob=c(p,1-p))))

# Calculate empirical and theoretical results
empirical = mean(X)
theoretical = n*(1-2*p)
percent_diff = abs((empirical-theoretical)/empirical)*100

# print to console
empirical
theoretical
percent_diff

Notice that for our random walk we can replace this function:

rand_walk = function (n, p, z) {
  walk = sample(c(-1,1), size=n, replace=TRUE, prob=c(p,1-p))
  for (i in 1:n) {
    z = z + walk[i]
  }
  return(z)
}
X = replicate(trials, rand_walk(n,p,z))

With the simpler:

X = replicate(trials, sum(sample(c(-1,1), size=n, replace=TRUE, prob=c(p,1-p))))

Additionally, in this second method we don’t need to specify an initial position since we assume from the beginning it’s zero. Of course both methods complete the same task, but they use different conceptual models. The first uses an iteration model, while the latter completes the “iteration” in a single step.

Distribution Mean Convergence

Suppose we want to simulate $\frac{1}{n}\sum_{i=1}^{n} X_i$ for $X_1, X_2, \dots , X_n \sim N(0, 1)$ , $n = 1, \dots , 10,000$ . Suppose we want to do the same for the Cauchy distribution.

In other words, we want to draw several random variables from a normal distribution and then take the average. As $n$ increases we should get closer to the mean of the distribution we’re drawing from, 0 in this case.

The R code below will do this. It produces this graph:
ch3-pr9

Notice that while the Normal distribution converges quickly the Cauchy never does. This is because the Cauchy distribution has fat tails and so extreme observations are common.

################################################################
# R Simulation
# James McCammon
# 2/20/2017
################################################################
# This script goes through the simulation of plotting both normal
# and Cauchy means for random vectors of size 1 to 10,000. It
# also demonstrates function creation and plotting.
# Highlight desired section and click "Run."

# Set working directory as needed
setwd("~/R Projects/All of Statistics")

###
# Calculate means and plot using base R
###

    # Set seed for reproducibility
    set.seed(271)
    
    #
    # Version 1: Simple
    #
    n = seq(from=1, to=10000, by=1)
    y=sapply(n, FUN=function(x) sum(rnorm(x))/x)
    plot(n, y, type="l")
    
    #
    # Version 2: Define a function
    #
    sim = function(x, FUN) {
      sapply(x, FUN=function(x) sum(FUN(x))/x)  
    }
    
    # Use function to plot normal means
    x = seq(from=1, to=10000, by=1)
    y1 = sim(x, rnorm)
    plot(x, y1, type="l")
    
    # Use function to plot Cauchy means
    y2 = sim(x, rcauchy)
    plot(x, y2, type="l")
    
    #
    # Version 3: More complex function
    #
    
    # This function has:
    # (1) error checking
    # (2) extra argument options
    # (3) the ability to input any distribution R supports
    sim = function(x, FUN, ...) {
      if(!is.character(FUN)) stop('Please enter distribution as string.')
      dists = c('rnorm',
                'rbeta',
                'rbinom',
                'rcauchy',
                'rchisq',
                'rexp',
                'rf',
                'rgamma',
                'rgeom',
                'rhyper',
                'rlnorm',
                'rmultinom',
                'rnbinom',
                'rpois',
                'rt',
                'runif',
                'rweibull')
      if(is.na(match(FUN, dists))) stop(paste('Please enter a valid distribution from one of:', paste(dists, collapse=', ')))
      FUN = get(FUN)
      sapply(x, FUN=function(x) sum(FUN(x, ...))/x)  
    }
    
    # We have to define our function in string form.
    # This will throw error 1.
    test1 = sim(x, rnorm)
    
    # We have to input a distribution R supports.
    # This will throw error 2.
    test2 = sim(x, 'my_cool_function')
    
    # We can input additional arguments like the
    # mean, standard deviations, or other shape parameters.
    test3 = sim(x, 'rnorm', mean=10, sd=2)

####
# Using ggplot2 to make pretty graph
###

    # Load libraries
    library(ggplot2)
    library(ggthemes)
    library(gridExtra)
    
    png(filename='Ch3-Pr9.png', width=1200, height=600)
    
      df1 = cbind.data.frame(x, y1)
      p1 = ggplot(df1, aes(x=x, y=y1)) + 
        geom_line(size = 1, color='#937EBF') + 
        theme_fivethirtyeight() +
        ggtitle("Normal Means")
      
      df2 = cbind.data.frame(x, y2)
      p2 = ggplot(df2, aes(x=x, y=y2)) + 
        geom_line(size = 1, color='#EF4664') + 
        theme_fivethirtyeight() +
        ggtitle("Cauchy Means")
      
      # Save charts
      grid.arrange(p1,p2,nrow=2,ncol=1)
      
    dev.off()
    # Print charts to screen
    grid.arrange(p1,p2,nrow=2,ncol=1)

Simple Random Walk – Method 1

I will calculate the expected value using two different methods. The second method is simpler, but I’ll start using an iteration method.

Our PMF is:

$f_X(x) = \bigg\{ \begin{tabular}{cc} p & x = -1 \\ 1-p & x = 1 \end{tabular}$

Let’s set our initial position as:
$n=0: \quad \mathbb{E}[X_0] = z_i$

After one step our expected position is then:
$n=1: \quad \mathbb{E}[X_1] = (z_i - 1)p + (z_i + 1)(1 - p)$
$n=1: \quad \mathbb{E}[X_1] = z_{i}p - p + z_i + 1 - z_{i}p - p$
$n=1: \quad \mathbb{E}[X_1] = z_i + 1 - 2p$

Great, let’s try iterating one more to see what we get. Note that at $n=2$ our position is now the result from $n=1$ , $z_i + 1 - 2p$ .
$n=2: \quad \mathbb{E}[X_2] = (z_i + 1 - 2p - 1)p + (z_i + 1 - 2p + 1)(1 - p)$
$n=2: \quad \mathbb{E}[X_2] = z_{i}p - 2p^2 + Z_i - 2p + 2 - z_{i}p + 2p^2 - 2p$
$n=2: \quad \mathbb{E}[X_2] = z_i + 2(1 - 2p)$

If we keep iterating we will see that $\mathbb{E}[X_n] = z_i + n(1 - 2p)$ . But we can prove this formally through induction. We’ve already done our base case, so let’s now do the induction step. We will assume that $\mathbb{E}[X_n] = z_i + n(1 - 2p)$ is true and show that it is also true for $n + 1$ .

$\mathbb{E}[X_{n+1}] = (z_i + n(1 - 2p) - 1)p + (z_i + n(1 - 2p) + 1)(1 - p)$
$\mathbb{E}[X_{n+1}] = (z_i + n - 2pn - 1)p + (z_i + n - 2pn + 1)(1 - p)$
$\mathbb{E}[X_{n+1}] = z_{i}p + pn - 2p^{2}n - p + z_i + n - 2pn + 1 -z_{i}p - pn + 2p^{2}n - p$
$\mathbb{E}[X_{n+1}] = - p + z_i + n - 2pn + 1 - p$
$\mathbb{E}[X_{n+1}] = z_i + (n + 1)(1 - 2p)$

Thus our induction step holds and we have shown that $\mathbb{E}[X_n] = z_i + n(1 - 2p)$ .

Because we chose our initial starting position $z_i$ to be arbitrary, we might as well set it to 0 to obtain a final result of $\mathbb{E}[X_n] = n(1 - 2p)$ .

Let’s take a moment to think about this result and make sure it seems reasonable. Suppose $p = 0.5$ . This would mean we have an equal chance of moving left or moving right. Over the long run we would expect our final position to be exactly where we started. Plugging in $p = 0.5$ to our equation yields $n(1 - 2 \cdot 0.5) = n(1 - 1) = 0$ . Just as we expected! What if $p = 1$ ? This means we only move to the left. Plugging $p = 1$ into our equation yields $n(1 - 2 \cdot 1) = n(-1) = -n$ . This makes sense! If we can only move to the left then after $n$ steps we would expect to be $n$ steps left of our staring position (the origin as we chose it), the negative direction in our problem setup. We could also choose $p$ to be 0, meaning we only move to the right and we would get $n$ , again just what we would expect!

We can also run a simulation in R to verify our results:

################################################################
# R Simulation
################################################################
# Generate random walk
rand_walk = function (n, p, z) {
  walk = sample(c(-1,1), size=n, replace=TRUE, prob=c(p,1-p))
  for (i in 1:n) {
    z = z + walk[i]
  }
  return(z)
}

n = 1000 # Walk n steps
p = .3 # Probability of moving left
z = 0 # Set initial position to 0
trials = 10000 # Num times to repeate sim
# Run simulation
X = replicate(trials, rand_walk(n,p,z))

# Calculate empirical and theoretical results
empirical = mean(X)
theoretical = n*(1-2*p)
percent_diff = abs((empirical-theoretical)/empirical)*100

# print to console
empirical
theoretical
percent_diff

Printing to the console we see that after 10,000 trials of 1,000 steps each our empirical and theoretical results differ by just 0.046%.

> empirical
[1] 400.1842
> theoretical
[1] 400
> percent_diff
[1] 0.0460288

Expected Value of a Generic Positive Random Variable

Here is a math problem:

Suppose we are told that $P(X > 0) = 1$ and are given that $\lim_{n\to\infty} x\big[1-F(x)\big] = 0$ . Show that $\mathbb{E}[X] = \int_0^{\infty} P(X > x)dx$ .

Let’s start with the definition of expectation and use integration by parts.

$\mathbb{E}[X] = \int_{-\infty}^{\infty}xf_X(x)dx = \int_0^{\infty}xf_X(x)dx$ since we are given $x \in (0,\infty)$ .

Now using integration by parts, making the natural selection we have:

$u=F_X(x) \quad \quad v=x$
$u'=f_X(x) \quad \quad v'=dx$

Recall that $\int u'v = uv-\int uv'$ and plugging in our selections we get:

$\mathbb{E}[X] = \int_0^{\infty}xf_X(x)dx = xF_X(x)\bigm|_0^{\infty} - \int_0^{\infty} F_X(x)dx$

Let’s rewrite $\int_0^{\infty} F_X(x)dx$ as $\int_0^{\infty} \big(1 - P(X > x)\big)dx$ . This simplifies to $\int_0^{\infty} dx - \int_0^{\infty} P(X > x)dx$ . Let’s plug this back into our equation above:

$\mathbb{E}[X] = \int_0^{\infty}xf_X(x)dx = xF_X(x)\bigm|_0^{\infty} - \bigg(\int_0^{\infty} dx - \int_0^{\infty} P(X > x)dx \bigg)$

$\mathbb{E}[X] = xF_X(x)\bigm|_0^{\infty} - x\bigm|_0^{\infty} + \int_0^{\infty} P(X > x)dx$

$\mathbb{E}[X] = \big[xF_X(x)- x\big]_0^{\infty} + \int_0^{\infty} P(X > x)dx$

$\mathbb{E}[X] = \big[x(F_X(x)- 1)\big]_0^{\infty} + \int_0^{\infty} P(X > x)dx$

$\mathbb{E}[X] = \big[-x(1-F_X(x))\big]^{x=\infty} + \int_0^{\infty} P(X > x)dx$ since plugging in $0$ to the first half of our expression just yields $0$ . We can’t actually evaluate $x$ at infinity, instead we take the limit:

$\mathbb{E}[X] = -\lim_{x\to\infty} x\big[1-F(x)\big] + \int_0^{\infty} P(X > x)dx$ , but recall that we are told $\lim_{x\to\infty} x\big[1-F(x)\big] = 0$ and so we are simply left with:

$\mathbb{E}[X] = \int_0^{\infty} P(X > x)dx$ , our desired result.

Finding the Expected Value of the Maximum of n Random Variables

My friend Ryan, who is also a math tutor at UW, and I are working our way through several math resources including Larry Wasserman’s famous All of Statistics. Here is a math problem:

Suppose we have $n$ random variables $X_1, ...X_n$ all distributed uniformly, $X_i \sim Uniform(0,1)$ . We want to find the expected value of $\mathbb{E}[Y_n]$ where $Y_n = \max\{X_1,..., X_n\}$ .

First, we need to find the Probability Density Function (PDF) $f_Y(y)$ and we do so in the usual way, by first finding the Cumulative Distribution Function (CDF) and taking the derivative:

$F_Y(y) = P(Y < y)$
$F_Y(y) = P(\max\{X_1, ..., X_n\} < y)$
$F_Y(y) = P(X_1,..., X_n < y)$

We want to be able to get this step:
$F_Y(y) = P(X_1 < y)P(X_2 < y) \cdots P(X_n < y)$

But must show independence and we are not give that our $X_i$ ‘s are in fact independent. Thanks to Ryan for helping me see that by definition:

$F_Y(y) = \underset{A}{\idotsint} f_{X_1, \dots, X_n}(y) \,dx_1 \dots dx_n$

However, note that in this case $f_{X_1, \dots, X_n}(y)$ is a unit $n-cube$ with area $A$ equal to $1$ . In other words $f_{X_1, \dots, X_n}(y) = 1$ . Our equation then simplifies:

$F_Y(y) = \idotsint 1 dx_1 \dots dx_n$
$F_Y(y) = \int dx_1 \dots \int dx_n = [F_X(y)]^n$ where $X$ here is a generic random variable, by symmetry (all $X_i$ ‘s are identically distributed). This is the same answer we would’ve gotten if we made the iid assumption earlier and obtained $F_Y(y) = P(X_1 < y)P(X_2 < y) \cdots P(X_n < y)$ . Originally, I had made this assumption by way of wishful thinking — and a bit of intuition, it does seem that $n$ uniformly distributed random variables would be independent — but Ryan corrected my mistake.

Now that we have $F_Y(y)$ we can find $f_Y(y)$ the PDF.

$f_Y(y) = \frac{d}{dy}F_Y(y) = \frac{d}{dy}[F_X(y)]^n$
$f_Y(y) = n[F_X(y)]^{n-1}f_X(y)$ by the chain rule.

Recall that the PDF $f_X(x)$ of a $X \sim Uniform(0,1)$ is $\frac{1}{b-a} = \frac{1}{1-0} = 1$ for $x \in [0,1]$ . And by extension the CDF $F_X(x)$ for a $X \sim Uniform(0,1)$ is:
$\int_a^x f(t)dt = \int_a^x \frac{1}{b-a}dt = t\frac{1}{b-a} \bigm|_a^x = \frac{x}{b-a} - \frac{a}{b-a} = \frac{x-a}{b-a} = \frac{x-0}{1-0} = x$ .

Plugging these values into our equation above (and noting we have $F_X(y)$ not $F_X(x)$ meaning we simply replace the $x$ we just derived with $y$ as we would in any normal function) we have:

$f_Y(y) = ny^{n-1} \cdot 1$

Finally, we are ready to take our expectation:

$\mathbb{E}[Y] = \int_{y\in A}yf_Y(y)dy = \int_0^1 yny^{n-1}dy = n\int_0^1 y^{n}dy = n\bigg[\frac{1}{n+1}y^{n+1}\bigg]_0^1 = \frac{n}{n+1}$

Let’s take a moment and make sure this answer seems reasonable. First, note that if we have the trival case of $Y = \max\{X_1\}$ (which is simply $Y = X_1$ ; $n = 1$ in this case) we get $\frac{1}{1+1} = \frac{1}{2}$ . This makes sense! If $Y = X_1$ then $Y$ is just a uniform random variable on the interval $0$ to $1$ . And the expected value of that random variable is $\frac{1}{2}$ which is exactly what we got.

Also notice that $\lim_{n\to\infty} \frac{n}{n+1} = 1$ . This also makes sense! If we take the maximum of 1 or 2 or 3 $X_i$ ‘s each randomly drawn from the interval 0 to 1, we would expect the largest of them to be a bit above $\frac{1}{2}$ , the expected value for a single uniform random variable, but we wouldn’t expect to get values that are extremely close to 1 like .9. However, if we took the maximum of, say, 100 $X_i$ ‘s we would expect that at least one of them is going to be pretty close to 1 (and since we’re choosing the maximum that’s the one we would select). This doesn’t guarantee our math is correct (although it is), but it does give a gut check that what we derived is reasonable.

We can further verify our answer by simulation in R, for example by choosing $n = 5$ (thanks to the fantastic Markup.su syntax highlighter):

################################################################
# R Simulation
################################################################
X = 5
Y = replicate(100000, max(runif(X)))
empirical = mean(Y)
theoretical = (X/(X+1)) #5/6 = 8.33 in this case
percent_diff = abs((empirical-theoretical)/empirical)*100

# print to console
empirical
theoretical
percent_diff

We can see from our results that our theoretical and empirical results differ by just 0.05% after 100,000 runs of our simulation.

> empirical
[1] 0.8337853
> theoretical
[1] 0.8333333
> percent_diff
[1] 0.0542087