Stock Price Simulation

Larry Wasserman presents an interesting simulation in Problem 11, Chapter 3 of All of Statistics. The problem asks you to simulate the stock market by modeling a simple random walk. With probability 0.5 the price of the stock goes down $1 and with probability 0.5 the stock prices goes up $1. You may recognize this as the same setup in our two simple random walk examples modeling a particle on the real line.

This simulation is interesting because Wasserman notes that even with an equal probability of the stock moving up and down we’re likely to see patterns in the data. I ran some simulations that modeled the change in stock price over the course of 1,000 days and grabbed a couple of graphs to illustrate this point. For example, look at the graph below. It sure looks like this is a stock that’s tanking! However, it’s generated with the random walk I just described.

stock_plot1

Even stocks that generally hover around the origin seem to have noticeable dips and peaks that look like patterns to the human eye even though they are not.

stock_plot2

If we run the simulation multiple times it’s easy to see that if you consider any single stock it’s not so unlikely to get large variations in price (the light purple lines). However, when you consider the average price of all stocks, there is very little change over time as we would expect (the dark purple line).

stock_plot3

Here is the R code to calculate the random walk and generate the last plot:

################################################################
# R Simulation
# James McCammon
# 2/20/2017
################################################################
# This script goes through the simulation of changes in stock
# price data.

# Load plotting libraries
library(ggplot2)
library(ggthemes)
library(reshape2)

#
# Simulate stock price data with random walk
#
    n = 1000 # Walk n steps
    p = .5 # Probability of moving left
    trials = 100 # Num times to repeate sim
    # Run simulation
    rand_walk = replicate(trials, cumsum(sample(c(-1,1), size=n, replace=TRUE, prob=c(p,1-p))))

#
# Prepare data for plotting
#
    all_walks = melt(rand_walk)
    avg_walk = cbind.data.frame(
      'x' = seq(from=1, to=n, by=1),
      'y' = apply(rand_walk, 1, mean)
    )

#
# Plot data
#
    ggplot() + 
      geom_line(data=all_walks, aes(x=Var1, y=value, group=Var2), color='#BCADDC', alpha=.5) +
      geom_line(data=avg_walk, aes(x=x, y=y), size = 1.3, color='#937EBF') +
      theme_fivethirtyeight() +
      theme(axis.title = element_text()) +
      xlab('Days') +
      ylab('Change in Stock Price (in $)') +
      ggtitle("Simulated Stock Prices")


Simple Random Walk – Method 1

Suppose we consider a simple random walk. A particle starts at initial position z_i and moves one unit to the left with probability p and moves one unit to the right with probability 1-p . What is the expected position \mathbb{E}[X_n] of the particle after n steps?

I will calculate the expected value using two different methods. The second method is simpler, but I’ll start using an iteration method.

Our PMF is:

f_X(x) = \bigg\{ \begin{tabular}{cc} p     & x = -1 \\ 1-p & x = 1 \end{tabular}

Let’s set our initial position as:
n=0: \quad \mathbb{E}[X_0] = z_i

After one step our expected position is then:
n=1: \quad \mathbb{E}[X_1] = (z_i - 1)p + (z_i + 1)(1 - p)
n=1: \quad \mathbb{E}[X_1] = z_{i}p - p + z_i + 1 - z_{i}p - p
n=1: \quad \mathbb{E}[X_1] = z_i + 1 - 2p

Great, let’s try iterating one more to see what we get. Note that at n=2 our position is now the result from n=1 , z_i + 1 - 2p .
n=2: \quad \mathbb{E}[X_2] = (z_i + 1 - 2p - 1)p + (z_i + 1 - 2p + 1)(1 - p)
n=2: \quad \mathbb{E}[X_2] = z_{i}p - 2p^2 + Z_i - 2p + 2 - z_{i}p + 2p^2 - 2p
n=2: \quad \mathbb{E}[X_2] = z_i + 2(1 - 2p)

If we keep iterating we will see that \mathbb{E}[X_n] = z_i + n(1 - 2p) . But we can prove this formally through induction. We’ve already done our base case, so let’s now do the induction step. We will assume that \mathbb{E}[X_n] = z_i + n(1 - 2p) is true and show that it is also true for n + 1 .

\mathbb{E}[X_{n+1}] = (z_i + n(1 - 2p) - 1)p + (z_i + n(1 - 2p) + 1)(1 - p)
\mathbb{E}[X_{n+1}] = (z_i + n - 2pn - 1)p + (z_i + n - 2pn + 1)(1 - p)
\mathbb{E}[X_{n+1}] = z_{i}p + pn - 2p^{2}n - p + z_i + n - 2pn + 1 -z_{i}p - pn + 2p^{2}n - p
\mathbb{E}[X_{n+1}] = - p + z_i + n - 2pn + 1 - p
\mathbb{E}[X_{n+1}] = z_i + (n + 1)(1 - 2p)

Thus our induction step holds and we have shown that \mathbb{E}[X_n] = z_i + n(1 - 2p) .

Because we chose our initial starting position z_i to be arbitrary, we might as well set it to 0 to obtain a final result of \mathbb{E}[X_n] = n(1 - 2p) .

Let’s take a moment to think about this result and make sure it seems reasonable. Suppose p = 0.5 . This would mean we have an equal chance of moving left or moving right. Over the long run we would expect our final position to be exactly where we started. Plugging in p = 0.5 to our equation yields n(1 - 2 \cdot 0.5) = n(1 - 1) = 0 . Just as we expected! What if p = 1 ? This means we only move to the left. Plugging p = 1 into our equation yields n(1 - 2 \cdot 1) = n(-1) = -n . This makes sense! If we can only move to the left then after n steps we would expect to be n steps left of our staring position (the origin as we chose it), the negative direction in our problem setup. We could also choose p to be 0, meaning we only move to the right and we would get n , again just what we would expect!

We can also run a simulation in R to verify our results:

################################################################
# R Simulation
################################################################
# Generate random walk
rand_walk = function (n, p, z) {
  walk = sample(c(-1,1), size=n, replace=TRUE, prob=c(p,1-p))
  for (i in 1:n) {
    z = z + walk[i]
  }
  return(z)
}

n = 1000 # Walk n steps
p = .3 # Probability of moving left
z = 0 # Set initial position to 0
trials = 10000 # Num times to repeate sim
# Run simulation
X = replicate(trials, rand_walk(n,p,z))

# Calculate empirical and theoretical results
empirical = mean(X)
theoretical = n*(1-2*p)
percent_diff = abs((empirical-theoretical)/empirical)*100

# print to console
empirical
theoretical
percent_diff

Printing to the console we see that after 10,000 trials of 1,000 steps each our empirical and theoretical results differ by just 0.046%.

> empirical
[1] 400.1842
> theoretical
[1] 400
> percent_diff
[1] 0.0460288