Customer Journey Management and the O-ring Theory

I was recently directed to this 2013 HBR article on managing customer journeys written by Alex Rawson, Ewan Duncan, and Conor Jones.

The article included this line:

Take new-customer onboarding, a journey that typically spans about three months and involves six or so phone calls, a home visit from a technician, and numerous web and mail exchanges. Each interaction with this provider had a high likelihood of going well. But in key customer segments, average satisfaction fell almost 40% over the course of the journey.

…which instantly brought to mind the O-ring theory of economic development. Wikipedia summarizes the theory like this: “[the O-ring theory] proposes that tasks of production must be executed proficiently together in order for any of them to be of high value.”

One implication of the theory is that, because independent probabilities multiply, “doing a good job” often isn’t good enough. Take the example of the TV provider excerpted above. A comprehensive customer journey analysis found that the average onboarding involved 19 separate customer interactions. Now suppose each interaction, considered alone, has a 95% satisfaction rate. The probability that all 19 interactions go well for any given customer is only 37% (0.95^19), meaning the majority of customers will have at least one negative experience.
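
Here is that arithmetic in R, just to make the two numbers explicit:

0.95^19       # probability all 19 interactions go well, ~0.377
1 - 0.95^19   # probability of at least one negative experience, ~0.623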

To think about the impact of cross-touchpoint experience management more fully, I ran a simulation in R. I imagined a company onboarding 10,000 customers over the course of a year, with 19 touchpoints per customer. To make the simulation more realistic I moved away from the binary “satisfied/unsatisfied” calculation above and assumed the satisfaction score distribution for each touchpoint shown below. For most companies this would count as a highly successful customer management program, and indeed, after running 1,000 simulations the average customer score is 9.4 (which is just the expected value of the distribution below).

…but because there are so many customers and the touchpoint probabilities are independent, 5.5% of customers ended up with two or more touchpoints they rated a two or below. Nearly a quarter (24.5%) had two or more touchpoints they rated a five or below.

[Figure: assumed satisfaction score distribution (1–10) for each touchpoint]
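
Both the 9.4 average and the percentages above can be checked analytically against the distribution used in the simulation code at the bottom of this post (the same scale and probs vectors):

scale = 1:10
probs = c(.01, .01, .01, .01, .01, .01, .01, .03, .1, .8)
sum(scale * probs)                  # expected score per touchpoint, ~9.4
1 - pbinom(1, size=19, prob=0.02)   # P(two or more of 19 scores are <= 2), ~5.5%
1 - pbinom(1, size=19, prob=0.05)   # P(two or more of 19 scores are <= 5), ~24.5%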

This could have drastic consequences for revenue. Two or more horrible touchpoint experiences during the onboarding process — which is obviously the first impression a customer has with a particular company — could lead customers to reduce the amount of service they buy or to abandon the service altogether.

One way to think about fixing the customer journey is to turn independent probabilities into conditional probabilities. So, for example, if a customer does experience a particularly bad touchpoint, the conditional probability that their subsequent touchpoints will be managed with special diligence should increase.
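
As a rough sketch of how that could be wired into the simulation above (my own hypothetical extension, not something from the article, with a made-up "recovery" distribution): once a customer logs a very low score, their remaining touchpoints draw from an improved distribution.

# Hypothetical "service recovery" twist on the simulation: once a customer logs a
# score of 2 or below, their remaining touchpoints use an improved (assumed) distribution.
simulate_customer = function(n_touchpoints, probs, recovery_probs, scale = 1:10) {
  scores = numeric(n_touchpoints)
  flagged = FALSE
  for (t in 1:n_touchpoints) {
    p = if (flagged) recovery_probs else probs
    scores[t] = sample(scale, size = 1, prob = p)
    if (scores[t] <= 2) flagged = TRUE
  }
  scores
}

probs          = c(.01, .01, .01, .01, .01, .01, .01, .03, .1, .8)
recovery_probs = c(.001, .001, .001, .001, .001, .001, .001, .01, .083, .9)  # assumed
sim = replicate(10000, simulate_customer(19, probs, recovery_probs))
# Percent of customers who still end up with two or more scores of 2 or below
mean(apply(sim, 2, function(x) sum(x <= 2) >= 2)) * 100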

From an O-ring point of view the answer is simple: create agglomeration economies that attract high-performing individuals who help build great teams. Groups of people operating together at a high level inspire one another and hold one another accountable, creating a virtuous circle. Over time the natural effect is to weed out low performers. Great teams might reduce the probability of a poor experience from 5% to 0.5%.

Of course, to perform at their maximum level teams need to be well trained and know what mark to shoot for. This is just the solution the authors helped the TV provider zero in on.

The authors conclude that:

As company leaders dug further, they uncovered the root of the problem. Most customers weren’t fed up with any one phone call, field visit, or other interaction—in fact, they didn’t much care about those singular touchpoints. What reduced satisfaction was something few companies manage—cumulative experiences across multiple touchpoints and in multiple channels over time.

The pay TV company’s salespeople, for example, were focused on closing new sales and helping the customer choose from a dense menu of technology and programming options—but they had very little visibility into what happened after they hung up the phone, other than whether or not the customer went through with the installation. Confusion about promotions and questions about the installation process, hardware options, and channel lineups often caused dissatisfaction later in the process and drove queries to the call centers, but sales agents seldom got the feedback that could have helped them adjust their initial approach.

The solution to broken service-delivery chains isn’t to replace touchpoint management. Functional groups have important expertise, and touchpoints will continue to be invaluable sources of insight, particularly in the fast-changing digital arena. (See David Edelman’s “Branding in the Digital Age: You’re Spending Your Money in All the Wrong Places,” HBR December 2010.) Instead, companies need to embed customer journeys into their operating models in four ways: They must identify the journeys in which they need to excel, understand how they are currently performing in each, build cross-functional processes to redesign and support those journeys, and institute cultural change and continuous improvement to sustain the initiatives at scale.

 

R code below:

###################################################################################################
# James McCammon
# Customer Experience Simulation
# 6/25/2016
# Version 1.0
###################################################################################################
 
# Function to get the percent of customers with at least a given number of ratings at or below a threshold
percent_x_scores_below_y = function(sim, num_scores, equal_or_below_cutoff, num_customers) {
  (sum(apply(sim,2,FUN=function(x) sum(x<=equal_or_below_cutoff)) >= num_scores)/num_customers) * 100 
}
 
# Set variables
scale = 1:10
probs = c(.01, .01, .01, .01, .01, .01, .01, .03, .1, .8)
interactions.per.customer = 19
num_customers = 10000
n = interactions.per.customer*num_customers
num_sims = 1000
 
# Setup results matrix
sim_results = matrix(nrow=num_sims, ncol = 3)
colnames(sim_results) = c('Average_Customer_Rating',
                       'Percent_Having_Two_Scores_Below_Two',
                       'Percent_Having_Two_Scores_Below_Five')
 
# Run simulation
for(i in 1: num_sims) {
  # Run sim
  sim = matrix(sample(scale, size=n, replace=TRUE, prob=probs), ncol=num_customers, nrow=interactions.per.customer)
  # Store mean score
  sim_results[i,'Average_Customer_Rating'] = mean(apply(sim,2,mean))
  # Store % of customers with two scores below 2
  sim_results[i,'Percent_Having_Two_Scores_Below_Two'] = percent_x_scores_below_y(sim, num_scores=2, equal_or_below_cutoff=2, num_customers)
  # Store % of customers with two scores below 5
  sim_results[i,'Percent_Having_Two_Scores_Below_Five'] = percent_x_scores_below_y(sim, num_scores=2, equal_or_below_cutoff=5, num_customers)  
}
 
# Calculate average across sims
apply(sim_results, 2, mean)



How delicate are relationships?

Recently I have been thinking of dating analytically as a stream of dates where one must build up “relationship capital” before certain pieces of information are revealed or certain negative events occur. The stream must more or less occur in a particular order. That is, if you examined a series of weekly dates over the course of a year and scrambled that order – maybe the 37th date came first, the 5th date came 9th, etc. – the outcome wouldn’t necessarily be the same.

Many of us believe that there are instances when a piece of personal information is revealed “too soon.” For example, if on a first date your companion tells you that he or she recently filed for bankruptcy it may be a “deal breaker” as you assume this reflects negatively on their level of responsibility (as well as being an “overshare”). However, if the same piece of information is revealed after you’ve been dating, say, three months, you can weigh the strength of that assumption against the person you’ve come to know. Likewise, an intense external shock can prematurely end a relationship if it occurs too close to the beginning, whereas if that same event occurred after sufficient time had passed, both partners would have built up enough “relationship capital” to weather the storm.

I decided to expand on this idea by creating a simple model (which I plan to elaborate on over time) that assumes a couple goes on a date once a week for a year (or 52 dates over whatever time period you like). Three pieces of information or events must be revealed only after enough capital has accrued: Event 1 is low level and can occur only after Date 5, Event 2 must occur after Date 20, and Event 3, the most serious, must occur only after Date 40.

What if we kept the concept of “deal breakers” and randomly scrambled the order of the dates? How many relationships would still last a year or more (by chance) simply because events happened after enough capital was built up? It turns out that scrambling the order of the stream of dates results in a failed relationship about 88% of the time.
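
The original model isn't reproduced here, but a minimal sketch of the scrambling experiment might look like the following, assuming the three events were originally attached to specific dates in the stream (hypothetically Dates 6, 21, and 41) and that a relationship fails if any event lands before its threshold:

# Hypothetical reconstruction of the scrambling experiment
event_dates = c(6, 21, 41)   # assumed original dates carrying Events 1, 2, 3
thresholds  = c(5, 20, 40)   # each event is "safe" only after its threshold date

one_scramble = function() {
  shuffled  = sample(1:52)               # new position of each original date
  positions = shuffled[event_dates]      # where the event-carrying dates land
  any(positions <= thresholds)           # TRUE if any event arrives too early
}

failed = replicate(100000, one_scramble())
mean(failed)   # roughly the share of scrambled relationships that fail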

Of course, this is a theoretical exercise, not just because it’s impossible to scramble the order of dates, but because in practice we are the ones who decide whether we want to stay with a person despite difficult times or questionable personal sharing.

 

Steph Curry is Awesome

There have been many, many blog posts about Steph Curry’s dominance this year but let me add one more just for fun. Using data from dougstats.com, I looked into Curry’s dominance in 3-point shooting.

The average number of 3-pointers made so far this season (excluding Steph Curry) is 42. However, this isn’t exactly the group we want to compare Steph against, since we wouldn’t consider, say, DeAndre Jordan his peer in terms of 3-point responsibility. Instead, I considered guards that have played in 60+ games so far this season. After doing so the average number of 3-pointers made about doubles to 87 (again, excluding Curry). Meanwhile, Steph Curry has 343 threes and is on track to finish the season with 398, more than one hundred more than his own single-season record. Here’s how the outlier that is Steph Curry looks graphically.

Four years ago Klay Thompson would’ve been on track to beat Ray Allen’s single-season record, but because of Curry, Thompson has to settle for a distant second.

[Figure: 3-pointers made this season by qualifying guards, with Curry as an extreme outlier]

Curry has been ahead of everyone else all season long, and the gap only grows larger with each game, as this chart shows (made with data from Basketball-Reference.com):

[Figure: cumulative 3-pointers made over the course of the season, Curry vs. the rest of the league]

How rare is Steph Curry’s season? If we think of producing an NBA guard as a random normal process with this season’s mean and standard deviation of 3-pointers made, we can get a rough idea. As it turns out the distribution of threes is skewed (as you might expect), but if you squint a bit you can see that the distribution of the square root of each player’s 3-pointers made is approximately normal (this transformation was suggested by the powerTransform function in R’s car package).

Assuming the parameters above we would expect to see a “2015-2016 Steph Curry” about once every 200 seasons.

[Figure: distribution of the square root of 3-pointers made, approximately normal]
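
I haven't reproduced the underlying player data here, but the back-of-the-envelope logic runs roughly as follows. The vector threes and the wrapper function seasons_between are my own hypothetical names; threes is assumed to hold 3-pointers made by each qualifying guard this season, and 398 is Curry's projected total from above.

library(car)

# Sketch of the "once every 200 seasons" estimate (hypothetical inputs)
seasons_between = function(threes, curry_total = 398) {
  threes = threes[threes > 0]          # powerTransform (Box-Cox) needs positive values
  print(powerTransform(threes))        # should suggest a power near 0.5, i.e. a square root
  z = sqrt(threes)
  p = 1 - pnorm(sqrt(curry_total), mean = mean(z), sd = sd(z))  # chance a given guard does this
  1 / (p * length(threes))             # expected seasons between such performances
}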

Here is Curry’s shot chart so far this season (using data from stats.NBA.com and this tutorial from The Data Game):

[Figure: Steph Curry’s shot chart so far this season]

Will a 16 seed ever beat a 1 seed?

That question arose during a recent dinner with my friend Graham at a popular pizza restaurant in Seattle. Tony Kornheiser of PTI said last week that it will never happen. Graham agrees. I think it will happen in our lifetime. It has almost happened a number of times already.

It seemed to me from casual observation that 16 seeds are getting closer to winning on average, so I decided to check by plotting the point differential in favor of the higher-seeded team during the first round of the NCAA tournament (in the first round there are four 1-vs-16 games). Indeed, compared to many other matchups, the 1-vs-16 matchup has shifted greatly over time.

[Figure: first-round point differential by seed matchup over time]

The margin of victory is still substantial, between 10 and 15 points so far this decade, but I remain confident that, let’s say, sometime in the next 40 years it will happen. There have already been eight 15-seed victories over 2 seeds and twenty-one 14-seed wins over 3 seeds. The situation looks even better when you consider the closest 1-vs-16 game in each tournament (since we only need one 16 seed to win).

[Figure: closest 1-vs-16 margin in each tournament year]

It seems that about two to three times every decade there are relatively close 1-vs-16 games and about once a decade there is an extremely close game (decided by just a couple of points). The 2000s were not kind to 16 seeds.

I think the outcomes of close games are more random than most people appreciate. Leadership attribution bias turns these stochastic events into narratives of late-game heroics, and we’re prone to say that the 1 seed is more poised and resistant to pressure than players at smaller schools. Of course, the history of the tournament has shown us many, many exceptions to this rule (if it’s a rule at all). At tension with this narrative is the story of the underdog that is just happy to be at the tournament and has nothing to lose, playing loose, having fun, and playing “to win” while the nervous favorite plays “not to lose.” So to me many of these games are closer to a coin flip than we like to think, and given enough coin flips a tail is bound to come up eventually.
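
To put a rough number on the coin-flip intuition (with a per-game win probability chosen purely for illustration): if each of the four 1-vs-16 games gave the 16 seed even a 2% chance, an upset somewhere over 40 tournaments becomes very likely.

p_upset = 0.02            # hypothetical per-game chance of a 16 seed winning
games   = 4 * 40          # four 1-vs-16 games per tournament, 40 tournaments
1 - (1 - p_upset)^games   # probability of at least one upset, ~0.96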

Also, think about it this way: how much difference is there between a 1 seed and a 2 seed, or between a 15 seed and a 16 seed? As anyone who watches the tournament seeding show knows, not much. The lowest-ranked 1 seed isn’t much better – and could be worse – than some of the 2 seeds, and 2 seeds have lost in the first round eight times. You could argue the 2 seeds that lost should really have been 3 seeds, although this year Michigan State lost as a 2 seed and many considered them a favorite to win the entire tournament (by some measures this is the biggest tournament upset ever). The larger point is that seeding is itself somewhat stochastic, and the question “can a 16 seed beat a 1 seed?” is really the question of whether an overmatched team can pull off a one-time victory over an opponent that is much more dominant on average. And we already know the answer to that question is “yes.”

To give the other side its due: since seeding began in 1979, 22 of the 37 NCAA Tournament winners have been 1 seeds, so at least some of the top seeds are properly ranked, and 16 seeds will have a tough time beating them when the ranking is accurate.

It’s also worth asking why the decrease in point margin has occurred. Keep in mind the plot above is just a second-order trend line, although the underlying year-by-year margins do follow the trend on average (of course). It’s interesting that the late 1980s and early 1990s were a period of lower point differentials in 1-vs-16 games and that this period also saw three first-round losses by 2 seeds (1991, 1993, 1997). Likewise, the past few years have seen another decrease in 1-vs-16 point differential and another string of 2-seed losses: two in 2012 and one each in 2013 and 2016. Similarly, between 1986 and 1999 a 14 seed beat a 3 seed thirteen times, and since 2010 this has happened another six times (the intervening years saw only two instances, in 2005 and 2006). These two periods also correspond to the highly touted recruiting classes of Michigan in 1991 (famously nicknamed the “Fab Five”) and Kentucky’s 2013 “mega class.” It may be that there are episodic shifts in recruiting that systematically leave certain types of talent on the table for smaller schools to pick up and develop. (Of course, it may be that I’m just seeing patterns where none exist.)

My recent memory is that although a few elite schools still get the best players, smaller schools have realized that if they recruit good players (particularly good shooters or traditional big men) and play as a team, they have a chance at beating anyone in the early rounds and perhaps going deep into the tournament. This increases their confidence and performance. It is another reason I think a 16 seed will eventually beat a 1: recent tournaments have expanded our imagination of what is possible. Think about the well-known phenomenon of a 12 seed beating a 5 seed nearly every year. How nervous do you think 5 seeds are every year? How much of the consistency of a 12 beating a 5 is self-reinforcing, due to the 5 seeds having “the jitters” and the 12 seeds being relatively loose and confident?

Creating “Tidy” Web Traffic Data in R

There are many tutorials on transforming data with R, but I wanted to demonstrate how you might create dummy web traffic data and then make it tidy (one row for every observation) so that it can be further processed with statistical modeling.

The sample data I created looks like the table below. There is a separate row for each ID, which might be a visitor ID or a session ID. The ISP, week, and country are the same for every row of a given ID, but the page ID differs: if the user visited five different pages they get five rows; two pages, two rows; and so on. You can imagine many scenarios where you might have a dataset structured this way. The column “Page_Val” is a custom column that must be added to the dataset by the user. When we pivot the data, each page ID becomes its own column. If a user visited a particular page, Page_Val supplies the 1 for that page ID column for that user; otherwise an NA appears, and we can then go back and replace all the NAs with 0s. I used two lines of code to perform the transformation and replace the NAs with 0s, but using the fill parameter of the dcast function in “reshape2” it can actually be done in a single line. That’s why it pays to read the documentation very closely, which I didn’t do the first time around. Shame on me.

[Figure: sample of the dummy web traffic data in long format]

Because the sample data is so small the data can easily be printed out to the screen before and after the transformation.

[Figure: the small dataset printed before and after the transformation]

But the data doesn’t have to be small. In fact, I wrote a custom R function that generates this dummy data, and it can easily produce large amounts of it so more robust exercises can be performed. For instance, I produced 1.5 million rows of data in just a couple of seconds and then ran speed tests using the standard reshape function in one trial and the dcast function from “reshape2” in another. The dcast function was about 15 times faster at transforming the data. All of the R code is below.

###############################################################################################
# James McCammon
# Wed, 18 March 2015
#
# File Description:
# This file demonstrates how to transform dummy website data from long to wide format.
# The dummy data is structured so that there are multiple rows for each visit, with 
# each visit row containing a unique page id representing a different webpage viewed
# in that visit. The goal is to transform the data so that each visit is a single row,
# with webpage ids transformed into column names with 1s and 0s as values representing 
# whether the page was viewed in that particular visit. This file demonstrates two different
# methods to accomplish this task. It also performs a speed test to determine which
# of the two methods is faster.
###############################################################################################
 
# Install packages if necessary
needed_packages = c("reshape2", "norm")
new_packages = needed_packages[!(needed_packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)
 
###############################################################################################
# Create small amount of raw dummy data
###############################################################################################
# Make a function to create dummy data
get_raw_data = function(num_visitors, n, ids, isps, weeks, countries, pages) {
  raw_data = cbind.data.frame(
    'Id' = do.call(rep, list(x=ids, times=n)),
    'ISP' = isps[do.call(rep, list(x=sample(1:length(isps), size=num_visitors, replace=TRUE), times=n))],
    'Week' = weeks[do.call(rep, list(x=sample(1:length(weeks), size=num_visitors, replace=TRUE), times=n))],
    'Country' = countries[do.call(rep, list(x=sample(1:length(countries), size=num_visitors, replace=TRUE), times=n))],
    'Page' = unlist(sapply(n, FUN=function(i) sample(pages, i))),
    'Page_Val' = rep(1, times=sum(n)))
  raw_data
}
 
# Set sample parameters to draw from when creating dummy raw data
num_visitors = 5
max_row_size = 7
ids = 1:num_visitors
isps = c('Digiserve', 'Combad', 'AT&P', 'Verison', 'Broadserv', 'fastconnect')
weeks = c('Week1')
countries = c('U.S.', 'Brazil', 'China', 'Canada', 'Mexico', 'Chile', 'Russia')
pages = as.character(sample(100000:200000, size=max_row_size, replace=FALSE))
n = sample(1:max_row_size, size=num_visitors, replace=TRUE)
# Create a small amount of dummy raw data
raw_data_s = get_raw_data(num_visitors, n, ids, isps, weeks, countries, pages)
 
###############################################################################################
# Transform raw data: Method 1
###############################################################################################
# Reshape data
transformed_data_s = reshape(raw_data_s,
                timevar = 'Page',
                idvar = c('Id', 'ISP', 'Week', 'Country'),
                direction='wide')
# Replace NAs with 0
transformed_data_s[is.na(transformed_data_s)] = 0
 
# View the raw and transformed versions of the data
raw_data_s
transformed_data_s
 
###############################################################################################
# Transform raw data: Method 2
###############################################################################################
# Load libraries
require(reshape2)
require(norm)
 
# Transform the data using dcast from the reshape2 package
transformed_data_s = dcast(raw_data_s, Id + ISP + Week + Country ~ Page, value.var='Page_Val')
# Replace NAs using a coding function from the norm package
transformed_data_s = .na.to.snglcode(transformed_data_s, 0)
# An even simpler way of filling in missing data with 0s is to
# specify fill=0 as an argument to dcast.
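# For example, the two lines above collapse to the single (equivalent) call:
# transformed_data_s = dcast(raw_data_s, Id + ISP + Week + Country ~ Page,
#                            value.var='Page_Val', fill=0)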
 
# View the raw and transformed versions of the data
raw_data_s
transformed_data_s
 
###############################################################################################
# Run speed tests on larger data
###############################################################################################
# Caution: This will create a large data object and run several functions that may take a
# while to run on your machine. As a cheat sheet the results of the test are as follows:
# The transformation method that used the standard "reshape" command took 28.3 seconds
# The transformation method that used "dcast" from the "reshape2" package took 1.8 seconds
 
# Set sample parameters to draw from when creating dummy raw data
num_visitors = 100000
max_row_size = 30
ids = 1:num_visitors
pages = as.character(sample(100000:200000, size=max_row_size, replace=FALSE))
n = sample(1:max_row_size, size=num_visitors, replace=TRUE)
# Create a large amount of raw dummy data
raw_data_l = get_raw_data(num_visitors, n, ids, isps, weeks, countries, pages)
 
s1 = system.time({
  transformed_data_l = reshape(raw_data_l,
                               timevar = 'Page',
                               idvar = c('Id', 'ISP', 'Week', 'Country'),
                               direction='wide')
  transformed_data_l[is.na(transformed_data_l)] = 0 
})
 
s2 = system.time({
  transformed_data_l = dcast(raw_data_l, Id + ISP + Week + Country ~ Page, value.var='Page_Val')
  transformed_data_l = .na.to.snglcode(transformed_data_l, 0)   
})
 
s1[3]
s2[3]


R and C++ Selection Sort

I'm casually following along with Princeton’s Coursera course on algorithms, which is taught in Java, and trying to implement the five different sorting algorithms without looking at the Java implementations. At the same time I'm trying to learn a bit of C++ and use R’s Rcpp package.

I implemented selection sort in both R and C++, and boy is the C++ version faster — more than 1,000 times faster! Below you can see a comparison sorting a 50,000-element numeric vector. The C++ version took less than a second, while the R version took about 14 minutes. Crazy!

> unsorted = sample(1:50000)
> R_time = system.time(selection_sort_R(unsorted))
> Cpp_time = system.time(selection_sort_Cpp(unsorted))
> R_time['elapsed']/Cpp_time['elapsed']
 elapsed 
1040.395
> Cpp_time
   user  system elapsed 
   0.82    0.00    0.81 
> R_time
   user  system elapsed 
 838.79    0.09  842.72


Below are the implementations in R and C++.

###################################################
# R Selection Sort
###################################################
# Define R selection Sort
selection_sort_R = function(v) {
  N = length(v)
 
  for(i in 1:N) {
    min = i
 
    for(j in i:N) {
      if(v[j] < v[min]) {
        min = j
      }    
    }
    temp = v[i]
    v[i] = v[min]
    v[min] = temp
  }
  return(v)
}


###################################################
# C++ Selection Sort
###################################################
# Define C++ selection Sort
require(Rcpp)
cppFunction('NumericVector selection_sort_Cpp(NumericVector v) {
  int N = v.size();
  int min;
  double temp;   // v holds doubles, so temp must be double to avoid truncation when swapping
 
  for(int i = 0; i < N; i++) {
    min = i;
 
    for(int j = i; j < N; j++) {
      if(v[j] < v[min]) {
        min = j;
      }    
    }
    temp = v[i];
    v[i] = v[min];
    v[min] = temp;
  }
  return v;
}')
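
As a quick sanity check (not part of the original benchmark), both versions can be compared against base R's sort():

x = sample(1:1000)
all.equal(selection_sort_R(x), sort(x))                # should be TRUE
all.equal(selection_sort_Cpp(x), as.numeric(sort(x)))  # should be TRUE; the C++ version returns doubles
# Note: if x were already a double vector, the Rcpp version would sort it in place,
# since NumericVector wraps the existing memory; an integer input like this one is
# coerced to a fresh numeric copy first.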


Percolation Algorithm In R

I’m casually following along with a few of the lessons in Princeton’s Algorithms, Part 1 on Coursera. The course itself is taught in Java, but I thought completing some of the lessons in R would improve my R skills. The first activity I’ve completed is a percolation simulation using weighted quick-union with path compression. I’m not actually sure if this was one of the assignments, but it seemed cool so I thought I’d give it a try.

Percolation is meant to model physical or social systems in which some type of blockage occurs. The algorithm taught in the class begins with a fully blocked system and randomly unblocks one element at a time until there is a path from one side of the system to the other. The simplest way to do this is to imagine a blocked grid. The initial blocked system is shown below.

[Figure: the fully blocked starting grid]

My implementation removes one blockage per half second, visualized by turning the corresponding black square white. A fully percolated system is shown below. Notice there is now a path through the blockage – starting at the bottom, following the open (white) squares up and to the left and then diagonally up and to the right.

[Figure: the grid after percolation, with an open path through the blockage]

At the moment percolation is completed a message is printed to the console.

[Figure: console output announcing that percolation is complete]

The code to initialize the board and percolated system and to run the clearing algorithm is shown below.

####################################################################
# James McCammon
# 21 Feb 2015
# Percolation Algorithm Functions Version 1
####################################################################
# Load libraries
library(shape)
source("perc_functions.R")
 
# Define variables
nrows = 10
ncols = 10
nsites = nrows*ncols
closedSites = 1:nsites
numneighbors = 4
display = NULL
#display = matrix(data=0, nrow=nrows, ncol=ncols)
VTN = nsites + 1
VBN = nsites + 2
 
# Initialize the board, display, and virtual top/bottom nodes
board = initBoard()
initDisplay()
initVTN()
initVBN()
 
# Run main
main()


Most of the work is done in the set of functions below. This could definitely be improved, for example by storing an arbitrary list of neighbors in each site object rather than assuming the system is a square grid and finding neighbors by looking in the four directions around each site. I could also think harder about memory use and efficiency: right now I just threw everything into the global environment, and using R’s tracemem() function shows that the board object is copied a few times. I’ll probably keep playing with things and continue to improve them.

####################################################################
# James McCammon
# 21 Feb 2015
# Percolation Algorithm Main Version 1
####################################################################
# Define main function
main = function() {
  sleeptime = 0.5
 
  while(!connected(VTN, VBN)) {
    randSite = getRandSite()
    board[[randSite]]$open = TRUE
    #location = trans2display(randSite)
    #write2display(location)
    filledrectangle(mid=c(display[[randSite]]$xmid, 
                          display[[randSite]]$ymid), 
                    display[[randSite]]$wx, 
                    display[[randSite]]$wy, 
                    col='white', 
                    lwd=1, 
                    lcol='white')
 
    neighbors = getNeighbors(randSite)
 
    for(i in 1:numneighbors) {
      if(is.null(neighbors[[i]])) next
      if(!isOpen(neighbors[[i]])) next
      join(randSite, neighbors[[i]])
    } 
    Sys.sleep(sleeptime)
  }
  print("Percolation completed")
  p = 1 - (length(closedSites)/nsites)  
  return()
}
 
 
# Initialize the display
initDisplay = function() {
  display <<- as.vector(1:nsites, mode='list')
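  # windows() opens a graphics device on Windows; on macOS use quartz(), on Linux x11()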
  windows(5, 5)
  plot.new()
  xmid = 0
  ymid = 1
  wx = .1
  wy = .1
  index = 1
 
  for(row in 1:nrows) {
    for(col in 1:ncols) {
      filledrectangle(mid=c(xmid, ymid), wx=wx, wy=wy, col='black', lwd=1, lcol='white')
      display[[index]] <<- list('xmid'=xmid, 'ymid'=ymid, 'wx'=wx, 'wy'=wy)
      index = index + 1
      xmid = xmid + .1
    }
    xmid = 0
    ymid = ymid - .1
  }  
}
 
 
# Gets the root of a node, compressing the path along the way
getRoot = function(node) {
  while(node != board[[node]]$root) {
    board[[node]]$root <<- board[[board[[node]]$root]]$root
    node = board[[node]]$root
  }
  return(node)
}
 
 
# Checks if two nodes are connected
connected = function(node1, node2) {
  return(getRoot(node1) == getRoot(node2))
}
 
 
# Initialize the board
initBoard = function() {
  board = as.vector(1:nsites, mode='list')
  board = lapply(board, function(i) list("index"=i, "size"=1, "root"=i, "open"=FALSE))
  return(board)
}
 
 
# Initialize the virtual top node
initVTN = function() {
  board[[VTN]] <<- list("index"=VTN, "size"=(ncols + 1), "root"=VTN, "open"=FALSE)
  for(i in 1:ncols)  {
    board[[i]]$root <<- board[[VTN]]$root
  }  
}
 
 
# Initialize the virtual bottom node
initVBN = function() {
  board[[VBN]] <<- list("index"=VBN, "size"=(ncols + 1), "root"=VBN, "open"=FALSE)
  lastrow = nsites - ncols + 1
  for(i in lastrow:nsites) {
    board[[i]]$root <<- board[[VBN]]$root
  }  
}
 
 
# Choose random site
getRandSite = function() {
  randSite = sample(closedSites, size=1)
  closedSites <<- closedSites[-which(closedSites==randSite)]
  return(randSite)
}
 
 
# Checks if site is open
isOpen = function(site) {
  return(!(site%in%closedSites))
}
 
 
# Get neighbors on a square board
getNeighbors = function(index) {
  top = index - ncols
  if(top < 1) top = NULL
 
  bottom = index + ncols
  if(bottom > nsites) bottom = NULL
 
  left = index - 1
  if(!(left%%ncols)) left = NULL
 
  right = index + 1
  if(right%%ncols == 1) right = NULL
 
  return(list("tn"=top, "rn"=right, "bn"=bottom, "ln"=left))
}
 
 
# Join two nodes
join = function(node1, node2) {
 
  r1 = getRoot(node1)
  r2 = getRoot(node2)
  s1 = board[[r1]]$size
  s2 = board[[r2]]$size
 
  if(s1 > s2) {
    board[[r2]]$root <<- board[[r1]]$root
    board[[r1]]$size <<- s1 + s2
  }
 
  else {
    board[[r1]]$root <<- board[[r2]]$root
    board[[r2]]$size <<- s2 + s1 
  } 
}
 
 
####################################################################
# Functions for visualizing output with matrix
####################################################################
# Translate location of site to matrix
trans2display = function(index) {
  col = index%%ncols
  if(!col) col=ncols
  row = ceiling(index/ncols)  # each row contains ncols sites
  return(list("row"=row, "col"=col))  
}
 
 
# Update the matrix display
write2display = function(transObject) {
  display[transObject$row, transObject$col] <<- 1  
}
