Testing Tony Kornheiser’s Football (Soccer) Population Theory

Fans of the daily ESPN show Pardon the Interruption (PTI) will be familiar with co-host Tony Kornheiser’s frequently invoked “Population Theory.” The theory has a few formulations: sometimes it is asserted that when two countries compete in international football, the country with the larger population will win; at other times, that the more populous country should win.

The “Population Theory” sometimes also incorporates a country’s resources. For example, Kornheiser recently stated that the United States should be performing better in international football both because it has a large population and because it has spent a large sum of money on its football infrastructure.

I decided to test this theory by creating a dataset that combines football scores from SoccerLotto.com with population and per capita GDP data from various sources. Because of SoccerLotto.com’s formatting, the page couldn’t easily be scraped with R or copied and pasted into Excel, so a fair amount of manual work was involved. Here’s a picture of me doing that manual work, to break up this text 🙂

[Photo (IMG_4265): the author doing the manual data entry]

The dataset includes 537 international football games played between 30 June 2015 and 27 June 2016. The most recent game in the dataset was Iceland’s shocking upset of England. Population and per capita GDP figures come from whatever source was available; because official government statistics are not collected annually, the reference year differs from country to country. I’ve uploaded the data to a public Dropbox folder here. Feel free to use it. R code is provided below.

Per capita GDP is perhaps the most readily available proxy for a country’s football resources, though admittedly an imperfect one. Football is immensely popular around the world, so many poor countries may spend disproportionately large sums developing their football programs. A more useful statistic might be the average age of first football participation, but so far I don’t have access to that kind of data.
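The R code at the end of the post computes every percentage by summing two 0/1 columns, Bigger.Country.Won and Richer.Country.Won, but doesn’t show how they were built. Here is a minimal sketch of one way to derive them, assuming (as the post’s code implies) that Winner is NA for a draw. Draws are coded 0 here, which is what the post’s sum()/nrow() percentages require, since a missing value would make sum() return NA. The helper larger_side_won() and the GDP column names Home.Team.GDP and Away.Team.GDP are hypothetical, and the games are made up.

```r
# Hypothetical helper (not from the post): returns 1 when the side with the
# larger value (population or GDP per capita) won, and 0 otherwise, including
# draws, where Winner is NA. If the values are tied, the away side is treated
# as "larger".
larger_side_won <- function(home_team, away_team, winner, home_val, away_val) {
  larger <- ifelse(home_val > away_val, home_team, away_team)
  as.integer(!is.na(winner) & winner == larger)
}

# Made-up games. Team, Winner, and population columns mirror soccer_data;
# the two GDP column names are hypothetical.
games <- data.frame(
  Home.Team            = c("A", "C", "E", "G"),
  Away.Team            = c("B", "D", "F", "H"),
  Winner               = c("A", "D", NA, "G"),   # NA marks a draw
  Home.Team.Population = c(5e6, 10e6, 3e6, 80e6),
  Away.Team.Population = c(60e6, 2e6, 9e6, 40e6),
  Home.Team.GDP        = c(30000, 12000, 45000, 8000),
  Away.Team.GDP        = c(20000, 50000, 15000, 9000),
  stringsAsFactors     = FALSE
)

games$Bigger.Country.Won <- with(games, larger_side_won(
  Home.Team, Away.Team, Winner, Home.Team.Population, Away.Team.Population))
games$Richer.Country.Won <- with(games, larger_side_won(
  Home.Team, Away.Team, Winner, Home.Team.GDP, Away.Team.GDP))

# Same percentage calculation as the post: draws stay in the denominator.
100 * mean(games$Bigger.Country.Won)
```

Keeping draws in the denominator means each figure is the share of games the larger (or richer) side won, not the share of decided games.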

Results

So how does Kornheiser’s theory hold up to the data? Well, Kornheiser is right…but just barely. Over the past year the more populous country won 51.6% of its matches, with draws counted as matches it did not win. So if you have to guess the outcome of an international football match and all you’re given is the population of the two countries involved, then you should indeed bet on the more populous country.
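To gauge how thin “just barely” is, here is a rough sketch using base R’s binom.test() to put a 95% confidence interval around the more populous country’s win rate. The count of 277 wins is an assumption back-calculated from 51.6% of 537 games, not a figure read from the dataset. The test follows the post’s framing of a 50% baseline; because draws count as non-wins, the true no-effect baseline is somewhat below 50%, so treat this as a rough check rather than a formal result.

```r
# Assumed counts: 51.6% of 537 games is roughly 277 wins for the more
# populous side. These are back-calculated, not read from the dataset.
wins    <- 277
n_games <- 537

# Exact binomial test against the post's 50% baseline.
pop_test <- binom.test(wins, n_games, p = 0.5)

pop_test$estimate   # observed share of games won by the more populous side
pop_test$conf.int   # 95% Clopper-Pearson confidence interval
pop_test$p.value    # two-sided p-value versus 50%
```

On these assumed counts the interval straddles 50%, which is consistent with a small population edge that a single season cannot distinguish from no edge at all.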

Of the 537 games, 81 were played on a neutral field. More populous countries fared poorly on neutral fields, winning only 43.2% of the time. In games where one side was playing at home, by contrast, the more populous country won 53.1% of its matches.
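With only 81 neutral-field games, it is worth asking whether the venue gap is more than noise. A sketch with base R’s prop.test(), assuming counts back-calculated from the reported percentages (35 of 81 neutral games and 242 of the remaining 456), which together reproduce the 51.6% overall figure. These counts are reconstructions, not values read from the data.

```r
# Assumed counts back-calculated from the reported percentages (not read from
# the data): 35/81 = 43.2% on neutral fields, 242/456 = 53.1% elsewhere.
wins    <- c(neutral = 35, home = 242)
n_games <- c(neutral = 81, home = 456)

round(100 * wins / n_games, 1)              # reproduces 43.2 and 53.1
round(100 * sum(wins) / sum(n_games), 1)    # reproduces the overall 51.6

# Two-sample test of equal proportions (Yates continuity correction by default).
venue_test <- prop.test(wins, n_games)
venue_test$p.value
```

On these counts the p-value lands above 0.05, so the neutral-field dip is suggestive rather than conclusive.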

Richer countries fared even worse, failing to win more than half of their games (53.8%). They also fared poorly in both home-field and neutral-field games, winning only 45.8% and 48.1% of those matches respectively.

The best predictor of international football results (at least in the data I had available) was whether a team was playing at home: excluding neutral-field games and draws, home teams won 60.1% of the time.
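Per the R code at the end of the post, the 60.1% home-field figure is computed only over non-neutral games that produced a winner, while the population and GDP percentages keep draws in the denominator. A small sketch on made-up games showing how much that choice of denominator can move the number:

```r
# Made-up non-neutral games; Winner is NA for a draw, as in the post's soccer_data.
home_games <- data.frame(
  Home.Team = c("A", "B", "C", "D", "E"),
  Away.Team = c("F", "G", "H", "I", "J"),
  Winner    = c("A", "G", NA, "D", NA),
  stringsAsFactors = FALSE
)

# TRUE if the home side won, FALSE if the away side won, NA for a draw.
home_won <- home_games$Home.Team == home_games$Winner

# Denominator = decided games only (the post's home-field approach).
pct_decided <- 100 * mean(home_won, na.rm = TRUE)

# Denominator = all games, counting draws as non-wins (the post's population approach).
pct_all <- 100 * sum(home_won, na.rm = TRUE) / nrow(home_games)
```

Here the same two home wins come out as two-thirds of decided games but only 40% of all games, so the home-field figure and the population figures above are not measured on the same footing.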

To look more closely at population and winning, I plotted the winning percentage of every team that played three or more international matches in the past year against its population. There were 410 total games that met this criterion. I also plotted a linear trend line in red which, as the figures above suggest, slopes upward ever so slightly.
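geom_smooth(method = 'lm') draws an ordinary least-squares line, so the slope behind “ever so slightly” can be read directly with lm() and coef(). A sketch on made-up team results shaped like the post’s team_results (columns Team, Population, Win.Perct). One detail worth knowing: the first chart puts Win.Perct on the x-axis, so its red line regresses population on winning percentage; the sign of the slope is the same in either direction, because both slopes share the sign of the covariance.

```r
# Made-up per-team results, shaped like team_results in the post.
team_results <- data.frame(
  Team       = c("A", "B", "C", "D"),
  Population = c(1e6, 2e6, 3e6, 4e6),
  Win.Perct  = c(0.40, 0.50, 0.50, 0.60)
)

# Winning percentage as a function of population (the second chart's orientation).
fit <- lm(Win.Perct ~ Population, data = team_results)
slope_per_person  <- coef(fit)[["Population"]]
slope_per_million <- slope_per_person * 1e6   # change in win share per extra million people

# Population as a function of winning percentage (the first chart's orientation).
fit_flipped <- lm(Population ~ Win.Perct, data = team_results)
sign(coef(fit_flipped)[["Win.Perct"]]) == sign(slope_per_person)
```

Reporting the slope per million people makes a “slight” upward trend concrete: on the real data it would say how much winning percentage a country gains, on average, per additional million inhabitants.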

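[Figure: population_vs_winning_perct.png, population vs. winning percentage for teams with three or more games played, with a red linear trend line]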

Although 537 games is a lot, it’s only a single year’s worth of data. This year may have been an anomaly, and I’m working on collecting a larger dataset. As the chart above suggests, many countries have populations of around 100 million or less, so it would perhaps be surprising if countries with a few million more or fewer people had significantly different outcomes in their matches. But we can test this too…

When two countries whose populations differ by less than 1 million play each other, the more populous country actually fails to win 55.9% of the time. When the two countries are separated by less than 5 million people, the more populous country wins slightly more often than chance, with a winning percentage of 52.1%. But large population differences (greater than 50 million inhabitants) do not translate into more victories: the more populous country wins just 51.2% of the time. So, perhaps surprisingly, the small sample of data I have suggests that population differences matter more when the differences are smaller (of course, this could be spurious).
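The three thresholds above overlap: every game decided between countries less than 1 million apart is also in the “less than 5 million” group. A sketch of the same comparison with non-overlapping buckets using base R’s cut() and aggregate(), on made-up games. The column names match soccer_data; the bucket edges are an illustrative choice, not the post’s.

```r
# Made-up games with the same column names as soccer_data.
games <- data.frame(
  Home.Team.Population = c(3.0e6, 10.0e6, 60e6, 5.0e6, 100e6, 2.5e6, 20e6),
  Away.Team.Population = c(3.5e6,  7.0e6,  2e6, 5.8e6,  30e6, 6.0e6,  8e6),
  Bigger.Country.Won   = c(0, 1, 1, 0, 0, 1, 1)
)

pop_diff <- abs(games$Home.Team.Population - games$Away.Team.Population)

# Non-overlapping, right-open buckets: [0, 1M), [1M, 5M), [5M, 50M), [50M, Inf)
games$Diff.Bucket <- cut(pop_diff,
                         breaks = c(0, 1e6, 5e6, 50e6, Inf),
                         labels = c("<1M", "1M-5M", "5M-50M", ">=50M"),
                         right  = FALSE)

# Percentage of games won by the more populous side in each bucket.
by_bucket <- aggregate(Bigger.Country.Won ~ Diff.Bucket, data = games,
                       FUN = function(x) 100 * mean(x))
by_bucket
```

Disjoint buckets make each game count once, so differences between buckets can’t be driven by games shared between them.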

This can be seen further in a slightly different view of the chart above, which swaps the axes and limits the sample to countries with fewer than 100 million people.

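[Figure: population_vs_winning_perct_smaller.png, winning percentage vs. population, excluding countries with more than 100 million people]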

R code provided below:

###################################################################################################
# James McCammon
# International Football and Population Analysis
# 7/1/2016
# Version 1.0
###################################################################################################
 
# Import Data
setwd("~/Soccer Data")
soccer_data = read.csv('soccer_data.csv', header=TRUE, stringsAsFactors=FALSE)
population_data = read.csv('population.csv', header=TRUE, stringsAsFactors=FALSE)
 
 
################################################################################################
# Calculate summary data
################################################################################################
# Subset neutral-field and home-field games
neutral_field = subset(soccer_data, Neutral=='Yes')
home_field = subset(soccer_data, Neutral=='No')
 
# Calculate % that larger country won (draws count as non-wins)
(sum(soccer_data[['Bigger.Country.Won']])/nrow(soccer_data)) * 100
# What about at neutral field?
(sum(neutral_field[['Bigger.Country.Won']])/nrow(neutral_field)) * 100
# What about at a home field?
(sum(home_field[['Bigger.Country.Won']])/nrow(home_field)) * 100
 
# Calculate % that richer country won
(sum(soccer_data[['Richer.Country.Won']])/nrow(soccer_data)) * 100
# What about at neutral field?
(sum(neutral_field[['Richer.Country.Won']])/nrow(neutral_field)) * 100
# What about at a home field?
(sum(home_field[['Richer.Country.Won']])/nrow(home_field)) * 100
 
# Calculate home field advantage (non-neutral games only, draws excluded)
home_field_winner = subset(home_field, !is.na(Winner))
(sum(home_field_winner[['Home.Team']] == home_field_winner[['Winner']])/nrow(home_field_winner)) * 100
 
# Calculate % that larger country won when pop diff is less than 1 million
ultra_small_pop_diff_matches = subset(soccer_data, abs(Home.Team.Population - Away.Team.Population) < 1000000)
(sum(ultra_small_pop_diff_matches[['Bigger.Country.Won']])/nrow(ultra_small_pop_diff_matches)) * 100
# Calculate % that larger country won when pop diff is less than 5 million
small_pop_diff_matches = subset(soccer_data, abs(Home.Team.Population - Away.Team.Population) < 5000000)
(sum(small_pop_diff_matches[['Bigger.Country.Won']])/nrow(small_pop_diff_matches)) * 100
# Calculate % that larger country won when pop diff is larger than 50 million
big_pop_diff_matches = subset(soccer_data, abs(Home.Team.Population - Away.Team.Population) > 50000000)
(sum(big_pop_diff_matches[['Bigger.Country.Won']])/nrow(big_pop_diff_matches)) * 100
 
 
################################################################################################
# Chart winning percentage vs. population
################################################################################################
library(dplyr)
library(reshape2)
 
base_data = 
  soccer_data %>%
  filter(!is.na(Winner)) %>%
  select(Home.Team, Away.Team, Winner) %>%
  melt(id.vars = c('Winner'), value.name='Team')
 
games_played = 
  base_data %>%
  group_by(Team) %>%
  summarize(Games.Played = n())
 
games_won = 
  base_data %>%
  mutate(Result = ifelse(Team == Winner,1,0)) %>%
  group_by(Team) %>%
  summarise(Games.Won = sum(Result))
 
team_results = 
  merge(games_won, games_played, by='Team') %>%
  filter(Games.Played > 2) %>%
  mutate(Win.Perct = Games.Won/Games.Played)
 
team_results = merge(team_results, population_data, by='Team')
 
# Plot all countries
library(ggplot2)
library(ggthemes)
ggplot(team_results, aes(x=Win.Perct, y=Population)) +
  geom_point(size=3, color='#4EB7CD') +
  geom_smooth(method='lm', se=FALSE, color='#FF6B6B', size=.75, alpha=.7) +
  theme_fivethirtyeight() +
  theme(axis.title=element_text(size=14)) +
  scale_y_continuous(labels = scales::comma) +
  xlab('Winning Percentage') +
  ylab('Population') +
  ggtitle(expression(atop('International Soccer Results Since June 2015', 
                     atop(italic('Teams With Three or More Games Played (410 Total Games)'), ""))))
ggsave('population_vs_winning_perct.png')
 
# Plot countries smaller than 100 million
ggplot(subset(team_results,Population<100000000), aes(y=Win.Perct, x=Population)) +
  geom_point(size=3, color='#4EB7CD') +
  geom_smooth(method='lm', se=FALSE, color='#FF6B6B', size=.75, alpha=.7) +
  theme_fivethirtyeight() +
  theme(axis.title=element_text(size=14)) +
  scale_x_continuous(labels = scales::comma) +
  ylab('Winning Percentage') +
  xlab('Population') +
  ggtitle(expression(atop('International Soccer Results Since June 2015', 
                          atop(italic('Excluding Countries with a Population Greater than 100 Million'), ""))))
ggsave('population_vs_winning_perct_smaller.png')


Why I’m not voting for president, but plan on complaining anyway

“If you don’t vote you don’t have the right to complain,” the saying goes. Alas, I don’t plan on voting, but I have no problem complaining anyway.

Of course the word “right” here is not used to denote a legal construct, but rather a cosmic one. A sort of you-got-what’s-coming-to-you quid pro quo. You didn’t marry LeeAnne, so now you have no right to complain about being 35 and alone, or so my mother tells me.

But just as surely as cosmic rights exist, we’re all guilty of violating them every day, especially when it comes to complaining when perhaps we know better. I might, for instance, procrastinate on the job by harassing my friends to “get out the vote” only to later complain about having to work late. Or I might refuse to take public transportation and then complain about all the traffic I have to sit in. And don’t get me started on all the time we waste lamenting our relationship misadventures, so often the result of our own design.

So for starters, even if I have no right to complain about who’s president if I didn’t vote, I’m going to complain anyway, because that’s what we humans do. It happens in all kinds of settings; why should politics be any different? Indeed, it’s notable that you never hear the phrase “If you don’t vote you have no right to celebrate.” It seems there is something deeply human and seductive about complaint.

But the example above regarding public transportation points out another flaw in the not-voting-equals-no-complaining calculus. Even if you decided to take the bus instead of driving, it wouldn’t really do anything to reduce the traffic in your city. It’s not your single car that’s causing congestion, after all. You should feel guiltless when complaining about traffic because, unless you happen to be the head of your city’s transportation department, there is quite literally nothing you as an individual can do to reduce traffic, even if, paradoxically, you happen to be part of the problem.

By now you have likely spotted my public-transportation-is-really-voting allusion. I’m sure you’ve heard it before, but stick with me for a moment. That’s right: I’m sorry, people, but your vote just doesn’t matter. Let me clarify. Your vote actually does matter in all kinds of important ways. It allows you to express your preferences through our democratic process, to align yourself with politicians you believe to be sensible if not always wholly upstanding, and to signal to the world how civic-minded you are as you stroll into the after-work cocktail party with an “I voted” sticker affixed to your lapel. It’s just that your vote doesn’t matter for the outcome of the election itself.

I hesitate to call this position anything other than fact. It has been shown both by mathematical calculations and by historical evidence. In truth, the probability of an election being decided by your vote alone is not absolutely zero. The probability that you’ll be struck by lightning isn’t zero either, but you should probably go about living as if it were.

Why should my right to complain hinge on something so superfluous as a vote?

“But what about Florida?” you ask. Yes, in 2000 the U.S. presidential election was decided by a mere 537 votes. Those 537 votes might as well have been five million, though, because both numbers are larger than zero, the margin it would take for your vote to decide the election.

“But what if everyone thought the way you do?” you retort. Well, in that case we’d be in trouble. But not everyone thinks the way I do, which is why this piece is likely to draw your ire. If you’re the type of person who organizes the masses to get out and vote, then you might matter a lot for an election, but your vote matters very little.

And while we’re at it — no, not voting is not the same thing as voting for Donald Trump. You can bet if Drumpf becomes president I’ll do plenty of complaining, not least because I don’t want my next trip to the White House to involve being blinded by the sun’s reflection off a gold-plated North Lawn.

The situation is even rosier for the would-be kvetch, though, because not only does voting not matter, but the president doesn’t matter that much either. Now comes the exciting part, because I get to reference my favorite kind of bias, the aptly titled “leadership attribution bias.” In short, the president is a manager like any other: they get all of the credit when things go well and none of the blame when things go poorly. A cheap shot, I know.

When you credit (or blame) the president you’re really referencing U.S. political institutions more broadly, and you have even less control over those than you have over who the next president is. The president is buffeted by all kinds of institutional and political forces: House and Senate constituencies, tit-for-tat political horse trading, the actions of both rogue and friendly nations, state and local policy, regulatory agencies, the judiciary, and the vacillating will of the American public, to name a few.

The average American political scientist thinks the president matters much less than the average American citizen. Maybe they’re out of touch or overly wonkish, or maybe they’re better at understanding the complexities and constraints of the modern American presidency.

I haven’t even mentioned the fact that one might be disinclined to vote simply because none of the candidates in our not-so-diverse, two-party system fit the bill. Now that’s something to complain about.

Nor am I fond of the idea of absorbing the marginal voter into the presidential election decision simply because it’s everyone’s civic duty to vote. If someone is ignorant, let them abstain. It’s probably better than tackling a crash course in U.S. politics days before an election. And while you’re at it, when they’re forced to switch healthcare providers, let them complain. The distance between their abstention and their healthcare troubles is light-years.

There are plenty of reasons not to vote. And there are certainly plenty of reasons to complain about policy outcomes. Abstention may seem foolish because it puts a decision that could be ours in the hands of another. But if we have a cosmic right to anything, it’s to complain despite our own foolishness.

[Relax. It’s intentionally incendiary, people.]