R Functions to Download and Load Data

I’ve been writing a lot of R code recently, and since I’m always downloading and loading files, I decided to write a couple of helper functions. The first function, “getData”, downloads a file from the internet and unzips it if necessary, choosing a download method to suit the operating system. It is certainly not as robust as the “downloader” package, but it is useful for most simple operations.

The next function, “loadData”, loads a set of files and assigns each one to an R object named after the file (minus the .txt extension). I added a simple progress bar to this function, since it can take a while to load several large files. You can see how to use the two functions here:

# Specify directories and file names
fileURL = 'A file URL'
fileName = 'AZippedFile.zip'
subDir = 'aSubDirectory'
# Download data and put in directory using custom function
# (subDir is passed by name; the third positional argument of getData is mainDir)
getData(fileURL, fileName, subDir = subDir, compressed = TRUE)
# Load required files into R 
# Get file names
fileNames = list.files(subDir, full.names = TRUE, recursive = TRUE)
# Specify required files
requiredFiles = c("file1.txt", "file2.txt", "file3.txt")
# Get required files from the list of files using custom function
loadData(fileNames, requiredFiles, envir = .GlobalEnv)
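
To make the naming rule concrete, here is a minimal sketch that mirrors what the loading step does, without needing a download. It is not the function itself: the temporary folder, the two tiny files and the names demoDir and demoEnv are made up for illustration, and it assumes the files are plain whitespace-separated text.

```r
# Hypothetical demo data: two small .txt files in a temporary folder
demoDir = file.path(tempdir(), "loadDataDemo")
dir.create(demoDir, showWarnings = FALSE)
write.table(data.frame(a = 1:3, b = 4:6), file.path(demoDir, "file1.txt"),
            row.names = FALSE, col.names = FALSE)
write.table(data.frame(a = 7:8, b = 9:10), file.path(demoDir, "file2.txt"),
            row.names = FALSE, col.names = FALSE)

# Same two inputs the usage above builds: all paths, plus the files we want
fileNames = list.files(demoDir, full.names = TRUE, recursive = TRUE)
requiredFiles = c("file1.txt", "file2.txt")

# Load into a separate environment so the demo does not touch the workspace
demoEnv = new.env()
for (elem in requiredFiles) {
  # Find the full path whose base name is the required file
  path = fileNames[basename(fileNames) == elem]
  # Object name is the file name without its .txt extension
  objName = sub("\\.txt$", "", elem)
  assign(objName, read.table(path, header = FALSE), envir = demoEnv)
}

# List the objects that were created
ls(demoEnv)
```

Loading into a throwaway environment like this is also a handy way to check the object names before committing to .GlobalEnv.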

Created by Pretty R at inside-R.org

Here are the two custom functions:

# Get data function
getData = function(fileURL, fileName, mainDir = getwd(), subDir = ".", compressed = FALSE, OS = "Unspecified") {
  # If the data doesn't exist, download it and set up a folder to store it
  destDir = file.path(mainDir, subDir)
  filePath = file.path(destDir, fileName)
  if (!file.exists(filePath)) {
    print("Downloading data and creating file directory...")
    # Pick a download method for the OS; fall back to "auto" when unspecified
    method = switch(tolower(OS),
                    "windows" = "internal",
                    "mac" = "curl",
                    "linux" = "wget",
                    "auto")
    dir.create(destDir, showWarnings = FALSE, recursive = TRUE)
    download.file(fileURL, destfile = filePath, method = method)
    # Unzip into the same folder as the downloaded archive
    if (compressed) unzip(filePath, exdir = destDir)
  } else {
    print("Files already downloaded. Ready to read into R memory.")
  }
}

# Load data function
# envir defaults to the caller's environment, so the loaded objects outlive the call
loadData = function(fileNames, requiredFiles, envir = parent.frame()) {
  numFiles = length(requiredFiles)
  pb = txtProgressBar(min = 0, max = numFiles, style = 3)
  # Load each file into R
  print("Loading files into memory. Please wait.")
  for (file in seq_len(numFiles)) {
    elem = requiredFiles[file]
    # Match the file name literally at the end of a path
    filePath = paste0("/", elem)
    # Object name is the file name without its .txt extension
    R_Object = sub("\\.txt$", "", elem)
    assign(R_Object, read.table(fileNames[endsWith(fileNames, filePath)], header = FALSE), envir = envir)
    setTxtProgressBar(pb, file)
  }
  close(pb)
  print("Files loaded.")
}
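
Since getData takes the operating system as an argument, one option is to let R work it out. The helper below is only a sketch and was not part of the functions above: detectOS is a made-up name, and it assumes Sys.info() is available on your platform and reports one of the usual sysname values.

```r
# Hypothetical helper: map R's reported system name to the labels getData expects
detectOS = function(sysname = Sys.info()[["sysname"]]) {
  switch(sysname,
         "Windows" = "windows",
         "Darwin"  = "mac",     # macOS reports itself as Darwin
         "Linux"   = "linux",
         "Unspecified")         # anything else falls through to the default label
}

# Label for the machine running this code
osLabel = detectOS()
```

You could then pass OS = detectOS() when calling getData instead of typing the label by hand.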
