I've been writing a lot of R code recently, and since I'm always downloading and loading files, I decided to write a couple of helper functions. The first function, "getData", downloads a file from the internet and unzips it if necessary, adjusting the download method to account for differences between operating systems. It is certainly not as robust as the "downloader" package, but it is useful for most simple operations.
The next function, "loadData", loads a set of files and assigns each one to an R object with the same name as the file (minus the ".txt" extension). I added a simple progress bar to this function, since loading several large files can take a while. You can see how to use the two functions here:
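The operating system adjustment described above can be sketched on its own. This is a minimal illustration, not part of the original functions: the helper name `guessDownloadMethod` is hypothetical, and it assumes `Sys.info()` reports the system name, so the caller does not have to pass it in. The method names mirror the ones "getData" uses. Recent versions of R may restrict the "internal" method for web URLs, in which case "auto" is the safer choice.

```r
# Hypothetical helper (an assumption, not from the original post):
# map an operating system name to a download.file() method.
guessDownloadMethod = function(OS = Sys.info()[["sysname"]]) {
  # Sys.info() can be unavailable on unusual platforms; fall back to "auto"
  if (is.null(OS)) return("auto")
  switch(tolower(OS),
         "windows" = "internal",
         "mac"     = ,          # empty argument falls through to "darwin"
         "darwin"  = "curl",    # Sys.info() reports macOS as "Darwin"
         "linux"   = "wget",
         "auto")                # default for anything unrecognised
}

guessDownloadMethod("Windows")
guessDownloadMethod("Darwin")
guessDownloadMethod("SomethingElse")
```

The result could then be passed as the `method` argument of `download.file()`.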
# Specify directories and file names
fileURL = 'A file URL'
fileName = 'AZippedFile.zip'
subDir = 'aSubDirectory'

# Download data and put in directory using custom function
# (subDir is passed by name; positionally it would be taken as mainDir)
getData(fileURL, fileName, subDir = subDir, compressed = TRUE)

# Load required files into R
# Get file names
fileNames = list.files(subDir, full.names = TRUE, recursive = TRUE)

# Specify required files
requiredFiles = c("file1.txt", "file2.txt", "file3.txt")

# Get required files from the list of files using custom function
loadData(fileNames, requiredFiles, envir = .GlobalEnv)
Here are the two custom functions:
# Get data function
getData = function(fileURL, fileName, mainDir = getwd(), subDir = ".",
                   compressed = FALSE, OS = "Unspecified") {
  targetDir = file.path(mainDir, subDir)
  filePath = file.path(targetDir, fileName)
  # If the data doesn't exist, download it and set up a folder to store it
  if (!file.exists(filePath)) {
    print("Downloading data and creating file directory...")
    # Pick a download method for the given operating system
    method = switch(tolower(OS),
                    "windows" = "internal",
                    "mac" = "curl",
                    "linux" = "wget",
                    "auto")
    dir.create(targetDir, showWarnings = FALSE, recursive = TRUE)
    download.file(fileURL, destfile = filePath, method = method)
    # Unzip next to the downloaded file rather than relative to getwd()
    if (compressed) unzip(filePath, exdir = targetDir)
  } else {
    print("Files already downloaded. Ready to read into R memory.")
  }
}

# Load data function
# envir defaults to the caller's environment; the original default,
# environment(), pointed at the function's own frame, so the loaded
# objects were discarded on return.
loadData = function(fileNames, requiredFiles, envir = parent.frame()) {
  numFiles = length(requiredFiles)
  pb = txtProgressBar(min = 0, max = numFiles, style = 3)
  # Load each file into R
  print("Loading files into memory. Please wait.")
  for (i in seq_along(requiredFiles)) {
    elem = requiredFiles[i]
    # Match the file name literally, preceded by a path separator
    filePath = paste0("/", elem)
    # Object name is the file name without its .txt extension
    R_Object = sub("\\.txt$", "", elem)
    assign(R_Object,
           read.table(fileNames[grep(filePath, fileNames, fixed = TRUE)],
                      header = FALSE),
           envir = envir)
    setTxtProgressBar(pb, i)
  }
  close(pb)
  print("Files loaded.")
}
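To try the loading pattern without downloading anything, here is a small self-contained sketch. It is only an illustration under stated assumptions: the directory name `loadDemo`, the two throwaway files, and the environment `dataEnv` are all made up for the example. It uses the same idea as "loadData", naming each object after its file and advancing a text progress bar, but it loads into a dedicated environment instead of the global one.

```r
# Create two small throwaway files so the example runs anywhere
demoDir = file.path(tempdir(), "loadDemo")
dir.create(demoDir, showWarnings = FALSE)
write.table(data.frame(a = 1:3), file.path(demoDir, "file1.txt"),
            row.names = FALSE, col.names = FALSE)
write.table(data.frame(a = 4:5, b = 6:7), file.path(demoDir, "file2.txt"),
            row.names = FALSE, col.names = FALSE)

# Collect the .txt files and load each one under its own name
fileNames = list.files(demoDir, pattern = "\\.txt$", full.names = TRUE)
dataEnv = new.env()  # hypothetical container, keeps the global workspace clean
pb = txtProgressBar(min = 0, max = length(fileNames), style = 3)
for (i in seq_along(fileNames)) {
  # "file1.txt" becomes the object name "file1"
  objName = tools::file_path_sans_ext(basename(fileNames[i]))
  assign(objName, read.table(fileNames[i], header = FALSE), envir = dataEnv)
  setTxtProgressBar(pb, i)
}
close(pb)

# List the objects that were created
ls(dataEnv)
```

Loading into a separate environment is a design choice rather than a requirement: it avoids silently overwriting existing objects that happen to share a name with a file, and the data can still be reached with `dataEnv$file1`.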