Have you ever struggled to import hundreds of small dataset files? It can be very time consuming, or maybe even impossible. I was in this situation some time ago when I had a folder with approximately three thousand CSV files, and I was interested in creating a single dataset.

At the time, I was thinking of writing a for loop to import each file separately and then merging all the small datasets.

# file1 = read_csv("file1.csv")
# file2 = read_csv("file2.csv")
# file3 = read_csv("file3.csv")
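
The for-loop idea described above can be sketched roughly as follows. This is only an illustration, not the code I actually ran: the folder demo_dir and its three tiny files are hypothetical and are created in a temporary directory, and base R's read.csv stands in for read_csv so the sketch runs without extra packages.

```r
# Hypothetical demo folder with three tiny CSV files (not the real data)
demo_dir <- file.path(tempdir(), "loop_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(a = i, b = i * 10),
            file.path(demo_dir, paste0("file", i, ".csv")),
            row.names = FALSE)
}

paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file into its own list slot, then bind everything once at the end
pieces <- vector("list", length(paths))
for (i in seq_along(paths)) {
  pieces[[i]] <- read.csv(paths[i])
}
combined <- do.call(rbind, pieces)
```

Collecting the pieces in a list and binding them once avoids creating thousands of separate objects such as file1, file2, and so on.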

I didn't know how that would work, or even whether it would be possible to merge 3,000 datasets easily. Anyway, I started searching for similar questions, and I don't remember finding anything helpful until I discovered the plyr package. I am happy to share it with you. There is not much code.

Load the packages

library(plyr)
library(readr)

For this post, I created 3 CSV files and put them in a folder (i.e., csvfolder) on my desktop. You can do the same if you want to replicate this post. I set the working directory in R and used the function list.files to list all files in the folder with the .csv extension.

setwd("~/Desktop")
mydir = "csvfolder"
# pattern is a regular expression, so "\\.csv$" matches names ending in .csv
myfiles = list.files(path=mydir, pattern="\\.csv$", full.names=TRUE)
myfiles
## [1] "csvfolder/file1.csv" "csvfolder/file2.csv" "csvfolder/file3.csv"
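
A side note on the pattern argument: list.files treats it as a regular expression, not as a shell-style wildcard. The sketch below shows two ways to select only names that end in .csv. The folder pattern_demo and the file names in it are made up for the illustration; glob2rx is a base R helper that converts a wildcard into the equivalent regular expression.

```r
# Hypothetical folder holding two CSV files and one unrelated text file
demo_dir <- file.path(tempdir(), "pattern_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir,
                                c("file1.csv", "file2.csv", "csv_notes.txt"))))

# Regular expression: a literal dot, then "csv", anchored at the end of the name
csv_only <- list.files(demo_dir, pattern = "\\.csv$")

# Same selection, written as a wildcard and converted with glob2rx()
same_via_glob <- list.files(demo_dir, pattern = glob2rx("*.csv"))
```

Both calls should return only file1.csv and file2.csv, leaving csv_notes.txt out.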

At this point, the RStudio environment holds only the paths of the CSV files; no file has been read yet. To read all the files and combine them into a single dataset, I use ldply and apply the read_csv function to each path.

dat_csv = ldply(myfiles, read_csv)
dat_csv
##    a  b  c
## 1  1 34 98
## 2 23 55 10
## 3 43 67  3
## 4 32 21 56
## 5 34 23 57
## 6 31 24 58
## 7 43 65 77
## 8 45 63 78
## 9 57 61 79

Done!

You can apply the same approach to import .txt files as well. Use the function read.table for .txt files, with myfiles listing the .txt files instead. See the code below:

# dat_txt = ldply(myfiles, read.table, sep = "\t", fill=TRUE, header = TRUE)
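
For readers who want to try the .txt case end to end, here is a small self-contained sketch. It is an illustration under assumptions: the folder txt_demo and its two tab-delimited files are hypothetical, and base R's lapply plus rbind stand in for ldply so that nothing beyond base R is needed. With plyr loaded, the same read.table arguments can be passed through ldply as in the commented line above.

```r
# Hypothetical folder with two small tab-delimited .txt files
demo_dir <- file.path(tempdir(), "txt_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:2) {
  write.table(data.frame(a = i, b = c("x", "y")),
              file.path(demo_dir, paste0("file", i, ".txt")),
              sep = "\t", row.names = FALSE, quote = FALSE)
}

txt_files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Extra arguments after the function name are forwarded to read.table
txt_pieces <- lapply(txt_files, read.table,
                     sep = "\t", fill = TRUE, header = TRUE)
dat_txt_demo <- do.call(rbind, txt_pieces)
```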

Extra

Below I import each file separately to show that the data and the variable names correspond to dat_csv above.

read_csv(myfiles[1])
## # A tibble: 3 x 3
##       a     b     c
##   <int> <int> <int>
## 1     1    34    98
## 2    23    55    10
## 3    43    67     3
read_csv(myfiles[2])
## # A tibble: 3 x 3
##       a     b     c
##   <int> <int> <int>
## 1    32    21    56
## 2    34    23    57
## 3    31    24    58
read_csv(myfiles[3])
## # A tibble: 3 x 3
##       a     b     c
##   <int> <int> <int>
## 1    43    65    77
## 2    45    63    78
## 3    57    61    79
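
Printing each file works for 3 files but not for 3,000. A possible way to run the same check programmatically is sketched below, assuming every file shares the same columns. The folder check_demo and its contents are hypothetical, and base R's read.csv is used instead of read_csv to keep the sketch free of package dependencies.

```r
# Hypothetical folder with three CSV files of three rows each
demo_dir <- file.path(tempdir(), "check_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(a = 1:3 + i, b = 4:6 * i),
            file.path(demo_dir, paste0("file", i, ".csv")),
            row.names = FALSE)
}

paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
pieces <- lapply(paths, read.csv)
combined <- do.call(rbind, pieces)

# The combined data should have as many rows as all the files together
rows_per_file <- vapply(pieces, nrow, integer(1))
rows_match <- nrow(combined) == sum(rows_per_file)

# Every file should carry the same variable names as the combined data
same_columns <- all(vapply(pieces,
                           function(p) identical(names(p), names(combined)),
                           logical(1)))
```

If rows_match and same_columns are both TRUE, no rows were lost and no file had unexpected variable names.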

I hope you learned something new today and will share it with your peers. Who knows, it may be helpful for someone else.