In this post, I will describe how to import data from CSV and Excel files into R. First, I prepared two sample datasets on my computer, one saved as a CSV file and the other as an Excel file (download links: CSV file and Excel file).

Preparing the dataset well is the first step toward importing it quickly and without errors. A few practices help avoid problems during import:

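Mark missing values consistently, for example with NA. By default, read_csv() treats both empty fields and the text NA as missing, while read_excel() treats only blank cells as missing unless you pass its na argument.
Avoid spaces between words in column names; join them with “_” instead. Otherwise, the names are not syntactic R names: readr keeps them as written, so you have to wrap them in backticks, while base R's read.csv() replaces the spaces with dots.
Keep column names short and avoid special symbols.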
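To see why spaces in column names matter, here is a small sketch using a made-up CSV written to a temporary file (the file contents and the "Blood Pressure" column name are illustrative only, not part of the sample data). It compares how readr and base R name the same column, and shows an empty field being read as NA.

```r
library(readr)

# Hypothetical CSV: one column name contains a space, one field is empty
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Seq_Number,Blood Pressure,Diabetes",
  "1,110,yes",
  "2,,no"
), tmp)

# readr keeps the name exactly as written, so it needs backticks
df_readr <- read_csv(tmp, show_col_types = FALSE)
names(df_readr)            # "Seq_Number" "Blood Pressure" "Diabetes"
df_readr$`Blood Pressure`  # 110 NA (the empty field became NA)

# base R's read.csv() turns it into a syntactic name instead
df_base <- read.csv(tmp)
names(df_base)             # "Seq_Number" "Blood.Pressure" "Diabetes"
```

Naming the column Blood_Pressure from the start avoids both the backticks and the silent renaming.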

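Once my datasets are saved as CSV and Excel files on my desktop, I can start importing them into R. You also need to know where the files are stored on your computer. The code below uses bare file names such as "sample1.csv", which only works when the files are in R's working directory; otherwise, write the full file path.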
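A minimal sketch of checking the path before reading, assuming the file sits in a Desktop folder under your home directory (this location is hypothetical; replace it with your own). getwd() shows the folder that bare file names are resolved against, and file.path() builds the path with "/" separators, which R accepts on every platform.

```r
# Folder that relative paths like "sample1.csv" are resolved against
getwd()

# Hypothetical location of the sample file; adjust to your own computer
csv_path <- file.path("~", "Desktop", "sample1.csv")

# Only try to read the file if it is actually there
if (file.exists(csv_path)) {
  sample_csv <- readr::read_csv(csv_path)
} else {
  message("File not found: ", path.expand(csv_path))
}
```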

Let's start by loading the 'readr' package and reading the CSV file with read_csv():
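If readr or readxl is not installed yet, it has to be installed once from CRAN before library() can load it. A small sketch that installs only the missing packages (installing requires an internet connection):

```r
# Install each package only if it is not already available
for (pkg in c("readr", "readxl")) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
}
```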

library("readr")
read_csv("sample1.csv")
## # A tibble: 18 x 6
##    Seq_Number Gender   Age Ethnicity        Blood_Pressure Diabetes
##         <dbl> <chr>  <dbl> <chr>                     <dbl> <chr>   
##  1          1 M         32 Mexican_American            110 yes     
##  2          2 M         35 Black                       120 no      
##  3          3 F         30 White                       135 yes     
##  4          4 M         37 Other                       127 no      
##  5          5 F         37 Multiracial                 100 no      
##  6          6 M         33 Black                       105 no      
##  7          7 M         34 Black                       140 yes     
##  8          8 M         38 Other                       100 no      
##  9          9 F         36 White                       105 no      
## 10         10 F         31 Mexican_American            120 no      
## 11         11 F         35 White                       130 yes     
## 12         12 M         38 Mexican_American            140 yes     
## 13         13 F         33 White                        90 no      
## 14         14 M         39 Black                        98 no      
## 15         15 M         40 Other                        99 no      
## 16         16 F         40 Mexican_American            100 no      
## 17         17 F         32 White                       105 no      
## 18         18 M         37 Black                       110 no
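In the output above, read_csv() guessed each column's type (<dbl> for numbers, <chr> for text). If you prefer to control the types yourself, you can declare them with col_types. The sketch below uses the first three rows of the sample data, written to a temporary file so it runs without sample1.csv; treating Gender and Diabetes as factors and the counts as integers is my own assumption about how the data will be used, not a requirement.

```r
library(readr)

# First three rows of the sample data, written to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Seq_Number,Gender,Age,Ethnicity,Blood_Pressure,Diabetes",
  "1,M,32,Mexican_American,110,yes",
  "2,M,35,Black,120,no",
  "3,F,30,White,135,yes"
), tmp)

# Declare every column type instead of letting readr guess
sample_csv <- read_csv(
  tmp,
  col_types = cols(
    Seq_Number     = col_integer(),
    Gender         = col_factor(levels = c("F", "M")),
    Age            = col_integer(),
    Ethnicity      = col_character(),
    Blood_Pressure = col_double(),
    Diabetes       = col_factor(levels = c("no", "yes"))
  )
)

str(sample_csv)
```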

For the Excel file, I load the 'readxl' package and call read_excel() in the same way:

library("readxl")
read_excel("sample1.xlsx")
## # A tibble: 18 x 6
##    Seq_Number Gender   Age Ethnicity        Blood_Pressure Diabetes
##         <dbl> <chr>  <dbl> <chr>                     <dbl> <chr>   
##  1          1 M         32 Mexican_American            110 yes     
##  2          2 M         35 Black                       120 no      
##  3          3 F         30 White                       135 yes     
##  4          4 M         37 Other                       127 no      
##  5          5 F         37 Multiracial                 100 no      
##  6          6 M         33 Black                       105 no      
##  7          7 M         34 Black                       140 yes     
##  8          8 M         38 Other                       100 no      
##  9          9 F         36 White                       105 no      
## 10         10 F         31 Mexican_American            120 no      
## 11         11 F         35 White                       130 yes     
## 12         12 M         38 Mexican_American            140 yes     
## 13         13 F         33 White                        90 no      
## 14         14 M         39 Black                        98 no      
## 15         15 M         40 Other                        99 no      
## 16         16 F         40 Mexican_American            100 no      
## 17         17 F         32 White                       105 no      
## 18         18 M         37 Black                       110 no
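read_excel() reads the first sheet by default. When a workbook has several sheets, or the data does not start in the top-left cell, its sheet, range and na arguments help. This sketch uses the example workbook that ships with readxl as a stand-in for sample1.xlsx, since the sheet names and cell ranges of the sample file are not known here.

```r
library(readxl)

# Example workbook bundled with readxl (stand-in for sample1.xlsx)
path <- readxl_example("datasets.xlsx")

# List the sheets in the workbook
excel_sheets(path)

# Read one sheet by name, treating blank cells and the text "NA" as missing
mtcars_xl <- read_excel(path, sheet = "mtcars", na = c("", "NA"))

# Read only a block of cells: header row plus five data rows, three columns
first_rows <- read_excel(path, sheet = "iris", range = "A1:C6")
first_rows
```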

Hope it is helpful!