In this post, I describe how to import data from CSV and Excel files into R. First, I prepared two sample datasets on my computer, one as a CSV file and the other as an Excel file. Download: csv file and Excel file.

Preparing the dataset properly is the first step toward a fast, trouble-free import. A few practices help avoid issues while importing:

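Missing values in the dataset should be marked as NA.
It is best to avoid blank spaces between words and to connect them with "_" instead; otherwise, depending on the import function, R may split them into separate variables (as `read.table()` does with its default whitespace separator) or keep a column name that must be quoted with backticks.
Use short names and try to avoid symbols.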
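
To see why these guidelines matter, here is a minimal sketch using 'readr'. The tiny dataset is made up for illustration and is written to a temporary file, so nothing here depends on the sample files from this post.

```r
library(readr)

# Write a small, made-up CSV to a temporary file so the example is self-contained.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Seq_Number,Blood Pressure,Diabetes",
  "1,110,yes",
  "2,NA,no",
  "3,,yes"
), tmp)

demo <- read_csv(tmp)

# With read_csv(), the header containing a space stays one column,
# but its name is non-syntactic and has to be quoted with backticks.
names(demo)
demo$`Blood Pressure`

# By default, both the literal NA and the empty cell are read as missing.
sum(is.na(demo$`Blood Pressure`))
```

Renaming the header to `Blood_Pressure` in the file removes the need for backticks later on.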

Once the dataset is saved as a CSV file and as an Excel file on my desktop, I can import it into R. You also need to know where the file is stored on your computer, because the file path goes into the code.
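
A quick way to build and check a file path in base R is sketched here. The file name `sample_data.csv` and the Desktop location are assumptions for illustration, not the actual names used for this post.

```r
# Hypothetical file name and location; adjust both to your own machine.
csv_path <- file.path(path.expand("~"), "Desktop", "sample_data.csv")

csv_path               # the full path as R sees it
file.exists(csv_path)  # TRUE only if the file is really there

# On Windows, write paths with forward slashes or doubled backslashes,
# for example "C:/Users/me/Desktop/sample_data.csv".
```

Checking `file.exists()` first makes a wrong path easy to spot before any import function is called.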

Let's start by loading the 'readr' package and reading the CSV file.
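
A minimal sketch of this step, assuming the CSV file is saved as `sample_data.csv` on the Desktop (a hypothetical name and path; replace it with your own). `read_csv()` returns a tibble, and printing it gives output like the one that follows.

```r
library(readr)

# Hypothetical path; replace with the real location of your CSV file.
csv_path <- file.path(path.expand("~"), "Desktop", "sample_data.csv")

if (file.exists(csv_path)) {
  csv_data <- read_csv(csv_path)  # column types are guessed from the data
  print(csv_data)
} else {
  message("File not found: ", csv_path)
}
```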

## # A tibble: 18 x 6
##    Seq_Number Gender   Age Ethnicity        Blood_Pressure Diabetes
##         <dbl> <chr>  <dbl> <chr>                     <dbl> <chr>   
##  1          1 M         32 Mexican_American            110 yes     
##  2          2 M         35 Black                       120 no      
##  3          3 F         30 White                       135 yes     
##  4          4 M         37 Other                       127 no      
##  5          5 F         37 Multiracial                 100 no      
##  6          6 M         33 Black                       105 no      
##  7          7 M         34 Black                       140 yes     
##  8          8 M         38 Other                       100 no      
##  9          9 F         36 White                       105 no      
## 10         10 F         31 Mexican_American            120 no      
## 11         11 F         35 White                       130 yes     
## 12         12 M         38 Mexican_American            140 yes     
## 13         13 F         33 White                        90 no      
## 14         14 M         39 Black                        98 no      
## 15         15 M         40 Other                        99 no      
## 16         16 F         40 Mexican_American            100 no      
## 17         17 F         32 White                       105 no      
## 18         18 M         37 Black                       110 no

For the Excel file, I load the 'readxl' package and import the file with it, which gives the same result:
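
A minimal sketch of the Excel import, assuming the file is saved as `sample_data.xlsx` on the Desktop (again a hypothetical name and path). `read_excel()` reads the first sheet unless a `sheet` argument is given.

```r
library(readxl)

# Hypothetical path; replace with the real location of your Excel file.
xlsx_path <- file.path(path.expand("~"), "Desktop", "sample_data.xlsx")

if (file.exists(xlsx_path)) {
  excel_data <- read_excel(xlsx_path)  # first sheet by default
  print(excel_data)
} else {
  message("File not found: ", xlsx_path)
}
```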

## # A tibble: 18 x 6
##    Seq_Number Gender   Age Ethnicity        Blood_Pressure Diabetes
##         <dbl> <chr>  <dbl> <chr>                     <dbl> <chr>   
##  1          1 M         32 Mexican_American            110 yes     
##  2          2 M         35 Black                       120 no      
##  3          3 F         30 White                       135 yes     
##  4          4 M         37 Other                       127 no      
##  5          5 F         37 Multiracial                 100 no      
##  6          6 M         33 Black                       105 no      
##  7          7 M         34 Black                       140 yes     
##  8          8 M         38 Other                       100 no      
##  9          9 F         36 White                       105 no      
## 10         10 F         31 Mexican_American            120 no      
## 11         11 F         35 White                       130 yes     
## 12         12 M         38 Mexican_American            140 yes     
## 13         13 F         33 White                        90 no      
## 14         14 M         39 Black                        98 no      
## 15         15 M         40 Other                        99 no      
## 16         16 F         40 Mexican_American            100 no      
## 17         17 F         32 White                       105 no      
## 18         18 M         37 Black                       110 no

I hope this helps!