DataScience+ We publish R tutorials from scientists at academic and scientific institutions with the goal of giving everyone in the world access to free knowledge. Our tutorials cover different topics, including statistics, data manipulation, and visualization!

First Things to Do After You Import the Data into R

After you have successfully imported the data into R, I recommend running five checks before you start the analysis. In this example I’m using the iris dataset, which comes preloaded in R.

Dimension

Check the dimensions (i.e., the number of rows and columns) of your dataset with the function dim().

dim(iris)
150   5
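
If you need the two counts separately, for example to reuse them later in a script, nrow() and ncol() return them one at a time. A minimal sketch on the same iris data (the variable names n_rows and n_cols are only illustrative):

```r
# Number of rows (observations) and columns (variables), taken separately
n_rows <- nrow(iris)
n_cols <- ncol(iris)

# dim() returns the same two values as one vector: rows first, then columns
stopifnot(identical(dim(iris), c(n_rows, n_cols)))

cat("Rows:", n_rows, "Columns:", n_cols, "\n")
```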

Names

Identify the names of the variables in your dataset.

names(iris) 
"Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"

Structure

Use str() to get information about the structure of the dataset (e.g., whether each variable is numeric or a factor).

str(iris)
'data.frame':	150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..
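
str() is meant for reading on screen. If you would rather have the column types as an object you can filter on, one option is to apply class() to every column. This is only a sketch: col_classes and numeric_cols are illustrative names, and sapply() returns a list instead of a character vector when a column carries more than one class.

```r
# Class of every column, as a named character vector
col_classes <- sapply(iris, class)
print(col_classes)

# Names of the numeric columns only
numeric_cols <- names(iris)[sapply(iris, is.numeric)]
print(numeric_cols)
```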

Header

Look at the header of your dataset (its first rows) to get information about the variables and their values.

head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
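
head() takes an n argument that controls how many rows are returned, and tail() does the same for the end of the dataset. A short sketch, assuming you also want a quick look at the last rows (first_rows and last_rows are illustrative names):

```r
# First 3 rows instead of the default 6
first_rows <- head(iris, n = 3)
print(first_rows)

# Last 3 rows of the dataset
last_rows <- tail(iris, n = 3)
print(last_rows)
```

Checking the last rows as well can help you notice problems at the end of an imported file, such as stray footer lines read in as data.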

Missings

Look for missing data with is.na(). Wrapping it in sum() gives the number of missing values, and wrapping it in mean() gives the proportion of missing values.

sum(is.na(iris$Sepal.Length))
mean(is.na(iris$Sepal.Length))
0
0
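
The check above looks at a single column. To cover every column at once, one possible approach is to call is.na() on the whole data frame, which returns a logical matrix, and summarize it by column. A minimal sketch (na_counts, na_share and total_na are illustrative names):

```r
# Number of missing values in each column
na_counts <- colSums(is.na(iris))
print(na_counts)

# Proportion of missing values in each column
na_share <- colMeans(is.na(iris))
print(na_share)

# Total number of missing values in the whole data frame
total_na <- sum(is.na(iris))
print(total_na)
```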

These are the first things I usually do after I load a dataset into R.