After you have successfully loaded your data into R, I recommend running five checks before you start the analysis. In this example I'm using the iris dataset, which comes pre-loaded in R.
Dimension
Check the dimensions (i.e. the number of rows and columns) of your dataset using the function dim().
dim(iris)
# 150 5
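If you need the row and column counts separately, for example to reuse them later in a script, base R also provides nrow() and ncol(). The following is a minimal sketch; the variable names n_rows and n_cols are just illustrative.

```r
# iris ships with base R (datasets package), so no loading step is needed
n_rows <- nrow(iris)  # number of observations (rows)
n_cols <- ncol(iris)  # number of variables (columns)

# Both values together should match what dim() reports
stopifnot(identical(dim(iris), c(n_rows, n_cols)))

cat("Rows:", n_rows, "Columns:", n_cols, "\n")
```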
Names
Identify the names of the variables in your dataset using names().
names(iris)
# "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
Structure
Use str() to get information about the structure of your dataset (i.e. whether each variable is numeric or a factor).
str(iris)
# 'data.frame': 150 obs. of 5 variables:
#  $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#  $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#  $ Species : Factor w/ 3 levels "setosa","versicolor",..
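When a dataset has many columns, the str() output can get long. As a rough complement, you could collect only the class of each variable with sapply(). This is a sketch, and the names col_classes and numeric_cols are my own choice.

```r
# Class of every column, returned as a named character vector
col_classes <- sapply(iris, class)
print(col_classes)

# Names of the columns that are numeric (useful before computing summaries)
numeric_cols <- names(iris)[sapply(iris, is.numeric)]
print(numeric_cols)
```

For iris, this should report four numeric columns and one factor (Species), which matches what str() shows.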
Header
Look at the first rows of your dataset with head() to get information about the variables and their values.
head(iris)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
Missings
Look for missing data with is.na(). In addition, we will use the functions sum() and mean() to summarize the missing values: sum() counts them and mean() gives their proportion.
sum(is.na(iris$Sepal.Length))
# 0
mean(is.na(iris$Sepal.Length))
# 0
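The check above looks at one variable only. To cover every column in one go, one option is to combine is.na() with colSums() and colMeans(). This is a minimal sketch; na_counts, na_share and any_missing are illustrative names.

```r
# is.na() on a data frame returns a logical matrix of the same shape
na_counts <- colSums(is.na(iris))   # number of missing values per column
na_share  <- colMeans(is.na(iris))  # proportion of missing values per column

print(na_counts)
print(na_share)

# Single TRUE/FALSE answer: is anything missing anywhere in the dataset?
any_missing <- anyNA(iris)
print(any_missing)
```

The iris data has no missing values, so all counts and proportions should be zero.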
These are the first things I usually do after I load a dataset in R.
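If you run these checks often, you could bundle them into a small helper. The function below is only a sketch: quick_check is a hypothetical name, not a function from base R or any package, and it simply calls the five functions shown above in order.

```r
# Hypothetical helper that runs the five checks on any data frame
quick_check <- function(df, n = 6) {
  cat("Dimensions:", dim(df), "\n")   # 1. rows and columns
  cat("Names:", names(df), "\n")      # 2. variable names
  str(df)                             # 3. structure (types of the variables)
  print(head(df, n))                  # 4. first n rows
  na_counts <- colSums(is.na(df))     # 5. missing values per column
  print(na_counts)
  invisible(na_counts)                # return the counts without printing twice
}

result <- quick_check(iris)
```

The function returns the per-column missing counts invisibly, so you can store them for later use as in the last line.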