After you have successfully loaded your data into R, I recommend running five checks before you start the analysis. In this example I’m using the iris dataset, which comes pre-loaded in R.
Dimension
Check the dimensions (i.e., the number of rows and columns) of your dataset with the function dim().
dim(iris)
150 5
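If you need the row and column counts separately, for example to stop early when a file turned out to be empty, nrow() and ncol() return them one at a time. The snippet below is only a small sketch; the variable names n_rows and n_cols are my own.

```r
# Number of rows and columns as separate values
n_rows <- nrow(iris)
n_cols <- ncol(iris)

# Stop with an error if the dataset has no rows or no columns
stopifnot(n_rows > 0, n_cols > 0)
```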
Names
Identify the names of the variables in your dataset.
names(iris)
"Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
Structure
Use str() to get information about the structure of the dataset (i.e., whether each variable is numeric or a factor).
str(iris)
'data.frame': 150 obs. of 5 variables:
 $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Species : Factor w/ 3 levels "setosa","versicolor",..
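When you only need the type of each column rather than the full str() printout, something like the following may be handy. It is a minimal sketch using sapply(); the names col_classes and non_numeric are just illustrative.

```r
# Class of every column, returned as a named character vector
col_classes <- sapply(iris, class)
print(col_classes)

# Names of the columns that are not numeric
non_numeric <- names(iris)[!sapply(iris, is.numeric)]
print(non_numeric)
```

For iris this should flag only Species, the one factor column.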
Header
Look at the first rows of your dataset with head() to get a feel for the variables and their values.
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
Missings
Look for missing data with is.na(). Wrapping it in sum() gives the number of missing values, and wrapping it in mean() gives their proportion.
sum(is.na(iris$Sepal.Length))
0
mean(is.na(iris$Sepal.Length))
0
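The calls above look at a single column. To check every column at once, one option is to apply is.na() to the whole data frame and summarise by column. This is a sketch of that idea, with na_count and na_share as names I made up for the results.

```r
# Number of missing values in each column
na_count <- colSums(is.na(iris))
print(na_count)

# Proportion of missing values in each column
na_share <- colMeans(is.na(iris))
print(na_share)

# TRUE if there is at least one missing value anywhere in the dataset
has_na <- anyNA(iris)
print(has_na)
```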
These are the first things I usually do after loading a dataset into R.
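If you run these checks often, you could wrap them in a small helper. The function below is a hypothetical sketch, not part of base R: check_data and its argument n are names I chose, and it returns the class of each column instead of calling str(), because str() prints its result rather than returning it.

```r
# Hypothetical helper: collects the five checks for any data frame
check_data <- function(df, n = 3) {
  stopifnot(is.data.frame(df))
  list(
    dimensions = dim(df),                                  # rows and columns
    names      = names(df),                                # variable names
    classes    = sapply(df, function(x) class(x)[1]),      # first class of each column
    head       = head(df, n),                              # first n rows
    missing    = colSums(is.na(df))                        # missing values per column
  )
}

checks <- check_data(iris)
checks$dimensions
checks$missing
```

Returning a list keeps each result available for later inspection, e.g. checks$classes or checks$head.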