DataScience+

First Things to Do After You Import the Data into R

After you have successfully imported the data into R, I recommend running five checks before you start the analysis. In this example I’m using the iris dataset, which comes pre-loaded in R.

Dimension

Check the dimensions (i.e., the number of rows and columns) of your dataset with the function dim(). The first value is the number of rows and the second is the number of columns.

dim(iris)
150   5
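
If you need the two counts separately, for example to reuse them later in the analysis, base R also provides nrow() and ncol(). A minimal sketch (the variable names n_rows and n_cols are only illustrative):

```r
# Number of rows (observations) and columns (variables) as separate values
n_rows <- nrow(iris)
n_cols <- ncol(iris)

# dim() returns the same two numbers as a single vector
identical(dim(iris), c(n_rows, n_cols))
```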

Names

Identify the names of the variables in your dataset with names().

names(iris) 
"Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"
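
One possible use of names() is to confirm that the variables you plan to analyse are really there. The sketch below assumes a hypothetical vector expected_vars listing the columns you need; it is not part of the original checks.

```r
# Hypothetical list of variables the analysis expects to find
expected_vars <- c("Sepal.Length", "Species")

# Names in expected_vars that are absent from the dataset
# (an empty result means every expected variable is present)
missing_vars <- setdiff(expected_vars, names(iris))
length(missing_vars) == 0
```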

Structure

Use str() to get information about the structure of the dataset (e.g., whether each variable is numeric or a factor).

str(iris)
'data.frame':	150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..
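
str() prints its summary rather than returning it, so if you want the variable types as a value you can work with, one option is to apply class() to every column. A small sketch (col_classes is an illustrative name):

```r
# Class of each column, returned as a named character vector
col_classes <- sapply(iris, class)
col_classes

# Names of the columns stored as factors
names(col_classes)[col_classes == "factor"]
```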

Header

Look at the first rows of your dataset with head() to get a sense of the variables and their values.

head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
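
By default head() returns the first six rows; the argument n changes that number, and tail() does the same for the end of the dataset. A short sketch, assuming three rows are enough for a quick look (first_rows and last_rows are illustrative names):

```r
# First three rows of the dataset
first_rows <- head(iris, n = 3)
first_rows

# Last three rows, useful for spotting problems at the end of a file
last_rows <- tail(iris, n = 3)
last_rows
```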

Missing Values

Look for missing data with is.na(), which returns TRUE for every missing value. Wrapping it in sum() gives the number of missing values, and wrapping it in mean() gives the proportion of missing values.

sum(is.na(iris$Sepal.Length))
0
mean(is.na(iris$Sepal.Length))
0
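
The calls above check a single variable. To cover every column at once, one possible approach is to pass the whole data frame to is.na() and summarise by column with colSums() and colMeans(). A minimal sketch (na_counts and na_share are illustrative names):

```r
# Number of missing values in each column
na_counts <- colSums(is.na(iris))
na_counts

# Proportion of missing values in each column
na_share <- colMeans(is.na(iris))
na_share
```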

These are the first things I usually do after I load a dataset into R.
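
If you run these checks often, you could bundle them into a small helper. The function below is only a sketch: first_checks is a hypothetical name, and it reports the class of each column in place of str(), because str() prints its result rather than returning it.

```r
# Hypothetical helper that runs the five checks on any data frame
first_checks <- function(df, n = 6) {
  stopifnot(is.data.frame(df))
  list(
    dimensions = dim(df),                                  # rows and columns
    names      = names(df),                                # variable names
    classes    = sapply(df, function(col) class(col)[1]),  # first class of each column
    head       = head(df, n),                              # first n rows
    na_counts  = colSums(is.na(df))                        # missing values per column
  )
}

checks <- first_checks(iris)
checks$dimensions
checks$na_counts
```

Returning a list keeps each result available by name, so you can inspect only the part you need.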