# First Things to Do After You Import the Data into R

• Published on July 31, 2015 at 9:36 pm
• Updated on October 30, 2017 at 3:37 pm

After you have successfully loaded your data into R, I recommend running five checks before you start the analysis. In this example I’m using the iris data set, which comes pre-loaded in R.

## Dimension

Check the dimensions (i.e., the number of rows and columns) of your dataset with the function dim().

dim(iris)
150   5
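
If you need the two counts separately, for example to reuse them in a later calculation, nrow() and ncol() return them individually. A minimal sketch; the variable names n_rows and n_cols are only illustrative:

```r
# Number of rows (observations) and columns (variables), taken separately
n_rows <- nrow(iris)
n_cols <- ncol(iris)

# dim() returns the same two values as one vector: rows first, then columns
print(c(n_rows, n_cols))
```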


## Names

Identify the names of the variables in your dataset.

names(iris)
"Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"
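
When you already know which columns the file should contain, the names can also be compared against that list. The sketch below assumes a hand-written vector of expected names (expected_cols is an illustrative name, not something R provides) and uses setdiff() to report differences in both directions:

```r
# Column names we expect the imported data to have (here: the iris columns)
expected_cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length",
                   "Petal.Width", "Species")

# Expected columns that are absent from the data
missing_cols <- setdiff(expected_cols, names(iris))

# Columns in the data that we did not expect
unexpected_cols <- setdiff(names(iris), expected_cols)

print(missing_cols)
print(unexpected_cols)
```

Two empty results mean the imported names match the expected ones exactly.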


## Structure

Use str() to get information about the structure of your dataset (i.e., whether each variable is numeric or a factor).

str(iris)
'data.frame':	150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..


## Header

Look at the header of your dataset to get information about the variables and their values.

head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa


## Missings

Look for missing data. In addition, we will use the functions sum and mean to summarize the missing values.

sum(is.na(iris$Sepal.Length))
mean(is.na(iris$Sepal.Length))
0
0
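
The check above covers a single column. To cover every column at once, one option is to apply colSums() and colMeans() to the logical matrix that is.na() returns for a data frame. This is a sketch of that idea; na_matrix, na_counts and na_share are illustrative names:

```r
# is.na() on a data frame returns a logical matrix of the same shape
na_matrix <- is.na(iris)

# Count of missing values in each column
na_counts <- colSums(na_matrix)

# Proportion of missing values in each column
na_share <- colMeans(na_matrix)

print(na_counts)
print(na_share)
```

For iris both results are zero for every column, since the data set has no missing values.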


These are the first things I usually do after I load a dataset into R.
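
If you run these checks often, they can be bundled into one small function. The helper below is a hypothetical sketch (first_checks is not part of base R), and it swaps the str() output for a simpler per-column class summary so that everything fits in one list:

```r
# Hypothetical helper (not part of base R): runs the five checks on a data frame
first_checks <- function(df, n = 3) {
  stopifnot(is.data.frame(df))
  list(
    dimensions = dim(df),                # number of rows, number of columns
    names      = names(df),              # variable names
    # first class of each column (a simplified stand-in for str())
    classes    = vapply(df, function(x) class(x)[1], character(1)),
    header     = head(df, n),            # first n rows
    missings   = colSums(is.na(df))      # count of NA values per column
  )
}

checks <- first_checks(iris)
str(checks, max.level = 1)
```

Returning a list keeps each result accessible by name, for example checks$missings.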