
# How to Deal with Missing Values in R

A dataset is often incomplete: some pieces of information are simply not available. Such entries are called missing values. In R, missing values are coded with the symbol NA, and the function is.na() identifies them in your dataset.

First, let's create a small dataset:

Name <- c("John", "Tim", NA)
Sex <- c("men", "men", "women")
Age <- c(45, 53, NA)
dt <- data.frame(Name, Sex, Age)


Here is our dataset called dt:

dt

  Name   Sex Age
1 John   men  45
2  Tim   men  53
3 <NA> women  NA


Now let's check for missing values in the dataset:

is.na(dt)

      Name   Sex   Age
[1,] FALSE FALSE FALSE
[2,] FALSE FALSE FALSE
[3,]  TRUE FALSE  TRUE
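
For a single column it is often handier to ask two narrower questions: is anything missing at all, and in which rows? The sketch below rebuilds the same dataset so that it runs on its own, and uses the base R functions anyNA() and which():

```r
# Rebuild the example dataset so this block is self-contained
Name <- c("John", "Tim", NA)
Sex <- c("men", "men", "women")
Age <- c(45, 53, NA)
dt <- data.frame(Name, Sex, Age)

# Does the Age column contain any missing value?
anyNA(dt$Age)

# Row numbers of the missing values in Age
which(is.na(dt$Age))
```

Here anyNA() reports that Age has a missing value, and which() points to the third row.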


You can also find the total number and the proportion of missing values in your dataset with the code below:

sum(is.na(dt))
2

mean(is.na(dt))
0.2222222
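
The same idea can be applied column by column to see where the missing values are concentrated. This is a minimal sketch that rebuilds the dataset so it runs on its own; colSums() and colMeans() are base R functions, and the names na_count and na_prop are chosen only for illustration:

```r
# Rebuild the example dataset so this block is self-contained
Name <- c("John", "Tim", NA)
Sex <- c("men", "men", "women")
Age <- c(45, 53, NA)
dt <- data.frame(Name, Sex, Age)

# Number of missing values in each column
na_count <- colSums(is.na(dt))
na_count

# Proportion of missing values in each column
na_prop <- colMeans(is.na(dt))
na_prop
```

Name and Age each have one missing value out of three, while Sex has none.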


When you import a dataset from another statistical application, the missing values might be coded with a number, for example 99. To let R know that this value means missing, you need to recode it:

dt$Age[dt$Age == 99] <- NA
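
To try the recoding on data that actually contains the code 99, here is a small self-contained sketch. The data frame survey and its values are made up for illustration, and 99 is assumed to be the missing-value code. Wrapping the comparison in which() means that only true matches are indexed, because comparing an existing NA with 99 gives NA rather than TRUE or FALSE. If the data come from a text file, the na.strings argument of read.csv() can do the same recoding at import time.

```r
# Hypothetical data in which 99 stands for "missing"
survey <- data.frame(Name = c("John", "Tim", "Ann"),
                     Age = c(45, 99, NA))

# Recode 99 to NA; which() skips the NA produced by comparing NA with 99
survey$Age[which(survey$Age == 99)] <- NA
survey$Age

# The same recoding at import time, here reading from a string instead of a file
csv_text <- "Name,Age\nJohn,45\nTim,99"
imported <- read.csv(text = csv_text, na.strings = c("NA", "99"))
imported$Age
```

Note that na.strings applies to every column, so it is only appropriate when 99 never occurs as a genuine value anywhere in the file.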


Another useful function for dealing with missing values is na.omit(), which deletes incomplete observations, that is, every row with at least one missing value.

Let's look at another example, starting with a second small dataset:

Name <- c("John", "Tim", NA)
Sex <- c("men", NA, "women")
Age <- c(45, 53, NA)
dt <- data.frame(Name, Sex, Age)


Here is the dataset, again called dt:

dt

  Name   Sex Age
1 John   men  45
2  Tim  <NA>  53
3 <NA> women  NA


Now we will use the function to remove the rows with missing values:

na.omit(dt)

  Name Sex Age
1 John men  45
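
A related base R function is complete.cases(), which returns TRUE for every row without missing values. The sketch below rebuilds the second dataset and shows that subsetting with it keeps the same rows as na.omit(). It also shows the na.rm argument, which lets functions such as mean() skip missing values without dropping whole rows.

```r
# Rebuild the second example dataset so this block is self-contained
Name <- c("John", "Tim", NA)
Sex <- c("men", NA, "women")
Age <- c(45, 53, NA)
dt <- data.frame(Name, Sex, Age)

# TRUE for rows that contain no missing values
complete.cases(dt)

# Keep only the complete rows (the same rows that na.omit(dt) keeps)
dt[complete.cases(dt), ]

# Average age computed from the non-missing values only
mean(dt$Age, na.rm = TRUE)
```

Dropping rows discards Tim's known age along with his missing Sex value, whereas na.rm = TRUE still uses it, so which approach fits depends on the analysis.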

This was an introduction to dealing with missing values. To learn how to impute missing data, please read this post.