
Subsetting Datasets in R

  • Published on August 2, 2015 at 10:21 pm
  • Updated on April 28, 2017 at 6:23 pm

Subsetting datasets in R includes selecting and excluding variables or observations. To select variables from a dataset you can use dt[,c("x","y")], where dt is the name of the dataset and "x" and "y" are the names of the variables. To exclude variables from a dataset, use the same syntax but with a minus sign before the column number, like dt[,c(-1,-2)], which drops the first and second columns.

Here is an example using the iris dataset, which has the following variables:

"Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"

We want to select only "Sepal.Length" and "Sepal.Width" from the dataset:

dt <- iris[,c("Sepal.Length","Sepal.Width")]

"Sepal.Length" "Sepal.Width" 

Now suppose we want to exclude only variables 2 and 3 ("Sepal.Width" and "Petal.Length"):

dt <- iris[,c(-2,-3)]

"Sepal.Length" "Petal.Width" "Species" 

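The minus sign works with column numbers but not with column names. As a minimal sketch of how the same exclusion could be done by name, the example below uses a hypothetical helper vector drop_cols (not part of the original tutorial) together with base R's names(), %in% and setdiff():

```r
# Columns to drop, given by name rather than by position
# (drop_cols is an illustrative name, not from the original article)
drop_cols <- c("Sepal.Width", "Petal.Length")

# Keep every column whose name is not in drop_cols
dt_a <- iris[, !(names(iris) %in% drop_cols)]

# Equivalent: build the vector of names to keep with setdiff()
dt_b <- iris[, setdiff(names(iris), drop_cols)]

names(dt_a)
# "Sepal.Length" "Petal.Width"  "Species"
```

Selecting by name tends to be more robust than selecting by position, since it keeps working if the column order changes.
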
Sometimes you need to exclude observations based on a certain condition. The function subset() is used for this task.
For this example we create a new dataset:

Name <- c("John", "Tim", "Ami")
Sex <- c("men", "men", "women")
Age <- c(45, 53, 35)
dt <- data.frame(Name, Sex, Age)

Here is the dataset called dt:


Name  Sex Age
John  men  45
Tim   men  53
Ami women  35

We want to keep only the men older than 40 years (excluding the women and anyone aged 40 or younger) and create another dataset called dt2:

dt2 <- subset(dt, Age > 40 & Sex == "men")

Name  Sex Age
John  men  45
Tim   men  53
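
For comparison, here is a hedged sketch of two related forms: the same filter written with bracket indexing, and subset() with its select argument to keep only some columns at the same time. The names dt3 and dt4 are illustrative and do not appear in the original tutorial.

```r
# Rebuild the example dataset so this block runs on its own
Name <- c("John", "Tim", "Ami")
Sex <- c("men", "men", "women")
Age <- c(45, 53, 35)
dt <- data.frame(Name, Sex, Age)

# Same filter with bracket indexing: row condition before the comma,
# nothing after the comma so all columns are kept
dt3 <- dt[dt$Age > 40 & dt$Sex == "men", ]

# subset() can filter rows and pick columns in one call via select
dt4 <- subset(dt, Age > 40 & Sex == "men", select = c(Name, Age))
```

One difference worth knowing: when the condition evaluates to NA for a row, subset() drops that row, while bracket indexing returns a row filled with NA values.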

The subset() function is widely used in R for working with datasets. Post a comment if you have any questions about it.