DataScience+ We publish R tutorials from scientists at academic and scientific institutions with a goal to give everyone in the world access to free knowledge. Our tutorials cover different topics including statistics, data manipulation and visualization!

Subsetting Datasets in R

Subsetting a dataset in R means selecting or excluding variables (columns) or observations (rows). To select variables, use the bracket operator dt[, c("x", "y")], where dt is the name of the dataset and "x" and "y" are the names of the variables. To exclude variables, put a minus sign before their column numbers, as in dt[, c(-1, -2)]. The minus sign works only with column numbers, not with variable names.

Here is an example using the built-in iris dataset, which has these variables:

"Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"

To select only "Sepal.Length" and "Sepal.Width" from the dataset:

dt <- iris[,c("Sepal.Length","Sepal.Width")]

"Sepal.Length" "Sepal.Width" 
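A related caveat: if you select a single column this way, `[` simplifies the result to a plain vector instead of a data frame. A short sketch, using one of the same iris columns, of how the standard `drop = FALSE` argument of `[` keeps a one-column data frame (the names v and d are just for illustration):

```r
# Selecting a single column returns a numeric vector by default
v <- iris[, "Sepal.Length"]
class(v)
# [1] "numeric"

# drop = FALSE keeps the result as a one-column data frame
d <- iris[, "Sepal.Length", drop = FALSE]
class(d)
# [1] "data.frame"
```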

To exclude variables 2 and 3 (Sepal.Width and Petal.Length) instead:

dt <- iris[,c(-2,-3)]

"Sepal.Length" "Petal.Width" "Species" 
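Because the minus sign only works with column numbers, excluding variables by name needs a different approach. A minimal sketch, assuming you want to drop the same two iris columns as above: it builds a logical column index with %in% and negates it with !. The helper name drop_vars is hypothetical, chosen for illustration.

```r
# Columns to drop, given by name (hypothetical helper variable)
drop_vars <- c("Sepal.Width", "Petal.Length")

# names(iris) %in% drop_vars is TRUE for the columns to drop;
# negating it with ! keeps every other column
dt <- iris[, !(names(iris) %in% drop_vars)]

names(dt)
# "Sepal.Length" "Petal.Width" "Species"
```

This gives the same columns as iris[, c(-2, -3)], but it does not depend on the position of the columns in the dataset.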

Sometimes you need to exclude observations (rows) based on a certain condition. The subset() function is used for this task.
For this example, we create a new dataset:

Name <- c("John", "Tim", "Ami")
Sex <- c("men", "men", "women")
Age <- c(45, 53, 35)
dt <- data.frame(Name, Sex, Age)

Here is the dataset called dt:


Name  Sex Age
John  men  45
Tim   men  53
Ami women  35

We want to keep only the men older than 40 years, which excludes the women and anyone aged 40 or younger, and store the result in another dataset called dt2:

# "men" must be quoted: it is a value of Sex, not a variable name
dt2 <- subset(dt, Age > 40 & Sex == "men")

Name  Sex Age
John  men  45
Tim   men  53
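For comparison, the same rows can be selected with bracket indexing, and subset() can also pick columns through its select argument. A minimal sketch that rebuilds the dt dataset from above; the names dt3, dt4 and dt_na are hypothetical. It also shows one behavioral difference worth knowing: subset() drops rows where the condition is NA, while `[` returns a row of NAs for them.

```r
Name <- c("John", "Tim", "Ami")
Sex  <- c("men", "men", "women")
Age  <- c(45, 53, 35)
dt   <- data.frame(Name, Sex, Age)

# Same rows as subset(dt, Age > 40 & Sex == "men"), written with [ ];
# here each variable needs the dt$ prefix that subset() lets you omit
dt3 <- dt[dt$Age > 40 & dt$Sex == "men", ]

# subset() can also keep only some columns via its select argument
dt4 <- subset(dt, Age > 40 & Sex == "men", select = c(Name, Age))

# Difference with missing values: give Tim an unknown age
dt_na <- dt
dt_na$Age[2] <- NA
subset(dt_na, Age > 40)   # drops the NA row: only John is returned
dt_na[dt_na$Age > 40, ]   # returns a row of NAs in place of Tim
```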

The subset() function is widely used in R programming for filtering datasets. Post a comment if you have any questions about it.