Subsetting datasets in R includes selecting and excluding variables (columns) or observations (rows). To select variables from a dataset you can use bracket indexing, dt[,c("x","y")], where dt is the name of the dataset and “x” and “y” are the names of the variables. To exclude variables, use the same syntax but put a minus sign (-) before the column numbers, as in dt[,c(-1,-2)], which drops the first and second columns. The minus sign works with column numbers, not with column names.
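Here is a minimal sketch of both forms on a small made-up data frame (the column names x, y, z and w are hypothetical). It also shows one way to exclude columns by name, using setdiff() to keep every column that is not listed, since a minus sign in front of a name is an error in R.

```r
# A toy data frame; the column names x, y, z, w are hypothetical
dt <- data.frame(x = 1:3, y = 4:6, z = 7:9, w = c("a", "b", "c"))

# Select variables by name
kept <- dt[, c("x", "y")]
print(names(kept))             # "x" "y"

# Exclude variables by position: the minus sign works with column numbers
dropped <- dt[, c(-1, -2)]
print(names(dropped))          # "z" "w"

# dt[, -"x"] would fail ("invalid argument to unary operator"),
# so to exclude by name, keep every column whose name is not in the list
dropped_by_name <- dt[, setdiff(names(dt), c("x", "y"))]
print(names(dropped_by_name))  # "z" "w"
```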
Here is an example using the iris dataset:
```r
names(iris)
# "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
```
Suppose we want to keep only “Sepal.Length” and “Sepal.Width”:
```r
dt <- iris[,c("Sepal.Length","Sepal.Width")]
names(dt)
# "Sepal.Length" "Sepal.Width"
```
Now suppose we want to exclude only variables 2 and 3 (“Sepal.Width” and “Petal.Length”):
```r
dt <- iris[,c(-2,-3)]
names(dt)
# "Sepal.Length" "Petal.Width" "Species"
```
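One detail worth knowing: when the brackets leave only a single column, R simplifies the result to a plain vector instead of a data frame. A short sketch on iris showing this, and how the drop = FALSE argument of [ keeps a one-column data frame:

```r
# Selecting a single column with [ , ] returns a plain vector by default
v <- iris[, "Sepal.Length"]
print(class(v))        # "numeric"

# drop = FALSE keeps the result as a one-column data frame
d <- iris[, "Sepal.Length", drop = FALSE]
print(class(d))        # "data.frame"
print(dim(d))          # 150 1

# The same simplification happens when exclusion leaves one column
rest <- iris[, c(-1, -2, -3, -4)]
print(class(rest))     # "factor" (the Species column)
```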
Sometimes you need to exclude observations based on a certain condition. The subset() function is convenient for this task: it keeps only the rows for which the condition is TRUE.
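Besides a row condition, subset() also accepts a select argument, so rows and columns can be filtered in one call. A minimal sketch on iris, where the condition (setosa flowers with sepals longer than 5 cm) is only an illustrative choice:

```r
# Keep setosa rows with Sepal.Length > 5, and only two columns.
# Column names are written unquoted inside both the condition and select.
setosa_long <- subset(iris,
                      Species == "setosa" & Sepal.Length > 5,
                      select = c(Sepal.Length, Species))

print(names(setosa_long))   # "Sepal.Length" "Species"
print(head(setosa_long))
```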
For this example we create a new dataset:
```r
Name <- c("John", "Tim", "Ami")
Sex <- c("men", "men", "women")
Age <- c(45, 53, 35)
dt <- data.frame(Name, Sex, Age)
```
Here is the dataset called dt:
```r
dt
#   Name   Sex Age
# 1 John   men  45
# 2  Tim   men  53
# 3  Ami women  35
```
We want to exclude the women and keep only the people older than 40 years, storing the result in another dataset called dt2:
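```r
# Keep rows where Age is over 40 and Sex is "men" (text values need quotes)
dt2 <- subset(dt, Age > 40 & Sex == "men")
dt2
#   Name Sex Age
# 1 John men  45
# 2  Tim men  53
```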
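The same filter can be written with plain bracket indexing, where the columns must be referenced as dt$Age and dt$Sex. The sketch below rebuilds the example data so it runs on its own, checks that both forms agree, and shows one real difference: subset() treats an NA condition as FALSE, while [ ] returns a row of NAs for it. The extra row "Sam" with a missing age is made up for illustration.

```r
Name <- c("John", "Tim", "Ami")
Sex  <- c("men", "men", "women")
Age  <- c(45, 53, 35)
dt   <- data.frame(Name, Sex, Age)

# subset() version: column names are evaluated inside dt
dt2 <- subset(dt, Age > 40 & Sex == "men")

# Equivalent logical indexing with [ ]
dt3 <- dt[dt$Age > 40 & dt$Sex == "men", ]
print(isTRUE(all.equal(dt2, dt3)))      # TRUE

# Hypothetical extra row with a missing age
dt_na <- rbind(dt, data.frame(Name = "Sam", Sex = "men", Age = NA))
print(nrow(subset(dt_na, Age > 40)))    # 2: the NA row is dropped
print(nrow(dt_na[dt_na$Age > 40, ]))    # 3: the NA row comes back as all NA
```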
The subset() function is widely used in R programming for filtering datasets. Post a comment if you have any questions about it.