Creating a new variable, or transforming an existing variable into a new one, is usually a simple task in R.
The basic pattern is newvariable <- oldvariable. In a data frame, a new variable is added as a new column to the right of the existing ones. New variables are usually built with the arithmetic operators: * for multiplication, + for addition, - for subtraction, and / for division.
Let's create a dataset:
    hospital <- c("New York", "California")
    patients <- c(150, 350)
    costs <- c(3.1, 2.5)
    df <- data.frame(hospital, patients, costs)
The dataset we created is called df:
    df
    hospital    patients  costs
    New York         150    3.1
    California       350    2.5
Now we will create a new variable called totcosts, as shown below:
    df$totcosts <- df$patients * df$costs
Let's see the dataset again:
    df
    hospital    patients  costs  totcosts
    New York         150    3.1       465
    California       350    2.5       875
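The other arithmetic operators work the same way. The sketch below rebuilds the example data so it runs on its own; the column names cost_share and patients_diff are hypothetical and chosen only for illustration.

```r
# Rebuild the example data so this block is self-contained
df <- data.frame(
  hospital = c("New York", "California"),
  patients = c(150, 350),
  costs    = c(3.1, 2.5)
)

# Multiplication: total costs per hospital (as in the text)
df$totcosts <- df$patients * df$costs

# Division: each hospital's share of the overall total (hypothetical column)
df$cost_share <- df$totcosts / sum(df$totcosts)

# Subtraction: distance of each hospital from the mean patient count
# (hypothetical column)
df$patients_diff <- df$patients - mean(df$patients)

df
```

Each assignment appends one more column to the right of the data frame, and the existing columns are left unchanged.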
Next, we rename and recode a variable in R.
Starting from the original dataset (hospital, patients, costs), we first copy the variable costs under a new name:
    df$costs_euro <- df$costs
Then we delete the old variable by assigning NULL to it:
    df$costs <- NULL
Now we see the dataset again:
    df
    hospital    patients  costs_euro
    New York         150         3.1
    California       350         2.5
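As an alternative to the copy-and-delete approach, a column can be renamed in place with the base R function names(). This is a minimal sketch on a fresh copy of the example data, not necessarily the method the original workflow intended.

```r
# Fresh copy of the example data
df <- data.frame(
  hospital = c("New York", "California"),
  patients = c(150, 350),
  costs    = c(3.1, 2.5)
)

# Replace the name "costs" with "costs_euro"; the values and the
# column position stay the same
names(df)[names(df) == "costs"] <- "costs_euro"

names(df)
```

Renaming this way keeps the column where it was, whereas copy-and-delete moves the renamed column to the end of the data frame.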
Here is an example of how to recode the variable patients, again starting from the original dataset:
    df$patients <- ifelse(df$patients == 150, 100,
                          ifelse(df$patients == 350, 300, NA))
Let's see the dataset again:
    df
    hospital    patients  costs
    New York         100    3.1
    California       300    2.5
To recode the variable I used the function ifelse(), but you can use other functions as well.
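One such alternative is a named lookup vector, which avoids nesting ifelse() calls when there are many values to recode. The sketch below is only one possible approach; the name recode_map is hypothetical.

```r
# Fresh copy of the example data
df <- data.frame(
  hospital = c("New York", "California"),
  patients = c(150, 350),
  costs    = c(3.1, 2.5)
)

# Lookup vector: the names are the old values (as text),
# the elements are the new values
recode_map <- c("150" = 100, "350" = 300)

# Index the lookup vector by the old values; any value that is
# missing from the map becomes NA, as in the ifelse() version
df$patients <- unname(recode_map[as.character(df$patients)])

df
```

Adding another recoding rule then only requires one more entry in recode_map.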
Merging datasets
Merging datasets means combining different datasets into one. If the datasets are stored in different locations, you first need to import them into R, as we explained previously. You can merge by columns, which adds new variables, or by rows, which adds new observations.
To add columns, use the function merge(), which requires that the datasets you merge share a common variable. If the datasets do not have a common variable, use the function cbind(). However, cbind() joins rows by position, so both datasets must be in the same order.
Here we merge dataset1 and dataset2 by the variable id, which is the same in both datasets. The code below adds new columns:
    finaldt <- merge(dataset1, dataset2, by = "id")
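To make the behaviour of merge() concrete, here is a small sketch with made-up data; the values and the names inner and full are hypothetical. By default merge() keeps only the ids found in both datasets, while all = TRUE keeps every id and fills the gaps with NA.

```r
# Hypothetical example data sharing the key column "id"
dataset1 <- data.frame(id = c(1, 2, 3), patients = c(150, 350, 200))
dataset2 <- data.frame(id = c(3, 1, 4), costs = c(4.0, 3.1, 1.8))

# Default merge: only ids present in BOTH datasets (here 1 and 3)
inner <- merge(dataset1, dataset2, by = "id")

# all = TRUE: keep every id from either dataset, with NA where
# one side has no matching row
full <- merge(dataset1, dataset2, by = "id", all = TRUE)

inner
full
```

Rows are matched on the value of id, so the two datasets do not need to be in the same order.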
Or we can merge the datasets by adding columns when we know that both datasets are correctly ordered:
    finaldt <- cbind(dataset1, dataset2)
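Because cbind() pairs rows purely by position, a quick sanity check before combining can help. The sketch below uses made-up data and assumes the rows of both datasets are already in the same order, which R cannot verify for you when there is no common variable.

```r
# Hypothetical example data with no common variable;
# row i of dataset2 is ASSUMED to describe row i of dataset1
dataset1 <- data.frame(
  hospital = c("New York", "California"),
  patients = c(150, 350)
)
dataset2 <- data.frame(costs = c(3.1, 2.5))

# cbind() needs the same number of rows on both sides
stopifnot(nrow(dataset1) == nrow(dataset2))

# Paste the columns of dataset2 to the right of dataset1
finaldt <- cbind(dataset1, dataset2)

finaldt
```

If the row order of either dataset is in doubt, merging on a shared key with merge() is the safer choice.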
To add rows, use the function rbind(). When you merge datasets by rows, it is important that the datasets have exactly the same variable names and the same number of variables.
Here is an example of merging datasets by adding rows:
    finaldt <- rbind(dataset1, dataset2)
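A small sketch with made-up data shows the requirement in practice; the "Texas" row is hypothetical. For data frames, rbind() matches columns by name, so the column order may differ as long as the names are identical.

```r
# Hypothetical example data with the same variable names
dataset1 <- data.frame(
  hospital = c("New York", "California"),
  patients = c(150, 350)
)
# Same variables, listed in a different order
dataset2 <- data.frame(
  patients = 200,
  hospital = "Texas"
)

# Check that both datasets have exactly the same variable names
stopifnot(setequal(names(dataset1), names(dataset2)))

# Append the rows of dataset2 below those of dataset1
finaldt <- rbind(dataset1, dataset2)

finaldt
```

The result keeps the column order of the first dataset and has one row per observation from both inputs.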
If you have any questions, post a comment below.