This tutorial walks through a complete workflow in R: download macroeconomic data from the St. Louis Fed's FRED database, draw a scatter plot, fit an OLS regression, plot the final chart with the regression line and regression statistics, and save the chart as a PNG file for documentation.
Step 1
Load the necessary packages for this tutorial
# load the necessary packages
library(alfred)
library(tidyverse)
library(Hmisc)
library(broom)
Step 2
Define the start and end dates of the analysis
# set the desired time period of data for analysis
startdate <- "1980-01-01"
enddate <- "2018-04-01"
Step 3
Download the two macroeconomic series from FRED, then combine and clean them into a single data frame. Many other data series can be found on the FRED website.
# get the unemployment rate time series from FRED
dfunrate <- get_fred_series("UNRATE", "unrate",
                            observation_start = startdate,
                            observation_end = enddate)

# get the University of Michigan consumer sentiment index time series from FRED
dfumcsent <- get_fred_series("UMCSENT", "umcsent",
                             observation_start = startdate,
                             observation_end = enddate)

# combine the two time series into one data frame (rows are paired by position)
dfall <- cbind(dfunrate, dfumcsent)

# keep date, unrate and umcsent; drop the duplicate date column (column 3)
dfall <- dfall[, c(1, 2, 4)]

# build a row index vector for the data frame
mdx <- (1:nrow(dfall))

# make sure the date field is stored as R's Date type
dfall$date <- as.Date(dfall$date)
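Because cbind() pairs rows by position, it only gives the right result when both series cover exactly the same months in the same order. The sketch below shows a join by date instead. The two small data frames are hypothetical stand-ins for the FRED downloads, and their values are made up for illustration.

```r
# Hypothetical stand-ins for the two FRED downloads; values are made up
unrate_demo <- data.frame(
  date   = as.Date(c("2018-01-01", "2018-02-01", "2018-03-01")),
  unrate = c(4.1, 4.1, 4.0)
)
umcsent_demo <- data.frame(
  date    = as.Date(c("2018-02-01", "2018-03-01", "2018-04-01")),
  umcsent = c(99.7, 101.4, 98.8)
)

# merge() keeps only the dates present in both series and aligns them by date
merged_demo <- merge(unrate_demo, umcsent_demo, by = "date")
print(merged_demo)
```

Only the two overlapping months survive the join, so a gap in either series cannot silently shift one column against the other.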
Step 4
Perform OLS regression on the macroeconomic dataset
# simple linear regression and output regression statistics into a data frame
dffit <- lm(umcsent ~ unrate, data = dfall)
summary(dffit)
dffitout <- tidy(dffit)

Call:
lm(formula = umcsent ~ unrate, data = dfall)

Residuals:
    Min      1Q  Median      3Q     Max
-33.593  -4.441   0.732   5.889  25.149

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 117.1734     1.7957   65.25   <2e-16 ***
unrate       -4.8537     0.2756  -17.61   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.68 on 458 degrees of freedom
Multiple R-squared:  0.4038, Adjusted R-squared:  0.4025
F-statistic: 310.2 on 1 and 458 DF,  p-value: < 2.2e-16
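To read the coefficients, it can help to plug a value into the fitted line. The sketch below copies the intercept and slope from the summary above and evaluates the line at an unemployment rate of 5%, a value chosen here purely for illustration.

```r
# Coefficients copied from the regression summary above
intercept <- 117.1734
slope     <- -4.8537

# Fitted consumer sentiment at a hypothetical unemployment rate of 5%
unrate_example   <- 5
fitted_sentiment <- intercept + slope * unrate_example
print(fitted_sentiment)  # 92.9049
```

In other words, each additional percentage point of unemployment is associated with a sentiment index about 4.85 points lower over this sample.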
Step 5
Extract regression statistics from the regression model
# obtain OLS fit measures: adjusted R-squared, p-value, coefficients
# and residual standard error
dffit.AdjrSquared <- summary(dffit)$adj.r.squared
dffit.pVal <- dffitout$p.value[2]
dffit.intercept <- dffitout$estimate[1]
dffit.slope <- dffitout$estimate[2]
dffit.rse <- sigma(dffit)
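The same statistics can also be pulled by name rather than by row position, which avoids mistakes if the model later gains more terms. This is a base R sketch that uses the built-in mtcars data as a stand-in for the FRED data frame. The `_demo` names are illustrative only.

```r
# mtcars ships with R; it stands in here for the FRED data frame
fit_demo    <- lm(mpg ~ wt, data = mtcars)
fit_summary <- summary(fit_demo)

# The coefficient table is a matrix indexed by term name and statistic name
coef_table  <- fit_summary$coefficients
slope_demo  <- coef_table["wt", "Estimate"]
pvalue_demo <- coef_table["wt", "Pr(>|t|)"]

# Adjusted R-squared and residual standard error come from the same summary
adj_r2_demo <- fit_summary$adj.r.squared
rse_demo    <- fit_summary$sigma
```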
Step 6
Define the plot’s parameters and labels in one section
# define the plot's default parameters
fredseries <- "UNRATESENTIMENT"
chart.number <- "Figure 1"
chart.title <- paste(chart.number, ". Unemployment Rate vs Consumer Sentiment Index",
                     sep = "", collapse = NULL)
chart.subtitle <- "with OLS Regression"
chart.caption <- "Source: FRED St. Louis. U.S. Bureau of Labor Statistics. University of Michigan."
chart.xlabel <- "Unemployment Rate (%)"
chart.ylabel <- "University of Michigan Consumer Sentiment Index"
chart.filename <- paste(chart.number, " ", fredseries, ".png",
                        sep = "", collapse = NULL)
Step 7
Plot the scatter plot and OLS regression line using ggplot
# plot the xy scatter plot and OLS regression line
dfplt <- ggplot(dfall, aes(x = unrate, y = umcsent)) +
  geom_point(fill = NA, shape = 1) +
  labs(x = chart.xlabel,
       y = chart.ylabel,
       title = chart.title,
       subtitle = chart.subtitle,
       caption = chart.caption) +
  geom_smooth(method = 'lm')
Step 8
Define the x-y coordinates for text annotation to enhance readability
# define the x-y coordinates for text annotations
xpos1 <- max(dfall$unrate) * 0.90
xpos2 <- xpos1
xpos3 <- xpos1
xpos4 <- xpos1
ypos1 <- max(dfall$umcsent) * 0.94
ypos2 <- max(dfall$umcsent) * 0.97
ypos3 <- max(dfall$umcsent) * 1.00
ypos4 <- max(dfall$umcsent) * 0.91
Step 9
Annotate with OLS model specifications and plot the final complete chart
# add p-value to the chart
dfplt <- dfplt +
  annotate(geom = "text", x = xpos1, y = ypos1,
           label = paste("p-value = ", as.character(format(dffit.pVal, digits = 4))),
           color = "blue")

# add adjusted R-squared to the chart
dfplt <- dfplt +
  annotate(geom = "text", x = xpos2, y = ypos2,
           label = paste("Adj. R-squared = ", as.character(format(dffit.AdjrSquared, digits = 4))),
           color = "blue")

# add OLS equation coefficients to the chart
dfplt <- dfplt +
  annotate(geom = "text", x = xpos3, y = ypos3,
           label = paste("Intercept= ", as.character(format(dffit.intercept, digits = 6)),
                         " Slope= ", as.character(format(dffit.slope, digits = 4))),
           color = "blue")

# add residual standard error to the chart
dfplt <- dfplt +
  annotate(geom = "text", x = xpos4, y = ypos4,
           label = paste("RSE = ", as.character(format(dffit.rse, digits = 5))),
           color = "blue")

# output the final completely composed chart to the console plot area
dfplt
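As an alternative to several separate annotations, the statistics can be collected into one multi-line label. The sketch below only builds the label string with sprintf(), using values copied from the regression summary in Step 4. The variable names are illustrative and not part of the tutorial's code.

```r
# Values copied from the regression summary shown in Step 4
adj_r2    <- 0.4025
rse       <- 9.68
intercept <- 117.1734
slope     <- -4.8537

# Build one multi-line label; "\n" starts a new line inside the text
stats_label <- sprintf(
  "Intercept = %.2f, Slope = %.2f\nAdj. R-squared = %.4f\nRSE = %.2f",
  intercept, slope, adj_r2, rse
)
cat(stats_label, "\n")
```

A string like this could then be passed as the `label` of a single annotate() call, so only one x-y position has to be chosen.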
Step 10
Save the final plot as a PNG file with a size of 800×600 pixels
# save the plot into a graphics file with a size defined at 800 x 600 pixels
png(filename = chart.filename, width = 800, height = 600)
# print() is needed so the plot is also drawn when the script is run via source()
print(dfplt)
dev.off()
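ggplot2 also provides ggsave(), which opens and closes the graphics device itself. The sketch below is one possible way to get the same pixel size, assuming ggplot2 is installed. It uses a small stand-in plot built from mtcars and writes to a temporary file, so `demo_plot` and `demo_file` are illustrative names only.

```r
library(ggplot2)

# Small stand-in plot built from mtcars, which ships with R
demo_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(shape = 1) +
  geom_smooth(method = "lm", formula = y ~ x)

# 8 in x 6 in at 100 dpi gives an 800 x 600 pixel image
demo_file <- file.path(tempdir(), "demo_plot.png")
ggsave(demo_file, plot = demo_plot, width = 8, height = 6, dpi = 100)
```

Text and point sizes may render differently than with png(), since ggsave() sizes the image in inches and scales it by the dpi.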
That is it. This 10-step tutorial gives researchers new to R an introduction to downloading data from the St. Louis Fed's FRED database, running an OLS regression, and plotting the detailed results together on one chart.