DataScience+ An online community for showcasing R & Python tutorials. It operates as a networking platform for data scientists to promote their talent and get hired. Our mission is to empower data scientists by bridging the gap between talent and opportunity.
Data Management

Working on Data-Warehouse (SQL) with R

  • Published on July 13, 2016 at 1:23 pm
  • Updated on April 28, 2017 at 6:26 pm

SQL being the most popular used query language for deep diving from small data to so called big data. However, there are many other languages like R which are growing in the user community for stats and graphical methodology models and plot data to match the data scientists mind.

R as an Analytic Application

R can be used for Data Warehousing and BI applications but R is not commonly being used as a platform for Data Warehouse; instead it works as a mediator for performing focused analysis with database managements.
Experts say that R handles everything in memory, which implies that it might be difficult to use it for expansive volumes of data.

R offers a large number of interfaces to apps such as Excel, SAS etc. which enable users to have a rich experience of the app integrated with the interface. Citing an experts comment on this – “Typically, I see people running their data warehouses in other packages and then using R to perform high-quality, focused analysis”.

R has compatibility with many packages like PL/R, Dplyr, plyr etc. Below we will see how Dplyr works.

Dplyr – what is Dplyr? (Just for the crazy minds)

Dplyr is a package which provides tools for manipulating datasets in R. Dplyr is the next iteration of plyr. Dplyr differs greatly from plyr for its easy, intuitive syntax. The major difference in Dplyr is the usage of the operator (%.%), that allows the code to be read in the opposite direction (left to right). If your usage is more towards explorations or if you want to collaborate your data with other data structures you could choose Dplyr.

Trying data with dplyr – Github


For examples have the data packages installed

install.packages(c("nycflights13", "Lahman"))

To learn dyplr please refer to link

vignette("introduction", package = "dplyr")

— Source github

Working on R with PostgreSQL

In terms of Postgresql it’s commonly called PL/R or madlib as a language for many databases.

To install packages (postgres SQL) for R

 package ‘RPostgreSQL’ successfully unpacked and MD5 sums checked 

Connect to DB (DW):

drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname="postgres", host="", user="gpadmin",password="gpadmin",port="5432")

Experimenting with Sample retrieval

data <- dbGetQuery(con, "select datid,datname from pg_catalog.pg_stat_activity;")
[1] 94888776

Clearing the result

Function: dbClearResult (package DBI)

What is the Future of R?

When it comes to general data warehousing and BI tasks R as a solution makes it through many companies. Organizations prefer to take R as it’s easy to add existing IT environments or to add them on the current codes which can be used for special analytics in intensive projects. R lacks in state-of-the-art GUI. Hence, users prefer to use more interactive and easy-to-use front ends for R packages.

R has growth potential with more number of packages that help statisticians and analysts use these packages in more interactive ways hence enabling R to answer their research and business goals.