The first step in any data analysis is gathering data from the available sources, whether primary or secondary. Data comes in many formats, from flat files such as .txt and .csv to spreadsheet formats such as Excel. These files may be stored locally on your system, for example in your working directory. Packages such as utils (part of base R), readr, data.table, and XLConnect provide functions for reading such locally saved files.

Sometimes, however, the files are stored on a remote server, and the data is no longer in a familiar file format such as .txt, .csv, or .xlsx. On the Web, data is most commonly served as JSON, XML, or HTML. This is where accessing Web data in R comes into the picture.

We refer to such data as Web data, and the exposed path used to reach it, which is simply a URL, is referred to here as an API (more precisely, an API endpoint). When we want to access and work on Web data in RStudio, we invoke (consume) the corresponding API using an HTTP client in R.

HTTP: Hypertext Transfer Protocol (HTTP) is designed to enable communication between clients and servers. HTTP defines several methods for consuming an API; the two used in this article are GET, which retrieves data from a server, and POST, which sends data to a server.

Assume this is our base URL (API):

https://reqres.in/api/users

URLs can be classified by how they pass data to the API. The first type is the directory-based URL, in which the values are separated by “/”. The path looks very similar to a file path on our local system.

https://reqres.in/api/users/pageid/1

Here pageid is the key and 1 is its value, both passed as segments of the path. An API designed this way would fetch all rows from the users table where pageid is 1.
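As a minimal sketch, a directory-based URL can be assembled in base R by joining path segments with "/". The segment names reuse the illustrative pageid example above, the helper name build_path_url is hypothetical, and whether a server accepts this layout depends entirely on how its API is designed.

```r
# Hypothetical helper: join a base URL and path segments with "/".
build_path_url <- function(base_url, ...) {
  # Convert every segment to text and percent-encode reserved characters.
  segments <- vapply(as.character(c(...)), utils::URLencode,
                     character(1), reserved = TRUE, USE.NAMES = FALSE)
  paste(c(base_url, segments), collapse = "/")
}

url <- build_path_url("https://reqres.in/api/users", "pageid", 1)
print(url)
```

Encoding each segment keeps characters such as spaces or "/" inside a value from being mistaken for part of the path structure.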

The second type is the parameter-based URL. The query string begins after “?” and contains key-value pairs separated by “&”.

https://reqres.in/api/users?pageid=1&userid=5

Here pageid and userid are the keys, and 1 and 5 are their respective values.
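To make the key-value structure concrete, the sketch below uses httr's parse_url() to split the example URL into its query parameters and modify_url() to build the same URL from a named list. It runs without network access; the variable names are chosen only for illustration.

```r
library(httr)

url <- "https://reqres.in/api/users?pageid=1&userid=5"

# parse_url() splits a URL into components; $query holds the
# key-value pairs as a named list of strings.
parts <- parse_url(url)
str(parts$query)

# modify_url() goes the other way: it appends a named list as a query string.
rebuilt <- modify_url("https://reqres.in/api/users",
                      query = list(pageid = "1", userid = "5"))
print(rebuilt)
```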

To consume such APIs in R, we rely mainly on the httr package, supported by the other packages loaded below.

Load all required libraries:

# httr is required for accessing APIs (HTTP or HTTPS URLs on the Web)
library(httr)
# rlist provides functions for filtering lists and stacking them into a data frame
library(rlist)
# jsonlite provides functions to convert JSON text to R objects such as data frames
library(jsonlite)
# dplyr is used to manipulate data
library(dplyr)
resp<-GET("https://reqres.in/api/users?page=2")
# Once we have the response from the API, we use two very basic functions of httr.
http_type(resp)  # Reports the content type of the response returned by the GET() call
## [1] "application/json"
http_error(resp) # Returns TRUE if the response has an HTTP error status (400 or above)
## [1] FALSE
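The two checks above can be combined into a small guard that stops early when a response is unusable. This is only a sketch: the helper name check_json_response and its error messages are hypothetical, while http_error(), http_type(), and status_code() are httr functions.

```r
library(httr)

# Hypothetical helper: stop unless the response is error free and JSON.
check_json_response <- function(resp) {
  if (http_error(resp)) {
    stop("Request failed with HTTP status ", status_code(resp), call. = FALSE)
  }
  if (!identical(http_type(resp), "application/json")) {
    stop("Expected JSON but received ", http_type(resp), call. = FALSE)
  }
  invisible(resp)  # return the response so the call can be chained
}

# Example use (requires network access):
# resp <- GET("https://reqres.in/api/users", query = list(page = "2"))
# check_json_response(resp)
```

Failing fast here avoids confusing parse errors later, for instance when a server answers with an HTML error page instead of JSON.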

As we can see, the API is parameter based and expects a query parameter. In the first call we wrote the query parameter directly into the URL. Now we supply it separately, as a named list passed to the query argument of GET().

query<-list(page="2")
resp<-GET("https://reqres.in/api/users",query=query)
http_type(resp)
## [1] "application/json"
http_error(resp)
## [1] FALSE
# Raw JSON text, which is neither structured nor easy to read
jsonRespText<-content(resp,as="text")
jsonRespText
## [1] "{\"page\":2,\"per_page\":3,\"total\":12,\"total_pages\":4,\"data\":[{\"id\":4,\"first_name\":\"Eve\",\"last_name\":\"Holt\",\"avatar\":\"https://s3.amazonaws.com/uifaces/faces/twitter/marcoramires/128.jpg\"},{\"id\":5,\"first_name\":\"Charles\",\"last_name\":\"Morris\",\"avatar\":\"https://s3.amazonaws.com/uifaces/faces/twitter/stephenmoon/128.jpg\"},{\"id\":6,\"first_name\":\"Tracey\",\"last_name\":\"Ramos\",\"avatar\":\"https://s3.amazonaws.com/uifaces/faces/twitter/bigmancho/128.jpg\"}]}"
# Structured data in the form of nested R lists and vectors
jsonRespParsed<-content(resp,as="parsed") 
jsonRespParsed
## $page
## [1] 2
## 
## $per_page
## [1] 3
## 
## $total
## [1] 12
## 
## $total_pages
## [1] 4
## 
## $data
## $data[[1]]
## $data[[1]]$id
## [1] 4
## 
## $data[[1]]$first_name
## [1] "Eve"
## 
## $data[[1]]$last_name
## [1] "Holt"
## 
## $data[[1]]$avatar
## [1] "https://s3.amazonaws.com/uifaces/faces/twitter/marcoramires/128.jpg"
## 
## 
## $data[[2]]
## $data[[2]]$id
## [1] 5
## 
## $data[[2]]$first_name
## [1] "Charles"
## 
## $data[[2]]$last_name
## [1] "Morris"
## 
## $data[[2]]$avatar
## [1] "https://s3.amazonaws.com/uifaces/faces/twitter/stephenmoon/128.jpg"
## 
## 
## $data[[3]]
## $data[[3]]$id
## [1] 6
## 
## $data[[3]]$first_name
## [1] "Tracey"
## 
## $data[[3]]$last_name
## [1] "Ramos"
## 
## $data[[3]]$avatar
## [1] "https://s3.amazonaws.com/uifaces/faces/twitter/bigmancho/128.jpg"
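Individual values can be pulled out of a parsed response with ordinary list indexing. The sketch below works offline on a shortened copy of the JSON text shown above (avatar fields omitted) and uses jsonlite's fromJSON() with simplifyVector = FALSE as a stand-in for content(resp, as = "parsed"); the variable names are illustrative.

```r
library(jsonlite)

# Shortened copy of the response text shown above (avatar fields omitted).
json_text <- '{"page":2,"per_page":3,"total":12,"total_pages":4,
  "data":[{"id":4,"first_name":"Eve","last_name":"Holt"},
          {"id":5,"first_name":"Charles","last_name":"Morris"},
          {"id":6,"first_name":"Tracey","last_name":"Ramos"}]}'

# simplifyVector = FALSE keeps everything as nested lists.
parsed <- fromJSON(json_text, simplifyVector = FALSE)

parsed$total_pages            # a single top-level value
parsed$data[[1]]$first_name   # one field of the first record

# Pull one field out of every record.
first_names <- vapply(parsed$data, function(user) user$first_name, character(1))
print(first_names)
```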

Convert the JSON response, which is in text format, to R objects using the jsonlite package. fromJSON() returns a list in which the data element is simplified to a data frame.

fromJSON(jsonRespText)
## $page
## [1] 2
## 
## $per_page
## [1] 3
## 
## $total
## [1] 12
## 
## $total_pages
## [1] 4
## 
## $data
##   id first_name last_name
## 1  4        Eve      Holt
## 2  5    Charles    Morris
## 3  6     Tracey     Ramos
##                                                                avatar
## 1 https://s3.amazonaws.com/uifaces/faces/twitter/marcoramires/128.jpg
## 2  https://s3.amazonaws.com/uifaces/faces/twitter/stephenmoon/128.jpg
## 3    https://s3.amazonaws.com/uifaces/faces/twitter/bigmancho/128.jpg
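Because fromJSON() already simplifies the data element into a data frame, it can be extracted and filtered with ordinary data frame indexing. The following sketch again uses a shortened, offline copy of the response text (avatar fields omitted); the variable names are illustrative.

```r
library(jsonlite)

json_text <- '{"page":2,"per_page":3,"total":12,"total_pages":4,
  "data":[{"id":4,"first_name":"Eve","last_name":"Holt"},
          {"id":5,"first_name":"Charles","last_name":"Morris"},
          {"id":6,"first_name":"Tracey","last_name":"Ramos"}]}'

result <- fromJSON(json_text)

# The data element is a data frame with one row per user.
users <- result$data
print(class(users))

# Keep two columns for the users whose id is greater than 4.
later_users <- users[users$id > 4, c("id", "first_name")]
print(later_users)
```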

We can also extract the required columns from the parsed JSON response and create our own data frame using dplyr together with base R list indexing.

modJson<-jsonRespParsed$data # Access the data element of the list and ignore the other elements
modJson
## [[1]]
## [[1]]$id
## [1] 4
## 
## [[1]]$first_name
## [1] "Eve"
## 
## [[1]]$last_name
## [1] "Holt"
## 
## [[1]]$avatar
## [1] "https://s3.amazonaws.com/uifaces/faces/twitter/marcoramires/128.jpg"
## 
## 
## [[2]]
## [[2]]$id
## [1] 5
## 
## [[2]]$first_name
## [1] "Charles"
## 
## [[2]]$last_name
## [1] "Morris"
## 
## [[2]]$avatar
## [1] "https://s3.amazonaws.com/uifaces/faces/twitter/stephenmoon/128.jpg"
## 
## 
## [[3]]
## [[3]]$id
## [1] 6
## 
## [[3]]$first_name
## [1] "Tracey"
## 
## [[3]]$last_name
## [1] "Ramos"
## 
## [[3]]$avatar
## [1] "https://s3.amazonaws.com/uifaces/faces/twitter/bigmancho/128.jpg"
# Using dplyr on the list extracted above: bind the records into one tibble, then select columns
modJson %>% bind_rows() %>% select(id, first_name, last_name, avatar)
## # A tibble: 3 x 4
##      id first_name last_name avatar                                       
##   <int> <chr>      <chr>     <chr>                                        
## 1     4 Eve        Holt      https://s3.amazonaws.com/uifaces/faces/twitt~
## 2     5 Charles    Morris    https://s3.amazonaws.com/uifaces/faces/twitt~
## 3     6 Tracey     Ramos     https://s3.amazonaws.com/uifaces/faces/twitt~
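For comparison, the same conversion can be done without dplyr. The sketch below is one base R alternative rather than the approach used above: it turns each record into a one-row data frame and binds the rows together. The records list mimics the shape of modJson, with avatar fields omitted for brevity.

```r
# Records shaped like modJson (avatar fields omitted).
records <- list(
  list(id = 4L, first_name = "Eve",     last_name = "Holt"),
  list(id = 5L, first_name = "Charles", last_name = "Morris"),
  list(id = 6L, first_name = "Tracey",  last_name = "Ramos")
)

# Turn each record into a one-row data frame, then bind the rows.
rows  <- lapply(records, as.data.frame, stringsAsFactors = FALSE)
users <- do.call(rbind, rows)
print(users)
```

This works when every record has the same fields; records with missing fields would need to be padded first, which is one reason helpers such as bind_rows are convenient.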

Using the rlist package. Since our data is now in the form of a list, we use list.select to filter fields and list.stack to create a data frame.

list.select(modJson,id,first_name)
## [[1]]
## [[1]]$id
## [1] 4
## 
## [[1]]$first_name
## [1] "Eve"
## 
## 
## [[2]]
## [[2]]$id
## [1] 5
## 
## [[2]]$first_name
## [1] "Charles"
## 
## 
## [[3]]
## [[3]]$id
## [1] 6
## 
## [[3]]$first_name
## [1] "Tracey"
list.stack(modJson)
##   id first_name last_name
## 1  4        Eve      Holt
## 2  5    Charles    Morris
## 3  6     Tracey     Ramos
##                                                                avatar
## 1 https://s3.amazonaws.com/uifaces/faces/twitter/marcoramires/128.jpg
## 2  https://s3.amazonaws.com/uifaces/faces/twitter/stephenmoon/128.jpg
## 3    https://s3.amazonaws.com/uifaces/faces/twitter/bigmancho/128.jpg
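The two rlist calls can also be chained so that only the selected fields end up in the data frame. This is a small sketch on a list shaped like modJson (avatar fields omitted); the variable names are illustrative.

```r
library(rlist)

# Records shaped like modJson (avatar fields omitted).
records <- list(
  list(id = 4L, first_name = "Eve",     last_name = "Holt"),
  list(id = 5L, first_name = "Charles", last_name = "Morris"),
  list(id = 6L, first_name = "Tracey",  last_name = "Ramos")
)

# Select two fields from every record, then stack the result into a data frame.
selected <- list.stack(list.select(records, id, first_name))
print(selected)
```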

The results obtained from the dplyr and rlist approaches are very similar: dplyr returns a tibble, while list.stack returns a plain data frame.

GET only retrieves data. To send data to a server, we use POST():

post_result <- POST(url="http://httpbin.org/post",body="this is a test") # the body argument accepts the data we wish to send to the server
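POST bodies are often sent as JSON rather than plain text. The sketch below shows one way to do that with httr by passing a named list and encode = "json". The payload fields and the wrapper name post_json are hypothetical, and the wrapper is only defined here, not called, because calling it needs network access.

```r
library(httr)
library(jsonlite)

# Hypothetical payload; the field names are placeholders.
payload <- list(name = "example user", job = "analyst")

# What the JSON body looks like once serialised.
body_json <- toJSON(payload, auto_unbox = TRUE)
print(body_json)

# Hypothetical wrapper: send the payload as JSON and return the parsed reply.
post_json <- function(url, payload) {
  resp <- POST(url, body = payload, encode = "json")
  stop_for_status(resp)  # raise an R error on HTTP status 400 or above
  content(resp, as = "parsed")
}

# Example use (requires network access):
# post_json("http://httpbin.org/post", payload)
```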

Note: All APIs used in the examples above are open APIs.
