# Visualizing MLS Player Salaries with ggplot2

Recently, I came across this great visualization of MLS player salaries. I tried to do something similar with ggplot2, and while I was unable to replicate the interactivity or the tree-map layout, the result still looks pretty cool.

## Data

The data is contained in this PDF file. I extracted it to a CSV file using PDFtables.com. The data can be downloaded here.

## Exploratory Analysis

We will need the plyr and ggplot2 libraries for this. Let’s load them up and read in the data. To learn more about ggplot2, read my previous tutorial.

```r
library(plyr)
library(ggplot2)

# Read the extracted CSV; empty strings are treated as missing values
salary <- read.csv('September 15 2015 Salary Information - Alphabetical.csv', na.strings = '')
head(salary)
```

```
1   NY        Abang    Anatole   F $   50,000.00   $    50,000.00
2   KC Abdul-Salaam       Saad   D $   60,000.00   $    73,750.00
3  CHI        Accam      David   F $  650,000.00   $   720,937.50
4  DAL       Acosta     Kellyn   M $   60,000.00   $    84,000.00
5  VAN     Adekugbe     Samuel   D $   60,000.00   $    65,000.00
6  POR          Adi    Fanendo   F $  651,500.00   $   664,000.00
```
The X and X.1 columns contain nothing but the $ sign, so we can remove them. Also, the base salary is stored as a factor. To convert it to numeric, we first have to remove the commas, which we can do with the gsub function. However, we cannot convert directly from factor to numeric: R stores a factor as integer codes that index its levels, so a direct conversion would just return those codes. The way to convert without losing information is to first convert to character and then to numeric.

salary$X <- NULL
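As an aside, the factor pitfall described above can be seen on a small standalone example. This is only a sketch: the vector `base` and its three values are made up for illustration and are not read from the salary file.

```r
# Hypothetical toy values standing in for a salary column (not the real data)
base <- factor(c("50,000.00", "650,000.00", "60,000.00"))

# Direct conversion returns the internal level codes, not the amounts
codes <- as.numeric(base)
print(codes)

# Remove the commas, go through character, then convert to numeric
clean <- as.numeric(gsub(",", "", as.character(base)))
print(clean)
```

The first `print` shows small integers (the positions of each value among the factor's levels), while the second shows the actual amounts. Note that since R 4.0, `read.csv` defaults to `stringsAsFactors = FALSE`, so on a recent R version such a column would likely be read as character and only the comma removal and `as.numeric` step would be needed.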