News headlines text analysis

In the present tutorial, I show an introductory text analysis of a ABC-news news headlines dataset. I will have a look to the most common words therein present and run a sentiment analysis on those headlines by taking advantage of the following sentiment lexicons:

NRC

Bing

AFINN

The NRC sentiment lexicon from Saif Mohammad and Peter Turney categorizes words into categories of positive, negative, anger, anticipation, disgust, fear joy, sadness, surprise and trust.

The Bing sentiment lexicon from Bing Liu and others categorizes words into positive or negative sentiment category.

The AFINN sentiment lexicon from Finn Arup Nielsen assigns words with a score from -5 to 5, with negative scores indicating negative sentiment and positive scores indicating positive sentiment.

For more information about those sentiment lexicons, see references listed out at the bottom.

Packages

I am going to take advantage of the following R packages.

suppressPackageStartupMessages(library(stringr))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(tidytext))
suppressPackageStartupMessages(library(tidyr))
suppressPackageStartupMessages(library(textdata))
suppressPackageStartupMessages(library(widyr))
suppressPackageStartupMessages(library(ggplot2))
Copy

Packages versions are herein listed.

packages <- c("stringr", "dplyr", "tidytext", "tidyr", "textdata", "widyr", "ggplot2")
version <- lapply(packages, packageVersion)
version_c <- do.call(c, version)
data.frame(packages=packages, version = as.character(version_c))
##   packages version
## 1  stringr   1.4.0
## 2    dplyr   0.8.4
## 3 tidytext   0.2.2
## 4    tidyr   1.0.2
## 5 textdata   0.3.0
## 6    widyr   0.1.2
## 7  ggplot2   3.2.1
Copy

Running on Windows-10 the following R language version.

R.version
##                _                           
## platform       x86_64-w64-mingw32          
## arch           x86_64                      
## os             mingw32                     
## system         x86_64, mingw32             
## status                                     
## major          3                           
## minor          5.3                         
## year           2019                        
## month          03                          
## day            11                          
## svn rev        76217                       
## language       R                           
## version.string R version 3.5.3 (2019-03-11)
## nickname       Great Truth
Copy

Note

Before running this code, make sure to have downloaded the lexicon of the sentiments baselines in use by executing the following operations:

get_sentiments("nrc")
get_sentiments("bing")
get_sentiments("afinn")
Copy

and accepting all prescriptions as asked by the interactive menu showing up.

Getting Data

I then download our news dataset containing millions of headlines from:

“https://www.kaggle.com/therohk/million-headlines/downloads/million-headlines.zip/7”

Its uncompression produces the abcnews-date-text.csv file. I load it into the news_data dataset and have a look at.

news_data <- read.csv("abcnews-date-text.csv", header = TRUE, stringsAsFactors = FALSE)
dim(news_data)
## [1] 1103663       2
Copy

head(news_data)
##   publish_date                                      headline_text
## 1     20030219 aba decides against community broadcasting licence
## 2     20030219     act fire witnesses must be aware of defamation
## 3     20030219     a g calls for infrastructure protection summit
## 4     20030219           air nz staff in aust strike for pay rise
## 5     20030219      air nz strike to affect australian travellers
## 6     20030219                  ambitious olsson wins triple jump
Copy

tail(news_data)
##         publish_date                                               headline_text
## 1103658     20171231             stunning images from the sydney to hobart yacht
## 1103659     20171231 the ashes smiths warners near miss liven up boxing day test
## 1103660     20171231                     timelapse: brisbanes new year fireworks
## 1103661     20171231                    what 2017 meant to the kids of australia
## 1103662     20171231            what the papodopoulos meeting may mean for ausus
## 1103663     20171231   who is george papadopoulos the former trump campaign aide
Copy

Token Analysis

It is time to extract the tokens from our dataset. Select the column named as headline_text and unnesting the word tokens determine the following.

news_df <- news_data %>% select(headline_text)
news_tokens <- news_df %>% unnest_tokens(word, headline_text)
head(news_tokens, 10)
##             word
## 1            aba
## 1.1      decides
## 1.2      against
## 1.3    community
## 1.4 broadcasting
## 1.5      licence
## 2            act
## 2.1         fire
## 2.2    witnesses
## 2.3         must
Copy

tail(news_tokens, 10)
##                   word
## 1103662.7        ausus
## 1103663            who
## 1103663.1           is
## 1103663.2       george
## 1103663.3 papadopoulos
## 1103663.4          the
## 1103663.5       former
## 1103663.6        trump
## 1103663.7     campaign
## 1103663.8         aide
Copy

It is interesting to generate and inspect a table reporting how many times each token shows up within the headlines and its proportion with respect the total.

news_tokens_count <- news_tokens %>% count(word, sort = TRUE) %>% mutate(proportion = n / sum(n))
Copy

The top-10 words which appear most.

head(news_tokens_count, 10)
## # A tibble: 10 x 3
##    word        n proportion
##    <chr>   <int>      <dbl>
##  1 to     214201    0.0303 
##  2 in     135981    0.0192 
##  3 for    130239    0.0184 
##  4 of      80759    0.0114 
##  5 on      73037    0.0103 
##  6 over    50306    0.00711
##  7 the     49810    0.00704
##  8 police  35984    0.00509
##  9 at      31723    0.00449
## 10 with    29676    0.00420
Copy

And the ones which appear less frequently:

tail(news_tokens_count, 10)
## # A tibble: 10 x 3
##    word           n  proportion
##    <chr>      <int>       <dbl>
##  1 zweli          1 0.000000141
##  2 zwitkowsky     1 0.000000141
##  3 zydelig        1 0.000000141
##  4 zygar          1 0.000000141
##  5 zygiefs        1 0.000000141
##  6 zylvester      1 0.000000141
##  7 zynga          1 0.000000141
##  8 zyngier        1 0.000000141
##  9 zz             1 0.000000141
## 10 zzz            1 0.000000141
Copy

There is an issue in having doing that way. The issue is that there are words which do not have relevant role in easing the sentiment analysis, the so called stop words. Herein below the stop words wihin our dataset are shown.

data(stop_words)
head(stop_words, 10)
## # A tibble: 10 x 2
##    word        lexicon
##    <chr>       <chr>  
##  1 a           SMART  
##  2 a's         SMART  
##  3 able        SMART  
##  4 about       SMART  
##  5 above       SMART  
##  6 according   SMART  
##  7 accordingly SMART  
##  8 across      SMART  
##  9 actually    SMART  
## 10 after       SMART
Copy

To remove stop words as required, we take advantage of the anti_join operation.

news_tokens_no_sp <- news_tokens %>% anti_join(stop_words)
head(news_tokens_no_sp, 10)
##            word
## 1           aba
## 2       decides
## 3     community
## 4  broadcasting
## 5       licence
## 6           act
## 7          fire
## 8     witnesses
## 9         aware
## 10   defamation
Copy

Then, counting news tokens again after having removed the stop words.

news_tokens_count <- news_tokens_no_sp %>% count(word, sort = TRUE) %>% mutate(proportion = n / sum(n))
head(news_tokens_count, 10)
## # A tibble: 10 x 3
##    word          n proportion
##    <chr>     <int>      <dbl>
##  1 police    35984    0.00673
##  2 govt      16923    0.00317
##  3 court     16380    0.00306
##  4 council   16343    0.00306
##  5 interview 15025    0.00281
##  6 fire      13910    0.00260
##  7 nsw       12912    0.00242
##  8 australia 12353    0.00231
##  9 plan      12307    0.00230
## 10 water     11874    0.00222
Copy

tail(news_tokens_count)
## # A tibble: 6 x 3
##   word          n  proportion
##   <chr>     <int>       <dbl>
## 1 zygiefs       1 0.000000187
## 2 zylvester     1 0.000000187
## 3 zynga         1 0.000000187
## 4 zyngier       1 0.000000187
## 5 zz            1 0.000000187
## 6 zzz           1 0.000000187
Copy

Then, I filter out tokens having more than 8,000 counts.

news_token_over8000 <- news_tokens_count %>% filter(n > 8000) %>% mutate(word = reorder(word, n))
nrow(news_token_over8000)
## [1] 32
Copy

head(news_token_over8000, 10) 
## # A tibble: 10 x 3
##    word          n proportion
##    <fct>     <int>      <dbl>
##  1 police    35984    0.00673
##  2 govt      16923    0.00317
##  3 court     16380    0.00306
##  4 council   16343    0.00306
##  5 interview 15025    0.00281
##  6 fire      13910    0.00260
##  7 nsw       12912    0.00242
##  8 australia 12353    0.00231
##  9 plan      12307    0.00230
## 10 water     11874    0.00222
Copy

tail(news_token_over8000, 10) 
## # A tibble: 10 x 3
##    word         n proportion
##    <fct>    <int>      <dbl>
##  1 day       8818    0.00165
##  2 hospital  8815    0.00165
##  3 car       8690    0.00163
##  4 coast     8411    0.00157
##  5 calls     8401    0.00157
##  6 win       8315    0.00156
##  7 woman     8213    0.00154
##  8 killed    8129    0.00152
##  9 accused   8094    0.00151
## 10 world     8087    0.00151
Copy

It is interesting to show the proportion as per-thousands by means of an histogram plot.

news_token_over8000 %>%  
  ggplot(aes(word, proportion*1000, fill=ceiling(proportion*1000))) +
  geom_col() + xlab(NULL) + coord_flip() + theme(legend.position = "none")
Copy

News Sentiment Analysis

In this paragraph, I focus on each single headline to evaluate its specific sentiment as determined by each lexicon. Hence the output shall determine if each specific headline has got positive or negative sentiment.

head(news_df, 10)
##                                         headline_text
## 1  aba decides against community broadcasting licence
## 2      act fire witnesses must be aware of defamation
## 3      a g calls for infrastructure protection summit
## 4            air nz staff in aust strike for pay rise
## 5       air nz strike to affect australian travellers
## 6                   ambitious olsson wins triple jump
## 7          antic delighted with record breaking barca
## 8   aussie qualifier stosur wastes four memphis match
## 9        aust addresses un security council over iraq
## 10         australia is locked into war timetable opp
Copy

I will analyse only the first 1000 headlines just for computational time reasons. The token list of such is as follows.

news_df_subset <- news_df[1:1000,,drop=FALSE]
tkn_l <- apply(news_df_subset, 1, function(x) { data.frame(headline_text=x, stringsAsFactors = FALSE) %>% unnest_tokens(word, headline_text)})
Copy

Removing the stop words from the token list.

single_news_tokens <- lapply(tkn_l, function(x) {anti_join(x, stop_words)})
Copy

str(single_news_tokens, list.len = 5)
## List of 1000
##  $ 1   :'data.frame':    5 obs. of  1 variable:
##   ..$ word: chr [1:5] "aba" "decides" "community" "broadcasting" ...
##  $ 2   :'data.frame':    5 obs. of  1 variable:
##   ..$ word: chr [1:5] "act" "fire" "witnesses" "aware" ...
##  $ 3   :'data.frame':    4 obs. of  1 variable:
##   ..$ word: chr [1:4] "calls" "infrastructure" "protection" "summit"
##  $ 4   :'data.frame':    7 obs. of  1 variable:
##   ..$ word: chr [1:7] "air" "nz" "staff" "aust" ...
##  $ 5   :'data.frame':    6 obs. of  1 variable:
##   ..$ word: chr [1:6] "air" "nz" "strike" "affect" ...
##   [list output truncated]
Copy

As we can see, to each headline is associated a list of tokens. The sentiment of a headline is computed as based on the sum of positive/negative score of each token of.

single_news_tokens[[1]]
##           word
## 1          aba
## 2      decides
## 3    community
## 4 broadcasting
## 5      licence
Copy

Bing lexicon

In this paragraph, the computation of the sentiment associated to the tokens list is shown for Bing lexicon. I first define a function named as compute_sentiment() whose purpose is to output the positiveness score of a specific headline.

compute_sentiment <- function(d) {
  if (nrow(d) == 0) {
    return(NA)
  }
  neg_score <- d %>% filter(sentiment=="negative") %>% nrow()
  pos_score <- d %>% filter(sentiment=="positive") %>% nrow()
  pos_score - neg_score
} 
Copy

The inner join on bing lexicon of each single headline tokens list is given as input to the compute_sentiment() function to determine the sentiment score of each specific headline.

sentiments_bing <- get_sentiments("bing")
str(sentiments_bing)
## Classes 'tbl_df', 'tbl' and 'data.frame':    6786 obs. of  2 variables:
##  $ word     : chr  "2-faces" "abnormal" "abolish" "abominable" ...
##  $ sentiment: chr  "negative" "negative" "negative" "negative" ...
Copy

single_news_sentiment_bing <- sapply(single_news_tokens, function(x) { x %>% inner_join(sentiments_bing) %>% compute_sentiment()})
Copy

The result is a vector of integers each element value at i-th position is the sentiment associated to the i-th news

str(single_news_sentiment_bing)
##  Named int [1:1000] NA -1 1 -1 -1 2 0 NA NA NA ...
##  - attr(*, "names")= chr [1:1000] "1" "2" "3" "4" ...
Copy

Here is the summary, please note that:

the median is negative
NA's show up

summary(single_news_sentiment_bing)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  -3.000  -1.000  -1.000  -0.475   1.000   2.000     520
Copy

Collecting the resulting in a data frame as follows.

single_news_sentiment_bing_df <- data.frame(headline_text=news_df_subset$headline_text, score = single_news_sentiment_bing)
head(single_news_sentiment_bing_df, 10)
##                                         headline_text score
## 1  aba decides against community broadcasting licence    NA
## 2      act fire witnesses must be aware of defamation    -1
## 3      a g calls for infrastructure protection summit     1
## 4            air nz staff in aust strike for pay rise    -1
## 5       air nz strike to affect australian travellers    -1
## 6                   ambitious olsson wins triple jump     2
## 7          antic delighted with record breaking barca     0
## 8   aussie qualifier stosur wastes four memphis match    NA
## 9        aust addresses un security council over iraq    NA
## 10         australia is locked into war timetable opp    NA
Copy

NRC lexicon

In this paragraph, the computation of the sentiment associated to the tokens list is shown for NRC lexicon. With respect the previous analysis based on bing lexicon, some more pre-processing is needed as explained in what follows. First we get the NRC sentiment lexicon and see what are the sentiments threin present.

sentiments_nrc <- get_sentiments("nrc")
(unique_sentiments_nrc <- unique(sentiments_nrc$sentiment))
##  [1] "trust"        "fear"         "negative"     "sadness"      "anger"        "surprise"    
##  [7] "positive"     "disgust"      "joy"          "anticipation"
Copy

To have as output a positive/negative sentiment result, I define a mapping of abovelisted sentiments to a positive/negative string result as follows.

compute_pos_neg_sentiments_nrc <- function(the_sentiments_nrc) {
  s <- unique(the_sentiments_nrc$sentiment)
  df_sentiments <- data.frame(sentiment = s, 
                              mapped_sentiment = c("positive", "negative", "negative", "negative",
                                                    "negative", "positive", "positive", "negative", 
                                                    "positive", "positive"))
  ss <- sentiments_nrc %>% inner_join(df_sentiments)
  the_sentiments_nrc$sentiment <- ss$mapped_sentiment
  the_sentiments_nrc
}

nrc_sentiments_pos_neg_scale <- compute_pos_neg_sentiments_nrc(sentiments_nrc)
Copy

Above function is used to produce the single headline text sentiment results. Such result is given as input to the compute_sentiment() function.

single_news_sentiment_nrc <- sapply(single_news_tokens, function(x) { x %>% inner_join(nrc_sentiments_pos_neg_scale) %>% compute_sentiment()})
Copy

str(single_news_sentiment_nrc)
##  Named int [1:1000] 1 -4 1 2 -2 2 4 NA 5 -2 ...
##  - attr(*, "names")= chr [1:1000] "1" "2" "3" "4" ...
Copy

Here is the summary, please note that:

the median is equal to zero
NA's show up

summary(single_news_sentiment_nrc)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## -9.0000 -2.0000  0.0000 -0.3742  2.0000  9.0000     257
Copy

single_news_sentiment_nrc_df <- data.frame(headline_text=news_df_subset$headline_text, score = single_news_sentiment_nrc)
head(single_news_sentiment_nrc_df, 10)
##                                         headline_text score
## 1  aba decides against community broadcasting licence     1
## 2      act fire witnesses must be aware of defamation    -4
## 3      a g calls for infrastructure protection summit     1
## 4            air nz staff in aust strike for pay rise     2
## 5       air nz strike to affect australian travellers    -2
## 6                   ambitious olsson wins triple jump     2
## 7          antic delighted with record breaking barca     4
## 8   aussie qualifier stosur wastes four memphis match    NA
## 9        aust addresses un security council over iraq     5
## 10         australia is locked into war timetable opp    -2
Copy

AFINN lexicon

In this paragraph, the computation of the sentiment associated to the tokens list is shown for AFINN lexicon.

sentiments_afinn <- get_sentiments("afinn")
colnames(sentiments_afinn) <- c("word", "sentiment")
str(sentiments_afinn)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 2477 obs. of  2 variables:
##  $ word     : chr  "abandon" "abandoned" "abandons" "abducted" ...
##  $ sentiment: num  -2 -2 -2 -2 -2 -2 -3 -3 -3 -3 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   word = col_character(),
##   ..   value = col_double()
##   .. )
Copy

As we can see, the afinn lexicon provides a score for each token. We just need to sum up each headline tokens score to obtain the sentiment score of the headline under analysis.

single_news_sentiment_afinn_df <- lapply(single_news_tokens, function(x) { x %>% inner_join(sentiments_afinn)})
single_news_sentiment_afinn <- sapply(single_news_sentiment_afinn_df, function(x) { 
      ifelse(nrow(x) > 0, sum(x$sentiment), NA)
  })
Copy

str(single_news_sentiment_afinn)
##  Named num [1:1000] NA -2 NA -2 -1 6 3 NA NA -2 ...
##  - attr(*, "names")= chr [1:1000] "1" "2" "3" "4" ...
Copy

Here is the summary, please note that:

the median is negative
NA's show up

summary(single_news_sentiment_afinn)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  -9.000  -3.000  -2.000  -1.148   1.000   7.000     508
Copy

single_news_sentiment_afinn_df <- data.frame(headline_text=news_df_subset$headline_text, score = single_news_sentiment_afinn)
head(single_news_sentiment_afinn_df, 10)
##                                         headline_text score
## 1  aba decides against community broadcasting licence    NA
## 2      act fire witnesses must be aware of defamation    -2
## 3      a g calls for infrastructure protection summit    NA
## 4            air nz staff in aust strike for pay rise    -2
## 5       air nz strike to affect australian travellers    -1
## 6                   ambitious olsson wins triple jump     6
## 7          antic delighted with record breaking barca     3
## 8   aussie qualifier stosur wastes four memphis match    NA
## 9        aust addresses un security council over iraq    NA
## 10         australia is locked into war timetable opp    -2
Copy

Comparing results

Having obtained for each news three potential results as sentiment evaluation, we would like to compare their congruency.
As congruence we mean the fact that all three lexicons express the same positive or negative result, in other words the same score sign indipendently from its magnitude. If NA values are present, the congruence shall be computed until at least two non NA values are available, otherwise is equal to NA.

Furthermore we compute the final news sentiment as based upon the sum of each lexicon sentiment score.

compute_congruence <- function(x,y,z) {
  v <- c(sign(x), sign(y), sign(z))
  # if only one lexicon reports the score, we cannot check for congruence
  if (sum(is.na(v)) >= 2) {
    return (NA)
  }
  # removing NA and zero value
  v <- na.omit(v)
  v_sum <- sum(v)
  abs(v_sum) == length(v)
}

compute_final_sentiment <- function(x,y,z) {
  if (is.na(x) && is.na(y) && is.na(z)) {
    return (NA)
  }

  s <- sum(x, y, z, na.rm=TRUE)
  # positive sentiments have score strictly greater than zero
  # negative sentiments have score strictly less than zero
  # neutral sentiments have score equal to zero 
  ifelse(s > 0, "positive", ifelse(s < 0, "negative", "neutral"))
}

news_sentiments_results <- data.frame(headline_text = news_df_subset$headline_text, 
                                      bing_score = single_news_sentiment_bing, 
                                      nrc_score = single_news_sentiment_nrc, 
                                      afinn_score = single_news_sentiment_afinn,
                                      stringsAsFactors = FALSE)

news_sentiments_results <- news_sentiments_results %>% rowwise() %>% 
  mutate(final_sentiment = compute_final_sentiment(bing_score, nrc_score, afinn_score),
         congruence = compute_congruence(bing_score, nrc_score, afinn_score))

head(news_sentiments_results, 40)
## Source: local data frame [40 x 6]
## Groups: <by row>
## 
## # A tibble: 40 x 6
##    headline_text                           bing_score nrc_score afinn_score final_sentiment congruence
##    <chr>                                        <int>     <int>       <dbl> <chr>           <lgl>     
##  1 aba decides against community broadcas~         NA         1          NA positive        NA        
##  2 act fire witnesses must be aware of de~         -1        -4          -2 negative        TRUE      
##  3 a g calls for infrastructure protectio~          1         1          NA positive        TRUE      
##  4 air nz staff in aust strike for pay ri~         -1         2          -2 negative        FALSE     
##  5 air nz strike to affect australian tra~         -1        -2          -1 negative        TRUE      
##  6 ambitious olsson wins triple jump                2         2           6 positive        TRUE      
##  7 antic delighted with record breaking b~          0         4           3 positive        FALSE     
##  8 aussie qualifier stosur wastes four me~         NA        NA          NA <NA>            NA        
##  9 aust addresses un security council ove~         NA         5          NA positive        NA        
## 10 australia is locked into war timetable~         NA        -2          -2 negative        TRUE      
## # ... with 30 more rows
Copy

Is would be useful to replace the numeric score with same {negative, neutral, positive} scale.

replace_score_with_sentiment <- function(v_score) {
  v_score[v_score > 0] <- "positive"
  v_score[v_score < 0] <- "negative"
  v_score[v_score == 0] <- "neutral"
  v_score
} 

news_sentiments_results$bing_score <- replace_score_with_sentiment(news_sentiments_results$bing_score)
news_sentiments_results$nrc_score <- replace_score_with_sentiment(news_sentiments_results$nrc_score)
news_sentiments_results$afinn_score <- replace_score_with_sentiment(news_sentiments_results$afinn_score)

news_sentiments_results[,2:5] <- lapply(news_sentiments_results[,2:5], as.factor)

head(news_sentiments_results, 40)
## Source: local data frame [40 x 6]
## Groups: <by row>
## 
## # A tibble: 40 x 6
##    headline_text                           bing_score nrc_score afinn_score final_sentiment congruence
##    <chr>                                   <fct>      <fct>     <fct>       <fct>           <lgl>     
##  1 aba decides against community broadcas~ <NA>       positive  <NA>        positive        NA        
##  2 act fire witnesses must be aware of de~ negative   negative  negative    negative        TRUE      
##  3 a g calls for infrastructure protectio~ positive   positive  <NA>        positive        TRUE      
##  4 air nz staff in aust strike for pay ri~ negative   positive  negative    negative        FALSE     
##  5 air nz strike to affect australian tra~ negative   negative  negative    negative        TRUE      
##  6 ambitious olsson wins triple jump       positive   positive  positive    positive        TRUE      
##  7 antic delighted with record breaking b~ neutral    positive  positive    positive        FALSE     
##  8 aussie qualifier stosur wastes four me~ <NA>       <NA>      <NA>        <NA>            NA        
##  9 aust addresses un security council ove~ <NA>       positive  <NA>        positive        NA        
## 10 australia is locked into war timetable~ <NA>       negative  negative    negative        TRUE      
## # ... with 30 more rows
Copy

Tabularizations of each lexicon resulting sentiment and final sentiments are herein shown.

table(news_sentiments_results$bing_score, news_sentiments_results$final_sentiment, dnn = c("bing", "final"))
##           final
## bing       negative neutral positive
##   negative      278      15       14
##   neutral        16       6       11
##   positive        6       7      127
Copy

table(news_sentiments_results$nrc_score, news_sentiments_results$final_sentiment, dnn = c("nrc", "final"))
##           final
## nrc        negative neutral positive
##   negative      353      10        4
##   neutral        18      13        6
##   positive       25      16      298
Copy

table(news_sentiments_results$afinn_score, news_sentiments_results$final_sentiment, dnn = c("afinn", "final"))
##           final
## afinn      negative neutral positive
##   negative      326      10       12
##   neutral         3       1        6
##   positive        4       9      121
Copy

Tabularization of congruence and final sentiments is herein shown.

table(news_sentiments_results$congruence, news_sentiments_results$final_sentiment, dnn = c("congruence", "final"))
##           final
## congruence negative neutral positive
##      FALSE       67      33       45
##      TRUE       292       0      132
Copy

Conclusions

We analyzed the news headlines to determine their sentiments while taking advantage of three sentiments lexicons. We outlined some basics of the methodology for such purpose. We also had the chance to compare the results obtained by means of all three lexicons and set forth a final sentiment evaluation. If you are interested in understanding much more about text analysis, see ref. [4].

References

[1] NRC sentiment lexicon
[2] BING sentiment lexicon
[3] AFINN sentiment lexicon
[4] Text mining with R

Data ManagementText Mining