In the previous part of this tutorial, we built several models based on logistic regression. One aspect still to be considered is the tuning of the decision threshold, which may help in reaching a better accuracy. We are going to show a procedure to determine an optimal value for that purpose. ROC plots will be introduced as well.

Decision Threshold Tuning

By default, caret uses 0.5 as the probability cutoff for the decision threshold. We are going to show how to tune this threshold to achieve a better training set accuracy, and then compare how the testing set accuracy changes accordingly.

Tuning Analysis

We load the previously saved training and testing datasets, together with the models we have already determined.

suppressPackageStartupMessages(library(caret))
suppressPackageStartupMessages(library(ROCR))
set.seed(1023)
# readRDS() returns the saved object; here we assume the file bundles the
# datasets and fitted models in a named list, whose elements (training,
# testing, mod9am_c1_fit, ...) we attach to the global environment
list2env(readRDS(file = "wf_log_reg_part2.rds"), envir = globalenv())
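
As a quick sanity check of the default cutoff claim, we can verify that caret's class predictions match an explicit 0.5 cutoff applied to the predicted probabilities. A minimal sketch, using the 9AM model just loaded (ties at exactly 0.5 aside):

pred_raw  <- predict(mod9am_c1_fit, testing, type = "raw")   # class labels
pred_prob <- predict(mod9am_c1_fit, testing, type = "prob")  # class probabilities
all(pred_raw == ifelse(pred_prob$Yes >= 0.5, "Yes", "No"))   # expected TRUE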

The following tuning procedure explores a pool of cutoff values and gives back a table of {cutoff, accuracy, specificity} triplets. Inspecting it allows us to identify the cutoff value providing the highest accuracy. In case of a tie in accuracy, we choose the lowest such cutoff value; a small helper automating this choice is sketched right after the function.

glm.tune <- function(glm_model, dataset) {
   results <- data.frame()
   # fitted probabilities of the "Yes" class on the training set
   fitted_values <- glm_model$finalModel$fitted.values
   for (q in seq(0.2, 0.8, by = 0.02)) {
      # keep both factor levels so confusionMatrix() works even if one class is absent
      prediction <- factor(ifelse(fitted_values >= q, "Yes", "No"),
                           levels = levels(dataset$RainTomorrow))
      cm <- confusionMatrix(prediction, dataset$RainTomorrow)
      accuracy <- cm$overall["Accuracy"]
      specificity <- cm$byClass["Specificity"]
      results <- rbind(results, data.frame(cutoff = q, accuracy = accuracy, specificity = specificity))
   }
   rownames(results) <- NULL
   results
}
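
To apply the tie-breaking rule programmatically, the following small helper (our addition, not part of the original workflow) picks the lowest cutoff among those achieving the maximum accuracy:

# results are ordered by increasing cutoff, so which.max(), which returns
# the first index of the maximum, yields the lowest optimal cutoff
glm.best.cutoff <- function(results) {
   results$cutoff[which.max(results$accuracy)]
}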

In the glm.tune utility function above, we also included the specificity metric. Specificity is defined as TN/(TN+FP); we will discuss some of its aspects in the next part of this series. Let us show an example of a confusion matrix, with some metrics computations, to clarify the terminology.

    Reference
Prediction  No Yes
       No  203  42
       Yes   0   3
                                          
               Accuracy : 0.8306   = (TP+TN)/(TP+TN+FP+FN) = (203+3)/(203+3+42+0)                                                 
            Sensitivity : 1.00000  = TP/(TP+FN) = 203/(203+0)       
            Specificity : 0.06667  = TN/(TN+FP) = 3/(3+42)     
         Pos Pred Value : 0.82857  = TP/(TP+FP) = 203/(203+42)       
         Neg Pred Value : 1.00000  = TN/(TN+FN) = 3/(3+0)       

Please note that the “No” class is associated with a positive outcome, whilst “Yes” with a negative one. Specificity therefore measures how good we are at predicting that {RainTomorrow = “Yes”} when it actually is, and this is something that we are going to investigate in the next post of this series.
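
By default, caret takes the first factor level as the positive class; this can also be made explicit. A sketch, where pred and obs stand for placeholder factor vectors of predicted and observed classes:

# pred, obs: placeholder factors of predicted and observed classes;
# set the positive class explicitly instead of relying on factor level ordering
cm <- confusionMatrix(pred, obs, positive = "No")
cm$positive   # "No"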

The logistic regression models to be tuned are:

mod9am_c1_fit: RainTomorrow ~ Cloud9am + Humidity9am + Pressure9am + Temp9am
mod3pm_c1_fit: RainTomorrow ~ Cloud3pm + Humidity3pm + Pressure3pm + Temp3pm
mod_ev_c2_fit: RainTomorrow ~ Cloud3pm + Humidity3pm + Pressure3pm + Temp3pm + WindGustDir
mod_ev_c3_fit: RainTomorrow ~ Pressure3pm + Sunshine

In the following analysis, we determine for each model the cutoff value providing the highest accuracy.

glm.tune(mod9am_c1_fit, training)
   cutoff  accuracy specificity
1    0.20 0.7822581  0.73333333
2    0.22 0.7862903  0.71111111
3    0.24 0.7943548  0.68888889
4    0.26 0.8064516  0.64444444
5    0.28 0.8104839  0.64444444
6    0.30 0.8145161  0.60000000
7    0.32 0.8185484  0.57777778
8    0.34 0.8185484  0.51111111
9    0.36 0.8104839  0.46666667
10   0.38 0.8266129  0.46666667
11   0.40 0.8266129  0.40000000
12   0.42 0.8306452  0.40000000
13   0.44 0.8266129  0.37777778
14   0.46 0.8346774  0.35555556
15   0.48 0.8467742  0.33333333
16   0.50 0.8508065  0.31111111
17   0.52 0.8427419  0.26666667
18   0.54 0.8467742  0.26666667
19   0.56 0.8346774  0.20000000
20   0.58 0.8387097  0.20000000
21   0.60 0.8387097  0.20000000
22   0.62 0.8346774  0.17777778
23   0.64 0.8387097  0.17777778
24   0.66 0.8346774  0.15555556
25   0.68 0.8387097  0.15555556
26   0.70 0.8387097  0.13333333
27   0.72 0.8387097  0.13333333
28   0.74 0.8346774  0.11111111
29   0.76 0.8266129  0.06666667
30   0.78 0.8266129  0.06666667
31   0.80 0.8306452  0.06666667

The optimal cutoff value is equal to 0.5. In this case, we find the same value as the default cutoff that the caret package uses for logistic regression models.

opt_cutoff <- 0.5
pred_test <- predict(mod9am_c1_fit, testing, type = "prob")
prediction <- as.factor(ifelse(pred_test$Yes >= opt_cutoff, "Yes", "No"))
confusionMatrix(prediction, testing$RainTomorrow)
Confusion Matrix and Statistics

          Reference
Prediction No Yes
       No  80  12
       Yes  6   7
                                          
               Accuracy : 0.8286          
                 95% CI : (0.7427, 0.8951)
    No Information Rate : 0.819           
    P-Value [Acc > NIR] : 0.4602          
                                          
                  Kappa : 0.3405          
 Mcnemar's Test P-Value : 0.2386          
                                          
            Sensitivity : 0.9302          
            Specificity : 0.3684          
         Pos Pred Value : 0.8696          
         Neg Pred Value : 0.5385          
             Prevalence : 0.8190          
         Detection Rate : 0.7619          
   Detection Prevalence : 0.8762          
      Balanced Accuracy : 0.6493          
                                          
       'Positive' Class : No

We then tune the 3PM model.

glm.tune(mod3pm_c1_fit, training)
  cutoff  accuracy specificity
1    0.20 0.8064516   0.7777778
2    0.22 0.8185484   0.7555556
3    0.24 0.8225806   0.7333333
4    0.26 0.8306452   0.6888889
5    0.28 0.8467742   0.6666667
6    0.30 0.8467742   0.6444444
7    0.32 0.8427419   0.6222222
8    0.34 0.8669355   0.6222222
9    0.36 0.8709677   0.6222222
10   0.38 0.8629032   0.5777778
11   0.40 0.8669355   0.5777778
12   0.42 0.8669355   0.5555556
13   0.44 0.8548387   0.4666667
14   0.46 0.8548387   0.4444444
15   0.48 0.8588710   0.4444444
16   0.50 0.8669355   0.4444444
17   0.52 0.8629032   0.4222222
18   0.54 0.8669355   0.4222222
19   0.56 0.8669355   0.3777778
20   0.58 0.8669355   0.3777778
21   0.60 0.8588710   0.3333333
22   0.62 0.8548387   0.3111111
23   0.64 0.8508065   0.2888889
24   0.66 0.8467742   0.2666667
25   0.68 0.8387097   0.2222222
26   0.70 0.8387097   0.2222222
27   0.72 0.8346774   0.2000000
28   0.74 0.8387097   0.1777778
29   0.76 0.8346774   0.1555556
30   0.78 0.8346774   0.1555556
31   0.80 0.8306452   0.1111111

The optimal cutoff value is equal to 0.36.

opt_cutoff <- 0.36
pred_test <- predict(mod3pm_c1_fit, testing, type = "prob")
prediction <- as.factor(ifelse(pred_test$Yes >= opt_cutoff, "Yes", "No"))
confusionMatrix(prediction, testing$RainTomorrow)
Confusion Matrix and Statistics

          Reference
Prediction No Yes
       No  81   5
       Yes  5  14
                                          
               Accuracy : 0.9048          
                 95% CI : (0.8318, 0.9534)
    No Information Rate : 0.819           
    P-Value [Acc > NIR] : 0.0112          
                                          
                  Kappa : 0.6787          
 Mcnemar's Test P-Value : 1.0000          
                                          
            Sensitivity : 0.9419          
            Specificity : 0.7368          
         Pos Pred Value : 0.9419          
         Neg Pred Value : 0.7368          
             Prevalence : 0.8190          
         Detection Rate : 0.7714          
   Detection Prevalence : 0.8190          
      Balanced Accuracy : 0.8394          
                                          
       'Positive' Class : No    

We improved the test accuracy, from 0.8857 with the default 0.5 cutoff (please take a look at part #2 of this tutorial) to 0.9048 with the tuned one. The baseline figure can be reproduced as sketched below.
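The following sketch recomputes the test set accuracy of the 3PM model at the default 0.5 cutoff, reusing the probabilities predicted above:

pred_default <- factor(ifelse(pred_test$Yes >= 0.5, "Yes", "No"),
                       levels = levels(testing$RainTomorrow))
confusionMatrix(pred_default, testing$RainTomorrow)$overall["Accuracy"]   # expected: 0.8857

We then tune the first evening hours model.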

glm.tune(mod_ev_c2_fit, training)
  cutoff  accuracy specificity
1    0.20 0.8387097   0.8444444
2    0.22 0.8467742   0.8222222
3    0.24 0.8548387   0.8222222
4    0.26 0.8629032   0.8000000
5    0.28 0.8588710   0.7333333
6    0.30 0.8709677   0.7333333
7    0.32 0.8790323   0.7111111
8    0.34 0.8830645   0.6888889
9    0.36 0.8870968   0.6666667
10   0.38 0.8870968   0.6666667
11   0.40 0.8951613   0.6666667
12   0.42 0.8991935   0.6666667
13   0.44 0.8991935   0.6666667
14   0.46 0.8951613   0.6444444
15   0.48 0.8951613   0.6222222
16   0.50 0.8951613   0.6222222
17   0.52 0.8951613   0.6000000
18   0.54 0.8911290   0.5777778
19   0.56 0.8911290   0.5777778
20   0.58 0.8951613   0.5777778
21   0.60 0.8911290   0.5555556
22   0.62 0.8870968   0.5111111
23   0.64 0.8830645   0.4888889
24   0.66 0.8830645   0.4666667
25   0.68 0.8790323   0.4222222
26   0.70 0.8709677   0.3555556
27   0.72 0.8548387   0.2666667
28   0.74 0.8508065   0.2444444
29   0.76 0.8427419   0.1777778
30   0.78 0.8387097   0.1555556
31   0.80 0.8387097   0.1555556

The optimal cutoff value is equal to 0.42.

opt_cutoff <- 0.42
pred_test <- predict(mod_ev_c2_fit, testing, type = "prob")
prediction <- as.factor(ifelse(pred_test$Yes >= opt_cutoff, "Yes", "No"))
confusionMatrix(prediction, testing$RainTomorrow)
Confusion Matrix and Statistics

          Reference
Prediction No Yes
       No  82   8
       Yes  4  11
                                          
               Accuracy : 0.8857          
                 95% CI : (0.8089, 0.9395)
    No Information Rate : 0.819           
    P-Value [Acc > NIR] : 0.04408         
                                          
                  Kappa : 0.58            
 Mcnemar's Test P-Value : 0.38648         
                                          
            Sensitivity : 0.9535          
            Specificity : 0.5789          
         Pos Pred Value : 0.9111          
         Neg Pred Value : 0.7333          
             Prevalence : 0.8190          
         Detection Rate : 0.7810          
   Detection Prevalence : 0.8571          
      Balanced Accuracy : 0.7662          
                                          
       'Positive' Class : No           

We did not improve the accuracy, as the default cutoff value provides the same result. Other metrics changed: in particular, we notice that the sensitivity decreased a little, while the positive predicted value improved. We then tune the second evening hours model.

glm.tune(mod_ev_c3_fit, training)
 cutoff  accuracy specificity
1    0.20 0.8225806   0.8000000
2    0.22 0.8145161   0.7333333
3    0.24 0.8064516   0.6666667
4    0.26 0.8145161   0.6666667
5    0.28 0.8145161   0.6222222
6    0.30 0.8185484   0.6222222
7    0.32 0.8266129   0.6000000
8    0.34 0.8185484   0.5555556
9    0.36 0.8306452   0.5333333
10   0.38 0.8467742   0.5333333
11   0.40 0.8467742   0.5111111
12   0.42 0.8427419   0.4666667
13   0.44 0.8508065   0.4666667
14   0.46 0.8467742   0.4222222
15   0.48 0.8467742   0.4000000
16   0.50 0.8508065   0.4000000
17   0.52 0.8548387   0.4000000
18   0.54 0.8508065   0.3777778
19   0.56 0.8669355   0.3777778
20   0.58 0.8669355   0.3555556
21   0.60 0.8629032   0.3333333
22   0.62 0.8588710   0.3111111
23   0.64 0.8548387   0.2888889
24   0.66 0.8508065   0.2666667
25   0.68 0.8467742   0.2444444
26   0.70 0.8387097   0.2000000
27   0.72 0.8387097   0.1777778
28   0.74 0.8346774   0.1555556
29   0.76 0.8306452   0.1333333
30   0.78 0.8306452   0.1333333
31   0.80 0.8306452   0.1333333

The optimal cutoff value is equal to 0.56.

opt_cutoff <- 0.56
pred_test <- predict(mod_ev_c3_fit, testing, type = "prob")
prediction <- as.factor(ifelse(pred_test$Yes >= opt_cutoff, "Yes", "No"))
confusionMatrix(prediction, testing$RainTomorrow)
Confusion Matrix and Statistics

          Reference
Prediction No Yes
       No  82   8
       Yes  4  11
                                          
               Accuracy : 0.8857          
                 95% CI : (0.8089, 0.9395)
    No Information Rate : 0.819           
    P-Value [Acc > NIR] : 0.04408         
                                          
                  Kappa : 0.58            
 Mcnemar's Test P-Value : 0.38648         
                                          
            Sensitivity : 0.9535          
            Specificity : 0.5789          
         Pos Pred Value : 0.9111          
         Neg Pred Value : 0.7333          
             Prevalence : 0.8190          
         Detection Rate : 0.7810          
   Detection Prevalence : 0.8571          
      Balanced Accuracy : 0.7662          
                                          
       'Positive' Class : No     

In this case, we did not improve the accuracy of this model. Other metrics changed: in particular, we notice that the sensitivity is slightly higher than the value driven by the default cutoff (0.9419), while the positive predicted value decreased a little (with the default cutoff it was 0.9205).

ROC analysis

To gain a visual understanding of the classification performance and to compute further metrics, we are going to take advantage of the ROCR package facilities. We plot the ROC curve and determine the corresponding AUC value for each model. The function used to accomplish this task is the following.

glm.perf.plot <- function (prediction, cutoff) {
    # left panel: ROC curve (true positive rate vs. false positive rate)
    perf <- performance(prediction, measure = "tpr", x.measure = "fpr")
    par(mfrow = c(1, 2))
    plot(perf, col = "red")
    grid()
    # right panel: accuracy as a function of the cutoff, with the chosen
    # cutoff marked by a green vertical line
    perf <- performance(prediction, measure = "acc", x.measure = "cutoff")
    plot(perf, col = "red")
    abline(v = cutoff, col = "green")
    grid()
    auc_res <- performance(prediction, "auc")
    auc_res@y.values[[1]]
}

In the following, each model is considered in turn, its AUC value printed out and its ROC plots shown.

mod9am_pred_prob <- predict(mod9am_c1_fit, testing, type="prob")
mod9am_pred_resp <- prediction(mod9am_pred_prob$Yes,  testing$RainTomorrow)
glm.perf.plot(mod9am_pred_resp, 0.5)
[1] 0.8004896

mod3pm_pred_prob <- predict(mod3pm_c1_fit, testing, type="prob")
mod3pm_pred_resp <- prediction(mod3pm_pred_prob$Yes,  testing$RainTomorrow)
glm.perf.plot(mod3pm_pred_resp, 0.36)
[1] 0.9155447

The 3PM model shows the highest AUC value among all the models.

mod_ev_c2_prob <- predict(mod_ev_c2_fit, testing, type="prob")
mod_ev_c2_pred_resp <- prediction(mod_ev_c2_prob$Yes, testing$RainTomorrow)
glm.perf.plot(mod_ev_c2_pred_resp, 0.42)
[1] 0.8390453

mod_ev_c3_prob <- predict(mod_ev_c3_fit, testing, type="prob")
mod_ev_c3_pred_resp <- prediction(mod_ev_c3_prob$Yes,  testing$RainTomorrow)
glm.perf.plot(mod_ev_c3_pred_resp, 0.56)
[1] 0.8886169
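
When only the AUC value is needed, it can be computed without any plotting. A minimal sketch (our own helper) reusing the ROCR prediction objects built above:

# AUC straight from a ROCR prediction object, with no plotting involved
auc_value <- function(pred_resp) {
   performance(pred_resp, measure = "auc")@y.values[[1]]
}
auc_value(mod3pm_pred_resp)   # expected to match the value printed above, 0.9155447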

Conclusions

By tuning the decision threshold, we were able to improve the training and testing set accuracy. The 3PM model turned out to achieve a satisfactory accuracy. ROC plots gave us an understanding of how the true/false positive rates vary and what accuracy is obtained for a specific decision threshold (i.e. cutoff) value. Finally, AUC values were reported.

If you have any questions, please feel free to comment below.