In this post, I build a statistical learning model based on the plant leaf datasets introduced in part one of this tutorial. Three datasets are available, each providing sixteen samples for each of one hundred plant species.

The three datasets describe the leaf margin, shape and texture, each through 64 features.

Specifically, I will use Discriminant Analysis for classification and compare my results with those in Table 2 of ref. [1], which are reproduced in the Final Results section.

The authors of ref. [1] report the accuracy reached by KNN (proportional and weighted proportional kernel density) for every combination of the datasets in use. Their top accuracy is above 96%, and it is achieved when using all three datasets.

Packages

suppressPackageStartupMessages(library(caret))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(ggplot2))

Classification Models

We load the environment saved in the previous post, where the datasets are available.

load('PlantLeafEnvironment.RData')

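We define a utility function that gathers all the steps needed to build a model with the caret package (ref. [3]): a stratified train/test split, repeated 10-fold cross-validation, construction of the model formula, training, and evaluation on the test set through a confusion matrix.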

As a pre-processing step, PCA (Principal Component Analysis) is also specified. It yields slightly better accuracy than the same pipeline without PCA (see also ref. [4] for further insights about combining PCA and LDA).
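
To make the PCA step more concrete, here is a minimal sketch of what caret's preProcess() does when it is called on its own. It uses the built-in iris data as a stand-in for the leaf features, which is my assumption for illustration only and not part of the original analysis. Inside train(), the same transformations are re-estimated on the training portion of each resample.

```r
library(caret)

# Toy stand-in for the leaf features: the four numeric columns of iris.
features <- iris[, 1:4]

# Estimate centering, scaling and PCA on the data.
# thresh = 0.95 is caret's default: keep enough components to explain
# 95% of the variance.
pp <- preProcess(features, method = c("center", "scale", "pca"), thresh = 0.95)

# Apply the transformation; the result holds principal component scores.
pcs <- predict(pp, features)

print(pp)
head(pcs)
```

The number of retained components is usually smaller than the number of original features, which is what makes PCA useful ahead of LDA when many features are correlated.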

Among the Discriminant Analysis models available within the caret package, lda appears to be the most time-efficient while still providing good results, as we will see at the end. For details about Linear Discriminant Analysis, please see refs. [5] and [6].
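
For readers who want to see Linear Discriminant Analysis without the caret wrapper, the sketch below fits MASS::lda() directly, which is the function behind caret's "lda" method. The iris data and the simple random 70/30 split are assumptions chosen to keep the example self-contained; they are not the leaf data or the stratified split used in this post.

```r
library(MASS)

set.seed(1023)

# Simple random hold-out of 30% of the rows (not stratified).
test_idx  <- sample(nrow(iris), size = round(0.3 * nrow(iris)))
train_set <- iris[-test_idx, ]
test_set  <- iris[test_idx, ]

# Fit LDA with the species label against all other columns.
fit <- lda(Species ~ ., data = train_set)

# predict() returns a list; $class holds the predicted labels.
pred <- predict(fit, test_set)$class

# Share of correctly classified test samples.
accuracy <- mean(pred == test_set$Species)
accuracy
```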

set.seed(1023)

plant_leaf_model <- function(leaf_dataset) {

    # stratified train/test split on the species label
    train_idx <- createDataPartition(leaf_dataset$species, p = 0.7, list = FALSE)
    # 10-fold cross-validation, repeated 5 times
    trControl <- trainControl(method = "repeatedcv",  number = 10, verboseIter = FALSE, repeats = 5)

    # every column except id and species is a predictor
    relevant_features <- setdiff(colnames(leaf_dataset), c("id", "species"))

    # build the formula species ~ f1 + f2 + ...
    feat_sum <- paste(relevant_features, collapse = "+")
    frm <- as.formula(paste("species ~ ", feat_sum))

    train_set <- leaf_dataset[train_idx,]
    test_set <- leaf_dataset[-train_idx,]

    # lda has no tuning parameters, so tuneLength has no effect here
    lda_fit <- train(frm, 
                     data = train_set,
                     method ="lda", 
                     tuneLength = 10,
                     preProcess = c("BoxCox", "center", "scale", "pca"),
                     trControl = trControl,
                     metric = 'Accuracy')

    # evaluate on the held-out test set
    lda_test_pred <- predict(lda_fit, test_set)

    cfm <- confusionMatrix(lda_test_pred, test_set$species)

    list(model=lda_fit, confusionMatrix = cfm)
}
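
The first steps of the function can be tried in isolation. The sketch below runs the stratified split and the formula construction on a hypothetical toy data frame (three species, sixteen samples each, two made-up features f1 and f2). It uses reformulate() from base R as an alternative to pasting the formula by hand; that swap is my choice, not what the function above does.

```r
library(caret)

set.seed(1023)

# Hypothetical toy data: 3 species, 16 samples each, 2 random features.
toy <- data.frame(
  id = 1:48,
  species = factor(rep(c("A", "B", "C"), each = 16)),
  f1 = rnorm(48),
  f2 = rnorm(48)
)

# Stratified split: sampling is done within each species level.
train_idx <- createDataPartition(toy$species, p = 0.7, list = FALSE)
train_set <- toy[train_idx, ]
test_set  <- toy[-train_idx, ]

# Per-class counts in the two partitions.
table(train_set$species)
table(test_set$species)

# Build the formula species ~ f1 + f2 from the column names,
# excluding the id and the label itself.
relevant_features <- setdiff(colnames(toy), c("id", "species"))
frm <- reformulate(relevant_features, response = "species")
frm
```

Because the split is stratified, every species contributes the same number of training samples. This is consistent with the model outputs in this post, where 1200 of the 1600 samples (twelve per species) end up in the training set.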

Margin dataset model

We build our model for the margin dataset only.

result <- plant_leaf_model(margin_data)

Model fit results.

result$model
## Linear Discriminant Analysis 
## 
## 1200 samples
##   64 predictor
##  100 classes: 'Acer Campestre', 'Acer Capillipes', 'Acer Circinatum', 'Acer Mono', 'Acer Opalus', 'Acer Palmatum', 'Acer Pictum', 'Acer Platanoids', 'Acer Rubrum', 'Acer Rufinerve', 'Acer Saccharinum', 'Alnus Cordata', 'Alnus Maximowiczii', 'Alnus Rubra', 'Alnus Sieboldiana', 'Alnus Viridis', 'Arundinaria Simonii', 'Betula Austrosinensis', 'Betula Pendula', 'Callicarpa Bodinieri', 'Castanea Sativa', 'Celtis Koraiensis', 'Cercis Siliquastrum', 'Cornus Chinensis', 'Cornus Controversa', 'Cornus Macrophylla', 'Cotinus Coggygria', 'Crataegus Monogyna', 'Cytisus Battandieri', 'Eucalyptus Glaucescens', 'Eucalyptus Neglecta', 'Eucalyptus Urnigera', 'Fagus Sylvatica', 'Ginkgo Biloba', 'Ilex Aquifolium', 'Ilex Cornuta', 'Liquidambar Styraciflua', 'Liriodendron Tulipifera', 'Lithocarpus Cleistocarpus', 'Lithocarpus Edulis', 'Magnolia Heptapeta', 'Magnolia Salicifolia', 'Morus Nigra', 'Olea Europaea', 'Phildelphus', 'Populus Adenopoda', 'Populus Grandidentata', 'Populus Nigra', 'Prunus Avium', 'Prunus X Shmittii', 'Pterocarya Stenoptera', 'Quercus Afares', 'Quercus Agrifolia', 'Quercus Alnifolia', 'Quercus Brantii', 'Quercus Canariensis', 'Quercus Castaneifolia', 'Quercus Cerris', 'Quercus Chrysolepis', 'Quercus Coccifera', 'Quercus Coccinea', 'Quercus Crassifolia', 'Quercus Crassipes', 'Quercus Dolicholepis', 'Quercus Ellipsoidalis', 'Quercus Greggii', 'Quercus Hartwissiana', 'Quercus Ilex', 'Quercus Imbricaria', 'Quercus Infectoria sub', 'Quercus Kewensis', 'Quercus Nigra', 'Quercus Palustris', 'Quercus Phellos', 'Quercus Phillyraeoides', 'Quercus Pontica', 'Quercus Pubescens', 'Quercus Pyrenaica', 'Quercus Rhysophylla', 'Quercus Rubra', 'Quercus Semecarpifolia', 'Quercus Shumardii', 'Quercus Suber', 'Quercus Texana', 'Quercus Trojana', 'Quercus Variabilis', 'Quercus Vulcanica', 'Quercus x Hispanica', 'Quercus x Turneri', 'Rhododendron x Russellianum', 'Salix Fragilis', 'Salix Intergra', 'Sorbus Aria', 'Tilia Oliveri', 'Tilia Platyphyllos', 'Tilia Tomentosa', 'Ulmus 
Bergmanniana', 'Viburnum Tinus', 'Viburnum x Rhytidophylloides', 'Zelkova Serrata' 
## 
## Pre-processing: centered (64), scaled (64), principal component
##  signal extraction (64) 
## Resampling: Cross-Validated (10 fold, repeated 5 times) 
## Summary of sample sizes: 1082, 1076, 1078, 1081, 1086, 1079, ... 
## Resampling results:
## 
##   Accuracy   Kappa    
##   0.7947299  0.7924885

Test set confusion matrix statistics.

result$confusionMatrix$overall[1:4]
##      Accuracy         Kappa AccuracyLower AccuracyUpper 
##     0.7950000     0.7929293     0.7520681     0.8335033
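
The Kappa and accuracy bounds printed with the confusion matrix can be checked by hand. The sketch below uses the margin model's test accuracy of 0.795 and assumes a test set of 400 samples (the 1600 available minus the 1200 used for training) in which all 100 species are equally represented. It also assumes, to the best of my knowledge of caret, that the bounds are an exact binomial confidence interval.

```r
# Values taken from the margin model output; n_test is an assumption
# (1600 samples in total minus 1200 training samples).
accuracy  <- 0.795
n_test    <- 400
n_classes <- 100

# Cohen's kappa = (observed - expected) / (1 - expected).
# With balanced reference labels, chance agreement is 1 / n_classes.
expected <- 1 / n_classes
kappa <- (accuracy - expected) / (1 - expected)
kappa

# Exact (Clopper-Pearson) binomial interval for the accuracy.
n_correct <- round(accuracy * n_test)
ci <- binom.test(n_correct, n_test)$conf.int
ci
```

With 100 balanced classes the chance agreement is only 1%, which is why Kappa stays so close to the accuracy in every model of this post.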

We record the accuracy for the final report.

margin_data_accuracy <- result$confusionMatrix$overall[1]

Further details, not shown here for brevity, can be printed with:

result$model$finalModel
varImp(result$model)

Shape dataset model

We build our model for the shape dataset only.

result <- plant_leaf_model(shape_data)

Model fit results.

result$model
## Linear Discriminant Analysis 
## 
## 1200 samples
##   64 predictor
##  100 classes: 'Acer Campestre', 'Acer Capillipes', 'Acer Circinatum', 'Acer Mono', 'Acer Opalus', 'Acer Palmatum', 'Acer Pictum', 'Acer Platanoids', 'Acer Rubrum', 'Acer Rufinerve', 'Acer Saccharinum', 'Alnus Cordata', 'Alnus Maximowiczii', 'Alnus Rubra', 'Alnus Sieboldiana', 'Alnus Viridis', 'Arundinaria Simonii', 'Betula Austrosinensis', 'Betula Pendula', 'Callicarpa Bodinieri', 'Castanea Sativa', 'Celtis Koraiensis', 'Cercis Siliquastrum', 'Cornus Chinensis', 'Cornus Controversa', 'Cornus Macrophylla', 'Cotinus Coggygria', 'Crataegus Monogyna', 'Cytisus Battandieri', 'Eucalyptus Glaucescens', 'Eucalyptus Neglecta', 'Eucalyptus Urnigera', 'Fagus Sylvatica', 'Ginkgo Biloba', 'Ilex Aquifolium', 'Ilex Cornuta', 'Liquidambar Styraciflua', 'Liriodendron Tulipifera', 'Lithocarpus Cleistocarpus', 'Lithocarpus Edulis', 'Magnolia Heptapeta', 'Magnolia Salicifolia', 'Morus Nigra', 'Olea Europaea', 'Phildelphus', 'Populus Adenopoda', 'Populus Grandidentata', 'Populus Nigra', 'Prunus Avium', 'Prunus X Shmittii', 'Pterocarya Stenoptera', 'Quercus Afares', 'Quercus Agrifolia', 'Quercus Alnifolia', 'Quercus Brantii', 'Quercus Canariensis', 'Quercus Castaneifolia', 'Quercus Cerris', 'Quercus Chrysolepis', 'Quercus Coccifera', 'Quercus Coccinea', 'Quercus Crassifolia', 'Quercus Crassipes', 'Quercus Dolicholepis', 'Quercus Ellipsoidalis', 'Quercus Greggii', 'Quercus Hartwissiana', 'Quercus Ilex', 'Quercus Imbricaria', 'Quercus Infectoria sub', 'Quercus Kewensis', 'Quercus Nigra', 'Quercus Palustris', 'Quercus Phellos', 'Quercus Phillyraeoides', 'Quercus Pontica', 'Quercus Pubescens', 'Quercus Pyrenaica', 'Quercus Rhysophylla', 'Quercus Rubra', 'Quercus Semecarpifolia', 'Quercus Shumardii', 'Quercus Suber', 'Quercus Texana', 'Quercus Trojana', 'Quercus Variabilis', 'Quercus Vulcanica', 'Quercus x Hispanica', 'Quercus x Turneri', 'Rhododendron x Russellianum', 'Salix Fragilis', 'Salix Intergra', 'Sorbus Aria', 'Tilia Oliveri', 'Tilia Platyphyllos', 'Tilia Tomentosa', 'Ulmus 
Bergmanniana', 'Viburnum Tinus', 'Viburnum x Rhytidophylloides', 'Zelkova Serrata' 
## 
## Pre-processing: Box-Cox transformation (64), centered (64), scaled
##  (64), principal component signal extraction (64) 
## Resampling: Cross-Validated (10 fold, repeated 5 times) 
## Summary of sample sizes: 1080, 1072, 1074, 1084, 1076, 1087, ... 
## Resampling results:
## 
##   Accuracy   Kappa    
##   0.5167374  0.5116926

Test set confusion matrix statistics.

result$confusionMatrix$overall[1:4]
##      Accuracy         Kappa AccuracyLower AccuracyUpper 
##     0.5150000     0.5101010     0.4648181     0.5649584

We record the accuracy for the final report.

shape_data_accuracy <- result$confusionMatrix$overall[1]

Texture dataset model

We build our model for the texture dataset only.

result <- plant_leaf_model(texture_data)

Model fit results.

result$model
## Linear Discriminant Analysis 
## 
## 1200 samples
##   64 predictor
##  100 classes: 'Acer Campestre', 'Acer Capillipes', 'Acer Circinatum', 'Acer Mono', 'Acer Opalus', 'Acer Palmatum', 'Acer Pictum', 'Acer Platanoids', 'Acer Rubrum', 'Acer Rufinerve', 'Acer Saccharinum', 'Alnus Cordata', 'Alnus Maximowiczii', 'Alnus Rubra', 'Alnus Sieboldiana', 'Alnus Viridis', 'Arundinaria Simonii', 'Betula Austrosinensis', 'Betula Pendula', 'Callicarpa Bodinieri', 'Castanea Sativa', 'Celtis Koraiensis', 'Cercis Siliquastrum', 'Cornus Chinensis', 'Cornus Controversa', 'Cornus Macrophylla', 'Cotinus Coggygria', 'Crataegus Monogyna', 'Cytisus Battandieri', 'Eucalyptus Glaucescens', 'Eucalyptus Neglecta', 'Eucalyptus Urnigera', 'Fagus Sylvatica', 'Ginkgo Biloba', 'Ilex Aquifolium', 'Ilex Cornuta', 'Liquidambar Styraciflua', 'Liriodendron Tulipifera', 'Lithocarpus Cleistocarpus', 'Lithocarpus Edulis', 'Magnolia Heptapeta', 'Magnolia Salicifolia', 'Morus Nigra', 'Olea Europaea', 'Phildelphus', 'Populus Adenopoda', 'Populus Grandidentata', 'Populus Nigra', 'Prunus Avium', 'Prunus X Shmittii', 'Pterocarya Stenoptera', 'Quercus Afares', 'Quercus Agrifolia', 'Quercus Alnifolia', 'Quercus Brantii', 'Quercus Canariensis', 'Quercus Castaneifolia', 'Quercus Cerris', 'Quercus Chrysolepis', 'Quercus Coccifera', 'Quercus Coccinea', 'Quercus Crassifolia', 'Quercus Crassipes', 'Quercus Dolicholepis', 'Quercus Ellipsoidalis', 'Quercus Greggii', 'Quercus Hartwissiana', 'Quercus Ilex', 'Quercus Imbricaria', 'Quercus Infectoria sub', 'Quercus Kewensis', 'Quercus Nigra', 'Quercus Palustris', 'Quercus Phellos', 'Quercus Phillyraeoides', 'Quercus Pontica', 'Quercus Pubescens', 'Quercus Pyrenaica', 'Quercus Rhysophylla', 'Quercus Rubra', 'Quercus Semecarpifolia', 'Quercus Shumardii', 'Quercus Suber', 'Quercus Texana', 'Quercus Trojana', 'Quercus Variabilis', 'Quercus Vulcanica', 'Quercus x Hispanica', 'Quercus x Turneri', 'Rhododendron x Russellianum', 'Salix Fragilis', 'Salix Intergra', 'Sorbus Aria', 'Tilia Oliveri', 'Tilia Platyphyllos', 'Tilia Tomentosa', 'Ulmus 
Bergmanniana', 'Viburnum Tinus', 'Viburnum x Rhytidophylloides', 'Zelkova Serrata' 
## 
## Pre-processing: centered (64), scaled (64), principal component
##  signal extraction (64) 
## Resampling: Cross-Validated (10 fold, repeated 5 times) 
## Summary of sample sizes: 1078, 1078, 1086, 1076, 1076, 1082, ... 
## Resampling results:
## 
##   Accuracy   Kappa    
##   0.7580786  0.7554372

Test set confusion matrix statistics.

result$confusionMatrix$overall[1:4]
##      Accuracy         Kappa AccuracyLower AccuracyUpper 
##     0.7575000     0.7550505     0.7124314     0.7987117

We record the accuracy for the final report.

texture_data_accuracy <- result$confusionMatrix$overall[1]

Margin+Shape datasets model

We build our model for the margin and shape datasets.

margin_shape_data <- left_join(margin_data, shape_data[,-which(colnames(shape_data) %in% c("species"))], by=c("id"))
result <- plant_leaf_model(margin_shape_data)
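
The way two feature tables are combined can be illustrated on a small scale. In the sketch below, margin_toy and shape_toy are hypothetical three-row tables that I made up to mirror the id and species columns shared by the real datasets. The species column is dropped from the right-hand table before joining so that it does not come back duplicated as species.x and species.y.

```r
library(dplyr)

# Hypothetical miniature versions of two feature tables.
margin_toy <- data.frame(id = 1:3,
                         species = c("A", "A", "B"),
                         margin1 = c(0.1, 0.2, 0.3))
shape_toy <- data.frame(id = 1:3,
                        species = c("A", "A", "B"),
                        shape1 = c(1.1, 1.2, 1.3))

# Drop species from the right-hand table, then join on id only.
joined <- left_join(margin_toy,
                    dplyr::select(shape_toy, -species),
                    by = "id")
joined
```

Selecting the column to drop by name, rather than by position, keeps the join correct even if the two tables order their columns differently.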

Model fit results.

result$model
## Linear Discriminant Analysis 
## 
## 1200 samples
##  128 predictor
##  100 classes: 'Acer Campestre', 'Acer Capillipes', 'Acer Circinatum', 'Acer Mono', 'Acer Opalus', 'Acer Palmatum', 'Acer Pictum', 'Acer Platanoids', 'Acer Rubrum', 'Acer Rufinerve', 'Acer Saccharinum', 'Alnus Cordata', 'Alnus Maximowiczii', 'Alnus Rubra', 'Alnus Sieboldiana', 'Alnus Viridis', 'Arundinaria Simonii', 'Betula Austrosinensis', 'Betula Pendula', 'Callicarpa Bodinieri', 'Castanea Sativa', 'Celtis Koraiensis', 'Cercis Siliquastrum', 'Cornus Chinensis', 'Cornus Controversa', 'Cornus Macrophylla', 'Cotinus Coggygria', 'Crataegus Monogyna', 'Cytisus Battandieri', 'Eucalyptus Glaucescens', 'Eucalyptus Neglecta', 'Eucalyptus Urnigera', 'Fagus Sylvatica', 'Ginkgo Biloba', 'Ilex Aquifolium', 'Ilex Cornuta', 'Liquidambar Styraciflua', 'Liriodendron Tulipifera', 'Lithocarpus Cleistocarpus', 'Lithocarpus Edulis', 'Magnolia Heptapeta', 'Magnolia Salicifolia', 'Morus Nigra', 'Olea Europaea', 'Phildelphus', 'Populus Adenopoda', 'Populus Grandidentata', 'Populus Nigra', 'Prunus Avium', 'Prunus X Shmittii', 'Pterocarya Stenoptera', 'Quercus Afares', 'Quercus Agrifolia', 'Quercus Alnifolia', 'Quercus Brantii', 'Quercus Canariensis', 'Quercus Castaneifolia', 'Quercus Cerris', 'Quercus Chrysolepis', 'Quercus Coccifera', 'Quercus Coccinea', 'Quercus Crassifolia', 'Quercus Crassipes', 'Quercus Dolicholepis', 'Quercus Ellipsoidalis', 'Quercus Greggii', 'Quercus Hartwissiana', 'Quercus Ilex', 'Quercus Imbricaria', 'Quercus Infectoria sub', 'Quercus Kewensis', 'Quercus Nigra', 'Quercus Palustris', 'Quercus Phellos', 'Quercus Phillyraeoides', 'Quercus Pontica', 'Quercus Pubescens', 'Quercus Pyrenaica', 'Quercus Rhysophylla', 'Quercus Rubra', 'Quercus Semecarpifolia', 'Quercus Shumardii', 'Quercus Suber', 'Quercus Texana', 'Quercus Trojana', 'Quercus Variabilis', 'Quercus Vulcanica', 'Quercus x Hispanica', 'Quercus x Turneri', 'Rhododendron x Russellianum', 'Salix Fragilis', 'Salix Intergra', 'Sorbus Aria', 'Tilia Oliveri', 'Tilia Platyphyllos', 'Tilia Tomentosa', 'Ulmus 
Bergmanniana', 'Viburnum Tinus', 'Viburnum x Rhytidophylloides', 'Zelkova Serrata' 
## 
## Pre-processing: Box-Cox transformation (64), centered (128), scaled
##  (128), principal component signal extraction (128) 
## Resampling: Cross-Validated (10 fold, repeated 5 times) 
## Summary of sample sizes: 1082, 1084, 1074, 1087, 1084, 1081, ... 
## Resampling results:
## 
##   Accuracy   Kappa    
##   0.9511508  0.9506051

Test set confusion matrix statistics.

result$confusionMatrix$overall[1:4]
##      Accuracy         Kappa AccuracyLower AccuracyUpper 
##     0.9300000     0.9292929     0.9004175     0.9529847

We record the accuracy for the final report.

margin_shape_data_accuracy <- result$confusionMatrix$overall[1]

Margin+Texture datasets model

We build our model for the margin and texture datasets.

margin_texture_data <- left_join(margin_data, texture_data[,-which(colnames(texture_data) %in% c("species"))], by=c("id"))
result <- plant_leaf_model(margin_texture_data)

Model fit results.

result$model
## Linear Discriminant Analysis 
## 
## 1200 samples
##  128 predictor
##  100 classes: 'Acer Campestre', 'Acer Capillipes', 'Acer Circinatum', 'Acer Mono', 'Acer Opalus', 'Acer Palmatum', 'Acer Pictum', 'Acer Platanoids', 'Acer Rubrum', 'Acer Rufinerve', 'Acer Saccharinum', 'Alnus Cordata', 'Alnus Maximowiczii', 'Alnus Rubra', 'Alnus Sieboldiana', 'Alnus Viridis', 'Arundinaria Simonii', 'Betula Austrosinensis', 'Betula Pendula', 'Callicarpa Bodinieri', 'Castanea Sativa', 'Celtis Koraiensis', 'Cercis Siliquastrum', 'Cornus Chinensis', 'Cornus Controversa', 'Cornus Macrophylla', 'Cotinus Coggygria', 'Crataegus Monogyna', 'Cytisus Battandieri', 'Eucalyptus Glaucescens', 'Eucalyptus Neglecta', 'Eucalyptus Urnigera', 'Fagus Sylvatica', 'Ginkgo Biloba', 'Ilex Aquifolium', 'Ilex Cornuta', 'Liquidambar Styraciflua', 'Liriodendron Tulipifera', 'Lithocarpus Cleistocarpus', 'Lithocarpus Edulis', 'Magnolia Heptapeta', 'Magnolia Salicifolia', 'Morus Nigra', 'Olea Europaea', 'Phildelphus', 'Populus Adenopoda', 'Populus Grandidentata', 'Populus Nigra', 'Prunus Avium', 'Prunus X Shmittii', 'Pterocarya Stenoptera', 'Quercus Afares', 'Quercus Agrifolia', 'Quercus Alnifolia', 'Quercus Brantii', 'Quercus Canariensis', 'Quercus Castaneifolia', 'Quercus Cerris', 'Quercus Chrysolepis', 'Quercus Coccifera', 'Quercus Coccinea', 'Quercus Crassifolia', 'Quercus Crassipes', 'Quercus Dolicholepis', 'Quercus Ellipsoidalis', 'Quercus Greggii', 'Quercus Hartwissiana', 'Quercus Ilex', 'Quercus Imbricaria', 'Quercus Infectoria sub', 'Quercus Kewensis', 'Quercus Nigra', 'Quercus Palustris', 'Quercus Phellos', 'Quercus Phillyraeoides', 'Quercus Pontica', 'Quercus Pubescens', 'Quercus Pyrenaica', 'Quercus Rhysophylla', 'Quercus Rubra', 'Quercus Semecarpifolia', 'Quercus Shumardii', 'Quercus Suber', 'Quercus Texana', 'Quercus Trojana', 'Quercus Variabilis', 'Quercus Vulcanica', 'Quercus x Hispanica', 'Quercus x Turneri', 'Rhododendron x Russellianum', 'Salix Fragilis', 'Salix Intergra', 'Sorbus Aria', 'Tilia Oliveri', 'Tilia Platyphyllos', 'Tilia Tomentosa', 'Ulmus 
Bergmanniana', 'Viburnum Tinus', 'Viburnum x Rhytidophylloides', 'Zelkova Serrata' 
## 
## Pre-processing: centered (128), scaled (128), principal component
##  signal extraction (128) 
## Resampling: Cross-Validated (10 fold, repeated 5 times) 
## Summary of sample sizes: 1079, 1078, 1084, 1074, 1085, 1081, ... 
## Resampling results:
## 
##   Accuracy   Kappa    
##   0.9666299  0.9662565

Test set confusion matrix statistics.

result$confusionMatrix$overall[1:4]
##      Accuracy         Kappa AccuracyLower AccuracyUpper 
##     0.9475000     0.9469697     0.9208654     0.9672116

We record the accuracy for the final report.

margin_texture_data_accuracy <- result$confusionMatrix$overall[1]

Shape+Texture datasets model

We build our model for the shape and texture datasets.

shape_texture_data <- left_join(shape_data, texture_data[,-which(colnames(texture_data) %in% c("species"))], by=c("id"))
result <- plant_leaf_model(shape_texture_data)

Model fit results.

result$model
## Linear Discriminant Analysis 
## 
## 1200 samples
##  128 predictor
##  100 classes: 'Acer Campestre', 'Acer Capillipes', 'Acer Circinatum', 'Acer Mono', 'Acer Opalus', 'Acer Palmatum', 'Acer Pictum', 'Acer Platanoids', 'Acer Rubrum', 'Acer Rufinerve', 'Acer Saccharinum', 'Alnus Cordata', 'Alnus Maximowiczii', 'Alnus Rubra', 'Alnus Sieboldiana', 'Alnus Viridis', 'Arundinaria Simonii', 'Betula Austrosinensis', 'Betula Pendula', 'Callicarpa Bodinieri', 'Castanea Sativa', 'Celtis Koraiensis', 'Cercis Siliquastrum', 'Cornus Chinensis', 'Cornus Controversa', 'Cornus Macrophylla', 'Cotinus Coggygria', 'Crataegus Monogyna', 'Cytisus Battandieri', 'Eucalyptus Glaucescens', 'Eucalyptus Neglecta', 'Eucalyptus Urnigera', 'Fagus Sylvatica', 'Ginkgo Biloba', 'Ilex Aquifolium', 'Ilex Cornuta', 'Liquidambar Styraciflua', 'Liriodendron Tulipifera', 'Lithocarpus Cleistocarpus', 'Lithocarpus Edulis', 'Magnolia Heptapeta', 'Magnolia Salicifolia', 'Morus Nigra', 'Olea Europaea', 'Phildelphus', 'Populus Adenopoda', 'Populus Grandidentata', 'Populus Nigra', 'Prunus Avium', 'Prunus X Shmittii', 'Pterocarya Stenoptera', 'Quercus Afares', 'Quercus Agrifolia', 'Quercus Alnifolia', 'Quercus Brantii', 'Quercus Canariensis', 'Quercus Castaneifolia', 'Quercus Cerris', 'Quercus Chrysolepis', 'Quercus Coccifera', 'Quercus Coccinea', 'Quercus Crassifolia', 'Quercus Crassipes', 'Quercus Dolicholepis', 'Quercus Ellipsoidalis', 'Quercus Greggii', 'Quercus Hartwissiana', 'Quercus Ilex', 'Quercus Imbricaria', 'Quercus Infectoria sub', 'Quercus Kewensis', 'Quercus Nigra', 'Quercus Palustris', 'Quercus Phellos', 'Quercus Phillyraeoides', 'Quercus Pontica', 'Quercus Pubescens', 'Quercus Pyrenaica', 'Quercus Rhysophylla', 'Quercus Rubra', 'Quercus Semecarpifolia', 'Quercus Shumardii', 'Quercus Suber', 'Quercus Texana', 'Quercus Trojana', 'Quercus Variabilis', 'Quercus Vulcanica', 'Quercus x Hispanica', 'Quercus x Turneri', 'Rhododendron x Russellianum', 'Salix Fragilis', 'Salix Intergra', 'Sorbus Aria', 'Tilia Oliveri', 'Tilia Platyphyllos', 'Tilia Tomentosa', 'Ulmus 
Bergmanniana', 'Viburnum Tinus', 'Viburnum x Rhytidophylloides', 'Zelkova Serrata' 
## 
## Pre-processing: Box-Cox transformation (64), centered (128), scaled
##  (128), principal component signal extraction (128) 
## Resampling: Cross-Validated (10 fold, repeated 5 times) 
## Summary of sample sizes: 1083, 1080, 1080, 1082, 1082, 1081, ... 
## Resampling results:
## 
##   Accuracy   Kappa    
##   0.9102814  0.9092814

Test set confusion matrix statistics.

result$confusionMatrix$overall[1:4]
##      Accuracy         Kappa AccuracyLower AccuracyUpper 
##     0.9225000     0.9217172     0.8917971     0.9467373

We record the accuracy for the final report.

shape_texture_data_accuracy <- result$confusionMatrix$overall[1]

Margin+Shape+Texture datasets model

We build our model for all three datasets.

leaves_data <- left_join(margin_data, shape_texture_data[,-which(colnames(shape_texture_data) %in% c("species"))], by = c("id"))
result <- plant_leaf_model(leaves_data)

Model fit results.

result$model
## Linear Discriminant Analysis 
## 
## 1200 samples
##  192 predictor
##  100 classes: 'Acer Campestre', 'Acer Capillipes', 'Acer Circinatum', 'Acer Mono', 'Acer Opalus', 'Acer Palmatum', 'Acer Pictum', 'Acer Platanoids', 'Acer Rubrum', 'Acer Rufinerve', 'Acer Saccharinum', 'Alnus Cordata', 'Alnus Maximowiczii', 'Alnus Rubra', 'Alnus Sieboldiana', 'Alnus Viridis', 'Arundinaria Simonii', 'Betula Austrosinensis', 'Betula Pendula', 'Callicarpa Bodinieri', 'Castanea Sativa', 'Celtis Koraiensis', 'Cercis Siliquastrum', 'Cornus Chinensis', 'Cornus Controversa', 'Cornus Macrophylla', 'Cotinus Coggygria', 'Crataegus Monogyna', 'Cytisus Battandieri', 'Eucalyptus Glaucescens', 'Eucalyptus Neglecta', 'Eucalyptus Urnigera', 'Fagus Sylvatica', 'Ginkgo Biloba', 'Ilex Aquifolium', 'Ilex Cornuta', 'Liquidambar Styraciflua', 'Liriodendron Tulipifera', 'Lithocarpus Cleistocarpus', 'Lithocarpus Edulis', 'Magnolia Heptapeta', 'Magnolia Salicifolia', 'Morus Nigra', 'Olea Europaea', 'Phildelphus', 'Populus Adenopoda', 'Populus Grandidentata', 'Populus Nigra', 'Prunus Avium', 'Prunus X Shmittii', 'Pterocarya Stenoptera', 'Quercus Afares', 'Quercus Agrifolia', 'Quercus Alnifolia', 'Quercus Brantii', 'Quercus Canariensis', 'Quercus Castaneifolia', 'Quercus Cerris', 'Quercus Chrysolepis', 'Quercus Coccifera', 'Quercus Coccinea', 'Quercus Crassifolia', 'Quercus Crassipes', 'Quercus Dolicholepis', 'Quercus Ellipsoidalis', 'Quercus Greggii', 'Quercus Hartwissiana', 'Quercus Ilex', 'Quercus Imbricaria', 'Quercus Infectoria sub', 'Quercus Kewensis', 'Quercus Nigra', 'Quercus Palustris', 'Quercus Phellos', 'Quercus Phillyraeoides', 'Quercus Pontica', 'Quercus Pubescens', 'Quercus Pyrenaica', 'Quercus Rhysophylla', 'Quercus Rubra', 'Quercus Semecarpifolia', 'Quercus Shumardii', 'Quercus Suber', 'Quercus Texana', 'Quercus Trojana', 'Quercus Variabilis', 'Quercus Vulcanica', 'Quercus x Hispanica', 'Quercus x Turneri', 'Rhododendron x Russellianum', 'Salix Fragilis', 'Salix Intergra', 'Sorbus Aria', 'Tilia Oliveri', 'Tilia Platyphyllos', 'Tilia Tomentosa', 'Ulmus 
Bergmanniana', 'Viburnum Tinus', 'Viburnum x Rhytidophylloides', 'Zelkova Serrata' 
## 
## Pre-processing: Box-Cox transformation (64), centered (192), scaled
##  (192), principal component signal extraction (192) 
## Resampling: Cross-Validated (10 fold, repeated 5 times) 
## Summary of sample sizes: 1078, 1074, 1084, 1076, 1083, 1082, ... 
## Resampling results:
## 
##   Accuracy   Kappa    
##   0.9832037  0.9830154

Test set confusion matrix statistics.

result$confusionMatrix$overall[1:4]
##      Accuracy         Kappa AccuracyLower AccuracyUpper 
##     0.9900000     0.9898990     0.9745952     0.9972688

We record the accuracy for the final report.

leaf_data_accuracy <- result$confusionMatrix$overall[1]

Final Results

Finally, we gather all results in a data frame for comparison. The V symbol marks the datasets used to build each model.

LDA_accuracy = c(shape_data_accuracy, texture_data_accuracy, margin_data_accuracy,
                 shape_texture_data_accuracy, margin_shape_data_accuracy, margin_texture_data_accuracy,
                 leaf_data_accuracy)
LDA_accuracy <- 100*LDA_accuracy # as percentage

results <- data.frame(SHA = c("V", " ", " ", "V", "V", " ", "V"),
                      TEX = c(" ", "V", " ", "V", " ", "V", "V"),
                      MAR = c(" ", " ", "V", " ", "V", "V", "V"),
                      prop_KNN_accuracy = c(62.13, 72.94, 75.00, 86.19, 87.19, 93.38, 96.81),
                      wprop_KNN__accuracy = c(61.88, 72.75, 75.75, 86.06, 86.75, 93.31, 96.69),
                      LDA_accuracy = LDA_accuracy
                      )

Here are the results of ref. [1] compared to ours.

results
##   SHA TEX MAR prop_KNN_accuracy wprop_KNN__accuracy LDA_accuracy
## 1   V                     62.13               61.88        51.50
## 2       V                 72.94               72.75        75.75
## 3           V             75.00               75.75        79.50
## 4   V   V                 86.19               86.06        92.25
## 5   V       V             87.19               86.75        93.00
## 6       V   V             93.38               93.31        94.75
## 7   V   V   V             96.81               96.69        99.00

As we can see, with the sole exception of the shape-only scenario, Linear Discriminant Analysis achieves higher accuracy than the K-Nearest-Neighbor-based models of ref. [1] for every combination of datasets.
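
To quantify the comparison, the sketch below recomputes the gap between LDA and the better of the two KNN variants for each dataset combination. The numbers are copied from the results table; the variable names and the choice of comparing against the better KNN variant are mine.

```r
# Accuracies (in %) copied from the results table.
combos    <- c("SHA", "TEX", "MAR", "SHA+TEX", "SHA+MAR", "TEX+MAR", "SHA+TEX+MAR")
prop_knn  <- c(62.13, 72.94, 75.00, 86.19, 87.19, 93.38, 96.81)
wprop_knn <- c(61.88, 72.75, 75.75, 86.06, 86.75, 93.31, 96.69)
lda       <- c(51.50, 75.75, 79.50, 92.25, 93.00, 94.75, 99.00)

# Compare LDA against the better KNN variant in each row.
best_knn <- pmax(prop_knn, wprop_knn)
gain <- setNames(lda - best_knn, combos)

# Positive values mean LDA is ahead, in percentage points.
round(gain, 2)
```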

References

  1. Charles Mallah, James Cope and James Orwell, “Plant Leaf Classification Using Probabilistic Integration of Shape, Texture and Margin Features”
  2. Leaf Dataset
  3. Caret package – train models by tag
  4. Combining PCA and LDA
  5. Linear Discriminant Analysis
  6. Linear Discriminant Analysis – Bit by Bit