Multi-label classification with optimal thresholding for multi-composition spectroscopic analysis

Date

2019-08-30

Authors

Gan, Luyun

Abstract

Spectroscopic analysis has applications in physics, chemistry, bioinformatics, geophysics, and astronomy. It has been widely used for detecting mineral samples, gas emissions, and food volatiles. Machine learning algorithms for spectroscopic analysis have focused on either regression or single-label classification problems. Using multi-label classification to identify multiple chemical components from a spectrum has not been explored. In this thesis, we implement a Feed-forward Neural Network with Optimal Thresholding (FNN-OT) to identify gas species in a multi-gas mixture in a cluttered environment. Spectrum signals are first processed by a feed-forward neural network (FNN) model, which produces an individual prediction score for each gas. These scores are the input to a subsequent optimal thresholding (OT) system. For each gas component in a test sample, a prediction is made by comparing its FNN output score against a threshold from the OT system. If the score is larger than the threshold, the prediction is 1; otherwise it is 0, representing the presence or absence of that gas component in the spectrum. Using infrared absorption spectroscopy and tested on synthesized spectral datasets, our approach outperforms both the FNN alone and the conventional binary relevance approach, Partial Least Squares with Binary Relevance (PLS-BR). All three models are trained and tested on 18 synthesized datasets covering 6 levels of signal-to-noise ratio and 3 types of gas correlation. They are evaluated and compared using micro-, macro-, and sample-averaged precision, recall, and F1 score. For mutually independent and randomly correlated gas data, FNN-OT performs better than the FNN alone or the conventional PLS-BR by significantly increasing recall without sacrificing much precision. For positively correlated gas data, FNN-OT captures positive label correlation from noisy datasets better than the other two models.
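
The thresholding step and the averaged F1 scores described above can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not say how the OT system chooses its thresholds, so the sketch assumes one threshold per gas, chosen to maximize that gas's F1 score on held-out validation scores. The function names (`fit_thresholds`, `predict`, `micro_f1`, `macro_f1`, `sample_f1`) and the toy score matrix are hypothetical and stand in for real FNN outputs.

```python
import numpy as np

def f1_from_counts(tp, fp, fn):
    """F1 from raw counts; taken to be 0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0

def counts(pred, truth):
    """True positives, false positives, false negatives for 0/1 arrays."""
    tp = np.sum((pred == 1) & (truth == 1))
    fp = np.sum((pred == 1) & (truth == 0))
    fn = np.sum((pred == 0) & (truth == 1))
    return tp, fp, fn

def fit_thresholds(scores, labels):
    """Assumed criterion: per label, keep the candidate threshold with the best F1."""
    thresholds = np.full(scores.shape[1], 0.5)
    for j in range(scores.shape[1]):
        s = np.unique(scores[:, j])               # sorted unique scores
        candidates = (s[:-1] + s[1:]) / 2          # midpoints between neighbours
        best = -1.0
        for t in candidates:
            pred = (scores[:, j] > t).astype(int)
            score = f1_from_counts(*counts(pred, labels[:, j]))
            if score > best:
                best, thresholds[j] = score, t
    return thresholds

def predict(scores, thresholds):
    """Prediction is 1 if the score exceeds the label's threshold, else 0."""
    return (scores > thresholds).astype(int)

def micro_f1(pred, truth):
    """Pool counts over all samples and labels, then compute one F1."""
    return f1_from_counts(*counts(pred, truth))

def macro_f1(pred, truth):
    """Average of per-label F1 scores."""
    return float(np.mean([f1_from_counts(*counts(pred[:, j], truth[:, j]))
                          for j in range(truth.shape[1])]))

def sample_f1(pred, truth):
    """Average of per-sample F1 scores."""
    return float(np.mean([f1_from_counts(*counts(pred[i], truth[i]))
                          for i in range(truth.shape[0])]))

# Toy data (hypothetical): 4 spectra, 2 gases; the second gas gets low scores.
val_scores = np.array([[0.9, 0.40], [0.2, 0.35], [0.8, 0.10], [0.3, 0.45]])
val_labels = np.array([[1, 1], [0, 1], [1, 0], [0, 1]])

fixed_pred = predict(val_scores, np.array([0.5, 0.5]))   # plain 0.5 cut-off
ot_thresholds = fit_thresholds(val_scores, val_labels)
ot_pred = predict(val_scores, ot_thresholds)
```

In this toy case a fixed 0.5 cut-off misses every occurrence of the second gas, while a per-gas threshold recovers them, which mirrors the recall gain the abstract reports. In practice the thresholds would be fitted on validation data and then applied to separate test data.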

Keywords

Multi-label Classification, Infrared Spectroscopy, Feed-forward Neural Network, Machine Learning
