Machine learning methods and data evaluation

If a contextualization into a larger sample (or spectra) population is required or if correlations are to be examined that are not visible at first glance in individual samples, a machine learning approach respectively a mathematical-statistical evaluation of the data is therefore required, also known as a 'soft modeling' approach. After reducing the number of wavelengths or the description using feature images, methods of image evaluation and graphical pattern recognition, such as wavelet analysis, can describe structures in the surface using just a few parameters. In a final step, a regression model, for example, is used to classify the sample as a whole. In this way, extensive, complex raw data can be analyzed for a target value.

Schematically procedure for data analysis

The prerequisite for this procedure, however, is that a machine learning model is trained using reference samples with a known target value, which can be used for the subsequent prediction of the target value from unknown spectra or new samples. A distinction is made between continuous regression models and classification models with discrete assignment of the target value. This prediction can be automated in the process and can be carried out very quickly, optical inspection using HSI technology is therefore also inline-capable.

If only the spectra totality is of interest and not their spatial distribution or the derivation of a target value, an evaluation and/or classification of the spectra or samples can also be carried out directly using methods of descriptive statistics or quantitative and qualitative multivariate data analysis (MVDA) [ ]. Qualitative methods of MVDA are also suitable if the target value of a sample population is unknown or no reference exists. Frequently used methods are principle component analysis (PCA) and cluster analysis (CA).