---
layout: single
title: "Surgical-Mask Detection"
categories: research
tags:
  - audio-classification
  - deep-learning
  - data-augmentation
  - computer-vision
  - paralinguistics
excerpt: "CNN mask detection in speech using augmented spectrograms."
header:
scholar_link: "https://scholar.google.de/citations?user=NODAd94AAAAJ&hl=en"
---
This study investigates the efficacy of various data augmentation techniques applied directly to mel-spectrogram representations of audio data for improving classification performance. The specific task addressed is the detection of surgical mask usage based on human speech signals, a relevant problem in paralinguistics and audio analysis.
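To make the input representation concrete, here is a minimal NumPy-only sketch of turning a waveform into a log-mel spectrogram (framing, Hann window, power spectrum, triangular mel filterbank, log). The parameters (16 kHz sample rate, 512-point FFT, hop of 160 samples, 64 mel bands) and the HTK mel formula are illustrative assumptions, not the settings used in the study; in practice a library routine such as `librosa.feature.melspectrogram` would usually be used instead.

```python
import numpy as np


def hz_to_mel(f):
    """Convert frequency in Hz to the HTK mel scale."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel (HTK formula)."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping n_fft // 2 + 1 FFT bins onto n_mels mel bands."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    # n_mels + 2 equally spaced mel points give each filter a left edge, center and right edge.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=64, eps=1e-10):
    """Return a (n_mels, n_frames) log-mel spectrogram of a 1-D signal (len >= n_fft)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (n_frames, n_fft // 2 + 1)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T  # (n_mels, n_frames)
    return np.log(mel + eps)                           # eps avoids log(0) in empty bands
```

The resulting 2-D array can be treated like a single-channel image, which is what lets standard CNN architectures be applied to speech in this setting.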
We systematically evaluated the impact of data augmentation when training Convolutional Neural Networks (CNNs) for this binary classification task. The input to the networks consisted of mel-spectrograms derived from voice samples. The effectiveness of augmentation strategies (such as frequency masking, time masking, or combined approaches like SpecAugment) was assessed across four different CNN architectures.
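As an illustration of the masking strategies named above, the following is a minimal sketch of SpecAugment-style frequency and time masking on a `(n_mels, n_frames)` array. It omits SpecAugment's time-warping step, and the function name `spec_augment`, the number of masks, and the maximum mask widths are hypothetical choices for illustration, not the configuration used in the study. Masked regions are filled with the spectrogram mean, which for a mean-normalized log-mel input is equivalent to filling with zero.

```python
import numpy as np


def spec_augment(spec, rng, max_freq_width=8, max_time_width=10,
                 n_freq_masks=2, n_time_masks=2, fill=None):
    """Apply frequency and time masking to a (n_mels, n_frames) array.

    Hypothetical helper: returns a masked copy and leaves `spec` unchanged.
    """
    out = spec.copy()
    n_mels, n_frames = out.shape
    value = out.mean() if fill is None else fill  # computed before any masking
    for _ in range(n_freq_masks):
        f = rng.integers(0, min(max_freq_width, n_mels) + 1)  # mask height in mel bands
        f0 = rng.integers(0, n_mels - f + 1)                   # first masked band
        out[f0 : f0 + f, :] = value
    for _ in range(n_time_masks):
        t = rng.integers(0, min(max_time_width, n_frames) + 1)  # mask width in frames
        t0 = rng.integers(0, n_frames - t + 1)                   # first masked frame
        out[:, t0 : t0 + t] = value
    return out


# Usage: draw fresh masks every time a training example is loaded.
rng = np.random.default_rng(0)
example = rng.normal(size=(64, 100))
augmented = spec_augment(example, rng)
```

Because new masks are sampled on every call, the network rarely sees the same spectrogram twice during training; augmentation of this kind would typically be applied to training data only, never to validation or test inputs.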

The core finding of this research is that applying appropriate data augmentation directly to the spectrogram inputs substantially improves the performance and generalization of the CNN models for surgical mask detection. The augmented models achieved higher accuracy and greater robustness, and surpassed many of the established baseline results of the ComParE (Computational Paralinguistics Challenge) Mask Sub-Challenge. This underlines data augmentation as a key component in building effective deep learning models for audio classification, particularly when training data are limited or variable. For a detailed description of the methods and results, please refer to {% cite illium2020surgical %}.
