website/2020-10-25-surgical-mask-detection.md at 755fd297bb3a54f1e2c9442660f855548411219b

Steffen Illium 755fd297bb Website overhaul

2025-03-27 22:57:31 +01:00

---
layout: single
title: "Surgical-Mask Detection"
categories: research
tags: audio-classification deep-learning data-augmentation computer-vision paralinguistics
excerpt: "CNN mask detection in speech using augmented spectrograms."
header:
  teaser: /assets/figures/7_mask_models.jpg
scholar_link: "https://scholar.google.de/citations?user=NODAd94AAAAJ&hl=en"
---

This study investigates whether data augmentation applied directly to mel-spectrogram representations of audio improves classification performance. The task is to detect, from a recording of human speech, whether the speaker is wearing a surgical mask, a problem from paralinguistics and audio analysis.
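For readers unfamiliar with the representation, the following is a minimal NumPy sketch of how a log-mel-spectrogram can be computed from a waveform. It is not the preprocessing used in the paper: the sample rate, FFT size, hop length, number of mel bands, and the function names are all illustrative assumptions, and the mel scale uses the common `2595 * log10(1 + f / 700)` formula.

```python
import numpy as np


def hz_to_mel(f):
    # Common (HTK-style) mel formula; an assumption, not taken from the paper.
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to sr / 2."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def mel_spectrogram(signal, sr=16000, n_fft=512, hop=128, n_mels=64):
    """Return a log-mel-spectrogram in dB with shape (n_mels, n_frames).

    All default parameter values are hypothetical choices for illustration.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < n_fft:
        raise ValueError("signal must contain at least n_fft samples")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the waveform into overlapping, windowed frames.
    frames = np.stack(
        [signal[t * hop : t * hop + n_fft] * window for t in range(n_frames)]
    )
    # Power spectrum of each frame: shape (n_frames, n_fft // 2 + 1).
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Project onto the mel bands: shape (n_mels, n_frames).
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T
    # Small offset avoids log(0) for silent bands.
    return 10.0 * np.log10(mel + 1e-10)
```

The resulting 2D array can be treated like a single-channel image, which is what makes image-style CNNs and image-style augmentations applicable to speech.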

We systematically evaluated the impact of data augmentation when training Convolutional Neural Networks (CNNs) for this binary classification task. The networks take mel-spectrograms derived from voice samples as input. Augmentation strategies (such as frequency masking, time masking, or combined approaches like SpecAugment) were assessed across four different CNN architectures.
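To make the masking idea concrete, here is a small sketch of frequency and time masking on a spectrogram array, in the spirit of SpecAugment. It is an illustration under stated assumptions, not the augmentation pipeline from the paper: the function name, the maximum mask widths, and the choice to fill masked cells with the spectrogram's minimum value are all hypothetical.

```python
import numpy as np


def mask_spectrogram(spec, max_freq_width=8, max_time_width=16, rng=None):
    """Return a copy of `spec` (n_mels, n_frames) with one band and one span masked.

    The widths and the fill value are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(spec, dtype=float, copy=True)
    n_mels, n_frames = out.shape
    fill = out.min()  # hypothetical choice: overwrite with the quietest value

    # Frequency masking: blank out a random block of adjacent mel bands.
    f_width = int(rng.integers(0, min(max_freq_width, n_mels) + 1))
    f_start = int(rng.integers(0, n_mels - f_width + 1))
    out[f_start : f_start + f_width, :] = fill

    # Time masking: blank out a random block of adjacent frames.
    t_width = int(rng.integers(0, min(max_time_width, n_frames) + 1))
    t_start = int(rng.integers(0, n_frames - t_width + 1))
    out[:, t_start : t_start + t_width] = fill

    return out
```

In a training loop, such a function would typically be applied to each training sample with fresh random masks every epoch, while validation and test spectrograms are left untouched.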

Figure: Mel-spectrograms of speech with and without a surgical mask, as used as input for the CNNs.

The core finding is that applying appropriate data augmentation directly to the spectrogram inputs improves both the performance and the generalization of the CNN models for surgical mask detection. The augmented models were more accurate and more robust, and they surpassed many established benchmark results from the relevant ComParE (Computational Paralinguistics Challenge) tasks. This makes data augmentation an important component of effective deep learning models for audio classification, particularly when datasets are limited or variable. For a detailed description of the methods and results, please refer to {% cite illium2020surgical %}.
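When comparing against ComParE results, note that the challenge conventionally scores classification tasks by unweighted average recall (UAR), the mean of the per-class recalls, rather than plain accuracy. This post does not state which metric the reported results use, so the sketch below is only an illustration of how UAR is computed; the function name and the toy labels are made up.

```python
import numpy as np


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so each class counts equally regardless of size."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


# Toy example with made-up labels (0 = no mask, 1 = mask).
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
uar = unweighted_average_recall(y_true, y_pred)
```

On imbalanced data, UAR and accuracy can differ noticeably, which matters when reading benchmark tables side by side.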

Figure: Overview of the different Convolutional Neural Network architectures evaluated.
