website/_posts/research/2021-03-05-Vision_Transformer.md

---
layout: single
title: "Mel-Vision Transformer"
categories: research audio deep-learning anomalie-detection
excerpt: "Attention-based audio classification on mel-spectrograms"
header:
  teaser: assets/figures/12_vision_transformer_teaser.jpg
---

Approach{:style="display:block; width:80%" .align-center}

This work applies the vision transformer to mel-spectrogram audio data. Combined with mel-based data augmentation and sample weighting, the model achieves notable performance in the ComParE21 challenge and surpasses many single-model baselines. Overlapping vertical patching and an analysis of parameter configurations refine the approach further and show that the model adapts well to audio processing tasks. {% cite illium2021visual %}
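
To make the idea of overlapping vertical patching more concrete, here is a minimal sketch under stated assumptions, not the implementation from the cited work. It assumes a vertical patch is a strip that spans every mel bin and a short window of time frames, and that overlap comes from a stride smaller than the strip width. The function name `overlapping_vertical_patches`, the parameters `patch_width` and `stride`, and the toy input size are all hypothetical.

```python
import numpy as np


def overlapping_vertical_patches(mel, patch_width=8, stride=4):
    """Cut a (n_mels, n_frames) spectrogram into full-height strips.

    Hypothetical helper: patch_width and stride are illustrative values,
    not settings taken from the cited work.
    """
    n_mels, n_frames = mel.shape
    if n_frames < patch_width:
        raise ValueError("spectrogram is shorter than one patch")

    # Start index of every strip along the time axis. Because
    # stride < patch_width, neighbouring strips share frames.
    starts = range(0, n_frames - patch_width + 1, stride)

    # Each strip keeps all mel bins and patch_width consecutive frames.
    patches = np.stack([mel[:, s:s + patch_width] for s in starts])

    # Flatten each strip into one token vector for the transformer input.
    return patches.reshape(len(starts), n_mels * patch_width)


# Toy input: 64 mel bins, 32 time frames.
mel = np.arange(64 * 32, dtype=np.float32).reshape(64, 32)
tokens = overlapping_vertical_patches(mel)
print(tokens.shape)
```

With a stride of half the patch width, each strip shares half of its frames with its neighbour, so the token sequence is roughly twice as long as it would be with non-overlapping strips.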