website/2021-03-05-Vision_Transformer.md at 2b75326eac0aa7c6b8cac33c53833550283e588a

Steffen Illium da72fdcf7f general overhaul, better images, better texts

2024-11-10 12:17:02 +01:00

---
layout: single
title: "Mel-Vision Transformer"
categories: research audio deep-learning anomalie-detection
excerpt: "Attention based audio classification on Mel-Spectrograms"
header:
  teaser: assets/figures/12_vision_transformer_teaser.jpg
---

{:style="display:block; width:80%" .align-center}

This work applies the vision transformer to mel-spectrograms of audio recordings. Combined with mel-based data augmentation and sample weighting, the model performs well in the ComParE21 challenge and surpasses many single-model baselines. Overlapping vertical patching and an analysis of parameter configurations refine the approach further and show that the model adapts well to audio processing tasks. {% cite illium2021visual %}
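To make the idea of overlapping vertical patching more concrete, here is a minimal, dependency-free sketch. It assumes a spectrogram stored as a list of mel-bin rows, each holding one value per time frame, and cuts it into full-height strips along the time axis. The function names, the patch width, and the stride are illustrative assumptions and are not taken from the paper; a stride smaller than the patch width is what makes neighbouring patches overlap.

```python
def vertical_patch_spans(n_frames, patch_width, stride):
    """Return (start, end) frame indices of full-height patches along time.

    Hypothetical helper: a stride smaller than patch_width yields overlap.
    Trailing frames that do not fill a whole patch are dropped.
    """
    if patch_width <= 0 or stride <= 0:
        raise ValueError("patch_width and stride must be positive")
    if patch_width > n_frames:
        raise ValueError("patch_width exceeds the number of frames")
    return [(start, start + patch_width)
            for start in range(0, n_frames - patch_width + 1, stride)]


def extract_vertical_patches(spec, patch_width, stride):
    """Cut a spectrogram (list of mel-bin rows) into vertical patches.

    Each patch keeps every mel bin and covers patch_width time frames.
    """
    n_frames = len(spec[0])
    spans = vertical_patch_spans(n_frames, patch_width, stride)
    return [[row[start:end] for row in spec] for start, end in spans]


# Toy spectrogram: 4 mel bins x 10 frames, each value is its frame index.
toy_spec = [list(range(10)) for _ in range(4)]
patches = extract_vertical_patches(toy_spec, patch_width=4, stride=2)
```

With a patch width of 4 and a stride of 2, consecutive patches share two frames. In a transformer pipeline, each patch would then typically be flattened and linearly projected into a token embedding.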
