website/_posts/research/2021-03-05-Vision_Transformer.md

---
layout: single
title:  "Mel-Vision Transformer"
categories: research audio deep-learning anomalie-detection
excerpt: "Attention-based audio classification on mel-spectrograms"
header:
  teaser: assets/figures/12_vision_transformer_teaser.jpg
---

![Approach](/assets/figures/12_vision_transformer_models.jpg){:style="display:block; width:80%" .align-center}

This work applies the Vision Transformer to mel-spectrogram audio data. Combined with mel-based data augmentation and sample weighting, the model performs well in the ComParE21 challenge and surpasses many single-model baselines. The work also introduces overlapping vertical patching and analyzes parameter configurations, which further refine the approach and show that the model adapts well to audio processing tasks.
{% cite illium2021visual %}
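
To make the idea of overlapping vertical patching more concrete, here is a minimal sketch in NumPy. It is not the implementation from the paper: the function name `vertical_patches`, the parameters `patch_width` and `stride`, and the toy input shape are all assumptions chosen for illustration. The sketch assumes that a "vertical" patch spans the full mel (frequency) axis and a short window of time frames, and that overlap comes from using a stride smaller than the patch width.

```python
import numpy as np


def vertical_patches(mel, patch_width, stride):
    """Split a (n_mels, n_frames) spectrogram into full-height vertical patches.

    Hypothetical helper for illustration only. Each patch covers all mel bins
    and `patch_width` consecutive frames; a `stride` smaller than
    `patch_width` makes neighbouring patches overlap in time.
    """
    n_mels, n_frames = mel.shape
    if patch_width > n_frames:
        raise ValueError("patch_width must not exceed the number of frames")
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    # Start indices of every patch that fits completely into the spectrogram.
    starts = range(0, n_frames - patch_width + 1, stride)
    return np.stack([mel[:, s:s + patch_width] for s in starts])


# Toy "spectrogram": 4 mel bins, 10 time frames (assumed sizes).
mel = np.arange(4 * 10, dtype=np.float32).reshape(4, 10)

# Patches of width 4 with stride 2, so neighbours share 2 frames.
patches = vertical_patches(mel, patch_width=4, stride=2)

# Flatten each patch into one token vector, as a transformer would
# typically do before applying a linear embedding.
tokens = patches.reshape(len(patches), -1)
```

With these toy values, `patches` has shape `(4, 4, 4)` (four patches, each 4 mel bins by 4 frames) and `tokens` has shape `(4, 16)`. Setting `stride` equal to `patch_width` would give the non-overlapping case.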