Deep Audio Representation Learning for Music Using Weak Supervision

Author: MusicTechnologyGroup

Uploaded: 2024-10-14

Views: 263

Description:

PhD thesis defense of Pablo Alonso
October 3rd, 2024

Abstract:

Music audio tagging is the Music Information Retrieval task of assigning one or multiple labels to an audio signal. Music tagging systems are essential for developing applications involving cataloging, retrieval, or recommendation, so enhancing the accuracy, robustness, and efficiency of these models is beneficial for many real-world music applications. Current state-of-the-art music tagging systems rely on deep learning approaches, which offer high performance but also introduce challenges due to their large data requirements and tendency to overfit. In this thesis, we propose addressing music tagging from the perspective of representation learning to alleviate these limitations.

The goal of representation learning is to design pre-training objectives that make the learned representations suitable for several downstream tasks. When the representations are well-suited to the downstream task, it is often possible to achieve good performance using shallow models that require few resources to train and run. Additionally, using a single representation model to feed several shallow models is more efficient than having individual end-to-end models for each task, and enables addressing new related tasks with little additional effort.
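As a rough illustration of the "one representation model, several shallow models" idea, the sketch below computes embeddings once and trains a small logistic-regression head per task on top of them. Everything here is an assumption made for illustration: the frozen model is a fixed random projection rather than a real pre-trained network, the inputs and the two tasks are synthetic, and the names `embed`, `train_shallow_head`, `task_a`, and `task_b` are hypothetical, not taken from the thesis or from Essentia.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pre-trained representation model (hypothetical):
# a fixed random projection followed by a tanh nonlinearity.
W_frozen = rng.normal(size=(64, 16))

def embed(features):
    """Map input features (n, 64) to embeddings (n, 16); weights are never updated."""
    return np.tanh(features @ W_frozen / 8.0)

def train_shallow_head(emb, labels, lr=0.5, steps=500):
    """Fit a logistic-regression head on fixed embeddings with gradient descent."""
    w = np.zeros(emb.shape[1])
    b = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(emb @ w + b)))
        grad = probs - labels
        w -= lr * emb.T @ grad / len(labels)
        b -= lr * grad.mean()
    return w, b

def predict(emb, head):
    """Return binary predictions from a trained (weights, bias) head."""
    w, b = head
    return (emb @ w + b > 0).astype(int)

# Synthetic input features and two toy downstream tasks (assumptions).
features = rng.normal(size=(400, 64))
embeddings = embed(features)  # computed once, shared by every task

task_labels = {
    "task_a": (embeddings[:, 0] > 0).astype(int),
    "task_b": (embeddings[:, 1] + embeddings[:, 2] > 0).astype(int),
}

# One cheap head per task; the representation model is not retrained.
heads = {name: train_shallow_head(embeddings, y) for name, y in task_labels.items()}

# Training-set accuracy only, enough to show the heads fit their tasks.
accuracy = {
    name: float((predict(embeddings, heads[name]) == y).mean())
    for name, y in task_labels.items()
}
```

Adding a new task in this setup means training one more head on the cached `embeddings`, which is the efficiency argument made above.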

Our work starts by investigating the representations learned by competitive music and audio tagging systems and evaluating them on out-of-distribution data, finding that pre-trained representations provide generalization benefits. To support the rest of this thesis, we create a large-scale dataset matched to Discogs' open music metadata that we use to develop novel representation models. Then, we investigate the effectiveness of using editorial and consumption metadata (such as artist names and playlists) as a source of supervision, showing that this information favors downstream performance without the need for explicit annotations, which are typically much harder to obtain.
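One common way to turn metadata such as artist names into a training signal is a contrastive objective in which tracks sharing a metadata value are treated as positives. The abstract does not say which objective the thesis uses, so the sketch below is only an assumed formulation (a supervised-contrastive-style loss); the function name `metadata_contrastive_loss`, the temperature value, and the toy data are all hypothetical.

```python
import numpy as np

def metadata_contrastive_loss(embeddings, group_ids, temperature=0.1):
    """Contrastive loss where tracks sharing a group id (e.g. the same artist)
    are positives. Assumes at least one anchor has a positive in the batch."""
    # Cosine similarity between all pairs of tracks, scaled by the temperature.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    n = len(group_ids)
    ids = np.asarray(group_ids)
    not_self = ~np.eye(n, dtype=bool)
    positives = (ids[:, None] == ids[None, :]) & not_self

    # Log-softmax over every other track in the batch (self-pairs excluded).
    sim = np.where(not_self, sim, -np.inf)
    sim_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - sim_max - np.log(np.exp(sim - sim_max).sum(axis=1, keepdims=True))

    # Average the log-probability of the positives for each anchor that has any.
    has_pos = positives.any(axis=1)
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_log_prob[has_pos] / positives.sum(axis=1)[has_pos]
    return float(per_anchor.mean())

# Toy batch: the first two embeddings are close, as are the last two.
toy_embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])

# Grouping that matches the embedding geometry versus one that contradicts it.
loss_matching = metadata_contrastive_loss(toy_embeddings, ["a", "a", "b", "b"])
loss_mismatched = metadata_contrastive_loss(toy_embeddings, ["a", "b", "a", "b"])
```

The loss is lower when tracks with the same label already sit close together, so minimizing it pulls same-artist tracks together without any hand-made tag annotations.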

After this, we look into the transformer architecture, proposing design choices that optimize its performance for music representation learning. In our last contribution, we propose adapting existing audio interpretability strategies to operate with pretrained representations, thus contributing to more insightful music classification models.

Finally, this work is carried out in the context of Essentia, an open-source library and collection of models for audio and music analysis. The techniques and models developed in this thesis are openly available as part of Essentia and have already been used both by the research community and industry.
