Stanford CS25: V1 I Audio Research: Transformers for Applications in Audio, Speech, Music

Author: Stanford Online

Uploaded: 2022-07-18

Views: 12808

Description:

Transformers have touched many fields of research, and audio and music are no different. This talk presents three of my papers as a case study of how we can combine the power of Transformers with representation learning, signal processing, and clustering. In the first part, we discuss how we were able to beat the wildly popular WaveNet architecture, proposed by Google DeepMind, at raw audio synthesis, and how we sidestepped the quadratic constraint of Transformers by conditioning on the context itself. In the second part, a version of Audio Transformers for large-scale audio understanding is presented, inspired by ViT and operating on raw waveforms. It combines ideas from traditional signal processing, applying wavelets to intermediate transformer embeddings, to produce state-of-the-art results. Investigating the front end to see why these models do so well, we show that they learn an auditory filter bank that adapts its time-frequency representation to the task, which makes machine listening really cool. Finally, the third part discusses the power of operating on latent codes, and language modeling on continuous audio signals using discrete tokens. It describes how simple unsupervised tasks can give strong results, competitive with end-to-end supervision. We also give an overview of recent trends in the field and papers by Google, OpenAI, etc. about the current “fashion”. This work was done in collaboration with Prof. Chris Chafe, Prof. Jonathan Berger, and Prof. Julius Smith, all at the Center for Computer Research in Music and Acoustics at Stanford University. Thanks to Stanford's Human-Centered AI institute for supporting this work with a generous Google Cloud computing grant.
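The third part of the talk turns continuous audio into discrete tokens so a language model can operate on them. A minimal sketch of the core step, nearest-codebook vector quantization (the function name, the toy codebook, and the frame embeddings below are all illustrative, not from the talk):

```python
import numpy as np

def vector_quantize(embeddings, codebook):
    # Distance from every embedding (N, D) to every codebook entry (K, D)
    dists = np.linalg.norm(embeddings[:, None, :] - codebook[None, :, :], axis=-1)
    tokens = dists.argmin(axis=1)   # (N,) discrete token ids
    quantized = codebook[tokens]    # (N, D) nearest codebook vectors
    return tokens, quantized

# Toy codebook of 3 entries in 2-D and 4 "frame embeddings" near them
codebook = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
frames = np.array([[0.1, 0.1], [0.2, 3.9], [3.8, 0.1], [0.0, 0.2]])
tokens, quantized = vector_quantize(frames, codebook)
print(tokens.tolist())  # [0, 2, 1, 0]
```

The resulting token ids are what an autoregressive Transformer would then model, as in the unsupervised tasks the description mentions; in practice the codebook itself is learned (e.g. VQ-VAE style) rather than fixed.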

Prateek Verma is currently a research assistant working with Prof. Anshul Kundaje in the Department of Computer Science and Genomics. He works on modeling genomic sequences using machine learning, tackling long sequences, and developing techniques to understand them. He also splits his time working on audio research at Stanford’s Center for Computer Research in Music and Acoustics, with Prof. Chris Chafe, Prof. Jonathan Berger and Prof. Julius Smith. He got his Master's degree from Stanford, and before that, he was at IIT Bombay. He loves biking, hiking, and playing sports.

View the entire CS25 Transformers United playlist: Stanford CS25 - Transformers United

0:00 Introduction
0:06 Transformers for Music and Audio: Language Modelling to Understanding to Synthesis
1:35 The Transformer Revolution
5:02 Models getting bigger ...
7:43 What are spectrograms
14:30 Raw Audio Synthesis: Difficulty, Classical FM Synthesis, Karplus-Strong
17:14 Baseline : Classic WaveNet
20:04 Improving the Transformer Baseline: Major Bottleneck of Transformers
21:02 Results & Unconditioned Setup: evaluation criterion comparing WaveNet and Transformers on next-sample prediction, with top-5 accuracy out of 256 possible states as the error metric; why this setup: (1) application-agnostic, (2) suits the training setup
22:11 A Framework for Generative and Contrastive Learning of Audio Representations
22:38 Acoustic Scene Understanding
24:34 Recipe of doing
26:00 Turbocharging the Best of Two Worlds: Vector Quantization, a powerful and under-utilized algorithm; combining VQ with auto-encoders and Transformers
33:24 Turbocharging the Best of Two Worlds: learning clusters from vector quantization; using long-term dependency learning with that cluster-based representation for a Markovian assumption; the better we become at prediction, the better the summarization
37:06 Audio Transformers: Transformer Architectures for Large-Scale Audio Understanding - Adieu Convolutions (Stanford University, March 2021)
38:45 Wavelets on Transformer Embeddings
41:20 Methodology + Results
44:04 What does it learn: the front end
47:18 Final Thoughts
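The 7:43 chapter covers spectrograms, the time-frequency representation the talk's front-end discussion builds on. A minimal sketch of computing a magnitude spectrogram with a short-time Fourier transform (the function and parameter values here are illustrative, not the talk's actual pipeline):

```python
import numpy as np

def spectrogram(signal, n_fft=256, hop=128):
    # Slice the signal into overlapping frames, apply a Hann window,
    # and keep the magnitude of the positive-frequency FFT bins.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_fft // 2 + 1)

# One second of a 1 kHz sine at a 16 kHz sample rate: with n_fft = 256
# the bin spacing is 16000 / 256 = 62.5 Hz, so energy lands in bin 16.
sr = 16000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 1000 * t))
peak_bin = spec.mean(axis=0).argmax()
print(peak_bin)  # 16
```

The fixed window here is exactly what the talk's learned front end replaces: instead of one hand-picked time-frequency trade-off, the model adapts its filter bank to the task.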
