Build your own Voice Generator w/ Tacotron 2: Text-To-Speech (TTS) From Scratch
Author: Priyam Mazumdar
Uploaded: 2025-10-12
Views: 1138
Code: https://github.com/priyammaz/PyTorch-...
Building neural networks that can speak has been a crucial part of making artificial intelligence more interactive. This technology shows up everywhere, in assistants like Siri and Alexa. So let's see how these Text-To-Speech (TTS) systems actually work! Today we focus on one of the earlier neural TTS models, Tacotron2!
Please take a look at the following references as well!
Kaituoxu: https://github.com/kaituoxu/Tacotron2...
Nvidia: https://github.com/NVIDIA/tacotron2/t...
Prereqs:
I hope you already know some audio basics! If not, take a look at my audio processing fundamentals video: Intro to Audio Processing for Deep Learning
Timestamps:
00:00:00 - Introduction to TTS and Tacotron2
00:07:50 - LJSpeech Dataset
00:14:00 - Character Tokenizer
00:19:30 - Method to load/normalize audio
00:28:22 - Compute Mel Spectrograms from Audio
00:45:30 - Inverse Mel Specs to Waveforms w/ Griffin Lim
00:51:10 - Write the TTS Dataset class
00:54:50 - Write the data collator
01:06:30 - BatchSampler for Efficiency
01:12:20 - Start the Tacotron Model
01:13:00 - Linear and Conv layers w/ Custom Inits
01:16:38 - Character Encoder
01:26:40 - Prenet Mel Projection
01:32:00 - What is Location Sensitive Attention (Bahdanau Attention)
01:42:45 - Implementing Location Sensitive Attention
01:59:00 - Postnet to learn residuals
02:00:45 - Setup Decoder Module
02:05:37 - Initializing the Decoder
02:10:06 - Define a single decoding step
02:19:30 - Forward method through T_dec decoding steps
02:22:55 - What is Teacher Forcing?
02:25:35 - Complete the forward loop and store outputs
02:28:00 - Write an inference method
02:31:55 - Draw the forward pass out
02:39:21 - Final Tacotron Model
02:40:10 - Write a Training Script
02:49:22 - Debugging
02:52:50 - Alignments Results
02:57:16 - Listen to Generations
03:00:25 - Limits of Griffin Lim and Neural Vocoders
Socials!
X / data_adventurer
Instagram / nixielights
Linkedin / priyammaz
Discord / discord
🚀 Github: https://github.com/priyammaz
🌐 Website: https://www.priyammazumdar.com/