Classify Images with a Vision Transformer (ViT): PyTorch Deep Learning Tutorial
Author: Luke Ditria
Uploaded: 2024-06-19
Views: 4674
TIMESTAMPS
00:00 Introduction
00:28 Overview of Vision Transformers
00:43 Reference to "An Image is Worth 16x16 Words" Paper
01:50 Comparison with CNNs
03:00 Explanation of Transformer Blocks
04:41 Network Implementation
05:18 Forward Pass
07:43 Model Instantiation
08:19 Training Process
08:52 Training Results
09:12 Significance of Vision Transformers
09:31 Visualization of Positional Embeddings
10:30 Future Directions and Conclusion
In this PyTorch tutorial video I introduce the Vision Transformer model! By simply splitting an image into patches, we can use encoder-only Transformers to perform image classification!
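As a rough illustration of the patch idea, here is a minimal sketch and not the code from the video or the linked repository. The names `image_to_patches` and `TinyViT`, the layer sizes, and the mean-pooling step before the classifier are all assumptions made for this example; the paper itself prepends a learnable class token instead of pooling.

```python
import torch
import torch.nn as nn


def image_to_patches(images: torch.Tensor, patch_size: int) -> torch.Tensor:
    """Split (B, C, H, W) images into flattened, non-overlapping patches.

    Returns a tensor of shape (B, num_patches, C * patch_size * patch_size).
    """
    _, _, h, w = images.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image size must be divisible by patch size"
    # With stride == kernel_size, unfold extracts non-overlapping blocks.
    patches = nn.functional.unfold(images, kernel_size=patch_size, stride=patch_size)
    # unfold returns (B, C * p * p, num_patches); move the sequence dimension to position 1.
    return patches.transpose(1, 2)


class TinyViT(nn.Module):
    """Hypothetical, deliberately small encoder-only classifier (illustrative sizes)."""

    def __init__(self, image_size=32, patch_size=4, channels=3,
                 hidden=64, heads=4, layers=2, num_classes=10):
        super().__init__()
        self.patch_size = patch_size
        num_patches = (image_size // patch_size) ** 2
        # Linear projection of each flattened patch to the model width.
        self.patch_proj = nn.Linear(channels * patch_size ** 2, hidden)
        # Learned positional embedding, one vector per patch position.
        self.pos_embedding = nn.Parameter(torch.randn(1, num_patches, hidden) * 0.02)
        block = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=heads, dim_feedforward=hidden * 2, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(block, num_layers=layers)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        tokens = self.patch_proj(image_to_patches(images, self.patch_size))
        tokens = tokens + self.pos_embedding
        encoded = self.encoder(tokens)
        # Assumption: average over patch tokens instead of using a class token.
        return self.head(encoded.mean(dim=1))


if __name__ == "__main__":
    batch = torch.randn(2, 3, 32, 32)           # two random 32x32 RGB "images"
    print(image_to_patches(batch, 4).shape)     # patches: (batch, num_patches, patch_dim)
    print(TinyViT()(batch).shape)               # logits: (batch, num_classes)
```

A 32x32 image with 4x4 patches gives (32 / 4)^2 = 64 tokens, each a flattened vector of 3 * 4 * 4 = 48 values, so the encoder sees the image as a sequence of 64 "words".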
An Image is Worth 16x16 Words:
https://arxiv.org/pdf/2010.11929
Donations, Help Support this work!
https://www.buymeacoffee.com/lukeditria
The corresponding code is available here! (Section 14)
https://github.com/LukeDitria/pytorch...
Discord Server:
/ discord