Deep Learning I | Minnesota Supercomputing Institute | UMN
Author: Minnesota Supercomputing Institute | UMN
Uploaded: 2025-10-15
Views: 72
Topic: Deep Learning I
Date: October 14, 2025
Presentation Slides and Training Materials: https://tinyurl.com/4uz6d2rv
Learn more at https://www.msi.umn.edu
00:00:00 Introduction to Deep Learning at MSI
00:00:27 Prerequisites (MSI account, Linux, Bash, Python)
00:02:40 Deep Learning Workflows (Data Processing, Model Training, Inference)
00:05:07 Hardware and Performance Considerations
00:05:22 NVIDIA GPUs (V100 and H100)
00:06:58 Deep Learning Software Stack (CUDA, cuDNN)
00:08:42 Using Modules to load the Deep Learning environment
00:11:45 Python and Conda Environments (Miniconda)
00:15:37 Installing PyTorch
00:17:30 Getting the tutorial code and data (wget, unzip)
00:21:02 How to request a GPU for interactive work on the Slurm cluster
00:22:19 Using srun for interactive GPU sessions
00:27:07 Overview of the Training Script (main.py)
00:28:44 Training the model: Running a multi-GPU job with srun
00:32:00 Discussion: Data loading and performance
00:34:30 Saving and loading model weights
00:36:20 Using a Slurm Batch Script for large jobs
00:42:00 Running the Batch Script (sbatch)
00:44:00 Monitoring the job output
00:45:50 The Inference step: Using the trained model
00:48:40 Using Open OnDemand to access the cluster
00:50:35 Accessing Jupyter Notebooks through Open OnDemand
00:54:19 Demo: Running PyTorch in a Jupyter Notebook
01:00:25 Creating and using a custom Conda Kernel for Jupyter
01:03:00 Demo: Configuring the Conda Kernel
01:08:00 Running the training code in the Jupyter Notebook
01:13:30 Model Inference in the Jupyter Notebook
01:15:40 Transfer Learning (Concept and Code)
01:17:15 Freezing layers in Transfer Learning
01:21:05 Hyperparameter Tuning with Ray Tune (Example Overview)
01:24:50 Using the Ray Tune client for remote execution
01:28:40 Introduction to Distributed Deep Learning
01:31:00 Data Parallelism vs. Model Parallelism
01:32:50 DDP (Distributed Data Parallel) with torch.distributed
01:35:45 Scaling and Multi-node considerations
01:36:30 Running the DDP Demo
01:40:00 DDP Slurm script
01:42:00 The Future of Deep Learning at MSI (H100 and Agate)
01:44:10 Monitoring Tools (e.g., Weights & Biases)
01:45:45 Computer Vision Example (Point Cloud Data)
01:51:30 MSI Resources and Help