LLAMP: Assessing Network Latency Sensitivity Tolerance of HPC Applications with Linear Programming

Author: Scalable Parallel Computing Lab, SPCL @ ETH Zurich

Uploaded: 2025-07-06

Views: 137

Description:

Paper Title: LLAMP: Assessing Network Latency Sensitivity Tolerance of HPC Applications with Linear Programming
Conference: SC '24: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis
Speaker: Siyuan Shen
Authors: Siyuan Shen, Langwen Huang, Marcin Chrapek, Timo Schneider, Jai Dayal, Manisha Gajbe, Robert Wisniewski, Torsten Hoefler
Abstract:
The shift towards high-bandwidth networks driven by AI workloads in data centers and HPC clusters has unintentionally aggravated network latency, adversely affecting the performance of communication-intensive HPC applications. As large-scale MPI applications often exhibit significant differences in their network latency tolerance, it is crucial to accurately determine the extent of network latency an application can withstand without significant performance degradation. Current approaches to assessing this metric often rely on specialized hardware or network simulators, which can be inflexible and time-consuming. In response, we introduce LLAMP, a novel toolchain that offers an efficient, analytical approach to evaluating HPC applications' network latency tolerance using the LogGPS model and linear programming. LLAMP equips software developers and network architects with essential insights for optimizing HPC infrastructures and strategically deploying applications to minimize latency impacts. Through our validation on a variety of MPI applications like MILC, LULESH, and LAMMPS, we demonstrate our tool's high accuracy, with relative prediction errors generally below 2%. Additionally, we include a case study of the ICON weather and climate model to illustrate LLAMP's broad applicability in evaluating collective algorithms and network topologies.
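To make the idea of latency tolerance in the abstract concrete, here is a toy sketch in Python. It is not LLAMP's actual formulation or code. It assumes a hypothetical execution graph reduced to three paths, each costing a fixed time plus a number of messages times the network latency L. The path data and the function names `runtime` and `latency_tolerance` are invented for illustration. The tolerance is then the solution of a one-variable linear program: maximize L subject to every path staying within a runtime budget.

```python
# Toy illustration only (hypothetical data, not taken from the paper).
# Each path costs fixed_us + n_messages * L, where L is the network
# latency in microseconds.
PATHS = [
    (100.0, 2),   # compute-heavy path with two messages on it
    (60.0, 10),   # communication-heavy path with ten messages on it
    (110.0, 0),   # purely local path, independent of L
]


def runtime(latency_us, paths=PATHS):
    """Predicted runtime: the length of the longest (critical) path."""
    return max(fixed + n * latency_us for fixed, n in paths)


def latency_tolerance(base_latency_us, slowdown, paths=PATHS):
    """Largest L such that runtime(L) <= (1 + slowdown) * runtime(base).

    Equivalent to the one-variable LP:
        maximize L
        subject to fixed_p + n_p * L <= budget   for every path p
    which has the closed-form solution computed below.
    """
    budget = (1.0 + slowdown) * runtime(base_latency_us, paths)
    # Paths without messages do not constrain L at all.
    bounds = [(budget - fixed) / n for fixed, n in paths if n > 0]
    return min(bounds) if bounds else float("inf")


if __name__ == "__main__":
    base = 1.0  # assumed baseline latency in microseconds
    tol = latency_tolerance(base, slowdown=0.05)
    print(f"baseline runtime: {runtime(base):.2f} us")
    print(f"latency tolerated for a 5% slowdown: {tol:.2f} us")
```

In this toy case the local path is critical at the baseline latency, so latency can grow for a while at no cost. The communication-heavy path is the first to exhaust the 5% budget and therefore determines the tolerance. A real execution graph has far too many paths to enumerate, which is presumably where a general LP solver becomes necessary.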

Learn more: conference paper: https://dl.acm.org/doi/10.1109/SC4140...

#SC24 #HPC #LinearProgramming #SensitivityAnalysis #PerformanceModeling

Timestamps:
00:00 Introduction
05:33 LLAMP Toolchain
10:20 Network Latency Sensitivity
14:47 Linear Programming
21:15 Evaluation
24:24 Conclusion
25:23 Q&A

