Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing

Author: Mayuresh Shilotri

Uploaded: 2026-01-12

Views: 0

Description:

Paper: https://arxiv.org/abs/2509.08721v1

Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing

Jeffrey Amico, Gabriel Passamani Andrade, John Donaghy, Ben Fielding, Tristin Forbus, Harry Grieve, Semih Kara, Jari Kolehmainen, Yihua Lou, Christopher Nies, Edward Phillip Flores Nuño, Diogo Ortega, Shikhar Rastogi, Austin Virts, Matthew J. Wright

Post-training language models (LMs) with reinforcement learning (RL) can enhance their complex reasoning capabilities without supervised fine-tuning, as demonstrated by DeepSeek-R1-Zero. However, effectively utilizing RL for LMs requires significant parallelization to scale up inference, which introduces non-trivial technical challenges (e.g. latency, memory, and reliability) alongside ever-growing financial costs. We present Swarm sAmpling Policy Optimization (SAPO), a fully decentralized and asynchronous RL post-training algorithm. SAPO is designed for decentralized networks of heterogeneous compute nodes, where each node manages its own policy model(s) while "sharing" rollouts with others in the network; no explicit assumptions about latency, model homogeneity, or hardware are required, and nodes can operate in a silo if desired. As a result, the algorithm avoids common bottlenecks in scaling RL post-training while also allowing (and even encouraging) new possibilities. By sampling rollouts "shared" across the network, it enables "Aha moments" to propagate, thereby bootstrapping the learning process. In this paper we show that SAPO achieved cumulative reward gains of up to 94% in controlled experiments. We also share insights from tests on a network with thousands of nodes contributed by Gensyn community members running the algorithm on diverse hardware and models during an open-source demo.
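The core idea in the abstract, each node mixing its own rollouts with rollouts "shared" by other nodes, can be illustrated with a small sketch. This is not the authors' implementation: the `Rollout` fields, the `build_training_batch` helper, and the `n_local` / `n_external` parameters are hypothetical names chosen for illustration, and uniform random sampling is an assumption (the abstract does not say how shared rollouts are selected).

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Rollout:
    """Hypothetical record of one generated completion and its reward."""
    node_id: str      # node that generated the rollout
    prompt: str
    completion: str
    reward: float


def build_training_batch(local, shared_pool, n_local, n_external, rng):
    """Mix a node's own rollouts with rollouts sampled from its peers.

    Assumption: uniform sampling without replacement from both sources.
    """
    own_ids = {r.node_id for r in local}
    # Only rollouts produced by other nodes count as external experience.
    external = [r for r in shared_pool if r.node_id not in own_ids]
    # With no peers (or n_external == 0) the node simply trains in isolation.
    k_local = min(n_local, len(local))
    k_external = min(n_external, len(external))
    return rng.sample(local, k_local) + rng.sample(external, k_external)
```

A node would call something like `build_training_batch(my_rollouts, pool, 4, 4, random.Random(0))` before each policy update. Setting `n_external` to 0 corresponds to the "operate in a silo" case mentioned in the abstract.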

Welcome to Mayuresh Shilotri's YouTube channel, maintained by Mayuresh Shilotri.

You can follow me at
Blog - https://shilotri.com/
LinkedIn -   / mayureshshilotri  
Twitter -   / mshilotri  

Note: I only claim to have read the research paper and created a video using an AI tool. I am not the author. All intellectual heavy lifting was performed by the respective authors. 🙏
