🧪 Live Demo: Training LLMs with RFT in the Predibase SDK (Tool Use + Reward Functions Explained)

Author: Predibase

Uploaded: 2025-04-08

Views: 167

Description:

In this live walkthrough, we show you exactly how to train an LLM using Reinforcement Fine-Tuning (RFT) in the Predibase SDK—and how to monitor performance using our built-in observability tools.

You'll learn how to:
✅ Set up your dataset and prompts
✅ Define custom reward functions for correctness, format, and length
✅ Use the Predibase SDK to launch a fine-tuning job (sketched below)
✅ View reward graphs, logs, and completions in the RFT dashboard
✅ Update reward functions live during training — no restart needed!
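
For orientation, here is a minimal sketch of the launch step in Python. The SDK surface shown (the config fields, the reward-function signature, pb.adapters.create) is assumed from the walkthrough, not verified; treat every name as illustrative and check the current Predibase docs before use.

# A minimal sketch of launching an RFT job, assuming SDK names from
# the video; names are illustrative, not the confirmed API.
from predibase import Predibase

pb = Predibase(api_token="<YOUR_API_TOKEN>")

def correctness_reward(prompt: str, completion: str, example: dict) -> float:
    # Reward 1.0 when the completion names the tool the label expects.
    # "expected_tool" is a hypothetical dataset column for this sketch.
    return 1.0 if example["expected_tool"] in completion else 0.0

adapter = pb.adapters.create(
    config={
        "base_model": "qwen2-5-7b-instruct",  # assumed base model
        "task": "grpo",                       # RFT via GRPO (assumed key)
        "reward_fns": {"correctness": correctness_reward},
    },
    dataset="glaive_function_calling",  # assumed dataset name
    repo="rft-tool-use-demo",
)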

We walk through a real-world function calling task using the Glaive dataset, where the model must select the correct tool based on a user prompt (e.g., get stock price, create calendar event).
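
For concreteness, a Glaive-style record for this task might look like the following; the field names are illustrative rather than the dataset's exact schema.

# Illustrative function-calling record (field names assumed): the model
# sees the available tools plus the user prompt, and the reward
# functions grade its tool choice against the labeled call.
example = {
    "tools": [
        {"name": "get_stock_price", "parameters": {"symbol": "string"}},
        {"name": "create_calendar_event", "parameters": {"title": "string", "date": "string"}},
    ],
    "user": "What is Apple trading at right now?",
    "expected_tool_call": {"name": "get_stock_price", "arguments": {"symbol": "AAPL"}},
}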

🔍 Unlike traditional SFT, RFT lets you define flexible, dynamic rules (e.g., Think/Tool tags, argument parsing, completion length) and reward models accordingly — even with minimal labeled data.
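
As one concrete example of such a rule, here is a format reward that checks for think/tool-call tags and JSON-parseable arguments. The tag names and point values are assumptions for illustration, not the exact functions written in the video.

import json
import re

def format_reward(prompt: str, completion: str, example: dict) -> float:
    # Grade structure only; tag names and the 0.5/0.5 split are assumed.
    score = 0.0
    if re.search(r"<think>.*?</think>", completion, re.DOTALL):
        score += 0.5  # reasoning wrapped in think tags
    tool = re.search(r"<tool_call>(.*?)</tool_call>", completion, re.DOTALL)
    if tool:
        try:
            json.loads(tool.group(1))  # arguments must parse as JSON
            score += 0.5
        except json.JSONDecodeError:
            pass
    return score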

This is a must-watch if you're:
💡 Building agentic systems
💻 Customizing open-source LLMs
⚙️ Designing robust inference + training stacks
📉 Looking to reduce data labeling costs

👉 Try Reinforcement Fine-Tuning on your own task: https://pbase.ai/4brbC8u
🔗 Watch the full video and get a notebook link: 🔥 Live Demo: Reinforcement Fine-Tuning for...
🔔 SUBSCRIBE for the latest on LLM fine-tuning, AI scaling, and reinforcement learning hacks!
👉 /@predibase
👉 Schedule a live demo: https://pbase.ai/41FZKfy
👉 Learn more: https://pbase.ai/Intro-RFT-platform

00:00 - Intro: Setting up RFT in the Predibase SDK
01:15 - Loading the Glaive function calling dataset
02:05 - Prompt and tool call structure explained
03:00 - What makes a reward function in Predibase
04:30 - Writing the correctness reward function (Python)
06:15 - Writing the formatting reward function
07:35 - Adding a completion length constraint
08:50 - Launching the RFT job via the SDK
10:00 - Defining GRPO config and training parameters
11:45 - Assigning and packaging reward functions
12:30 - Job launched! Switching to the UI
13:15 - Exploring the Reward Functions tab
14:25 - Viewing Reward Graphs and interpreting metrics
15:35 - Using logs to debug your reward functions
16:40 - Completions Viewer: Compare model generations by epoch
18:10 - Updating a reward function live during training
19:25 - Adding a more flexible length function (sliding scale, sketched below)
20:30 - Pushing live updates to running RFT job
21:10 - Summary: Why Predibase RFT simplifies LLM fine-tuning
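
To accompany the 19:25 chapter, here is one plausible shape for the sliding-scale length function: full credit up to a target length, then a linear decay to zero. The 256/1024 character budgets are invented for illustration, not the values used in the demo.

def length_reward(prompt: str, completion: str, example: dict) -> float:
    # Sliding-scale length reward; budgets are illustrative.
    target_len, max_len = 256, 1024
    n = len(completion)
    if n <= target_len:
        return 1.0
    if n >= max_len:
        return 0.0
    return 1.0 - (n - target_len) / (max_len - target_len)

Because reward functions can be swapped into a running job (18:10, 20:30), a smooth function like this can replace a hard cutoff without restarting training.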

#predibase #rft #llmtraining #functioncalling #reinforcementfinetuning #ai #machinelearning #mlengineering #rlhf #opensourcellms #customllm #fewshotlearning #agenticai #pythonai #AIDevTools #llmops #observability #RewardFunctions #finetuning #dataefficiency #aiinfrastructure

Related videos

• 🔥 Live Demo: Reinforcement Fine-Tuning for LLMs — Build Smarter Models with Less Data l Tutorial (Predibase, 2 months ago)
• CREATE Your Own Dataset Like a Pro in 30 mins (Prompt Engineer, 8 months ago)
• Microsoft Sentinel: Create Your First Analytic Rule (Step-by-Step Guide) (IT Professor, 52 minutes ago)
• A Complete Guide To Vercel’s AI SDK // The ESSENTIAL Tool For Shipping AI Apps (Matt Pocock, 5 months ago)
• What is Reinforcement Fine-Tuning (RFT) - Supervised vs. RL LLM Re-training (What's AI by Louis-François Bouchard, 3 months ago)
• How does MCP improve Cursor AI 10x? And what is it, anyway? (Ivan Abramov: startup breakdowns | growth hacks, 2 months ago)
• DeepSeek R1 Theory Overview | GRPO + RL + SFT (Deep Learning with Yacine, 4 months ago)
• RAG | THE CLEAREST EXPLANATION! (AI RANEZ, 1 month ago)
• Andrej Karpathy: Software Is Changing (Again) (Y Combinator, 3 days ago)
• FASTEST Finetuning with Unsloth in 30 Minutes – Real World Example Fine Tuning SQUAD Dataset (DS-AI with Khanh Vy, 2 months ago)
