🧪 Live Demo: Training LLMs with RFT in the Predibase SDK (Tool Use + Reward Functions Explained)

Author: Predibase

Uploaded: 2025-04-08

Views: 167

Description:

In this live walkthrough, we show you exactly how to train an LLM using Reinforcement Fine-Tuning (RFT) in the Predibase SDK—and how to monitor performance using our built-in observability tools.

You'll learn how to:
✅ Set up your dataset and prompts
✅ Define custom reward functions for correctness, format, and length
✅ Use the Predibase SDK to launch a fine-tuning job (sketched below)
✅ View reward graphs, logs, and completions in the RFT dashboard
✅ Update reward functions live during training — no restart needed!
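
For orientation, here is a minimal sketch of the launch step in Python. The SDK surface shown (the config fields, the reward-function signature, pb.adapters.create) is assumed from the walkthrough, not verified; treat every name as illustrative and check the current Predibase docs before use.

# A minimal sketch of launching an RFT job, assuming SDK names from
# the video; names are illustrative, not the confirmed API.
from predibase import Predibase

pb = Predibase(api_token="<YOUR_API_TOKEN>")

def correctness_reward(prompt: str, completion: str, example: dict) -> float:
    # Reward 1.0 when the completion names the tool the label expects.
    # "expected_tool" is a hypothetical dataset column for this sketch.
    return 1.0 if example["expected_tool"] in completion else 0.0

adapter = pb.adapters.create(
    config={
        "base_model": "qwen2-5-7b-instruct",  # assumed base model
        "task": "grpo",                       # RFT via GRPO (assumed key)
        "reward_fns": {"correctness": correctness_reward},
    },
    dataset="glaive_function_calling",  # assumed dataset name
    repo="rft-tool-use-demo",
)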

We walk through a real-world function calling task using the Glaive dataset, where the model must select the correct tool based on a user prompt (e.g., get stock price, create calendar event).
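
For concreteness, a Glaive-style record for this task might look like the following; the field names are illustrative rather than the dataset's exact schema.

# Illustrative function-calling record (field names assumed): the model
# sees the available tools plus the user prompt, and the reward
# functions grade its tool choice against the labeled call.
example = {
    "tools": [
        {"name": "get_stock_price", "parameters": {"symbol": "string"}},
        {"name": "create_calendar_event", "parameters": {"title": "string", "date": "string"}},
    ],
    "user": "What is Apple trading at right now?",
    "expected_tool_call": {"name": "get_stock_price", "arguments": {"symbol": "AAPL"}},
}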

🔍 Unlike traditional SFT, RFT lets you define flexible, dynamic rules (e.g., Think/Tool tags, argument parsing, completion length) and reward models accordingly — even with minimal labeled data.
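
As one concrete example of such a rule, here is a format reward that checks for think/tool-call tags and JSON-parseable arguments. The tag names and point values are assumptions for illustration, not the exact functions written in the video.

import json
import re

def format_reward(prompt: str, completion: str, example: dict) -> float:
    # Grade structure only; tag names and the 0.5/0.5 split are assumed.
    score = 0.0
    if re.search(r"<think>.*?</think>", completion, re.DOTALL):
        score += 0.5  # reasoning wrapped in think tags
    tool = re.search(r"<tool_call>(.*?)</tool_call>", completion, re.DOTALL)
    if tool:
        try:
            json.loads(tool.group(1))  # arguments must parse as JSON
            score += 0.5
        except json.JSONDecodeError:
            pass
    return score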

This is a must-watch if you're:
💡 Building agentic systems
💻 Customizing open-source LLMs
⚙️ Designing robust inference + training stacks
📉 Looking to reduce data labeling costs

👉 Try Reinforcement Fine-Tuning on your own task: https://pbase.ai/4brbC8u
🔗 Watch the full video and get a notebook link: 🔥 Live Demo: Reinforcement Fine-Tuning for...
🔔 SUBSCRIBE for the latest on LLM fine-tuning, AI scaling, and reinforcement learning hacks!
👉 /@predibase
👉 Schedule a live demo: https://pbase.ai/41FZKfy
👉 Learn more: https://pbase.ai/Intro-RFT-platform

00:00 - Intro: Setting up RFT in the Predibase SDK
01:15 - Loading the Glaive function calling dataset
02:05 - Prompt and tool call structure explained
03:00 - What makes a reward function in Predibase
04:30 - Writing the correctness reward function (Python)
06:15 - Writing the formatting reward function
07:35 - Adding a completion length constraint
08:50 - Launching the RFT job via the SDK
10:00 - Defining GRPO config and training parameters
11:45 - Assigning and packaging reward functions
12:30 - Job launched! Switching to the UI
13:15 - Exploring the Reward Functions tab
14:25 - Viewing Reward Graphs and interpreting metrics
15:35 - Using logs to debug your reward functions
16:40 - Completions Viewer: Compare model generations by epoch
18:10 - Updating a reward function live during training
19:25 - Adding a more flexible length function (sliding scale, sketched below)
20:30 - Pushing live updates to running RFT job
21:10 - Summary: Why Predibase RFT simplifies LLM fine-tuning
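
To accompany the 19:25 chapter, here is one plausible shape for the sliding-scale length function: full credit up to a target length, then a linear decay to zero. The 256/1024 character budgets are invented for illustration, not the values used in the demo.

def length_reward(prompt: str, completion: str, example: dict) -> float:
    # Sliding-scale length reward; budgets are illustrative.
    target_len, max_len = 256, 1024
    n = len(completion)
    if n <= target_len:
        return 1.0
    if n >= max_len:
        return 0.0
    return 1.0 - (n - target_len) / (max_len - target_len)

Because reward functions can be swapped into a running job (18:10, 20:30), a smooth function like this can replace a hard cutoff without restarting training.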

#predibase #rft #llmtraining #functioncalling #reinforcementfinetuning #ai #machinelearning #mlengineering #rlhf #opensourcellms #customllm #fewshotlearning #agenticai #pythonai #AIDevTools #llmops #observability #RewardFunctions #finetuning #dataefficiency #aiinfrastructure

Related videos

• 🔥 Live Demo: Reinforcement Fine-Tuning for LLMs — Build Smarter Models with Less Data l Tutorial (Predibase, 2 months ago)
• CREATE Your Own Dataset Like a Pro in 30 mins (Prompt Engineer, 8 months ago)
• Microsoft Sentinel: Create Your First Analytic Rule (Step-by-Step Guide) (IT Professor, 52 minutes ago)
• A Complete Guide To Vercel’s AI SDK // The ESSENTIAL Tool For Shipping AI Apps (Matt Pocock, 5 months ago)
• What is Reinforcement Fine-Tuning (RFT) - Supervised vs. RL LLM Re-training (What's AI by Louis-François Bouchard, 3 months ago)
• How does MCP improve Cursor AI 10x? And what is it, anyway? (Ivan Abramov: startup breakdowns | growth hacks, 2 months ago)
• DeepSeek R1 Theory Overview | GRPO + RL + SFT (Deep Learning with Yacine, 4 months ago)
• RAG | THE CLEAREST EXPLANATION! (AI RANEZ, 1 month ago)
• Andrej Karpathy: Software Is Changing (Again) (Y Combinator, 3 days ago)
• FASTEST Finetuning with Unsloth in 30 Minutes – Real World Example Fine Tuning SQUAD Dataset (DS-AI with Khanh Vy, 2 months ago)
