2.3. Tutorial on LLM evaluation methods: Reference-free evals.

Author: Evidently AI

Uploaded: 2025-05-11

Views: 1045

Description:

Notebook example: https://github.com/evidentlyai/commun...

Part 1:    • 2.3. Tutorial on LLM evaluation methods: R...
Part 2:    • 2.2. Tutorial on LLM evaluation methods: R...

00:02 Intro and data prep
01:00 Regular expressions
02:44 Text statistics
04:20 Semantic similarity (proxy for Relevance and Hallucinations)
06:09 Using ML models (Sentiment and Topic detection)
09:20 LLM as a judge: custom helpfulness criteria
11:02 LLM as a judge: session-level evals
13:40 Recap
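
The first two methods in the chapter list, regular expressions and text statistics, need no reference answer and no model, so they can be illustrated with the Python standard library alone. The block below is a minimal sketch under stated assumptions: it is not the notebook's code and does not use the Evidently API. The function name `reference_free_checks`, the refusal pattern, and the sample response are all hypothetical.

```python
import re

# Hypothetical pattern: flags common refusal phrases in a response.
REFUSAL_PATTERN = re.compile(
    r"\b(i cannot|i can't|i am unable|i'm unable)\b", re.IGNORECASE
)


def reference_free_checks(response: str) -> dict:
    """Score a single response without comparing it to a reference answer."""
    words = response.split()
    # Naive sentence split on terminal punctuation; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", response) if s.strip()]
    return {
        "contains_refusal": bool(REFUSAL_PATTERN.search(response)),  # regex check
        "char_count": len(response),                                 # text statistic
        "word_count": len(words),                                    # text statistic
        "sentence_count": len(sentences),                            # text statistic
    }


# Hypothetical sample response, used only for illustration.
result = reference_free_checks("I cannot share that. Please contact support!")
print(result)
```

Checks like these are cheap and deterministic, which makes them a reasonable first pass before the model-based methods later in the list (semantic similarity, ML classifiers, LLM judges).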

COURSE PLAYLIST
Full playlist:    • Course: LLM evaluation for builders
Instructor: Elena Samuylova, CEO of Evidently AI.

LINKS:
https://www.evidentlyai.com/llm-guide... LLM evaluation methods and metrics

EVIDENTLY
Sign up for Evidently Cloud https://www.evidentlyai.com/register
Support Evidently on GitHub https://github.com/evidentlyai/evidently
