Beyond the Prompt: Evaluating, Testing, and Securing LLM Applications - Mete Atamel - NDC Oslo 2025

Author: NDC Conferences

Uploaded: 2025-07-31

Views: 837

Description:

This talk was recorded at NDC Oslo in Oslo, Norway. #ndcoslo #ndcconferences #developer #softwaredeveloper

Attend the next NDC conference near you:
https://ndcconferences.com
https://ndcoslo.com/


When you change prompts or modify the Retrieval-Augmented Generation (RAG) pipeline in your LLM applications, how do you know the change is making a difference? You don’t, until you measure. But what should you measure, and how? Similarly, how can you ensure your LLM app is resilient against prompt injections and avoids providing harmful responses? Basic safety settings are not enough; more robust guardrails on inputs and outputs are needed. In this talk, we’ll explore evaluation frameworks such as Vertex AI Evaluation, DeepEval, and Promptfoo to assess LLM outputs, look at the types of metrics they offer, and see how those metrics are useful. We’ll also dive into testing and security frameworks like LLM Guard to ensure your LLM apps are safe and limited to precisely what you need.
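As a rough illustration of the "measure before and after" idea in the description, here is a minimal Python sketch. It is not taken from the talk and does not use Vertex AI Evaluation, DeepEval, Promptfoo, or LLM Guard; every name in it (`CASES`, `fake_model`, `pass_rate`, `input_allowed`, `BLOCKED_PHRASES`) is hypothetical, and the "model" is a canned stand-in so the example runs offline. It compares two prompt templates against a tiny test set and adds a naive keyword check on inputs, which is far weaker than a real guardrail.

```python
from typing import Callable, List, Tuple

# Hypothetical test set: (question, substring a good answer should contain).
CASES: List[Tuple[str, str]] = [
    ("What is the capital of Norway?", "Oslo"),
    ("What is 2 + 2?", "4"),
]


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call; returns canned answers (assumption)."""
    if "Norway" in prompt:
        return "The capital of Norway is Oslo."
    # Simulates a model that only answers arithmetic when asked to be precise.
    if "2 + 2" in prompt and "precise" in prompt:
        return "4"
    return "I am not sure."


def pass_rate(template: str, model: Callable[[str], str]) -> float:
    """Fraction of cases whose answer contains the expected substring."""
    hits = sum(
        expected in model(template.format(question=question))
        for question, expected in CASES
    )
    return hits / len(CASES)


# Naive input guardrail: a hypothetical blocklist, for illustration only.
BLOCKED_PHRASES = ("ignore previous instructions",)


def input_allowed(user_input: str) -> bool:
    """Reject inputs containing a blocked phrase (case-insensitive)."""
    lowered = user_input.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)


# Score the old and the new prompt template on the same test set.
baseline = pass_rate("{question}", fake_model)
candidate = pass_rate("Be precise. {question}", fake_model)
print(f"baseline={baseline:.2f} candidate={candidate:.2f}")
```

The point of the sketch is only the workflow: fix a test set, score the old and new prompt with the same metric, and compare the numbers rather than eyeballing outputs. Real evaluation frameworks replace the substring check with richer metrics, and real guardrail libraries replace the blocklist with dedicated scanners.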

