Large Language Model Security: Membership Inference Attacks
Author: Fuzzy Labs
Uploaded: 2024-03-21
Views: 289
For those releasing LLMs into the wild, the data those models were trained on is their secret sauce.
As an example, the data used to train OpenAI’s ChatGPT is scraped from the web, but its composition and specifics are unknown.
The same is true for open-source models such as Mistral: the details of the data used to train it remain a mystery (Mistery?).
This opacity makes these models a target for bad actors, whose motives include breaching the privacy of individuals in the training data, adversarial exploitation, and gaining a competitive advantage; this is even more true for proprietary services like ChatGPT.
One method of achieving those goals is a membership inference attack, which takes a candidate piece of data and a black-box model (like ChatGPT) and aims to determine whether that candidate was in the model's training dataset.
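To make the idea concrete, here is a minimal sketch of one common variant of such an attack, loss thresholding, assuming a Hugging Face-style causal language model and tokenizer; the function name and the threshold value are illustrative placeholders, not the specific attack discussed in the episode.

```python
import torch

def membership_score(model, tokenizer, text, threshold=3.5):
    """Loss-thresholding membership inference sketch.

    Intuition: a model tends to assign lower loss (higher confidence)
    to text it has seen during training, so an unusually low loss is
    weak evidence that the candidate was a training-set member.
    """
    model.eval()
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # For causal LMs, passing labels returns the average
        # negative log-likelihood per token as `loss`.
        outputs = model(**inputs, labels=inputs["input_ids"])
    loss = outputs.loss.item()
    # Below the (illustrative) threshold => flag as a likely "member".
    return loss, loss < threshold
```

In practice the threshold is calibrated against reference data, or replaced entirely by shadow models trained to mimic the target, but the black-box interface is the same: candidate text in, membership guess out.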
Jonathan and Matt sit down to discuss this very topic: they cover some famous real-world examples and how you might prevent or detect these attacks through techniques such as differential privacy and adversarial training.
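As a rough illustration of the differential-privacy defence mentioned above, the sketch below shows a hand-rolled DP-SGD-style training step in PyTorch: each example's gradient is clipped so no single sample dominates the update, then calibrated Gaussian noise is added. The function name, hyperparameters, and per-example loop are assumptions for illustration, not the episode's recommended implementation; real projects typically rely on a dedicated library such as Opacus.

```python
import torch

def dp_sgd_step(model, loss_fn, batch_x, batch_y, optimizer,
                clip_norm=1.0, noise_multiplier=1.0):
    """One differentially-private SGD step (per-example clipping + noise)."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed_grads = [torch.zeros_like(p) for p in params]

    # Clip each example's gradient to bound its individual influence.
    for x, y in zip(batch_x, batch_y):
        model.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        total_norm = torch.sqrt(sum(p.grad.norm() ** 2 for p in params))
        scale = (clip_norm / (total_norm + 1e-6)).clamp(max=1.0)
        for g_sum, p in zip(summed_grads, params):
            g_sum += p.grad * scale

    # Add Gaussian noise calibrated to the clipping norm, then average.
    batch_size = len(batch_x)
    for p, g_sum in zip(params, summed_grads):
        noise = torch.randn_like(g_sum) * noise_multiplier * clip_norm
        p.grad = (g_sum + noise) / batch_size

    optimizer.step()
```

Bounding and noising each example's contribution is exactly what blunts membership inference: the trained model's behaviour can no longer depend too strongly on any one training sample.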
00:00 Stepping into Matt
00:12 Intro
00:29 What are membership inference attacks?
01:03 Examples in the wild
01:21 Membership inference vs model stealing
02:27 Defending against attacks