Why your evals are probably off?
Author: Ivan P. Yamshchikov
Uploaded: 2025-04-15
Views: 311
This is my presentation of several recent research results from Pleias and THWS.
What the HellaSwag? On the Validity of Common-Sense Reasoning Benchmarks
https://arxiv.org/abs/2504.07825
Vygotsky Distance: Measure for Benchmark Task Similarity
https://aclanthology.org/2024.lrec-ma...
LLMs Simulate Big5 Personality Traits: Further Evidence
https://aclanthology.org/2024.persona...