🔵 Want better RAG results? Optimize your Data
Author: SAP Developers
Uploaded: 2025-10-23
Views: 212
In the evolving landscape of AI, improving Retrieval-Augmented Generation (RAG) results is crucial. A key challenge in LLM training is the dwindling availability of high-quality, human-generated data. While more data is often seen as beneficial, irrelevant and noisy data can hurt performance in practice. Recent research highlights the advantages of Selective Language Modeling (SLM) in pretraining LLMs: one study found that selecting specific tokens during pretraining can substantially reduce downstream loss and enhance model performance. As part of the Enterprise AI Search team at SAP, we have access to extensive indexed data from internal and external sources, including SAP Help and Community documents. By leveraging these insights, we aim to refine token selection strategies, improve RAG efficiency and model effectiveness, and, in short, get the most out of our existing data.
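To make the token selection idea concrete, here is a minimal sketch of one way it could work. It assumes a scoring rule in which each token is ranked by its "excess loss" (the loss of the model being trained minus the loss of a reference model) and only the top-scoring fraction contributes to the training loss. The abstract does not say which rule the talk uses, so the function names, the `keep_ratio` parameter, and the scoring rule itself are illustrative assumptions.

```python
from typing import List


def select_token_mask(
    train_losses: List[float],
    ref_losses: List[float],
    keep_ratio: float = 0.5,
) -> List[bool]:
    """Return a mask that keeps the tokens with the highest excess loss.

    Assumption: excess loss = training-model loss minus reference-model loss.
    """
    if len(train_losses) != len(ref_losses):
        raise ValueError("loss lists must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")

    n = len(train_losses)
    if n == 0:
        return []

    # Score every token position by how much worse the training model is.
    excess = [t - r for t, r in zip(train_losses, ref_losses)]

    # Keep at least one token, otherwise the floor of n * keep_ratio.
    k = max(1, int(n * keep_ratio))
    ranked = sorted(range(n), key=lambda i: excess[i], reverse=True)
    keep = set(ranked[:k])
    return [i in keep for i in range(n)]


def selective_loss(train_losses: List[float], mask: List[bool]) -> float:
    """Average the training loss over the selected tokens only."""
    kept = [loss for loss, keep in zip(train_losses, mask) if keep]
    return sum(kept) / len(kept) if kept else 0.0


# Hypothetical per-token losses for a four-token sequence.
train = [2.0, 0.5, 3.0, 1.0]
ref = [1.9, 0.4, 1.0, 0.2]
mask = select_token_mask(train, ref, keep_ratio=0.5)
loss = selective_loss(train, mask)
```

In this toy example the first two tokens are dropped because the reference model handles them almost as well as the training model does, so the loss is averaged over the last two tokens only.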
Speaker
Ceylin Ozdemir
 