Fact-Checking with Wikidata - Philippe Saadé
Author: DataTalksClub
Uploaded: 2026-01-20
Views: 1070
In this talk, Philippe, AI Project Manager at Wikimedia Deutschland, shares his experience building reliable AI systems, from maintaining one of the world's largest knowledge bases to implementing cutting-edge verification pipelines. We explore the intersection of generative AI and structured data, focusing on how to use Wikidata to counter LLM hallucinations through the Model Context Protocol (MCP) and advanced NLP workflows.
You’ll learn about:
How to bridge the gap between unstructured LLM outputs and structured Knowledge Graphs.
Implementing the Model Context Protocol (MCP) to give AI real-time access to Wikidata.
The difference between semantic vector search and traditional keyword search in knowledge retrieval.
Using re-ranker models to improve the precision of data retrieved from large-scale knowledge graphs.
Applying Natural Language Inference (NLI) to classify AI claims as true, false, or neutral.
The risks of "model collapse" and why human-in-the-loop moderation is vital for AI training data.
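As a rough illustration of the NLI item in the list above, the sketch below turns per-evidence NLI scores into a single true/false/neutral label for a claim. It is a minimal sketch under stated assumptions: the NLIResult class, the classify_claim function, the 0.5 threshold and the max-score decision rule are hypothetical and not taken from the talk or its Colab, and the probabilities are made-up numbers standing in for the output of a real NLI model.

```python
from dataclasses import dataclass


@dataclass
class NLIResult:
    """Hypothetical NLI scores for one (evidence, claim) pair."""
    evidence: str
    entailment: float
    neutral: float
    contradiction: float


def classify_claim(results: list[NLIResult], threshold: float = 0.5) -> str:
    """Return 'true', 'false' or 'neutral' for a claim.

    Assumed rule (not from the talk): compare the strongest entailment and the
    strongest contradiction over all evidence; the larger one wins if it clears
    the threshold (ties favour entailment), otherwise the claim is 'neutral'.
    """
    if not results:
        return "neutral"  # no evidence retrieved: nothing to verify against
    best_entail = max(r.entailment for r in results)
    best_contra = max(r.contradiction for r in results)
    if best_entail >= threshold and best_entail >= best_contra:
        return "true"
    if best_contra >= threshold:
        return "false"
    return "neutral"


# Made-up scores for the claim "Douglas Adams was born in 1952."
evidence = [
    NLIResult("Douglas Adams: date of birth: 1952-03-11", 0.92, 0.05, 0.03),
    NLIResult("Douglas Adams: instance of: human", 0.10, 0.85, 0.05),
]
print(classify_claim(evidence))  # true
```

Taking the maximum rather than averaging means one strong piece of evidence is enough, while irrelevant evidence (high neutral score) cannot dilute it. Whether that trade-off suits long-form fact-checking is a design choice, not something this description specifies.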
Links:
Colab: https://colab.research.google.com/dri...
Microservice: https://github.com/philippesaade-wmde...
MCP workshop: https://github.com/alexeygrigorev/wor...
TIMECODES:
00:00 Combating LLM hallucinations with Wikidata
05:44 Anatomy of Wikidata statements and references
10:50 Connecting AI to facts via the MCP
17:41 Benchmarking LLM accuracy with live tool calls
24:12 Web search vs. structured knowledge retrieval
31:54 Vector search and item embeddings for fact-checking
41:52 Transforming knowledge graph data for NLP
47:34 Filtering and scoring results with re-rankers
57:57 Using natural language inference for truth classification
1:05:51 Analyzing entailment and confidence scores
1:11:52 Scaling fact-checking for long-form articles
1:17:49 Hardware performance and the risk of model collapse
1:24:46 Community moderation and contributing to Wikidata
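The chapters at 05:44 and 41:52 deal with how Wikidata statements are structured and how they can be turned into text that NLP models can read. A minimal sketch, assuming a hand-trimmed entity in the general shape of Wikidata's JSON format (claims keyed by property ID, each statement holding a mainsnak, a rank and a list of references). The LABELS table, the helper functions and the "subject: property: value" output format are hypothetical and not the format used by the talk's microservice; the reference data is simplified for illustration and not copied from the live item.

```python
# Illustrative, hand-trimmed entity in the general shape of Wikidata's JSON
# format. Real entities carry many more fields (descriptions, qualifiers, hashes).
entity = {
    "id": "Q42",
    "labels": {"en": {"language": "en", "value": "Douglas Adams"}},
    "claims": {
        "P31": [{
            "mainsnak": {
                "snaktype": "value",
                "property": "P31",
                "datavalue": {"value": {"entity-type": "item", "id": "Q5"},
                              "type": "wikibase-entityid"},
            },
            "rank": "normal",
            # Simplified reference: "imported from Wikimedia project" (P143)
            # pointing at English Wikipedia (Q328). Not copied from the live item.
            "references": [{"snaks": {"P143": [{
                "snaktype": "value",
                "property": "P143",
                "datavalue": {"value": {"entity-type": "item", "id": "Q328"},
                              "type": "wikibase-entityid"},
            }]}}],
        }],
        "P569": [{
            "mainsnak": {
                "snaktype": "value",
                "property": "P569",
                "datavalue": {"value": {"time": "+1952-03-11T00:00:00Z", "precision": 11},
                              "type": "time"},
            },
            "rank": "normal",
            "references": [],
        }],
    },
}

# Hypothetical label table; a real pipeline would look labels up via the Wikidata API.
LABELS = {"P31": "instance of", "P569": "date of birth", "Q5": "human"}


def value_to_text(datavalue: dict) -> str:
    """Render a snak's datavalue as plain text (only two value types handled)."""
    value = datavalue["value"]
    if datavalue["type"] == "wikibase-entityid":
        return LABELS.get(value["id"], value["id"])  # fall back to the raw ID
    if datavalue["type"] == "time":
        # "+1952-03-11T00:00:00Z" -> "1952-03-11"; ignores the precision field
        return value["time"].lstrip("+").split("T")[0]
    return str(value)


def entity_to_sentences(entity: dict, lang: str = "en",
                        require_reference: bool = False) -> list[str]:
    """Flatten an entity's statements into 'subject: property: value' lines."""
    subject = entity["labels"][lang]["value"]
    sentences = []
    for pid, statements in entity["claims"].items():
        for statement in statements:
            snak = statement["mainsnak"]
            # Skip "no value"/"unknown value" snaks and deprecated statements.
            if snak["snaktype"] != "value" or statement["rank"] == "deprecated":
                continue
            # Optionally keep only statements backed by at least one reference.
            if require_reference and not statement.get("references"):
                continue
            prop = LABELS.get(pid, pid)
            sentences.append(f"{subject}: {prop}: {value_to_text(snak['datavalue'])}")
    return sentences


for line in entity_to_sentences(entity):
    print(line)
```

Running it prints "Douglas Adams: instance of: human" and "Douglas Adams: date of birth: 1952-03-11". With require_reference=True only the first line survives, because the date-of-birth statement in this toy data has no references.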
This workshop is designed for AI engineers, data scientists, and developers looking to implement fact-checking or RAG (Retrieval-Augmented Generation) systems. It is also highly relevant for researchers interested in AI ethics, knowledge graph management, and the future of verifiable crowdsourced data.
Connect with Philippe:
LinkedIn - / philippesaade1998 / wikidata
Website - https://www.wikidata.org/wiki/Wikidat...
Mastodon - https://wikis.world/@wikidata
Connect with DataTalks.Club:
Join the community - https://datatalks.club/slack.html
Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/...
Check other upcoming events - https://lu.ma/dtc-events
GitHub: https://github.com/DataTalksClub
LinkedIn - / datatalks-club
Twitter - / datatalksclub
Website - https://datatalks.club/
Connect with Alexey:
Twitter - / al_grigor
LinkedIn - / agrigorev
Check our free online courses:
ML Engineering course - http://mlzoomcamp.com
Data Engineering course - https://github.com/DataTalksClub/data...
MLOps course - https://github.com/DataTalksClub/mlop...
LLM course - https://github.com/DataTalksClub/llm-...
Open-source LLM course: https://github.com/DataTalksClub/open...
AI Dev Tools course: https://github.com/DataTalksClub/ai-d...
👉🏼 Read about all our courses in one place - https://datatalks.club/blog/guide-to-...
👋🏼 Support/inquiries
If you want to support our community, use this link - https://github.com/sponsors/alexeygri...
If you’re a company, reach us at alexey@datatalks.club
#wikidata #wikimedia #ai #llm #hallucinations #factchecking #knowledgegraph #nlp #machinelearning #mcp #rag #vectorsearch #embeddings #dataengineering #aiethics #mistralai #naturallanguageinference #structureddata #opensourcedata #datatalksclub