RAG Explained (Retrieval Augmented Generation) | Gen AI / LLMs | Tech in Two Ep1
Author: HandyAndy Tech Tips
Uploaded: 2025-09-17
Views: 373
In this video, I'll explain how RAG (Retrieval Augmented Generation) works in the context of Gen AI - in only two minutes! You'll learn how RAG allows information to be retrieved from accurate source documents and then ‘injected’ into the generated response, improving accuracy.
--
If you enjoyed this video, please SUBSCRIBE to HandyAndy Tech Tips!
--
If you want more in-depth information, here are the sources I used for this video:
https://cloud.google.com/discover/wha...
https://arstechnica.com/information-t...
https://www.ai-bites.net/retrieval-au...
https://learn.microsoft.com/en-us/azu...
/ introduction-to-rag-retrieval-augmented-ge...
https://weaviate.io/blog/vector-embed...
--
Image sources:
Rag on grass - https://picryl.com/media/rags-cloth-r...
Library - Roman Eisele, CC BY-SA 4.0, via Wikimedia Commons. https://upload.wikimedia.org/wikipedi...
Vector embedding diagram - https://weaviate.io/blog/vector-embed...
--
OK, so what is a RAG? It stands for Retrieval Augmented Generation, and it’s a technique used in conjunction with LLMs, or large language models.
Basically, the problem with current LLMs is that they can hallucinate, or make things up. Why does this happen? Well, they’re trained on a set of data, like a bunch of websites or e-books, and then they find patterns in this data which allow them to create new text. But if you ask them about something that didn’t appear very often, if at all, in their training dataset – like a recent news story, or specific information about your company – they won’t be able to give a good answer.
This is where RAG comes in. It allows information to be retrieved from accurate source documents and then ‘injected’ into the generated response, improving accuracy.
This is how it works. It starts with a corpus, or collection, of documents. These documents are divided into smaller sections called chunks, so that each piece is small enough for the LLM to process. Then a vector embedding is generated for each chunk. A vector embedding is a numeric representation of text which captures relationships between words and lets you calculate how similar one piece of text is to another. These embeddings are stored in a special kind of database called a vector database.
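To make that concrete, here's a minimal Python sketch of the indexing side. The chunk size, the toy bag-of-words "embedding" (a stand-in for a real neural embedding model), and the plain list acting as the "vector database" are all illustrative assumptions, not any particular product's API.

```python
from collections import Counter

def chunk(document: str, size: int = 40) -> list[str]:
    # Split a document into fixed-size word chunks so each piece
    # stays small enough for the model to process.
    words = document.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    # Toy "embedding": a sparse bag-of-words count vector.
    # A real system would call a neural embedding model here.
    return Counter(text.lower().split())

# The "vector database": each entry pairs a chunk with its embedding.
corpus = [
    "RAG retrieves relevant chunks from a corpus of source documents.",
    "Vector embeddings turn text into numbers so similarity can be computed.",
]
index = [(c, embed(c)) for doc in corpus for c in chunk(doc)]
```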
Now, what happens when you ask a question of the LLM? Well, your text query is also converted into a vector embedding, and then the RAG system searches the vector database to find the chunks that are the most similar to the query, and fetches the top results. This is the ‘retrieval’ part of RAG. The most relevant chunks are then added to your query as additional context, in order to ‘augment’ it. And then the model will ‘generate’ an answer that includes information from the chunks. An additional benefit of this approach is that, unlike the training data of the model, which is often a bit of a black box, systems using RAG can actually provide citations directly to the source documents that they use.
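Continuing the sketch above (reusing embed and index from it), the retrieve/augment/generate loop might look like this. Cosine similarity over the toy vectors stands in for the vector database's similarity search, and the prompt template is just one illustrative way of injecting the retrieved context.

```python
import math

def cosine(a, b) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, index, k: int = 2) -> list[str]:
    # 'Retrieval': embed the query, rank the stored chunks by
    # similarity to it, and return the top-k matches.
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def augment(query: str, chunks: list[str]) -> str:
    # 'Augmentation': prepend the retrieved chunks to the question as
    # extra context before it is sent to the LLM for 'generation'.
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"

question = "How does RAG improve accuracy?"
prompt = augment(question, retrieve(question, index))
print(prompt)  # this augmented prompt is what the LLM actually sees
```

Because the retrieved chunks travel with the prompt, the system also knows exactly which source documents it drew on, which is what makes the citations mentioned above possible.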
So that’s how retrieval augmented generation can improve the accuracy of AI models.