Excerpt from the book.
In this book, you will often see functions from RAG frameworks such as LangChain and LlamaIndex. These frameworks are convenient and provide many of the functions needed to build RAG applications. Nevertheless, first check whether you really need them. They are still at an early stage and change constantly, which can make deploying apps to production challenging. Because they are largely collections of more established frameworks, you can also use the standalone frameworks behind LangChain and LlamaIndex directly.

Data Preparation.
In RAG systems, we break down longer texts into smaller chunks to make them manageable for text embedding models. These embeddings capture the meaning of the text as vectors in a multi-dimensional space (see [Link to Come] for how to create these embeddings). Later, we use these vectors to measure how similar two text chunks are by calculating the distance between them (see [Link to Come] for details).
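As a rough illustration of measuring similarity by distance, the sketch below computes cosine similarity and cosine distance between vectors in plain Python. The three-dimensional vectors and their names are hypothetical stand-ins for real embeddings, which typically have hundreds of dimensions and come from an embedding model. Cosine distance is only one common choice and is an assumption here; the excerpt does not say which distance measure the book uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Smaller values mean the two vectors point in more similar directions."""
    return 1.0 - cosine_similarity(a, b)


# Hypothetical toy "embeddings" for three text chunks (not model output).
chunk_cat = [0.9, 0.1, 0.0]
chunk_kitten = [0.8, 0.2, 0.1]
chunk_invoice = [0.0, 0.1, 0.9]

near = cosine_distance(chunk_cat, chunk_kitten)
far = cosine_distance(chunk_cat, chunk_invoice)
print(f"cat vs kitten: {near:.3f}, cat vs invoice: {far:.3f}")
```

With these toy values, the two related chunks end up much closer to each other than to the unrelated one, which is the property a retriever relies on when ranking chunks against a query.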
To ensure our RAG system finds the relevant information effectively, we need to create text chunks that clearly capture individual pieces of information. A strong data processing pipeline helps by cleaning up the raw text and splitting it at the right points to create meaningful chunks. Figure 2-1 shows some techniques for these steps.
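A minimal sketch of such a pipeline, using only the standard library, might look like the following. The function names `clean_text` and `split_into_chunks` and the `max_chars` parameter are hypothetical, and the sentence-boundary regex is a deliberately naive assumption. It is not the approach of any specific recipe in the book.

```python
import re


def clean_text(raw):
    """Normalize whitespace so later splitting sees consistent boundaries."""
    text = raw.replace("\r\n", "\n")
    text = re.sub(r"[ \t]+", " ", text)     # collapse runs of spaces and tabs
    text = re.sub(r" ?\n ?", "\n", text)    # trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)  # keep at most one blank line
    return text.strip()


def split_into_chunks(text, max_chars=200):
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    # Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


raw = "RAG   systems retrieve context.\n\n\n\nThey then generate an answer. Chunking matters!"
for chunk in split_into_chunks(clean_text(raw), max_chars=50):
    print(repr(chunk))
```

Splitting only at sentence boundaries keeps each chunk readable on its own, at the cost of uneven chunk lengths. Real documents usually need a more robust sentence detector than this regex.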
CONTENTS.
1. Loading Data.
1.1. Loading Word Files in Python.
1.2. Loading PDF Files.
1.3. Loading and Handling CSV and Excel Files.
1.4. Querying a PostgreSQL Database.
1.5. Loading Audio Files by Using Speech-to-Text Models.
1.6. Extracting Text from Images and PDFs Using OCR.
1.7. Extracting Text from Images Using Multimodal Models.
1.8. Generating Text Summaries for Images Using Multimodal Models.
1.9. Generating Text Summaries for Embedded Tables Using Multimodal Models.
1.10. Parsing PDFs with Multiple Media Content Using Unstructured and Multimodal Models.
1.11. Loading Videos Using Speech-to-Text and Multimodal Models.
2. Data Preparation.
2.1. Adding Metadata to Enable Metadata Filtering.
2.2. Enhancing Data Quality by Replacing Abbreviations and Technical Terms.
2.3. Improving Search Accuracy by Embedding Hypothetical Questions.
2.4. Splitting Documents Using Character Splitting.
2.5. Splitting Documents Using Recursive Text Splitters.
2.6. Document Aware Splitting.
2.7. Splitting the Text Using Semantic Aware Chunkers.
2.8. Splitting Text Using Agentic Chunkers.
Source: RAG with Python Cookbook ER, Polzer D., 2026.