Part 1: Strategies for Better Results with RAG

Retrieval-Augmented Generation, or RAG for short, combines the power of language models with a company's specific knowledge. The approach makes it possible to incorporate internal documents and data into responses in a targeted way without losing control over one's own information. As a result, RAG is increasingly seen as a key technology for deploying language models securely and with data sovereignty. In practice, however, it quickly becomes apparent that simple vector search in combination with an LLM is not sufficient to achieve truly consistent and high-quality results. To fully exploit the potential of RAG, additional methods and optimizations are necessary. ...
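One such optimization beyond plain vector search is hybrid retrieval, where the ranked results of a semantic (vector) search and a keyword search are merged, for example with reciprocal rank fusion (RRF). A minimal sketch; the document IDs and ranked lists are invented placeholders:

```python
# Reciprocal rank fusion (RRF): merge several ranked result lists into one.
# The document IDs below are hypothetical placeholders.

def rrf(rankings, k=60):
    """Each document scores sum(1 / (k + rank)) over all lists it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]   # from semantic search
keyword_hits = ["doc_b", "doc_d", "doc_a"]  # from keyword / BM25 search

fused = rrf([vector_hits, keyword_hits])
# doc_b and doc_a appear in both lists, so they rise to the top
```

Documents found by both retrievers accumulate score from both lists, which is exactly the consistency boost that a single retriever cannot provide on its own.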

August 27, 2025 · Aaron

Fine-Tuning of a Llama-3.x Model via LoRA

Introduction Large language models (LLMs) like Llama 3.x are trained in an elaborate pretraining process on massive amounts of text. This process typically takes place on specialized hardware such as GPUs and TPUs, which are optimized for parallel computation of large neural networks. After pretraining is complete, the model parameters are frozen and can no longer be directly changed during normal operation. This means that you cannot simply “correct” the model or reprogram it with simple interventions. Content such as facts about historical figures is not stored in individual, explicitly addressable neurons. Instead, such information is statistically distributed across the entirety of the model weights. This makes targeted modifications considerably more difficult, as there are no clearly identifiable storage locations for specific facts. ...
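The practical appeal of LoRA follows directly from this: instead of updating a frozen weight matrix W of shape d×k, LoRA learns a low-rank update W' = W + BA, with B of shape d×r and A of shape r×k. A quick back-of-the-envelope calculation; the 4096×4096 layer size is an illustrative assumption, not taken from an actual Llama config:

```python
# Parameter count: full fine-tuning vs. a LoRA update W' = W + B @ A.
# The layer shape is an illustrative assumption.

d, k = 4096, 4096   # shape of one projection matrix
r = 8               # LoRA rank

full_params = d * k             # parameters touched by full fine-tuning
lora_params = d * r + r * k     # parameters in B (d x r) and A (r x k)

print(full_params)                # 16777216
print(lora_params)                # 65536
print(full_params // lora_params) # 256
```

With rank 8, the trainable parameter count of this layer shrinks by a factor of 256, which is why LoRA makes fine-tuning feasible on modest hardware even though the base weights stay frozen.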

July 6, 2025 · Aaron

How an Ontology Improves the Answer Quality of LLMs

Introduction With the advent of large language models (LLMs) such as GPT, many people wonder how to provide these models with structured, precise information. Although LLMs are capable of answering questions very convincingly, many of their answers are based solely on statistical language probabilities, not on logical inference or explicit factual knowledge. This is where the use of an ontology offers systematic added value. In the following article, a fictional mission in the "Lord of the Rings" universe is used to demonstrate how an ontology can support an LLM in answering complex questions. ...
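The core idea can be sketched without any ontology tooling: explicit facts are stored as triples, a simple rule derives implicit knowledge, and the result is serialized into the prompt context. The facts and the transitivity rule below are invented for illustration and borrow the article's Lord of the Rings setting only loosely:

```python
# Minimal ontology sketch: facts as (subject, predicate, object) triples
# plus one transitivity rule for "located_in". All facts are made up.

facts = {
    ("Hobbiton", "located_in", "The Shire"),
    ("The Shire", "located_in", "Eriador"),
    ("Eriador", "located_in", "Middle-earth"),
}

def infer_transitive(triples, predicate="located_in"):
    """Compute the transitive closure of one predicate."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, p1, b) in list(closure):
            for (c, p2, d) in list(closure):
                if p1 == p2 == predicate and b == c and (a, predicate, d) not in closure:
                    closure.add((a, predicate, d))
                    changed = True
    return closure

closure = infer_transitive(facts)
# ("Hobbiton", "located_in", "Middle-earth") is now derivable, and the closure
# can be rendered as plain text for the LLM's context window:
context = "\n".join(f"{s} {p.replace('_', ' ')} {o}" for s, p, o in sorted(closure))
```

The point is that the derived triple never appears verbatim in any source document; it exists only because the ontology supplies logical inference that the LLM's statistical language model lacks.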

June 16, 2025 · Aaron

Unstructured.io Tutorial

Introduction Unstructured.io is an open-source framework for the structured preparation of unstructured documents such as PDFs, Word files, HTML pages, or emails. Its goal is to extract semantically usable content from these heterogeneous formats, such as headings, paragraphs, tables, or lists, and convert it into a unified, machine-readable format. The main use case lies in preparing text data for downstream AI processing, particularly for systems with retrieval-augmented generation (RAG). Typical applications are document analysis, knowledge management, or preparing inputs for embedding models. The processing consists of four steps, which form the core of the Unstructured.io pipeline and are executed in every regular use of the library. ...
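What "a unified, machine-readable format" means can be made concrete with a small sketch: regardless of whether the input was a PDF or an HTML page, each piece of content becomes a typed element with text and metadata. The element categories mirror the kind of output Unstructured.io produces, but the code itself is a simplified stand-in, not the library's API:

```python
# Simplified stand-in for document elements: every content unit becomes
# a typed record, so downstream RAG code can treat all formats alike.
from dataclasses import dataclass, field

@dataclass
class Element:
    category: str            # e.g. "Title", "NarrativeText", "Table", "ListItem"
    text: str
    metadata: dict = field(default_factory=dict)

# Hypothetical result of processing a mixed-format corpus:
elements = [
    Element("Title", "Quarterly Report", {"source": "report.pdf", "page": 1}),
    Element("NarrativeText", "Revenue grew in Q2.", {"source": "report.pdf", "page": 1}),
    Element("ListItem", "Hire two engineers", {"source": "notes.html"}),
]

# Downstream steps can now filter uniformly, e.g. keep only prose for embedding:
prose = [e.text for e in elements if e.category == "NarrativeText"]
```

Because every element carries its category and provenance, a RAG pipeline can later chunk, embed, or cite content without caring about the original file format.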

June 14, 2025 · Aaron

Custom-Built RAG Pipeline

Introduction Retrieval Augmented Generation (RAG) is a technique in natural language processing (NLP) where a language model is combined with external knowledge to produce better and more precise answers. A language model such as GPT then draws not only on its internal knowledge from training but also receives context-specific information from an external knowledge source, e.g. a document collection or a database. This article explains the structure and development of a RAG pipeline as part of a learning project. The goal was to develop a system that processes the content of a PDF document and enables an interactive chat for asking questions about this document. The application was born from the desire to gain a practical understanding of how the individual components of a RAG application work and interact. ...
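The interplay of the components can be condensed into a few lines. The sketch below replaces the real building blocks (PDF parser, embedding model, LLM) with trivial stand-ins, a bag-of-words "embedding" and cosine similarity, purely to show the data flow of chunking, indexing, retrieval, and prompt assembly:

```python
# Minimal RAG data flow with toy components. The chunks and the question
# are invented; a real pipeline would use a PDF parser, an embedding
# model, and an LLM in place of the stand-ins below.
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

chunks = [
    "The warranty period for the device is two years.",
    "The device must be charged with the supplied adapter.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]   # indexing step

question = "How long is the warranty period?"
q_vec = embed(question)
best_chunk = max(index, key=lambda pair: cosine(q_vec, pair[1]))[0]  # retrieval

prompt = f"Context:\n{best_chunk}\n\nQuestion: {question}\nAnswer:"   # augmentation
# The prompt would now be sent to the LLM (generation step).
```

Swapping the toy `embed` for a real embedding model and the final comment for an LLM call turns this skeleton into exactly the pipeline the article describes.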

June 9, 2025 · Aaron

Linguistic Text Analysis: A Hybrid Pipeline with Stanza, DeepSeek and Transformers + spaCy Comparison

Introduction Stanza is an open-source NLP library from Stanford University based on modern neural networks. It enables comprehensive linguistic analysis of texts in over 70 languages. Stanza's goal is to provide a complete pipeline system that includes all common processing steps: tokenization, part-of-speech tagging (POS), lemmatization, syntactic analysis (dependency and constituency parsing) as well as named entity recognition (NER). Stanza is suitable both for research purposes and for production applications, such as text classification, information extraction, or preprocessing texts for retrieval-augmented generation (RAG). The models are pretrained but can also be fine-tuned. Internally, Stanza is based on the PyTorch framework. ...
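The processor chain that Stanza configures with a string like "tokenize,pos,lemma" can be mimicked in a few lines to make the architecture visible. The tagger and lemmatizer below are trivial lookup stand-ins, not Stanza's neural models:

```python
# Sketch of a Stanza-style processor pipeline: each processor enriches a
# shared document dict and passes it on. The tagging rules are toy lookups.

def tokenize(doc):
    doc["tokens"] = doc["text"].split()
    return doc

def pos(doc):
    lexicon = {"dogs": "NOUN", "bark": "VERB", "loudly": "ADV"}  # toy lexicon
    doc["pos"] = [lexicon.get(t.lower(), "X") for t in doc["tokens"]]
    return doc

def lemma(doc):
    lemmas = {"dogs": "dog"}  # toy lookup; everything else is its own lemma
    doc["lemma"] = [lemmas.get(t.lower(), t.lower()) for t in doc["tokens"]]
    return doc

PROCESSORS = {"tokenize": tokenize, "pos": pos, "lemma": lemma}

def pipeline(text, processors="tokenize,pos,lemma"):
    doc = {"text": text}
    for name in processors.split(","):
        doc = PROCESSORS[name](doc)
    return doc

doc = pipeline("Dogs bark loudly")
# doc["pos"] is now ["NOUN", "VERB", "ADV"], doc["lemma"] ["dog", "bark", "loudly"]
```

The pattern of ordered processors writing into one shared document object is what makes it easy to enable or disable individual analysis steps in Stanza.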

June 7, 2025 · Aaron

Model Context Protocol (MCP)

In traditional software applications, workflows are strictly predefined. Functions are called in a specific order, inputs and outputs are clearly defined, and decisions are made through fixed rules that the developer has embedded in the code. The application itself makes no decisions; it merely follows a rigid sequence. If you want to integrate a language model like GPT into a system, you normally have to ensure that all required information is obtained and prepared in advance. For example, when current weather data is needed, you write a function that queries an API, processes the response, and passes the text to the model. The model only receives the final text snippet with the weather data. It does not know where the data comes from, which function provided it, or whether it is up to date. It also does not decide on its own when to call a specific function. It simply responds based on the provided context. ...
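The weather example can be written out as code. Everything here is hypothetical, and the API call is stubbed out, but it shows the point: the developer decides when to fetch the data, and the model only ever sees the finished text snippet:

```python
# Traditional, pre-MCP integration: the developer hard-wires the data flow.
# fetch_weather is a stub standing in for a real HTTP call to a weather API.

def fetch_weather(city):
    # In a real application: query the API, parse the JSON response.
    return {"city": city, "temp_c": 18, "condition": "cloudy"}

def build_prompt(question, city):
    data = fetch_weather(city)                       # developer triggers the call
    snippet = f"Current weather in {data['city']}: {data['temp_c']} °C, {data['condition']}."
    return f"{snippet}\n\nUser question: {question}"  # model sees only this text

prompt = build_prompt("Do I need an umbrella today?", "Hamburg")
# The model never learns where the snippet came from or how fresh it is.
```

MCP inverts exactly this arrangement: instead of the developer pre-wiring `build_prompt`, the model is told which tools exist and can request a call itself.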

May 28, 2025 · Aaron

MCP-Controlled Workflow in n8n

This post describes the setup of an AI-driven agent system in n8n that, via the Model Context Protocol (MCP), identifies, selects, and executes external tools. Objective A user provides a natural language input, e.g.: “Give me the 10 largest cities in Germany, in descending order by area. Also search the internet to verify your result.” The agent recognizes the intent, checks available tools, decides on a tool selection, performs a web search if needed, and generates an appropriate response. The underlying control concept is based on MCP, a protocol for structured tool communication in agent-based systems. ...

May 27, 2025 · Aaron

RAGFlow Tutorial

RAGFlow is a framework for the structured implementation of Retrieval Augmented Generation (RAG) applications. It offers a modular architecture in which individual processing steps such as document import, text preparation, vectorization, indexing, and answer generation can be configured and executed separately.

Models
The platform supports different storage solutions for vector data and allows the connection of various LLMs. The list of supported LLMs can be found here. The columns of that list have the following meaning:

Provider: provider or source of the model; can be a cloud service (e.g. OpenAI) or a model developer (e.g. Cohere, BAAI).
Chat: conversational language models used for dialogue or answer generation.
Embedding: embedding models that convert texts into vectors for semantic search or classification.
Rerank: models for reranking already retrieved hits so that more relevant results appear at the top.
Img2txt: models for image description, converting an image into a descriptive text.
Speech2txt: models for converting spoken language into written text (ASR, Automatic Speech Recognition).
TTS: text-to-speech models that convert written text into synthetic speech.

An empty cell in the table means the function is not yet supported; OpenAI, for example, provides no support for the "Rerank" function. ...

May 27, 2025 · Aaron

Hugging Face CLI Practical Guide

This guide is based on the Hugging Face CLI from version 0.34.4 onwards. In this version, the old huggingface-cli syntax is replaced by the new hf command. I created this cheat sheet to have a concise and clear reference for the Hugging Face CLI. Instead of having to search the official documentation, I can find the most important commands, descriptions, and examples here at a glance. What is Hugging Face? Hugging Face is a platform for machine learning. At its core is the Hugging Face Hub, a public and private repository for AI models, datasets, and applications (Spaces). Developers can share, download, and reuse models there. In addition to the hub, Hugging Face also offers libraries such as transformers, datasets, and diffusers that make it easier to use AI models in practice. The hub thus serves both as a marketplace and as an infrastructure for collaborative development. ...
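A few representative commands illustrate the syntax change. The repository name is just an example, and the mapping is a sketch of the new grouped-subcommand style introduced with the hf CLI; consult `hf --help` for the authoritative list:

```shell
# Old syntax (huggingface-cli)        # New syntax (hf, from 0.34 onwards)
huggingface-cli login                 # -> hf auth login
huggingface-cli logout                # -> hf auth logout
huggingface-cli download gpt2         # -> hf download gpt2
```

The new CLI groups related actions (e.g. authentication under `hf auth`), which is the main structural difference from the flat huggingface-cli commands.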

May 19, 2025 · Aaron