Part 1: Strategies for Better Results with RAG

Retrieval-Augmented Generation, or RAG for short, combines the power of language models with a company's specific knowledge. The approach makes it possible to incorporate internal documents and data into responses in a targeted way without losing control over one's own information. As a result, RAG is increasingly seen as a key technology for deploying language models securely and with data sovereignty. In practice, however, it quickly becomes apparent that simple vector search in combination with an LLM is not sufficient to achieve truly consistent and high-quality results. To fully exploit the potential of RAG, additional methods and optimizations are necessary. ...
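One such optimization beyond plain vector search is hybrid retrieval, where the ranked results of a semantic (vector) search and a keyword search are merged, for example with reciprocal rank fusion (RRF). A minimal sketch; the document IDs and ranked lists are invented placeholders:

```python
# Reciprocal rank fusion (RRF): merge several ranked result lists into one.
# The document IDs below are hypothetical placeholders.

def rrf(rankings, k=60):
    """Each document scores sum(1 / (k + rank)) over all lists it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]   # from semantic search
keyword_hits = ["doc_b", "doc_d", "doc_a"]  # from keyword / BM25 search

fused = rrf([vector_hits, keyword_hits])
# doc_b and doc_a appear in both lists, so they rise to the top
```

Documents found by both retrievers accumulate score from both lists, which is exactly the consistency boost that a single retriever cannot provide on its own.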

August 27, 2025 · Aaron

Fine-Tuning of a Llama-3.x Model via LoRA

Introduction Large language models (LLMs) like Llama 3.x are trained in an elaborate pretraining process on massive amounts of text. This process typically takes place on specialized hardware such as GPUs and TPUs, which are optimized for parallel computation of large neural networks. After pretraining is complete, the model parameters are frozen and can no longer be directly changed during normal operation. This means that you cannot simply “correct” the model or reprogram it with simple interventions. Content such as facts about historical figures is not stored in individual, explicitly addressable neurons. Instead, such information is statistically distributed across the entirety of the model weights. This makes targeted modifications considerably more difficult, as there are no clearly identifiable storage locations for specific facts. ...
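The practical appeal of LoRA follows directly from this: instead of updating a frozen weight matrix W of shape d×k, LoRA learns a low-rank update W' = W + BA, with B of shape d×r and A of shape r×k. A quick back-of-the-envelope calculation; the 4096×4096 layer size is an illustrative assumption, not taken from an actual Llama config:

```python
# Parameter count: full fine-tuning vs. a LoRA update W' = W + B @ A.
# The layer shape is an illustrative assumption.

d, k = 4096, 4096   # shape of one projection matrix
r = 8               # LoRA rank

full_params = d * k             # parameters touched by full fine-tuning
lora_params = d * r + r * k     # parameters in B (d x r) and A (r x k)

print(full_params)                # 16777216
print(lora_params)                # 65536
print(full_params // lora_params) # 256
```

With rank 8, the trainable parameter count of this layer shrinks by a factor of 256, which is why LoRA makes fine-tuning feasible on modest hardware even though the base weights stay frozen.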

July 6, 2025 · Aaron

How an Ontology Improves the Answer Quality of LLMs

Introduction With the advent of large language models (LLMs) such as GPT, many people wonder how to provide these models with structured, precise information. Although LLMs are capable of answering questions very convincingly, many of their answers are based solely on statistical language probabilities, not on logical inference or explicit factual knowledge. This is where the use of an ontology offers systematic added value. In the following article, a fictional mission in the "Lord of the Rings" universe is used to demonstrate how an ontology can support an LLM in answering complex questions. ...
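The core idea can be sketched without any ontology tooling: explicit facts are stored as triples, a simple rule derives implicit knowledge, and the result is serialized into the prompt context. The facts and the transitivity rule below are invented for illustration and borrow the article's Lord of the Rings setting only loosely:

```python
# Minimal ontology sketch: facts as (subject, predicate, object) triples
# plus one transitivity rule for "located_in". All facts are made up.

facts = {
    ("Hobbiton", "located_in", "The Shire"),
    ("The Shire", "located_in", "Eriador"),
    ("Eriador", "located_in", "Middle-earth"),
}

def infer_transitive(triples, predicate="located_in"):
    """Compute the transitive closure of one predicate."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, p1, b) in list(closure):
            for (c, p2, d) in list(closure):
                if p1 == p2 == predicate and b == c and (a, predicate, d) not in closure:
                    closure.add((a, predicate, d))
                    changed = True
    return closure

closure = infer_transitive(facts)
# ("Hobbiton", "located_in", "Middle-earth") is now derivable, and the closure
# can be rendered as plain text for the LLM's context window:
context = "\n".join(f"{s} {p.replace('_', ' ')} {o}" for s, p, o in sorted(closure))
```

The point is that the derived triple never appears verbatim in any source document; it exists only because the ontology supplies logical inference that the LLM's statistical language model lacks.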

June 16, 2025 · Aaron

Unstructured.io Tutorial

Introduction Unstructured.io is an open-source framework for the structured preparation of unstructured documents such as PDFs, Word files, HTML pages, or emails. Its goal is to extract semantically usable content from these heterogeneous formats, such as headings, paragraphs, tables, or lists, and convert it into a unified, machine-readable format. The main use case lies in preparing text data for downstream AI processing, particularly for systems with retrieval-augmented generation (RAG). Typical applications are document analysis, knowledge management, or preparing inputs for embedding models. The processing consists of four steps, which form the core of the Unstructured.io pipeline and are executed in every regular use of the library. ...
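What "a unified, machine-readable format" means can be made concrete with a small sketch: regardless of whether the input was a PDF or an HTML page, each piece of content becomes a typed element with text and metadata. The element categories mirror the kind of output Unstructured.io produces, but the code itself is a simplified stand-in, not the library's API:

```python
# Simplified stand-in for document elements: every content unit becomes
# a typed record, so downstream RAG code can treat all formats alike.
from dataclasses import dataclass, field

@dataclass
class Element:
    category: str            # e.g. "Title", "NarrativeText", "Table", "ListItem"
    text: str
    metadata: dict = field(default_factory=dict)

# Hypothetical result of processing a mixed-format corpus:
elements = [
    Element("Title", "Quarterly Report", {"source": "report.pdf", "page": 1}),
    Element("NarrativeText", "Revenue grew in Q2.", {"source": "report.pdf", "page": 1}),
    Element("ListItem", "Hire two engineers", {"source": "notes.html"}),
]

# Downstream steps can now filter uniformly, e.g. keep only prose for embedding:
prose = [e.text for e in elements if e.category == "NarrativeText"]
```

Because every element carries its category and provenance, a RAG pipeline can later chunk, embed, or cite content without caring about the original file format.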

June 14, 2025 · Aaron

Custom-Built RAG Pipeline

Introduction Retrieval Augmented Generation (RAG) is a technique in natural language processing (NLP) where a language model is combined with external knowledge to produce better and more precise answers. A language model such as GPT then draws not only on its internal knowledge from training but also receives context-specific information from an external knowledge source, e.g. a document collection or a database. This article explains the structure and development of a RAG pipeline as part of a learning project. The goal was to develop a system that processes the content of a PDF document and enables an interactive chat for asking questions about this document. The application was born from the desire to gain a practical understanding of how the individual components of a RAG application work and interact. ...
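The interplay of the components can be condensed into a few lines. The sketch below replaces the real building blocks (PDF parser, embedding model, LLM) with trivial stand-ins, a bag-of-words "embedding" and cosine similarity, purely to show the data flow of chunking, indexing, retrieval, and prompt assembly:

```python
# Minimal RAG data flow with toy components. The chunks and the question
# are invented; a real pipeline would use a PDF parser, an embedding
# model, and an LLM in place of the stand-ins below.
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

chunks = [
    "The warranty period for the device is two years.",
    "The device must be charged with the supplied adapter.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]   # indexing step

question = "How long is the warranty period?"
q_vec = embed(question)
best_chunk = max(index, key=lambda pair: cosine(q_vec, pair[1]))[0]  # retrieval

prompt = f"Context:\n{best_chunk}\n\nQuestion: {question}\nAnswer:"   # augmentation
# The prompt would now be sent to the LLM (generation step).
```

Swapping the toy `embed` for a real embedding model and the final comment for an LLM call turns this skeleton into exactly the pipeline the article describes.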

June 9, 2025 · Aaron

Linguistic Text Analysis: A Hybrid Pipeline with Stanza, DeepSeek and Transformers + spaCy Comparison

Introduction Stanza is an open-source NLP library from Stanford University based on modern neural networks. It enables comprehensive linguistic analysis of texts in over 70 languages. Stanza's goal is to provide a complete pipeline system that includes all common processing steps: tokenization, part-of-speech tagging (POS), lemmatization, syntactic analysis (dependency and constituency parsing) as well as named entity recognition (NER). Stanza is suitable both for research purposes and for production applications, such as text classification, information extraction, or preprocessing texts for retrieval-augmented generation (RAG). The models are pretrained but can also be fine-tuned. Internally, Stanza is based on the PyTorch framework. ...
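The processor chain that Stanza configures with a string like "tokenize,pos,lemma" can be mimicked in a few lines to make the architecture visible. The tagger and lemmatizer below are trivial lookup stand-ins, not Stanza's neural models:

```python
# Sketch of a Stanza-style processor pipeline: each processor enriches a
# shared document dict and passes it on. The tagging rules are toy lookups.

def tokenize(doc):
    doc["tokens"] = doc["text"].split()
    return doc

def pos(doc):
    lexicon = {"dogs": "NOUN", "bark": "VERB", "loudly": "ADV"}  # toy lexicon
    doc["pos"] = [lexicon.get(t.lower(), "X") for t in doc["tokens"]]
    return doc

def lemma(doc):
    lemmas = {"dogs": "dog"}  # toy lookup; everything else is its own lemma
    doc["lemma"] = [lemmas.get(t.lower(), t.lower()) for t in doc["tokens"]]
    return doc

PROCESSORS = {"tokenize": tokenize, "pos": pos, "lemma": lemma}

def pipeline(text, processors="tokenize,pos,lemma"):
    doc = {"text": text}
    for name in processors.split(","):
        doc = PROCESSORS[name](doc)
    return doc

doc = pipeline("Dogs bark loudly")
# doc["pos"] is now ["NOUN", "VERB", "ADV"], doc["lemma"] ["dog", "bark", "loudly"]
```

The pattern of ordered processors writing into one shared document object is what makes it easy to enable or disable individual analysis steps in Stanza.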

June 7, 2025 · Aaron

Model Context Protocol (MCP)

In traditional software applications, workflows are strictly predefined. Functions are called in a specific order, inputs and outputs are clearly defined, and decisions are made through fixed rules that the developer has embedded in the code. The application itself makes no decisions; it merely follows a rigid sequence. If you want to integrate a language model like GPT into a system, you normally have to ensure that all required information is obtained and prepared in advance. For example, when current weather data is needed, you write a function that queries an API, processes the response, and passes the text to the model. The model only receives the final text snippet with the weather data. It does not know where the data comes from, which function provided it, or whether it is up to date. It also does not decide on its own when to call a specific function. It simply responds based on the provided context. ...
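The weather example can be written out as code. Everything here is hypothetical, and the API call is stubbed out, but it shows the point: the developer decides when to fetch the data, and the model only ever sees the finished text snippet:

```python
# Traditional, pre-MCP integration: the developer hard-wires the data flow.
# fetch_weather is a stub standing in for a real HTTP call to a weather API.

def fetch_weather(city):
    # In a real application: query the API, parse the JSON response.
    return {"city": city, "temp_c": 18, "condition": "cloudy"}

def build_prompt(question, city):
    data = fetch_weather(city)                       # developer triggers the call
    snippet = f"Current weather in {data['city']}: {data['temp_c']} °C, {data['condition']}."
    return f"{snippet}\n\nUser question: {question}"  # model sees only this text

prompt = build_prompt("Do I need an umbrella today?", "Hamburg")
# The model never learns where the snippet came from or how fresh it is.
```

MCP inverts exactly this arrangement: instead of the developer pre-wiring `build_prompt`, the model is told which tools exist and can request a call itself.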

May 28, 2025 · Aaron

MCP-Controlled Workflow in n8n

This post describes the setup of an AI-driven agent system in n8n that, via the Model Context Protocol (MCP), identifies, selects, and executes external tools. Objective A user provides a natural language input, e.g.: “Give me the 10 largest cities in Germany, in descending order by area. Also search the internet to verify your result.” The agent recognizes the intent, checks available tools, decides on a tool selection, performs a web search if needed, and generates an appropriate response. The underlying control concept is based on MCP, a protocol for structured tool communication in agent-based systems. ...

May 27, 2025 · Aaron

RAGFlow Tutorial

RAGFlow is a framework for the structured implementation of Retrieval Augmented Generation (RAG) applications. It offers a modular architecture in which individual processing steps such as document import, text preparation, vectorization, indexing, and answer generation can be configured and executed separately.

Models
The platform supports different storage solutions for vector data and allows the connection of various LLMs. The list of supported LLMs can be found here. The columns of that list have the following meaning:

Provider: provider or source of the model; can be a cloud service (e.g. OpenAI) or a model developer (e.g. Cohere, BAAI).
Chat: conversational language models used for dialogue or answer generation.
Embedding: embedding models that convert texts into vectors for semantic search or classification.
Rerank: models for reranking already retrieved hits so that more relevant results appear at the top.
Img2txt: models for image description, converting an image into a descriptive text.
Speech2txt: models for converting spoken language into written text (ASR, Automatic Speech Recognition).
TTS: text-to-speech models that convert written text into synthetic speech.

An empty cell in the table means the function is not yet supported; OpenAI, for example, provides no support for the "Rerank" function. ...

May 27, 2025 · Aaron

Hugging Face CLI Practical Guide

This guide is based on the Hugging Face CLI from version 0.34.4 onwards. In this version, the old huggingface-cli syntax is replaced by the new hf command. I created this cheat sheet to have a concise and clear reference for the Hugging Face CLI. Instead of having to search the official documentation, I can find the most important commands, descriptions, and examples here at a glance. What is Hugging Face? Hugging Face is a platform for machine learning. At its core is the Hugging Face Hub, a public and private repository for AI models, datasets, and applications (Spaces). Developers can share, download, and reuse models there. In addition to the hub, Hugging Face also offers libraries such as transformers, datasets, and diffusers that make it easier to use AI models in practice. The hub thus serves both as a marketplace and as an infrastructure for collaborative development. ...
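A few representative commands illustrate the syntax change. The repository name is just an example, and the mapping is a sketch of the new grouped-subcommand style introduced with the hf CLI; consult `hf --help` for the authoritative list:

```shell
# Old syntax (huggingface-cli)        # New syntax (hf, from 0.34 onwards)
huggingface-cli login                 # -> hf auth login
huggingface-cli logout                # -> hf auth logout
huggingface-cli download gpt2         # -> hf download gpt2
```

The new CLI groups related actions (e.g. authentication under `hf auth`), which is the main structural difference from the flat huggingface-cli commands.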

May 19, 2025 · Aaron