In Part 1 we saw how crucial clean document preparation and thoughtful chunking are to the quality of Retrieval Augmented Generation. These basics are the starting point for a whole range of further optimizations that shape the entire process. In Part 2 we continue the series with the next building blocks, which rest on this foundation and take the use of RAG in the enterprise further.
Embedding

Domain-specific Embeddings

Domain-specific embeddings mean that vector representations of texts are generated not with general-purpose embedding models, but with models adapted to the terminology and content of a specific industry or company. General models are trained on very large, unspecific text corpora, including books, websites, Wikipedia and other sources. They understand everyday language and many standard concepts, but often miss the nuances of, for example, legal contracts, technical manuals or medical reports. Domain-specific embeddings are created either by fine-tuning an existing model on data from the respective domain or by training a custom model on a corpus of internal documents, guidelines, protocols and manuals.
...