— by Philipp Renoth
Introduction
RAG (Retrieval‑Augmented Generation) has become a cornerstone for building trustworthy AI applications. With cRAiG – ConSol Retrieval Augmented Intelligent Generation – we can construct robust, privacy‑first RAG pipelines that align with EU data‑protection standards.
Why cRAiG?
cRAiG exposes an OpenAI‑compatible API, which enables seamless integration of proprietary data sources while keeping inference local. The same interface lets cRAiG slot into multi‑agent environments, either as an OpenAI‑compatible backend or as an MCP server.
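Because the API is wire‑compatible, existing OpenAI client libraries work unchanged. Here is a minimal sketch using the official openai Python client; the base URL, API key, and model alias are placeholders for whatever your cRAiG deployment exposes, not actual cRAiG values.

```python
from openai import OpenAI

# Point the standard OpenAI client at a cRAiG deployment.
# Base URL, API key, and model alias are hypothetical placeholders.
client = OpenAI(
    base_url="https://craig.example.internal/v1",
    api_key="YOUR_CRAIG_API_KEY",
)

response = client.chat.completions.create(
    model="craig-rag",
    messages=[{"role": "user", "content": "What does our travel policy say about VPN use?"}],
)
print(response.choices[0].message.content)
```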
Architecture Overview
At a high level, a cRAiG deployment consists of two kinds of pipelines.

Index Pipelines:
- Data Ingestion: Connectors for third‑party document stores, databases, or other systems that hold important domain knowledge, for example Atlassian Confluence, GitLab, Microsoft SharePoint, and many others.
- Document Indexing: Embedding models plus pgvector‑powered PostgreSQL as the vector database.
Query RAG Pipeline:
- Retriever: Embedding similarity search.
- Generator: OpenAI‑compatible endpoint for generation.
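To make the retriever and generator concrete, here is a minimal sketch of the query path. It assumes a pgvector table chunks(content text, embedding vector) and OpenAI‑compatible embedding and chat endpoints; the table name, model names, and connection string are illustrative, not cRAiG's actual schema.

```python
import psycopg  # psycopg 3; assumes the pgvector extension is installed in Postgres
from openai import OpenAI

client = OpenAI(base_url="https://craig.example.internal/v1", api_key="YOUR_KEY")

def answer(question: str) -> str:
    # Retriever: embed the question, then run a cosine-distance search in pgvector.
    emb = client.embeddings.create(model="embed-model", input=question).data[0].embedding
    vector = "[" + ",".join(map(str, emb)) + "]"
    with psycopg.connect("dbname=rag") as conn:
        rows = conn.execute(
            "SELECT content FROM chunks ORDER BY embedding <=> %s::vector LIMIT 5",
            (vector,),
        ).fetchall()
    context = "\n\n".join(row[0] for row in rows)

    # Generator: OpenAI-compatible chat completion grounded in the retrieved context.
    result = client.chat.completions.create(
        model="gen-model",
        messages=[
            {"role": "system", "content": f"Answer only from this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return result.choices[0].message.content
```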
Invoking the Query RAG Pipeline
The cRAiG query RAG pipeline is exposed through a standard OpenAI‑compatible API endpoint. This design enables two primary consumption modes:
- Direct agent integration: Any downstream agent (a custom chatbot, an internal automation service, or a third‑party application) can call the endpoint using the usual POST /v1/completions contract, as shown in the sketch after this list.
- Frontend client integration: For teams that prefer an out‑of‑the‑box UI, Open WebUI is a good and well‑tested option.
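As a concrete illustration of the first mode, a downstream agent can hit the contract with nothing more than an HTTP client. A minimal sketch using requests, against a hypothetical cRAiG host:

```python
import requests

# Hypothetical cRAiG host; the payload follows the standard OpenAI completions contract.
resp = requests.post(
    "https://craig.example.internal/v1/completions",
    headers={"Authorization": "Bearer YOUR_CRAIG_API_KEY"},
    json={
        "model": "craig-rag",   # model alias configured in cRAiG (illustrative)
        "prompt": "Summarize our incident-response runbook.",
        "max_tokens": 256,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```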
This dual‑mode approach supports both developer‑centric workflows and low‑code adoption scenarios, reinforcing our commitment to sovereign data handling and flexible AI deployment.
Deployment Options for cRAiG
cRAiG is distributed as a container image, which gives you flexibility while maintaining security best practices.
- Virtual machine deployment: Run the image directly with a container runtime such as Docker or Podman. This approach is ideal for on‑prem environments where you want full control over the host OS and network configuration.
- Kubernetes and Red Hat OpenShift: Deploy cRAiG as a workload in any Kubernetes‑compatible cluster. Leveraging native resources lets you roll out the way you prefer and integrate with existing CI/CD pipelines.
For a deeper dive into running AI workloads on Red Hat OpenShift, see our Local AI inference with GPT‑OSS, vLLM and OpenShift blog post.
Data Ingestion Stack
We build ingestion pipelines on a purpose‑crafted blend of Haystack, LangChain, and LlamaIndex, together with custom‑developed components. Each component is selected for its maturity and extensibility.
When you need to pull hundreds or thousands of documents through external APIs, a smooth and fast ingestion layer is essential. Three practical constraints dominate:
- Iterative loading of document chunks: stream content in parallel (typically 2‑4 concurrent I/O streams), for example directly to temporary local storage, and push the rest of the embedding workload downstream.
- Buffering & prefetch: keep a modest in‑memory queue so the embedding worker never stalls; the bottleneck is typically the vectorisation step, not the I/O.
- Batching embeddings: process chunks in appropriately sized batches so the inference backend can optimize parallelism. Batches that are too large increase failure risk, while overly small batches under‑utilise resources. For local inference, aim for batches that fully occupy your hardware. Be aware that some inference backends will silently truncate payloads that exceed size limits, while others will return an error.
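The sketch below combines all three constraints: a small pool of download workers streams chunks into a bounded queue, and a single embedding worker drains it in fixed‑size batches. The fetch_chunks and embed_batch functions are hypothetical stand‑ins for a real connector and inference backend.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 32                          # tune so batches fully occupy your hardware
chunk_queue = queue.Queue(maxsize=256)   # modest in-memory prefetch buffer
_DONE = object()                         # sentinel marking the end of the stream

def fetch_chunks(doc_id):
    # Hypothetical stand-in for a real connector call against an external API.
    return [f"{doc_id}-chunk-{i}" for i in range(10)]

def embed_batch(chunks):
    # Hypothetical stand-in for a batched call to the embedding backend.
    print(f"embedding batch of {len(chunks)} chunks")

def producer(doc_ids):
    # 2-4 concurrent I/O streams pull chunks and stream them into the buffer.
    with ThreadPoolExecutor(max_workers=4) as pool:
        for chunks in pool.map(fetch_chunks, doc_ids):
            for chunk in chunks:
                chunk_queue.put(chunk)  # blocks when full: natural backpressure
    chunk_queue.put(_DONE)

def consumer():
    batch = []
    while True:
        item = chunk_queue.get()
        if item is _DONE:
            break
        batch.append(item)
        if len(batch) >= BATCH_SIZE:
            embed_batch(batch)
            batch = []
    if batch:
        embed_batch(batch)  # flush the final partial batch

worker = threading.Thread(target=consumer)
worker.start()
producer([f"doc-{n}" for n in range(20)])
worker.join()
```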
Ground Truth Evaluation for RAG Pipelines
Every robust RAG pipeline implementation relies on a solid ground‑truth evaluation loop. Even minor adjustments (tiny prompt tweaks, embedding model swaps, or indexing parameter changes) can cause a dramatic shift in retrieval quality.
- Essential baseline data: A curated set of question‑answer pairs provides the reference against which all system iterations are measured.
- Sensitivity to change: Because RAG pipelines compose multiple stochastic components, the smallest configuration tweak can swing metrics by several points.
- Continuous guardrail: Automated regression tests compare new runs against the ground‑truth baseline, ensuring that any code or model update yields a net positive or at least no regression in quality.
- Future deep‑dive: We will publish a dedicated post outlining our end‑to‑end quality‑measurement methodology and tooling stack.
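As a minimal illustration of such a guardrail, the sketch below scores a retriever against curated question/answer pairs using hit rate at k and fails the run if quality drops below a stored baseline. The retrieve stub, baseline value, and ground‑truth file format are assumptions for illustration, not cRAiG's actual tooling.

```python
import json

K = 5                      # evaluate hit rate on the top-k retrieved chunks
BASELINE_HIT_RATE = 0.82   # stored from the last accepted run (illustrative value)

def retrieve(question, k):
    # Hypothetical stand-in for the real retriever; returns top-k chunk ids.
    return ["chunk-1", "chunk-2", "chunk-3"][:k]

def hit_rate(examples):
    # Each example looks like {"question": ..., "relevant_chunk_id": ...}.
    hits = sum(
        1 for ex in examples
        if ex["relevant_chunk_id"] in retrieve(ex["question"], K)
    )
    return hits / len(examples)

with open("ground_truth.json") as f:
    ground_truth = json.load(f)

score = hit_rate(ground_truth)
print(f"hit@{K} = {score:.3f}")
assert score >= BASELINE_HIT_RATE, "retrieval quality regressed against the baseline"
```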
Conclusion
While cRAiG provides a powerful foundation for sovereign RAG pipelines, it is not a plug‑and‑play product. The solution still requires careful handcrafted tuning to align with your specific data landscape, compliance requirements, and performance goals.
Our ground‑truth evaluation service is built into the delivery model. We walk you through the maze of AI buzzwords and continuously validate that your domain knowledge integrates seamlessly into the AI stack. The result is a tailored, production‑ready pipeline that delivers reliable, privacy‑first answers.
We at ConSol are ready to partner with you on every step—from data ingestion strategy to Go‑Live monitoring. Let us turn the complexity of RAG into a competitive advantage for your organization.
Are you interested in evaluating cRAiG? Feel free to contact us.


