
Corporate RAG AI Assistant

An intelligent conversational assistant powered by Retrieval-Augmented Generation that transforms how visitors interact with the Elinext website. The system automatically indexes site content, performs hybrid BM25 and semantic search across a knowledge base, and generates precise answers with source citations — all streamed in real time through a modern chat interface.

Tech stack: OpenAI · React · Python · FastAPI · Docker

About the project

The Product: An intelligent AI navigation and search solution designed to transform vast corporate content into a seamless, conversational experience.

What it Does: It instantly surfaces relevant services, technologies, and portfolio details, delivering precise answers in real time. Instead of browsing dozens of pages, users get direct information backed by source links.

How it Works: Built on a modern full-stack architecture (Python/React), the system uses a Hybrid Search Engine. By combining semantic understanding of intent with traditional keyword matching and AI-driven reranking, it maximizes retrieval accuracy and keeps answers grounded in real source material, sharply reducing AI hallucinations.

The Advantage: A production-grade, fault-tolerant system featuring real-time response streaming and a dedicated quality evaluation framework. Every answer is transparent, verifiable, and optimized for enterprise-level reliability.

Intelligent chatbot with hybrid search and real-time streaming, powered by RAG and multi-model LLM support.

Features

Hybrid Search Engine

At the core of the chatbot lies a dual search engine that combines two complementary retrieval methods. BM25 full-text search via Tantivy delivers fast, precise document retrieval based on exact keyword matches with phrase boosting — ideal for specific queries with technical terms. Dense semantic search via Faiss uses OpenAI embeddings (text-embedding-3-large) to find relevant content even when there are no direct text overlaps between the query and the source material.
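The dense retrieval step can be illustrated with a toy nearest-neighbour search. This is a hedged sketch, not the project's code: in production the vectors come from text-embedding-3-large and the lookup runs against a Faiss index, while here plain cosine similarity over small Python lists stands in for both.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def dense_search(query_vec, chunk_vecs, top_k=2):
    """Rank chunk vectors by cosine similarity to the query embedding —
    a tiny stand-in for what a Faiss index does at scale."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:top_k]
```

Because similarity is computed on embeddings rather than surface words, a query and a chunk can match with zero keyword overlap, which is exactly the gap BM25 alone leaves open.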

Results from both methods are merged through hybrid ranking with configurable weights and then refined by a reranking stage — either a local cross-encoder model (BAAI/bge-reranker-base) or LLM-based reranking via API. This multi-layered approach ensures that the final result set is both comprehensive and precisely ordered by relevance, delivering the highest quality context for answer generation.
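The weighted merge described above can be sketched as follows. The min-max normalisation and the 0.4/0.6 weights are illustrative assumptions, not the project's actual configuration; in the real pipeline the reranking stage (cross-encoder or LLM) would then reorder this fused top-k.

```python
def fuse_scores(bm25, dense, w_bm25=0.4, w_dense=0.6, top_k=3):
    """Merge two {doc_id: score} dicts from BM25 and dense search.
    Each score set is min-max normalised to [0, 1] so the two scales
    are comparable, then combined with configurable weights."""
    def normalise(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {d: (s - lo) / span for d, s in scores.items()}

    nb, nd = normalise(bm25), normalise(dense)
    fused = {d: w_bm25 * nb.get(d, 0.0) + w_dense * nd.get(d, 0.0)
             for d in set(nb) | set(nd)}
    return sorted(fused, key=fused.get, reverse=True)[:top_k]
```

A document that appears in only one result set still competes (its missing score counts as zero), so the fused list stays comprehensive while favouring documents both retrievers agree on.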

Intelligent Query Pipeline

Every user question passes through a sophisticated multi-stage pipeline before an answer is generated. First, an LLM-powered routing layer determines whether the query requires a knowledge base lookup or can be handled directly — saving resources on small talk and off-topic requests. Next, the original question is rewritten with conversational context to produce an optimized search query that captures the user's true intent.
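The routing-and-rewriting stage can be sketched roughly as below. The prompts, the `RouteDecision` type, and the four-turn context window are hypothetical choices for illustration, not the project's actual prompt engineering:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class RouteDecision:
    needs_retrieval: bool
    search_query: str

def route_and_rewrite(question: str, history: list[str],
                      llm: Callable[[str], str]) -> RouteDecision:
    """First ask the LLM whether the question needs a knowledge-base lookup;
    if it does, rewrite it with recent conversation turns into a standalone
    search query that captures the user's true intent."""
    verdict = llm("Does answering this require a knowledge-base lookup? "
                  f"Reply YES or NO.\n{question}")
    if verdict.strip().upper().startswith("NO"):
        # Small talk / off-topic: skip retrieval entirely and answer directly.
        return RouteDecision(False, question)
    context = " | ".join(history[-4:])  # last few turns as rewriting context
    rewritten = llm("Rewrite the question as a standalone search query.\n"
                    f"Context: {context}\nQuestion: {question}")
    return RouteDecision(True, rewritten.strip())
```

Keeping the router as a cheap first LLM call means the expensive search-plus-synthesis path only runs when it can actually help.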

After the hybrid search returns the most relevant content chunks, the LLM synthesizes a natural language answer grounded in the retrieved sources. Each response includes inline source citations and a confidence score, giving users full transparency into where the information comes from. The entire conversation history is persisted for multi-turn dialogue support, enabling follow-up questions that build on previous context.
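One simple way to get verifiable inline citations is to number the retrieved chunks in the prompt so the model can reference them as [n]; the numbers then map back to source links in the UI. A minimal sketch (the prompt wording and `example.com` URL are assumptions, not the project's actual prompt):

```python
def build_grounded_prompt(question: str, chunks: list[dict]) -> str:
    """Number each retrieved chunk so the model can cite it inline as [n];
    the numbered URLs later map citations back to clickable source links."""
    sources = "\n".join(f"[{i}] ({c['url']}) {c['text']}"
                        for i, c in enumerate(chunks, 1))
    return ("Answer using ONLY the sources below and cite them inline as [n]. "
            "If the sources are insufficient, say so.\n"
            f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:")
```

Because every claim in the answer traces to a numbered source, the frontend can render citations and a confidence indicator without guessing where the model's information came from.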

Production-Grade Architecture

The system is engineered for real-world reliability from the ground up. The LLM client supports primary and fallback API endpoints with automatic switching, ensuring uninterrupted service even during provider outages. When structured output parsing fails, a plain text fallback keeps the conversation flowing. All external dependencies — including Langfuse for observability, databases, and third-party APIs — operate in graceful degradation mode, meaning their unavailability never blocks core chat functionality.
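The primary/fallback switching pattern can be sketched as a thin wrapper around two endpoint callables. Class name, signature, and retry count here are illustrative, not the project's actual client:

```python
class ResilientLLMClient:
    """Try the primary endpoint, then the fallback, retrying each a fixed
    number of times before giving up. Each endpoint is any callable that
    takes a prompt string and returns a completion string."""

    def __init__(self, primary, fallback, retries: int = 1):
        self.endpoints = [primary, fallback]
        self.retries = retries

    def complete(self, prompt: str) -> str:
        last_err = None
        for call in self.endpoints:
            for _ in range(self.retries + 1):
                try:
                    return call(prompt)
                except Exception as err:  # timeout, provider outage, bad response...
                    last_err = err        # remember it, move on to the next attempt
        raise RuntimeError("all LLM endpoints failed") from last_err
```

The same shape generalises to the graceful-degradation idea: wrap each optional dependency (observability, databases) so its failure is caught and logged instead of propagating into the chat path.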

Responses are streamed to the frontend in real time via Server-Sent Events (SSE), providing an instant, fluid chat experience. The entire application is containerized with Docker using multi-stage builds, orchestrated via Docker Compose, and deployed through GitHub Actions CI/CD with automated linting, type checking, tests, builds, and health checks. A dedicated quality evaluation framework with benchmark datasets and A/B configuration comparison enables systematic, data-driven improvement of answer quality over time.
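The SSE wire format itself is simple to sketch: each frame is an `event:` line and a `data:` line terminated by a blank line, which the browser's `EventSource` parses natively. In the real app a FastAPI endpoint would return a generator of such frames via `StreamingResponse` with `media_type="text/event-stream"`; the event names below are assumptions for illustration.

```python
import json

def sse_frame(event: str, data: dict) -> str:
    """Serialise one Server-Sent Events frame: an 'event:' line, a 'data:'
    line with a JSON payload, and the blank line that ends the frame."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

def stream_answer(tokens):
    """Yield each generated token as its own frame, then a 'done' frame
    so the client knows the answer is complete."""
    for token in tokens:
        yield sse_frame("token", {"text": token})
    yield sse_frame("done", {})
```

Streaming token frames as they arrive from the LLM is what makes the chat feel instant: the user starts reading the answer while the tail of it is still being generated.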
