Building a RAG-Based Multi-Domain AI Chatbot using Cohere, ChromaDB & Google Gemini

Himadri Karan

June 21, 2025

Chatbot, Python, Cohere, Gemini LLM

A complete guide to building a Retrieval-Augmented Generation (RAG) chatbot with Python, ChromaDB, Cohere embeddings, the Google Gemini LLM, and Redis. Learn how to query domain-specific knowledge bases and optimize performance.

RAG-Based Multi-Domain AI Chatbot with Cohere + Gemini

This project demonstrates how to build a scalable Retrieval-Augmented Generation (RAG) chatbot that can answer queries across multiple knowledge domains using vector embeddings and large language models.

🚀 What You’ll Learn

How to use Cohere for embedding documents and queries
Setting up ChromaDB for similarity search
Integrating Google Gemini LLM for natural language responses
Using FastAPI + React for a clean full-stack experience
Optimizing with Redis and credit tracking
Supporting multiple knowledge domains (e.g., Education, Legal, Healthcare)

🧱 Tech Stack Overview

Backend: Python, FastAPI, Cohere, ChromaDB, Google Gemini API, Upstash Redis
Frontend: React.js, Tailwind CSS
Storage: Local file system + ChromaDB vector store
Auth & Usage Control: Redis-based credit system

🧠 How It Works

Ingestion Phase
- Documents are chunked and embedded using Cohere.
- Embeddings are stored in ChromaDB under domain-specific collections.
Query Phase
- User inputs a query through the React interface.
- The query is embedded at runtime and matched against the selected domain's ChromaDB collection.
- Retrieved contexts are combined into a prompt and passed to Google Gemini for answer generation.
Redis for Credit Control
- Tracks per-user API usage with TTL (Time-to-Live) logic.
- Helps manage rate limits and cost control.
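
The ingestion and query phases above can be sketched end to end without any external service. This is a minimal illustration under stated assumptions, not the project's code: `toy_embed` is a hypothetical bag-of-words stand-in for a Cohere embedding call, the `collections` dict stands in for domain-specific ChromaDB collections, and the string returned by `build_prompt` is what would be sent to Gemini.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding API: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Stand-in for per-domain vector collections: domain -> [(chunk, embedding)]
collections: dict[str, list[tuple[str, Counter]]] = {}


def ingest(domain: str, document: str) -> None:
    """Ingestion phase: chunk, embed, and store under the domain's collection."""
    store = collections.setdefault(domain, [])
    for chunk in chunk_text(document):
        store.append((chunk, toy_embed(chunk)))


def retrieve(domain: str, query: str, k: int = 2) -> list[str]:
    """Query phase: embed the query and rank only the selected domain's chunks."""
    q = toy_embed(query)
    ranked = sorted(collections.get(domain, []),
                    key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query: str, contexts: list[str]) -> str:
    """Combine retrieved chunks and the question into a single LLM prompt."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(contexts, start=1))
    return ("Answer the question using only the context below.\n\n"
            f"Context:\n{numbered}\n\n"
            f"Question: {query}\nAnswer:")


ingest("healthcare", "Hypertension is persistently high blood pressure. "
                     "It is often managed with diet, exercise and medication.")
ingest("legal", "A contract requires offer, acceptance and consideration "
                "to be enforceable.")

question = "How is high blood pressure managed?"
prompt = build_prompt(question, retrieve("healthcare", question, k=1))
print(prompt)
```

Because retrieval is scoped to one domain's collection, the legal document never reaches the prompt for a healthcare question, which is the point of keeping a separate collection per domain.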

📦 Sample Folder Structure

Backend_Server/
│
├── app/                  # Core logic
│   ├── main.py
│   ├── retriver.py
│   ├── embedder.py
│   ├── llm.py
│   └── utils/
│       └── credit_tracker.py
│
├── chroma_store/         # Chroma vector store
├── sampleData/           # Source docs in JSON
├── requirements.txt
├── .env
└── .gitignore

🛠️ Redis Credit Tracking Example

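import os

from upstash_redis import Redis  # assumed client: the REST-style REDIS_URL/REDIS_TOKEN suggest Upstash

DEFAULT_CREDITS = 10          # placeholder value; set per your usage policy
TTL_SECONDS = 24 * 60 * 60    # placeholder value; length of one credit window
redis = Redis(url=os.environ["REDIS_URL"], token=os.environ["REDIS_TOKEN"])

def track_credits(user_id: str) -> bool:
    """Consume one credit; return False when the user has none left."""
    key = f"user:{user_id}:credits"
    current = redis.get(key)

    if current is None:
        # First request in this window: start the budget and its TTL together.
        redis.set(key, DEFAULT_CREDITS - 1, ex=TTL_SECONDS)
        return True
    if int(current) <= 0:
        return False

    # DECR leaves the key's TTL untouched, so the window still expires on schedule.
    redis.decr(key)
    return True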
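
To see how the TTL window behaves without a live Redis instance, the same logic can be run against an in-memory stand-in. This is only a sketch: `FakeRedis` is a hypothetical helper that implements just the three calls the tracker uses, the credit and TTL values are illustrative, and the client is passed in as a parameter (unlike the listing above) so the function can be exercised in isolation.

```python
import time

DEFAULT_CREDITS = 3   # illustrative value
TTL_SECONDS = 60      # illustrative value


class FakeRedis:
    """In-memory stand-in supporting get/set/decr with expiry (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at or None)

    def _live(self, key):
        # Drop the entry if its expiry time has passed, as Redis would.
        entry = self._data.get(key)
        if entry and entry[1] is not None and entry[1] <= self._clock():
            del self._data[key]
            return None
        return entry

    def get(self, key):
        entry = self._live(key)
        return None if entry is None else str(entry[0])

    def set(self, key, value, ex=None):
        expires = self._clock() + ex if ex is not None else None
        self._data[key] = (int(value), expires)

    def decr(self, key):
        # Like Redis DECR: a missing key counts as 0, and the TTL is preserved.
        value, expires = self._live(key) or (0, None)
        self._data[key] = (value - 1, expires)
        return value - 1


def track_credits(client, user_id: str) -> bool:
    key = f"user:{user_id}:credits"
    current = client.get(key)
    if current is None:
        client.set(key, DEFAULT_CREDITS - 1, ex=TTL_SECONDS)
        return True
    if int(current) <= 0:
        return False
    client.decr(key)
    return True


now = [0.0]  # controllable clock so the example does not need to sleep
client = FakeRedis(clock=lambda: now[0])

results = [track_credits(client, "alice") for _ in range(4)]
print(results)  # three requests allowed, the fourth rejected

now[0] += TTL_SECONDS  # the window expires and the budget resets
print(track_credits(client, "alice"))
```

Note that reading the counter and then decrementing it is not atomic. Under concurrent requests two callers can both pass the check, so a production version would typically decrement first and inspect the returned value.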

💻 Frontend with React + Tailwind

Clean interface for domain selection
Query input box with real-time results
Context citations for each answer
Responsive layout with Tailwind CSS

✅ Prerequisites

Python 3.9+
Node.js 18+
Redis (Upstash or local)
ChromaDB (local file-based)
API keys: GEMINI_API_KEY, COHERE_API_KEY, REDIS_URL, REDIS_TOKEN

🔧 Installation

# Clone the repository
git clone https://github.com/your-username/rag-chatbot.git
cd rag-chatbot/Backend_Server

# Create and activate a virtual environment
python -m venv venv
venv\Scripts\activate        # on Windows
source venv/bin/activate     # on macOS/Linux

# Install dependencies
pip install -r requirements.txt

⚙️ Configuration

Create a .env file in the Backend_Server/ directory:

COHERE_API_KEY=your_key
GEMINI_API_KEY=your_key
REDIS_URL=https://...
REDIS_TOKEN=your_token
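
One way to fail fast on a missing key is to validate these four variables at startup. The sketch below uses only the standard library and assumes the variables are already present in the process environment (for example, loaded from `.env` by a tool such as python-dotenv). `Settings` and `load_settings` are illustrative names, not part of the project.

```python
import os
from dataclasses import dataclass

REQUIRED_KEYS = ("COHERE_API_KEY", "GEMINI_API_KEY", "REDIS_URL", "REDIS_TOKEN")


@dataclass(frozen=True)
class Settings:
    cohere_api_key: str
    gemini_api_key: str
    redis_url: str
    redis_token: str


def load_settings(env=os.environ) -> Settings:
    """Return validated settings, or raise listing every missing variable."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return Settings(**{key.lower(): env[key] for key in REQUIRED_KEYS})


# Demo with an explicit mapping; in the app you would call load_settings().
demo_env = {
    "COHERE_API_KEY": "c",
    "GEMINI_API_KEY": "g",
    "REDIS_URL": "https://example.invalid",
    "REDIS_TOKEN": "t",
}
settings = load_settings(demo_env)
print(settings.redis_url)
```

Reporting all missing names in one error saves a restart cycle per key when setting up a new machine.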

💡 Usage Guide

Start Backend Server

cd app
uvicorn main:app --reload

Start Frontend Server

cd Chat_UI
npm install
npm run dev

Interact via Browser

Open http://localhost:5173
Ask queries related to healthcare, education, law, etc.

🔒 Security Features

✅ Environment variables for all sensitive keys
✅ Credit-based usage control via Redis
✅ Token-level rate limiting (Upstash)
⛔ No hardcoded credentials in production

🎯 Performance Optimization

⚡ Efficient semantic search using ChromaDB
🔁 Batched embedding with Cohere
🔄 Redis caching to reduce API overuse
📦 Modular design to plug-and-play new domains
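
Batching and caching can be illustrated independently of any provider. In this sketch `embed_batch` is a hypothetical stand-in for a single embedding API request, the batch size of 3 is arbitrary, and a plain dict plays the role of the Redis cache. It shows the general idea only, not the project's implementation.

```python
import hashlib
from typing import Iterator

api_calls = 0  # counts simulated API requests
cache: dict[str, list[float]] = {}  # stand-in for a Redis cache


def batched(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_batch(texts: list[str]) -> list[list[float]]:
    """Hypothetical stand-in for one embedding API request."""
    global api_calls
    api_calls += 1
    return [[float(len(t)), float(t.count(" "))] for t in texts]


def cache_key(text: str) -> str:
    return "emb:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def embed_all(texts: list[str], batch_size: int = 3) -> list[list[float]]:
    """Embed texts, sending only cache misses to the API, in batches."""
    # dict.fromkeys removes duplicate texts while preserving order.
    misses = list(dict.fromkeys(t for t in texts if cache_key(t) not in cache))
    for batch in batched(misses, batch_size):
        for text, vector in zip(batch, embed_batch(batch)):
            cache[cache_key(text)] = vector
    return [cache[cache_key(t)] for t in texts]


docs = ["alpha", "beta gamma", "delta", "epsilon zeta", "eta"]
embed_all(docs)               # 5 misses, sent as batches of 3 and 2
embed_all(docs + ["theta"])   # only "theta" is a miss
print(api_calls)
```

Five uncached documents cost two requests instead of five, and the second call pays for one new text only.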

🔄 Future Enhancements

🧬 Add support for OpenAI/SentenceTransformers
🗃️ Optional Pinecone/Weaviate for scalable vector storage
🧑‍💻 User authentication and role-based access
🧠 Feedback learning loop for continuous improvement
📊 Admin dashboard for monitoring & analytics
📦 Docker + CI/CD for deployment pipeline

🙏 Acknowledgments

Google Gemini API
Cohere
ChromaDB
FastAPI
Upstash Redis
Open-source community & contributors

🎉 Conclusion

This RAG-based chatbot demonstrates the power of combining modern LLMs with vector databases for domain-specific knowledge retrieval. The integration of Cohere embeddings with Google Gemini provides accurate, contextual responses while maintaining scalability through Redis caching and ChromaDB's efficient vector operations.

Key Takeaways:

✅ RAG grounds answers in retrieved documents, which can improve response accuracy in specialized domains
✅ Vector embeddings enable semantic search across large knowledge bases
✅ Proper caching and rate limiting are crucial for production deployment
✅ Modular architecture allows easy expansion to new domains

Feel free to explore the code, contribute improvements, or adapt it for your own use cases!