Building a RAG-Based Multi-Domain AI Chatbot using Cohere, ChromaDB & Google Gemini
A complete guide to building a powerful Retrieval-Augmented Generation (RAG) chatbot using Python, ChromaDB, Cohere embeddings, the Google Gemini LLM, and Redis. Learn to query domain-specific knowledge bases with caching and performance optimization.
RAG-Based Multi-Domain AI Chatbot with Cohere + Gemini
This project demonstrates how to build a scalable Retrieval-Augmented Generation (RAG) chatbot that can answer queries across multiple knowledge domains using vector embeddings and large language models.
🚀 What You’ll Learn
- How to use Cohere for embedding documents and queries
- Setting up ChromaDB for similarity search
- Integrating Google Gemini LLM for natural language responses
- Using FastAPI + React for a clean full-stack experience
- Optimizing with Redis and credit tracking
- Supporting multiple knowledge domains (e.g., Education, Legal, Healthcare)
🧱 Tech Stack Overview
- Backend: Python, FastAPI, Cohere, ChromaDB, Google Gemini API, Upstash Redis
- Frontend: React.js, Tailwind CSS
- Storage: Local file system + ChromaDB vector store
- Auth & Usage Control: Redis-based credit system
🧠 How It Works
**Ingestion Phase**
- Documents are chunked and embedded using Cohere.
- Embeddings are stored in ChromaDB under domain-specific collections (a minimal sketch follows below).
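Here is a minimal ingestion sketch, assuming the `cohere` and `chromadb` Python clients, the `embed-english-v3.0` model, and the `chroma_store` path from the folder structure below; the `ingest` helper and its ID scheme are illustrative, not the project's actual code:

```python
# Illustrative ingestion sketch; model name and ID scheme are assumptions.
import cohere
import chromadb

co = cohere.Client("YOUR_COHERE_API_KEY")
chroma = chromadb.PersistentClient(path="chroma_store")

def ingest(domain: str, chunks: list[str]) -> None:
    # One ChromaDB collection per knowledge domain.
    collection = chroma.get_or_create_collection(domain)
    # input_type="search_document" marks these as corpus texts for Cohere.
    resp = co.embed(
        texts=chunks,
        model="embed-english-v3.0",
        input_type="search_document",
    )
    collection.add(
        ids=[f"{domain}-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=resp.embeddings,
    )
```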
**Query Phase**
- The user enters a query through the React interface.
- The query is embedded at runtime and matched against the selected domain's ChromaDB collection.
- Retrieved contexts are combined into a prompt and passed to Google Gemini for answer generation (see the sketch below).
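A matching query-phase sketch, reusing the `co` and `chroma` clients from above; the prompt template and the `gemini-1.5-flash` model name are assumptions for illustration:

```python
# Illustrative retrieval + generation; prompt and model name are assumptions.
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")

def retrieve(domain: str, query: str, k: int = 4) -> list[str]:
    # Embed the query with the same model, flagged as a search query.
    q_emb = co.embed(
        texts=[query],
        model="embed-english-v3.0",
        input_type="search_query",
    ).embeddings[0]
    collection = chroma.get_or_create_collection(domain)
    result = collection.query(query_embeddings=[q_emb], n_results=k)
    return result["documents"][0]

def answer(domain: str, query: str) -> str:
    # Ground the LLM in the retrieved chunks only.
    context = "\n---\n".join(retrieve(domain, query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    model = genai.GenerativeModel("gemini-1.5-flash")
    return model.generate_content(prompt).text
```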
**Redis for Credit Control**
- Tracks per-user API usage with TTL (time-to-live) logic.
- Helps enforce rate limits and control costs (see the credit-tracking example later in this guide).
📦 Sample Folder Structure
```
Backend_Server/
│
├── app/                     # Core logic
│   ├── main.py
│   ├── retriever.py
│   ├── embedder.py
│   ├── llm.py
│   └── utils/
│       └── credit_tracker.py
│
├── chroma_store/            # Chroma vector store
├── sampleData/              # Source docs in JSON
├── requirements.txt
├── .env
└── .gitignore
```
🛠️ Redis Credit Tracking Example
```python
# Runnable version, assuming the upstash-redis client and example quota values.
import os
from upstash_redis import Redis

redis = Redis(url=os.environ["REDIS_URL"], token=os.environ["REDIS_TOKEN"])

DEFAULT_CREDITS = 20  # example quota per window (adjust to your plan)
TTL_SECONDS = 86_400  # credits reset 24 hours after first use

def track_credits(user_id: str) -> bool:
    key = f"user:{user_id}:credits"
    current = redis.get(key)
    if current is None:
        # First request in this window: seed the counter, minus this call.
        redis.set(key, DEFAULT_CREDITS - 1, ex=TTL_SECONDS)
        return True
    if int(current) <= 0:
        return False
    redis.decr(key)
    return True
```
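To show where the credit gate sits in the API layer, here is a hedged sketch of a FastAPI endpoint; the `/query` route, the request shape, and the `answer()` helper from the query-phase sketch are assumptions, not the project's actual interface:

```python
# Hypothetical endpoint wiring; route name and request model are assumptions.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

class QueryRequest(BaseModel):
    domain: str
    question: str
    user_id: str

@app.post("/query")
def query_endpoint(req: QueryRequest) -> dict:
    # Gate every request on the Redis credit counter defined above.
    if not track_credits(req.user_id):
        raise HTTPException(status_code=429, detail="Credit limit reached")
    # answer() is the retrieval + Gemini helper sketched earlier.
    return {"answer": answer(req.domain, req.question)}
```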
💻 Frontend with React + Tailwind
- Clean interface for domain selection
- Query input box with real-time results
- Context citations for each answer
- Responsive layout with Tailwind CSS
✅ Prerequisites
- Python 3.9+
- Node.js 18+
- Redis (Upstash or local)
- ChromaDB (local, file-based)
- API keys: `GEMINI_API_KEY`, `COHERE_API_KEY`, `REDIS_URL`, `REDIS_TOKEN`
🔧 Installation
```bash
# Clone the repository
git clone https://github.com/your-username/rag-chatbot.git
cd rag-chatbot/Backend_Server

# Create and activate a virtual environment
python -m venv venv
venv\Scripts\activate        # on Windows
# source venv/bin/activate   # on macOS/Linux

# Install dependencies
pip install -r requirements.txt
```
⚙️ Configuration
Create a `.env` file in the project root:

```
COHERE_API_KEY=your_key
GEMINI_API_KEY=your_key
REDIS_URL=https://...
REDIS_TOKEN=your_token
```
💡 Usage Guide
**Start the backend server**

```bash
cd app
uvicorn main:app --reload
```

**Start the frontend server**

```bash
cd Chat_UI
npm install
npm run dev
```

**Interact via the browser**
- Open http://localhost:5173
- Ask queries related to healthcare, education, law, etc. (a quick smoke test from Python follows below)
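Once both servers are running, you can also hit the backend directly; this smoke test assumes the hypothetical `/query` route from the endpoint sketch above and the `requests` package:

```python
# Quick smoke test; the /query route and payload fields are assumptions.
import requests

resp = requests.post(
    "http://localhost:8000/query",
    json={"domain": "healthcare", "question": "What is telemedicine?", "user_id": "demo"},
)
print(resp.json())
```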
🔒 Security Features
- ✅ Environment variables for all sensitive keys
- ✅ Credit-based usage control via Redis
- ✅ Token-level rate limiting (Upstash)
- ⛔ No hardcoded credentials in production
🎯 Performance Optimization
- ⚡ Efficient semantic search using ChromaDB
- 🔁 Batched embedding with Cohere (see the sketch after this list)
- 🔄 Redis caching to reduce API overuse
- 📦 Modular design for plugging in new knowledge domains
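To illustrate the batching point, here is a minimal sketch reusing the Cohere client from earlier; the batch size of 96 matches Cohere's documented per-request text limit, but verify it against the current API docs:

```python
# Batched embedding sketch; confirm the per-request limit for your model.
def embed_in_batches(texts: list[str], batch_size: int = 96) -> list[list[float]]:
    embeddings: list[list[float]] = []
    for start in range(0, len(texts), batch_size):
        resp = co.embed(
            texts=texts[start:start + batch_size],
            model="embed-english-v3.0",
            input_type="search_document",
        )
        embeddings.extend(resp.embeddings)
    return embeddings
```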
🔄 Future Enhancements
- 🧬 Add support for OpenAI/SentenceTransformers
- 🗃️ Optional Pinecone/Weaviate for scalable vector storage
- 🧑‍💻 User authentication and role-based access
- 🧠 Feedback learning loop for continuous improvement
- 📊 Admin dashboard for monitoring & analytics
- 📦 Docker + CI/CD for deployment pipeline
🙏 Acknowledgments
- Google Gemini API
- Cohere
- ChromaDB
- FastAPI
- Upstash Redis
- Open-source community & contributors
🎉 Conclusion
This RAG-based chatbot demonstrates the power of combining modern LLMs with vector databases for domain-specific knowledge retrieval. The integration of Cohere embeddings with Google Gemini provides accurate, contextual responses while maintaining scalability through Redis caching and ChromaDB's efficient vector operations.
Key Takeaways:
- ✅ RAG significantly improves response accuracy for specialized domains
- ✅ Vector embeddings enable semantic search across large knowledge bases
- ✅ Proper caching and rate limiting are crucial for production deployment
- ✅ Modular architecture allows easy expansion to new domains
Feel free to explore the code, contribute improvements, or adapt it for your own use cases!