AI-powered expense analysis and RAG system with CockroachDB vector search and multi-provider AI support (OpenAI, AWS Bedrock, IBM Watsonx, Google Gemini)
Banko AI Assistant - RAG Demo
A modern AI-powered expense analysis application with Retrieval-Augmented Generation (RAG) capabilities, built with CockroachDB vector search and multiple AI provider support.
Features
- Advanced Vector Search: Enhanced expense search using CockroachDB vector indexes
- Multi-AI Provider Support: OpenAI, AWS Bedrock, IBM Watsonx, Google Gemini
- Dynamic Model Switching: Switch between models without restarting the app
- User-Specific Indexing: User-based vector indexes with regional partitioning
- Data Enrichment: Contextual expense descriptions for better search accuracy
- Intelligent Caching: Multi-layer caching system for optimal performance
- Modern Web Interface: Clean, responsive UI with real-time chat
- Analytics Dashboard: Comprehensive expense analysis and insights
- PyPI Package: Easy installation with pip install banko-ai-assistant
- Enhanced Context: Merchant and amount information included in search context
- Performance Optimized: User-specific vector indexes for faster queries
Quick Start
Prerequisites
- Python 3.8+
- CockroachDB v25.2.4+ (recommended: v25.3.3)
- Vector Index Feature Enabled (required for vector search)
- AI Provider API Key (OpenAI, AWS, IBM Watsonx, or Google Gemini)
CockroachDB Setup
- Download and install CockroachDB:
# Download CockroachDB v25.3.3 (recommended)
# Visit: https://www.cockroachlabs.com/docs/releases/v25.3#v25-3-3
# Or install via package manager
brew install cockroachdb/tap/cockroach # macOS
- Start a single-node CockroachDB cluster:
# Start a single-node cluster (for development)
cockroach start-single-node \
--insecure \
--store=./cockroach-data \
--listen-addr=localhost:26257 \
--http-addr=localhost:8080 \
--background
- Enable the vector index feature:
# Connect to the database (the cockroach CLI expects a postgresql:// URL)
cockroach sql --url="postgresql://root@localhost:26257/defaultdb?sslmode=disable"
-- Enable vector index feature (required for vector search)
SET CLUSTER SETTING feature.vector_index.enabled = true;
- Verify the setup:
-- Check if vector index is enabled
SHOW CLUSTER SETTING feature.vector_index.enabled;
-- Should return: true
Installation
Option 1: PyPI Installation (Recommended)
# Install from PyPI
pip install banko-ai-assistant
# Set up environment variables (example with OpenAI)
export AI_SERVICE="openai"
export OPENAI_API_KEY="your_openai_api_key_here"
export OPENAI_MODEL="gpt-4o-mini"
export DATABASE_URL="cockroachdb://root@localhost:26257/defaultdb?sslmode=disable"
# Run the application
banko-ai run
Option 2: Development Installation
# Clone the repository
git clone https://github.com/cockroachlabs-field/banko-ai-assistant-rag-demo
cd banko-ai-assistant-rag-demo
# Install the package in development mode
pip install -e .
# Run the application
banko-ai run
Option 3: Direct Dependencies
# Install dependencies from pyproject.toml
pip install -e .
# Run the application
banko-ai run
Configuration
Set up your environment variables:
# Required: Database connection
export DATABASE_URL="cockroachdb://root@localhost:26257/defaultdb?sslmode=disable"
# Required: AI Service (choose one)
export AI_SERVICE="watsonx" # or "openai", "aws", "gemini"
# AI Provider Configuration (choose based on AI_SERVICE)
# For IBM Watsonx:
export WATSONX_API_KEY="your_api_key_here"
export WATSONX_PROJECT_ID="your_project_id_here"
export WATSONX_MODEL="meta-llama/llama-2-70b-chat"
# For OpenAI:
export OPENAI_API_KEY="your_api_key_here"
export OPENAI_MODEL="gpt-4o-mini" # Options: gpt-4o-mini (default), gpt-4o, gpt-4-turbo, gpt-3.5-turbo
# For AWS Bedrock:
export AWS_ACCESS_KEY_ID="your_access_key"
export AWS_SECRET_ACCESS_KEY="your_secret_key"
export AWS_REGION="us-east-1"
export AWS_MODEL="us.anthropic.claude-3-5-sonnet-20241022-v2:0" # Claude 3.5 Sonnet (default)
# For Google Gemini:
export GOOGLE_APPLICATION_CREDENTIALS="path/to/service-account.json"
export GOOGLE_MODEL="gemini-1.5-pro"
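The variables above can be checked before launching the app. The following is a minimal sketch and not part of the banko-ai package: the helper name `missing_env` and the `REQUIRED_BY_SERVICE` table are hypothetical, and the split between mandatory credential variables and optional model variables is an assumption based on the defaults noted in the comments above.

```python
# Assumption: credential variables are mandatory, model variables are optional
# because the configuration comments above mention defaults for them.
REQUIRED_BY_SERVICE = {
    "openai": ["OPENAI_API_KEY"],
    "watsonx": ["WATSONX_API_KEY", "WATSONX_PROJECT_ID"],
    "aws": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"],
    "gemini": ["GOOGLE_APPLICATION_CREDENTIALS"],
}


def missing_env(env):
    """Return the names of expected variables that are unset or empty in `env`."""
    required = ["DATABASE_URL", "AI_SERVICE"]
    service = env.get("AI_SERVICE", "")
    if service and service not in REQUIRED_BY_SERVICE:
        raise ValueError(f"unknown AI_SERVICE: {service!r}")
    # Add the provider-specific variables once the service is known.
    required += REQUIRED_BY_SERVICE.get(service, [])
    return [name for name in required if not env.get(name)]
```

Calling `missing_env(os.environ)` (after `import os`) before `banko-ai run` would list anything still unset for the selected provider.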
Running the Application
The application automatically creates database tables and loads sample data (5000 records by default):
# Start with default settings (5000 sample records)
banko-ai run
# Start with custom data amount
banko-ai run --generate-data 10000
# Start without generating data
banko-ai run --no-data
# Start with debug mode
banko-ai run --debug
What Happens on Startup
- Database Connection: Connects to CockroachDB and creates necessary tables
- Table Creation: Creates the expenses table with vector indexes and cache tables
- Data Generation: Automatically generates 5000 sample expense records with enriched descriptions
- AI Provider Setup: Initializes the selected AI provider and loads available models
- Web Server: Starts the Flask application on http://localhost:5000
Sample Data Features
The generated sample data includes:
- Rich Descriptions: "Bought food delivery at McDonald's for $56.68 fast significant purchase restaurant and service paid with debit card this month"
- Merchant Information: Realistic merchant names and categories
- Amount Context: Expense amounts with contextual descriptions
- Temporal Context: Recent, this week, this month, etc.
- Payment Methods: Bank Transfer, Debit Card, Credit Card, Cash, Check
- User-Specific Data: Multiple user IDs for testing user-specific search
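To illustrate how an enriched description might be assembled from structured fields, here is a rough sketch. The package's actual generator is not shown in this document, so the function name `enrich_description`, its parameters, and the template are assumptions inferred from the single example description above.

```python
def enrich_description(item, merchant, amount, category, payment_method, when):
    """Compose one searchable sentence from structured expense fields.

    Hypothetical helper: the template only approximates the example
    description quoted above and is not the package's implementation.
    """
    return (
        f"Bought {item} at {merchant} for ${amount:.2f} "
        f"{category} paid with {payment_method.lower()} {when}"
    )
```

Packing merchant, amount, payment method, and time period into one sentence means all of that context is present in the text that gets embedded and searched.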
Web Interface
Access the application at http://localhost:5000
Main Features
- Home: Overview dashboard with expense statistics
- Chat: AI-powered expense analysis and Q&A
- Search: Vector-based expense search
- Settings: AI provider and model configuration
- Analytics: Detailed expense analysis and insights
CLI Commands
# Run the application
banko-ai run [OPTIONS]
# Generate sample data
banko-ai generate-data --count 2000
# Clear all data
banko-ai clear-data
# Check application status
banko-ai status
# Search expenses
banko-ai search "food delivery" --limit 10
# Show help
banko-ai help
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| / | GET | Web interface |
| /api/health | GET | System health check |
| /api/ai-providers | GET | Available AI providers |
| /api/models | GET | Available models for current provider |
| /api/search | POST | Vector search expenses |
| /api/rag | POST | RAG-based Q&A |
API Examples
# Health check
curl http://localhost:5000/api/health
# Search expenses
curl -X POST http://localhost:5000/api/search \
-H "Content-Type: application/json" \
-d '{"query": "food delivery", "limit": 5}'
# RAG query
curl -X POST http://localhost:5000/api/rag \
-H "Content-Type: application/json" \
-d '{"query": "What are my biggest expenses this month?", "limit": 5}'
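The same POST requests can be issued from Python with only the standard library. This is a sketch under stated assumptions: the helper names `build_post` and `send` are hypothetical, and `send` assumes the endpoints respond with JSON, which this document does not state.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # default address given in this README


def build_post(path, query, limit=5, base_url=BASE_URL):
    """Build a JSON POST request mirroring the curl examples above."""
    body = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request; assumes (unverified) that the response body is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(build_post("/api/search", "food delivery"))` corresponds to the search example, and `send(build_post("/api/rag", "What are my biggest expenses this month?"))` to the RAG example.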
Architecture
Database Schema
- expenses: Main expense table with vector embeddings
- query_cache: Cached search results
- embedding_cache: Cached embeddings
- insights_cache: Cached AI insights
- vector_search_cache: Cached vector search results
- cache_stats: Cache performance statistics
Vector Indexes
-- User-specific vector index for personalized search
CREATE INDEX idx_expenses_user_embedding ON expenses
USING cspann (user_id, embedding vector_l2_ops);
-- General vector index for global search
CREATE INDEX idx_expenses_embedding ON expenses
USING cspann (embedding vector_l2_ops);
-- Note: Regional partitioning syntax may vary by CockroachDB version
-- CREATE INDEX idx_expenses_regional ON expenses
-- USING cspann (user_id, embedding vector_l2_ops)
-- LOCALITY REGIONAL BY ROW AS region;
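A query that could take advantage of the user-specific index filters on `user_id` and orders by L2 distance to the query embedding. The sketch below only builds the SQL text and parameters and does not connect to a database. The `description` column, the helper names, and the `%s` placeholder style (as used by psycopg-style drivers) are assumptions; only `expenses`, `user_id`, and `embedding` come from the index definitions above.

```python
def vector_literal(values):
    """Format numbers as a vector literal such as '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def user_search_query(user_id, query_embedding, limit=5):
    """Return (sql, params) for a nearest-neighbor search within one user's rows.

    `<->` is the L2 distance operator, matching the vector_l2_ops
    operator class used in the index definitions.
    """
    sql = (
        "SELECT description "  # assumed column name
        "FROM expenses "
        "WHERE user_id = %s "
        "ORDER BY embedding <-> %s "
        "LIMIT %s"
    )
    return sql, (user_id, vector_literal(query_embedding), limit)
```

Keeping the distance expression directly in ORDER BY together with a LIMIT is the usual shape for approximate nearest-neighbor index lookups; whether the planner picks the index in a given deployment is worth confirming with EXPLAIN.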
Benefits:
- User-specific queries: Faster search within user's data
- Contextual results: Enhanced merchant and amount information
- Scalable performance: Optimized for large datasets
- Multi-tenant support: Isolated user data with shared infrastructure
AI Provider Switching
Switch between AI providers and models dynamically:
- Go to Settings in the web interface
- Select your preferred AI provider
- Choose from available models
- Changes take effect immediately
Supported Providers
- OpenAI: GPT-4o-mini (default), GPT-4o, GPT-4 Turbo, GPT-4, GPT-3.5 Turbo
- AWS Bedrock: Claude 3.5 Sonnet (default), Claude 3.5 Haiku, Claude 3 Opus, Claude 3 Sonnet
- IBM Watsonx: GPT-OSS-120B (default), Llama 2 (70B, 13B, 7B), Granite models
- Google Gemini: Gemini 1.5 Pro (default), Gemini 1.5 Flash, Gemini 1.0 Pro
Performance Features
Caching System
- Query Caching: Caches search results for faster responses
- Embedding Caching: Caches vector embeddings to avoid recomputation
- Insights Caching: Caches AI-generated insights
- Multi-layer Optimization: Intelligent cache invalidation and refresh
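The idea behind embedding caching can be sketched in a few lines. This is an illustration only: the application stores its cache in database tables such as embedding_cache, whereas this sketch uses an in-memory dict, and the key scheme (SHA-256 of whitespace-normalized, lowercased text) and function names are assumptions.

```python
import hashlib


def cache_key(text):
    """Stable key for a piece of text: SHA-256 of its normalized form."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def cached_embedding(text, compute, cache):
    """Return the embedding for `text`, calling `compute` only on a cache miss.

    `compute` is any callable mapping text to a vector; `cache` is any
    dict-like store (a plain dict here, a table in a real deployment).
    """
    key = cache_key(text)
    if key not in cache:
        cache[key] = compute(text)
    return cache[key]
```

Because the key is derived from normalized text, repeated queries that differ only in case or spacing reuse the stored vector instead of triggering another embedding call.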
Vector Search Optimization
- User-Specific Indexes: Faster search for individual users
- Regional Partitioning: Optimized for multi-region deployments
- Data Enrichment: Enhanced descriptions improve search accuracy
- Batch Processing: Efficient data loading and processing
Advanced Vector Features
For detailed demonstrations of vector indexing and search capabilities:
Vector Index Demo Guide - Comprehensive guide covering:
- User-specific vector indexing
- Regional partitioning with multi-region CockroachDB
- Performance benchmarking
- Advanced search queries
- RAG with user context
- Troubleshooting and best practices
Development
Project Structure
banko_ai/
├── ai_providers/ # AI provider implementations
├── config/ # Configuration management
├── static/ # Web assets and images
├── templates/ # HTML templates
├── utils/ # Database and cache utilities
├── vector_search/ # Vector search and data generation
└── web/ # Flask web application
Adding New AI Providers
- Create a new provider class in ai_providers/
- Extend the BaseAIProvider class
- Implement required methods
- Add to the factory in ai_providers/factory.py
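The steps above follow a common base-class-plus-factory pattern, sketched below. The real BaseAIProvider interface and the contents of ai_providers/factory.py are not shown in this document, so everything here is hypothetical: the stand-in base class, the `generate` method, the `PROVIDERS` registry, and `create_provider` are illustrative names, not the project's API.

```python
from abc import ABC, abstractmethod


class BaseAIProvider(ABC):
    """Stand-in for the project's base class; the real interface may differ."""

    @abstractmethod
    def generate(self, prompt: str) -> str:
        """Return a model response for `prompt` (hypothetical method)."""


class EchoProvider(BaseAIProvider):
    """Toy provider that echoes the prompt, useful for wiring tests."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


# Hypothetical registry playing the role of the factory module.
PROVIDERS = {"echo": EchoProvider}


def create_provider(name: str) -> BaseAIProvider:
    """Instantiate a registered provider by name."""
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"unknown provider: {name!r}") from None
```

In the actual codebase, check the abstract methods declared on BaseAIProvider and the registration mechanism in the factory before copying this shape.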
Troubleshooting
Common Issues
CockroachDB Version Issues
# Check CockroachDB version (must be v25.2.4+)
cockroach version
# If version is too old, download v25.3.3:
# https://www.cockroachlabs.com/docs/releases/v25.3#v25-3-3
Vector Index Feature Not Enabled
# Connect to the database (the cockroach CLI expects a postgresql:// URL)
cockroach sql --url="postgresql://root@localhost:26257/defaultdb?sslmode=disable"
-- Enable vector index feature
SET CLUSTER SETTING feature.vector_index.enabled = true;
-- Verify it's enabled
SHOW CLUSTER SETTING feature.vector_index.enabled;
Database Connection Error
# Start CockroachDB single node
cockroach start-single-node \
--insecure \
--store=./cockroach-data \
--listen-addr=localhost:26257 \
--http-addr=localhost:8080 \
--background
# Verify the connection by listing tables in defaultdb
cockroach sql --url="postgresql://root@localhost:26257/defaultdb?sslmode=disable" --execute "SHOW TABLES;"
AI Provider Disconnected
- Verify API keys are set correctly
- Check network connectivity
- Ensure the selected model is available
No Search Results
- Ensure sample data is loaded: banko-ai generate-data --count 1000
- Check vector indexes are created
- Verify search query format
Debug Mode
# Run with debug logging
banko-ai run --debug
# Check application status
banko-ai status
License
MIT License - see LICENSE file for details.
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
Support
For issues and questions:
- Check the troubleshooting section
- Review the API documentation
- See the Vector Index Demo Guide for advanced features
- Open an issue on GitHub
Built with ❤️ using CockroachDB, Flask, and modern AI technologies such as watsonx.ai