AI-powered expense analysis and RAG system with CockroachDB vector search and multi-provider AI support

🤖 Banko AI Assistant - RAG Demo

A modern AI-powered expense analysis application with Retrieval-Augmented Generation (RAG) capabilities, built with CockroachDB vector search and multiple AI provider support.

✨ Features

  • 🔍 Advanced Vector Search: Enhanced expense search using CockroachDB vector indexes
  • 🤖 Multi-AI Provider Support: OpenAI, AWS Bedrock, IBM Watsonx, Google Gemini
  • 🔄 Dynamic Model Switching: Switch between models without restarting the app
  • 👤 User-Specific Indexing: User-based vector indexes with regional partitioning
  • 📊 Data Enrichment: Contextual expense descriptions for better search accuracy
  • 💾 Intelligent Caching: Multi-layer caching system for optimal performance
  • 🌐 Modern Web Interface: Clean, responsive UI with real-time chat
  • 📈 Analytics Dashboard: Comprehensive expense analysis and insights
  • 📦 PyPI Package: Easy installation with pip install banko-ai-assistant
  • 🎯 Enhanced Context: Merchant and amount information included in search context
  • ⚡ Performance Optimized: User-specific vector indexes for faster queries

🚀 Quick Start

Prerequisites

  • Python 3.8+
  • CockroachDB (running locally or cloud)
  • AI Provider API Key (OpenAI, AWS, IBM Watsonx, or Google Gemini)

Installation

Option 1: PyPI Installation (Recommended)

# Install from PyPI
pip install banko-ai-assistant

# Run the application
banko-ai run

Option 2: Development Installation

# Clone the repository
git clone https://github.com/cockroachlabs-field/banko-ai-assistant-rag-demo
cd banko-ai-assistant-rag-demo

# Install the package in development mode
pip install -e .

# Run the application
banko-ai run

Option 3: Direct Dependencies

# Install dependencies directly
pip install -r requirements.txt

# Run the original app.py (legacy method)
python app.py

Configuration

Set up your environment variables:

# Required: Database connection
export DATABASE_URL="cockroachdb://root@localhost:26257/banko_ai?sslmode=disable"

# Required: AI Service (choose one)
export AI_SERVICE="watsonx"  # or "openai", "aws", "gemini"

# AI Provider Configuration (choose based on AI_SERVICE)
# For IBM Watsonx:
export WATSONX_API_KEY="your_api_key_here"
export WATSONX_PROJECT_ID="your_project_id_here"
export WATSONX_MODEL="meta-llama/llama-2-70b-chat"

# For OpenAI:
export OPENAI_API_KEY="your_api_key_here"
export OPENAI_MODEL="gpt-3.5-turbo"

# For AWS Bedrock:
export AWS_ACCESS_KEY_ID="your_access_key"
export AWS_SECRET_ACCESS_KEY="your_secret_key"
export AWS_REGION="us-east-1"
export AWS_MODEL="anthropic.claude-3-sonnet-20240229-v1:0"

# For Google Gemini:
export GOOGLE_APPLICATION_CREDENTIALS="path/to/service-account.json"
export GOOGLE_MODEL="gemini-1.5-pro"
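
Before starting the app, it can help to confirm that the variables for the chosen provider are actually set. The following is a minimal sketch, not part of banko-ai-assistant: `missing_settings` and `REQUIRED_VARS` are hypothetical names, and the assumption that only the credential variables (not the `*_MODEL` ones) are mandatory is mine, not the project's.

```python
import os

# Variables each provider appears to need, taken from the listing above.
# Assumption: the *_MODEL variables are optional and have defaults.
REQUIRED_VARS = {
    "watsonx": ["WATSONX_API_KEY", "WATSONX_PROJECT_ID"],
    "openai": ["OPENAI_API_KEY"],
    "aws": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"],
    "gemini": ["GOOGLE_APPLICATION_CREDENTIALS"],
}


def missing_settings(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    missing = [] if env.get("DATABASE_URL") else ["DATABASE_URL"]
    service = env.get("AI_SERVICE", "")
    if service not in REQUIRED_VARS:
        # Unknown or unset provider: nothing more can be checked.
        return missing + ["AI_SERVICE"]
    return missing + [name for name in REQUIRED_VARS[service] if not env.get(name)]


if __name__ == "__main__":
    problems = missing_settings()
    print("Missing settings:", ", ".join(problems) if problems else "none")
```

Passing a plain dictionary instead of the real environment makes the check easy to exercise in isolation.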

Running the Application

The application automatically creates database tables and loads sample data (5000 records by default):

# Start with default settings (5000 sample records)
banko-ai run

# Start with custom data amount
banko-ai run --generate-data 10000

# Start without generating data
banko-ai run --no-data

# Start with debug mode
banko-ai run --debug

Database Operations

🎯 What Happens on Startup

  1. Database Connection: Connects to CockroachDB using DATABASE_URL
  2. Table Creation: Creates the expenses table with its vector indexes, plus the cache tables
  3. Data Generation: Automatically generates 5000 sample expense records with enriched descriptions
  4. AI Provider Setup: Initializes the selected AI provider and loads available models
  5. Web Server: Starts the Flask application on http://localhost:5000

📊 Sample Data Features

The generated sample data includes:

  • Rich Descriptions: "Bought food delivery at McDonald's for $56.68 fast significant purchase restaurant and service paid with debit card this month"
  • Merchant Information: Realistic merchant names and categories
  • Amount Context: Expense amounts with contextual descriptions
  • Temporal Context: Recent, this week, this month, etc.
  • Payment Methods: Bank Transfer, Debit Card, Credit Card, Cash, Check
  • User-Specific Data: Multiple user IDs for testing user-specific search
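
To illustrate the idea behind the enriched descriptions listed above, here is a minimal sketch that folds structured expense fields into one sentence suitable for embedding. The function name, its parameters, and the exact wording of the template are assumptions for illustration; the project's actual generator may compose its text differently.

```python
def enrich_description(category, merchant, amount, payment_method, period):
    """Fold structured expense fields into one sentence for embedding.

    Hypothetical helper; the template below is an assumption.
    """
    return (
        f"Bought {category} at {merchant} for ${amount:.2f} "
        f"paid with {payment_method.lower()} {period}"
    )


print(enrich_description("food delivery", "McDonald's", 56.68, "Debit Card", "this month"))
```

Putting merchant, amount, payment method, and time period into the embedded text is what lets a query such as "food delivery this month" match on more than the category alone.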

🌐 Web Interface

Access the application at http://localhost:5000

Main Features

  • 🏠 Home: Overview dashboard with expense statistics
  • 💬 Chat: AI-powered expense analysis and Q&A
  • 🔍 Search: Vector-based expense search
  • ⚙️ Settings: AI provider and model configuration
  • 📊 Analytics: Detailed expense analysis and insights

🔧 CLI Commands

# Run the application
banko-ai run [OPTIONS]

# Generate sample data
banko-ai generate-data --count 2000

# Clear all data
banko-ai clear-data

# Check application status
banko-ai status

# Search expenses
banko-ai search "food delivery" --limit 10

# Show help
banko-ai help

🔌 API Endpoints

Endpoint           Method  Description
/                  GET     Web interface
/api/health        GET     System health check
/api/ai-providers  GET     Available AI providers
/api/models        GET     Available models for current provider
/api/search        POST    Vector search expenses
/api/rag           POST    RAG-based Q&A

API Examples

# Health check
curl http://localhost:5000/api/health

# Search expenses
curl -X POST http://localhost:5000/api/search \
  -H "Content-Type: application/json" \
  -d '{"query": "food delivery", "limit": 5}'

# RAG query
curl -X POST http://localhost:5000/api/rag \
  -H "Content-Type: application/json" \
  -d '{"query": "What are my biggest expenses this month?", "limit": 5}'
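
The same calls can be made from Python. This is a minimal sketch using only the standard library; `build_request` and `post_json` are hypothetical helper names, and the shape of the JSON response is not documented here, so the sketch simply returns whatever the server sends back.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"


def build_request(path, query, limit=5):
    """Build a JSON POST request for /api/search or /api/rag."""
    body = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path, query, limit=5, timeout=30):
    """Send the request and decode the JSON response body."""
    request = build_request(path, query, limit)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the server running, `post_json("/api/search", "food delivery")` mirrors the search example and `post_json("/api/rag", "What are my biggest expenses this month?")` mirrors the RAG example.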

🏗️ Architecture

Database Schema

  • expenses: Main expense table with vector embeddings
  • query_cache: Cached search results
  • embedding_cache: Cached embeddings
  • insights_cache: Cached AI insights
  • vector_search_cache: Cached vector search results
  • cache_stats: Cache performance statistics

Vector Indexes

-- User-specific vector index for personalized search
CREATE INDEX idx_expenses_user_embedding ON expenses 
USING cspann (user_id, embedding vector_l2_ops);

-- General vector index for global search
CREATE INDEX idx_expenses_embedding ON expenses 
USING cspann (embedding vector_l2_ops);

-- Note: Regional partitioning syntax may vary by CockroachDB version
-- CREATE INDEX idx_expenses_regional ON expenses 
-- USING cspann (user_id, embedding vector_l2_ops) 
-- LOCALITY REGIONAL BY ROW AS region;

Benefits:

  • User-specific queries: Faster search within user's data
  • Contextual results: Enhanced merchant and amount information
  • Scalable performance: Optimized for large datasets
  • Multi-tenant support: Isolated user data with shared infrastructure
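
A query can take advantage of the user-specific index by filtering on user_id and ordering by vector distance. The sketch below only builds the SQL text and parameters and has not been run against a live cluster. It assumes the `<->` L2 distance operator (matching vector_l2_ops), psycopg-style `%s` placeholders, and hypothetical column names (`expense_id`, `description`, `merchant`, `expense_amount`); only `user_id` and `embedding` appear in the index definitions above.

```python
def to_vector_literal(embedding):
    """Format a sequence of numbers as a vector literal such as '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def user_search_query(user_id, embedding, limit=5):
    """Return (sql, params) for a nearest-neighbour search scoped to one user.

    Column names other than user_id and embedding are assumptions.
    """
    sql = (
        "SELECT expense_id, description, merchant, expense_amount "
        "FROM expenses "
        "WHERE user_id = %s "
        "ORDER BY embedding <-> %s::VECTOR "
        "LIMIT %s"
    )
    return sql, (user_id, to_vector_literal(embedding), limit)
```

Keeping user_id as the index prefix means the search only has to consider that user's vectors, which is where the "faster search within user's data" benefit comes from.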

🔄 AI Provider Switching

Switch between AI providers and models dynamically:

  1. Go to Settings in the web interface
  2. Select your preferred AI provider
  3. Choose from available models
  4. Changes take effect immediately

Supported Providers

  • OpenAI: GPT-3.5, GPT-4, GPT-4 Turbo
  • AWS Bedrock: Claude 3 Sonnet, Claude 3 Haiku, Llama 2
  • IBM Watsonx: Granite models, Llama 2, Mistral
  • Google Gemini: Gemini 1.5 Pro, Gemini 1.5 Flash

📈 Performance Features

Caching System

  • Query Caching: Caches search results for faster responses
  • Embedding Caching: Caches vector embeddings to avoid recomputation
  • Insights Caching: Caches AI-generated insights
  • Multi-layer Optimization: Intelligent cache invalidation and refresh
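
The embedding cache idea can be sketched in a few lines. This is an in-memory illustration only: the project keeps its caches in CockroachDB tables such as embedding_cache, and the class name, the normalization rule, and the hit/miss counters here are assumptions rather than the project's implementation.

```python
import hashlib


class EmbeddingCache:
    """In-memory cache keyed by a hash of the normalized text (illustrative)."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn  # callable: str -> list of floats
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key_for(text):
        # Lowercase and collapse whitespace so trivially different
        # spellings of the same query share one cache entry.
        normalized = " ".join(text.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, text):
        key = self.key_for(text)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._embed_fn(text)
        return self._store[key]
```

Wrapping the provider's embedding call this way means a repeated query skips the embedding request entirely, and the counters give a rough equivalent of what a cache_stats table would record.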

Vector Search Optimization

  • User-Specific Indexes: Faster search for individual users
  • Regional Partitioning: Optimized for multi-region deployments
  • Data Enrichment: Enhanced descriptions improve search accuracy
  • Batch Processing: Efficient data loading and processing

Advanced Vector Features

For detailed demonstrations of vector indexing and search capabilities:

📖 Vector Index Demo Guide - Comprehensive guide covering:

  • User-specific vector indexing
  • Regional partitioning with multi-region CockroachDB
  • Performance benchmarking
  • Advanced search queries
  • RAG with user context
  • Troubleshooting and best practices

🛠️ Development

Project Structure

banko_ai/
├── ai_providers/         # AI provider implementations
├── config/               # Configuration management
├── static/               # Web assets and images
├── templates/            # HTML templates
├── utils/                # Database and cache utilities
├── vector_search/        # Vector search and data generation
└── web/                  # Flask web application

Adding New AI Providers

  1. Create a new provider class in ai_providers/
  2. Extend the BaseAIProvider class
  3. Implement required methods
  4. Add to the factory in ai_providers/factory.py
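
The steps above can be sketched as follows. The real BaseAIProvider lives in ai_providers/ and its required methods are not listed here, so this block defines a stand-in base class with a single hypothetical `generate` method so that it runs on its own. `EchoProvider`, `PROVIDERS`, and `create_provider` are likewise illustrative names, not the project's API.

```python
from abc import ABC, abstractmethod


class BaseAIProvider(ABC):
    """Stand-in for the project's base class; the real interface may differ."""

    @abstractmethod
    def generate(self, prompt, context):
        """Return an answer to `prompt` given retrieved expense `context`."""


class EchoProvider(BaseAIProvider):
    """Toy provider that answers without calling any external service."""

    def generate(self, prompt, context):
        return f"{prompt} (based on {len(context)} expenses)"


# Hypothetical registry, playing the role of ai_providers/factory.py.
PROVIDERS = {"echo": EchoProvider}


def create_provider(name):
    """Instantiate the provider registered under `name`."""
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"Unknown AI provider: {name}") from None
```

In the real project, check the abstract methods declared by BaseAIProvider and the registration mechanism in ai_providers/factory.py before copying this pattern.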

🐛 Troubleshooting

Common Issues

Database Connection Error

# Start a local single-node CockroachDB if one is not already running
cockroach start-single-node --insecure --listen-addr=localhost:26257

# Verify database exists
cockroach sql --url="postgresql://root@localhost:26257/banko_ai?sslmode=disable" --execute "SHOW TABLES;"
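
A quick reachability check from Python can rule out basic network problems before looking at credentials or schema. This is a minimal sketch using only the standard library; `db_endpoint` and `is_reachable` are hypothetical helpers, and falling back to port 26257 when the URL omits a port is an assumption based on the examples in this document.

```python
import socket
from urllib.parse import urlsplit


def db_endpoint(database_url):
    """Extract (host, port, database) from a DATABASE_URL-style string."""
    parts = urlsplit(database_url)
    # Assumption: default to 26257 when the URL does not name a port.
    return parts.hostname, parts.port or 26257, parts.path.lstrip("/")


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `host, port, _ = db_endpoint(os.environ["DATABASE_URL"])` followed by `is_reachable(host, port)` tells you whether anything is listening at the configured address; it does not verify that the banko_ai database exists.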

AI Provider Disconnected

  • Verify API keys are set correctly
  • Check network connectivity
  • Ensure the selected model is available

No Search Results

  • Ensure sample data is loaded: banko-ai generate-data --count 1000
  • Check vector indexes are created
  • Verify search query format

Debug Mode

# Run with debug logging
banko-ai run --debug

# Check application status
banko-ai status

📝 License

MIT License - see LICENSE file for details.

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

📞 Support

For issues and questions:


Built with ❤️ using CockroachDB, Flask, and modern AI technologies
