AI-powered expense analysis and RAG system with CockroachDB vector search and multi-provider AI support (OpenAI, AWS Bedrock, IBM Watsonx, Google Gemini)

🤖 Banko AI Assistant - RAG Demo

A modern AI-powered expense analysis application with Retrieval-Augmented Generation (RAG) capabilities, built with CockroachDB vector search and multiple AI provider support.

✨ Features

  • 🔍 Advanced Vector Search: Enhanced expense search using CockroachDB vector indexes
  • 🤖 Multi-AI Provider Support: OpenAI, AWS Bedrock, IBM Watsonx, Google Gemini
  • 🔄 Dynamic Model Switching: Switch between models without restarting the app
  • 👤 User-Specific Indexing: User-based vector indexes with regional partitioning
  • 📊 Data Enrichment: Contextual expense descriptions for better search accuracy
  • 💾 Intelligent Caching: Multi-layer caching system for optimal performance
  • 🌐 Modern Web Interface: Clean, responsive UI with real-time chat
  • 📈 Analytics Dashboard: Comprehensive expense analysis and insights
  • 📦 PyPI Package: Easy installation with pip install banko-ai-assistant
  • 🎯 Enhanced Context: Merchant and amount information included in search context
  • ⚡ Performance Optimized: User-specific vector indexes for faster queries

🚀 Quick Start

Prerequisites

  • Python 3.8+
  • CockroachDB v25.2.4+ (recommended: v25.3.3)
  • Vector Index Feature Enabled (required for vector search)
  • AI Provider API Key (OpenAI, AWS, IBM Watsonx, or Google Gemini)

CockroachDB Setup

  1. Download and Install CockroachDB:

    # Download CockroachDB v25.3.3 (recommended)
    # Visit: https://www.cockroachlabs.com/docs/releases/v25.3#v25-3-3
    
    # Or install via package manager
    brew install cockroachdb/tap/cockroach  # macOS
    
  2. Start CockroachDB Single Node:

    # Start a single-node cluster (for development)
    cockroach start-single-node \
      --insecure \
      --store=./cockroach-data \
      --listen-addr=localhost:26257 \
      --http-addr=localhost:8080 \
      --background
    
  3. Enable Vector Index Feature:

    # Connect with the SQL shell (postgresql:// is the scheme the cockroach CLI documents)
    cockroach sql --url="postgresql://root@localhost:26257/defaultdb?sslmode=disable"
    
    -- Then, at the SQL prompt, enable the vector index feature (required for vector search)
    SET CLUSTER SETTING feature.vector_index.enabled = true;
    
  4. Verify Setup:

    -- Check if vector index is enabled
    SHOW CLUSTER SETTING feature.vector_index.enabled;
    -- Should return: true
    

Installation

Option 1: PyPI Installation (Recommended)

# Install from PyPI
pip install banko-ai-assistant

# Set up environment variables (example with OpenAI)
export AI_SERVICE="openai"
export OPENAI_API_KEY="your_openai_api_key_here"
export OPENAI_MODEL="gpt-4o-mini"
export DATABASE_URL="cockroachdb://root@localhost:26257/defaultdb?sslmode=disable"

# Run the application
banko-ai run

Option 2: Development Installation

# Clone the repository
git clone https://github.com/cockroachlabs-field/banko-ai-assistant-rag-demo
cd banko-ai-assistant-rag-demo

# Install the package in development mode
pip install -e .

# Run the application
banko-ai run

Option 3: Direct Dependencies

# Install dependencies from pyproject.toml
pip install -e .

# Run the application
banko-ai run

Configuration

Set up your environment variables:

# Required: Database connection
export DATABASE_URL="cockroachdb://root@localhost:26257/defaultdb?sslmode=disable"

# Required: AI Service (choose one)
export AI_SERVICE="watsonx"  # or "openai", "aws", "gemini"

# AI Provider Configuration (choose based on AI_SERVICE)
# For IBM Watsonx:
export WATSONX_API_KEY="your_api_key_here"
export WATSONX_PROJECT_ID="your_project_id_here"
export WATSONX_MODEL="meta-llama/llama-2-70b-chat"

# For OpenAI:
export OPENAI_API_KEY="your_api_key_here"
export OPENAI_MODEL="gpt-4o-mini"  # Options: gpt-4o-mini (default), gpt-4o, gpt-4-turbo, gpt-3.5-turbo

# For AWS Bedrock:
export AWS_ACCESS_KEY_ID="your_access_key"
export AWS_SECRET_ACCESS_KEY="your_secret_key"
export AWS_REGION="us-east-1"
export AWS_MODEL="us.anthropic.claude-3-5-sonnet-20241022-v2:0"  # Claude 3.5 Sonnet (default)

# For Google Gemini:
export GOOGLE_APPLICATION_CREDENTIALS="path/to/service-account.json"
export GOOGLE_MODEL="gemini-1.5-pro"
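
The variables above can be checked before the app is launched. The following is a minimal sketch of a pre-flight check, not part of banko-ai: the helper name `missing_settings` is hypothetical, and treating every listed variable other than the model name as mandatory is an assumption, since the README does not say which ones are optional.

```python
# Variables named in the configuration above, grouped by AI_SERVICE value.
# Treating all of them as mandatory is an assumption made for this sketch.
REQUIRED_BY_SERVICE = {
    "watsonx": ["WATSONX_API_KEY", "WATSONX_PROJECT_ID"],
    "openai": ["OPENAI_API_KEY"],
    "aws": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"],
    "gemini": ["GOOGLE_APPLICATION_CREDENTIALS"],
}


def missing_settings(env):
    """Return the names of variables that are unset or empty in `env`."""
    missing = [name for name in ("DATABASE_URL", "AI_SERVICE") if not env.get(name)]
    service = env.get("AI_SERVICE", "")
    if service and service not in REQUIRED_BY_SERVICE:
        raise ValueError(f"unknown AI_SERVICE: {service!r}")
    # Add the provider-specific variables that are still missing.
    missing += [name for name in REQUIRED_BY_SERVICE.get(service, []) if not env.get(name)]
    return missing
```

Calling `missing_settings(os.environ)` (after `import os`) before `banko-ai run` would list whatever still needs to be exported.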

Running the Application

The application automatically creates database tables and loads sample data (5000 records by default):

# Start with default settings (5000 sample records)
banko-ai run

# Start with custom data amount
banko-ai run --generate-data 10000

# Start without generating data
banko-ai run --no-data

# Start with debug mode
banko-ai run --debug

Database Operations

🎯 What Happens on Startup

  1. Database Connection: Connects to CockroachDB and creates necessary tables
  2. Table Creation: Creates expenses table with vector indexes and cache tables
  3. Data Generation: Automatically generates 5000 sample expense records with enriched descriptions
  4. AI Provider Setup: Initializes the selected AI provider and loads available models
  5. Web Server: Starts the Flask application on http://localhost:5000

📊 Sample Data Features

The generated sample data includes:

  • Rich Descriptions: "Bought food delivery at McDonald's for $56.68 fast significant purchase restaurant and service paid with debit card this month"
  • Merchant Information: Realistic merchant names and categories
  • Amount Context: Expense amounts with contextual descriptions
  • Temporal Context: Recent, this week, this month, etc.
  • Payment Methods: Bank Transfer, Debit Card, Credit Card, Cash, Check
  • User-Specific Data: Multiple user IDs for testing user-specific search
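
To illustrate how the fields listed above can be folded into one searchable sentence, here is a minimal sketch. The function `enrich_description`, its sentence template, and the 50-dollar threshold for "significant" are all assumptions for illustration; the package's actual generator is not shown in this README.

```python
def enrich_description(category, merchant, amount, payment_method, when):
    """Compose one sentence carrying merchant, amount, payment and time context."""
    # The threshold separating "significant" from "small" is an assumption.
    size = "significant" if amount >= 50 else "small"
    return (
        f"Bought {category} at {merchant} for ${amount:.2f}, "
        f"a {size} purchase paid with {payment_method.lower()} {when}"
    )
```

Embedding a sentence like this, rather than the bare merchant name, gives the vector search more context to match queries such as "large restaurant purchases this month".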

🌐 Web Interface

Access the application at http://localhost:5000

Main Features

  • 🏠 Home: Overview dashboard with expense statistics
  • 💬 Chat: AI-powered expense analysis and Q&A
  • 🔍 Search: Vector-based expense search
  • ⚙️ Settings: AI provider and model configuration
  • 📊 Analytics: Detailed expense analysis and insights

🔧 CLI Commands

# Run the application
banko-ai run [OPTIONS]

# Generate sample data
banko-ai generate-data --count 2000

# Clear all data
banko-ai clear-data

# Check application status
banko-ai status

# Search expenses
banko-ai search "food delivery" --limit 10

# Show help
banko-ai help

🔌 API Endpoints

Endpoint           Method  Description
/                  GET     Web interface
/api/health        GET     System health check
/api/ai-providers  GET     Available AI providers
/api/models        GET     Available models for current provider
/api/search        POST    Vector search expenses
/api/rag           POST    RAG-based Q&A

API Examples

# Health check
curl http://localhost:5000/api/health

# Search expenses
curl -X POST http://localhost:5000/api/search \
  -H "Content-Type: application/json" \
  -d '{"query": "food delivery", "limit": 5}'

# RAG query
curl -X POST http://localhost:5000/api/rag \
  -H "Content-Type: application/json" \
  -d '{"query": "What are my biggest expenses this month?", "limit": 5}'
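
The same POST calls can be made from Python with only the standard library. This is a sketch under assumptions: the helper names `build_post` and `send` are hypothetical, and the shape of the JSON reply is not documented in this README, so `send` simply returns whatever the server sends back.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # default address given in this README


def build_post(path, query, limit=5, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for /api/search or /api/rag."""
    body = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the JSON reply (reply shape is not documented here)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the app running, `send(build_post("/api/rag", "What are my biggest expenses this month?"))` mirrors the second curl command above.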

🏗️ Architecture

Database Schema

  • expenses: Main expense table with vector embeddings
  • query_cache: Cached search results
  • embedding_cache: Cached embeddings
  • insights_cache: Cached AI insights
  • vector_search_cache: Cached vector search results
  • cache_stats: Cache performance statistics

Vector Indexes

-- User-specific vector index for personalized search
CREATE INDEX idx_expenses_user_embedding ON expenses 
USING cspann (user_id, embedding vector_l2_ops);

-- General vector index for global search
CREATE INDEX idx_expenses_embedding ON expenses 
USING cspann (embedding vector_l2_ops);

-- Note: Regional partitioning syntax may vary by CockroachDB version
-- CREATE INDEX idx_expenses_regional ON expenses 
-- USING cspann (user_id, embedding vector_l2_ops) 
-- LOCALITY REGIONAL BY ROW AS region;

Benefits:

  • User-specific queries: Faster search within user's data
  • Contextual results: Enhanced merchant and amount information
  • Scalable performance: Optimized for large datasets
  • Multi-tenant support: Isolated user data with shared infrastructure
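
A user-scoped search against these indexes might be issued as sketched below. This is an illustration, not the package's query: `user_search` is a hypothetical helper, `description` is an assumed column name (only `user_id` and `embedding` appear in the index definitions), and the `%s` placeholders follow the psycopg parameter style. Whether the planner actually picks the vector index for a given query is best confirmed with EXPLAIN.

```python
def vector_literal(embedding):
    """Render a sequence of numbers in the text form of a vector, e.g. [0.1,2.0]."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def user_search(user_id, embedding, limit=5):
    """Return (sql, params) for one user's nearest expenses by L2 distance (<->).

    `description` is a hypothetical column; `user_id` and `embedding`
    come from the index definitions above.
    """
    limit = int(limit)  # only a validated integer is inlined into the SQL text
    sql = (
        "SELECT description FROM expenses WHERE user_id = %s "
        f"ORDER BY embedding <-> %s::VECTOR LIMIT {limit}"
    )
    return sql, (user_id, vector_literal(embedding))
```

Filtering on `user_id` first matches the prefix column of `idx_expenses_user_embedding`, which is the reason the README gives for user-specific queries being faster.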

🔄 AI Provider Switching

Switch between AI providers and models dynamically:

  1. Go to Settings in the web interface
  2. Select your preferred AI provider
  3. Choose from available models
  4. Changes take effect immediately

Supported Providers

  • OpenAI: GPT-4o-mini (default), GPT-4o, GPT-4 Turbo, GPT-4, GPT-3.5 Turbo
  • AWS Bedrock: Claude 3.5 Sonnet (default), Claude 3.5 Haiku, Claude 3 Opus, Claude 3 Sonnet
  • IBM Watsonx: GPT-OSS-120B (default), Llama 2 (70B, 13B, 7B), Granite models
  • Google Gemini: Gemini 1.5 Pro (default), Gemini 1.5 Flash, Gemini 1.0 Pro

📈 Performance Features

Caching System

  • Query Caching: Caches search results for faster responses
  • Embedding Caching: Caches vector embeddings to avoid recomputation
  • Insights Caching: Caches AI-generated insights
  • Multi-layer Optimization: Intelligent cache invalidation and refresh
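
The idea behind embedding caching can be sketched in a few lines. Note the assumptions: the schema section lists cache tables in CockroachDB, whereas this illustration keeps entries in process memory, and the class name `EmbeddingCache`, the SHA-256 key, and the one-hour expiry are all hypothetical choices.

```python
import hashlib
import time


class EmbeddingCache:
    """In-memory cache of embeddings keyed by a hash of the input text."""

    def __init__(self, compute, ttl_seconds=3600.0, clock=time.monotonic):
        self._compute = compute   # function: text -> embedding
        self._ttl = ttl_seconds
        self._clock = clock       # injectable so expiry can be exercised in tests
        self._entries = {}        # key -> (expires_at, embedding)

    @staticmethod
    def _key(text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get(self, text):
        key = self._key(text)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]                 # fresh entry: skip recomputation
        value = self._compute(text)       # miss or expired: recompute and store
        self._entries[key] = (now + self._ttl, value)
        return value
```

Repeated queries for the same text then cost one dictionary lookup instead of a call to the embedding model.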

Vector Search Optimization

  • User-Specific Indexes: Faster search for individual users
  • Regional Partitioning: Optimized for multi-region deployments
  • Data Enrichment: Enhanced descriptions improve search accuracy
  • Batch Processing: Efficient data loading and processing

Advanced Vector Features

For detailed demonstrations of vector indexing and search capabilities:

📖 Vector Index Demo Guide - Comprehensive guide covering:

  • User-specific vector indexing
  • Regional partitioning with multi-region CockroachDB
  • Performance benchmarking
  • Advanced search queries
  • RAG with user context
  • Troubleshooting and best practices

🛠️ Development

Project Structure

banko_ai/
├── ai_providers/         # AI provider implementations
├── config/               # Configuration management
├── static/               # Web assets and images
├── templates/            # HTML templates
├── utils/                # Database and cache utilities
├── vector_search/        # Vector search and data generation
└── web/                  # Flask web application

Adding New AI Providers

  1. Create a new provider class in ai_providers/
  2. Extend the BaseAIProvider class
  3. Implement required methods
  4. Add to the factory in ai_providers/factory.py
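
The four steps above follow a common base-class-plus-factory pattern, sketched here in isolation. Everything in the block is a stand-in: the real `BaseAIProvider` interface and the contents of `ai_providers/factory.py` are not shown in this README, so `BaseProviderSketch`, `generate`, `register`, and `create_provider` are hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod


class BaseProviderSketch(ABC):
    """Stand-in for the package's BaseAIProvider; the real interface may differ."""

    @abstractmethod
    def generate(self, prompt):
        """Return the model's answer to `prompt` as text."""


class EchoProvider(BaseProviderSketch):
    """Toy provider that needs no API key; useful for checking the wiring."""

    def generate(self, prompt):
        return f"echo: {prompt}"


PROVIDERS = {}  # name -> provider class, playing the role of the factory


def register(name, provider_cls):
    """Make a provider class available under `name`."""
    PROVIDERS[name] = provider_cls


def create_provider(name):
    """Instantiate the provider registered under `name`."""
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"unknown AI provider: {name!r}") from None


register("echo", EchoProvider)
```

A registry keyed by name is what lets a setting such as `AI_SERVICE` select the implementation at run time without the caller importing provider classes directly.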

🐛 Troubleshooting

Common Issues

CockroachDB Version Issues

# Check CockroachDB version (must be v25.2.4+)
cockroach version

# If version is too old, download v25.3.3:
# https://www.cockroachlabs.com/docs/releases/v25.3#v25-3-3
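
The version requirement can also be checked programmatically. The sketch below compares release tags numerically; the helper names are hypothetical, and it assumes you pass in a tag such as the one reported by `cockroach version`.

```python
import re


def parse_version(tag):
    """Turn a tag such as 'v25.3.3' into a tuple of ints: (25, 3, 3)."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"no version number found in {tag!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(tag, minimum="v25.2.4"):
    """True when `tag` is at least the minimum release named in the prerequisites."""
    return parse_version(tag) >= parse_version(minimum)
```

Comparing integer tuples rather than strings matters here: as text, "25.10.0" sorts before "25.2.4", while as numbers it is the later release.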

Vector Index Feature Not Enabled

# Connect to the database (postgresql:// is the scheme the cockroach CLI documents)
cockroach sql --url="postgresql://root@localhost:26257/defaultdb?sslmode=disable"

-- At the SQL prompt, enable the vector index feature
SET CLUSTER SETTING feature.vector_index.enabled = true;

-- Verify it's enabled
SHOW CLUSTER SETTING feature.vector_index.enabled;

Database Connection Error

# Start CockroachDB single node
cockroach start-single-node \
  --insecure \
  --store=./cockroach-data \
  --listen-addr=localhost:26257 \
  --http-addr=localhost:8080 \
  --background

# Verify the connection and list the tables in defaultdb
cockroach sql --url="postgresql://root@localhost:26257/defaultdb?sslmode=disable" --execute "SHOW TABLES;"

AI Provider Disconnected

  • Verify API keys are set correctly
  • Check network connectivity
  • Ensure the selected model is available

No Search Results

  • Ensure sample data is loaded: banko-ai generate-data --count 1000
  • Check vector indexes are created
  • Verify search query format

Debug Mode

# Run with debug logging
banko-ai run --debug

# Check application status
banko-ai status

📝 License

MIT License - see LICENSE file for details.

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

📞 Support

For issues and questions:


Built with ❤️ using CockroachDB, Flask, and modern AI technologies such as watsonx.ai
