
An AI data framework for creating an AI data analyst

Data Neuron

Data Neuron is an AI-driven data framework for creating and maintaining an AI data analyst.

Data Neuron is a small framework optimized for working with a subset of a database, typically 10 to 15 tables.

Data Neuron's objective is to let you maintain and improve a semantic layer/knowledge graph over your data, thereby letting an AI agent with general intelligence become data-intelligent about your specific data.

Features

  • Support for multiple database types (SQLite, PostgreSQL, MySQL, MSSQL)
  • Natural language to SQL query conversion
  • Interactive chat mode for continuous database querying
  • Automatic context generation from database schema
  • Customizable context for improved query accuracy
  • Support for various LLM providers (Claude, OpenAI, Azure, Custom, Ollama)
  • Optimized for smaller database subsets (up to 10 tables)

Installation

Data Neuron can be installed with different database support options:

  1. Base package (SQLite support only):

    pip install dataneuron
    
  2. With PostgreSQL support:

    pip install "dataneuron[postgres]"
    
  3. With MySQL support:

    pip install "dataneuron[mysql]"
    
  4. With MSSQL support:

    pip install "dataneuron[mssql]"
    
  5. With support for all databases:

    pip install "dataneuron[all]"
    

Quick Start

  1. Initialize database configuration:

    dnn --db-init <database_type>
    

    Replace <database_type> with sqlite, mysql, mssql, or postgres.

  2. Generate context from your database:

    dnn --init
    

    This will create YAML files in the context/ directory.

  3. Ask a question about your database:

    dnn --ask "What is the total user count?"
    
  4. Or start an interactive chat session:

    dnn --chat
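
To try the steps above without touching a real database, a throwaway SQLite file is enough. The sketch below is not part of Data Neuron; it uses only Python's standard `sqlite3` module, and the file name `sample.db`, the `users` table, and its columns are made-up names for illustration. It creates the table, lists the column metadata that a schema-based context generator would need to read, and runs the SQL that the example question would be expected to translate to.

```python
import sqlite3


def create_sample_tables(conn):
    """Create a small `users` table with three illustrative rows."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DROP TABLE IF EXISTS users")
        conn.execute(
            "CREATE TABLE users ("
            "id INTEGER PRIMARY KEY, name TEXT NOT NULL, country TEXT)"
        )
        conn.executemany(
            "INSERT INTO users (name, country) VALUES (?, ?)",
            [("Ada", "UK"), ("Linus", "FI"), ("Grace", "US")],
        )


def describe_table(conn, table):
    """Return (column name, declared type) pairs for one table."""
    # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return [(row[1], row[2]) for row in rows]


def total_user_count(conn):
    """Run the query that "What is the total user count?" should map to."""
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect("sample.db")  # hypothetical file name
    try:
        create_sample_tables(conn)
        print(describe_table(conn, "users"))
        print(total_user_count(conn))
    finally:
        conn.close()
```

If the SQLite configuration created by `dnn --db-init sqlite` is pointed at this file, the answer from `dnn --ask` can be checked against the count printed here. How the tool asks for the file path is not covered in this document.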
    

Configuration

Data Neuron supports various LLM providers. Set the following environment variables based on your chosen provider:

Claude (Default)

CLAUDE_API_KEY=your_claude_api_key_here

OpenAI

DATA_NEURON_LLM=openai
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-4  # Optional, defaults to gpt-4o

Azure OpenAI

DATA_NEURON_LLM=azure
AZURE_OPENAI_API_KEY=your_azure_api_key_here
AZURE_OPENAI_API_VERSION=your_api_version_here
AZURE_OPENAI_ENDPOINT=your_azure_endpoint_here
AZURE_OPENAI_DEPLOYMENT_NAME=your_deployment_name_here

Custom Provider

DATA_NEURON_LLM=custom
DATA_NEURON_LLM_API_KEY=your_custom_api_key_here
DATA_NEURON_LLM_ENDPOINT=your_custom_endpoint_here
DATA_NEURON_LLM_MODEL=your_preferred_model_here

Ollama (for local LLM models)

Note: Local models run through Ollama currently do not produce good results.

DATA_NEURON_LLM=ollama
DATA_NEURON_LLM_MODEL=your_preferred_local_model_here
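
A missing variable is easy to overlook when switching providers, so a small pre-flight check can help. The sketch below is a standalone helper, not part of Data Neuron. The variable names are copied from the listings above, but two things are assumptions: that every listed variable except the documented-optional `OPENAI_MODEL` is required, and that leaving `DATA_NEURON_LLM` unset selects Claude (the value `claude` itself is a guess).

```python
import os

# Variables per provider, taken from the configuration listings above.
# Treating all of them as required is an assumption of this sketch.
REQUIRED_VARS = {
    "claude": ["CLAUDE_API_KEY"],
    "openai": ["OPENAI_API_KEY"],
    "azure": [
        "AZURE_OPENAI_API_KEY",
        "AZURE_OPENAI_API_VERSION",
        "AZURE_OPENAI_ENDPOINT",
        "AZURE_OPENAI_DEPLOYMENT_NAME",
    ],
    "custom": [
        "DATA_NEURON_LLM_API_KEY",
        "DATA_NEURON_LLM_ENDPOINT",
        "DATA_NEURON_LLM_MODEL",
    ],
    "ollama": ["DATA_NEURON_LLM_MODEL"],
}


def missing_llm_vars(env=None):
    """Return the names of unset or empty variables for the selected provider."""
    env = os.environ if env is None else env
    # Fall back to Claude when DATA_NEURON_LLM is unset or empty (assumed default).
    provider = (env.get("DATA_NEURON_LLM") or "claude").strip().lower()
    if provider not in REQUIRED_VARS:
        raise ValueError(f"unknown DATA_NEURON_LLM value: {provider!r}")
    return [name for name in REQUIRED_VARS[provider] if not env.get(name)]


if __name__ == "__main__":
    missing = missing_llm_vars()
    print("missing variables:", ", ".join(missing) if missing else "none")
```

Passing a plain dictionary instead of `os.environ` makes it possible to check a configuration before exporting it.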

Usage

  • Initialize database config: dnn --db-init <database_type>
  • Generate context: dnn --init
  • Ask a question: dnn --ask "Your question here"
  • Start chat mode: dnn --chat
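
The commands above can also be driven from a script, for example to regenerate context on a schedule. This document does not describe a Python API for Data Neuron, so the sketch below shells out to the `dnn` CLI with the standard `subprocess` module. The helper names `dnn_command` and `run_dnn` are hypothetical, and only the four flags listed above are used.

```python
import shutil
import subprocess

# Flags from the usage list above; the value says whether the flag takes an argument.
DNN_FLAGS = {"--db-init": True, "--init": False, "--ask": True, "--chat": False}


def dnn_command(flag, value=None):
    """Build the argument list for one dnn invocation."""
    if flag not in DNN_FLAGS:
        raise ValueError(f"unknown dnn flag: {flag}")
    takes_value = DNN_FLAGS[flag]
    if takes_value and not value:
        raise ValueError(f"{flag} needs a value")
    if not takes_value and value is not None:
        raise ValueError(f"{flag} does not take a value")
    return ["dnn", flag] + ([value] if takes_value else [])


def run_dnn(flag, value=None):
    """Run dnn with inherited stdin/stdout and return its exit code."""
    if shutil.which("dnn") is None:
        raise FileNotFoundError("dnn not found on PATH; install Data Neuron first")
    # The output format of dnn is not specified here, so nothing is parsed.
    return subprocess.run(dnn_command(flag, value), check=False).returncode


if __name__ == "__main__":
    # Only print the command; call run_dnn(...) to actually execute it.
    print(dnn_command("--ask", "What is the total user count?"))
```

Passing the arguments as a list avoids shell quoting problems with questions that contain quotes or spaces.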

Roadmap

Planned work for Data Neuron:

  1. Expanded Database Support:

    • Add support for additional databases and data warehouses
    • Integrate with popular cloud data platforms
  2. API Server Capability:

    • Develop an API server mode to respond to queries based on context
    • Enable seamless integration with other applications and services
  3. Context Marts:

    • Implement the concept of context marts (e.g., marketing_context_mart, product_context_mart)
    • Allow for more focused and efficient querying within specific domains
  4. Synthetic Query Generation:

    • Create a system for generating synthetic queries
    • Enhance testing and development processes
  5. Deterministic Testing:

    • Develop a suite of deterministic tests for query accuracy
    • Enable easy comparison and evaluation of different LLM models
  6. Continuous Improvement Framework:

    • Implement mechanisms for ongoing learning and refinement of the AI model
    • Incorporate user feedback to enhance query generation accuracy
  7. Scalability Enhancements:

    • Optimize performance for larger datasets while maintaining focus on subset efficiency
    • Explore distributed processing options for more complex queries
  8. An Agentic Analyst.

Contributing

We welcome contributions to Data Neuron! Please see our Contributing Guide for more details on how to get started.

Development

To set up Data Neuron for development:

  1. Clone the repository:

    git clone https://github.com/databrainhq/dataneuron.git
    cd dataneuron
    
  2. Install dependencies using Poetry:

    poetry install
    
  3. Run tests:

    poetry run pytest
    

Note: Tests are still being added.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

For questions, suggestions, or issues, please open an issue on the GitHub repository or contact the maintainers directly.

Happy querying with Data Neuron!
