Query your data (SQL, CSV, pandas, Polars, MongoDB, NoSQL, etc.) in natural language using Ollama, an open-source tool that runs LLMs locally. DataDashr turns data analysis into a conversational experience powered by Ollama LLMs and RAG.

These details have not been verified by PyPI

Description

Converse with Your Data Through Open Source AI.

Unleash the power of your data with natural language questions.
Our open-source platform, built on Ollama, delivers powerful insights without the cost of APIs.

Integrate effortlessly with your existing infrastructure, connecting to various data sources including SQL, NoSQL, CSV, and XLS files.

Obtain in-depth analytics by aggregating data from multiple sources into a unified platform, providing a holistic view of your business.

Convert raw data into valuable insights, facilitating data-driven strategies and enhancing decision-making processes.

Design intuitive and interactive charts and visual representations to simplify the understanding and interpretation of your business metrics.

DataDashr Installation and Setup Guide

Installation

To install the DataDashr package, run the following command:

pip install datadashr

Requirements

To keep the system fully local, DataDashr uses Ollama to run three models: Codestral, Llama3, and Nomic-Embed-Text. Follow these steps to set up the necessary components.

Step 1: Download Ollama

Download Ollama from the following link: https://ollama.com/download

Step 2: Install Models

Install the Codestral model for data processing by running the following command:

ollama pull codestral

Install the Llama3 model for conversation by running the following command:

ollama pull llama3

Install the Nomic-Embed-Text model for embedding by running the following command:

ollama pull nomic-embed-text
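
Before moving on, it can help to confirm that all three models were actually pulled. The following is a minimal sketch, not part of DataDashr: the helper names are hypothetical, and it assumes that `ollama list` prints a header row followed by one row per model with the model name (for example `llama3:latest`) in the first column.

```python
import subprocess

# Models named in the steps above.
REQUIRED_MODELS = ("codestral", "llama3", "nomic-embed-text")


def installed_models(listing: str) -> set[str]:
    """Parse the text printed by `ollama list` into model names without tags."""
    names = set()
    for line in listing.splitlines()[1:]:  # skip the header row (assumed format)
        parts = line.split()
        if parts:
            names.add(parts[0].split(":")[0])  # "llama3:latest" -> "llama3"
    return names


def missing_models(listing: str, required=REQUIRED_MODELS) -> list[str]:
    """Return the required models that do not appear in the listing."""
    found = installed_models(listing)
    return [name for name in required if name not in found]


def check_local_models() -> list[str]:
    """Run `ollama list` and report which required models are missing."""
    result = subprocess.run(
        ["ollama", "list"], capture_output=True, text=True, check=True
    )
    return missing_models(result.stdout)
```

Calling `check_local_models()` on a machine where Ollama is installed should return an empty list once all three pulls have finished.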

Configuration

Create a default settings file named datadashr_settings.json at the same level as your main script. This file should contain the following configuration:

{
  "llm_context": {
    "model_name": "llama3",
    "api_key": "None",
    "llm_type": "ollama"
  },
  "llm_data": {
    "model_name": "codestral",
    "api_key": "None",
    "llm_type": "ollama"
  },
  "vector_store": {
    "store_type": "chromadb"
  },
  "embedding": {
    "embedding_type": "ollama",
    "model_name": "nomic-embed-text:latest"
  },
  "enable_cache": "False",
  "format_type": "data",
  "reset_db": "True",
  "verbose": "True"
}
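
If you prefer to generate the settings file from code, the sketch below writes the same configuration with the standard library. The `write_settings` helper is hypothetical and not part of DataDashr. The boolean-like values are kept as strings exactly as in the configuration above, since how DataDashr parses them is not documented here.

```python
import json
from pathlib import Path

# Same values as the settings shown above; "True"/"False"/"None" stay strings.
SETTINGS = {
    "llm_context": {"model_name": "llama3", "api_key": "None", "llm_type": "ollama"},
    "llm_data": {"model_name": "codestral", "api_key": "None", "llm_type": "ollama"},
    "vector_store": {"store_type": "chromadb"},
    "embedding": {"embedding_type": "ollama", "model_name": "nomic-embed-text:latest"},
    "enable_cache": "False",
    "format_type": "data",
    "reset_db": "True",
    "verbose": "True",
}


def write_settings(directory=".", overwrite=False) -> Path:
    """Write datadashr_settings.json into `directory` and return its path.

    An existing file is left untouched unless overwrite=True.
    """
    path = Path(directory) / "datadashr_settings.json"
    if path.exists() and not overwrite:
        return path
    path.write_text(json.dumps(SETTINGS, indent=2), encoding="utf-8")
    return path
```

Run `write_settings()` from the directory that contains your main script so the file lands at the same level.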

Initializing DataDashr

To initialize the DataDashr object with your data and LLM instance, use the following code:

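from pprint import pprint

from datadashr import DataDashr

# Define your import_data dictionary with your data sources.
# employees_df, salaries_df, departments_df, and projects_df are
# pandas DataFrames that you have already loaded.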
import_data = {
    'sources': [
        {"source_name": "employees_df", "data": employees_df, "source_type": "pandas",
         "description": "Contains employee details including their department.", "save_to_vector": False},
        {"source_name": "salaries_df", "data": salaries_df, "source_type": "pandas",
         "description": "Contains salary information for employees.", "save_to_vector": False},
        {"source_name": "departments_df", "data": departments_df, "source_type": "pandas",
         "description": "Contains information about departments and their managers.", "save_to_vector": False},
        {"source_name": "projects_df", "data": projects_df, "source_type": "pandas",
         "description": "Contains information about projects and the employees assigned to them.",
         "save_to_vector": False},
    ],
    'mapping': {
        "employeeid": [
            {"source": "employees_df", "field": "id"},
            {"source": "salaries_df", "field": "employeeid"},
            {"source": "projects_df", "field": "employeeid"}
        ],
        "department": [
            {"source": "employees_df", "field": "department"},
            {"source": "departments_df", "field": "department"}
        ]
    }
}

# Initialize DataDashr with the imported data
df = DataDashr(data=import_data)

# Execute a query on the combined DataFrame
result = df.chat('Show the Charlie salary', response_mode='data')

# Print the result
pprint(result)
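
The example above expects four pandas DataFrames to exist already. The sketch below builds small stand-ins so the snippet can be tried end to end. All values are invented sample data, and every column other than `id`, `employeeid`, and `department` (the fields named in the mapping) is an assumption. The final merge only illustrates the relationship that the `employeeid` mapping describes, written in plain pandas; it is not a description of how DataDashr resolves the mapping internally.

```python
import pandas as pd

# Hypothetical sample data; only id, employeeid, and department come from the mapping.
employees_df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "department": ["HR", "IT", "Finance"],
})
salaries_df = pd.DataFrame({
    "employeeid": [1, 2, 3],
    "salary": [50000, 60000, 70000],
})
departments_df = pd.DataFrame({
    "department": ["HR", "IT", "Finance"],
    "manager": ["Dana", "Erin", "Frank"],
})
projects_df = pd.DataFrame({
    "projectid": [101, 102],
    "projectname": ["Payroll", "Migration"],
    "employeeid": [3, 2],
})

# The "employeeid" mapping links employees_df.id to salaries_df.employeeid.
joined = employees_df.merge(salaries_df, left_on="id", right_on="employeeid")

# Plain-pandas counterpart of asking for Charlie's salary.
charlie_salary = joined.loc[joined["name"] == "Charlie", ["name", "salary"]]
print(charlie_salary)
```

Define these DataFrames before building `import_data` so the names referenced in the `sources` list resolve.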

Response Modes

response_mode = 'data': The system interacts with data in a tabular manner, automatically generating SQL queries and providing responses that include one or more tables, charts, or both.

response_mode = 'context': Enables RAG (Retrieval-Augmented Generation) mode, where data is vectorized. In this mode, you can import various sources such as PDFs, DOCs, websites, etc., and interact with the data naturally.

Here's an example of how to use the chat method with different response modes:

# Using response_mode 'data' for tabular interaction
result_data = df.chat('Show the Charlie salary', response_mode='data')
pprint(result_data)

# Using response_mode 'context' for natural interaction with vectorized data
result_context = df.chat('Explain the employee structure', response_mode='context')
pprint(result_context)

This guide covers installing and configuring DataDashr, loading pandas DataFrames as data sources, and querying them in the 'data' and 'context' response modes.

Project details

Release history

Version	Release date
0.2.5	Aug 8, 2024
0.2.4	Jul 8, 2024 (this version)
0.2.3	Jun 28, 2024
0.2.1	Jun 26, 2024
0.2.0	Jun 26, 2024
0.1.7	Jun 11, 2024

Download files

Source Distribution

datadashr-0.2.4.tar.gz (10.4 MB), uploaded Jul 8, 2024

Built Distribution

datadashr-0.2.4-py3-none-any.whl (10.7 MB), uploaded Jul 8, 2024, Python 3

Hashes for datadashr-0.2.4.tar.gz

Algorithm	Hash digest
SHA256	`735525fdc8c25c746952e0e9a1d798638f0e8007e52bb33738c1b494d80c7503`
MD5	`5ccef27e71fde53a7015fed981a7424a`
BLAKE2b-256	`efc490ec5c5a36a0d7833273bacd11019798d90fadfe8fec36fcb033a7811cd2`

Hashes for datadashr-0.2.4-py3-none-any.whl

Algorithm	Hash digest
SHA256	`e0b342ae98d1e446fe2974b6c5cf045862ea76e8dbf3588a668bbe8e7174d12e`
MD5	`3fee6bec1bbfdfd01dc875ad2118f8a9`
BLAKE2b-256	`7ff682baa38b4d9da04dfee747d62e28ca9d0295a1172ccdc71b850f4d4ca5c2`