
qabot

Query local or remote files with natural language queries powered by OpenAI's GPT models and DuckDB 🦆.

Can query local and remote files (CSV, parquet)

Installation

Install with uv, pipx, pip, etc.:

uv tool install qabot
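
Or, equivalently:

pipx install qabot
pip install qabot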

Features

Works on local CSV, SQLite, and Excel files, as well as on remote CSV files:

$ qabot -f https://duckdb.org/data/holdings.csv -q "Tell me how many Apple holdings I currently have"
 🦆 Creating local DuckDB database...
 🦆 Loading data...
create view 'holdings' as select * from 'https://duckdb.org/data/holdings.csv';
 🚀 Sending query to LLM
 🧑 Tell me how many Apple holdings I currently have
 🤖 You currently have 32.23 shares of Apple.


This information was obtained by summing up all the Apple ('APPL') shares in the holdings table.

SELECT SUM(shares) as total_shares FROM holdings WHERE ticker = 'APPL'

Even on (public) data stored in S3:

$ qabot -f s3://covid19-lake/enigma-jhu-timeseries/csv/jhu_csse_covid_19_timeseries_merged.csv -q "how many confirmed cases of covid are there by month?" -v

🤖 Monthly confirmed cases from January to May 2020: 7 in January, 24 in February, 188,123 in March, 1,069,172 in April, and 1,745,582 in May.
Extra Details (from qabot)

The above figures were computed by aggregating the dataset on a per-entity basis (using a unique identifier uid), selecting the last available (maximum) date in each month, and summing the confirmed case counts. Here is the SQL query that was used:

WITH monthly_data AS (
    SELECT uid, strftime('%Y-%m', date) AS month, MAX(date) AS max_date
    FROM memory.main.jhu_csse_covid_19_timeseries_merged
    GROUP BY uid, strftime('%Y-%m', date)
)
SELECT m.month, SUM(j.confirmed) AS confirmed
FROM monthly_data m
JOIN memory.main.jhu_csse_covid_19_timeseries_merged j
  ON m.uid = j.uid AND m.max_date = j.date
GROUP BY m.month
ORDER BY m.month;

This method ensures that for each month, the cumulative confirmed case count is captured at the end of the month based on the latest data available for each entity (uid).

Load data within a session

You can even load data from disk/URL via the natural language query:

Load the file 'data/titanic.csv' into a table called 'raw_passengers'. Create a view of the raw passengers table for just the male passengers. What was the average fare for surviving male passengers?

 🦆 Creating local DuckDB database...
 🚀 Sending query to LLM
 🤖 The average fare for surviving male passengers is approximately $40.82.


I created a table called `raw_passengers` from the Titanic dataset loaded from 'data/titanic.csv'. Then, I created a view called `male_passengers` that
includes only male passengers. Finally, I calculated the average fare for surviving male passengers, which is approximately $40.82.

SELECT AVG(Fare) AS average_fare_surviving_male FROM male_passengers WHERE Survived = 1;

Quickstart

You need to set the OPENAI_API_KEY environment variable to your OpenAI API key, which you can create in your OpenAI account settings. Other OpenAI-compatible APIs can also be used by setting OPENAI_BASE_URL.
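
For example, in your shell (the key value is a placeholder):

export OPENAI_API_KEY=sk-...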

Install the qabot command line tool using uv/pip/pipx:

$ uv tool install qabot

Then run the qabot command with optional files (-f my-file.csv) and an initial query (-q "How many...").
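
For example (the file name and question are placeholders):

$ qabot -f my-file.csv -q "How many rows are there?"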

See all options with qabot --help

Security Risks

This program gives an LLM access to your local and network-accessible files, and allows it to execute arbitrary SQL queries in a DuckDB database; see Security for more information.

LLM Providers

qabot works with any OpenAI-compatible API, including Ollama and DeepSeek. Simply set the base URL:

export OPENAI_BASE_URL=https://api.deepseek.com

Or Ollama:

OPENAI_BASE_URL=http://localhost:11434/v1/ 
QABOT_MODEL_NAME=qwen2.5-coder:7b 
QABOT_PLANNING_MODEL_NAME=deepseek-r1:14b 
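
Putting those together as a one-off invocation (a sketch assuming the variables above are all qabot needs to reach a local Ollama server; the file and question are placeholders):

$ OPENAI_BASE_URL=http://localhost:11434/v1/ \
  QABOT_MODEL_NAME=qwen2.5-coder:7b \
  QABOT_PLANNING_MODEL_NAME=deepseek-r1:14b \
  qabot -f data/titanic.csv -q "How many rows are there?"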

Python API

from qabot import ask_wikidata, ask_file, ask_database

# Ask questions of WikiData, a local file, or an existing database:
print(ask_wikidata("How many hospitals are there in New Zealand?"))
print(ask_file("How many men were aboard the titanic?", 'data/titanic.csv'))
print(ask_database("How many product images are there?", 'postgresql://user:password@localhost:5432/dbname'))

Output:

There are 54 hospitals in New Zealand.
There were 577 male passengers on the Titanic.
There are 6,225 product images.

Examples

Local CSV file/s

$ qabot -q "Show the survival rate by gender, and ticket class shown as an ASCII graph" -f data/titanic.csv
🦆 Loading data from files...
Loading data/titanic.csv into table titanic...

Here's the survival count represented as a horizontal bar graph, grouped by ticket class and gender:

Class 1:
Females  | ██████████████████████████████████████████ (91)
Males    | ██████████████ (45)

Class 2:
Females  | ██████████████████████████ (70)
Males    | ██████████ (17)

Class 3:
Females  | ██████████████████████████████ (72)
Males    | ██████████████ (47)


This representation shows that in every class more female passengers survived than male passengers, and that the number of survivors is notably higher in first class than in the other classes.

Query WikiData

Use the -w flag to query WikiData.

$ qabot -w -q "How many Hospitals are there located in Beijing"

Intermediate steps and database queries

Use the -v flag to see the intermediate steps and database queries. Sometimes it takes a long route to get to the answer, but it's often interesting to see how it gets there.

Data accessed via http/s3

Use the -f <url> flag to load data from a url, e.g. a csv file on s3:

$ qabot -f s3://covid19-lake/enigma-jhu-timeseries/csv/jhu_csse_covid_19_timeseries_merged.csv -q "how many confirmed cases of covid are there?" -v
🦆 Loading data from files...
create table jhu_csse_covid_19_timeseries_merged as select * from 's3://covid19-lake/enigma-jhu-timeseries/csv/jhu_csse_covid_19_timeseries_merged.csv';

Result:
264308334 confirmed cases

Docker Usage

You can run qabot via Docker:

docker run --rm \
  -e OPENAI_API_KEY=<your_openai_api_key> \
  -v ./data:/opt \
  ghcr.io/hardbyte/qabot -f /opt/titanic.csv -q "What ratio of passengers were under 30?"

Replace the mount path with the location of your data, and substitute your actual OpenAI API key for <your_openai_api_key>.

Ideas

  • G-Sheets via https://github.com/evidence-dev/duckdb_gsheets
  • Streaming mode to output results as they come in
  • token limits and better reporting of costs
  • Supervisor agent - assess whether a query is "safe" to run, could ask for user confirmation to run anything that gets flagged.
  • Often we can zero-shot the question and get a single query out - perhaps we try this before the MRKL chain
  • test each zeroshot agent individually
  • Generate and pass back assumptions made to the user
  • Add an optional "clarify" tool to the chain that asks the user to clarify the question
  • Create a query checker tool that checks if the query looks valid and/or safe (see the sketch after this list)
  • Inject AWS credentials into duckdb for access to private resources in S3
  • Automatic publishing to pypi e.g. using trusted publishers
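
As a minimal sketch of the query-checker idea above (all names here are hypothetical, not part of qabot's actual API), a naive first pass could allowlist read-only statements before they reach DuckDB:

import re

# Statements considered safe to auto-run; anything else would be flagged
# for user confirmation.
SAFE_PREFIXES = {"select", "with", "describe", "show", "explain"}

def looks_safe(sql: str) -> bool:
    """Return True if the statement starts with a read-only keyword."""
    match = re.match(r"\s*(\w+)", sql)
    return match is not None and match.group(1).lower() in SAFE_PREFIXES

assert looks_safe("SELECT * FROM holdings")
assert not looks_safe("DROP TABLE holdings")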
