No project description provided
Project description
qabot
Query local or remote files with natural language queries powered by
langchain
and gpt-3.5-turbo
and duckdb
๐ฆ.
Works on local CSV files:
remote CSV files:
$ qabot \
-f https://www.stats.govt.nz/assets/Uploads/Environmental-economic-accounts/Environmental-economic-accounts-data-to-2020/renewable-energy-stock-account-2007-2020-csv.csv \
-q "How many Gigawatt hours of generation was there for Solar resources in 2015 through to 2020?"
Even on (public) data stored in S3:
Quickstart
You need to set the OPENAI_API_KEY
environment variable to your OpenAI API key,
which you can get from here.
Install the qabot
command line tool using pip/poetry:
$ pip install qabot
Then run the qabot
command with either local files (-f my-file.csv
) or a database connection string.
Note if you want to use a database, you will need to install the relevant drivers,
e.g. pip install psycopg2-binary
for postgres.
Examples
Local CSV file/s
$ qabot -q "how many passengers survived by gender?" -f data/titanic.csv
๐ฆ Loading data from files...
Loading data/titanic.csv into table titanic...
Query: how many passengers survived by gender?
Result:
There were 233 female passengers and 109 male passengers who survived.
๐ any further questions? [y/n] (y): y
๐ Query: what was the largest family who did not survive?
Query: what was the largest family who did not survive?
Result:
The largest family who did not survive was the Sage family, with 8 members.
๐ any further questions? [y/n] (y): n
Intermediate steps and database queries
Use the -v
flag to see the intermediate steps and database queries.
Sometimes it takes a long route to get to the answer, but it's interesting to see how it gets there:
qabot -f data/titanic.csv -q "how many passengers survived by gender?" -v
๐ฆ Loading data from files...
Query: how many passengers survived by gender?
I need to check the columns in the 'titanic' table to see which ones contain gender and survival information.
Action: Describe Table
Action Input: titanic
Observation: titanic
โโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโฌโโโโโโโโโโฌโโโโโโโโโโฌโโโโโโโโโโฌโโโโโโโโ
โ column_name โ column_type โ null โ key โ default โ extra โ
โ varchar โ varchar โ varchar โ varchar โ varchar โ int32 โ
โโโโโโโโโโโโโโโผโโโโโโโโโโโโโโผโโโโโโโโโโผโโโโโโโโโโผโโโโโโโโโโผโโโโโโโโค
โ PassengerId โ BIGINT โ YES โ NULL โ NULL โ NULL โ
โ Survived โ BIGINT โ YES โ NULL โ NULL โ NULL โ
โ Pclass โ BIGINT โ YES โ NULL โ NULL โ NULL โ
โ Name โ VARCHAR โ YES โ NULL โ NULL โ NULL โ
โ Sex โ VARCHAR โ YES โ NULL โ NULL โ NULL โ
โ Age โ DOUBLE โ YES โ NULL โ NULL โ NULL โ
โ SibSp โ BIGINT โ YES โ NULL โ NULL โ NULL โ
โ Parch โ BIGINT โ YES โ NULL โ NULL โ NULL โ
โ Ticket โ VARCHAR โ YES โ NULL โ NULL โ NULL โ
โ Fare โ DOUBLE โ YES โ NULL โ NULL โ NULL โ
โ Cabin โ VARCHAR โ YES โ NULL โ NULL โ NULL โ
โ Embarked โ VARCHAR โ YES โ NULL โ NULL โ NULL โ
โโโโโโโโโโโโโโโดโโโโโโโโโโโโโโดโโโโโโโโโโดโโโโโโโโโโดโโโโโโโโโโดโโโโโโโโค
โ 12 rows 6 columns โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
I need to create a view that only includes the columns I need for this question.
Action: Data Op
Action Input:
CREATE VIEW titanic_gender_survival AS
SELECT Sex, Survived
FROM titanic
Thought:
> Entering new AgentExecutor chain...
This is a valid SQL query creating a view. We can execute it directly.
Action: execute
Action Input:
CREATE VIEW titanic_gender_survival AS
SELECT Sex, Survived
FROM titanic
Observation: No output
Thought:The view has been created successfully. We can now query it.
Action: execute
Action Input: SELECT * FROM titanic_gender_survival LIMIT 5
Observation:
โโโโโโโโโโโฌโโโโโโโโโโโ
โ Sex โ Survived โ
โ varchar โ int64 โ
โโโโโโโโโโโผโโโโโโโโโโโค
โ male โ 0 โ
โ female โ 1 โ
โ female โ 1 โ
โ female โ 1 โ
โ male โ 0 โ
โโโโโโโโโโโดโโโโโโโโโโโ
Thought:The view has been created successfully and we can see the first 5 rows of the view. The final answer is the first 5 rows of the titanic_gender_survival view, showing the sex and survival status of passengers on the
Titanic.
Final Answer:
โโโโโโโโโโโฌโโโโโโโโโโโ
โ Sex โ Survived โ
โ varchar โ int64 โ
โโโโโโโโโโโผโโโโโโโโโโโค
โ male โ 0 โ
โ female โ 1 โ
โ female โ 1 โ
โ female โ 1 โ
โ male โ 0 โ
โโโโโโโโโโโดโโโโโโโโโโโ
> Finished chain.
Observation: โโโโโโโโโโโฌโโโโโโโโโโโ
โ Sex โ Survived โ
โ varchar โ int64 โ
โโโโโโโโโโโผโโโโโโโโโโโค
โ male โ 0 โ
โ female โ 1 โ
โ female โ 1 โ
โ female โ 1 โ
โ male โ 0 โ
โโโโโโโโโโโดโโโโโโโโโโโ
I need to group the data by gender and count the number of survivors for each group.
Action: Data Op
Action Input:
SELECT Sex, SUM(Survived) AS num_survived
FROM titanic_gender_survival
GROUP BY Sex
Thought:
> Entering new AgentExecutor chain...
We need to check if the table titanic_gender_survival exists and if it has the columns Sex and Survived.
Action: Describe Table
Action Input: titanic_gender_survival
Observation: titanic_gender_survival
โโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโฌโโโโโโโโโโฌโโโโโโโโโโฌโโโโโโโโโโฌโโโโโโโโ
โ column_name โ column_type โ null โ key โ default โ extra โ
โ varchar โ varchar โ varchar โ varchar โ varchar โ int32 โ
โโโโโโโโโโโโโโโผโโโโโโโโโโโโโโผโโโโโโโโโโผโโโโโโโโโโผโโโโโโโโโโผโโโโโโโโค
โ Sex โ VARCHAR โ YES โ NULL โ NULL โ NULL โ
โ Survived โ BIGINT โ YES โ NULL โ NULL โ NULL โ
โโโโโโโโโโโโโโโดโโโโโโโโโโโโโโดโโโโโโโโโโดโโโโโโโโโโดโโโโโโโโโโดโโโโโโโโ
Thought:The table titanic_gender_survival exists and has the columns Sex and Survived. We can now run the query.
Action: execute
Action Input:
SELECT Sex, SUM(Survived) AS num_survived FROM titanic_gender_survival GROUP BY Sex LIMIT 5
Observation:
โโโโโโโโโโโฌโโโโโโโโโโโโโโโ
โ Sex โ num_survived โ
โ varchar โ int128 โ
โโโโโโโโโโโผโโโโโโโโโโโโโโโค
โ male โ 109 โ
โ female โ 233 โ
โโโโโโโโโโโดโโโโโโโโโโโโโโโ
Thought:The query returned the number of survivors grouped by gender. The table titanic_gender_survival has been used.
Final Answer: The number of survivors grouped by gender are:
- 109 males survived
- 233 females survived.
> Finished chain.
Observation: The number of survivors grouped by gender are:
- 109 males survived
- 233 females survived.
Intermediate Steps:
Step 1
Describe Table
titanic
titanic
โโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโฌโโโโโโโโโโฌโโโโโโโโโโฌโโโโโโโโโโฌโโโโโโโโ
โ column_name โ column_type โ null โ key โ default โ extra โ
โ varchar โ varchar โ varchar โ varchar โ varchar โ int32 โ
โโโโโโโโโโโโโโโผโโโโโโโโโโโโโโผโโโโโโโโโโผโโโโโโโโโโผโโโโโโโโโโผโโโโโโโโค
โ PassengerId โ BIGINT โ YES โ NULL โ NULL โ NULL โ
โ Survived โ BIGINT โ YES โ NULL โ NULL โ NULL โ
โ Pclass โ BIGINT โ YES โ NULL โ NULL โ NULL โ
โ Name โ VARCHAR โ YES โ NULL โ NULL โ NULL โ
โ Sex โ VARCHAR โ YES โ NULL โ NULL โ NULL โ
โ Age โ DOUBLE โ YES โ NULL โ NULL โ NULL โ
โ SibSp โ BIGINT โ YES โ NULL โ NULL โ NULL โ
โ Parch โ BIGINT โ YES โ NULL โ NULL โ NULL โ
โ Ticket โ VARCHAR โ YES โ NULL โ NULL โ NULL โ
โ Fare โ DOUBLE โ YES โ NULL โ NULL โ NULL โ
โ Cabin โ VARCHAR โ YES โ NULL โ NULL โ NULL โ
โ Embarked โ VARCHAR โ YES โ NULL โ NULL โ NULL โ
โโโโโโโโโโโโโโโดโโโโโโโโโโโโโโดโโโโโโโโโโดโโโโโโโโโโดโโโโโโโโโโดโโโโโโโโค
โ 12 rows 6 columns โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Step 2
Data Op
CREATE VIEW titanic_gender_survival AS
SELECT Sex, Survived
FROM titanic
โโโโโโโโโโโฌโโโโโโโโโโโ
โ Sex โ Survived โ
โ varchar โ int64 โ
โโโโโโโโโโโผโโโโโโโโโโโค
โ male โ 0 โ
โ female โ 1 โ
โ female โ 1 โ
โ female โ 1 โ
โ male โ 0 โ
โโโโโโโโโโโดโโโโโโโโโโโ
Step 3
Data Op
SELECT Sex, SUM(Survived) AS num_survived
FROM titanic_gender_survival
GROUP BY Sex
The number of survivors grouped by gender are:
- 109 males survived
- 233 females survived.
Thought:
Result:
109 males and 233 females survived.
Data accessed via http/s3
Use the -f <url>
flag to load data from a url, e.g. a csv file on s3:
$ qabot -f s3://covid19-lake/enigma-jhu-timeseries/csv/jhu_csse_covid_19_timeseries_merged.csv -q "how many confirmed cases are there" -v
๐ฆ Loading data from files...
create table jhu_csse_covid_19_timeseries_merged as select * from 's3://covid19-lake/enigma-jhu-timeseries/csv/jhu_csse_covid_19_timeseries_merged.csv';
Result:
264308334 confirmed cases
Links
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.