No project description provided
Project description
qabot
Query local or remote files or databases with natural language queries powered by
langchain
and openai
.
Works on local CSV files:
remote CSV files:
$ qabot \
-f https://www.stats.govt.nz/assets/Uploads/Environmental-economic-accounts/Environmental-economic-accounts-data-to-2020/renewable-energy-stock-account-2007-2020-csv.csv \
-q "How many Gigawatt hours of generation was there for Solar resources in 2015 through to 2020?"
as well as on real databases:
Even on (public) data stored in S3:
Quickstart
You need to set the OPENAI_API_KEY
environment variable to your OpenAI API key,
which you can get from here.
Install the qabot
command line tool using pip/poetry:
$ poetry install qabot
Then run the qabot
command with either files or a database connection string.
Examples
Local CSV file/s
$ qabot -q "how many passengers survived by gender?" -f data/titanic.csv
🦆 Loading data from files...
Loading data/titanic.csv into table titanic...
Query: how many passengers survived by gender?
Result:
There were 233 female passengers and 109 male passengers who survived.
🚀 any further questions? [y/n] (y): y
🚀 Query: what was the largest family who did not survive?
Query: what was the largest family who did not survive?
Result:
The largest family who did not survive was the Sage family, with 8 members.
🚀 any further questions? [y/n] (y): n
Database
Install any required drivers for your database, e.g. pip install psycopg2-binary
for postgres.
For example to connect and query directly from the trains database in the relational dataset repository:
$ pip install mysqlclient
$ qabot -d mysql+mysqldb://guest:relational@relational.fit.cvut.cz:3306/trains -q "what are the unique load shapes of cars, what are the maximum number of cars per train?"
Query: what are the unique load shapes of cars, what are the maximum number of cars per train?
Result:
The unique load shapes of cars are circle, diamond, hexagon, rectangle, and triangle, and the maximum number of cars per train is 3.
Note you can also supply a database connection string via the environment variable
QABOT_DATABASE_URI
.
Limit the tables
You can limit the tables that are queried by passing the -t
flag - this will save your tokens!
For example, to only query the cars
table:
$ export QABOT_DATABASE_URI=mysql+mysqldb://guest:relational@relational.fit.cvut.cz:3306/trains
$ qabot -q "what are the unique load shapes of cars?" -t cars
The unique load shapes of cars are circle, hexagon, triangle, rectangle, and diamond.
Features
Intermediate steps and database queries
Use the -v
flag to see the intermediate steps and database queries:
$ qabot -d mysql+mysqldb://guest:relational@relational.fit.cvut.cz:3306/trains -q "what are the unique load shapes of cars, what are the maximum number of cars per train?" -v
Query: what are the unique load shapes of cars, what are the maximum number of cars per train?
Intermediate Steps:
Step 1
list_tables_sql_db(
)
Output:
trains, cars
Step 2
schema_sql_db(
trains, cars
)
Output:
CREATE TABLE trains (
id INTEGER(11) NOT NULL,
direction VARCHAR(4),
PRIMARY KEY (id)
)ENGINE=InnoDB DEFAULT CHARSET=latin1
SELECT * FROM 'trains' LIMIT 3;
id direction
1 east
2 east
3 east
CREATE TABLE cars (
id INTEGER(11) NOT NULL,
train_id INTEGER(11),
`position` INTEGER(11),
shape VARCHAR(255),
len VARCHAR(255),
sides VARCHAR(255),
roof VARCHAR(255),
wheels INTEGER(11),
load_shape VARCHAR(255),
load_num INTEGER(11),
PRIMARY KEY (id),
CONSTRAINT cars_ibfk_1 FOREIGN KEY(train_id) REFERENCES trains (id) ON DELETE CASCADE ON UPDATE CASCADE
)ENGINE=InnoDB DEFAULT CHARSET=latin1
SELECT * FROM 'cars' LIMIT 3;
id train_id position shape len sides roof wheels load_shape load_num
1 1 1 rectangle short not_double none 2 circle 1
2 1 2 rectangle long not_double none 3 hexagon 1
3 1 3 rectangle short not_double peaked 2 triangle 1
Step 3
query_sql_db(
SELECT load_shape, MAX(load_num) FROM cars GROUP BY load_shape
)
Output:
[('circle', 3), ('diamond', 1), ('hexagon', 1), ('rectangle', 3), ('triangle', 3)]
Result:
The unique load shapes of cars are circle, diamond, hexagon, rectangle, and triangle, and the maximum number of cars per train is 3.
Data accessed via http/s3
Use the -f <url>
flag to load data from a url, e.g. a csv file on s3:
$ qabot -f s3://covid19-lake/enigma-jhu-timeseries/csv/jhu_csse_covid_19_timeseries_merged.csv -q "how many confirmed cases are there" -v
🦆 Loading data from files...
create table jhu_csse_covid_19_timeseries_merged as select * from 's3://covid19-lake/enigma-jhu-timeseries/csv/jhu_csse_covid_19_timeseries_merged.csv';
Result:
264308334 confirmed cases
Links
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.