Sample SQL datasets
datasets
This package simplifies working with standard SQL datasets.
It comes with four datasets:
- 'extract': an extract from two simple datasets, 'census' (from the US census) and 'beacon' (with Japanese names and labels).
- 'financial': from https://relational.fit.cvut.cz/dataset/Financial
- 'imdb': from https://relational.fit.cvut.cz/dataset/IMDb
- 'hepatitis': from https://relational.fit.cvut.cz/dataset/Hepatitis
Installation
The package can be installed with:
pip install qrlew-datasets
The library assumes:
- either that postgresql is installed,
- or that docker is installed and can spawn postgresql containers.
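The two prerequisites above can be checked before using the package. The following is a minimal sketch, not the library's own detection logic: the helper name `detect_backend` and the order of preference (local PostgreSQL first, then docker) are assumptions made for illustration.

```python
import shutil
from typing import Callable, Optional


def detect_backend(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return 'postgresql', 'docker', or 'none' depending on what is on PATH.

    Hypothetical helper: the order of preference is an assumption,
    not the behavior of qrlew-datasets itself.
    """
    # A `psql` client on PATH suggests a local PostgreSQL installation.
    if which("psql") is not None:
        return "postgresql"
    # Otherwise, fall back to docker if its CLI is available.
    if which("docker") is not None:
        return "docker"
    return "none"


if __name__ == "__main__":
    print(detect_backend())
```

Passing the lookup function as a parameter keeps the check easy to exercise without changing the machine's PATH.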
PostgreSQL in a container
The library automatically spawns containers. There is nothing to do.
Without docker installed
Set up a PostgreSQL server as in https://colab.research.google.com/github/tensorflow/io/blob/master/docs/tutorials/postgresql.ipynb
You can choose the port to use; the commands below use 5433.
# Inspired by https://colab.research.google.com/github/tensorflow/io/blob/master/docs/tutorials/postgresql.ipynb#scrollTo=YUj0878jPyz7
sudo apt-get -y -qq update
sudo apt-get -y -qq install postgresql-14
# Start postgresql server
# sudo sed -i "s/#port = 5432/port = 5433/g" /etc/postgresql/14/main/postgresql.conf
sudo sed -i "s/port = 5432/port = 5433/g" /etc/postgresql/14/main/postgresql.conf
sudo service postgresql start
# Set password
sudo -u postgres psql -U postgres -c "ALTER USER postgres PASSWORD 'pyqrlew-db'"
# Install python packages
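Once the server has been started with the commands above, it can be useful to confirm that it is reachable. The sketch below uses only the Python standard library; the user, password and port come from the commands above, while the helper names `postgres_url` and `is_port_open`, the `localhost` host and the `postgres` database name are assumptions for illustration, not part of the package.

```python
import socket
from urllib.parse import quote


def postgres_url(user: str = "postgres", password: str = "pyqrlew-db",
                 host: str = "localhost", port: int = 5433,
                 database: str = "postgres") -> str:
    """Build a connection URL (hypothetical helper; defaults mirror the setup commands)."""
    # Percent-encode the credentials so special characters do not break the URL.
    return (f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{database}")


def is_port_open(host: str = "localhost", port: int = 5433, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print(postgres_url())
    print("server reachable:", is_port_open())
```

An open port only shows that something is listening on 5433; it does not verify the password set with `ALTER USER`.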
Testing the absence of docker when docker is installed
You can simulate the absence of docker by running this code inside a container.
First run:
docker run --name test -d -i -t -v .:/datasets ubuntu:22.04
Then run:
docker exec -it test bash
Building the .sql dumps
To build the datasets, install the requirements with:
poetry shell
You can then build the datasets with:
python -m datasets.build
You may need to install the requirements of some drivers such as: https://pypi.org/project/mysqlclient/
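Before running the build, you can check whether a driver is importable. This is a minimal sketch assuming the build needs the `MySQLdb` module (the import name provided by mysqlclient); the helper `missing_modules` is hypothetical, and the exact list of drivers the build requires is not stated here.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(modules: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # `MySQLdb` is the import name of the mysqlclient package.
    missing = missing_modules(["MySQLdb"])
    if missing:
        print("Missing driver modules:", ", ".join(missing))
    else:
        print("All checked driver modules are available.")
```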