Deserialize the SQaLe dataset into populated SQLite databases.

SQaLe

A Python utility for deserializing the SQaLe dataset into populated SQLite databases.

Each unique schema in the dataset is materialized as a .db file and populated with the synthetic row data stored alongside it — ready to use for SQL benchmarking, evaluation, or development.
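Because the output is a plain SQLite file, it can be inspected with Python's standard sqlite3 module. The following is a minimal sketch, not part of the SQaLe package: table_row_counts is a hypothetical helper, and the demo database and its users table are stand-ins created so the example runs without downloading the dataset.

```python
import sqlite3
import tempfile
from pathlib import Path


def table_row_counts(db_path):
    """Return {table_name: row_count} for every user table in a SQLite file.

    Hypothetical helper for illustration; not provided by the SQaLe package.
    """
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        # Table names cannot be bound as parameters, so quote them as identifiers.
        return {
            name: conn.execute(
                'SELECT COUNT(*) FROM "{}"'.format(name.replace('"', '""'))
            ).fetchone()[0]
            for name in names
        }
    finally:
        conn.close()


# Stand-in database (assumption): mimics a materialized .db file.
demo_path = Path(tempfile.mkdtemp()) / "demo.db"
conn = sqlite3.connect(demo_path)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("ada",), ("linus",)])
conn.commit()
conn.close()

print(table_row_counts(demo_path))
```

Pointing table_row_counts at a file produced by the extractor should work the same way, assuming the file is a regular SQLite database as described above.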

Installation

pip install SQaLe

Usage

CLI

# Download and deserialize all schemas
sqale-extract --output ./dbs

# Limit to the first 100 unique schemas
sqale-extract --output ./dbs --limit 100

Python API

from sqale import deserialize_sqale

results = deserialize_sqale(
    file_path="trl-lab/SQaLe_2",  # HuggingFace repo ID or local path
    output_dir="./dbs",
    limit=100,  # optional
)

for r in results:
    print(r["db_path"], r["rows_per_table"])

The function returns a list of dicts with the following fields:

| Field | Description |
| --- | --- |
| schema_id | Original schema ID from the dataset |
| db_path | Absolute path to the created .db file |
| tables | List of table names found in the DDL |
| rows_per_table | Dict mapping table name → number of rows inserted |
| error | Error message if materialization failed, otherwise None |
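Since each entry carries an error field, it can help to separate successes from failures before using the databases. The sketch below assumes only the field names listed above; summarize is a hypothetical helper, and sample_results contains made-up values standing in for a real return value of deserialize_sqale.

```python
def summarize(results):
    """Split results into successes and failures and total the inserted rows.

    Hypothetical helper; relies only on the documented result fields.
    """
    ok = [r for r in results if r["error"] is None]
    failed = [r for r in results if r["error"] is not None]
    return {
        "ok": len(ok),
        "failed": len(failed),
        # Only successful entries are counted toward the row total.
        "total_rows": sum(sum(r["rows_per_table"].values()) for r in ok),
        "failed_ids": [r["schema_id"] for r in failed],
    }


# Made-up entries (assumption) shaped like the documented result dicts.
sample_results = [
    {
        "schema_id": "schema_a",
        "db_path": "/tmp/dbs/schema_a.db",
        "tables": ["users", "orders"],
        "rows_per_table": {"users": 10, "orders": 25},
        "error": None,
    },
    {
        "schema_id": "schema_b",
        "db_path": "/tmp/dbs/schema_b.db",
        "tables": [],
        "rows_per_table": {},
        "error": "example failure message",
    },
]

summary = summarize(sample_results)
print(summary)
```

Checking error first avoids reading rows_per_table from entries whose materialization failed, since the contents of that field for failed entries are not specified here.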

Loading from a local file

results = deserialize_sqale(
    file_path="./data/train.parquet",
    output_dir="./dbs",
)

Supported local formats: .parquet, .arrow, or a directory containing either.
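A quick pre-check can catch an unsupported path before calling the function. This is a rough sketch based only on the format list above: looks_supported is a hypothetical helper, and details such as case-insensitive suffix matching and checking only the top level of a directory are assumptions that may differ from what the library does internally.

```python
import tempfile
from pathlib import Path

# Suffixes taken from the supported-formats note above.
SUPPORTED_SUFFIXES = {".parquet", ".arrow"}


def looks_supported(path):
    """Return True if path is a .parquet/.arrow file or a directory holding one.

    Hypothetical helper; only inspects names, never file contents.
    """
    p = Path(path)
    if p.is_file():
        return p.suffix.lower() in SUPPORTED_SUFFIXES
    if p.is_dir():
        # Assumption: only direct children are considered, not nested folders.
        return any(
            child.is_file() and child.suffix.lower() in SUPPORTED_SUFFIXES
            for child in p.iterdir()
        )
    return False


# Stand-in files (assumption) so the sketch runs without real data.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "train.parquet").touch()
(demo_dir / "notes.txt").touch()

print(looks_supported(demo_dir / "train.parquet"))
print(looks_supported(demo_dir / "notes.txt"))
print(looks_supported(demo_dir))
```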

Requirements

  • Python ≥ 3.9
  • pandas, tqdm, pyarrow, datasets

License

See LICENSE.
