
Deserialize the SQaLe dataset into populated SQLite databases.


SQaLe


A Python utility for deserializing the SQaLe dataset into populated SQLite databases.

Each unique schema in the dataset is materialized as a .db file and populated with the synthetic row data stored alongside it, ready to use for SQL benchmarking, evaluation, or development.
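To make "materialized and populated" concrete, here is a minimal sketch using only the standard library's sqlite3 module: run a schema's DDL in a fresh database file, then bulk-insert its rows. The helper name materialize, the two-table schema and the rows are all hypothetical stand-ins; this illustrates the idea and is not SQaLe's actual implementation.

```python
import os
import sqlite3
import tempfile


def materialize(ddl: str, rows: dict[str, list[tuple]], db_path: str) -> dict[str, int]:
    """Run DDL in a new SQLite database, insert rows, return rows inserted per table.

    Hypothetical helper for illustration only; it is not part of the SQaLe API.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.executescript(ddl)  # executes every statement, e.g. each CREATE TABLE
        counts = {}
        for table, table_rows in rows.items():
            if table_rows:
                quoted = '"' + table.replace('"', '""') + '"'  # safe identifier quoting
                placeholders = ", ".join("?" * len(table_rows[0]))
                conn.executemany(f"INSERT INTO {quoted} VALUES ({placeholders})", table_rows)
            counts[table] = len(table_rows)
        conn.commit()  # persist the inserts to the .db file
        return counts
    finally:
        conn.close()


# Hypothetical two-table schema and rows standing in for one dataset entry.
ddl = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id));
"""
rows = {"users": [(1, "ada"), (2, "bob")], "orders": [(1, 1)]}

with tempfile.TemporaryDirectory() as tmp:
    counts = materialize(ddl, rows, os.path.join(tmp, "example.db"))
print(counts)  # {'users': 2, 'orders': 1}
```

The returned per-table counts mirror the kind of information the library reports in its rows_per_table field.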

Installation

pip install SQaLe

Usage

CLI

# Download and deserialize all schemas
sqale-extract --output ./dbs

# Limit to the first 100 unique schemas
sqale-extract --output ./dbs --limit 100

Python API

from sqale import deserialize_sqale

results = deserialize_sqale(
    file_path="trl-lab/SQaLe_2",  # Hugging Face Hub repo ID or local path
    output_dir="./dbs",
    limit=100,  # optional
)

for r in results:
    print(r["db_path"], r["rows_per_table"])
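Each returned dict carries an error field that is None when materialization succeeded, so it may be worth separating successes from failures before using the databases. A minimal sketch that relies only on the documented fields; the helper name split_results and the sample dicts (including what a failed entry's other fields contain) are hypothetical.

```python
from typing import Any


def split_results(
    results: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Partition deserialize_sqale-style results into (succeeded, failed).

    Hypothetical helper; it relies only on the documented "error" field,
    which is None when materialization succeeded.
    """
    ok = [r for r in results if r.get("error") is None]
    failed = [r for r in results if r.get("error") is not None]
    return ok, failed


# Hypothetical sample shaped like the documented return value; values are made up.
sample = [
    {"schema_id": "s1", "db_path": "/tmp/dbs/s1.db", "tables": ["a", "b"],
     "rows_per_table": {"a": 10, "b": 5}, "error": None},
    {"schema_id": "s2", "db_path": "/tmp/dbs/s2.db", "tables": [],
     "rows_per_table": {}, "error": "syntax error in DDL"},
]

ok, failed = split_results(sample)
total_rows = sum(sum(r["rows_per_table"].values()) for r in ok)
print(len(ok), len(failed), total_rows)  # 1 1 15
for r in failed:
    print(r["schema_id"], r["error"])  # s2 syntax error in DDL
```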

The function returns a list of dicts with the following fields:

  • schema_id: original schema ID from the dataset
  • db_path: absolute path to the created .db file
  • tables: list of table names found in the DDL
  • rows_per_table: dict mapping each table name to the number of rows inserted
  • error: error message if materialization failed, otherwise None

Loading from a local file

results = deserialize_sqale(
    file_path="./data/train.parquet",
    output_dir="./dbs",
)

Supported local formats: .parquet, .arrow, or a directory containing either.
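The rule above can be checked before calling deserialize_sqale, to fail early on a mistyped path. This is a hypothetical pre-flight helper written from the documented rule, not SQaLe's own validation logic; in particular, looking only at a directory's top level (not subdirectories) is an assumption.

```python
import tempfile
from pathlib import Path

SUPPORTED_SUFFIXES = {".parquet", ".arrow"}


def looks_like_supported_local_source(path: str) -> bool:
    """True for a .parquet/.arrow file, or a directory that directly contains one.

    Hypothetical helper; SQaLe's real checks may differ.
    """
    p = Path(path)
    if p.is_file():
        return p.suffix in SUPPORTED_SUFFIXES
    if p.is_dir():
        return any(
            child.is_file() and child.suffix in SUPPORTED_SUFFIXES
            for child in p.iterdir()
        )
    return False  # path does not exist locally


with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    data_dir.mkdir()
    (data_dir / "train.parquet").touch()  # empty placeholder: only the name matters here
    (Path(tmp) / "notes.txt").touch()
    checks = {
        "dir": looks_like_supported_local_source(str(data_dir)),
        "file": looks_like_supported_local_source(str(data_dir / "train.parquet")),
        "txt": looks_like_supported_local_source(str(Path(tmp) / "notes.txt")),
        "missing": looks_like_supported_local_source(str(Path(tmp) / "nope")),
    }
print(checks)  # {'dir': True, 'file': True, 'txt': False, 'missing': False}
```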

Requirements

  • Python ≥ 3.9
  • pandas, tqdm, pyarrow, datasets
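A short standard-library check can confirm an environment meets these requirements before running SQaLe. It only tests that the interpreter is 3.9 or newer and that each listed dependency is importable; it does not check package versions, since none are stated above. The helper name check_requirements is hypothetical.

```python
import importlib.util
import sys
from typing import Dict, Tuple

# Package names from the requirements list; each is also its import name.
REQUIRED_MODULES = ("pandas", "tqdm", "pyarrow", "datasets")


def check_requirements(modules: Tuple[str, ...] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Map each requirement to whether the current interpreter satisfies it."""
    status = {"python>=3.9": sys.version_info >= (3, 9)}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    for requirement, ok in check_requirements().items():
        print(f"{requirement:12} {'ok' if ok else 'MISSING'}")
```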

License

See LICENSE.


Download files


Source Distribution

sqale-0.1.5.tar.gz (8.2 kB)

Uploaded Source

Built Distribution


sqale-0.1.5-py3-none-any.whl (7.6 kB)

Uploaded Python 3

File details

Details for the file sqale-0.1.5.tar.gz.

File metadata

  • Download URL: sqale-0.1.5.tar.gz
  • Size: 8.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sqale-0.1.5.tar.gz
  • SHA256: c55ae16c2e053b4d4f5054f243e99cbb60d2e2c4e77de659eb9decae78850c35
  • MD5: 9f044bc9ef7bac653337f245edb6114a
  • BLAKE2b-256: 406c52c3eb0003a69974dda3e138468eeb2f5cf5dcca6291109b76d1fca8ef5e
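The published digests can be used to check a downloaded file before installing it. A minimal sketch with the standard library's hashlib, streaming the file in chunks so memory use stays flat; the helper names sha256_of and verify_download are hypothetical. pip's hash-checking mode (a --hash=sha256:<digest> option on a requirements-file line) can enforce the same check automatically.

```python
import hashlib
from pathlib import Path

# SHA256 published above for sqale-0.1.5.tar.gz.
EXPECTED_SHA256 = "c55ae16c2e053b4d4f5054f243e99cbb60d2e2c4e77de659eb9decae78850c35"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a file's digest with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("sqale-0.1.5.tar.gz")
    if archive.exists():
        print("hash OK" if verify_download(str(archive)) else "hash MISMATCH, do not install")
```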


Provenance

The following attestation bundles were made for sqale-0.1.5.tar.gz:

Publisher: publish.yml on trl-lab/SQaLe-Library

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqale-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: sqale-0.1.5-py3-none-any.whl
  • Size: 7.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sqale-0.1.5-py3-none-any.whl
  • SHA256: b2569bc66267b703af0e63640ecf7d219f9863c0e3057deac73d65e856830719
  • MD5: 8df024da2dd049b962b5f9e93f6c9b82
  • BLAKE2b-256: 1d825dc7ae3bffc0ac929df7180079a1e180c66180ad4ad4fc4974e0da3fb9fd


Provenance

The following attestation bundles were made for sqale-0.1.5-py3-none-any.whl:

Publisher: publish.yml on trl-lab/SQaLe-Library

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
