SQaLe

A Python utility for deserializing the SQaLe dataset into populated SQLite databases.

Each unique schema in the dataset is materialized as a .db file and populated with the synthetic row data stored alongside it — ready to use for SQL benchmarking, evaluation, or development.
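Because each output is an ordinary SQLite file, it can be inspected with Python's built-in sqlite3 module. The sketch below lists the user tables in one database and counts their rows. The helper name `list_tables_with_counts` is hypothetical and not part of SQaLe; it only assumes that the path points at a valid SQLite file.

```python
import sqlite3
from contextlib import closing


def list_tables_with_counts(db_path):
    """Return {table_name: row_count} for every user table in a SQLite file.

    Hypothetical helper for illustration; it is not part of the SQaLe package.
    """
    with closing(sqlite3.connect(db_path)) as conn:
        # sqlite_master lists all schema objects; skip SQLite's internal
        # tables, whose names start with the literal prefix "sqlite_".
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite\\_%' ESCAPE '\\' "
                "ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Identifiers cannot be bound as parameters, so quote them manually.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
        return counts
```

Calling it on any generated file, for example `list_tables_with_counts("./dbs/<name>.db")`, gives a quick check that the tables were populated; the file name here is a placeholder, since the naming scheme is not described in this README.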

Installation

pip install SQaLe

Usage

CLI

# Download and deserialize all schemas
sqale-extract --output ./dbs

# Limit to the first 100 unique schemas
sqale-extract --output ./dbs --limit 100

Python API

from sqale import deserialize_sqale

results = deserialize_sqale(
    file_path="trl-lab/SQaLe_2",  # HuggingFace repo ID or local path
    output_dir="./dbs",
    limit=100,  # optional
)

for r in results:
    print(r["db_path"], r["rows_per_table"])

The function returns a list of dicts with the following fields:

| Field | Description |
|---|---|
| schema_id | Original schema ID from the dataset |
| db_path | Absolute path to the created .db file |
| tables | List of table names found in the DDL |
| rows_per_table | Dict mapping table name → number of rows inserted |
| error | Error message if materialization failed, otherwise None |
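Since failures are reported through the error field, a caller will usually want to separate successes from failures before using the databases. The following is a minimal sketch that relies only on the fields listed above. The function name `summarize_results` and the sample values (IDs, paths, error text) are made up for illustration, and what a failed entry contains in its other fields is an assumption.

```python
def summarize_results(results):
    """Split result dicts into successes and failures and total the inserted rows.

    Hypothetical helper; it only reads the documented result fields.
    """
    succeeded = [r for r in results if r["error"] is None]
    failed = [r for r in results if r["error"] is not None]
    # Only successful entries are counted; .get() guards against a missing
    # rows_per_table, since the exact shape of each entry is an assumption.
    total_rows = sum(
        sum(r.get("rows_per_table", {}).values()) for r in succeeded
    )
    return {
        "succeeded": len(succeeded),
        "failed": {r["schema_id"]: r["error"] for r in failed},
        "total_rows": total_rows,
    }


# Made-up sample in the documented shape; real values come from deserialize_sqale().
sample = [
    {
        "schema_id": "s1",
        "db_path": "/tmp/dbs/s1.db",
        "tables": ["users", "orders"],
        "rows_per_table": {"users": 10, "orders": 25},
        "error": None,
    },
    {
        "schema_id": "s2",
        "db_path": "/tmp/dbs/s2.db",
        "tables": [],
        "rows_per_table": {},
        "error": "syntax error in DDL",
    },
]

summary = summarize_results(sample)
print(summary)
```

In real use, `results` from `deserialize_sqale(...)` would replace `sample`.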

Loading from a local file

from sqale import deserialize_sqale

results = deserialize_sqale(
    file_path="./data/train.parquet",  # local file instead of a HuggingFace repo ID
    output_dir="./dbs",
)

Supported local formats: .parquet, .arrow, or a directory containing either.
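A script that accepts user-supplied paths may want to check them against this rule before calling the library. The sketch below mirrors the stated rule only; it is not how SQaLe itself validates input. The helper name `looks_like_supported_source`, the case-insensitive suffix match, and the choice to inspect only the top level of a directory are all assumptions.

```python
from pathlib import Path

# Suffixes taken from the list of supported local formats above.
SUPPORTED_SUFFIXES = {".parquet", ".arrow"}


def looks_like_supported_source(path):
    """Return True if path is a .parquet/.arrow file or a directory holding one.

    Hypothetical pre-check; the library's own validation may differ
    (for example, it might search directories recursively).
    """
    p = Path(path)
    if p.is_file():
        return p.suffix.lower() in SUPPORTED_SUFFIXES
    if p.is_dir():
        # Only direct children are inspected in this sketch.
        return any(
            child.is_file() and child.suffix.lower() in SUPPORTED_SUFFIXES
            for child in p.iterdir()
        )
    # Nonexistent paths are rejected; HuggingFace repo IDs are out of scope here.
    return False
```

A repo ID such as "trl-lab/SQaLe_2" is not a local path, so this check applies only when the caller already knows the input is meant to be local.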

Requirements

  • Python ≥ 3.9
  • pandas, tqdm, pyarrow, datasets

License

See LICENSE.
