SQLite databases as Grain datasets.

🌾 SQL Grain

SQLite databases as Grain data sources.

sql-grain lets you prototype ML data pipelines using SQL queries before committing to a production data format. Define your training examples with expressive SQL—joins, window functions, filtering—and iterate quickly without preprocessing. When you're ready to scale, convert to ArrayRecord or similar formats; sql-grain is not designed for large-scale training.

from sqlgrain import Sqlite3DataSource
import grain

source = Sqlite3DataSource(
    "data.db",
    key_query="SELECT id FROM users",
    record_query="SELECT item FROM purchases WHERE user_id = :id ORDER BY timestamp",
)
# An explicit seed keeps the shuffle order reproducible.
dataset = grain.MapDataset.source(source).shuffle(seed=0).batch(32)
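
To make the roles of the two queries concrete, here is a minimal sketch that uses only the standard-library sqlite3 module. It assumes that the data source runs key_query once to enumerate examples and then runs record_query once per key, binding the key's columns to the named parameters (here :id). That behavior is inferred from the snippet above rather than confirmed, and the table schema and sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical schema and rows; only the two queries come from the example above.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE purchases (user_id INTEGER, item TEXT, timestamp INTEGER);
    INSERT INTO users (id) VALUES (1), (2);
    INSERT INTO purchases (user_id, item, timestamp) VALUES
        (1, 'tea', 20), (1, 'mug', 10), (2, 'pen', 5);
    """
)
conn.row_factory = sqlite3.Row  # rows can be read by column name

key_query = "SELECT id FROM users"
record_query = "SELECT item FROM purchases WHERE user_id = :id ORDER BY timestamp"

# Assumption: each key row identifies one training example.
keys = [dict(row) for row in conn.execute(key_query)]

# Assumption: the key row is bound to the named parameters of record_query.
examples = [
    [row["item"] for row in conn.execute(record_query, key)]
    for key in keys
]
print(examples)  # [['mug', 'tea'], ['pen']]
```

Under these assumptions, each example is the time-ordered purchase history of one user, and changing what an example looks like only requires editing the SQL.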

Converting to ArrayRecord

Once you're ready to run larger experiments, convert the dataset to ArrayRecord files with the to_array_record function, which serializes records in msgpack format with native support for NumPy arrays.

from sqlgrain import to_array_record

to_array_record(source, "output/", shard_every=1000)
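
The shard_every argument is not documented here. A plausible reading is that a new shard file is started after every shard_every records. The sketch below illustrates that reading in plain Python; iter_shards is a hypothetical helper and not part of sql-grain, and the actual file naming and writing logic may differ.

```python
from itertools import islice


def iter_shards(records, shard_every):
    """Yield lists of at most `shard_every` consecutive records."""
    it = iter(records)
    # islice pulls the next chunk; an empty chunk means the input is exhausted.
    while shard := list(islice(it, shard_every)):
        yield shard


# 2500 records with shard_every=1000 give two full shards and one partial shard.
shard_sizes = [len(shard) for shard in iter_shards(range(2500), shard_every=1000)]
print(shard_sizes)  # [1000, 1000, 500]
```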

Grain's ArrayRecordDataSource returns each record as raw bytes, so the data pipeline needs a decoding step. SQL Grain's decode_record takes care of that and mirrors the encoding used by to_array_record.

from sqlgrain import from_array_record, decode_record

ar_source, metadata = from_array_record("output/")
dataset = grain.MapDataset.source(ar_source).map(decode_record).batch(32)
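
The reason for the explicit map step is that a bytes-only store cannot know how records were encoded, so the decoder has to match the encoder. The sketch below shows the same round trip with the standard-library json module standing in for msgpack (so without NumPy support). The functions encode_record and decode_record_sketch are hypothetical stand-ins, not the sql-grain implementations.

```python
import json


def encode_record(record):
    # Stand-in encoder: JSON text as UTF-8 bytes instead of msgpack.
    return json.dumps(record).encode("utf-8")


def decode_record_sketch(raw):
    # Must invert encode_record exactly, just as the real decoder
    # has to match whatever the writer produced.
    return json.loads(raw.decode("utf-8"))


records = [{"items": ["mug", "tea"]}, {"items": ["pen"]}]

# What a bytes-only record store would hold.
stored = [encode_record(record) for record in records]

# The equivalent of `.map(decode_record)` in the pipeline above.
decoded = list(map(decode_record_sketch, stored))
assert decoded == records
```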

to_array_record is agnostic to the type of the records, and you can also serialize datasets, e.g., for pre-batching.
