🌾 SQL Grain

SQLite databases as Grain data sources.

sql-grain lets you prototype ML data pipelines using SQL queries before committing to a production data format. Define your training examples with expressive SQL—joins, window functions, filtering—and iterate quickly without preprocessing. When you're ready to scale, convert to ArrayRecord or similar formats; sql-grain is not designed for large-scale training.

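from sqlgrain import Sqlite3DataSource
import grain

source = Sqlite3DataSource(
    "data.db",
    # Selects the keys that identify the training examples.
    key_query="SELECT id FROM users",
    # Fetches the record for one key; :id is bound to that key's id column.
    record_query="SELECT item FROM purchases WHERE user_id = :id ORDER BY timestamp",
)
# An explicit seed keeps the shuffle reproducible.
dataset = grain.MapDataset.source(source).shuffle(seed=0).batch(32)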
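
To make the two queries concrete, the following stdlib-only sketch mimics the pattern the snippet above suggests: key_query lists one key per training example, and record_query runs once per key with the key's columns bound as named parameters. This is an assumption about the semantics drawn from the example, not sql-grain's actual implementation; the table contents and the helper fetch_example are hypothetical.

```python
import sqlite3

# Hypothetical toy database using the table names from the queries above.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be converted to dicts by column name
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE purchases (user_id INTEGER, item TEXT, timestamp INTEGER);
    INSERT INTO users (id) VALUES (1), (2);
    INSERT INTO purchases VALUES (1, 'tea', 20), (1, 'mug', 10), (2, 'pen', 5);
    """
)

KEY_QUERY = "SELECT id FROM users"
RECORD_QUERY = "SELECT item FROM purchases WHERE user_id = :id ORDER BY timestamp"

# One key per example; a data source built this way would have len(keys) items.
keys = [dict(row) for row in conn.execute(KEY_QUERY)]


def fetch_example(index):
    """Return the items for the index-th key (assumed per-key lookup)."""
    rows = conn.execute(RECORD_QUERY, keys[index]).fetchall()
    return [row["item"] for row in rows]


examples = [fetch_example(i) for i in range(len(keys))]
print(examples)
```

Because each example is fetched by integer index, a source of this shape supports the random access that shuffling relies on.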

Converting to ArrayRecord

Once you're ready to run larger experiments, convert the dataset to ArrayRecord files with the to_array_record function, which serializes records in msgpack format with native support for NumPy arrays.

from sqlgrain import to_array_record

to_array_record(source, "output/", shard_every=1000)

Grain's ArrayRecordDataSource returns raw bytes, so the data pipeline needs a decoding step. SQL Grain's decode_record handles this and matches the serialization used by to_array_record.

from sqlgrain import from_array_record, decode_record

ar_source, metadata = from_array_record("output/")
dataset = grain.MapDataset.source(ar_source).map(decode_record).batch(32)
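
The need for an explicit decode step can be illustrated without Grain: a byte-oriented record store hands back bytes, and a decoder mapped over those bytes restores the original objects. The sketch below uses JSON from the standard library as a stand-in for msgpack, and the names encode, decode, and stored are hypothetical. It only mirrors the shape of .map(decode_record), not SQL Grain's actual encoding.

```python
import json

records = [{"user": 1, "items": ["mug", "tea"]}, {"user": 2, "items": ["pen"]}]


def encode(record):
    """Serialize one record to bytes (JSON stands in for msgpack here)."""
    return json.dumps(record).encode("utf-8")


def decode(raw):
    """Inverse of encode: turn bytes back into a Python object."""
    return json.loads(raw.decode("utf-8"))


# What a byte-oriented store would return when read back.
stored = [encode(r) for r in records]

# Mapping the decoder over the raw bytes recovers the records.
decoded = list(map(decode, stored))
```

The decoder has to be the exact inverse of the encoder used at write time, which is why the two functions are provided as a pair.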

to_array_record is agnostic to the record type, so you can also serialize Grain datasets rather than only data sources, e.g., to store pre-batched examples.
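
Pre-batching here means grouping examples into batches once, before serialization, so that each stored record is already a batch. A minimal stdlib sketch of that grouping, assuming the final partial batch is kept; the helper prebatch is hypothetical and not part of sql-grain.

```python
from itertools import islice


def prebatch(records, batch_size):
    """Yield lists of up to batch_size consecutive records."""
    iterator = iter(records)
    # islice returns an empty list once the iterator is exhausted, ending the loop.
    while batch := list(islice(iterator, batch_size)):
        yield batch


batches = list(prebatch(range(7), 3))
```

In a Grain pipeline the batched dataset would presumably be built with .batch(...) and then handed to to_array_record, so the batching cost is paid once at conversion time rather than on every epoch.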

Release files

Source distribution: sql_grain-0.2.2.tar.gz (8.6 kB)

  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7
  • SHA256: 34bdab449fc06ea4725d8ce1afcf40157d2616e04f6aefaa7f7ad4c38720bec6
  • MD5: be60ed778fa63571059747744728e5bb
  • BLAKE2b-256: c779a39e5bfefc197ce7d6abeb2f7ebd5241514df255d1c651e8910245502581
  • Provenance: attestation bundle published by ci.yml on tillahoffmann/sql-grain

Built distribution: sql_grain-0.2.2-py3-none-any.whl (7.7 kB)

  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7
  • SHA256: 67b91cda6fa5bb81f8794675aa9eaa6762971d644a5c44e0aa794829d6c99332
  • MD5: 9aee6639ff5ba70c0ee169777f9b12ec
  • BLAKE2b-256: ff42b5dacaab4a91ba7112a60bfcf05d24a8db63f217fb7a3f3f860e6d5fdc27
  • Provenance: attestation bundle published by ci.yml on tillahoffmann/sql-grain