SQLite databases as Grain datasets.

🌾 SQL Grain

SQLite databases as Grain data sources.

sql-grain lets you prototype ML data pipelines using SQL queries before committing to a production data format. Define your training examples with expressive SQL—joins, window functions, filtering—and iterate quickly without preprocessing. When you're ready to scale, convert to ArrayRecord or similar formats; sql-grain is not designed for large-scale training.

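from sqlgrain import Sqlite3DataSource
import grain

source = Sqlite3DataSource(
    "data.db",
    # One key per training example: here, each user id.
    key_query="SELECT id FROM users",
    # Rows that make up one example; :id is filled in from the key.
    record_query="SELECT item FROM purchases WHERE user_id = :id ORDER BY timestamp",
)
# Shuffle the examples and group them into batches of 32.
dataset = grain.MapDataset.source(source).shuffle().batch(32)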
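
To make the key/record split concrete, here is a minimal sketch that runs the same two queries with Python's built-in sqlite3 module. The table schemas, the sample rows, and the fetch_record helper are assumptions made for illustration; this shows what the queries select, not how Sqlite3DataSource is implemented internally.

```python
import sqlite3

# Hypothetical schema and rows, chosen only to match the queries above.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE purchases (user_id INTEGER, item TEXT, timestamp INTEGER);
    INSERT INTO users (id) VALUES (1), (2);
    INSERT INTO purchases (user_id, item, timestamp) VALUES
        (1, 'tea', 20), (1, 'mug', 10), (2, 'pen', 5);
    """
)

key_query = "SELECT id FROM users"
record_query = (
    "SELECT item FROM purchases WHERE user_id = :id ORDER BY timestamp"
)

# The key query yields one key per training example.
keys = [row[0] for row in conn.execute(key_query)]


def fetch_record(key):
    """Bind the key to the named parameter :id and collect the matching rows."""
    rows = conn.execute(record_query, {"id": key}).fetchall()
    return [item for (item,) in rows]


# Each example is the time-ordered list of items bought by one user.
records = {key: fetch_record(key) for key in keys}
```

Under these assumptions, user 1 maps to the items ordered by timestamp and user 2 to a single item, which is the shape of example the record query describes.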

Converting to ArrayRecord

Once you're ready to run larger experiments, convert the dataset to ArrayRecord files with the to_array_record function, which serializes records in msgpack format with native support for NumPy arrays.

from sqlgrain import to_array_record

to_array_record(source, "output/", shard_every=1000)
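
The shard_every argument is not described further here. Assuming it means "start a new shard after this many records", the grouping can be sketched with the standard library as follows. The iter_shards helper is hypothetical and not part of sqlgrain, and the sketch does not write any files.

```python
from itertools import islice


def iter_shards(records, shard_every):
    """Yield lists of at most `shard_every` consecutive records."""
    it = iter(records)
    while True:
        shard = list(islice(it, shard_every))
        if not shard:
            return
        yield shard


# 2500 stand-in records split into shards of up to 1000 records each.
shards = list(iter_shards(range(2500), shard_every=1000))
shard_sizes = [len(shard) for shard in shards]
```

With this reading, the last shard simply holds whatever records remain.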

Grain's ArrayRecordDataSource returns each record as raw bytes, so the data pipeline needs a decoding step. SQL Grain's decode_record takes care of that and mirrors the encoding used by to_array_record.

from sqlgrain import from_array_record, decode_record
import grain

ar_source, metadata = from_array_record("output/")
dataset = grain.MapDataset.source(ar_source).map(decode_record).batch(32)
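
The reason for the extra map step is that a byte-oriented source knows nothing about the encoding, so the decoder has to mirror the encoder exactly. The sketch below illustrates that pairing with JSON as a stand-in for msgpack so that it needs only the standard library. The function names encode_record_sketch and decode_record_sketch are hypothetical, and this is not sqlgrain's actual wire format.

```python
import json


def encode_record_sketch(record):
    # Stand-in encoder: JSON rather than msgpack, to avoid extra dependencies.
    return json.dumps(record).encode("utf-8")


def decode_record_sketch(raw):
    # Mirror of the encoder: bytes -> str -> Python object.
    return json.loads(raw.decode("utf-8"))


# What a byte-oriented data source would hand back: one bytes object per record.
stored = [encode_record_sketch(r) for r in (["mug", "tea"], ["pen"])]

# The decoding step, applied per record just like .map(decode_record) above.
decoded = list(map(decode_record_sketch, stored))
```

Swapping in a different decoder than the one matching the writer would fail or return garbage, which is why decode_record is provided alongside to_array_record.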

to_array_record is agnostic to the type of the records, so you can also serialize datasets rather than data sources, e.g., to store pre-batched examples.
