Efficient, accessible preprocessing routines for pretrain, SFT, and DPO training data preparation from the ALEA Institute.

Project description

alea-preprocess

Description

Efficient, accessible preprocessing routines for preparing pretraining, supervised fine-tuning (SFT), and direct preference optimization (DPO) training data.

This library is part of ALEA's open-source large language model training pipeline, which is used in the research and development of the KL3M project.

Installation

This project is a work in progress and relies on compiled Rust code. Until a stable release is available, we recommend installing the package from the GitHub source.

To build the Rust extension and install a development version into the project's Poetry environment, run the following command from the root of a source checkout:

poetry run maturin develop

Installation via PyPI will be available once a stable release is published.
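
After a development build, a quick sanity check is to confirm that the module can be located by the interpreter. The following is a minimal sketch, not part of the project itself. It assumes the import name is alea_preprocess, inferred from the distribution file names; the helper is_installed is hypothetical and written only for this illustration.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: the import name matches the distribution name.
    # Adjust it if the project exposes a different top-level module.
    name = "alea_preprocess"
    print(f"{name} importable: {is_installed(name)}")
```

Run it inside the same environment that maturin installed into, for example by saving it to a file and invoking it with poetry run python.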

Examples

Example use cases are currently available under the tests/ directory.

Additional documentation and examples will be provided in the future.

License

This ALEA project is released under the MIT License. See the LICENSE file for details.

Support

If you encounter any issues or have questions about using this ALEA project, please open an issue on GitHub.

Learn More

To learn more about ALEA and its software and research projects like KL3M, visit the ALEA website.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

alea_preprocess-0.1.3.tar.gz (78.1 kB view details)

Uploaded Source

Built Distribution

alea_preprocess-0.1.3-cp312-cp312-manylinux_2_34_x86_64.whl (9.1 MB view details)

Uploaded CPython 3.12 manylinux: glibc 2.34+ x86-64

File details

Details for the file alea_preprocess-0.1.3.tar.gz.

File metadata

  • Download URL: alea_preprocess-0.1.3.tar.gz
  • Upload date:
  • Size: 78.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.7.0

File hashes

Hashes for alea_preprocess-0.1.3.tar.gz
Algorithm Hash digest
SHA256 ab4d563d99084fdbd61e80dc13388005323654692b7c05e7f5ece7d842570840
MD5 88d07542af273a43bc0888b30b5d2628
BLAKE2b-256 7ffb40be1b069d7e8483dee3c4acdccec4c9d9693298a96858cd0cc881b52431

See more details on using hashes here.
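
The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using only the Python standard library. The expected value is the SHA256 digest listed above for the source distribution. The helper names sha256_of and matches_digest are hypothetical, and the file is assumed to be in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for alea_preprocess-0.1.3.tar.gz.
EXPECTED_SHA256 = "ab4d563d99084fdbd61e80dc13388005323654692b7c05e7f5ece7d842570840"


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex: str) -> bool:
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumption: the archive was downloaded into the current directory.
    archive = Path("alea_preprocess-0.1.3.tar.gz")
    if archive.exists():
        print("digest matches:", matches_digest(archive, EXPECTED_SHA256))
    else:
        print(f"{archive} not found; download it first")
```

The same check applies to the wheel by substituting its file name and its SHA256 digest.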

File details

Details for the file alea_preprocess-0.1.3-cp312-cp312-manylinux_2_34_x86_64.whl.

File hashes

Hashes for alea_preprocess-0.1.3-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 18d4583532a68d415327a6aaef21b2041a4c793f62bf995824b5d2d4b746f4c6
MD5 b95e52c997f00cc3c7b089c4689eaaa7
BLAKE2b-256 9be0b1e7865986e462910e05dc2938fe64382ea812248194e25f1081e75a44e6
