Efficient, accessible preprocessing routines for pretrain, SFT, and DPO training data preparation from the ALEA Institute.

alea-preprocess

Description

Efficient, accessible preprocessing routines for pretrain, SFT, and DPO training data preparation.

This library is part of ALEA's open source large language model training pipeline, used in the research and development of the KL3M project.

Installation

Note that this project is a work in progress and relies on compiled Rust code. Until a stable release is available, installing the package from the GitHub source is recommended.

Alternatively, you can install the latest release from PyPI using pip:

pip install alea-preprocess

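To build and install a development version from a source checkout, run the following command from the repository root. It uses maturin to compile the Rust extension and install it into the project's Poetry environment, so a Rust toolchain must be available: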

poetry run maturin develop
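
After either installation route, it can be useful to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch using only the Python standard library. It assumes the distribution name is alea-preprocess, as in the pip command above; the helper name installed_version is hypothetical, and the importable module name is not stated here, so the sketch queries package metadata rather than importing anything.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "alea-preprocess" is the distribution name used with pip above.
    found = installed_version("alea-preprocess")
    if found is None:
        print("alea-preprocess is not installed in this environment")
    else:
        print(f"alea-preprocess {found} is installed")
```

For a development build, run the check inside the same environment, for example through poetry run python.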

Examples

Example use cases are currently available under the tests/ directory.

Additional documentation and examples will be provided in the future.

License

This ALEA project is released under the MIT License. See the LICENSE file for details.

Support

If you encounter any issues or have questions about using this ALEA project, please open an issue on GitHub.

Learn More

To learn more about ALEA and its software and research projects like KL3M, visit the ALEA website.

Download files

Download the file for your platform.

Source Distribution

alea_preprocess-0.1.9.tar.gz (82.3 kB)

Built Distribution

alea_preprocess-0.1.9-cp312-cp312-manylinux_2_34_x86_64.whl (9.2 MB)

CPython 3.12, manylinux (glibc 2.34+), x86-64

File details

Details for the file alea_preprocess-0.1.9.tar.gz.

File metadata

  • Download URL: alea_preprocess-0.1.9.tar.gz
  • Size: 82.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.7.4

File hashes

Hashes for alea_preprocess-0.1.9.tar.gz

Algorithm    Hash digest
SHA256       dc10ef2e7a4fab606d59fca2ab627133034d1dde570e833d1a4e28dcfbd47808
MD5          5d7d19832f5358407d8656e32ad597e8
BLAKE2b-256  aa3a291b2f2b9cf6b1f61c728cd4b437b9b9432c82377c7383cdd5723134efe7
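
A downloaded file can be checked against the SHA256 digest listed above before it is installed. The following is a minimal sketch using Python's standard hashlib module. The helper names sha256_of_file and matches_expected are hypothetical, and the local path assumes the source distribution was saved to the current directory.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())


if __name__ == "__main__":
    # Assumed location: the sdist saved in the current working directory.
    sdist = Path("alea_preprocess-0.1.9.tar.gz")
    expected = "dc10ef2e7a4fab606d59fca2ab627133034d1dde570e833d1a4e28dcfbd47808"
    if sdist.exists():
        print("OK" if matches_expected(sdist, expected) else "MISMATCH")
    else:
        print(f"{sdist} not found; download it first")
```

The same helper works for the built distribution by substituting the wheel's file name and its SHA256 digest.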

File details

Details for the file alea_preprocess-0.1.9-cp312-cp312-manylinux_2_34_x86_64.whl.

File hashes

Hashes for alea_preprocess-0.1.9-cp312-cp312-manylinux_2_34_x86_64.whl

Algorithm    Hash digest
SHA256       1a4d10b3b91c21b255cde27875f1c8bd5e0a74a25a93ae231fea05f7237364cf
MD5          15b3c6d93745b1ce56394dbd4297a81f
BLAKE2b-256  4e79da69085c8053f458478b9abedd55e9c7ece348ff50ab31f14e4686c0c5f5
