Efficient, accessible preprocessing routines for pretrain, SFT, and DPO training data preparation from the ALEA Institute.

alea-preprocess

Description

This library is part of ALEA's open source large language model training pipeline, used in the research and development of the KL3M project.

Installation

Note that this project is a work in progress and relies on compiled Rust code. Until a stable release is available, installing the package from the GitHub source is recommended.

Alternatively, you can install the latest published release from PyPI using pip:

pip install alea-preprocess
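
After installing, one way to confirm which release pip actually resolved is to query the installed distribution metadata. The following is a minimal sketch using only the Python standard library; the helper name installed_version is illustrative and is not part of alea-preprocess.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "alea-preprocess") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    installed_version is a hypothetical helper for this example only.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("alea-preprocess is not installed in this environment")
    else:
        print(f"alea-preprocess {found} is installed")
```

Looking up the distribution metadata avoids importing the compiled extension, so the check works even if the native module fails to load.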

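To build and install a development version from a source checkout, run the following command in the project directory. This compiles the Rust extension, so a Rust toolchain must be available: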

poetry run maturin develop

Examples

Example use cases are currently available under the tests/ directory.

Additional documentation and examples will be provided in the future.

License

This ALEA project is released under the MIT License. See the LICENSE file for details.

Support

If you encounter any issues or have questions about using this ALEA project, please open an issue on GitHub.

Learn More

To learn more about ALEA and its software and research projects like KL3M, visit the ALEA website.

Download files

Source Distribution

alea_preprocess-0.1.11.tar.gz (82.8 kB), Source

Built Distribution

alea_preprocess-0.1.11-cp312-cp312-manylinux_2_34_x86_64.whl (9.2 MB), CPython 3.12, manylinux: glibc 2.34+, x86-64
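
The only built distribution listed above targets CPython 3.12 on x86-64 Linux with glibc 2.34 or newer. On any other environment pip would presumably fall back to the source distribution, which needs a Rust toolchain to compile. The sketch below is a rough approximation of that compatibility check using the standard library; wheel_matches is a hypothetical helper, and pip's real tag resolution is more detailed than this.

```python
import platform
import sys


def wheel_matches(implementation, py_version, system, machine, libc_name, libc_version):
    """Roughly check an environment against the cp312 / manylinux_2_34 / x86_64 tags.

    Hypothetical helper for illustration; pip's own tag matching is authoritative.
    """
    if implementation != "CPython":
        return False
    if tuple(py_version[:2]) != (3, 12):
        return False
    if system != "Linux" or machine != "x86_64":
        return False
    if libc_name != "glibc":
        return False
    try:
        # Compare only the major and minor components, e.g. "2.35" -> (2, 35).
        major, minor = (int(part) for part in libc_version.split(".")[:2])
    except ValueError:
        return False
    return (major, minor) >= (2, 34)


if __name__ == "__main__":
    libc_name, libc_version = platform.libc_ver()
    compatible = wheel_matches(
        platform.python_implementation(),
        sys.version_info,
        platform.system(),
        platform.machine(),
        libc_name,
        libc_version,
    )
    if compatible:
        print("The prebuilt wheel looks compatible with this environment")
    else:
        print("Expect a source build, which requires a Rust toolchain")
```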

File details

Details for the file alea_preprocess-0.1.11.tar.gz.

File metadata

  • Size: 82.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.7.4

File hashes

Hashes for alea_preprocess-0.1.11.tar.gz
SHA256: ed0fb04e2156d9c38f2f07da1a30ead36ce9a9df524079e56228d0709c173493
MD5: 6daffa1229abc8c1d9cad376f0888101
BLAKE2b-256: fa183ab1998cb28da3ece9990e1304b7fe1cde5ca5fe1c5e195bf1ca5b777b75
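
A downloaded file can be checked against the SHA256 digest listed above before installing it. The following is a minimal sketch using Python's hashlib; the function names sha256_of_file and matches_expected are illustrative, and the local file path is whatever location the archive was saved to.

```python
import hashlib

# SHA256 digest listed above for alea_preprocess-0.1.11.tar.gz.
EXPECTED_SHA256 = "ed0fb04e2156d9c38f2f07da1a30ead36ce9a9df524079e56228d0709c173493"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, matches_expected("alea_preprocess-0.1.11.tar.gz", EXPECTED_SHA256) should return True for an unmodified copy of the source distribution saved in the current directory.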

File details

Details for the file alea_preprocess-0.1.11-cp312-cp312-manylinux_2_34_x86_64.whl.

File hashes

Hashes for alea_preprocess-0.1.11-cp312-cp312-manylinux_2_34_x86_64.whl
SHA256: 52ca9d3a1ba7c8b69e57ef64c72226bd44fa535674be1efd47a0d17e60c968e4
MD5: 6c9514ae3b5b5a8de0966cbc8f98e622
BLAKE2b-256: 89c8d6a7193f50ba27e2bef30e2189f1d8ef281d5ea2f332e673347fa45f438c
