Efficient, accessible preprocessing routines for pretrain, SFT, and DPO training data preparation from the ALEA Institute.
alea-preprocess
Description
Efficient, accessible preprocessing routines for pretrain, SFT, and DPO training data preparation.
This library is part of ALEA's open source large language model training pipeline, used in the research and development of the KL3M project.
Installation
Note that this project is a work in progress and relies on compiled Rust code. Until a stable release is available, installing the package from the GitHub source is recommended.
Alternatively, you can install the latest published release from PyPI using pip:
pip install alea-preprocess
To build and install a development version of the package, run the following command from a checkout of the repository:
poetry run maturin develop
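After either install route, one quick way to confirm the result is to ask Python's packaging metadata which version is present. The following is a minimal sketch using only the standard library. The distribution name `alea-preprocess` is taken from the pip command above; `installed_version` is a hypothetical helper name used for illustration and is not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "alea-preprocess") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    `installed_version` is an illustrative helper, not an alea-preprocess API.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("alea-preprocess is not installed in this environment")
    else:
        print(f"alea-preprocess {found} is installed")
```

Because the check reads metadata rather than importing the compiled module, it reports whether the package is installed without loading the Rust extension.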
Examples
Example use cases are currently available under the tests/ directory.
Additional documentation and examples will be provided in the future.
License
This ALEA project is released under the MIT License. See the LICENSE file for details.
Support
If you encounter any issues or have questions about using this ALEA project, please open an issue on GitHub.
Learn More
To learn more about ALEA and its software and research projects like KL3M, visit the ALEA website.
Download files
File details
Details for the file alea_preprocess-0.1.6.tar.gz.
File metadata
- Download URL: alea_preprocess-0.1.6.tar.gz
- Upload date:
- Size: 81.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.7.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 758021073cbdf7e41d829cbbea2cd1d3edfeaea1ce2606c9513b77a31ff9b074
MD5 | ee5edcd00be9fa1c15e5de61b58bb04f
BLAKE2b-256 | a1e423cb1d8774a8ff8fd5b5406850e916382131d4f85cf8bdf37b4611aa641a
File details
Details for the file alea_preprocess-0.1.6-cp312-cp312-manylinux_2_34_x86_64.whl.
File metadata
- Download URL: alea_preprocess-0.1.6-cp312-cp312-manylinux_2_34_x86_64.whl
- Upload date:
- Size: 9.2 MB
- Tags: CPython 3.12, manylinux: glibc 2.34+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.7.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | d4d53d42fabbb985f0debf01f85392dd30faa6f22a81a50b00f191fe33f0347e
MD5 | 18149ebed2496225d80a9248c5c548e1
BLAKE2b-256 | 0e249456f2a89ecf50f3db156eccb98f10d172b751c99021b3cdb564a2b70a48
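The SHA256 digests listed above can be used to check a manually downloaded file before installing it. The following is a minimal sketch using Python's standard `hashlib` module. The file names and digests are copied from the file details above; the helper names `sha256_of` and `matches_published_hash` are hypothetical and chosen only for illustration.

```python
import hashlib
import sys
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables for release 0.1.6.
EXPECTED_SHA256 = {
    "alea_preprocess-0.1.6.tar.gz":
        "758021073cbdf7e41d829cbbea2cd1d3edfeaea1ce2606c9513b77a31ff9b074",
    "alea_preprocess-0.1.6-cp312-cp312-manylinux_2_34_x86_64.whl":
        "d4d53d42fabbb985f0debf01f85392dd30faa6f22a81a50b00f191fe33f0347e",
}


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path) -> bool:
    """Return True only if the file name is known and its digest matches."""
    expected = EXPECTED_SHA256.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected


if __name__ == "__main__":
    # Usage: python verify_hash.py <downloaded file> [<downloaded file> ...]
    for arg in sys.argv[1:]:
        status = "OK" if matches_published_hash(arg) else "MISMATCH or unknown file"
        print(f"{arg}: {status}")
```

Reading in chunks keeps memory use flat for the 9.2 MB wheel, and files with unrecognized names are treated as failures rather than silently accepted.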