Efficient, accessible preprocessing routines for pretrain, SFT, and DPO training data preparation from the ALEA Institute.

Project description

alea-preprocess

Description

Efficient, accessible preprocessing routines for pretrain, SFT, and DPO training data preparation.

This library is part of ALEA's open source large language model training pipeline, used in the research and development of the KL3M project.

Installation

This project is a work in progress and relies on compiled Rust code. Until a stable release is available, installing from the GitHub source is recommended.

The latest published release can also be installed from PyPI using pip:

pip install alea-preprocess

To build and install a development version, run the following command from a checkout of the source repository (this requires Poetry and a Rust toolchain):

poetry run maturin develop
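
To confirm which installation ended up in the active environment, the installed distribution version can be queried with the standard library. This is a minimal sketch rather than part of the package: the helper name `installed_version` is hypothetical, and the only project-specific value is the distribution name `alea-preprocess` taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "alea-preprocess" is the distribution name used in the pip command above.
    found = installed_version("alea-preprocess")
    if found is None:
        print("alea-preprocess is not installed in this environment")
    else:
        print("alea-preprocess " + found)
```

The lookup reads package metadata only, so it works without importing the compiled extension module.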

Examples

Example use cases are currently available under the tests/ directory.

Additional documentation and examples will be provided in the future.

License

This ALEA project is released under the MIT License. See the LICENSE file for details.

Support

If you encounter any issues or have questions about using this ALEA project, please open an issue on GitHub.

Learn More

To learn more about ALEA and its software and research projects like KL3M, visit the ALEA website.

Download files

Download the file for your platform.

Source Distribution

alea_preprocess-0.1.8.tar.gz (82.0 kB)

Uploaded Source

Built Distribution

alea_preprocess-0.1.8-cp312-cp312-manylinux_2_34_x86_64.whl (9.2 MB)

Uploaded CPython 3.12 manylinux: glibc 2.34+ x86-64
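
The wheel filename above encodes which interpreters and platforms the built distribution targets. The sketch below splits a wheel filename into its tags and performs a rough check of the Python tag against the running interpreter. The helper names `parse_wheel_filename` and `python_tag_matches` are hypothetical. The check covers only the CPython version and says nothing about glibc version or CPU architecture, which pip evaluates separately.

```python
import sys


def parse_wheel_filename(filename: str) -> dict:
    """Split a wheel filename into its dash-separated components.

    Expected layout: name-version[-build]-python_tag-abi_tag-platform_tag.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected number of components: " + filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "distribution": parts[0],
        "version": parts[1],
        "build": parts[2] if len(parts) == 6 else None,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


def python_tag_matches(python_tag: str) -> bool:
    """Rough check: does a tag such as 'cp312' match the running interpreter version?"""
    current = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    return python_tag == current


if __name__ == "__main__":
    info = parse_wheel_filename(
        "alea_preprocess-0.1.8-cp312-cp312-manylinux_2_34_x86_64.whl"
    )
    print(info)
    print("Python tag matches this interpreter:", python_tag_matches(info["python_tag"]))
```

When no built distribution matches the environment, pip falls back to the source distribution, which has to be compiled locally.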

File details

Details for the file alea_preprocess-0.1.8.tar.gz.

File metadata

  • Download URL: alea_preprocess-0.1.8.tar.gz
  • Size: 82.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.7.4

File hashes

Hashes for alea_preprocess-0.1.8.tar.gz
Algorithm    Hash digest
SHA256       c8583f36d6a1c17810fca2788f34d017a91b1bc52c401af8712b385b2d5416f8
MD5          6fb70233727b819036a253fc0a4886ab
BLAKE2b-256  a40196982afac61128f6962125b83895eaf526cb646cce89500a80916726d5f4

File details

Details for the file alea_preprocess-0.1.8-cp312-cp312-manylinux_2_34_x86_64.whl.

File hashes

Hashes for alea_preprocess-0.1.8-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm    Hash digest
SHA256       06f464e876b0d42b0be8ba5e46192b542d2ac49672da95b722d835073c89c45b
MD5          dd03c6306e6ace04dbe127fa00d25e23
BLAKE2b-256  6029e1e950cbee8795f20140e6eb60a93c39be02c6a7d0df77b8fefb5d8d0b3a
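
A manually downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using only the standard library. The helper names `sha256_of` and `verify` are hypothetical, and the sketch assumes the downloaded file keeps its original filename.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file listings above.
EXPECTED_SHA256 = {
    "alea_preprocess-0.1.8.tar.gz":
        "c8583f36d6a1c17810fca2788f34d017a91b1bc52c401af8712b385b2d5416f8",
    "alea_preprocess-0.1.8-cp312-cp312-manylinux_2_34_x86_64.whl":
        "06f464e876b0d42b0be8ba5e46192b542d2ac49672da95b722d835073c89c45b",
}


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path) -> bool:
    """Return True if the file's digest matches the listed value for its filename.

    Raises KeyError when the filename is not one of the files listed above.
    """
    path = Path(path)
    return sha256_of(path) == EXPECTED_SHA256[path.name]


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, "OK" if verify(name) else "MISMATCH")
```

pip can enforce the same check at install time through hash-pinned requirements files and its `--require-hashes` option.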
