Data filters
dolma
Data to feed OLMo's Appetite
Data and tools for generating and inspecting OLMo pre-training data.
Setup
Install Rust
curl https://sh.rustup.rs -sSf | sh
Install CMake
- On macOS, with Homebrew:
brew install cmake
- On Linux (Debian/Ubuntu), with apt:
apt-get install cmake
Install OpenSSL
- On macOS, with Homebrew:
brew install openssl re2
- On Linux (Debian/Ubuntu), with apt:
apt-get install openssl
Install Protobuf
- On macOS, with Homebrew:
brew install protobuf
- On Linux (Debian/Ubuntu), with apt:
apt-get install protobuf-compiler
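Before moving on to the Python setup, it can help to confirm that the system prerequisites above are actually reachable from your shell. The following is a minimal sketch, not part of dolma itself: the helper name `missing_tools` is hypothetical, and the executable names (`cargo` for Rust, `cmake`, `openssl`, `protoc` for Protobuf) are assumptions about what each install step places on your PATH.

```python
import shutil

# Assumed executable names for each prerequisite (not taken from the dolma docs):
# cargo (Rust toolchain), cmake, openssl, protoc (Protobuf compiler).
REQUIRED_TOOLS = ["cargo", "cmake", "openssl", "protoc"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All prerequisite tools were found on PATH.")
```

If `cargo` is reported missing right after running the rustup installer, opening a new shell usually makes it visible, since the installer updates PATH for future sessions.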
Setting up Python
conda create -n dolma python=3.10
conda activate dolma
Install Maturin
pip install maturin
maturin develop
Installing this repository
cd dolma
pip install -e .
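Once the install steps above have finished, a quick check from Python can confirm that the package is visible in the active environment. This is a minimal sketch under one assumption: that the importable module is named `dolma`, matching the package name. The helper `is_installed` is hypothetical and only looks the module up; it does not import or run it.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "dolma" as the import name is an assumption based on the package name.
    if is_installed("dolma"):
        print("dolma is available in this environment.")
    else:
        print("dolma was not found; check that the right environment is active.")
```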
Citation
If you use this repository, please cite it as:
@software{dolma,
    author = {Soldaini, Luca and Lo, Kyle and Kinney, Rodney and Naik, Aakanksha and Ravichander, Abhilasha and Bhagia, Akshita and Groeneveld, Dirk and Schwenk, Dustin and Magnusson, Ian and Chandu, Khyathi},
    license = {{Apache-2.0}},
    title = {{DOLMa}},
    url = {https://github.com/allenai/dolma}
}