Data filters

dolma

Data to feed OLMo's Appetite

Data and tools for generating and inspecting OLMo pre-training data.

Setup

Install Rust

curl https://sh.rustup.rs -sSf | sh

Install CMake

  • On macOS: brew install cmake
  • On Linux: apt-get install cmake

Install OpenSSL

  • On macOS: brew install openssl re2
  • On Linux: apt-get install openssl

Install Protobuf

  • On macOS: brew install protobuf
  • On Linux: apt-get install protobuf-compiler
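
Before moving on to the Python setup, it can help to confirm that the system prerequisites are actually on your PATH. The following is a minimal sketch, not part of the dolma repository: check_tools is a hypothetical helper, and the command names (cargo for Rust, cmake, openssl, protoc for Protobuf) are assumed from the packages installed above.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given commands are missing from PATH.
# Prints one "missing: NAME" line per absent tool and returns non-zero if any is absent.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Command names assumed from the steps above:
# cargo (Rust), cmake (CMake), openssl (OpenSSL), protoc (Protobuf).
check_tools cargo cmake openssl protoc || echo "Install the missing tools, then re-run this check."
```

If nothing is printed, all four commands were found.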

Setting up Python

conda create -n dolma python=3.10
conda activate dolma

Install Maturin

pip install maturin
maturin develop

Installing this repository

cd dolma
pip install -e .
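
Once the editable install finishes, a quick import check can confirm that the package is visible to the interpreter in your environment. This is only a sketch: can_import is a hypothetical helper, the PYTHON variable is an illustrative override, and the importable module name dolma is assumed from the package name.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the named Python module can be imported.
# Uses the interpreter in $PYTHON, defaulting to "python" from the active environment.
can_import() {
    "${PYTHON:-python}" -c "import $1" >/dev/null 2>&1
}

# Assumption: the package is importable under the module name "dolma".
if can_import dolma; then
    echo "dolma is importable"
else
    echo "dolma is not importable; check that the dolma environment is active"
fi
```

If the check fails, make sure the commands above were run inside the environment created during the Python setup.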

Citation

If you use this repository, please cite it as:

@software{dolma,
    author = {Soldaini, Luca and Lo, Kyle and Kinney, Rodney and Naik, Aakanksha and Ravichander, Abhilasha and Bhagia, Akshita and Groeneveld, Dirk and Schwenk, Dustin and Magnusson, Ian and Chandu, Khyathi},
    license = {{Apache-2.0}},
    title = {{DOLMa}},
    url = {https://github.com/allenai/dolma}
}
