No project description provided
Project description
costa-utils
This repo contains some personal utilities to do quick things. Currently we have utils to help visualize Hugging Face's preference and SFT datasets.
Get started
Visualizing a HF SFT dataset:
# visualizing https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture
python -m costa_utils.hf_viz \
--sft allenai/tulu-v2-sft-mixture \
--split train \
--sft_messages_column_name messages
python -m costa_utils.hf_viz \
--sft AI-MO/NuminaMath-TIR \
--split train \
--sft_messages_column_name messages
which is a bit easier to read than
Visualizing a HF preference dataset:
# visualizing https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized
python -m costa_utils.hf_viz \
--preference HuggingFaceH4/ultrafeedback_binarized \
--split train_prefs \
--preference_chosen_column_name chosen \
--preference_rejected_column_name rejected
which is a bit easier to read than
dev note
It's simple to debug. Just replace python -m costa_utils.hf_viz
with python costa_utils/hf_viz.py
python -m costa_utils.hf_viz \
--preference HuggingFaceH4/ultrafeedback_binarized \
--split train_prefs \
--preference_chosen_column_name chosen \
--preference_rejected_column_name rejected
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
costa_utils-0.1.1.tar.gz
(2.9 kB
view hashes)
Built Distribution
Close
Hashes for costa_utils-0.1.1-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 53893eb199f485782f157b9ff3b248387469a764ee15a801e528d0b54894e13a |
|
MD5 | a1a20af3e1006ae3478a95c094799ed6 |
|
BLAKE2b-256 | 8a3b5b2d0ac48a6adf305c2cca7224276da6a66147c77eb765bdecd55c6a3e14 |