No project description provided
Project description
costa-utils
This repo contains some personal utilities to do quick things. Currently we have utils to help visualize Hugging Face's preference and SFT datasets.
Get started
Visualizing a HF SFT dataset:
# visualizing https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture
python -m costa_utils.hf_viz \
--sft allenai/tulu-v2-sft-mixture \
--split train \
--sft_messages_column_name messages
python -m costa_utils.hf_viz \
--sft AI-MO/NuminaMath-TIR \
--split train \
--sft_messages_column_name messages
which is a bit easier to read than
Visualizing a HF preference dataset:
# visualizing https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized
python -m costa_utils.hf_viz \
--preference HuggingFaceH4/ultrafeedback_binarized \
--split train_prefs \
--preference_chosen_column_name chosen \
--preference_rejected_column_name rejected
which is a bit easier to read than
dev note
It's simple to debug. Just replace python -m costa_utils.hf_viz with python costa_utils/hf_viz.py
python -m costa_utils.hf_viz \
--preference HuggingFaceH4/ultrafeedback_binarized \
--split train_prefs \
--preference_chosen_column_name chosen \
--preference_rejected_column_name rejected
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file costa_utils-0.1.1.tar.gz.
File metadata
- Download URL: costa_utils-0.1.1.tar.gz
- Upload date:
- Size: 2.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.12.1 Linux/5.15.0-60-generic
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d9adad4849e75da1cd9c4aa147d1843d918031c2da9abc5dcb1cbb324ae16afe
|
|
| MD5 |
aad60ce5f3de57e2647a51645b7f0e0d
|
|
| BLAKE2b-256 |
4d63bcc0017ab97b7ea91be90e8ed9b448179fa1e850a5a45bbc9cf7e1dec2bd
|
File details
Details for the file costa_utils-0.1.1-py3-none-any.whl.
File metadata
- Download URL: costa_utils-0.1.1-py3-none-any.whl
- Upload date:
- Size: 3.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.12.1 Linux/5.15.0-60-generic
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
53893eb199f485782f157b9ff3b248387469a764ee15a801e528d0b54894e13a
|
|
| MD5 |
a1a20af3e1006ae3478a95c094799ed6
|
|
| BLAKE2b-256 |
8a3b5b2d0ac48a6adf305c2cca7224276da6a66147c77eb765bdecd55c6a3e14
|