Skip to main content

No project description provided

Project description

costa-utils

This repo contains some personal utilities to do quick things. Currently we have utils to help visualize Hugging Face's preference and SFT datasets.

Get started

Visualizing a HF SFT dataset:

# visualizing https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture
python -m costa_utils.hf_viz \
    --sft allenai/tulu-v2-sft-mixture \
    --split train \
    --sft_messages_column_name messages
python -m costa_utils.hf_viz \
    --sft AI-MO/NuminaMath-TIR \
    --split train \
    --sft_messages_column_name messages

which is a bit easier to read than

Visualizing a HF preference dataset:

# visualizing https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized
python -m costa_utils.hf_viz \
    --preference HuggingFaceH4/ultrafeedback_binarized \
    --split train_prefs \
    --preference_chosen_column_name chosen \
    --preference_rejected_column_name rejected

which is a bit easier to read than

dev note

It's simple to debug. Just replace python -m costa_utils.hf_viz with python costa_utils/hf_viz.py

python -m costa_utils.hf_viz \
    --preference HuggingFaceH4/ultrafeedback_binarized \
    --split train_prefs \
    --preference_chosen_column_name chosen \
    --preference_rejected_column_name rejected

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

costa_utils-0.1.1.tar.gz (2.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

costa_utils-0.1.1-py3-none-any.whl (3.6 kB view details)

Uploaded Python 3

File details

Details for the file costa_utils-0.1.1.tar.gz.

File metadata

  • Download URL: costa_utils-0.1.1.tar.gz
  • Upload date:
  • Size: 2.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.12.1 Linux/5.15.0-60-generic

File hashes

Hashes for costa_utils-0.1.1.tar.gz
Algorithm Hash digest
SHA256 d9adad4849e75da1cd9c4aa147d1843d918031c2da9abc5dcb1cbb324ae16afe
MD5 aad60ce5f3de57e2647a51645b7f0e0d
BLAKE2b-256 4d63bcc0017ab97b7ea91be90e8ed9b448179fa1e850a5a45bbc9cf7e1dec2bd

See more details on using hashes here.

File details

Details for the file costa_utils-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: costa_utils-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 3.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.12.1 Linux/5.15.0-60-generic

File hashes

Hashes for costa_utils-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 53893eb199f485782f157b9ff3b248387469a764ee15a801e528d0b54894e13a
MD5 a1a20af3e1006ae3478a95c094799ed6
BLAKE2b-256 8a3b5b2d0ac48a6adf305c2cca7224276da6a66147c77eb765bdecd55c6a3e14

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page