
onPanda Python package


onpanda: The Companion Python Package for onPanda

Contents: Features | Install | Example Data | Quick Start | Main Modules | Iterative Correction API | Data Assumptions

▮ Features

  • Parse .panda.json into SFT and preference-pair data (build_legacy_data_v1)
  • Build token-level supervision data (build_token_level_supervision_data_v1/v2)
  • Build Find-and-Replace correction training data (build_far_correction_data_v1)
  • Verify and score Find-and-Replace outputs (FindAndReplaceVerifier)
  • Serve iterative correction through a proxy API server (onpanda.server.iterative_correction_api)
  • Build panda battle data from two arena result sets (build_panda_battle)

▮ Install

pip install onpanda -U

# Or, to run the demos, install from source:
git clone https://github.com/on-panda/onpanda.git
cd onpanda
pip install -e .

To use Hugging Face tokenizers, install transformers separately (pip install transformers).

▮ Example Data

on-panda-example-data is the example dataset repo for this project:

git clone https://github.com/on-panda/on-panda-example-data.git ../on-panda-example-data
ls ../on-panda-example-data/panda_json/
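To inspect the example files from Python instead, a small stdlib-only sketch like the one below lists every .panda.json file in the cloned directory and prints its top-level keys. find_panda_files is an illustrative helper, not part of onpanda. The sketch assumes the clone location used above and that each file holds a JSON object at the top level.

```python
import json
from pathlib import Path


def find_panda_files(root: str) -> list[Path]:
    """Return the *.panda.json files directly under `root`, sorted by name."""
    return sorted(Path(root).glob("*.panda.json"))


if __name__ == "__main__":
    for path in find_panda_files("../on-panda-example-data/panda_json/"):
        data = json.loads(path.read_text(encoding="utf-8"))
        # Assumes a JSON object at the top level; print its keys for a quick overview.
        print(path.name, sorted(data))
```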

▮ Quick Start

import onpanda

panda_path = (
    "../on-panda-example-data/panda_json/"
    "2025-08-19_how-many-1s_tokenizer-Qwen2.5.panda.json"
)
# Use the built-in utf8_tokenizer for a minimal runnable flow.
tokenizer = onpanda.utf8_tokenizer
tree = onpanda.PandaTree(panda_path, tokenizer)

# 1) SFT + preference pairs
legacy = tree.build_legacy_data_v1()
print("sfts:", len(legacy["sfts"]))
print("preferences:", len(legacy["preferences"]))

# 2) Token-level supervision
token_level_v1 = tree.build_token_level_supervision_data_v1(
    tokenizer
)
print("token_level_v1:", len(token_level_v1))

# 3) Find-and-Replace correction data
adapter = onpanda.FindAndReplaceCorrectionAdapter(
    tokenizer
)
correction_data = tree.build_far_correction_data_v1(adapter)
print("correction_data:", len(correction_data))
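For intuition about what token-level supervision captures, the hypothetical sketch below compares an original and a corrected token sequence with Python's difflib and marks which corrected tokens were inserted or replaced. It is not onpanda's implementation: changed_token_mask is an illustrative name, and single characters stand in for real tokenizer output.

```python
import difflib


def changed_token_mask(original: list[str], corrected: list[str]) -> list[int]:
    """Return one flag per corrected token: 1 if it was inserted or replaced
    relative to `original`, 0 if it was kept unchanged."""
    mask = [0] * len(corrected)
    matcher = difflib.SequenceMatcher(a=original, b=corrected, autojunk=False)
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            for j in range(j1, j2):
                mask[j] = 1
    return mask


# Characters as stand-in tokens: only the final digit was corrected.
print(changed_token_mask(list("5+7=13"), list("5+7=12")))  # [0, 0, 0, 0, 0, 1]
```

Pure deletions leave no corrected token to flag, so a mask over the corrected sequence cannot express them on its own; that is one reason a real pipeline might also keep the original-side spans.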

Build from plain chat messages:

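import json

import onpanda

messages = [
    {"role": "user", "content": "5+7=?"},
    {"role": "assistant", "content": "12"},
]
panda_json = onpanda.messages_to_panda_tree(messages, uuid="demo")

# Save it as a .panda.json file (assumes panda_json is a JSON-serializable dict).
with open("demo.panda.json", "w", encoding="utf-8") as f:
    json.dump(panda_json, f, ensure_ascii=False, indent=2)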

▮ Main Modules

  • onpanda/parser.py: PandaTree and data conversion entrypoints
  • onpanda/token_level_supervision_utils.py: token-level patch extraction and masks
  • onpanda/correcting_model/far_correction_utils.py: FAR (Find-and-Replace) data builder and apply logic
  • onpanda/correcting_model/verifier.py: FAR parser/locator/reward computation
  • onpanda/correcting_model/correcting_model.py: iterative correction workflow
  • onpanda/server/iterative_correction_api.py: Flask wrapper for correction service
  • onpanda/arena/panda_battle.py: build battle-style comparison data
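The exact FAR edit format is not documented here, so the following is only a hypothetical sketch of the general find-and-replace idea: each edit names a snippet to find and its replacement, and is applied only when the snippet locates exactly once in the current text. FindAndReplace and apply_edits are illustrative names, not onpanda APIs.

```python
from dataclasses import dataclass


@dataclass
class FindAndReplace:
    find: str
    replace: str


def apply_edits(text: str, edits: list[FindAndReplace]) -> str:
    """Apply edits in order; each `find` must match exactly once in the current text."""
    for edit in edits:
        if not edit.find:
            raise ValueError("empty find string")
        count = text.count(edit.find)
        if count != 1:
            raise ValueError(f"{edit.find!r} matched {count} times; expected exactly 1")
        text = text.replace(edit.find, edit.replace, 1)
    return text


draft = "There are 2 ones in 11010."
print(apply_edits(draft, [FindAndReplace("2 ones", "3 ones")]))
```

Requiring a unique match is one simple way to keep an edit unambiguous; a verifier built on this idea could score an output whose edits fail to locate as invalid.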

▮ Iterative Correction API

Launch a proxy API server that returns responses produced by the iterative correction workflow:

python -m onpanda.server.iterative_correction_api --help
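The actual workflow lives in onpanda/correcting_model/correcting_model.py. As a rough, hypothetical illustration of what "iterative correction" can mean, the sketch below repeatedly asks a corrector for find-and-replace edits and applies those that locate uniquely, stopping when no edits are proposed or a round limit is hit. iterative_correction, propose_edits, and max_rounds are illustrative names, and the toy corrector stands in for a model call.

```python
from typing import Callable

Edit = tuple[str, str]  # (find, replace), a hypothetical edit format


def iterative_correction(
    draft: str,
    propose_edits: Callable[[str], list[Edit]],
    max_rounds: int = 3,
) -> str:
    """Apply proposed edits round by round until none are proposed or max_rounds is reached."""
    text = draft
    for _ in range(max_rounds):
        edits = propose_edits(text)
        if not edits:
            break  # the corrector considers the text final
        for find, replace in edits:
            # Skip edits that do not locate exactly once in the current text.
            if find and text.count(find) == 1:
                text = text.replace(find, replace, 1)
    return text


def toy_corrector(text: str) -> list[Edit]:
    # Stand-in for a correcting model: fixes one known mistake, then stops.
    return [("2 ones", "3 ones")] if "2 ones" in text else []


print(iterative_correction("There are 2 ones in 11010.", toy_corrector))
```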

▮ Data Assumptions

  • PandaTree parses annotated Panda JSON that meets the requirements below.
  • PandaTree preprocessing currently assumes:
    • The top-level field dialogs exists.
    • The top-level field update_time exists.
    • At least one dialog ends with an assistant message.
    • If annotate.is_good is missing, the latest dialog is treated as good by default.
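A quick pre-flight check against these assumptions can catch malformed files before they reach PandaTree. The dialog schema is not spelled out here, so this hypothetical sketch assumes dialogs is a list of dicts, each carrying a messages list of role/content dicts like the chat messages passed to messages_to_panda_tree. check_panda_json is an illustrative helper, not an onpanda API.

```python
from typing import Any


def check_panda_json(data: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the basic assumptions hold."""
    problems = []
    for field in ("dialogs", "update_time"):
        if field not in data:
            problems.append(f"missing top-level field {field!r}")
    dialogs = data.get("dialogs") or []
    # ASSUMPTION: each dialog is a dict whose "messages" list holds {"role": ...} dicts.
    ends_with_assistant = any(
        d.get("messages") and d["messages"][-1].get("role") == "assistant"
        for d in dialogs
    )
    if not ends_with_assistant:
        problems.append("no dialog ends with an assistant message")
    return problems
```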



Download files


Source Distribution

onpanda-0.1.2.tar.gz (47.5 kB)

Uploaded Source

Built Distribution


onpanda-0.1.2-py3-none-any.whl (51.8 kB)

Uploaded Python 3

File details

Details for the file onpanda-0.1.2.tar.gz.

File metadata

  • Download URL: onpanda-0.1.2.tar.gz
  • Upload date:
  • Size: 47.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.7

File hashes

Hashes for onpanda-0.1.2.tar.gz
Algorithm Hash digest
SHA256 3c96250be4a1f70c877ba6b1d3a9876146d7cf69a1adae079a9011ad0fc86f6b
MD5 a9a8a2489e78718e69175d426dae8957
BLAKE2b-256 d614ce98bf322655acc7ceb2f8e7a54970da1bd91130aad2c0ae454a781bf4d4


File details

Details for the file onpanda-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: onpanda-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 51.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.7

File hashes

Hashes for onpanda-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 fd9256b7e796e92a86ac7c3d1ecd98ac3f2b382ca04575789fec91c567530ca2
MD5 afe500b1d60403d880aaa3721dca5cf6
BLAKE2b-256 241804e2793f5c3a1372142f4bfa744ffa7570bf1b5c7f97029d78f7bad7e06d

