onpanda: The Companion Python Package for onPanda

Contents: Features | Install | Example Data | Quick Start | Main Modules | Iterative Correction API | Data Assumptions

▮ Features

  • Parse .panda.json into SFT and preference-pair data (build_legacy_data_v1)
  • Build token-level supervision data (build_token_level_supervision_data_v1/v2)
  • Build Find-and-Replace correction training data (build_far_correction_data_v1)
  • Verify and score Find-and-Replace outputs (FindAndReplaceVerifier)
  • Run iterative correction as a Proxy API (onpanda.server.iterative_correction_api)
  • Build panda battle data from two arena result sets (build_panda_battle)

▮ Install

pip install onpanda -U

# Or, to run the demos, install from source:
git clone https://github.com/on-panda/onpanda.git
cd onpanda
pip install -e .

If you want to use tokenizers, install transformers separately.

▮ Example Data

on-panda-example-data is the example dataset repo for this project:

git clone https://github.com/on-panda/on-panda-example-data.git ../on-panda-example-data
ls ../on-panda-example-data/panda_json/

▮ Quick Start

import onpanda

panda_path = (
    "../on-panda-example-data/panda_json/"
    "2025-08-19_how-many-1s_tokenizer-Qwen2.5.panda.json"
)
# Use the built-in unicode_tokenizer for a minimal runnable flow.
tokenizer = onpanda.unicode_tokenizer
tree = onpanda.PandaTree(panda_path, tokenizer)

# 1) SFT + preference pairs
legacy = tree.build_legacy_data_v1()
print("sfts:", len(legacy["sfts"]))
print("preferences:", len(legacy["preferences"]))

# 2) Token-level supervision
token_level_v1 = tree.build_token_level_supervision_data_v1(
    tokenizer
)
print("token_level_v1:", len(token_level_v1))

# 3) Find-and-Replace correction data
adapter = onpanda.FindAndReplaceCorrectionAdapter(
    tokenizer
)
correction_data = tree.build_far_correction_data_v1(adapter)
print("correction_data:", len(correction_data))

Build from plain chat messages:

import onpanda

messages = [
    {"role": "user", "content": "5+7=?"},
    {"role": "assistant", "content": "12"},
]
panda_json = onpanda.messages_to_panda_tree(messages, uuid="demo")
# Save panda_json to a file such as xxx.panda.json.
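
The dump step is left as a comment above. A minimal sketch of one way to write the result to disk follows. It assumes that the value returned by messages_to_panda_tree is either a JSON string or a JSON-serializable object, which the README does not confirm. The helper name dump_panda_json and the stand-in dictionary are hypothetical and are not part of onpanda.

```python
import json
import tempfile
from pathlib import Path


def dump_panda_json(panda_json, path):
    """Write a Panda JSON value to `path` and return the path.

    ASSUMPTION: `panda_json` is either an already-serialized JSON string
    or a JSON-serializable Python object (dict/list). This helper is
    hypothetical and not part of the onpanda package.
    """
    path = Path(path)
    if isinstance(panda_json, str):
        text = panda_json  # already serialized, write as-is
    else:
        text = json.dumps(panda_json, ensure_ascii=False, indent=2)
    path.write_text(text, encoding="utf-8")
    return path


if __name__ == "__main__":
    # Stand-in for the object returned by onpanda.messages_to_panda_tree;
    # the real structure may differ.
    stand_in = {"uuid": "demo"}
    with tempfile.TemporaryDirectory() as tmp:
        out = dump_panda_json(stand_in, Path(tmp) / "demo.panda.json")
        print(out.name)
```

In the example above this would be called as dump_panda_json(panda_json, "demo.panda.json"). The .panda.json suffix matches the file names used in the example data repo.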

▮ Main Modules

  • onpanda/parser.py: PandaTree and data conversion entrypoints
  • onpanda/token_level_supervision_utils.py: token-level patch extraction and masks
  • onpanda/correcting_model/far_correction_utils.py: FAR data builder and apply logic
  • onpanda/correcting_model/verifier.py: FAR parser/locator/reward computation
  • onpanda/correcting_model/correcting_model.py: iterative correction workflow
  • onpanda/server/iterative_correction_api.py: Flask wrapper for correction service
  • onpanda/arena/panda_battle.py: build battle-style comparison data

▮ Iterative Correction API

Launch a proxy API server that returns responses produced by iterative correction. Run the module with --help to list the available options:

python -m onpanda.server.iterative_correction_api --help

▮ Data Assumptions

  • PandaTree is a parser for qualified, annotated Panda JSON.
  • PandaTree preprocessing currently assumes:
    • Top-level field dialogs exists
    • Top-level field update_time exists
    • At least one dialog ends with an assistant message
    • If annotate.is_good is missing, the latest dialog is treated as good by default
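
The first three assumptions above can be checked before handing a file to PandaTree. The sketch below is not part of onpanda: check_panda_assumptions and default_get_messages are hypothetical helpers, and the layout of a dialog (a list of message dicts with a "role" key, or a dict holding such a list under "messages") is an assumption, since the README does not spell out the schema. Pass your own get_messages function if the real layout differs.

```python
from typing import Any, Callable, List


def default_get_messages(dialog: Any) -> List[dict]:
    # ASSUMPTION: a dialog is either a list of message dicts, or a dict
    # with a "messages" list. The real Panda JSON layout may differ.
    if isinstance(dialog, dict):
        return list(dialog.get("messages", []))
    return list(dialog)


def check_panda_assumptions(
    data: dict,
    get_messages: Callable[[Any], List[dict]] = default_get_messages,
) -> List[str]:
    """Return the violated assumptions; an empty list means all checks passed."""
    problems = []

    # Top-level fields that preprocessing expects to exist.
    for field in ("dialogs", "update_time"):
        if field not in data:
            problems.append(f"missing top-level field: {field}")

    # Accept either a list of dialogs or a mapping of id -> dialog.
    dialogs = data.get("dialogs", [])
    if isinstance(dialogs, dict):
        dialogs = list(dialogs.values())

    # At least one dialog must end with an assistant message.
    ends_with_assistant = any(
        msgs and msgs[-1].get("role") == "assistant"
        for msgs in (get_messages(d) for d in dialogs)
    )
    if not ends_with_assistant:
        problems.append("no dialog ends with an assistant message")

    return problems
```

A typical use is to load the file with json.load, call check_panda_assumptions on the result, and only construct PandaTree when the returned list is empty.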
