OpenPecha toolkit version 2

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

OpenPecha Toolkit V2

A Python package for working with stand-off text annotations in the OpenPecha framework, built around the Stand-off Text Annotation Model (STAM). Toolkit V2 provides parsing, transformation, and serialization of annotated Buddhist textual corpora.

Introduction
Installation
Key Concepts
Getting Started & Usage Guide
Tutorial Guide
Serializer
API Reference
Diving Deeper
Contributing
License
Project Owners

Introduction

Toolkit V2 is the next-generation Python toolkit for managing annotated texts in the OpenPecha ecosystem. It provides:

Tools for creating, editing, and serializing annotated corpora using the STAM model.
Support for multiple annotation types (segmentation, alignment, pagination, language, etc.).
Parsers for various input formats (DOCX, OCR, Pedurma, etc.).
Serializers for exporting annotated data.

STAM (Stand-off Text Annotation Model) is a flexible data model for representing all information about a text as stand-off annotations, keeping the base text and annotations separate for maximum interoperability.

The OpenPecha Backend, hosted on Firebase, serves as the central storage system for texts and their corresponding annotations. The toolkit handles parsing, editing, and serialization, while all storage, access, and import operations are managed by the backend.

Installation

Stable version:

pip install openpecha

Development version:

pip install git+https://github.com/OpenPecha/toolkit-v2.git
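
After installing, one way to confirm which version is present is to query the package metadata. The sketch below uses only the standard library's importlib.metadata and assumes the distribution name is openpecha, as in the pip command above. The helper name installed_version is illustrative and not part of the toolkit.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "openpecha") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "openpecha is not installed")
```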

Key Concepts

Pecha

A Pecha is the core data model representing a text corpus with its annotations and metadata. Each Pecha:

Has a unique ID (8-digit UUID)
Contains one or more base texts
Stores multiple annotation layers
Includes metadata (title, author, language, etc.)
Can be created from scratch or parsed from various formats (DOCX, OCR, etc.)

Directory layout of a Pecha:

├── metadata.json
├── base/
│   ├── base1.txt
│   └── base2.txt
└── layers/
    ├── segmentation-1234.json
    ├── alignment-5678.json
    ├── pagination-9012.json
    └── footnote-3456.json

Example of a Pecha's internal structure:

├── metadata.json
│   ├── id: "P0001"
│   ├── title: {"en": "Sample Text", "bo": "དཔེ་ཚན།"}
│   ├── author: "Author Name"
│   └── language: "bo"
├── base/
│   └── base1.txt
│       └── "ཨོཾ་མ་ཎི་པདྨེ་ཧཱུྃ།..."
└── layers/
    ├── Segmentation-1234.json
    │   └── {"index": 1, "span": {"start": 0, "end": 10}, ...}
    ├── Alignment-5678.json
    │   └── {"alignment_index": "1-2", "span": {"start": 0, "end": 20}, ...}
    └── Pagination-9012.json
        └── {"page": 1, "span": {"start": 0, "end": 100}, ...}
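
To make the layout above concrete, here is a minimal sketch that writes the same folder skeleton using only the standard library. It does not use the toolkit's own Pecha class. The helper name write_pecha_skeleton and the sample metadata values are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path


def write_pecha_skeleton(root: Path, metadata: dict, base_text: str) -> Path:
    """Create metadata.json, base/ and layers/ under root (illustrative helper)."""
    (root / "base").mkdir(parents=True, exist_ok=True)
    (root / "layers").mkdir(exist_ok=True)
    # ensure_ascii=False keeps Tibetan text readable in the JSON file.
    (root / "metadata.json").write_text(
        json.dumps(metadata, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    (root / "base" / "base1.txt").write_text(base_text, encoding="utf-8")
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pecha_dir = write_pecha_skeleton(
            Path(tmp) / "P0001",
            {"id": "P0001", "title": {"en": "Sample Text"}, "language": "bo"},
            "ཨོཾ་མ་ཎི་པདྨེ་ཧཱུྃ།",
        )
        # List everything that was created, relative to the Pecha folder.
        for path in sorted(pecha_dir.rglob("*")):
            print(path.relative_to(pecha_dir))
```

Layer files are left out here because their exact JSON schema is defined by the toolkit and STAM, not by this sketch.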

Layer

A Layer is a collection of annotations of a specific type for a given base text. Key features:

Each layer has a specific type (e.g., Segmentation, Alignment, Pagination)
Layers are stored as JSON files in the STAM format
Common layer types include:
- Segmentation: Divides text into meaningful segments
- Alignment: Maps segments between different texts (e.g., root text and commentary)
- Pagination: Marks page boundaries
- Language: Indicates language of text segments
- Footnote: Contains footnote annotations
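
The idea of a layer as a typed collection of annotations over one base text can be sketched in plain Python. This is not the toolkit's data model: the Span and Annotation classes, the end-exclusive offsets, and the payload field are assumptions chosen to mirror the span examples shown earlier.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset into the base text
    end: int    # exclusive character offset (an assumption of this sketch)


@dataclass(frozen=True)
class Annotation:
    layer_type: str  # e.g. "Segmentation" or "Pagination"
    span: Span
    payload: Dict[str, int]


def texts_for_layer(base: str, annotations: List[Annotation], layer_type: str) -> List[str]:
    """Return the base-text slices covered by annotations of one layer type."""
    return [
        base[a.span.start:a.span.end]
        for a in annotations
        if a.layer_type == layer_type
    ]


base_text = "first line. second line."
annotations = [
    Annotation("Segmentation", Span(0, 11), {"index": 1}),
    Annotation("Segmentation", Span(12, 24), {"index": 2}),
    Annotation("Pagination", Span(0, 24), {"page": 1}),
]

segments = texts_for_layer(base_text, annotations, "Segmentation")
print(segments)  # ['first line.', 'second line.']
```

Because annotations only point into the base text, the segmentation and pagination layers can overlap freely without either one changing the text itself.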

STAM (Stand-off Text Annotation Model)

STAM is the underlying data format for storing annotations. It:

Keeps base text and annotations separate
Uses a flexible JSON structure
Supports multiple annotation types
Enables interoperability between different systems
Allows for complex annotation relationships

Alignment Transfer

Alignment refers to mapping relationships between two or more texts. This process is crucial for creating parallel texts, which are widely used in translation, commentary analysis, and language learning. Alignments link corresponding sections across different versions or types of text, for example between a root text and its translation, a commentary, or other related materials.
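
As a rough illustration of what an alignment enables, the sketch below pairs segments of a root text with segments of a translation. The representation of an alignment as a list of (root_id, target_id) links and the function name build_parallel are assumptions for this example and do not reflect the toolkit's alignment format.

```python
from typing import Dict, List, Tuple


def build_parallel(
    root_segments: Dict[int, str],
    target_segments: Dict[int, str],
    alignment: List[Tuple[int, int]],
) -> List[Tuple[str, str]]:
    """Pair root and target segment texts according to (root_id, target_id) links."""
    pairs = []
    for root_id, target_id in alignment:
        # Skip links that point at a segment missing from either side.
        if root_id in root_segments and target_id in target_segments:
            pairs.append((root_segments[root_id], target_segments[target_id]))
    return pairs


root = {1: "root sentence one", 2: "root sentence two"}
translation = {1: "translation one", 2: "translation two, part a", 3: "translation two, part b"}
# One root segment may map to several target segments.
links = [(1, 1), (2, 2), (2, 3)]

parallel = build_parallel(root, translation, links)
for source_text, target_text in parallel:
    print(f"{source_text} <-> {target_text}")
```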

Getting Started & Usage Guide

To get started and explore all features, see the Getting Started & Usage Guide.

Tutorial Guide

For a story-driven walkthrough of parsing, annotating, and serializing a Tibetan text, with code and explanations, see the Tutorial Guide.

Serializer

The JsonSerializer class provides utilities for extracting and serializing annotation data from a Pecha. Key methods include:

get_base(pecha): Returns the base text from the first base in the given Pecha.
to_dict(ann_store, ann_type): Converts an AnnotationStore to a list of annotation dictionaries for the given annotation type.
get_edition_base(pecha, edition_layer_path): Constructs a new base text by applying version variant operations (insertions/deletions) from an edition layer.
serialize(pecha, manifestation_info): Serializes a Pecha with its annotations based on manifestation information, returning base text and annotations.
serialize_edition_annotations(pecha, edition_layer_path, layer_path): Serializes annotations that are based on an edition base rather than the original base.
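
The description of get_edition_base mentions building a new base text from insertions and deletions. The sketch below shows that general idea in isolation. It does not call JsonSerializer, and the VariantOp class, its fields, and the rule that offsets refer to the original base text are all assumptions of this example rather than the toolkit's edition layer format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VariantOp:
    kind: str       # "insertion" or "deletion"
    start: int      # offset into the ORIGINAL base text
    end: int        # exclusive end offset; equals start for insertions
    text: str = ""  # text to insert; unused for deletions


def apply_variant_ops(base: str, ops: List[VariantOp]) -> str:
    """Build an edition text by applying non-overlapping ops to the base."""
    pieces = []
    cursor = 0
    for op in sorted(ops, key=lambda o: o.start):
        if op.start < cursor:
            raise ValueError("overlapping operations are not supported")
        pieces.append(base[cursor:op.start])  # copy untouched text
        if op.kind == "insertion":
            pieces.append(op.text)
            cursor = op.start
        elif op.kind == "deletion":
            cursor = op.end  # skip the deleted span
        else:
            raise ValueError(f"unknown operation kind: {op.kind}")
    pieces.append(base[cursor:])  # copy the remaining tail
    return "".join(pieces)


base = "the quick fox"
ops = [
    VariantOp("insertion", 13, 13, " jumps"),
    VariantOp("deletion", 4, 10),
]
print(apply_variant_ops(base, ops))  # the fox jumps
```

Keeping every offset relative to the original base means the operations can be listed in any order; the function sorts them and walks the base once.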

See the API Reference for full details and usage examples.

API Reference

For a detailed list of classes and methods, see the API Reference.

Diving Deeper

Contributing

We welcome contributions! Please open issues or pull requests. For major changes, please open an issue first to discuss what you would like to change.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Project Owners


Release history

This version

2.5.0

Oct 29, 2025

2.4.5

Sep 24, 2025

2.4.4

Sep 24, 2025

2.4.3

Sep 4, 2025

2.4.2

Sep 4, 2025

2.4.1

Sep 3, 2025

2.4.0

Aug 28, 2025

2.3.1

Aug 22, 2025

2.3.0

Aug 21, 2025

2.2.2

Aug 21, 2025

2.2.1

Aug 18, 2025

2.2.0

Aug 15, 2025

2.1.14

Aug 7, 2025

2.1.13

May 19, 2025

2.1.12

May 19, 2025

2.1.11

May 16, 2025

2.1.10

May 12, 2025

2.1.9

May 12, 2025

2.1.8

May 7, 2025

0.11.13

Jun 5, 2024

0.11.12

Jun 5, 2024

0.11.10

Feb 22, 2024

0.11.9

Dec 21, 2023

0.11.8

Oct 5, 2023

0.11.7

May 29, 2023

0.11.6

May 17, 2023

0.11.5

May 3, 2023

0.11.4

Mar 31, 2023

0.11.3

Mar 29, 2023

0.11.2

Mar 21, 2023

0.11.1

Mar 21, 2023

0.11.0

Mar 17, 2023

0.10.0

Mar 15, 2023

0.9.25

Mar 14, 2023

0.9.24

Feb 13, 2023

0.9.23

Jan 23, 2023

0.9.22

Jan 20, 2023

0.9.21

Jan 19, 2023

0.9.20

Jan 5, 2023

0.9.19

Dec 30, 2022

0.9.18

Dec 21, 2022

0.9.17

Dec 17, 2022

0.9.16

Dec 12, 2022

0.9.15

Dec 12, 2022

0.9.14

Nov 30, 2022

0.9.13

Nov 30, 2022

0.9.12

Nov 28, 2022

0.9.11

Nov 28, 2022

0.9.10

Nov 23, 2022

0.9.9

Nov 22, 2022

0.9.8

Nov 7, 2022

0.9.7

Oct 20, 2022

0.9.6

Oct 20, 2022

0.9.5

Oct 14, 2022

0.9.4

Oct 14, 2022

0.9.3

Oct 7, 2022

0.9.2

Oct 7, 2022

0.9.1

Oct 5, 2022

0.9.0

Oct 5, 2022

0.8.33

Sep 30, 2022

0.8.32

Sep 30, 2022

0.8.31

Sep 28, 2022

0.8.30

Sep 5, 2022

0.8.29

Aug 29, 2022

0.8.28

Aug 29, 2022

0.8.27

Aug 22, 2022

0.8.26

Aug 18, 2022

0.8.25

Aug 18, 2022

0.8.24

Aug 17, 2022

0.8.23

Aug 17, 2022

0.8.22

Aug 12, 2022

0.8.21

Aug 11, 2022

0.8.20

Aug 10, 2022

0.8.19

Aug 3, 2022

0.8.18

Aug 3, 2022

0.8.17

Jul 26, 2022

0.8.16

Jul 22, 2022

0.8.15

Jul 22, 2022

0.8.14

Jul 18, 2022

0.8.13

Jun 13, 2022

0.8.12

May 30, 2022

0.8.11

May 25, 2022

0.8.10

May 24, 2022

0.8.9

May 24, 2022

0.8.8

May 18, 2022

0.8.7

May 18, 2022

0.8.6

May 17, 2022

0.8.5

May 12, 2022

0.8.4

May 11, 2022

0.8.3

May 2, 2022

0.8.2

Apr 27, 2022

0.8.1

Apr 21, 2022

0.8.0

Apr 1, 2022

0.7.83

Apr 1, 2022

0.7.82

Mar 31, 2022

0.7.81

Mar 30, 2022

0.7.80

Mar 17, 2022

0.7.79

Mar 17, 2022

0.7.78

Mar 17, 2022

0.7.77

Mar 17, 2022

0.7.76

Feb 15, 2022

0.7.75

Dec 20, 2021

0.7.74

Dec 18, 2021

0.7.73

Dec 14, 2021

0.7.72

Dec 6, 2021

0.7.71

Dec 6, 2021

0.7.70

Dec 6, 2021

0.7.69

Dec 6, 2021

0.7.68

Nov 26, 2021

0.7.67

Nov 26, 2021

0.7.66

Nov 26, 2021

0.7.65

Nov 26, 2021

0.7.64

Nov 25, 2021

0.7.63

Nov 25, 2021

0.7.62

Nov 23, 2021

0.7.61

Oct 29, 2021

0.7.60

Oct 29, 2021

0.7.59

Oct 28, 2021

0.7.58

Oct 20, 2021

0.7.57

Sep 15, 2021

0.7.56

Sep 15, 2021

0.7.55

Sep 15, 2021

0.7.54

Sep 6, 2021

0.7.53

Aug 31, 2021

0.7.52

Aug 27, 2021

0.7.51

Aug 27, 2021

0.7.50

Aug 27, 2021

0.7.49

Aug 26, 2021

0.7.48

Aug 24, 2021

0.7.47

Aug 17, 2021

0.7.46

Aug 13, 2021

0.7.45

Aug 13, 2021

0.7.44

Aug 13, 2021

0.7.43

Aug 12, 2021

0.7.42

Aug 12, 2021

0.7.41

Aug 12, 2021

0.7.40

Jun 22, 2021

0.7.39

Jun 2, 2021

0.7.38

May 28, 2021

0.7.37

May 25, 2021

0.7.36

May 21, 2021

0.7.35

May 20, 2021

0.7.34

May 13, 2021

0.7.33

May 6, 2021

0.7.32

May 5, 2021

0.7.31

Apr 30, 2021

0.7.30

Apr 27, 2021

0.7.29

Apr 20, 2021

0.7.28

Apr 20, 2021

0.7.27

Apr 10, 2021

0.7.26

Apr 9, 2021

0.7.25

Apr 8, 2021

0.7.24

Apr 8, 2021

0.7.23

Apr 7, 2021

0.7.22

Mar 29, 2021

0.7.21

Mar 26, 2021

0.7.20

Mar 26, 2021

0.7.19

Mar 26, 2021

0.7.18

Mar 25, 2021

0.7.17

Mar 25, 2021

0.7.16

Mar 24, 2021

0.7.15

Mar 23, 2021

0.7.14

Mar 23, 2021

0.7.13

Mar 17, 2021

0.7.12

Mar 16, 2021

0.7.11

Mar 16, 2021

0.7.10

Mar 11, 2021

0.7.9

Mar 11, 2021

0.7.8

Mar 9, 2021

0.7.7

Mar 4, 2021

0.7.6

Mar 3, 2021

0.7.5

Mar 3, 2021

0.7.4

Mar 2, 2021

0.7.2

Mar 1, 2021

0.7.1

Mar 1, 2021

0.7.0

Mar 1, 2021

0.6.35

Mar 1, 2021

0.6.34

Feb 16, 2021

0.6.33

Feb 9, 2021

0.6.32

Feb 3, 2021

0.6.31

Jan 21, 2021

0.6.30

Jan 21, 2021

0.6.29

Jan 20, 2021

0.6.28

Jan 19, 2021

0.6.27

Jan 18, 2021

0.6.26

Jan 15, 2021

0.6.25

Jan 15, 2021

0.6.24

Jan 15, 2021

0.6.23

Jan 15, 2021

0.6.22

Jan 12, 2021

0.6.21

Dec 23, 2020

0.6.20

Nov 23, 2020

0.6.19

Nov 16, 2020

0.6.18

Oct 23, 2020

0.6.17

Oct 23, 2020

0.6.16

Oct 21, 2020

0.6.15

Oct 15, 2020

0.6.14

Oct 13, 2020

0.6.13

Oct 8, 2020

0.6.12

Oct 8, 2020

0.6.11

Aug 20, 2020

0.6.10

Aug 17, 2020

0.6.9

Aug 14, 2020

0.6.8

Aug 13, 2020

0.6.7

Aug 13, 2020

0.6.6

Aug 13, 2020

0.6.5

Aug 13, 2020

0.6.4

Aug 13, 2020

0.6.3

Aug 13, 2020

0.6.2

Aug 12, 2020

0.6.1

Aug 12, 2020

0.6.0

Aug 11, 2020

0.5.3

Jun 2, 2020

0.5.1

Apr 7, 2020

0.5.0

Apr 7, 2020

0.4.20

Mar 28, 2020

0.4.19

Mar 25, 2020

0.4.18

Mar 24, 2020

0.4.17

Mar 23, 2020

0.4.16

Mar 22, 2020

0.4.15

Mar 21, 2020

0.4.14

Mar 21, 2020

0.4.13

Mar 20, 2020

0.4.12

Mar 18, 2020

0.4.11

Mar 16, 2020

0.4.10

Mar 13, 2020

0.4.9

Mar 13, 2020

0.4.8

Mar 12, 2020

0.4.7

Mar 11, 2020

0.4.6

Feb 27, 2020

0.4.5

Feb 19, 2020

0.4.4

Feb 19, 2020

0.4.3

Feb 19, 2020

0.4.2

Feb 19, 2020

0.4.1

Jan 28, 2020

0.3.2

Dec 18, 2019

0.3.1

Dec 11, 2019

0.3.0

Dec 9, 2019

0.2.4

Nov 5, 2019

Download files

Download the file for your platform.

Source Distribution

openpecha-2.5.0.tar.gz (22.1 kB)

Uploaded Oct 29, 2025 Source

Built Distribution

openpecha-2.5.0-py3-none-any.whl (21.4 kB)

Uploaded Oct 29, 2025 Python 3

File details

Details for the file openpecha-2.5.0.tar.gz.

File metadata

Download URL: openpecha-2.5.0.tar.gz
Upload date: Oct 29, 2025
Size: 22.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.18

File hashes

Hashes for openpecha-2.5.0.tar.gz
Algorithm	Hash digest
SHA256	`e13a2f2a4ce26c0f7378331ac0695b0d08c35f19af92f2b2f2c9bec4413494dd`
MD5	`80f859bfdec1aa8b52cd0588916c734b`
BLAKE2b-256	`e10ca7bef9dee4cb50f03fd2a55261764274755038abe9ba7bcd8afb702db922`

File details

Details for the file openpecha-2.5.0-py3-none-any.whl.

File metadata

Download URL: openpecha-2.5.0-py3-none-any.whl
Upload date: Oct 29, 2025
Size: 21.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.18

File hashes

Hashes for openpecha-2.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`197518b853ffa6a8c97df10b07de4b79dc36c06d542e1f46df793ec46ced4089`
MD5	`1c376c036a614b87ab1fa3e624487b81`
BLAKE2b-256	`0a24bf345efb78df0d0c1908d3d942c2080452bbfda9d4d72fe1051bdcf70e1c`
