citable_corpus

A Python library for working with a corpus of texts canonically citable by CTS URN references.

Overview

citable_corpus lets you work with texts citable by CTS URNs (Canonical Text Services URNs).
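
To make the structure of a CTS URN concrete, here is a minimal sketch that splits one into its colon-delimited parts using plain string handling. The function name `split_cts_urn` is hypothetical and is not part of citable_corpus or urn_citation; in real code you would use the `CtsUrn` class instead.

```python
# Illustrative only: plain string handling, not the CtsUrn class from urn_citation.
def split_cts_urn(urn: str) -> dict:
    """Split a CTS URN string into namespace, work hierarchy, and passage."""
    parts = urn.split(":")
    if len(parts) != 5 or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"Not a CTS URN: {urn!r}")
    namespace, work, passage = parts[2], parts[3], parts[4]
    return {
        "namespace": namespace,
        "work": work.split("."),     # text group, work, version, ...
        "passage": passage or None,  # None when the URN ends with a colon
    }


parts = split_cts_urn("urn:cts:latinLit:stoa1263.stoa001.hc:pr.1")
print(parts["namespace"])  # latinLit
print(parts["work"])       # ['stoa1263', 'stoa001', 'hc']
print(parts["passage"])    # pr.1
```

Both the work component and the passage component are dot-separated hierarchies, which is what makes retrieval by section or by whole work possible.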

Features

  • Multiple input formats: Create corpora from delimited strings, or from files or URLs with data in CEX format.
  • Retrieval based on URN logic: Queries by URN respect work and passage hierarchies, as well as passage ranges.
  • CEX support: Native support for the CEX (CITE Exchange) format.
  • Type-safe: Built with Pydantic for robust data validation.

Installation

pip install citable_corpus

Quick Start

Creating a Corpus

From a delimited string

from citable_corpus import CitableCorpus

text = """urn:cts:latinLit:phi0959.phi006:1.1|Lorem ipsum
urn:cts:latinLit:phi0959.phi006:1.2|Dolor sit amet."""

corpus = CitableCorpus.from_string(text)
print(f"Loaded {len(corpus.passages)} passages")
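
The delimited format pairs one URN with one text per line. As a rough illustration of that layout (not the library's actual parser), the sketch below splits each line on the first delimiter. The function `parse_delimited` is a hypothetical stand-in written with the standard library only.

```python
# A sketch of the "URN|text" line layout; CitableCorpus.from_string may differ in detail.
def parse_delimited(src: str, delimiter: str = "|") -> list[tuple[str, str]]:
    """Return (urn, text) pairs, one per non-empty line."""
    pairs = []
    for line in src.splitlines():
        if not line.strip():
            continue  # skip blank lines
        urn, sep, text = line.partition(delimiter)
        if not sep:
            raise ValueError(f"Missing delimiter {delimiter!r} in line: {line!r}")
        pairs.append((urn.strip(), text))
    return pairs


text = """urn:cts:latinLit:phi0959.phi006:1.1|Lorem ipsum
urn:cts:latinLit:phi0959.phi006:1.2|Dolor sit amet."""

pairs = parse_delimited(text)
print(len(pairs))   # 2
print(pairs[0][1])  # Lorem ipsum
```

Splitting on the first delimiter only means the text itself may contain the delimiter character without breaking the pair.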

From a CEX file

corpus = CitableCorpus.from_cex_file("path/to/file.cex")
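
For orientation, the sketch below shows roughly what a CEX source contains and how the text lines could be picked out by hand. It assumes the usual CEX layout, in which a block begins with a `#!` label line such as `#!ctsdata` and lines starting with `//` are comments. The helper `cex_block_lines` is hypothetical; the library handles this through its own CEX support.

```python
# Assumption: blocks are introduced by "#!label" lines and "//" marks a comment.
def cex_block_lines(cex: str, label: str = "ctsdata") -> list[str]:
    """Collect the data lines belonging to every block with the given label."""
    lines = []
    current = None  # label of the block currently being read
    for raw in cex.splitlines():
        line = raw.strip()
        if line.startswith("#!"):
            current = line[2:].strip()
            continue
        if not line or line.startswith("//"):
            continue  # skip blanks and comments
        if current == label:
            lines.append(line)
    return lines


sample = """#!cexversion
3.0

#!ctsdata
// a comment line
urn:cts:latinLit:phi0959.phi006:1.1|Lorem ipsum
urn:cts:latinLit:phi0959.phi006:1.2|Dolor sit amet.
"""

data = cex_block_lines(sample)
print(len(data))  # 2
```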

From a URL

url = "https://example.com/corpus.cex"
corpus = CitableCorpus.from_cex_url(url)

Working with Passages

Each passage in a corpus is a CitablePassage object with a URN and text:

passage = corpus.passages[0]
print(passage.urn)   # The CtsUrn object
print(passage.text)  # The text content
print(str(passage))  # "urn:...: text"

Retrieving Passages

Retrieve a single passage by exact URN

from urn_citation import CtsUrn

ref = CtsUrn.from_string("urn:cts:latinLit:stoa1263.stoa001.hc:pr.1")
results = corpus.retrieve(ref)

Retrieve all passages from a work section

# Get all passages from the preface (pr.) section
ref = CtsUrn.from_string("urn:cts:latinLit:stoa1263.stoa001.hc:pr")
results = corpus.retrieve(ref)

Retrieve all passages from a work

# Get all passages from the work (note the trailing colon)
ref = CtsUrn.from_string("urn:cts:latinLit:stoa1263.stoa001.hc:")
results = corpus.retrieve(ref)
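
The hierarchical matching behind section-level and work-level queries can be pictured as a prefix test on whole dot-separated components. The following is a minimal sketch of that idea on plain strings, not the library's implementation; `is_dotted_prefix` and `urn_contains` are hypothetical names.

```python
# Sketch of URN containment on strings; retrieve() itself works with CtsUrn objects.
def is_dotted_prefix(prefix: str, value: str) -> bool:
    """True if prefix is empty, equals value, or is a whole-component prefix of it."""
    if prefix == "":
        return True
    return value == prefix or value.startswith(prefix + ".")


def urn_contains(query: str, candidate: str) -> bool:
    """True if the candidate URN falls within the scope of the query URN."""
    q_head, q_passage = query.rsplit(":", 1)
    c_head, c_passage = candidate.rsplit(":", 1)
    q_ns, q_work = q_head.rsplit(":", 1)
    c_ns, c_work = c_head.rsplit(":", 1)
    return (
        q_ns == c_ns
        and is_dotted_prefix(q_work, c_work)
        and is_dotted_prefix(q_passage, c_passage)
    )


base = "urn:cts:latinLit:stoa1263.stoa001.hc:"
urns = [base + "pr.1", base + "pr.2", base + "1.1"]

section = [u for u in urns if urn_contains(base + "pr", u)]
whole_work = [u for u in urns if urn_contains(base, u)]
print(len(section))     # 2
print(len(whole_work))  # 3
```

Comparing whole components matters: a query for `pr` should match `pr.1` but not an unrelated reference such as `praef.1`.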

Retrieve a range of passages

# Get passages from pr.1 through pr.5
ref = CtsUrn.from_string("urn:cts:latinLit:stoa1263.stoa001.hc:pr.1-pr.5")
results = corpus.retrieve_range(ref)

# Or use retrieve(), which automatically detects ranges
results = corpus.retrieve(ref)
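
One plausible reading of range retrieval is a slice in document order, from the first passage matching the range start through the last passage matching the range end. The sketch below illustrates that reading on plain URN strings. It is an assumption about the semantics, not a description of how `retrieve_range` is implemented, and `select_range` is a hypothetical name.

```python
# Sketch: treat a range as a document-order slice between two passage references.
def select_range(urns: list[str], start: str, end: str) -> list[str]:
    """Return URNs from the first match of start through the last match of end."""

    def matches(ref: str, passage: str) -> bool:
        # Exact match, or ref is a whole-component prefix of the passage.
        return passage == ref or passage.startswith(ref + ".")

    passages = [u.rsplit(":", 1)[1] for u in urns]
    starts = [i for i, p in enumerate(passages) if matches(start, p)]
    ends = [i for i, p in enumerate(passages) if matches(end, p)]
    if not starts or not ends or ends[-1] < starts[0]:
        return []
    return urns[starts[0] : ends[-1] + 1]


base = "urn:cts:latinLit:stoa1263.stoa001.hc:"
urns = [base + f"pr.{n}" for n in range(1, 7)]  # pr.1 through pr.6

selected = select_range(urns, "pr.2", "pr.4")
print([u.rsplit(":", 1)[1] for u in selected])  # ['pr.2', 'pr.3', 'pr.4']
```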

API Reference

CitableCorpus

The main class for working with a corpus of citable texts.

Class Methods:

  • from_string(s: str, delimiter: str = "|") - Create from delimited text
  • from_cex_file(f: str, delimiter: str = "|") - Create from a CEX file
  • from_cex_url(url: str, delimiter: str = "|") - Create from a URL

Instance Methods:

  • retrieve(ref: CtsUrn) - Retrieve passages matching a URN reference
  • retrieve_range(ref: CtsUrn) - Retrieve passages in a URN range
  • len() - Get the number of passages in the corpus

Attributes:

  • passages: List[CitablePassage] - The list of passages in the corpus

CitablePassage

Represents a single citable passage of text.

Class Methods:

  • from_string(src: str, delimiter: str = "|") - Create from a delimited string

Attributes:

  • urn: CtsUrn - The CTS URN identifying this passage
  • text: str - The text content of the passage

Examples

Filtering and Processing

from collections import Counter

# Find all passages containing a specific word
matches = [p for p in corpus.passages if "Zeus" in p.text]

# Get URNs of all passages
urns = [p.urn for p in corpus.passages]

# Count passages by work
works = Counter(p.urn.work for p in corpus.passages)

Working with CEX Data

The library supports the CEX (CITE Exchange) format, commonly used in digital classics:

from citable_corpus import CitableCorpus
from urn_citation import CtsUrn

# Load a CEX file with Hyginus fables
corpus = CitableCorpus.from_cex_file("hyginus.cex")

# Retrieve text content of a specific passage
ref = CtsUrn.from_string("urn:cts:latinLit:stoa1263.stoa001.hc:1pr.1")
psg = corpus.retrieve(ref)[0]
print(psg.text)
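
Indexing a result with `[0]` raises an `IndexError` when nothing matches. Assuming `retrieve()` returns a list that may be empty (the examples above index into it, but the empty case is not documented here), a small guard such as the hypothetical `first_or_none` helper below avoids that. Plain lists stand in for retrieval results.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def first_or_none(results: Sequence[T]) -> Optional[T]:
    """Return the first item of a result sequence, or None if it is empty."""
    return results[0] if results else None


# Plain lists stand in for the output of corpus.retrieve(ref).
print(first_or_none(["first passage", "second passage"]))  # first passage
print(first_or_none([]))                                   # None
```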

Requirements

  • Python >= 3.14
  • pydantic
  • urn-citation >= 0.4.1
  • cite-exchange

Development

Running Tests

python -m unittest discover tests

or with uv from the project root:

uv run pytest

License

See the LICENSE file for details.
