A Python library for working with a corpus of texts canonically citable by CtsUrn.

citable_corpus

A Python library for working with a corpus of texts canonically citable by CTS URN references.

Overview

citable_corpus lets you work with texts that are citable by CTS URNs (Canonical Text Services URNs).

Features

  • Multiple input formats: Create corpora from delimited strings, or from files or URLs with data in CEX format.
  • Retrieval based on URN logic: Querying passages by URN recognizes work and passage hierarchies, as well as passage ranges.
  • CEX support: Native support for the CEX (CITE Exchange) format.
  • Type-safe: Built with Pydantic for robust data validation.

Installation

pip install citable_corpus

Quick Start

Creating a Corpus

From a delimited string

from citable_corpus import CitableCorpus

text = """urn:cts:latinLit:phi0959.phi006:1.1|Lorem ipsum
urn:cts:latinLit:phi0959.phi006:1.2|Dolor sit amet."""

corpus = CitableCorpus.from_string(text)
print(f"Loaded {len(corpus.passages)} passages")
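To make the expected input shape concrete, here is a minimal standard-library sketch that splits the same kind of delimited text into (URN, text) pairs. It is an illustration only and does not reproduce the library's own parser; the helper name `split_passages` is hypothetical.

```python
def split_passages(src: str, delimiter: str = "|") -> list[tuple[str, str]]:
    """Split delimited text into (urn, text) pairs, one pair per non-empty line.

    Hypothetical helper for illustration; not part of citable_corpus.
    A non-empty line without the delimiter raises ValueError.
    """
    pairs = []
    for line in src.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first delimiter only, so the text may contain it too.
        urn, text = line.split(delimiter, 1)
        pairs.append((urn, text))
    return pairs


text = """urn:cts:latinLit:phi0959.phi006:1.1|Lorem ipsum
urn:cts:latinLit:phi0959.phi006:1.2|Dolor sit amet."""

pairs = split_passages(text)
for urn, content in pairs:
    print(urn, "->", content)
```

Each line carries exactly one passage: the URN comes first, then the delimiter, then the text.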

From a CEX file

corpus = CitableCorpus.from_cex_file("path/to/file.cex")

From a URL

url = "https://example.com/corpus.cex"
corpus = CitableCorpus.from_cex_url(url)

Working with Passages

Each passage in a corpus is a CitablePassage object with a URN and text:

passage = corpus.passages[0]
print(passage.urn)   # The CtsUrn object
print(passage.text)  # The text content
print(str(passage))  # "urn:...: text"

Retrieving Passages

Retrieve a single passage by exact URN

from urn_citation import CtsUrn

ref = CtsUrn.from_string("urn:cts:latinLit:stoa1263.stoa001.hc:pr.1")
results = corpus.retrieve(ref)

Retrieve all passages from a work section

# Get all passages from the preface (pr.) section
ref = CtsUrn.from_string("urn:cts:latinLit:stoa1263.stoa001.hc:pr")
results = corpus.retrieve(ref)

Retrieve all passages from a work

# Get all passages from the work (note the trailing colon)
ref = CtsUrn.from_string("urn:cts:latinLit:stoa1263.stoa001.hc:")
results = corpus.retrieve(ref)
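The hierarchy-aware matching used in the three cases above can be pictured with a small standard-library sketch. It assumes that a reference matches a passage when the namespaces are equal and the reference's dot-separated work and passage components are leading parts of the passage's components. The function names `split_cts` and `contains` are hypothetical, and the library's actual matching rules may differ in detail.

```python
def split_cts(urn: str) -> tuple[str, list[str], list[str]]:
    """Return (namespace, work parts, passage parts) for a CTS URN string."""
    parts = urn.split(":")
    if len(parts) != 5 or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn}")
    namespace, work, passage = parts[2], parts[3], parts[4]
    # An empty passage component (trailing colon) yields an empty list.
    return namespace, work.split("."), passage.split(".") if passage else []


def contains(ref: str, candidate: str) -> bool:
    """True if candidate is identical to ref or falls under it in the hierarchy."""
    ref_ns, ref_work, ref_psg = split_cts(ref)
    ns, work, psg = split_cts(candidate)
    return (
        ref_ns == ns
        and work[: len(ref_work)] == ref_work  # same work, or a version of it
        and psg[: len(ref_psg)] == ref_psg     # same passage, or a child of it
    )


base = "urn:cts:latinLit:stoa1263.stoa001.hc:"
print(contains(base + "pr.1", base + "pr.1"))  # exact match
print(contains(base + "pr", base + "pr.1"))    # section contains passage
print(contains(base, base + "pr.1"))           # whole work contains passage
print(contains(base + "pr.1", base + "pr.10")) # different passage
```

Comparing whole dot-separated components, rather than raw string prefixes, keeps `pr.1` from accidentally matching `pr.10`.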

Retrieve a range of passages

# Get passages from pr.1 through pr.5
ref = CtsUrn.from_string("urn:cts:latinLit:stoa1263.stoa001.hc:pr.1-pr.5")
results = corpus.retrieve_range(ref)

# Or use retrieve(), which automatically detects ranges
results = corpus.retrieve(ref)
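One way to think about range retrieval is as a slice over passages in document order: find where the start of the range first appears, find where the end last appears, and take everything in between. The sketch below works on bare passage references rather than full URNs, uses only the standard library, and is an assumption about the general approach, not the library's implementation. The names `in_section` and `select_range` are hypothetical.

```python
def in_section(ref: str, psg: str) -> bool:
    """True if passage reference psg equals ref or falls under it."""
    return psg == ref or psg.startswith(ref + ".")


def select_range(passage_refs: list[str], range_ref: str) -> list[str]:
    """Return the references from the range start through its end, inclusive.

    Assumes passage_refs is already in document order.
    """
    start, end = range_ref.split("-", 1)
    starts = [i for i, p in enumerate(passage_refs) if in_section(start, p)]
    ends = [i for i, p in enumerate(passage_refs) if in_section(end, p)]
    if not starts or not ends:
        return []  # one end of the range is not in the corpus
    return passage_refs[starts[0] : ends[-1] + 1]


refs = ["pr.1", "pr.2", "pr.3", "pr.4", "pr.5", "pr.6", "1.1"]
selected = select_range(refs, "pr.1-pr.5")
print(selected)
```

Relying on position rather than sorting matters because labels such as `pr` have no numeric order; only the sequence of passages in the corpus says what lies between two references.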

API Reference

CitableCorpus

The main class for working with a corpus of citable texts.

Class Methods:

  • from_string(s: str, delimiter: str = "|") - Create from delimited text
  • from_cex_file(f: str, delimiter: str = "|") - Create from a CEX file
  • from_cex_url(url: str, delimiter: str = "|") - Create from a URL

Instance Methods:

  • retrieve(ref: CtsUrn) - Retrieve passages matching a URN reference
  • retrieve_range(ref: CtsUrn) - Retrieve passages in a URN range
  • len() - Get the number of passages in the corpus

Attributes:

  • passages: List[CitablePassage] - The list of passages in the corpus

CitablePassage

Represents a single citable passage of text.

Class Methods:

  • from_string(src: str, delimiter: str = "|") - Create from a delimited string

Attributes:

  • urn: CtsUrn - The CTS URN identifying this passage
  • text: str - The text content of the passage

Examples

Filtering and Processing

# Find all passages containing a specific word
matches = [p for p in corpus.passages if "Zeus" in p.text]

# Get URNs of all passages
urns = [p.urn for p in corpus.passages]

# Count passages by work
from collections import Counter
works = Counter(p.urn.work for p in corpus.passages)

Working with CEX Data

The library supports the CEX (CITE Exchange) format, commonly used in digital classics:

from citable_corpus import CitableCorpus
from urn_citation import CtsUrn

# Load a CEX file with Hyginus fables
corpus = CitableCorpus.from_cex_file("hyginus.cex")

# Retrieve text content of a specific passage
ref = CtsUrn.from_string("urn:cts:latinLit:stoa1263.stoa001.hc:1pr.1")
psg = corpus.retrieve(ref)[0]
print(psg.text)
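For readers who have not seen CEX before, the following standard-library sketch shows roughly what the text portion of a CEX source looks like and how its lines could be pulled out by hand. It assumes the usual CEX conventions: block headers that begin with `#!`, comment lines that begin with `//`, and a `ctsdata` block holding one `URN|text` pair per line. The helper `cex_block_lines` is hypothetical and the sample uses placeholder text; in practice the library handles this parsing for you.

```python
def cex_block_lines(cex: str, label: str) -> list[str]:
    """Collect the non-empty, non-comment lines of every block with this label."""
    lines = []
    current = None  # label of the block currently being read
    for raw in cex.splitlines():
        line = raw.strip()
        if line.startswith("#!"):
            current = line[2:].strip()  # a header line starts a new block
        elif line and not line.startswith("//") and current == label:
            lines.append(line)
    return lines


# Placeholder content for illustration, not real corpus data.
sample = """#!cexversion
3.0

#!ctsdata
// URN and text separated by a pipe
urn:cts:latinLit:phi0959.phi006:1.1|Lorem ipsum
urn:cts:latinLit:phi0959.phi006:1.2|Dolor sit amet.
"""

data_lines = cex_block_lines(sample, "ctsdata")
for line in data_lines:
    print(line)
```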

Requirements

  • Python >= 3.14
  • pydantic
  • urn-citation >= 0.4.1
  • cite-exchange

Development

Running Tests

python -m unittest discover tests

or with uv from the project root:

uv run pytest

License

See the LICENSE file for details.
