A Python library for working with a corpus of texts canonically citable by CtsUrn.

citable_corpus

A Python library for working with a corpus of texts canonically citable by CTS URN references.

Overview

citable_corpus lets you work with texts citable by CTS URNs (Canonical Text Services URNs).

Features

  • Multiple input formats: Create corpora from delimited strings, or from files or URLs with data in CEX format.
  • Retrieval based on URN logic: Querying passages by URN recognizes work and passage hierarchies, as well as passage ranges.
  • CEX support: Native support for the CEX (CITE Exchange) format.
  • Type-safe: Built with Pydantic for robust data validation.

Installation

pip install citable_corpus

Quick Start

Creating a Corpus

From a delimited string

from citable_corpus import CitableCorpus

text = """urn:cts:latinLit:phi0959.phi006:1.1|Lorem ipsum
urn:cts:latinLit:phi0959.phi006:1.2|Dolor sit amet."""

corpus = CitableCorpus.from_string(text)
print(f"Loaded {len(corpus.passages)} passages")
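The delimited format above pairs one URN with one text per line. As a rough illustration of that layout only, here is a standard-library sketch that splits such a string into (URN, text) pairs. The helper name `split_delimited` is hypothetical and is not part of citable_corpus; the library's own parsing may differ in its details.

```python
def split_delimited(text: str, delimiter: str = "|") -> list[tuple[str, str]]:
    """Split each non-empty line into a (urn, text) pair at the first delimiter.

    Hypothetical helper for illustration; not the library's implementation.
    """
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        urn, sep, passage_text = line.partition(delimiter)
        if not sep:
            raise ValueError(f"No {delimiter!r} delimiter in line: {line!r}")
        pairs.append((urn, passage_text))
    return pairs


sample = """urn:cts:latinLit:phi0959.phi006:1.1|Lorem ipsum
urn:cts:latinLit:phi0959.phi006:1.2|Dolor sit amet."""

pairs = split_delimited(sample)
for urn, passage_text in pairs:
    print(urn, "->", passage_text)
```

Splitting at the first delimiter only means a delimiter character inside the passage text stays in the text; whether the library behaves the same way is not stated in this README.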

From a CEX file

corpus = CitableCorpus.from_cex_file("path/to/file.cex")

From a URL

url = "https://example.com/corpus.cex"
corpus = CitableCorpus.from_cex_url(url)

Working with Passages

Each passage in a corpus is a CitablePassage object with a URN and text:

passage = corpus.passages[0]
print(passage.urn)   # The CtsUrn object
print(passage.text)  # The text content
print(str(passage))  # "urn:...: text"

Retrieving Passages

Retrieve a single passage by exact URN

from urn_citation import CtsUrn

ref = CtsUrn.from_string("urn:cts:latinLit:stoa1263.stoa001.hc:pr.1")
results = corpus.retrieve(ref)

Retrieve all passages from a work section

# Get all passages from the preface (pr.) section
ref = CtsUrn.from_string("urn:cts:latinLit:stoa1263.stoa001.hc:pr")
results = corpus.retrieve(ref)

Retrieve all passages from a work

# Get all passages from the work (note the trailing colon)
ref = CtsUrn.from_string("urn:cts:latinLit:stoa1263.stoa001.hc:")
results = corpus.retrieve(ref)
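The three retrieval examples above rely on hierarchical matching: a reference to `pr` selects `pr.1`, `pr.2`, and so on, and an empty passage component selects the whole work. The sketch below shows one plausible way to express that containment test on plain URN strings. The functions `parse_cts` and `urn_contains` are hypothetical names, and this is not necessarily how `retrieve()` is implemented.

```python
def parse_cts(urn: str) -> tuple[str, list[str], list[str]]:
    """Return (namespace, work parts, passage parts) for a CTS URN string."""
    fields = urn.split(":")
    if len(fields) != 5 or fields[0] != "urn" or fields[1] != "cts":
        raise ValueError(f"Not a CTS URN: {urn!r}")
    namespace, work, passage = fields[2], fields[3], fields[4]
    # An empty passage component (trailing colon) means "the whole work".
    return namespace, work.split("."), passage.split(".") if passage else []


def urn_contains(query: str, candidate: str) -> bool:
    """True if candidate equals query or sits beneath it in either hierarchy."""
    q_ns, q_work, q_psg = parse_cts(query)
    c_ns, c_work, c_psg = parse_cts(candidate)
    return (
        q_ns == c_ns
        and c_work[: len(q_work)] == q_work  # work hierarchy prefix
        and c_psg[: len(q_psg)] == q_psg     # passage hierarchy prefix
    )


base = "urn:cts:latinLit:stoa1263.stoa001.hc:"
urns = [base + "pr.1", base + "pr.2", base + "1.1"]

preface = [u for u in urns if urn_contains(base + "pr", u)]
whole_work = [u for u in urns if urn_contains(base, u)]
print(len(preface), len(whole_work))
```

Comparing dot-separated parts rather than raw string prefixes keeps a reference to `pr` from accidentally matching a passage such as `pr2.1`.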

Retrieve a range of passages

# Get passages from pr.1 through pr.5
ref = CtsUrn.from_string("urn:cts:latinLit:stoa1263.stoa001.hc:pr.1-pr.5")
results = corpus.retrieve_range(ref)

# Or use retrieve(), which automatically detects ranges
results = corpus.retrieve(ref)
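A range reference such as `pr.1-pr.5` names a start and an end passage. One common way to resolve it, sketched here under the assumption that passages are stored in document order, is to find the first passage matching the start, the last passage matching the end, and take everything between them. The helpers `ref_matches` and `select_range` are hypothetical and operate on bare passage references, not on the library's objects.

```python
def ref_matches(ref: str, node: str) -> bool:
    """True if node equals ref or is nested beneath it (dot-separated levels)."""
    ref_parts, node_parts = ref.split("."), node.split(".")
    return node_parts[: len(ref_parts)] == ref_parts


def select_range(passage_refs: list[str], range_ref: str) -> list[str]:
    """Select the slice of passage_refs spanned by a 'start-end' reference."""
    start, sep, end = range_ref.partition("-")
    if not sep:
        raise ValueError(f"Not a range reference: {range_ref!r}")
    starts = [i for i, r in enumerate(passage_refs) if ref_matches(start, r)]
    ends = [i for i, r in enumerate(passage_refs) if ref_matches(end, r)]
    if not starts or not ends:
        return []  # one end of the range is not in the corpus
    # Document order decides membership, not any sorting of the references.
    return passage_refs[starts[0] : ends[-1] + 1]


refs = ["pr.1", "pr.2", "pr.3", "pr.4", "pr.5", "pr.6", "1.1"]
selected = select_range(refs, "pr.2-pr.4")
print(selected)
```

Because the slice follows document order, a range can cross section boundaries: with the list above, `pr.5-1` would span the end of the preface and the first numbered section.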

API Reference

CitableCorpus

The main class for working with a corpus of citable texts.

Class Methods:

  • from_string(s: str, delimiter: str = "|") - Create from delimited text
  • from_cex_file(f: str, delimiter: str = "|") - Create from a CEX file
  • from_cex_url(url: str, delimiter: str = "|") - Create from a URL

Instance Methods:

  • retrieve(ref: CtsUrn) - Retrieve passages matching a URN reference
  • retrieve_range(ref: CtsUrn) - Retrieve passages in a URN range
  • len() - Get the number of passages in the corpus

Attributes:

  • passages: List[CitablePassage] - The list of passages in the corpus

CitablePassage

Represents a single citable passage of text.

Class Methods:

  • from_string(src: str, delimiter: str = "|") - Create from a delimited string

Attributes:

  • urn: CtsUrn - The CTS URN identifying this passage
  • text: str - The text content of the passage

Examples

Filtering and Processing

# Find all passages containing a specific word
matches = [p for p in corpus.passages if "Zeus" in p.text]

# Get URNs of all passages
urns = [p.urn for p in corpus.passages]

# Count passages by work
from collections import Counter
works = Counter(p.urn.work for p in corpus.passages)
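If you only have URN strings at hand, a similar tally can be made with the standard library alone. This sketch groups by the full work component (the fourth colon-delimited field), which may not be the same value that `p.urn.work` yields; `work_component` is a hypothetical helper.

```python
from collections import Counter

urn_strings = [
    "urn:cts:latinLit:phi0959.phi006:1.1",
    "urn:cts:latinLit:phi0959.phi006:1.2",
    "urn:cts:latinLit:stoa1263.stoa001.hc:pr.1",
]


def work_component(urn: str) -> str:
    """Return the work component, the fourth colon-delimited field of a CTS URN."""
    return urn.split(":")[3]


counts = Counter(work_component(u) for u in urn_strings)
for work, n in counts.items():
    print(work, n)
```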

Working with CEX Data

The library supports the CEX (CITE Exchange) format, commonly used in digital classics:

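from citable_corpus import CitableCorpus
from urn_citation import CtsUrn

# Load a CEX file with Hyginus fables
corpus = CitableCorpus.from_cex_file("hyginus.cex")

# Retrieve text content of a specific passage
ref = CtsUrn.from_string("urn:cts:latinLit:stoa1263.stoa001.hc:1pr.1")
results = corpus.retrieve(ref)
if results:  # the URN may match nothing in the corpus
    psg = results[0]
    print(psg.text)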
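For readers unfamiliar with CEX, a file is organized into labeled blocks, each introduced by a `#!` line, with text passages in the `ctsdata` block. The sketch below pulls the data lines out of that block using only the standard library. It assumes `//` marks a comment line and uses the `|` delimiter shown in this README; the sample content is invented, and `ctsdata_lines` is a hypothetical helper rather than the library's parser.

```python
def ctsdata_lines(cex: str) -> list[str]:
    """Collect the non-empty, non-comment lines of every #!ctsdata block."""
    lines = []
    in_block = False
    for raw in cex.splitlines():
        line = raw.strip()
        if line.startswith("#!"):
            # A new block label starts; track whether it is a ctsdata block.
            in_block = line == "#!ctsdata"
            continue
        if in_block and line and not line.startswith("//"):
            lines.append(line)
    return lines


# Invented sample content for illustration only.
sample_cex = """#!cexversion
3.0

#!ctsdata
// Placeholder passages, not real corpus text
urn:cts:latinLit:stoa1263.stoa001.hc:pr.1|First placeholder passage.
urn:cts:latinLit:stoa1263.stoa001.hc:pr.2|Second placeholder passage.
"""

data = ctsdata_lines(sample_cex)
print(len(data))
```

Each collected line is then an ordinary delimited URN and text pair of the kind accepted by `CitableCorpus.from_string`.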

Requirements

  • Python >= 3.13.7
  • pydantic
  • urn-citation >= 0.7.3
  • cite-exchange

Development

Running Tests

python -m unittest discover tests

Or, with uv, from the project root:

uv run pytest

License

See the LICENSE file for details.
