A Python library for reading data in CITE EXchange (CEX) format.

cite_exchange

A Python library for parsing and working with CITE EXchange (CEX) format data.

Overview

cite_exchange provides tools for reading and processing data in CITE EXchange format, a line-oriented text format for managing structured data about scholarly resources. CEX files are organized into labeled blocks, where each block contains tabular data relevant to a specific resource type.

Installation

pip install cite_exchange

Requirements

  • Python 3.13+

Quick Start

from cite_exchange.blocks import CexBlock

# Parse from a string (CEX files may contain non-ASCII text such as Greek, so read them as UTF-8)
with open('data.cex', 'r', encoding='utf-8') as f:
    content = f.read()
all_blocks = CexBlock.from_text(content)

# Parse directly from a file
all_blocks = CexBlock.from_file('data.cex')

# Parse directly from a URL
all_blocks = CexBlock.from_url('https://example.com/data.cex')

# Filter by label
ctsdata_blocks = CexBlock.from_file('data.cex', label='ctsdata')

# Access block data
for block in ctsdata_blocks:
    print(f"Label: {block.label}")
    print(f"Data lines: {len(block.data)}")
    for line in block.data:
        print(f"  {line}")

marimo + WASM

The package is pure Python, so it can be used in marimo notebooks exported to HTML/WASM.

In a marimo notebook, install the package from a wheel URL (or from a local wheel served over HTTP):

import micropip
await micropip.install("https://<your-host>/cite_exchange-<version>-py3-none-any.whl")

from cite_exchange.blocks import CexBlock

In browser environments, prefer CexBlock.from_text(...) with content that has already been loaded.

An example marimo notebook script is included at examples/marimo_wasm_notebook.py.

Run it locally with:

marimo edit examples/marimo_wasm_notebook.py

Then export it with the HTML/WASM export command provided by your installed version of marimo (the exact command varies between versions).

API Reference

CexBlock

A dataclass representing a labeled block of text data from a CEX source.

Attributes

  • label (str): The label identifier for this block (without the #! prefix)
  • data (list[str]): List of data lines in this block, excluding empty lines and comments

Methods

CexBlock.from_text(src: str, label: str = None) -> list[CexBlock]

Parse CEX-formatted text and create CexBlock instances.

Parameters:

  • src (str): The CEX-formatted text to parse
  • label (str, optional): If specified, only return blocks matching this label

Returns: A list of CexBlock instances

Parsing Rules:

  • Label lines begin with #! and define the start of a new block
  • Data lines are non-empty and don't start with // (comments are ignored)
  • Multiple blocks can have the same label type
  • Empty lines and comment lines are excluded from block data

Example:

# Parse all blocks
blocks = CexBlock.from_text(cex_content)

# Parse only specific label type
catalog_blocks = CexBlock.from_text(cex_content, label='ctscatalog')

Utility Functions

labels(s: str) -> list[str]

Extract all unique labels from a CEX-formatted string.

Returns: Sorted list of unique label names

from cite_exchange.blocks import labels

with open('data.cex', 'r', encoding='utf-8') as f:
    content = f.read()

label_list = labels(content)
print(label_list)  # ['citecollections', 'citedata', 'citeproperties', ...]
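Assuming `labels` simply collects the text following `#!` on every label line, its behavior can be approximated as below. `unique_labels` is a hypothetical name used only for this illustration, not part of cite_exchange.

```python
def unique_labels(src: str) -> list[str]:
    """Collect the label of every '#!' line in src, de-duplicated and sorted."""
    found: set[str] = set()
    for raw in src.splitlines():
        line = raw.strip()
        if line.startswith("#!"):
            # Drop the "#!" prefix; a set removes repeated labels.
            found.add(line[2:])
    return sorted(found)


text = """#!citedata
a|b
#!citecollections
c|d
#!citedata
e|f
"""
print(unique_labels(text))  # ['citecollections', 'citedata']
```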

valid_label(label: str) -> bool

Check whether a label is valid according to the CEX format specification.

Valid labels:

  • cexversion
  • citelibrary
  • ctsdata
  • ctscatalog
  • citecollections
  • citeproperties
  • citedata
  • imagedata
  • datamodels
  • citerelationset
  • relationsetcatalog

from cite_exchange.blocks import valid_label

print(valid_label('ctsdata'))    # True
print(valid_label('invalid'))    # False
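Checking a label amounts to a set-membership test. The sketch below uses the hypothetical names `VALID_LABELS`, `is_valid_label`, and `unrecognized_labels`; it copies the label list above and assumes matching is exact and case-sensitive, which may differ from the library's own rules. It also shows one way to flag unexpected labels, for example in a list returned by `labels(...)`.

```python
# Copied from the list of valid labels above; assumes exact, case-sensitive matching.
VALID_LABELS = frozenset({
    "cexversion", "citelibrary", "ctsdata", "ctscatalog",
    "citecollections", "citeproperties", "citedata", "imagedata",
    "datamodels", "citerelationset", "relationsetcatalog",
})


def is_valid_label(label: str) -> bool:
    """Return True if label is one of the recognized CEX block labels."""
    return label in VALID_LABELS


def unrecognized_labels(found: list[str]) -> list[str]:
    """Return the labels in found that are not recognized, preserving order."""
    return [name for name in found if not is_valid_label(name)]


print(is_valid_label("ctsdata"))  # True
print(unrecognized_labels(["ctsdata", "mystery", "citedata"]))  # ['mystery']
```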

CEX Format Overview

CEX (CITE EXchange) is a line-oriented format for exchanging data about scholarly resources. Key features:

  • Line-oriented structure: Data organized into lines and blocks
  • Labeled blocks: Each block starts with a #!label line
  • Tabular data: Blocks contain pipe-delimited (|) or other delimited data
  • Comments: Lines starting with // are comments and ignored
  • Empty lines: Empty lines are ignored

Example CEX Content

#!ctscatalog
urn|citationScheme|groupName|workTitle
urn:cts:greekLit:tlg0012.tlg001:|book|Homer|Iliad

#!ctsdata
urn:cts:greekLit:tlg0012.tlg001:1.1|Μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος
urn:cts:greekLit:tlg0012.tlg001:1.2|οὐλομένην, ἣ μυρί’ Ἀχαιοῖς ἄλγε’ ἔθηκε,

Testing

The package includes a unit test suite, which can be run with:

python -m pytest test/test_blocks.py

Test data files are included in test/data/:

  • burneysample.cex: Sample CEX data from the Homer Multitext project
  • laxlibrary1.cex: Sample CITE collection data

License

See the LICENSE file for details.

Download files

Source Distribution

cite_exchange-0.3.0.tar.gz (26.7 kB)

Uploaded Source

Built Distribution

cite_exchange-0.3.0-py3-none-any.whl (17.3 kB)

Uploaded Python 3

File details

Details for the file cite_exchange-0.3.0.tar.gz.

File metadata

  • Download URL: cite_exchange-0.3.0.tar.gz
  • Size: 26.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for cite_exchange-0.3.0.tar.gz
Algorithm Hash digest
SHA256 7252dcc4eb80aa2d49b473c53d75bd08b345d4bb5a7cfdfdfff0e8ff16a2a51d
MD5 ee43c603538efef715c635889ccdf1ed
BLAKE2b-256 dbfadc91068e949335ebb4b3f03debf687d39b017834f02134384b2bcf6d2ac9

Provenance

The following attestation bundles were made for cite_exchange-0.3.0.tar.gz:

Publisher: publish.yml on neelsmith/cite_exchange

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cite_exchange-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: cite_exchange-0.3.0-py3-none-any.whl
  • Size: 17.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for cite_exchange-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2ee90234b43a977984cf58b4c50d8d8d179553cca809a2ed827e1c5ea2aab269
MD5 779eef405dabe2a38e90f70e135dddb2
BLAKE2b-256 65e7d29146d6ed44b633401f2b5b789d9afeca17680c01865e2e0d956adf05c5

Provenance

The following attestation bundles were made for cite_exchange-0.3.0-py3-none-any.whl:

Publisher: publish.yml on neelsmith/cite_exchange

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
