A Python library for reading data in CITE EXchange (CEX) format.

cite_exchange

A Python library for parsing and working with CITE EXchange (CEX) format data.

Overview

cite_exchange provides tools for reading and processing data in CITE EXchange format, a line-oriented text format for managing structured data about scholarly resources. CEX files are organized into labeled blocks, where each block contains tabular data relevant to a specific resource type.

Installation

pip install cite_exchange

Requirements

  • Python 3.13.7+

Quick Start

from cite_exchange.blocks import CexBlock

# Parse from a string
with open('data.cex', 'r', encoding='utf-8') as f:
    content = f.read()
all_blocks = CexBlock.from_text(content)

# Parse directly from a file
all_blocks = CexBlock.from_file('data.cex')

# Parse directly from a URL
all_blocks = CexBlock.from_url('https://example.com/data.cex')

# Filter by label
ctsdata_blocks = CexBlock.from_file('data.cex', label='ctsdata')

# Access block data
for block in ctsdata_blocks:
    print(f"Label: {block.label}")
    print(f"Data lines: {len(block.data)}")
    for line in block.data:
        print(f"  {line}")

marimo + WASM

The package is pure Python and can be used in marimo notebooks compiled to HTML/WASM.

In a marimo notebook, install from a wheel URL (or local wheel served over HTTP):

import micropip
await micropip.install("https://<your-host>/cite_exchange-0.2.0-py3-none-any.whl")

from cite_exchange import CexBlock

For browser environments, prefer CexBlock.from_text(...) with already-loaded content.

An example marimo notebook script is included at examples/marimo_wasm_notebook.py.

Run locally:

marimo edit examples/marimo_wasm_notebook.py

Then export with your installed marimo HTML/WASM export command (this can vary by version).

API Reference

CexBlock

A dataclass representing a labeled block of text data from a CEX source.

Attributes

  • label (str): The label identifier for this block (without the #! prefix)
  • data (list[str]): List of data lines in this block, excluding empty lines and comments

Methods

CexBlock.from_text(src: str, label: str = None) -> list[CexBlock]

Parse CEX-formatted text and create CexBlock instances.

Parameters:

  • src (str): The CEX-formatted text to parse
  • label (str, optional): If specified, only return blocks matching this label

Returns: A list of CexBlock instances

Parsing Rules:

  • Label lines begin with #! and define the start of a new block
  • Data lines are non-empty and don't start with // (comments are ignored)
  • Multiple blocks can have the same label type
  • Empty lines and comment lines are excluded from block data

Example:

# Parse all blocks
blocks = CexBlock.from_text(cex_content)

# Parse only specific label type
catalog_blocks = CexBlock.from_text(cex_content, label='ctscatalog')
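
The parsing rules above can be illustrated with a small standalone sketch. This is not the library's implementation: SketchBlock and parse_blocks are hypothetical names used only here, and the sketch assumes that lines are stripped of surrounding whitespace and that data lines appearing before the first label are ignored.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchBlock:
    """Hypothetical stand-in for CexBlock (not the library's class)."""
    label: str
    data: list[str] = field(default_factory=list)


def parse_blocks(src: str, label: Optional[str] = None) -> list[SketchBlock]:
    """Group lines into labeled blocks following the rules listed above."""
    blocks: list[SketchBlock] = []
    current: Optional[SketchBlock] = None
    for raw in src.splitlines():
        line = raw.strip()  # assumption: surrounding whitespace is not significant
        if line.startswith("#!"):
            # A label line starts a new block; the "#!" prefix is dropped.
            current = SketchBlock(label=line[2:].strip())
            blocks.append(current)
        elif not line or line.startswith("//"):
            # Empty lines and comments never become block data.
            continue
        elif current is not None:
            current.data.append(line)
        # assumption: data lines before the first label are silently skipped
    if label is not None:
        blocks = [b for b in blocks if b.label == label]
    return blocks


sample = """#!ctscatalog
urn|citationScheme|groupName|workTitle

// a comment line
#!ctsdata
urn:cts:greekLit:tlg0012.tlg001:1.1|first line
"""

for block in parse_blocks(sample):
    print(block.label, len(block.data))
```

Because every label line opens a new block, two blocks with the same label stay separate in the result rather than being merged.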

Utility Functions

labels(s: str) -> list[str]

Extract all unique labels from a CEX-formatted string.

Returns: Sorted list of unique label names

from cite_exchange.blocks import labels

with open('data.cex', 'r', encoding='utf-8') as f:
    content = f.read()

label_list = labels(content)
print(label_list)  # ['citecollections', 'citedata', 'citeproperties', ...]
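
As a rough illustration of what "sorted list of unique label names" means, the following sketch collects labels with a set and sorts them. The function name sketch_labels is hypothetical, and the actual labels() function may differ in details such as whitespace handling.

```python
def sketch_labels(s: str) -> list[str]:
    """Collect the distinct #! labels in a CEX string, sorted alphabetically."""
    found = set()
    for raw in s.splitlines():
        line = raw.strip()  # assumption: leading/trailing whitespace is ignored
        if line.startswith("#!"):
            found.add(line[2:].strip())
    return sorted(found)


text = "#!ctsdata\na|b\n#!citedata\nc|d\n#!ctsdata\ne|f\n"
print(sketch_labels(text))  # ['citedata', 'ctsdata']
```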

valid_label(label: str) -> bool

Check if a label is valid according to CEX format specification.

Valid labels:

  • cexversion
  • citelibrary
  • ctsdata
  • ctscatalog
  • citecollections
  • citeproperties
  • citedata
  • imagedata
  • datamodels
  • citerelationset
  • relationsetcatalog

from cite_exchange.blocks import valid_label

print(valid_label('ctsdata'))    # True
print(valid_label('invalid'))    # False
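
A membership test against a fixed set is one straightforward way to express this check. The sketch below uses the labels listed above; the names VALID_LABELS and sketch_valid_label are hypothetical, and the comparison is assumed to be case-sensitive.

```python
# Labels copied from the list above; the constant name is hypothetical.
VALID_LABELS = frozenset({
    "cexversion",
    "citelibrary",
    "ctsdata",
    "ctscatalog",
    "citecollections",
    "citeproperties",
    "citedata",
    "imagedata",
    "datamodels",
    "citerelationset",
    "relationsetcatalog",
})


def sketch_valid_label(label: str) -> bool:
    """Return True if label is one of the known CEX block labels."""
    # assumption: matching is exact and case-sensitive
    return label in VALID_LABELS


print(sketch_valid_label("ctsdata"))   # True
print(sketch_valid_label("invalid"))   # False
```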

CEX Format Overview

CEX (CITE EXchange) is a line-oriented format for exchanging data about scholarly resources. Key features:

  • Line-oriented structure: Data organized into lines and blocks
  • Labeled blocks: Each block starts with a #!label line
  • Tabular data: Blocks contain pipe-delimited (|) or other delimited data
  • Comments: Lines starting with // are comments and ignored
  • Empty lines: Empty lines are ignored

Example CEX Content

#!ctscatalog
urn|citationScheme|groupName|workTitle
urn:cts:greekLit:tlg0012.tlg001:|book|Homer|Iliad

#!ctsdata
urn:cts:greekLit:tlg0012.tlg001:1.1|Μῆνιν ἄειδε θεά
urn:cts:greekLit:tlg0012.tlg001:1.2|Πηληϊάδεω Ἀχιλῆος
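
Since block data is returned as plain strings, splitting rows into fields is left to the caller. A minimal sketch, assuming a pipe delimiter and that the first line of the ctscatalog block is a header row (as in the example above); rows_as_dicts is a hypothetical helper, not part of cite_exchange.

```python
def rows_as_dicts(lines: list[str], delimiter: str = "|") -> list[dict[str, str]]:
    """Turn a header line plus data lines into one dict per data line."""
    if not lines:
        return []
    header = lines[0].split(delimiter)
    # zip() pairs each column name with the field in the same position.
    return [dict(zip(header, line.split(delimiter))) for line in lines[1:]]


# Lines as they would appear in the data of the ctscatalog block above.
catalog_lines = [
    "urn|citationScheme|groupName|workTitle",
    "urn:cts:greekLit:tlg0012.tlg001:|book|Homer|Iliad",
]

records = rows_as_dicts(catalog_lines)
print(records[0]["workTitle"])  # Iliad
```

The ctsdata block in the example has no header row, so its lines would need a different treatment, for instance splitting each line once on the delimiter into a URN and a text passage.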

Testing

The package includes unit tests for the block parser and utility functions. Run them with pytest:

python -m pytest test/test_blocks.py

Test data files are included in test/data/:

  • burneysample.cex: Sample CEX data from the Homer Multitext project
  • laxlibrary1.cex: Sample CITE collection data

License

See LICENSE file for details.
