
A Python library for reading data in CITE EXchange (CEX) format.


cite_exchange

A Python library for parsing and working with CITE EXchange (CEX) format data.

Overview

cite_exchange provides tools for reading and processing data in CITE EXchange format, a line-oriented text format for managing structured data about scholarly resources. CEX files are organized into labeled blocks, where each block contains tabular data relevant to a specific resource type.

Installation

pip install cite_exchange

Requirements

  • Python 3.14+
  • pydantic >= 2.12.5
  • requests (for URL-based parsing)

Quick Start

from cite_exchange.blocks import CexBlock

# Parse from a string (here, the contents of a file read into memory)
with open('data.cex', 'r', encoding='utf-8') as f:
    content = f.read()
all_blocks = CexBlock.from_text(content)

# Parse directly from a file
all_blocks = CexBlock.from_file('data.cex')

# Parse directly from a URL
all_blocks = CexBlock.from_url('https://example.com/data.cex')

# Filter by label
ctsdata_blocks = CexBlock.from_file('data.cex', label='ctsdata')

# Access block data
for block in ctsdata_blocks:
    print(f"Label: {block.label}")
    print(f"Data lines: {len(block.data)}")
    for line in block.data:
        print(f"  {line}")

API Reference

CexBlock

A Pydantic model representing a labeled block of text data from a CEX source.

Attributes

  • label (str): The label identifier for this block (without the #! prefix)
  • data (list[str]): List of data lines in this block, excluding empty lines and comments

Methods

CexBlock.from_text(src: str, label: str = None) -> list[CexBlock]

Parse CEX-formatted text and create CexBlock instances.

Parameters:

  • src (str): The CEX-formatted text to parse
  • label (str, optional): If specified, only return blocks matching this label

Returns: A list of CexBlock instances

Parsing Rules:

  • Label lines begin with #! and define the start of a new block
  • Data lines are non-empty and don't start with // (comments are ignored)
  • Several blocks may share the same label
  • Empty lines and comment lines are excluded from block data

Example:

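from cite_exchange.blocks import CexBlock

with open('data.cex', 'r', encoding='utf-8') as f:
    cex_content = f.read()

# Parse all blocks
blocks = CexBlock.from_text(cex_content)

# Parse only blocks with a specific label
catalog_blocks = CexBlock.from_text(cex_content, label='ctscatalog')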

Utility Functions

labels(s: str) -> list[str]

Extract all unique labels from a CEX-formatted string.

Returns: Sorted list of unique label names

from cite_exchange.blocks import labels

with open('data.cex', 'r', encoding='utf-8') as f:
    content = f.read()

label_list = labels(content)
print(label_list)  # ['citecollections', 'citedata', 'citeproperties', ...]
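Computing the same kind of result (sorted, de-duplicated label names) takes only the standard library. The sketch below is illustrative: `unique_labels` is a hypothetical name, and the library's `labels()` may differ in details such as whitespace handling.

```python
def unique_labels(src: str) -> list[str]:
    """Hypothetical sketch: sorted, unique block labels in a CEX string."""
    # Every label line begins with "#!"; collecting into a set drops repeats.
    found = {line[2:].strip() for line in src.splitlines() if line.startswith("#!")}
    return sorted(found)


sample = """#!ctsdata
urn:cts:greekLit:tlg0012.tlg001:1.1|...
#!ctscatalog
urn|citationScheme|groupName|workTitle
#!ctsdata
urn:cts:greekLit:tlg0012.tlg001:1.2|...
"""
print(unique_labels(sample))  # ['ctscatalog', 'ctsdata']
```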

valid_label(label: str) -> bool

Check whether a label is valid according to the CEX format specification.

Valid labels:

  • cexversion
  • citelibrary
  • ctsdata
  • ctscatalog
  • citecollections
  • citeproperties
  • citedata
  • imagedata
  • datamodels
  • citerelationset
  • relationsetcatalog

from cite_exchange.blocks import valid_label

print(valid_label('ctsdata'))    # True
print(valid_label('invalid'))    # False
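A membership test against the list of valid labels is enough to sketch this check, and the same idea can flag unexpected labels in a whole file before parsing it. This is illustrative only: `KNOWN_LABELS`, `is_known_label`, and `unknown_labels` are hypothetical names, the label set is copied from the list above rather than from the library's source, and case-sensitive matching is an assumption.

```python
# Copied from the list of valid labels in this README (assumption: the list is complete).
KNOWN_LABELS = frozenset({
    "cexversion", "citelibrary", "ctsdata", "ctscatalog",
    "citecollections", "citeproperties", "citedata", "imagedata",
    "datamodels", "citerelationset", "relationsetcatalog",
})


def is_known_label(label: str) -> bool:
    """Hypothetical stand-in for valid_label: exact, case-sensitive match."""
    return label in KNOWN_LABELS


def unknown_labels(src: str) -> list[str]:
    """Labels used in a CEX string that are not in KNOWN_LABELS, in order of appearance."""
    used = [line[2:].strip() for line in src.splitlines() if line.startswith("#!")]
    return [lab for lab in used if not is_known_label(lab)]


print(unknown_labels("#!ctsdata\nx|y\n#!ctsdta\nz\n"))  # ['ctsdta']
```

Running a check like `unknown_labels` over a file is one way to catch a mistyped label such as `#!ctsdta`, which would otherwise just produce a block nobody asks for.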

CEX Format Overview

CEX (CITE EXchange) is a line-oriented format for exchanging data about scholarly resources. Key features:

  • Line-oriented structure: Data organized into lines and blocks
  • Labeled blocks: Each block starts with a #!label line
  • Tabular data: Blocks contain delimited rows, using the pipe character (|) or another delimiter
  • Comments: Lines starting with // are comments and ignored
  • Empty lines: Empty lines are ignored
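Because block data is delimited text, splitting each row into fields is straightforward. This sketch assumes fields never contain a literal delimiter; `split_rows` is a hypothetical helper, not part of cite_exchange, and the `delimiter` parameter reflects the statement that blocks may use delimiters other than the pipe.

```python
def split_rows(lines: list[str], delimiter: str = "|") -> list[list[str]]:
    """Split each data line of a block into its fields (hypothetical helper)."""
    # str.split keeps empty fields, so "a||c" yields ["a", "", "c"].
    return [line.split(delimiter) for line in lines]


rows = split_rows(["urn|citationScheme|groupName|workTitle"])
print(rows[0])  # ['urn', 'citationScheme', 'groupName', 'workTitle']
```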

Example CEX Content

#!ctscatalog
urn|citationScheme|groupName|workTitle
urn:cts:greekLit:tlg0012.tlg001:|book|Homer|Iliad

#!ctsdata
urn:cts:greekLit:tlg0012.tlg001:1.1|Μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος
urn:cts:greekLit:tlg0012.tlg001:1.2|οὐλομένην, ἣ μυρί’ Ἀχαιοῖς ἄλγε’ ἔθηκε
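Once you have the data lines of a block (for example from `block.data`), the two blocks in this example can be turned into ordinary Python structures. This is a hypothetical sketch: it assumes, as in the example, that the first `ctscatalog` line is a header row and that each `ctsdata` line is a URN, a `|`, and the passage text. The function names are illustrative, not part of cite_exchange.

```python
def catalog_records(lines: list[str], delimiter: str = "|") -> list[dict[str, str]]:
    """Turn ctscatalog lines into dicts keyed by the header row (hypothetical helper)."""
    # Assumption: the first line is a header naming the columns, as in the example.
    header, *rows = (line.split(delimiter) for line in lines)
    return [dict(zip(header, row)) for row in rows]


def passages(lines: list[str], delimiter: str = "|") -> dict[str, str]:
    """Map each ctsdata passage URN to its text (hypothetical helper)."""
    # Split only at the first delimiter, so the text itself may contain "|".
    return dict(line.split(delimiter, 1) for line in lines)


catalog = [
    "urn|citationScheme|groupName|workTitle",
    "urn:cts:greekLit:tlg0012.tlg001:|book|Homer|Iliad",
]
text = ["urn:cts:greekLit:tlg0012.tlg001:1.1|Μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος"]

print(catalog_records(catalog)[0]["workTitle"])  # Iliad
print(passages(text)["urn:cts:greekLit:tlg0012.tlg001:1.1"])
```

Note that `zip` stops at the shorter sequence, so a catalog row with fewer fields than the header silently loses columns; stricter code could pass `strict=True` to `zip` to raise an error instead.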

Testing

The package includes unit tests, which can be run with pytest:

python -m pytest test/test_blocks.py

Test data files are included in test/data/:

  • burneysample.cex: Sample CEX data from the Homer Multitext project
  • laxlibrary1.cex: Sample CITE collection data

License

See LICENSE file for details.



Download files


Source Distribution

cite_exchange-0.1.0.tar.gz (32.3 kB)

Uploaded Source

Built Distribution


cite_exchange-0.1.0-py3-none-any.whl (16.9 kB)

Uploaded Python 3

File details

Details for the file cite_exchange-0.1.0.tar.gz.

File metadata

  • Download URL: cite_exchange-0.1.0.tar.gz
  • Upload date:
  • Size: 32.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for cite_exchange-0.1.0.tar.gz:

  • SHA256: 6d4fec99d53aa3a73e9978a82ce118186d43c830ec664bdca74acbffc0502445
  • MD5: aebf32783aead8c97c6fa5fe06667f52
  • BLAKE2b-256: 24db22498d29eef9a9af891fc816a35c7d7e1d9c53046586b8b9a5abe34e6610


Provenance

The following attestation bundles were made for cite_exchange-0.1.0.tar.gz:

Publisher: publish.yml on neelsmith/cite_exchange

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cite_exchange-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: cite_exchange-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 16.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for cite_exchange-0.1.0-py3-none-any.whl:

  • SHA256: d7f0091dfbc4daa6a112b433d3a03c58d6248e72563875fe7e6dfb53506da3a7
  • MD5: 97e03cd52bae75f0b7fbb19ab1d9e8f5
  • BLAKE2b-256: 19678c7279ff01347a111502d065ec3634a25b189ee26b117aea5d51c95a7e16


Provenance

The following attestation bundles were made for cite_exchange-0.1.0-py3-none-any.whl:

Publisher: publish.yml on neelsmith/cite_exchange

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
