A Python library for reading data in CITE EXchange (CEX) format.

cite_exchange

A Python library for parsing and working with CITE EXchange (CEX) format data.

Overview

cite_exchange provides tools for reading and processing data in CITE EXchange format, a line-oriented text format for managing structured data about scholarly resources. CEX files are organized into labeled blocks, where each block contains tabular data relevant to a specific resource type.

Installation

pip install cite_exchange

Requirements

  • Python 3.14+
  • pydantic >= 2.12.5
  • requests (for URL-based parsing)

Quick Start

from cite_exchange.blocks import CexBlock

# Parse from a string
with open('data.cex', 'r') as f:
    content = f.read()
all_blocks = CexBlock.from_text(content)

# Parse directly from a file
all_blocks = CexBlock.from_file('data.cex')

# Parse directly from a URL
all_blocks = CexBlock.from_url('https://example.com/data.cex')

# Filter by label
ctsdata_blocks = CexBlock.from_file('data.cex', label='ctsdata')

# Access block data
for block in ctsdata_blocks:
    print(f"Label: {block.label}")
    print(f"Data lines: {len(block.data)}")
    for line in block.data:
        print(f"  {line}")

API Reference

CexBlock

A Pydantic model representing a labeled block of text data from a CEX source.

Attributes

  • label (str): The label identifier for this block (without the #! prefix)
  • data (list[str]): List of data lines in this block, excluding empty lines and comments

Methods

CexBlock.from_text(src: str, label: str = None) -> list[CexBlock]

Parse CEX-formatted text and create CexBlock instances.

Parameters:

  • src (str): The CEX-formatted text to parse
  • label (str, optional): If specified, only return blocks matching this label

Returns: A list of CexBlock instances

Parsing Rules:

  • Label lines begin with #! and define the start of a new block
  • Data lines are non-empty and don't start with // (comments are ignored)
  • Multiple blocks can share the same label
  • Empty lines and comment lines are excluded from block data

Example:

# Parse all blocks
blocks = CexBlock.from_text(cex_content)

# Parse only specific label type
catalog_blocks = CexBlock.from_text(cex_content, label='ctscatalog')
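
The parsing rules above can be sketched in plain Python. This is an illustrative sketch only, not the library's implementation: the `Block` dataclass and `parse_blocks` function are hypothetical stand-ins for `CexBlock` and `CexBlock.from_text`. It assumes that surrounding whitespace is stripped from each line and that lines appearing before the first label are discarded; the library may behave differently on both points.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical stand-in for CexBlock: a label plus its data lines."""
    label: str
    data: list[str] = field(default_factory=list)


def parse_blocks(src: str, label: str | None = None) -> list[Block]:
    """Split CEX text into blocks following the documented parsing rules."""
    blocks: list[Block] = []
    current: Block | None = None
    for raw in src.splitlines():
        line = raw.strip()  # assumption: surrounding whitespace is not significant
        if line.startswith("#!"):
            # A label line starts a new block; store the label without "#!".
            current = Block(label=line[2:].strip())
            blocks.append(current)
        elif not line or line.startswith("//"):
            # Empty lines and comment lines never become data.
            continue
        elif current is not None:
            # assumption: data before the first label line is dropped
            current.data.append(line)
    if label is not None:
        blocks = [b for b in blocks if b.label == label]
    return blocks


sample = """#!ctscatalog
// a comment
urn|citationScheme|groupName|workTitle

#!ctsdata
urn:cts:greekLit:tlg0012.tlg001:1.1|line one
#!ctsdata
urn:cts:greekLit:tlg0012.tlg001:1.2|line two
"""

for block in parse_blocks(sample, label="ctsdata"):
    print(block.label, block.data)
```

Blocks with a repeated label are kept as separate entries rather than merged, which matches the rule that multiple blocks can carry the same label.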

Utility Functions

labels(s: str) -> list[str]

Extract all unique labels from a CEX-formatted string.

Returns: Sorted list of unique label names

from cite_exchange.blocks import labels

with open('data.cex', 'r') as f:
    content = f.read()

label_list = labels(content)
print(label_list)  # ['citecollections', 'citedata', 'citeproperties', ...]
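
For readers who want to see what this kind of extraction involves, here is a minimal sketch in plain Python. The function name `unique_labels` is hypothetical and this is not the library's code; it only assumes that a label is whatever follows `#!` on a label line.

```python
def unique_labels(src: str) -> list[str]:
    """Collect the distinct labels of all '#!' lines, sorted alphabetically."""
    found = {
        line.strip()[2:].strip()      # drop the "#!" prefix
        for line in src.splitlines()
        if line.strip().startswith("#!")
    }
    return sorted(found)


text = "#!ctsdata\na|b\n#!citedata\nc|d\n#!ctsdata\ne|f\n"
print(unique_labels(text))  # ['citedata', 'ctsdata']
```

Using a set removes the duplicate `ctsdata` entry before sorting.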

valid_label(label: str) -> bool

Check if a label is valid according to CEX format specification.

Valid labels:

  • cexversion
  • citelibrary
  • ctsdata
  • ctscatalog
  • citecollections
  • citeproperties
  • citedata
  • imagedata
  • datamodels
  • citerelationset
  • relationsetcatalog

from cite_exchange.blocks import valid_label

print(valid_label('ctsdata'))    # True
print(valid_label('invalid'))    # False
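
A check like this amounts to set membership against the list of valid labels above. The sketch below is illustrative and does not use the library: `VALID_LABELS`, `is_valid_label`, and `unknown_labels` are hypothetical names, and exact, case-sensitive matching is an assumption.

```python
# The eleven labels listed above, copied from this page.
VALID_LABELS = frozenset({
    "cexversion", "citelibrary", "ctsdata", "ctscatalog",
    "citecollections", "citeproperties", "citedata", "imagedata",
    "datamodels", "citerelationset", "relationsetcatalog",
})


def is_valid_label(label: str) -> bool:
    """Return True if the label is in the documented list (case-sensitive)."""
    return label in VALID_LABELS


def unknown_labels(src: str) -> list[str]:
    """List labels in CEX text that are not in the documented list."""
    seen = {
        line.strip()[2:].strip()
        for line in src.splitlines()
        if line.strip().startswith("#!")
    }
    return sorted(seen - VALID_LABELS)


print(unknown_labels("#!ctsdata\na|b\n#!notes\nc|d\n"))  # ['notes']
```

A helper such as `unknown_labels` could be used to flag misspelled labels before parsing a file.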

CEX Format Overview

CEX (CITE EXchange) is a line-oriented format for exchanging data about scholarly resources. Key features:

  • Line-oriented structure: Data organized into lines and blocks
  • Labeled blocks: Each block starts with a #!label line
  • Tabular data: Blocks contain pipe-delimited (|) or otherwise delimited data
  • Comments: Lines starting with // are comments and ignored
  • Empty lines: Empty lines are ignored

Example CEX Content

#!ctscatalog
urn|citationScheme|groupName|workTitle
urn:cts:greekLit:tlg0012.tlg001:|book|Homer|Iliad

#!ctsdata
urn:cts:greekLit:tlg0012.tlg001:1.1|Μῆνιν ἄειδε θεά
urn:cts:greekLit:tlg0012.tlg001:1.2|Πηληϊάδεω Ἀχιλῆος
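
Because block data is returned as plain strings, splitting rows into fields is left to the caller. The sketch below shows one way to do that for a block whose first line is a header, as in the ctscatalog example above. The helper `rows_as_dicts` is hypothetical and not part of cite_exchange, and the header-row assumption does not hold for every block type (the ctsdata block above has no header).

```python
def rows_as_dicts(lines: list[str], delimiter: str = "|") -> list[dict[str, str]]:
    """Treat the first line as a header and pair it with each later row."""
    if not lines:
        return []
    header = lines[0].split(delimiter)
    return [dict(zip(header, row.split(delimiter))) for row in lines[1:]]


# Data lines as they might appear in a ctscatalog block's `data` list.
catalog_lines = [
    "urn|citationScheme|groupName|workTitle",
    "urn:cts:greekLit:tlg0012.tlg001:|book|Homer|Iliad",
]

for record in rows_as_dicts(catalog_lines):
    print(record["groupName"], record["workTitle"])  # Homer Iliad
```

Splitting on `|` is safe here because the URN values contain colons and periods but no pipe characters.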

Testing

The package includes unit tests for the block parser. Run them with pytest:

python -m pytest test/test_blocks.py

Test data files are included in test/data/:

  • burneysample.cex: Sample CEX data from the Homer Multitext project
  • laxlibrary1.cex: Sample CITE collection data

License

See LICENSE file for details.
