A Python library for reading data in CITE EXchange (CEX) format.
cite_exchange
A Python library for parsing and working with CITE EXchange (CEX) format data.
Overview
cite_exchange provides tools for reading and processing data in CITE EXchange format, a line-oriented text format for managing structured data about scholarly resources. CEX files are organized into labeled blocks, where each block contains tabular data relevant to a specific resource type.
Installation
pip install cite_exchange
Requirements
- Python 3.14+
- pydantic >= 2.12.5
- requests (for URL-based parsing)
Quick Start
from cite_exchange.blocks import CexBlock
# Parse from a string
with open('data.cex', 'r') as f:
    content = f.read()
all_blocks = CexBlock.from_text(content)
# Parse directly from a file
all_blocks = CexBlock.from_file('data.cex')
# Parse directly from a URL
all_blocks = CexBlock.from_url('https://example.com/data.cex')
# Filter by label
ctsdata_blocks = CexBlock.from_file('data.cex', label='ctsdata')
# Access block data
for block in ctsdata_blocks:
    print(f"Label: {block.label}")
    print(f"Data lines: {len(block.data)}")
    for line in block.data:
        print(f"  {line}")
API Reference
CexBlock
A Pydantic model representing a labeled block of text data from a CEX source.
Attributes
- label (str): The label identifier for this block (without the #! prefix)
- data (list[str]): List of data lines in this block, excluding empty lines and comments
Methods
CexBlock.from_text(src: str, label: str = None) -> list[CexBlock]
Parse CEX-formatted text and create CexBlock instances.
Parameters:
- src (str): The CEX-formatted text to parse
- label (str, optional): If specified, only return blocks matching this label
Returns: A list of CexBlock instances
Parsing Rules:
- Label lines begin with #! and define the start of a new block
- Data lines are non-empty and don't start with // (comments are ignored)
- Multiple blocks can have the same label type
- Empty lines and comment lines are excluded from block data
Example:
# Parse all blocks
blocks = CexBlock.from_text(cex_content)
# Parse only specific label type
catalog_blocks = CexBlock.from_text(cex_content, label='ctscatalog')
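As a rough illustration of the parsing rules above, the following standalone sketch groups lines into labeled blocks without using the library. SketchBlock and parse_blocks are hypothetical names invented for this example and are not part of cite_exchange; the library's own implementation may differ, for instance in how it treats whitespace or data that appears before the first label.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchBlock:
    """Hypothetical stand-in for CexBlock: a label plus its data lines."""
    label: str
    data: list = field(default_factory=list)


def parse_blocks(src: str, label: Optional[str] = None) -> list:
    """Group CEX-style lines into blocks (illustrative only)."""
    blocks = []
    current = None
    for raw in src.splitlines():
        line = raw.strip()
        if line.startswith("#!"):
            # A label line starts a new block; the "#!" prefix is dropped.
            current = SketchBlock(label=line[2:].strip())
            blocks.append(current)
        elif not line or line.startswith("//"):
            # Empty lines and comment lines are never stored as data.
            continue
        elif current is not None:
            # Assumption: data before the first label line is ignored.
            current.data.append(line)
    if label is not None:
        # Several blocks may share a label, so filtering can return many.
        blocks = [b for b in blocks if b.label == label]
    return blocks


sample = """#!ctscatalog
// a comment line
urn|citationScheme|groupName|workTitle

#!ctsdata
urn:cts:greekLit:tlg0012.tlg001:1.1|text one
#!ctsdata
urn:cts:greekLit:tlg0012.tlg001:1.2|text two
"""

blocks = parse_blocks(sample)
print([b.label for b in blocks])  # ['ctscatalog', 'ctsdata', 'ctsdata']
print(len(parse_blocks(sample, label="ctsdata")))  # 2
```

The sketch keeps repeated labels as separate blocks rather than merging them, which matches the rule that multiple blocks can have the same label type.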
Utility Functions
labels(s: str) -> list[str]
Extract all unique labels from a CEX-formatted string.
Returns: Sorted list of unique label names
from cite_exchange.blocks import labels
with open('data.cex', 'r') as f:
    content = f.read()
label_list = labels(content)
print(label_list) # ['citecollections', 'citedata', 'citeproperties', ...]
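To make the "sorted list of unique label names" behavior concrete, here is a minimal sketch that does not depend on the library. The function name sketch_labels is hypothetical, and the assumption that a label is whatever follows #! on a line (with surrounding whitespace trimmed) is mine rather than something the package documents.

```python
def sketch_labels(src: str) -> list:
    """Collect unique block labels from CEX-style text (illustrative only)."""
    found = set()
    for line in src.splitlines():
        stripped = line.strip()
        if stripped.startswith("#!"):
            # Assumption: the label is the rest of the line after "#!".
            found.add(stripped[2:].strip())
    # A set removes duplicates; sorted() gives a stable alphabetical order.
    return sorted(found)


text = "#!ctsdata\nurn:a|b\n#!citedata\nx|y\n#!ctsdata\nurn:c|d\n"
print(sketch_labels(text))  # ['citedata', 'ctsdata']
```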
valid_label(label: str) -> bool
Check if a label is valid according to CEX format specification.
Valid labels:
- cexversion
- citelibrary
- ctsdata
- ctscatalog
- citecollections
- citeproperties
- citedata
- imagedata
- datamodels
- citerelationset
- relationsetcatalog
from cite_exchange.blocks import valid_label
print(valid_label('ctsdata')) # True
print(valid_label('invalid')) # False
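One way to picture this check is as a membership test against the list of labels above. The sketch below is standalone; VALID_LABELS, sketch_valid_label, and unrecognized are hypothetical names, and exact, case-sensitive matching is an assumption rather than documented behavior.

```python
# Labels listed as valid in this README, copied into a hypothetical constant.
VALID_LABELS = frozenset({
    "cexversion", "citelibrary", "ctsdata", "ctscatalog",
    "citecollections", "citeproperties", "citedata", "imagedata",
    "datamodels", "citerelationset", "relationsetcatalog",
})


def sketch_valid_label(label: str) -> bool:
    # Assumption: matching is exact and case-sensitive.
    return label in VALID_LABELS


def unrecognized(labels_found: list) -> list:
    """Return the labels that are not in the valid set, in input order."""
    return [name for name in labels_found if not sketch_valid_label(name)]


print(sketch_valid_label("ctsdata"))  # True
print(unrecognized(["ctsdata", "notes", "citedata"]))  # ['notes']
```

A helper like unrecognized could be paired with the output of labels() to flag unexpected block types before further processing.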
CEX Format Overview
CEX (CITE EXchange) is a line-oriented format for exchanging data about scholarly resources. Key features:
- Line-oriented structure: Data organized into lines and blocks
- Labeled blocks: Each block starts with a #!label line
- Tabular data: Blocks contain pipe-delimited (|) or other delimited data
- Comments: Lines starting with // are comments and ignored
- Empty lines: Empty lines are ignored
Example CEX Content
#!ctscatalog
urn|citationScheme|groupName|workTitle
urn:cts:greekLit:tlg0012.tlg001:|book|Homer|Iliad
#!ctsdata
urn:cts:greekLit:tlg0012.tlg001:1.1|Μῆνις ἀ εἴδε θεά
urn:cts:greekLit:tlg0012.tlg001:1.2|Πηληϊάδεω Ἀχιλῆος
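Because block data is returned as raw lines, splitting rows into fields is left to the caller. The sketch below shows one possible approach for the sample above, assuming the first line of the ctscatalog block is a header row and that the delimiter is a pipe. rows_as_dicts and split_passage are hypothetical helpers, not part of cite_exchange, and the passage text used here is placeholder content.

```python
def rows_as_dicts(lines: list, delimiter: str = "|") -> list:
    """Treat the first line as a header and map each later row onto it."""
    if not lines:
        return []
    header = lines[0].split(delimiter)
    return [dict(zip(header, row.split(delimiter))) for row in lines[1:]]


def split_passage(line: str, delimiter: str = "|") -> tuple:
    """Split a ctsdata line into (urn, text) at the first delimiter only."""
    urn, _, text = line.partition(delimiter)
    return urn, text


catalog_lines = [
    "urn|citationScheme|groupName|workTitle",
    "urn:cts:greekLit:tlg0012.tlg001:|book|Homer|Iliad",
]
records = rows_as_dicts(catalog_lines)
print(records[0]["workTitle"])  # Iliad

# Placeholder passage text, used only to show the split.
urn, text = split_passage("urn:cts:greekLit:tlg0012.tlg001:1.1|sample text")
print(urn)   # urn:cts:greekLit:tlg0012.tlg001:1.1
print(text)  # sample text
```

Using partition for ctsdata lines keeps any later pipe characters inside the passage text instead of splitting on them.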
Testing
The package includes unit tests, which can be run with pytest:
python -m pytest test/test_blocks.py
Test data files are included in test/data/:
- burneysample.cex: Sample CEX data from the Homer Multitext project
- laxlibrary1.cex: Sample CITE collection data
License
See LICENSE file for details.
File details
Details for the file cite_exchange-0.1.0.tar.gz.
File metadata
- Download URL: cite_exchange-0.1.0.tar.gz
- Size: 32.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6d4fec99d53aa3a73e9978a82ce118186d43c830ec664bdca74acbffc0502445 |
| MD5 | aebf32783aead8c97c6fa5fe06667f52 |
| BLAKE2b-256 | 24db22498d29eef9a9af891fc816a35c7d7e1d9c53046586b8b9a5abe34e6610 |
Provenance
The following attestation bundles were made for cite_exchange-0.1.0.tar.gz:
Publisher: publish.yml on neelsmith/cite_exchange
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: cite_exchange-0.1.0.tar.gz
- Subject digest: 6d4fec99d53aa3a73e9978a82ce118186d43c830ec664bdca74acbffc0502445
- Sigstore transparency entry: 821266544
- Permalink: neelsmith/cite_exchange@6db6d965fc8a472c62445d0dc6692c443a5c5a12
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/neelsmith
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6db6d965fc8a472c62445d0dc6692c443a5c5a12
- Trigger Event: release
File details
Details for the file cite_exchange-0.1.0-py3-none-any.whl.
File metadata
- Download URL: cite_exchange-0.1.0-py3-none-any.whl
- Size: 16.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d7f0091dfbc4daa6a112b433d3a03c58d6248e72563875fe7e6dfb53506da3a7 |
| MD5 | 97e03cd52bae75f0b7fbb19ab1d9e8f5 |
| BLAKE2b-256 | 19678c7279ff01347a111502d065ec3634a25b189ee26b117aea5d51c95a7e16 |
Provenance
The following attestation bundles were made for cite_exchange-0.1.0-py3-none-any.whl:
Publisher: publish.yml on neelsmith/cite_exchange
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: cite_exchange-0.1.0-py3-none-any.whl
- Subject digest: d7f0091dfbc4daa6a112b433d3a03c58d6248e72563875fe7e6dfb53506da3a7
- Sigstore transparency entry: 821266548
- Permalink: neelsmith/cite_exchange@6db6d965fc8a472c62445d0dc6692c443a5c5a12
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/neelsmith
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6db6d965fc8a472c62445d0dc6692c443a5c5a12
- Trigger Event: release