Skip to main content

Parse, store, validate, and emit Crystallographic Information Files (CIF)

Project description

cifflow

Parse, store, validate, and emit Crystallographic Information Files (CIF).

PyPI CI PyPI - Python Version License


What it does

  • Parses CIF 1.1 and CIF 2.0 files, including all string types (triple-quoted, multiline text fields, embedded quotes) and save frames
  • Loads DDLm dictionaries with full _import.get resolution, producing a typed schema
  • Focused on multi-block powder CIF files
  • Ingests parsed CIF data into DuckDB using the dictionary-derived schema: one table per category, foreign keys enforced, unknown tags routed to a fallback tier
  • Emits valid CIF from a populated database in four modes: ORIGINAL, GROUPED, ONE_BLOCK, ALL_BLOCKS
  • Trusts the user — if you pass in multiple blocks, the program assumes they all belong together and, failing key value clashes, can be interpreted as a single database/experiment
  • Constructs CifFile objects programmatically from Python values (CifWriter), and performs arbitrary edits: add/remove/rename tags, loops, blocks, and save frames
  • Removes common parse-time artefacts automatically (clean): orphan error tags, duplicate blocks/save frames/tags, loop padding; for anything beyond these automatic fixes, use CifWriter
  • Visualises a schema as a Graphviz DOT string or a self-contained interactive HTML file
  • Returns data as Apache Arrow RecordBatch objects directly from the Rust parser (build_arrow, build_arrow_file)

Key properties

Error-tolerant. The parser never raises on malformed input. Every structural problem produces an explicit error event; parsing continues and all recoverable data is preserved.

No silent data loss. Duplicate tag values are preserved. Tags not mapped by the dictionary go to a fallback table, not a discard pile.

Round-trip fidelity. For well-formed input, emitted CIF re-parses to the same data. All values are stored and emitted as raw strings; ValueType provenance (placeholder . and ? vs quoted equivalents) is preserved throughout.

Canonical caseless names. Block names, save frame names, and tag names are stored in Unicode canonical caseless form (NFC(casefold(NFD(x)))). Lookups are automatically casefolded: cif["ABC"] finds a block stored as "abc".

Streaming parser. The parser is event-driven. CIF source is consumed in a single pass; the IR accumulates events incrementally. The Rust extension provides high-throughput Arrow output without any Python file objects.


Installation

pip install cifflow

duckdb and pyarrow are installed automatically.

To install from source (includes the Rust extension):

git clone https://github.com/rowlesmr/cifflow.git
cd cifflow
pip install -e ".[dev]"
maturin develop

Quick start

Parse a CIF file

from cifflow import build

text = open('structure.cif', encoding='utf-8').read()
cif, errors = build(text)   # never raises; errors is a list[ParseError]

for block_name in cif.blocks:          # block names are always lowercase
    block = cif[block_name]
    print(f'{block_name}: {len(block.tags)} tags, {len(block.loops)} loops')

The best way to resolve errors is to inspect the list of errors, edit the file accordingly, and try again. No assumptions are made about how to correct errors automatically.

Full pipeline: dictionary → DuckDB → CIF

import pathlib
from cifflow import (
    DictionaryLoader, directory_resolver,
    save_dictionary, load_dictionary,
    generate_schema,
    build, ingest, emit, EmitMode,
)
from cifflow.types import CifVersion

# 1. Load dictionary (with JSON cache to avoid re-parsing on every run)
cache = pathlib.Path('cif_pow_cache.json')
resolver = directory_resolver('data/dictionaries')
if cache.exists():
    dictionary = load_dictionary(cache)
else:
    dictionary = DictionaryLoader(resolver=resolver).load(
        open('data/dictionaries/cif_pow.dic', encoding='utf-8').read())
    save_dictionary(dictionary, cache)

# 2. Derive schema
schema = generate_schema(dictionary)

# 3. Parse CIF
cif, errors = build(open('all_the_data.cif', encoding='utf-8').read())

# 4. Ingest into an in-memory DuckDB database
#    Pass a file path string to persist: ingest(cif, 'output.db', schema=schema)
conn, warnings = ingest(cif, schema=schema)

# 5. Emit CIF
output = emit(conn, schema, mode=EmitMode.ORIGINAL, version=CifVersion.CIF_2_0)
open('output.cif', 'w', encoding='utf-8').write(output)

See example_workflow.py in the repository root for a fully annotated end-to-end demonstration covering all four emission modes, type-cast export, and fidelity checking.

The full API reference is in docs/api.md.


Architecture

Parser → Event Stream → IR → Dictionary-aware Mapping → DuckDB → Output/API
Layer Responsibility
Lexer Tokenisation, ValueType assignment
Parser Token sequence interpretation, error recovery, event emission
IR (CIFModel) Event accumulation, loop validation, multiline text transformation
Dictionary DDLm parsing, schema derivation
DuckDB Persistent storage: structured tables when a dictionary is present, fallback tier otherwise
Output Valid CIF regeneration; Python/NumPy/pandas API surface

Layer responsibilities are strictly separated. The parser does not know about the dictionary. The dictionary does not know about the IR. The output layer only reads from DuckDB.


Status

All stages are complete and tested.

Stage Feature
1–2 CIF 1.1 and 2.0 parser + IR (CIF model)
3 DDLm dictionary loading (_import.get, alias resolution, deprecation)
4 DuckDB schema generation (Set/Loop → tables, PKs, FKs, bridge columns, fallback tier)
5 DuckDB ingestion: structured tables + fallback tier; FK propagation; error recovery; canonical caseless name matching
6 CIF emission (ORIGINAL, GROUPED, ONE_BLOCK, ALL_BLOCKS); pretty-print; line-length enforcement; decimal alignment; schema visualisation; programmatic CifFile construction (CifWriter); cleaning parser artefacts (clean); type-cast export (convert_database); fidelity checking (check_fidelity); validation (validate)

Development

Run the fast test suite (excludes tests that load large real-world CIF files):

.venv/Scripts/python.exe -m pytest -m "not slow"

Run the full suite including slow tests:

.venv/Scripts/python.exe -m pytest

After modifying the Rust extension, recompile before running Python tests:

.venv/Scripts/maturin develop

License

Apache 2.0. See LICENSE.

The bundled JavaScript files (viz.js 2.1.2 and svg-pan-zoom 3.6.1) used by visualise_schema_html are MIT-licensed. Licence notices are in src/cifflow/dictionary/js/LICENSES.txt.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cifflow-0.1.5.tar.gz (8.3 MB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

cifflow-0.1.5-cp313-cp313-win_amd64.whl (1.1 MB view details)

Uploaded CPython 3.13Windows x86-64

cifflow-0.1.5-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB view details)

Uploaded CPython 3.13manylinux: glibc 2.17+ x86-64

cifflow-0.1.5-cp313-cp313-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (1.6 MB view details)

Uploaded CPython 3.13macOS 10.12+ universal2 (ARM64, x86-64)macOS 10.12+ x86-64macOS 11.0+ ARM64

cifflow-0.1.5-cp312-cp312-win_amd64.whl (1.1 MB view details)

Uploaded CPython 3.12Windows x86-64

cifflow-0.1.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.17+ x86-64

cifflow-0.1.5-cp312-cp312-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (1.6 MB view details)

Uploaded CPython 3.12macOS 10.12+ universal2 (ARM64, x86-64)macOS 10.12+ x86-64macOS 11.0+ ARM64

cifflow-0.1.5-cp311-cp311-win_amd64.whl (1.1 MB view details)

Uploaded CPython 3.11Windows x86-64

cifflow-0.1.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB view details)

Uploaded CPython 3.11manylinux: glibc 2.17+ x86-64

cifflow-0.1.5-cp311-cp311-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (1.6 MB view details)

Uploaded CPython 3.11macOS 10.12+ universal2 (ARM64, x86-64)macOS 10.12+ x86-64macOS 11.0+ ARM64

cifflow-0.1.5-cp310-cp310-win_amd64.whl (1.1 MB view details)

Uploaded CPython 3.10Windows x86-64

cifflow-0.1.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB view details)

Uploaded CPython 3.10manylinux: glibc 2.17+ x86-64

cifflow-0.1.5-cp310-cp310-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (1.6 MB view details)

Uploaded CPython 3.10macOS 10.12+ universal2 (ARM64, x86-64)macOS 10.12+ x86-64macOS 11.0+ ARM64

File details

Details for the file cifflow-0.1.5.tar.gz.

File metadata

  • Download URL: cifflow-0.1.5.tar.gz
  • Upload date:
  • Size: 8.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cifflow-0.1.5.tar.gz
Algorithm Hash digest
SHA256 9c5b92beaf2935994c454b061e45a441395aa2bd64ff1a9634b5997c88666216
MD5 6ecbdf90a57b37a53f5cc394f24a3b48
BLAKE2b-256 dd592b9a8d089a89dd773d37aad4cdaf215b4c0fea1da7a0b422fc570f88db81

See more details on using hashes here.

Provenance

The following attestation bundles were made for cifflow-0.1.5.tar.gz:

Publisher: release.yml on rowlesmr/cifflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cifflow-0.1.5-cp313-cp313-win_amd64.whl.

File metadata

  • Download URL: cifflow-0.1.5-cp313-cp313-win_amd64.whl
  • Upload date:
  • Size: 1.1 MB
  • Tags: CPython 3.13, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cifflow-0.1.5-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 2ebd44c7e4475cc0f0a05832d69c9823066ee7b0ec0e8b3bfc4461a256088786
MD5 7f2a7f33220108292d4e2003fec493be
BLAKE2b-256 26eb46e706f435619c0447dfda1caebb74c794ede8379ff4da966ccc3410e328

See more details on using hashes here.

Provenance

The following attestation bundles were made for cifflow-0.1.5-cp313-cp313-win_amd64.whl:

Publisher: release.yml on rowlesmr/cifflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cifflow-0.1.5-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for cifflow-0.1.5-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 340c9dffb0d957d4241a5b9e905635b394605c1c64c8c61cbfd02a7e70966258
MD5 0c08d5024454c0cebb3f6d1211756d38
BLAKE2b-256 7eddb1fb92fae6b7f09eaae75d94beda0c2dc5b6b0b298bfecbd0e86c4d92af0

See more details on using hashes here.

Provenance

The following attestation bundles were made for cifflow-0.1.5-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on rowlesmr/cifflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cifflow-0.1.5-cp313-cp313-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl.

File metadata

File hashes

Hashes for cifflow-0.1.5-cp313-cp313-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl
Algorithm Hash digest
SHA256 92b0066a6514a65a25639b1adfced284cf1ffab9058a843dab646fd6369d4777
MD5 82dac41fcc5db7a5a5abd9fba26ac219
BLAKE2b-256 312c4520bf0c90ac250ab6f09224e9a6244368beddb067b89978cfb1eb63e81a

See more details on using hashes here.

Provenance

The following attestation bundles were made for cifflow-0.1.5-cp313-cp313-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl:

Publisher: release.yml on rowlesmr/cifflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cifflow-0.1.5-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: cifflow-0.1.5-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 1.1 MB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cifflow-0.1.5-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 db56a0b0fccabfd396927729850bf2e18a32797042f0313677c0e9f4d79dd1ab
MD5 67603b26713ace8b6e26af70402922de
BLAKE2b-256 cd3363cb837574f1ec59d0124016618de7e94cc3073eb5a8ca42c208f32fcbd4

See more details on using hashes here.

Provenance

The following attestation bundles were made for cifflow-0.1.5-cp312-cp312-win_amd64.whl:

Publisher: release.yml on rowlesmr/cifflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cifflow-0.1.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for cifflow-0.1.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 14dcf49332968c88b928e8ed02a9f9bc9d0754d245d552cac772de6d3c936e84
MD5 02334785066ec4c986318010d544910a
BLAKE2b-256 e1ce8c874ff92ff1fa93c416edc0e60dd2dc5f65f3d1aa7e7c0be1b2dd151fcf

See more details on using hashes here.

Provenance

The following attestation bundles were made for cifflow-0.1.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on rowlesmr/cifflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cifflow-0.1.5-cp312-cp312-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl.

File metadata

File hashes

Hashes for cifflow-0.1.5-cp312-cp312-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl
Algorithm Hash digest
SHA256 b4536b972971a43940a9eaf673596ffeda00ec4d886e511e09d3ef4e152ec2a5
MD5 29b973149505ec71406eb93e56c4c20b
BLAKE2b-256 fe56ed6a2e2bd98f55ddb3c803f1da228b3d6d677e443448e1bc1bc7454a060a

See more details on using hashes here.

Provenance

The following attestation bundles were made for cifflow-0.1.5-cp312-cp312-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl:

Publisher: release.yml on rowlesmr/cifflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cifflow-0.1.5-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: cifflow-0.1.5-cp311-cp311-win_amd64.whl
  • Upload date:
  • Size: 1.1 MB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cifflow-0.1.5-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 7a4de149f59b4902124c2630066f14d3585eb8be7b44dbd725a66df7f299ba95
MD5 e6b65dda11dc1a2e7fd141aa821bb4d5
BLAKE2b-256 92d719770c3c77b07d1b3bd81e614123f13508a662852467897697f8f6b2812f

See more details on using hashes here.

Provenance

The following attestation bundles were made for cifflow-0.1.5-cp311-cp311-win_amd64.whl:

Publisher: release.yml on rowlesmr/cifflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cifflow-0.1.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for cifflow-0.1.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 8b6666eac8be9a7c44fef812152affe8c3f27a07ef34932f0096c7779e609b2d
MD5 8b7b71f50edcd90f8df8109ab82085ea
BLAKE2b-256 5e7637488612d9e9efe45dc3d0e493e146ba3c309fe5fb2921b1eee39357ec2d

See more details on using hashes here.

Provenance

The following attestation bundles were made for cifflow-0.1.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on rowlesmr/cifflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cifflow-0.1.5-cp311-cp311-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl.

File metadata

File hashes

Hashes for cifflow-0.1.5-cp311-cp311-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl
Algorithm Hash digest
SHA256 9cdca63e9f33f2ea3bedc6da8d51baef787a7527d156c17a40934c70897c2b6d
MD5 c3961800c149d98882af7ec8de12084d
BLAKE2b-256 2c1545e43711c4058a5505c8416c0846e64275b30f8b03f950a4fffc87d9f309

See more details on using hashes here.

Provenance

The following attestation bundles were made for cifflow-0.1.5-cp311-cp311-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl:

Publisher: release.yml on rowlesmr/cifflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cifflow-0.1.5-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: cifflow-0.1.5-cp310-cp310-win_amd64.whl
  • Upload date:
  • Size: 1.1 MB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cifflow-0.1.5-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 8d3f64ce6ed95bfe9a8c8255f61224215811cab3f8521ab8b2f4baad25e000cc
MD5 99a6a0555aff2686154d537c4d24bded
BLAKE2b-256 b38da487628fe1e3cb9d4edfef76f7d29f8b75299d4d74643e5408aea52b9555

See more details on using hashes here.

Provenance

The following attestation bundles were made for cifflow-0.1.5-cp310-cp310-win_amd64.whl:

Publisher: release.yml on rowlesmr/cifflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cifflow-0.1.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for cifflow-0.1.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 79e86e02efdd97e4601494acdb2d084d35d921e9bb0bd0c5faa258e542736df4
MD5 e5512c3cae338ed783196d8f7b772637
BLAKE2b-256 36321169045d614a0ff42c0fecd8443bf19041a8a20e519805f22a08176da74b

See more details on using hashes here.

Provenance

The following attestation bundles were made for cifflow-0.1.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on rowlesmr/cifflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cifflow-0.1.5-cp310-cp310-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl.

File metadata

File hashes

Hashes for cifflow-0.1.5-cp310-cp310-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl
Algorithm Hash digest
SHA256 9ea0a89cc7f1101df9d60742047e7c586e652794f64c897d84bfd3b69f386e42
MD5 a16f3d97340cd7a3424503b208ff542d
BLAKE2b-256 7437f2fd20ea9a96c55416a237ced863f9516514696a174ad322997e0a11ff6b

See more details on using hashes here.

Provenance

The following attestation bundles were made for cifflow-0.1.5-cp310-cp310-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl:

Publisher: release.yml on rowlesmr/cifflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page