Skip to main content

Parse, store, validate, and emit Crystallographic Information Files (CIF)

Project description

cifflow

Parse, store, validate, and emit Crystallographic Information Files (CIF).

Python ≥ 3.10 · Apache 2.0 · v0.1.3 · PyPI


What it does

  • Parses CIF 1.1 and CIF 2.0 files, including all string types (triple-quoted, multiline text fields, embedded quotes) and save frames
  • Loads DDLm dictionaries with full _import.get resolution, producing a typed schema
  • Focused on multi-block powder CIF files
  • Ingests parsed CIF data into DuckDB using the dictionary-derived schema: one table per category, foreign keys enforced, unknown tags routed to a fallback tier
  • Emits valid CIF from a populated database in four modes: ORIGINAL, GROUPED, ONE_BLOCK, ALL_BLOCKS
  • Trusts the user — if you pass in multiple blocks, the program assumes they all belong together and, failing key value clashes, can be interpreted as a single database/experiment
  • Constructs CifFile objects programmatically from Python values (CifWriter), and performs arbitrary edits: add/remove/rename tags, loops, blocks, and save frames
  • Removes common parse-time artefacts automatically (clean): orphan error tags, duplicate blocks/save frames/tags, loop padding; for anything beyond these automatic fixes, use CifWriter
  • Visualises a schema as a Graphviz DOT string or a self-contained interactive HTML file
  • Returns data as Apache Arrow RecordBatch objects directly from the Rust parser (build_arrow, build_arrow_file)

Key properties

Error-tolerant. The parser never raises on malformed input. Every structural problem produces an explicit error event; parsing continues and all recoverable data is preserved.

No silent data loss. Duplicate tag values are preserved. Tags not mapped by the dictionary go to a fallback table, not a discard pile.

Round-trip fidelity. For well-formed input, emitted CIF re-parses to the same data. All values are stored and emitted as raw strings; ValueType provenance (placeholder . and ? vs quoted equivalents) is preserved throughout.

Canonical caseless names. Block names, save frame names, and tag names are stored in Unicode canonical caseless form (NFC(casefold(NFD(x)))). Lookups are automatically casefolded: cif["ABC"] finds a block stored as "abc".

Streaming parser. The parser is event-driven. CIF source is consumed in a single pass; the IR accumulates events incrementally. The Rust extension provides high-throughput Arrow output without any Python file objects.


Installation

pip install cifflow

duckdb and pyarrow are installed automatically.

To install from source (includes the Rust extension):

git clone https://github.com/rowlesmr/cifflow.git
cd cifflow
pip install -e ".[dev]"
maturin develop

Quick start

Parse a CIF file

from cifflow import build

text = open('structure.cif', encoding='utf-8').read()
cif, errors = build(text)   # never raises; errors is a list[ParseError]

for block_name in cif.blocks:          # block names are always lowercase
    block = cif[block_name]
    print(f'{block_name}: {len(block.tags)} tags, {len(block.loops)} loops')

The best way to resolve errors is to inspect the list of errors, edit the file accordingly, and try again. No assumptions are made about how to correct errors automatically.

Full pipeline: dictionary → DuckDB → CIF

import pathlib
from cifflow import (
    DictionaryLoader, directory_resolver,
    save_dictionary, load_dictionary,
    generate_schema,
    build, ingest, emit, EmitMode,
)
from cifflow.types import CifVersion

# 1. Load dictionary (with JSON cache to avoid re-parsing on every run)
cache = pathlib.Path('cif_pow_cache.json')
resolver = directory_resolver('data/dictionaries')
if cache.exists():
    dictionary = load_dictionary(cache)
else:
    dictionary = DictionaryLoader(resolver=resolver).load(
        open('data/dictionaries/cif_pow.dic', encoding='utf-8').read())
    save_dictionary(dictionary, cache)

# 2. Derive schema
schema = generate_schema(dictionary)

# 3. Parse CIF
cif, errors = build(open('all_the_data.cif', encoding='utf-8').read())

# 4. Ingest into an in-memory DuckDB database
#    Pass a file path string to persist: ingest(cif, 'output.db', schema=schema)
conn, warnings = ingest(cif, schema=schema)

# 5. Emit CIF
output = emit(conn, schema, mode=EmitMode.ORIGINAL, version=CifVersion.CIF_2_0)
open('output.cif', 'w', encoding='utf-8').write(output)

See example_workflow.py in the repository root for a fully annotated end-to-end demonstration covering all four emission modes, type-cast export, and fidelity checking.

The full API reference is in docs/api.md.


Architecture

Parser → Event Stream → IR → Dictionary-aware Mapping → DuckDB → Output/API
Layer Responsibility
Lexer Tokenisation, ValueType assignment
Parser Token sequence interpretation, error recovery, event emission
IR (CIFModel) Event accumulation, loop validation, multiline text transformation
Dictionary DDLm parsing, schema derivation
DuckDB Persistent storage: structured tables when a dictionary is present, fallback tier otherwise
Output Valid CIF regeneration; Python/NumPy/pandas API surface

Layer responsibilities are strictly separated. The parser does not know about the dictionary. The dictionary does not know about the IR. The output layer only reads from DuckDB.


Status

All stages are complete and tested.

Stage Feature
1–2 CIF 1.1 and 2.0 parser + IR (CIF model)
3 DDLm dictionary loading (_import.get, alias resolution, deprecation)
4 DuckDB schema generation (Set/Loop → tables, PKs, FKs, bridge columns, fallback tier)
5 DuckDB ingestion: structured tables + fallback tier; FK propagation; error recovery; canonical caseless name matching
6 CIF emission (ORIGINAL, GROUPED, ONE_BLOCK, ALL_BLOCKS); pretty-print; line-length enforcement; decimal alignment; schema visualisation; programmatic CifFile construction (CifWriter); cleaning parser artefacts (clean); type-cast export (convert_database); fidelity checking (check_fidelity); validation (validate)

Development

Run the fast test suite (excludes tests that load large real-world CIF files):

.venv/Scripts/python.exe -m pytest -m "not slow"

Run the full suite including slow tests:

.venv/Scripts/python.exe -m pytest

After modifying the Rust extension, recompile before running Python tests:

.venv/Scripts/maturin develop

License

Apache 2.0. See LICENSE.

The bundled JavaScript files (viz.js 2.1.2 and svg-pan-zoom 3.6.1) used by visualise_schema_html are MIT-licensed. Licence notices are in src/cifflow/dictionary/js/LICENSES.txt.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

cifflow-0.1.3-cp313-cp313-win_amd64.whl (1.1 MB view details)

Uploaded CPython 3.13Windows x86-64

cifflow-0.1.3-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB view details)

Uploaded CPython 3.13manylinux: glibc 2.17+ x86-64

cifflow-0.1.3-cp313-cp313-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (1.6 MB view details)

Uploaded CPython 3.13macOS 10.12+ universal2 (ARM64, x86-64)macOS 10.12+ x86-64macOS 11.0+ ARM64

cifflow-0.1.3-cp312-cp312-win_amd64.whl (1.1 MB view details)

Uploaded CPython 3.12Windows x86-64

cifflow-0.1.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.17+ x86-64

cifflow-0.1.3-cp312-cp312-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (1.6 MB view details)

Uploaded CPython 3.12macOS 10.12+ universal2 (ARM64, x86-64)macOS 10.12+ x86-64macOS 11.0+ ARM64

cifflow-0.1.3-cp311-cp311-win_amd64.whl (1.1 MB view details)

Uploaded CPython 3.11Windows x86-64

cifflow-0.1.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB view details)

Uploaded CPython 3.11manylinux: glibc 2.17+ x86-64

cifflow-0.1.3-cp311-cp311-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (1.6 MB view details)

Uploaded CPython 3.11macOS 10.12+ universal2 (ARM64, x86-64)macOS 10.12+ x86-64macOS 11.0+ ARM64

cifflow-0.1.3-cp310-cp310-win_amd64.whl (1.1 MB view details)

Uploaded CPython 3.10Windows x86-64

cifflow-0.1.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB view details)

Uploaded CPython 3.10manylinux: glibc 2.17+ x86-64

cifflow-0.1.3-cp310-cp310-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (1.6 MB view details)

Uploaded CPython 3.10macOS 10.12+ universal2 (ARM64, x86-64)macOS 10.12+ x86-64macOS 11.0+ ARM64

File details

Details for the file cifflow-0.1.3-cp313-cp313-win_amd64.whl.

File metadata

  • Download URL: cifflow-0.1.3-cp313-cp313-win_amd64.whl
  • Upload date:
  • Size: 1.1 MB
  • Tags: CPython 3.13, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cifflow-0.1.3-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 cfb9d27c9ef51e40546c4a8344773ef9ab60b753967cb6b136da1194ab8ed87d
MD5 39370037190fe4935d96e8c90de9ea19
BLAKE2b-256 6bc704f5124cc368d4bc11760357cc212f2465135843a76ef9332db3a771625d

See more details on using hashes here.

Provenance

The following attestation bundles were made for cifflow-0.1.3-cp313-cp313-win_amd64.whl:

Publisher: release.yml on rowlesmr/cifflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cifflow-0.1.3-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for cifflow-0.1.3-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 9aa48445c5a54bd9ec9ad0ea8bf8134975562005a99b9070b043e7e26b4f6960
MD5 1a9b4769577e4da76a87ae9edad46045
BLAKE2b-256 b7463c0e7ec3478af14120081c8f7c2934ac4ce82ea741301e52b6cca7be94ca

See more details on using hashes here.

Provenance

The following attestation bundles were made for cifflow-0.1.3-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on rowlesmr/cifflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cifflow-0.1.3-cp313-cp313-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl.

File metadata

File hashes

Hashes for cifflow-0.1.3-cp313-cp313-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl
Algorithm Hash digest
SHA256 1416e6e2bd334bd3fc9a865a50e63872bca609a6ed1d88b0c7f8210b3d0ecc18
MD5 4413b0de2bec97e7269d6ef11b945728
BLAKE2b-256 0d0fe639c2f9760ebb1fcb791b483fad3bbba9a2baff9ff5aa609bdba1b50d77

See more details on using hashes here.

Provenance

The following attestation bundles were made for cifflow-0.1.3-cp313-cp313-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl:

Publisher: release.yml on rowlesmr/cifflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cifflow-0.1.3-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: cifflow-0.1.3-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 1.1 MB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cifflow-0.1.3-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 ff95d4249b2406c866bf57c0481b9381d46069fcf9e78ebcbfe7a02f48fdb7f6
MD5 ee1e9595d42ccd21425449d3c5f85f17
BLAKE2b-256 9646cc56b9326f4dbdb5645f6e2520da5efd76830d7d2d669437da406addadd2

See more details on using hashes here.

Provenance

The following attestation bundles were made for cifflow-0.1.3-cp312-cp312-win_amd64.whl:

Publisher: release.yml on rowlesmr/cifflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cifflow-0.1.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for cifflow-0.1.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 47058d2c24c62f3c04dbe834501dc2bf6f211ddf2064545ee23da246cfd6ca43
MD5 bcfb0278257a36c3aeaad2b8c8a657f1
BLAKE2b-256 16c0fba5cf215c14ad8fd15ce681a866165f59865e7df70585888637cc2bf4b5

See more details on using hashes here.

Provenance

The following attestation bundles were made for cifflow-0.1.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on rowlesmr/cifflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cifflow-0.1.3-cp312-cp312-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl.

File metadata

File hashes

Hashes for cifflow-0.1.3-cp312-cp312-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl
Algorithm Hash digest
SHA256 e4301d6400d21d95b259ea6f65836ef3b96ee7169eabd580c8adb2f91e9efa5b
MD5 9738114c157782a8e6b01a3e0415756d
BLAKE2b-256 01772d9aaea83c756f7243969f2b37f632e2db237ac82cbc27728f2fd4a7b79f

See more details on using hashes here.

Provenance

The following attestation bundles were made for cifflow-0.1.3-cp312-cp312-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl:

Publisher: release.yml on rowlesmr/cifflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cifflow-0.1.3-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: cifflow-0.1.3-cp311-cp311-win_amd64.whl
  • Upload date:
  • Size: 1.1 MB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cifflow-0.1.3-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 9f43b07df8f60e3102cf3dd732ecc01bea02632b9d3d0aa28da0388fe7a67c42
MD5 9225a748f272965199afb3c392a64483
BLAKE2b-256 850ed0d0730606fa5e6a35d32a5481a04b5d138b5df21e7868b5410fc04151e6

See more details on using hashes here.

Provenance

The following attestation bundles were made for cifflow-0.1.3-cp311-cp311-win_amd64.whl:

Publisher: release.yml on rowlesmr/cifflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cifflow-0.1.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for cifflow-0.1.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 31f433e85d9be5389ef73f5626c58529bac786a77023609fec827a8877ed3120
MD5 6da1d2a9794cb86c60567a7f934fc3e2
BLAKE2b-256 5bf616d96cf8dd4a46a61b1df5d054e2b275cd3d741f228b3ae1d02b4dba1dfd

See more details on using hashes here.

Provenance

The following attestation bundles were made for cifflow-0.1.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on rowlesmr/cifflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cifflow-0.1.3-cp311-cp311-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl.

File metadata

File hashes

Hashes for cifflow-0.1.3-cp311-cp311-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl
Algorithm Hash digest
SHA256 ecacea326cbd82cb459c05c45e4273598c13889adfc7e7ee38c9d918a4c6a8e4
MD5 fd0b1856e2efd23adf4c4befbffc1630
BLAKE2b-256 010e984242398b27b5ea7902a66e6f909f6e02d139650646d9d9376ad56de328

See more details on using hashes here.

Provenance

The following attestation bundles were made for cifflow-0.1.3-cp311-cp311-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl:

Publisher: release.yml on rowlesmr/cifflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cifflow-0.1.3-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: cifflow-0.1.3-cp310-cp310-win_amd64.whl
  • Upload date:
  • Size: 1.1 MB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cifflow-0.1.3-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 c6866ab57472a34504e31f284021963242e31b8e1e9f03666007df656bf7acff
MD5 31ee08d056dd681f70072a0ce7406a2a
BLAKE2b-256 80c23ce2f0ca0bffe52c95c220f7195f31d39f157d37d703e8291347d5ed9abb

See more details on using hashes here.

Provenance

The following attestation bundles were made for cifflow-0.1.3-cp310-cp310-win_amd64.whl:

Publisher: release.yml on rowlesmr/cifflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cifflow-0.1.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for cifflow-0.1.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 2ef10491fc1d417e2b68258ebd42ceb7214a79eb09089035861f12cc7f1314d7
MD5 3b8167c885ba415806c0da6d768cf365
BLAKE2b-256 56300a799dc3672dbdb77b36ecb81549d456ca140e6800910bead566836721ca

See more details on using hashes here.

Provenance

The following attestation bundles were made for cifflow-0.1.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on rowlesmr/cifflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cifflow-0.1.3-cp310-cp310-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl.

File metadata

File hashes

Hashes for cifflow-0.1.3-cp310-cp310-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl
Algorithm Hash digest
SHA256 0252820cecc5fb380fc0ca0700f1b674e1600b7840bed35ab2b118e52bdffdf2
MD5 2fb5bd04461b3bb78719a4fc181a2e85
BLAKE2b-256 0fb986c2c8a4f08a1df04a9bf780d450f741147e1d72ca175af4cec731e20838

See more details on using hashes here.

Provenance

The following attestation bundles were made for cifflow-0.1.3-cp310-cp310-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl:

Publisher: release.yml on rowlesmr/cifflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page