cifflow

Parse, store, validate, and emit Crystallographic Information Files (CIF).


What it does

  • Parses CIF 1.1 and CIF 2.0 files, including all string types (triple-quoted, multiline text fields, embedded quotes) and save frames
  • Loads DDLm dictionaries with full _import.get resolution, producing a typed schema
  • Focused on multi-block powder CIF files
  • Ingests parsed CIF data into DuckDB using the dictionary-derived schema: one table per category, foreign keys enforced, unknown tags routed to a fallback tier
  • Emits valid CIF from a populated database in four modes: ORIGINAL, GROUPED, ONE_BLOCK, ALL_BLOCKS
  • Trusts the user: if you pass in multiple blocks, the program assumes they all belong together and, barring key-value clashes, treats them as a single database/experiment
  • Constructs CifFile objects programmatically from Python values (CifWriter), and performs arbitrary edits: add/remove/rename tags, loops, blocks, and save frames
  • Removes common parse-time artefacts automatically (clean): orphan error tags, duplicate blocks/save frames/tags, loop padding; for anything beyond these automatic fixes, use CifWriter
  • Visualises a schema as a Graphviz DOT string or a self-contained interactive HTML file
  • Returns data as Apache Arrow RecordBatch objects directly from the Rust parser (build_arrow, build_arrow_file)

Key properties

Error-tolerant. The parser never raises on malformed input. Every structural problem produces an explicit error event; parsing continues and all recoverable data is preserved.
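
The errors-as-events idea can be illustrated with a toy scanner. This is a minimal sketch under stated assumptions, not cifflow's parser: Item, ScanError, and scan_items are hypothetical names, and the scanner only understands one-line `_tag value` items, so a value placed on the following line (legal in real CIF) is reported as missing here.

```python
from dataclasses import dataclass
from typing import Iterator, Union


@dataclass
class Item:
    """A successfully scanned data item (hypothetical event type)."""
    line: int
    tag: str
    value: str


@dataclass
class ScanError:
    """A structural problem, reported as data rather than raised."""
    line: int
    message: str


def scan_items(text: str) -> Iterator[Union[Item, ScanError]]:
    """Yield one event per `_tag value` line and keep going after problems."""
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line.startswith("_"):
            continue  # toy scanner: ignore everything that is not a data item
        tag, _, value = line.partition(" ")
        value = value.strip()
        if not value:
            yield ScanError(number, f"{tag}: no value on this line (toy rule)")
        elif value[0] in "'\"" and (len(value) < 2 or value[-1] != value[0]):
            yield ScanError(number, f"{tag}: unterminated quoted string")
        else:
            yield Item(number, tag, value)


sample = "data_demo\n_cell.length_a 5.43\n_cell.length_b 'oops\n_cell.length_c 5.43\n"
events = list(scan_items(sample))
items = [e for e in events if isinstance(e, Item)]
errors = [e for e in events if isinstance(e, ScanError)]
```

The malformed third line becomes one entry in errors, while the items on either side of it are still recovered.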

No silent data loss. Duplicate tag values are preserved. Tags not mapped by the dictionary go to a fallback table, not a discard pile.
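
As an illustration of this routing rule, the sketch below splits tag/value pairs into structured rows and a fallback list using plain Python containers. It assumes a hypothetical SCHEMA mapping and a hypothetical route helper; cifflow's real schema objects and DuckDB tables are not shown.

```python
from collections import defaultdict

# Hypothetical schema: tag name -> (table, column). Not cifflow's schema object.
SCHEMA = {
    "_cell.length_a": ("cell", "length_a"),
    "_cell.length_b": ("cell", "length_b"),
}


def route(pairs):
    """Split (tag, value) pairs into structured rows and a fallback list."""
    structured = defaultdict(list)  # table -> list of (column, value)
    fallback = []                   # (tag, value) pairs the schema does not know
    for tag, value in pairs:
        target = SCHEMA.get(tag)
        if target is None:
            fallback.append((tag, value))
        else:
            table, column = target
            structured[table].append((column, value))
    return dict(structured), fallback


pairs = [
    ("_cell.length_a", "5.43"),
    ("_cell.length_a", "5.44"),      # duplicate tag: both values are kept
    ("_local.note", "hand-edited"),  # unknown tag: kept in the fallback list
]
structured, fallback = route(pairs)
```

Every input pair ends up in exactly one of the two outputs, so the number of stored values always equals the number of values read.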

Round-trip fidelity. For well-formed input, emitted CIF re-parses to the same data. All values are stored and emitted as raw strings; ValueType provenance (placeholder . and ? vs quoted equivalents) is preserved throughout.
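
Why provenance matters can be seen in a small model: a bare . is a placeholder, while a quoted '.' is an ordinary string whose text happens to be a full stop. The sketch below is an assumption-laden toy, not cifflow's ValueType: Kind, classify, and write_token are hypothetical names, and quoting of values that contain whitespace is deliberately left out.

```python
from enum import Enum


class Kind(Enum):
    """Hypothetical value kinds for this toy model."""
    INAPPLICABLE = "."  # bare placeholder .
    UNKNOWN = "?"       # bare placeholder ?
    STRING = "string"   # everything else in this toy model


def classify(token: str):
    """Return (kind, text) for one raw value token."""
    if token == ".":
        return Kind.INAPPLICABLE, token
    if token == "?":
        return Kind.UNKNOWN, token
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "'\"":
        return Kind.STRING, token[1:-1]  # strip the delimiters, keep the text
    return Kind.STRING, token


def write_token(kind: Kind, text: str) -> str:
    """Write a value so that it classifies to the same (kind, text) again."""
    if kind is Kind.STRING and text in {".", "?"}:
        return f"'{text}'"  # quote it so it is not mistaken for a placeholder
    return text


bare = classify(".")
quoted = classify("'.'")
```

Without the recorded kind, both tokens would collapse to the same text and the emitted file would silently change meaning.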

Canonical caseless names. Block names, save frame names, and tag names are stored in Unicode canonical caseless form (NFC(casefold(NFD(x)))). Lookups are automatically casefolded: cif["ABC"] finds a block stored as "abc".
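
The normalisation formula can be reproduced with the standard library alone. This is a minimal sketch of NFC(casefold(NFD(x))) using unicodedata; the helper name canonical_caseless and the blocks dictionary are illustrative assumptions, not part of cifflow's API.

```python
import unicodedata


def canonical_caseless(name: str) -> str:
    """Return NFC(casefold(NFD(name)))."""
    decomposed = unicodedata.normalize("NFD", name)
    return unicodedata.normalize("NFC", decomposed.casefold())


# Store under the canonical key, then look up with any spelling of the name.
blocks = {canonical_caseless("ABC"): "block payload"}
found = blocks[canonical_caseless("aBc")]

# Precomposed and decomposed spellings of the same name share one key.
composed = canonical_caseless("\u00c9")      # É as a single code point
decomposed = canonical_caseless("E\u0301")   # E followed by a combining acute
```

Casefolding is stronger than lowercasing (for example, ß folds to ss), which is why stored names can differ from a simple lower() of the input.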

Streaming parser. The parser is event-driven. CIF source is consumed in a single pass; the IR accumulates events incrementally. The Rust extension provides high-throughput Arrow output without any Python file objects.
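
The single-pass, event-driven shape can be sketched with a generator feeding an accumulator. This is an illustrative toy only: the (kind, name, value) event tuple, events, and accumulate are hypothetical, and the toy recognises nothing beyond data_ headers and one-line items.

```python
from typing import Iterable, Iterator, Tuple

Event = Tuple[str, str, str]  # (kind, name, value); hypothetical event shape


def events(lines: Iterable[str]) -> Iterator[Event]:
    """Turn source lines into events lazily, one line at a time."""
    for line in lines:
        line = line.strip()
        if line.lower().startswith("data_"):
            yield ("block", line[5:], "")
        elif line.startswith("_"):
            tag, _, value = line.partition(" ")
            yield ("item", tag, value.strip())


def accumulate(stream: Iterable[Event]) -> dict:
    """Build a nested model incrementally as events arrive."""
    model: dict = {}
    current = None
    for kind, name, value in stream:
        if kind == "block":
            current = model.setdefault(name, {})
        elif kind == "item" and current is not None:
            current.setdefault(name, []).append(value)
    return model


# An iterator can only be read once, so this demonstrates a true single pass.
source = iter(["data_one", "_a.x 1", "data_two", "_a.x 2", "_a.x 3"])
model = accumulate(events(source))
```

Because the accumulator never asks the source to rewind, the same structure works for input that is too large to hold as a token list.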


Installation

pip install cifflow

duckdb and pyarrow are installed automatically.

To install from source (includes the Rust extension):

git clone https://github.com/rowlesmr/cifflow.git
cd cifflow
pip install -e ".[dev]"
maturin develop

Quick start

Parse a CIF file

from cifflow import build

with open('structure.cif', encoding='utf-8') as fh:
    text = fh.read()
cif, errors = build(text)   # never raises; errors is a list[ParseError]

for block_name in cif.blocks:          # block names are stored in canonical caseless form
    block = cif[block_name]
    print(f'{block_name}: {len(block.tags)} tags, {len(block.loops)} loops')

The best way to resolve errors is to inspect the errors list, edit the file accordingly, and parse it again. cifflow makes no assumptions about how to correct errors automatically.

Full pipeline: dictionary → DuckDB → CIF

import pathlib
from cifflow import (
    DictionaryLoader, directory_resolver,
    save_dictionary, load_dictionary,
    generate_schema,
    build, ingest, emit, EmitMode,
)
from cifflow.types import CifVersion

# 1. Load dictionary (with JSON cache to avoid re-parsing on every run)
cache = pathlib.Path('cif_pow_cache.json')
resolver = directory_resolver('data/dictionaries')
if cache.exists():
    dictionary = load_dictionary(cache)
else:
    dictionary = DictionaryLoader(resolver=resolver).load(
        pathlib.Path('data/dictionaries/cif_pow.dic').read_text(encoding='utf-8'))
    save_dictionary(dictionary, cache)

# 2. Derive schema
schema = generate_schema(dictionary)

# 3. Parse CIF
cif, errors = build(pathlib.Path('all_the_data.cif').read_text(encoding='utf-8'))

# 3.5 (Optional) Edit the CIF to fix errors or alter content.
#     CifWriter must also be imported for this step.
# writer = CifWriter(cif.version, cif)
# for block in writer.blocks:
#     ...  # make alterations here
# cif = writer.build()  # raises ValueError if any errors are present

# 4. Ingest into an in-memory DuckDB database
#    Pass a file path string to persist: ingest(cif, 'output.db', schema=schema)
conn, warnings = ingest(cif, schema=schema)

# 5. Emit CIF
output = emit(conn, schema, mode=EmitMode.ORIGINAL, version=CifVersion.CIF_2_0)
pathlib.Path('output.cif').write_text(output, encoding='utf-8')

See example_workflow.py in the repository root for a fully annotated end-to-end demonstration covering all four emission modes, type-cast export, and fidelity checking.

The full API reference is at rowlesmr.github.io/cifflow.


Architecture

Parser → Event Stream → IR → Dictionary-aware Mapping → DuckDB → Output/API

| Layer | Responsibility |
|---|---|
| Lexer | Tokenisation, ValueType assignment |
| Parser | Token sequence interpretation, error recovery, event emission |
| IR (CIFModel) | Event accumulation, loop validation, multiline text transformation |
| Dictionary | DDLm parsing, schema derivation |
| DuckDB | Persistent storage: structured tables when a dictionary is present, fallback tier otherwise |
| Output | Valid CIF regeneration; Python/NumPy/pandas API surface |

Layer responsibilities are strictly separated. The parser does not know about the dictionary. The dictionary does not know about the IR. The output layer only reads from DuckDB.


Status

All stages are complete and tested.

| Stage | Feature |
|---|---|
| 1–2 | CIF 1.1 and 2.0 parser + IR (CIF model) |
| 3 | DDLm dictionary loading (_import.get, alias resolution, deprecation) |
| 4 | DuckDB schema generation (Set/Loop → tables, PKs, FKs, bridge columns, fallback tier) |
| 5 | DuckDB ingestion: structured tables + fallback tier; FK propagation; error recovery; canonical caseless name matching |
| 6 | CIF emission (ORIGINAL, GROUPED, ONE_BLOCK, ALL_BLOCKS); pretty-print; line-length enforcement; decimal alignment; schema visualisation; programmatic CifFile construction (CifWriter); cleaning parser artefacts (clean); type-cast export (convert_database); fidelity checking (check_fidelity); validation (validate) |

Development

Run the fast test suite (excludes tests that load large real-world CIF files):

.venv/Scripts/python.exe -m pytest -m "not slow"

Run the full suite including slow tests:

.venv/Scripts/python.exe -m pytest

After modifying the Rust extension, recompile before running Python tests:

.venv/Scripts/maturin develop

License

Apache 2.0. See LICENSE.

The bundled JavaScript files (viz.js 2.1.2 and svg-pan-zoom 3.6.1) used by visualise_schema_html are MIT-licensed. Licence notices are in src/cifflow/dictionary/js/LICENSES.txt.


