cifflow

Parse, store, validate, and emit Crystallographic Information Files (CIF).

Python ≥ 3.10 · Apache 2.0 · v0.1.4 · PyPI


What it does

  • Parses CIF 1.1 and CIF 2.0 files, including all string types (triple-quoted, multiline text fields, embedded quotes) and save frames
  • Loads DDLm dictionaries with full _import.get resolution, producing a typed schema
  • Focused on multi-block powder CIF files
  • Ingests parsed CIF data into DuckDB using the dictionary-derived schema: one table per category, foreign keys enforced, unknown tags routed to a fallback tier
  • Emits valid CIF from a populated database in four modes: ORIGINAL, GROUPED, ONE_BLOCK, ALL_BLOCKS
  • Trusts the user: if you pass in multiple blocks, the program assumes they all belong together and that, barring key-value clashes, they can be interpreted as a single database/experiment
  • Constructs CifFile objects programmatically from Python values (CifWriter), and performs arbitrary edits: add/remove/rename tags, loops, blocks, and save frames
  • Removes common parse-time artefacts automatically (clean): orphan error tags, duplicate blocks/save frames/tags, loop padding; for anything beyond these automatic fixes, use CifWriter
  • Visualises a schema as a Graphviz DOT string or a self-contained interactive HTML file
  • Returns data as Apache Arrow RecordBatch objects directly from the Rust parser (build_arrow, build_arrow_file)
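The "trusts the user" behaviour in the list above can be pictured with a small sketch. This is not cifflow's implementation: `merge_blocks` and its data layout (each block as a dict mapping a key value to a row dict) are hypothetical, chosen only to illustrate what a key-value clash means when several blocks are read as one dataset.

```python
def merge_blocks(blocks):
    """Merge per-block rows into one table keyed by primary-key value.

    `blocks` is a list of dicts of the form {key_value: row_dict}
    (a hypothetical layout, not cifflow's internal model).
    Returns (merged, clashes); a clash is a key seen with differing rows.
    """
    merged = {}
    clashes = []
    for index, block in enumerate(blocks):
        for key, row in block.items():
            if key in merged and merged[key] != row:
                clashes.append((index, key))   # same key, different data
                continue                       # keep the first row seen
            merged[key] = row
    return merged, clashes


blocks = [
    {"phase_1": {"name": "corundum"}},
    {"phase_2": {"name": "quartz"}},
    {"phase_1": {"name": "rutile"}},   # clashes with the first block
]
merged, clashes = merge_blocks(blocks)
print(sorted(merged))   # ['phase_1', 'phase_2']
print(clashes)          # [(2, 'phase_1')]
```

Blocks that use distinct key values combine cleanly; only a repeated key with different content is reported.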

Key properties

Error-tolerant. The parser never raises on malformed input. Every structural problem produces an explicit error event; parsing continues and all recoverable data is preserved.

No silent data loss. Duplicate tag values are preserved. Tags not mapped by the dictionary go to a fallback table, not a discard pile.
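As a rough illustration of the "fallback table, not a discard pile" idea, the sketch below routes tags into per-category lists and a fallback list. It uses plain Python containers in place of DuckDB tables, and `route_tags` and `known_categories` are hypothetical names; the real routing is driven by the DDLm dictionary.

```python
from collections import defaultdict


def route_tags(pairs, known_categories):
    """Split (tag, value) pairs into per-category rows and a fallback list."""
    structured = defaultdict(list)
    fallback = []
    for tag, value in pairs:
        # Toy rule: the category is the text between the leading "_" and the first "."
        category = tag.lstrip("_").split(".", 1)[0]
        if category in known_categories:
            structured[category].append((tag, value))
        else:
            fallback.append((tag, value))   # unknown tag: kept, not dropped
    return dict(structured), fallback


pairs = [
    ("_cell.length_a", "5.43"),
    ("_cell.length_a", "5.44"),      # duplicate tag: both values are kept
    ("_my_local.note", "hello"),     # not known to the dictionary
]
structured, fallback = route_tags(pairs, known_categories={"cell"})
kept = sum(len(rows) for rows in structured.values()) + len(fallback)
print(kept == len(pairs))   # True
```

Every input pair ends up in exactly one of the two outputs, so the count of stored values always equals the count of parsed values.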

Round-trip fidelity. For well-formed input, emitted CIF re-parses to the same data. All values are stored and emitted as raw strings; ValueType provenance (placeholder . and ? vs quoted equivalents) is preserved throughout.
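To see why provenance matters, consider a bare `?` (a placeholder for an unknown value) versus a quoted `'?'` (a literal question mark). The sketch below is a toy, not cifflow's lexer: `ValueKind`, `classify`, and `emit_token` are hypothetical stand-ins that handle only single-line tokens.

```python
from enum import Enum


class ValueKind(Enum):   # hypothetical stand-in, not cifflow's ValueType
    PLACEHOLDER = "placeholder"
    QUOTED = "quoted"
    BARE = "bare"


def classify(token):
    """Return (kind, raw_string) for one single-line token."""
    if token in (".", "?"):
        return ValueKind.PLACEHOLDER, token
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "'\"":
        return ValueKind.QUOTED, token[1:-1]   # store the raw string without delimiters
    return ValueKind.BARE, token


def emit_token(kind, raw):
    """Write a value back out, re-quoting only what was quoted on input."""
    return f"'{raw}'" if kind is ValueKind.QUOTED else raw


for token in ("?", "'?'", "5.43"):
    kind, raw = classify(token)
    print(token, kind.name, emit_token(kind, raw))
```

Both `?` and `'?'` store the same raw string, so without the kind recorded alongside it the emitter could not tell which form to write.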

Canonical caseless names. Block names, save frame names, and tag names are stored in Unicode canonical caseless form (NFC(casefold(NFD(x)))). Lookups are automatically casefolded: cif["ABC"] finds a block stored as "abc".
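The formula NFC(casefold(NFD(x))) can be reproduced with the standard library alone. The helper name `canonical_caseless` is an assumption made for this sketch; cifflow may expose the operation under a different name or not at all.

```python
import unicodedata


def canonical_caseless(name: str) -> str:
    """Return NFC(casefold(NFD(name)))."""
    decomposed = unicodedata.normalize("NFD", name)
    return unicodedata.normalize("NFC", decomposed.casefold())


print(canonical_caseless("ABC"))                                          # abc
print(canonical_caseless("\u00c9") == canonical_caseless("e\u0301"))      # True
```

The second line compares a precomposed "É" with "e" followed by a combining acute accent: decomposing first, then casefolding, then recomposing makes the two spellings compare equal.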

Streaming parser. The parser is event-driven. CIF source is consumed in a single pass; the IR accumulates events incrementally. The Rust extension provides high-throughput Arrow output without any Python file objects.
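The event-driven, error-tolerant design can be sketched with a generator that emits events in one pass and an accumulator that consumes them. This is a toy with a deliberately tiny line-oriented grammar (a `data_` header or a `_tag value` pair per line), not the CIF grammar and not cifflow's parser; `parse_events` and `accumulate` are hypothetical names.

```python
def parse_events(lines):
    """Yield events from an iterable of lines in a single pass (toy grammar)."""
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue                                   # blank line or comment
        if stripped.lower().startswith("data_"):
            yield ("block", number, stripped[5:])
        elif stripped.startswith("_"):
            parts = stripped.split(None, 1)
            if len(parts) == 2:
                yield ("tag", number, parts[0], parts[1])
            else:
                yield ("error", number, f"tag {parts[0]} has no value")
        else:
            yield ("error", number, f"unexpected text: {stripped!r}")


def accumulate(events):
    """Build {block: [(tag, value), ...]} plus an error list; never raises."""
    blocks, errors, current = {}, [], None
    for event in events:
        kind, number = event[0], event[1]
        if kind == "block":
            current = blocks.setdefault(event[2], [])
        elif kind == "tag" and current is not None:
            current.append((event[2], event[3]))
        elif kind == "tag":
            errors.append((number, "tag outside any data block"))
        else:
            errors.append((number, event[2]))
    return blocks, errors


source = """data_demo
_cell.length_a 5.43
stray text
_cell.length_b
_cell.length_c 5.45
"""
blocks, errors = accumulate(parse_events(source.splitlines()))
print(blocks)
print(errors)
```

The two malformed lines become error entries carrying their line numbers, while the valid tags before and after them are still collected.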


Installation

pip install cifflow

duckdb and pyarrow are installed automatically.

To install from source (includes the Rust extension):

git clone https://github.com/rowlesmr/cifflow.git
cd cifflow
pip install -e ".[dev]"
maturin develop

Quick start

Parse a CIF file

from cifflow import build

with open('structure.cif', encoding='utf-8') as f:   # close the file promptly
    text = f.read()
cif, errors = build(text)   # never raises; errors is a list[ParseError]

for block_name in cif.blocks:          # block names are in canonical caseless form
    block = cif[block_name]
    print(f'{block_name}: {len(block.tags)} tags, {len(block.loops)} loops')

The best way to resolve errors is to inspect the list of errors, edit the file accordingly, and try again. No assumptions are made about how to correct errors automatically.

Full pipeline: dictionary → DuckDB → CIF

import pathlib
from cifflow import (
    DictionaryLoader, directory_resolver,
    save_dictionary, load_dictionary,
    generate_schema,
    build, ingest, emit, EmitMode,
)
from cifflow.types import CifVersion

# 1. Load dictionary (with JSON cache to avoid re-parsing on every run)
cache = pathlib.Path('cif_pow_cache.json')
resolver = directory_resolver('data/dictionaries')
if cache.exists():
    dictionary = load_dictionary(cache)
else:
    dictionary = DictionaryLoader(resolver=resolver).load(
        pathlib.Path('data/dictionaries/cif_pow.dic').read_text(encoding='utf-8'))
    save_dictionary(dictionary, cache)

# 2. Derive schema
schema = generate_schema(dictionary)

# 3. Parse CIF
cif, errors = build(pathlib.Path('all_the_data.cif').read_text(encoding='utf-8'))

# 4. Ingest into an in-memory DuckDB database
#    Pass a file path string to persist: ingest(cif, 'output.db', schema=schema)
conn, warnings = ingest(cif, schema=schema)

# 5. Emit CIF
output = emit(conn, schema, mode=EmitMode.ORIGINAL, version=CifVersion.CIF_2_0)
pathlib.Path('output.cif').write_text(output, encoding='utf-8')

See example_workflow.py in the repository root for a fully annotated end-to-end demonstration covering all four emission modes, type-cast export, and fidelity checking.

The full API reference is in docs/api.md.


Architecture

Parser → Event Stream → IR → Dictionary-aware Mapping → DuckDB → Output/API

| Layer | Responsibility |
| --- | --- |
| Lexer | Tokenisation, ValueType assignment |
| Parser | Token sequence interpretation, error recovery, event emission |
| IR (CIFModel) | Event accumulation, loop validation, multiline text transformation |
| Dictionary | DDLm parsing, schema derivation |
| DuckDB | Persistent storage: structured tables when a dictionary is present, fallback tier otherwise |
| Output | Valid CIF regeneration; Python/NumPy/pandas API surface |

Layer responsibilities are strictly separated. The parser does not know about the dictionary. The dictionary does not know about the IR. The output layer only reads from DuckDB.


Status

All stages are complete and tested.

| Stage | Feature |
| --- | --- |
| 1–2 | CIF 1.1 and 2.0 parser + IR (CIF model) |
| 3 | DDLm dictionary loading (_import.get, alias resolution, deprecation) |
| 4 | DuckDB schema generation (Set/Loop → tables, PKs, FKs, bridge columns, fallback tier) |
| 5 | DuckDB ingestion: structured tables + fallback tier; FK propagation; error recovery; canonical caseless name matching |
| 6 | CIF emission (ORIGINAL, GROUPED, ONE_BLOCK, ALL_BLOCKS); pretty-print; line-length enforcement; decimal alignment; schema visualisation; programmatic CifFile construction (CifWriter); cleaning parser artefacts (clean); type-cast export (convert_database); fidelity checking (check_fidelity); validation (validate) |

Development

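The commands in this section use Windows virtual-environment paths; on Linux or macOS, replace .venv/Scripts/python.exe with .venv/bin/python and .venv/Scripts/maturin with .venv/bin/maturin.

Run the fast test suite (excludes tests that load large real-world CIF files):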

.venv/Scripts/python.exe -m pytest -m "not slow"

Run the full suite including slow tests:

.venv/Scripts/python.exe -m pytest

After modifying the Rust extension, recompile before running Python tests:

.venv/Scripts/maturin develop

License

Apache 2.0. See LICENSE.

The bundled JavaScript files (viz.js 2.1.2 and svg-pan-zoom 3.6.1) used by visualise_schema_html are MIT-licensed. Licence notices are in src/cifflow/dictionary/js/LICENSES.txt.
