Python library for high-throughput .cif analysis

cifkit

Supported Python versions: 3.10, 3.11, 3.12. License: MIT.

The documentation is available here: https://bobleesj.github.io/cifkit

cifkit is designed to provide a set of well-organized and fully-tested utility functions for handling large datasets, on the order of tens of thousands, of .cif files.

The current codebase and documentation are actively being improved as of July 8, 2024.

Motivation

In high-throughput analysis using .cif files, the research project has identified the following needs:

  • Preprocess files at once: .cif files obtained from databases are often ill-formatted. We need a tool to standardize, preprocess, and filter out bad files. We also need to copy, move, and sort .cif files based on specific attributes.

  • Visualize coordination geometry: We are interested in determining the coordination geometry and the best site in the supercell for analysis in a high-throughput manner. We need to identify the best site for each site label.

  • Visualize distribution of files: We want to easily identify and categorize a distribution of underlying .cif files based on supercell size, tags, coordination numbers, elements, etc.

Quotes

Here is a quote illustrating how cifkit addresses one of the challenges mentioned above.

"I am building an X-Ray diffraction analysis (XRD) pattern visualization script for my lab using pymatgen. I feel like cifkit integrated really well into my existing stable of libraries, while surpassing some alternatives in preprocessing and parsing. For example, it was often unclear at what stage an error occurred—whether during pre-processing with CifParser, or XRD plot generation with diffraction.core in pymatgen. The pre-processing logic in cifkit was communicated clearly, both in documentation and in actual outputs, allowing me to catch errors in my data before it was used in my visualizations. I now use cifkit by default for processing CIFs before they pass through the rest of my pipeline." - Alex Vtorov

Overview

Designed for individuals with minimal programming experience, cifkit provides two primary objects: Cif and CifEnsemble.

Cif

Cif is initialized with a .cif file path. It parses the .cif file, generates supercells, and computes nearest neighbors. It also determines coordination numbers using four different methods and generates polyhedrons for each site.

from cifkit import Cif
from cifkit import Example

# Initialize with the example file provided
cif = Cif(Example.Er10Co9In20_file_path)

# Print attributes
print("File name:", cif.file_name)
print("Formula:", cif.formula)
print("Unique element:", cif.unique_elements)
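
To make the supercell and nearest-neighbor step more concrete, here is a minimal standard-library sketch. It is not cifkit's implementation: the function name `nearest_neighbors`, the orthorhombic-cell assumption, the 3 x 3 x 3 block of neighboring cells, and the toy coordinates are all assumptions made for illustration.

```python
import itertools
import math


def nearest_neighbors(frac_coords, cell_lengths, site_index, cutoff):
    """Return sorted (distance, neighbor_index, shift) tuples within cutoff.

    Hypothetical helper: assumes an orthorhombic cell (all angles 90 degrees)
    so that fractional coordinates scale directly by the cell lengths.
    """
    a, b, c = cell_lengths
    x0, y0, z0 = frac_coords[site_index]
    neighbors = []
    # Visit the 3 x 3 x 3 block of cells around the original cell
    for shift in itertools.product((-1, 0, 1), repeat=3):
        for j, (x, y, z) in enumerate(frac_coords):
            if j == site_index and shift == (0, 0, 0):
                continue  # skip the site itself
            dx = (x + shift[0] - x0) * a
            dy = (y + shift[1] - y0) * b
            dz = (z + shift[2] - z0) * c
            dist = math.sqrt(dx * dx + dy * dy + dz * dz)
            if dist <= cutoff:
                neighbors.append((dist, j, shift))
    return sorted(neighbors)


# Toy cubic cell (made-up data): one atom at the corner, one at the body center
coords = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)]
neighbors = nearest_neighbors(coords, (4.0, 4.0, 4.0), site_index=0, cutoff=3.5)
print(len(neighbors), round(neighbors[0][0], 3))  # prints: 8 3.464
```

The corner atom finds eight body-center neighbors, each in a different periodic image, which is why neighbors have to be searched beyond the original cell.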

CifEnsemble

CifEnsemble is initialized with a folder path containing .cif files. It identifies unique attributes, such as space groups and elements, across the .cif files, and it moves and copies files based on these attributes. It also generates histograms for all attributes.

from cifkit import CifEnsemble
from cifkit import Example

# Initialize
ensemble = CifEnsemble(Example.ErCoIn_folder_path)

# Get unique attributes
ensemble.unique_formulas
ensemble.unique_structures
ensemble.unique_elements
ensemble.unique_space_group_names
ensemble.unique_space_group_numbers
ensemble.unique_tags
ensemble.minimum_distances
ensemble.supercell_atom_counts
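
The idea behind collecting unique attributes and counting files per attribute can be sketched with the standard library alone. This is an illustration and not how CifEnsemble is implemented: the `records` list, its keys, and the helper names `unique_values`, `group_files`, and `count_by` are hypothetical, and the values are made up.

```python
from collections import Counter, defaultdict

# Hypothetical per-file records; in practice these values would come from parsed .cif files
records = [
    {"file": "1.cif", "formula": "AB", "space_group_number": 221},
    {"file": "2.cif", "formula": "AB", "space_group_number": 225},
    {"file": "3.cif", "formula": "AB2", "space_group_number": 225},
    {"file": "4.cif", "formula": "AB", "space_group_number": 221},
]


def unique_values(records, key):
    """Collect the distinct values of one attribute across all files."""
    return {record[key] for record in records}


def group_files(records, key):
    """Map each attribute value to the list of files that carry it."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["file"])
    return dict(groups)


def count_by(records, key):
    """Count files per attribute value; these counts are what a histogram would plot."""
    return Counter(record[key] for record in records)


print(unique_values(records, "formula"))
print(group_files(records, "space_group_number"))
print(count_by(records, "formula"))
```

The grouping dictionary is the kind of structure a move or copy step could iterate over, one destination folder per attribute value.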

Tutorial and documentation

Example .cif files are bundled with the package and can be imported directly. The documentation page is available at https://bobleesj.github.io/cifkit.

Installation

To install:

pip install cifkit

You may need to download other dependencies:

pip install cifkit pyvista gemmi

gemmi is used for parsing .cif files. pyvista is used for plotting polyhedrons.

Please check the pyproject.toml file for the full list of dependencies.
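
If you are unsure whether the packages above are available in your environment, a short check like the following may help. It is a sketch that only uses the standard library; the helper name `missing_packages` is hypothetical, and the package names are the ones mentioned above.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names that cannot be found as importable top-level packages."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(["cifkit", "gemmi", "pyvista"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All packages are importable.")
```

Any name reported as missing can then be installed with pip as shown above.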

Testing

To run the tests locally:

# Install all dependencies in editable mode
pip install -e .

# Run test
pytest

Visuals

Polyhedron

You can visualize the polyhedron generated for each atomic site based on its coordination geometry. In our research, the goal is to relate the structure and coordination number to physical properties.

from cifkit import Cif

# Example usage
cif = Cif("your_cif_file_path")
site_labels = cif.site_labels

# Loop through each site
for label in site_labels:
    # Display each polyhedron; a file is saved for each
    cif.plot_polyhedron(label, is_displayed=True)

Polyhedron generation
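
A polyhedron depends on which neighbors are counted as coordinating the central site. One common heuristic, sketched below, is to sort the neighbor distances and cut at the largest gap. Whether this matches any of cifkit's coordination number methods is an assumption; the function name `coordination_number_by_max_gap` and the distances are made up for illustration.

```python
def coordination_number_by_max_gap(distances):
    """Count the neighbors that come before the largest jump in sorted distances."""
    ordered = sorted(distances)
    if len(ordered) < 2:
        return len(ordered)
    # Gap between each neighbor and the next one further out
    gaps = [ordered[i + 1] - ordered[i] for i in range(len(ordered) - 1)]
    # Index of the largest gap; ties resolve to the innermost gap
    largest = max(range(len(gaps)), key=gaps.__getitem__)
    return largest + 1


# Made-up distances (in angstroms) from one central site to its neighbors
distances = [2.80, 2.81, 2.83, 2.85, 2.86, 2.90, 4.10, 4.15, 4.30]
print(coordination_number_by_max_gap(distances))  # prints: 6
```

Here the first six neighbors sit within 0.1 Å of each other and the seventh is 1.2 Å further out, so the heuristic assigns a coordination number of 6, and those six neighbors would form the vertices of the polyhedron.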

Histograms

You can use CifEnsemble to visualize the distribution of file counts by specific attributes, such as formula and structure. The full set of features is described in the documentation.

By formulas:

Histogram

By structures:

Histogram

Open-source projects using cifkit

  • CIF Bond Analyzer (CBA) - extract and visualize bonding patterns - DOI | GitHub
  • CIF Cleaner - move, copy .cif files based on attributes - GitHub
  • Structure Analyzer/Featurizer (SAF) - extract physics-based features from .cif files - GitHub

How to ask for help

cifkit is also designed for experimental materials scientists and chemists.

  • If you have any issues or questions, please feel free to reach out or leave an issue.

How to contribute

Here is how you can contribute to the cifkit project if you found it helpful:

  • Star the repository on GitHub and recommend it to your colleagues who might find cifkit helpful as well.
  • Fork the repository and consider contributing changes via a pull request.
  • If you have any suggestions or need further clarification on how to use cifkit, please feel free to reach out to Sangjoon Bob Lee (@bobleesj).

Contributors

cifkit has been greatly enhanced thanks to the contributions from a diverse group of researchers:

  • Anton Oliynyk: original ideation with .cif files
  • Alex Vtorov: tool recommendation for polyhedron visualization
  • Danila Shiryaev: testing as beta user
  • Fabian Zills (@PythonFZ): suggested tooling improvements
  • Emil Jaffal (@EmilJaffal): initial testing and bug report
  • Nikhil Kumar Barua: initial testing and bug report
  • Nishant Yadav (@sethisiddha1998): initial testing and bug report
  • Siddha Sankalpa Sethi (@runzsh): initial testing and bug report

We welcome all forms of contributions from the community. Your ideas and improvements are valued and appreciated.

Citation

Please consider citing cifkit if it has been useful for your research:
