Python library for high-throughput .cif analysis

Project description

cifkit

cifkit provides a set of well-organized and fully-tested utility functions for handling large sets of .cif files, on the order of tens of thousands.

The codebase and documentation are under active improvement (as of July 3, 2024).

Motivation

Since Summer 2023, I have been building interactive tools that analyze .cif files. I have noticed the following needs:

  • Format files at once: .cif files obtained from databases are often ill-formatted. We need a tool to standardize, preprocess, and filter out bad files. I also need to copy, move, and sort .cif files based on specific attributes.
  • Visualize coordination geometry: We are interested in determining the coordination geometry and the best site in the supercell for analysis in a high-throughput manner. We need to identify the best site for each site label.
  • Visualize distribution of files: We want to easily identify and categorize a distribution of underlying .cif files based on supercell size, tags, coordination numbers, elements, etc.

Overview

Designed for people with minimal programming experience, cifkit provides two primary objects: Cif and CifEnsemble.

Cif

Cif is initialized with a .cif file path. It parses the .cif file, preprocesses ill-formatted files, generates supercells, and computes nearest neighbors. It also determines coordination numbers using four different methods and generates polyhedrons for each site.

from cifkit import Cif
from cifkit import Example

# Initialize with the example file provided
cif = Cif(Example.Er10Co9In20_file_path)

# Print attributes
print("File name:", cif.file_name)
print("Formula:", cif.formula)
print("Unique element:", cif.unique_elements)
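
To make the idea of "generating a supercell and computing nearest neighbors" concrete, here is a minimal standard-library sketch. It is not cifkit's implementation: it assumes an orthogonal unit cell, a 3x3x3 replication, and a made-up two-site structure, and the helper names (frac_to_cart, build_supercell, nearest_neighbors) are hypothetical.

```python
import itertools
import math


def frac_to_cart(frac, lengths):
    """Convert fractional to Cartesian coordinates (orthogonal cell assumed)."""
    return tuple(f * a for f, a in zip(frac, lengths))


def build_supercell(sites, shifts=(-1, 0, 1)):
    """Replicate (label, fractional_coords) sites over the neighboring cells."""
    supercell = []
    for label, frac in sites:
        for shift in itertools.product(shifts, repeat=3):
            shifted = tuple(f + s for f, s in zip(frac, shift))
            supercell.append((label, shifted))
    return supercell


def nearest_neighbors(center_frac, supercell, lengths, k=3):
    """Return the k closest (distance, label) pairs to the center site."""
    center = frac_to_cart(center_frac, lengths)
    distances = []
    for label, frac in supercell:
        d = math.dist(center, frac_to_cart(frac, lengths))
        if d > 1e-8:  # skip the center atom itself
            distances.append((round(d, 3), label))
    return sorted(distances)[:k]


# Hypothetical cubic cell (a = 4.0 angstrom) with one site at the corner
# and one at the body center; not taken from any real .cif file.
lengths = (4.0, 4.0, 4.0)
sites = [("A", (0.0, 0.0, 0.0)), ("B", (0.5, 0.5, 0.5))]

supercell = build_supercell(sites)
neighbors = nearest_neighbors((0.0, 0.0, 0.0), supercell, lengths, k=8)
print(neighbors)
```

In this toy cell the corner site is surrounded by eight body-center sites at the same distance, which is why replicating the cell is needed before counting neighbors: most of those eight atoms sit in adjacent cells.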

CifEnsemble

CifEnsemble is initialized with a folder path containing .cif files. It identifies unique attributes, such as space groups and elements, across the .cif files, and it can move and copy files based on these attributes. It also generates histograms for all attributes.

from cifkit import CifEnsemble
from cifkit import Example

# Initialize
ensemble = CifEnsemble(Example.ErCoIn_folder_path)

# Get unique attributes
ensemble.unique_formulas
ensemble.unique_structures
ensemble.unique_elements
ensemble.unique_space_group_names
ensemble.unique_space_group_numbers
ensemble.unique_tags
ensemble.minimum_distances
ensemble.supercell_atom_counts
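
The "copy files based on attributes" idea can be sketched with the standard library alone. This is only an illustration under stated assumptions, not how CifEnsemble does it: the two helper functions are hypothetical, the attribute is taken to be a space group number already known for each file, and the placeholder files are created in a temporary folder.

```python
import shutil
import tempfile
from collections import defaultdict
from pathlib import Path


def group_paths_by_attribute(attributes_by_path):
    """Map each attribute value to the sorted list of files that have it."""
    groups = defaultdict(list)
    for path, value in attributes_by_path.items():
        groups[value].append(path)
    return {value: sorted(paths) for value, paths in groups.items()}


def copy_into_attribute_folders(groups, dest_dir):
    """Copy each file into dest_dir/<attribute value>/ and return the new paths."""
    copied = []
    for value, paths in groups.items():
        target_dir = Path(dest_dir) / str(value)
        target_dir.mkdir(parents=True, exist_ok=True)
        for path in paths:
            copied.append(Path(shutil.copy(path, target_dir)))
    return copied


# Demo with placeholder files; the space group numbers are made up.
with tempfile.TemporaryDirectory() as tmp:
    src_dir = Path(tmp) / "src"
    out_dir = Path(tmp) / "sorted"
    src_dir.mkdir()

    attributes_by_path = {}
    for name, number in [("a.cif", 191), ("b.cif", 191), ("c.cif", 139)]:
        path = src_dir / name
        path.write_text("data_placeholder\n")
        attributes_by_path[path] = number

    groups = group_paths_by_attribute(attributes_by_path)
    copied = copy_into_attribute_folders(groups, out_dir)
    copied_names = sorted(p.relative_to(out_dir).as_posix() for p in copied)

print(copied_names)
```

Space group numbers are used as folder names here because space group names can contain characters such as "/" that are not valid in a folder name.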

Tutorial and documentation

I provide example .cif files that can be easily imported, and you can visit the documentation page here.

Installation

To install cifkit:

pip install cifkit

You may need to download other dependencies:

pip install cifkit pyvista gemmi

gemmi is used for parsing .cif files. pyvista is used for plotting polyhedrons.
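
If you are unsure whether these packages are present in your environment, a quick check along the following lines may help. It uses only the Python standard library; the helper name missing_packages is hypothetical and not part of cifkit.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(["cifkit", "pyvista", "gemmi"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All packages found.")
```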

Visuals

Polyhedron

You can visualize the polyhedron generated from each atomic site based on its coordination geometry. In our research, the goal is to relate structure and coordination number to physical properties.

from cifkit import Cif

# Example usage
cif = Cif("your_cif_file_path")
site_labels = cif.site_labels

# Loop through each site
for label in site_labels:
    # Display each polyhedron; a file is saved for each
    cif.plot_polyhedron(label, is_displayed=True)

Polyhedron generation

Histograms

You can use CifEnsemble to visualize the distribution of file counts by a specific attribute, such as formula or structure. Learn all features from the documentation provided here.

By formulas:

Histogram

By structures:

Histogram
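
The counting behind such a histogram can be sketched in a few lines of standard-library Python. This is an illustration only, not cifkit's plotting code: the helper names are hypothetical and the list of formulas is made up rather than read from real files.

```python
from collections import Counter


def count_by_attribute(values):
    """Count occurrences, most frequent first, ties broken alphabetically."""
    counts = Counter(values)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def render_text_histogram(counts, bar="#"):
    """Render (label, count) pairs as one text bar per line."""
    width = max(len(str(label)) for label, _ in counts)
    return "\n".join(
        f"{str(label):<{width}} {bar * n} ({n})" for label, n in counts
    )


# Made-up formulas, one entry per hypothetical .cif file
formulas = ["ErCoIn", "Er3Co2In4", "ErCoIn", "Er10Co9In20", "ErCoIn", "Er3Co2In4"]

counts = count_by_attribute(formulas)
print(render_text_histogram(counts))
```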

Project using cifkit

  • CIF Bond Analyzer (CBA) - extract and visualize bonding patterns - DOI | GitHub

How to ask for help or contribute

cifkit is also designed for experimental materials scientists and chemists. If you encounter any issues or have questions, please feel free to reach out via the email listed on my GitHub profile. My goal is to ensure cifkit is accessible and easy to use for everyone.

Asking for Support

This is my first open-source project. If cifkit has been useful in your research, you could help me by taking 2-3 seconds to "star" this repository.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cifkit-0.31.tar.gz (46.8 kB view details)

Uploaded Source

Built Distribution

cifkit-0.31-py3-none-any.whl (78.3 kB view details)

Uploaded Python 3
