Python library for high-throughput .cif analysis
cifkit
The documentation is available here: https://bobleesj.github.io/cifkit
cifkit is designed to provide a set of well-organized and fully-tested utility
functions for handling large datasets, on the order of tens of thousands, of
.cif files.
The current codebase and documentation are actively being improved as of July 8, 2024.
Motivation
In high-throughput analysis using .cif files, the research project has
identified the following needs:
- Preprocess files at once: Sets of .cif files obtained from databases often include ill-formatted files. We need a tool to standardize, preprocess, and filter out bad files. We also need to copy, move, and sort .cif files based on specific attributes.
- Visualize coordination geometry: We are interested in determining the coordination geometry and the best site in the supercell for analysis in a high-throughput manner. We need to identify the best site for each site label.
- Visualize distribution of files: We want to easily identify and categorize a distribution of underlying .cif files based on supercell size, tags, coordination numbers, elements, etc.
Quotes
Here is a quote illustrating how cifkit addresses one of the challenges
mentioned above.
"I am building an X-Ray diffraction analysis (XRD) pattern visualization script for my lab using pymatgen. I feel like cifkit integrated really well into my existing stable of libraries, while surpassing some alternatives in preprocessing and parsing. For example, it was often unclear at what stage an error occurred, whether during pre-processing with CifParser, or XRD plot generation with diffraction.core in pymatgen. The pre-processing logic in cifkit was communicated clearly, both in documentation and in actual outputs, allowing me to catch errors in my data before it was used in my visualizations. I now use cifkit by default for processing CIFs before they pass through the rest of my pipeline." - Alex Vtorov
Overview
Designed for individuals with minimal programming experience, cifkit provides
two primary objects: Cif and CifEnsemble.
Cif
Cif is initialized with a .cif file path. It parses the .cif file,
generates supercells, and computes nearest neighbors. It also determines
coordination numbers using four different methods and generates polyhedrons for
each site.
from cifkit import Cif
from cifkit import Example
# Initialize with the example file provided
cif = Cif(Example.Er10Co9In20_file_path)
# Print attributes
print("File name:", cif.file_name)
print("Formula:", cif.formula)
print("Unique element:", cif.unique_elements)
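The four coordination number methods are not described in this overview. As a rough illustration of how a coordination number can be derived from nearest-neighbor distances, here is a minimal sketch of a generic "largest gap" heuristic: sort the distances, normalize by the shortest one, and count the neighbors that come before the biggest jump. The function name and the sample distances are hypothetical, and this is not presented as cifkit's implementation.

```python
def coordination_number_by_max_gap(distances):
    """Count the neighbors that precede the largest gap in normalized distances.

    Illustrative heuristic only; assumes all distances are positive.
    """
    if not distances:
        return 0
    ordered = sorted(distances)
    d_min = ordered[0]
    # Normalize every distance by the shortest one
    normalized = [d / d_min for d in ordered]
    # Gap between each consecutive pair of normalized distances
    gaps = [b - a for a, b in zip(normalized, normalized[1:])]
    if not gaps:
        # A single neighbor: nothing to compare against
        return 1
    # Neighbors up to and including the index of the largest gap are counted
    largest = max(range(len(gaps)), key=gaps.__getitem__)
    return largest + 1


# Hypothetical distances (in angstroms) from one site to its neighbors
example = [2.9, 2.9, 3.0, 3.0, 3.1, 3.1, 4.6, 4.7, 5.2]
print(coordination_number_by_max_gap(example))  # prints 6
```

In this made-up example the first six neighbors sit within about 7% of the shortest distance, and the jump to 4.6 is the largest gap, so six neighbors are counted.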
CifEnsemble
CifEnsemble is initialized with a folder path containing .cif files. It
identifies unique attributes, such as space groups and elements, across the
.cif files, and moves and copies files based on these attributes. It also
generates histograms for all attributes.
from cifkit import CifEnsemble
from cifkit import Example
# Initialize
ensemble = CifEnsemble(Example.ErCoIn_folder_path)
# Get unique attributes
ensemble.unique_formulas
ensemble.unique_structures
ensemble.unique_elements
ensemble.unique_space_group_names
ensemble.unique_space_group_numbers
ensemble.unique_tags
ensemble.minimum_distances
ensemble.supercell_atom_counts
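To make the idea of copying files based on an attribute more concrete, here is a minimal sketch that uses only the Python standard library. It does not use cifkit's own move/copy methods (see the documentation for those); the function name copy_files_by_attribute, the file names, and the file-to-attribute mapping are all hypothetical and stand in for whatever attribute you have extracted for each file.

```python
import shutil
import tempfile
from pathlib import Path


def copy_files_by_attribute(file_to_attribute, destination_root):
    """Copy each file into a subfolder named after its attribute value."""
    destination_root = Path(destination_root)
    copied = {}
    for file_path, attribute in file_to_attribute.items():
        target_dir = destination_root / str(attribute)
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / Path(file_path).name
        shutil.copy2(file_path, target)
        copied.setdefault(str(attribute), []).append(target)
    return copied


# Demo in a temporary folder with empty placeholder files (hypothetical names)
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source"
    source.mkdir()
    mapping = {}
    for name, space_group in [("a.cif", 139), ("b.cif", 139), ("c.cif", 62)]:
        path = source / name
        path.write_text("")
        mapping[path] = space_group

    result = copy_files_by_attribute(mapping, Path(tmp) / "sorted")
    for attribute, paths in result.items():
        print(attribute, [p.name for p in paths])
```

Copying rather than moving keeps the original folder intact, which is a safer default while you are still deciding how to group a dataset.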
Tutorial and documentation
You may use the example .cif files, which can be easily imported, and you can
visit the documentation page at https://bobleesj.github.io/cifkit.
Installation
To install:
pip install cifkit
You may need to install additional dependencies:
pip install cifkit pyvista gemmi
gemmi is used for parsing .cif files. pyvista is used for plotting
polyhedrons.
Please check the pyproject.toml file for the full list of dependencies.
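If you are unsure whether these packages are present in your environment, a quick check with the standard library can help before running an analysis. This is a minimal sketch; the helper name missing_packages is hypothetical and not part of cifkit.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the package names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(["cifkit", "gemmi", "pyvista"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All packages found.")
```

Any name reported as not installed can then be installed with pip as shown above.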
Testing
To run the tests locally:
# Install the package and its dependencies in editable mode
pip install -e .
# Run the tests
pytest
Visuals
Polyhedron
You can visualize the polyhedron generated from each atomic site based on the coordination number geometry. In our research, the goal is to map the structure and coordination number to the physical property.
from cifkit import Cif
# Example usage
cif = Cif("your_cif_file_path")
site_labels = cif.site_labels
# Loop through each site
for label in site_labels:
    # Display each polyhedron; a file is saved for each
    cif.plot_polyhedron(label, is_displayed=True)
Histograms
You can use CifEnsemble to visualize distributions of file counts based on
specific attributes. All features are described in the documentation at
https://bobleesj.github.io/cifkit.
[Figure: histogram of file counts by formula]
[Figure: histogram of file counts by structure]
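The counting behind such a histogram can be sketched with the standard library alone. The snippet below is only an illustration of tallying files per attribute value and printing a text bar chart; it is not cifkit's plotting code, the helper name count_by_attribute is hypothetical, and the list of formulas is made up for the example (in practice there would be one entry per .cif file).

```python
from collections import Counter


def count_by_attribute(values):
    """Return (value, count) pairs, most frequent first."""
    return Counter(values).most_common()


# Hypothetical list of formulas, one entry per file
formulas = [
    "ErCoIn5", "Er2CoIn8", "ErCoIn5",
    "Er10Co9In20", "ErCoIn5", "Er2CoIn8",
]

# Print a simple text histogram
for formula, count in count_by_attribute(formulas):
    print(f"{formula:<12} {'#' * count} ({count})")
```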
Open-source projects using cifkit
- CIF Bond Analyzer (CBA) - extract and visualize bonding patterns - DOI | GitHub
- CIF Cleaner - move, copy .cif files based on attributes - GitHub
- Structure Analyzer/Featurizer (SAF) - extract physics-based features from .cif files - GitHub
How to ask for help
cifkit is also designed for experimental materials scientists and chemists.
- If you have any issues or questions, please feel free to reach out or leave an issue.
How to contribute
Here is how you can contribute to the cifkit project if you found it helpful:
- Star the repository on GitHub and recommend it to your colleagues who might find cifkit helpful as well.
- Fork the repository and consider contributing changes via a pull request.
- If you have any suggestions or need further clarification on how to use cifkit, please feel free to reach out to Sangjoon Bob Lee (@bobleesj).
Contributors
cifkit has been greatly enhanced thanks to the contributions from a diverse
group of researchers:
- Anton Oliynyk: original ideation with .cif files
- Alex Vtorov: tool recommendation for polyhedron visualization
- Danila Shiryaev: testing as beta user
- Fabian Zills (@PythonFZ): suggested tooling improvements
- Emil Jaffal (@EmilJaffal): initial testing and bug report
- Nikhil Kumar Barua: initial testing and bug report
- Nishant Yadav (@sethisiddha1998): initial testing and bug report
- Siddha Sankalpa Sethi (@runzsh): initial testing and bug report
We welcome all forms of contributions from the community. Your ideas and improvements are valued and appreciated.
Citation
Please consider citing cifkit if it has been useful for your research:
- cifkit – Python package for high-throughput .cif analysis, https://doi.org/10.5281/zenodo.12784259