ECIF file format tools for Python.

ECIF

ECIF (Extended Crystallographic Information File) is a format that lets you store multiple crystal structures in a single .ecif file and attach additional properties to each structure, which is useful for scenarios such as machine learning datasets. It is inspired by SDF (structure-data files) and RDKit's PandasTools.

ECIFPandasTools

This Python module provides tools for converting between ECIF files and pandas dataframes. ECIF files store formatted crystal structure information, while pandas dataframes are a data structure for data analysis.

Installation

You can install the pyecif module with pip by running the following command in your terminal:

pip install pyecif

Features

  • WriteEcif(df, out, idName='ID', cifColName='CIF', properties=None): Writes a pandas dataframe to an ECIF file. Each row in the dataframe is converted into an ECIF block, and each block contains a CIF part plus any additional properties.

  • LoadEcif(ecif_file, idName='ID', cifColName='CIF'): Loads data from an ECIF file into a pandas dataframe. Each ECIF block is converted into a row in the dataframe.

  • CifBlock: A class for handling individual ECIF blocks. It provides methods for setting and getting properties, adding CIF lines, adding CIF data from a pymatgen structure, adding an entire block, retrieving the CIF text or the entire block, building a pymatgen structure from the CIF, and writing to a CIF file.

Usage

First, you need to have a pandas dataframe that contains some pymatgen Structure objects. Then, you can use the WriteEcif function to write this dataframe to an ECIF file. For example:

from pyecif import WriteEcif

# Assume you have a dataframe named df, which contains a column of Structure objects named 'CIF'
WriteEcif(df, 'output.ecif', cifColName='CIF', properties=df.columns)

Then, you can use the LoadEcif function to load data from the ECIF file into a new dataframe. For example:

from pyecif import LoadEcif

df = LoadEcif('output.ecif', cifColName='CIF')

Note that both of these functions accept some optional parameters for specifying the names of certain columns in the dataframe, as well as additional properties to be included in the ECIF file.
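
As a sketch of how the properties argument might be assembled, the helper below selects every column except the ID and structure columns and passes the rest to WriteEcif. The names property_columns and write_with_properties are hypothetical and not part of pyecif. It is also an assumption that WriteEcif accepts a plain list of column names for properties; the call shown earlier passes df.columns directly.

```python
def property_columns(columns, id_name="ID", cif_col_name="CIF"):
    """Return the column names that are neither the ID nor the structure column."""
    return [c for c in columns if c not in (id_name, cif_col_name)]


def write_with_properties(df, out_path, id_name="ID", cif_col_name="CIF"):
    """Write df to an ECIF file, storing all remaining columns as properties.

    Assumes pyecif is installed and that df holds structure objects in
    the column named by cif_col_name.
    """
    # Imported lazily so property_columns can be used without pyecif installed.
    from pyecif import WriteEcif

    props = property_columns(df.columns, id_name, cif_col_name)
    WriteEcif(df, out_path, idName=id_name, cifColName=cif_col_name, properties=props)
    return props
```

For a dataframe with the columns ID, exfoliation_en and CIF, property_columns returns ['exfoliation_en'], so only the exfoliation energy would be written as an extra property.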

Below is a snapshot of our data frame (df). It contains the fields ID, exfoliation energy (exfoliation_en) and crystal structure (CIF).

ID             exfoliation_en  CIF
mb-jdft2d-001  63.593833       <gemmi.SmallStructure: SrSbSe2F>
mb-jdft2d-002  134.86375       <gemmi.SmallStructure: AgI>
mb-jdft2d-003  43.114667       <gemmi.SmallStructure: Mg(ReO4)2>
mb-jdft2d-004  240.715488      <gemmi.SmallStructure: Si3H>
mb-jdft2d-005  67.442833       <gemmi.SmallStructure: CoO2>
...            ...             ...
mb-jdft2d-632  26.426545       <gemmi.SmallStructure: HgBr2>
mb-jdft2d-633  43.574286       <gemmi.SmallStructure: FeCl3>
mb-jdft2d-634  88.808659       <gemmi.SmallStructure: HoBrO>
mb-jdft2d-635  132.26525       <gemmi.SmallStructure: GaHO2>
mb-jdft2d-636  63.564333       <gemmi.SmallStructure: TaCoTe2>

By default, LoadEcif uses gemmi to load the structures. To load them with pymatgen instead, pass type='pymatgen':

from pyecif import LoadEcif

df = LoadEcif('output.ecif', cifColName='CIF', type='pymatgen')

To better understand the contents of the CIF field, the output below shows df['CIF'][0] when loaded with pymatgen. It is a pymatgen.core.Structure object that gives the lattice and the positions of the Hf, Si and Te sites in the crystal structure:

Structure Summary
Lattice
    abc : 3.66730534 3.66730534 27.311209
 angles : 90.0 90.0 90.0
 volume : 367.31195815130786
      A : 3.66730534 0.0 2.245576873063498e-16
      B : -2.245576873063498e-16 3.66730534 2.245576873063498e-16
      C : 0.0 0.0 27.311209
    pbc : True True True
PeriodicSite: Hf0 (Hf) (1.493, 3.327, 7.263) [0.4072, 0.9072, 0.2659]
PeriodicSite: Hf1 (Hf) (3.327, 1.493, 3.049) [0.9072, 0.4072, 0.1116]
PeriodicSite: Si2 (Si) (3.327, 3.327, 5.156) [0.9072, 0.9072, 0.1888]
PeriodicSite: Si3 (Si) (1.493, 1.493, 5.156) [0.4072, 0.4072, 0.1888]
PeriodicSite: Te4 (Te) (3.327, 1.493, 8.659) [0.9072, 0.4072, 0.3171]
PeriodicSite: Te5 (Te) (1.493, 3.327, 1.652) [0.4072, 0.9072, 0.06049]
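
The numbers in this summary can be cross-checked by hand. The sketch below recomputes the cell volume and the Cartesian position of site Hf0 from the lattice lengths and fractional coordinates printed above. It assumes the cell is exactly orthogonal, which means ignoring the off-diagonal lattice terms of order 1e-16, and the helper name frac_to_cart is hypothetical.

```python
# Values copied from the Structure summary above (lengths in angstrom).
a, b, c = 3.66730534, 3.66730534, 27.311209

# For an orthogonal cell (all angles 90 degrees) the volume is a * b * c.
volume = a * b * c


def frac_to_cart(frac):
    """Convert fractional to Cartesian coordinates for an orthogonal cell."""
    return tuple(f * length for f, length in zip(frac, (a, b, c)))


# Fractional coordinates of site Hf0, as printed (rounded) in the summary.
hf0_cart = frac_to_cart((0.4072, 0.9072, 0.2659))

print(f"volume = {volume:.4f}")
print("Hf0 cartesian =", tuple(round(x, 3) for x in hf0_cart))
```

Because the fractional coordinates in the summary are rounded to four significant figures, the recomputed Cartesian values should agree with the printed ones only to about the third decimal place.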

Matbench

Matbench is a benchmark suite of materials science datasets. You can obtain the Matbench jdft2d dataset in ECIF format using the example script scripts/get_matbench_jdft2d.py.

Time Cost

Total time: 27.1109 s for 106201 structures (matbench_mp_e_form:train)
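
To put this figure in per-structure terms, the arithmetic is sketched below. It only divides the two numbers quoted above and assumes nothing about which operation the timing covers.

```python
total_seconds = 27.1109   # total time quoted above
n_structures = 106201     # number of structures quoted above

ms_per_structure = total_seconds / n_structures * 1000.0
structures_per_second = n_structures / total_seconds

print(f"{ms_per_structure:.3f} ms per structure")
print(f"{structures_per_second:.0f} structures per second")
```

This works out to roughly 0.26 ms per structure, or about 3,900 structures per second.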
