amd: distance-based isometry invariants

For calculation and comparison of AMD/PDD isometric invariants. Includes functions for extracting periodic set representations of crystal structures from .cif files.

Requirements

  • numpy and scipy
  • ase or ccdc to read in .cif files (ase recommended).

Use pip to install average-minimum-distance and ase (if required):

pip install average-minimum-distance ase

average-minimum-distance is imported with import amd.

Guide

Reading .cifs

amd can read .cif files and extract each structure's motif and cell in Cartesian form. This requires either ase or ccdc. ase is the default and is recommended because it can be installed with pip. ccdc is not recommended in general, though it provides useful options in some specific cases, and using it requires a valid license.

All readers return PeriodicSet objects, which have attributes name, motif and cell. PeriodicSets are intended for easy use with the AMD/PDD calculators.

The following creates a CifReader object which can be iterated over to get all (valid) structures in a .cif:

import amd
reader = amd.CifReader('path/to/file.cif')

This can be used in a loop, comprehension or converted to a list:

for periodic_set in reader:
    print(periodic_set.motif.shape[0]) # prints number of motif points

periodic_sets = list(reader)

By default, the reader will skip structures that cannot be read and print a warning.

If your .cif contains just one structure, use:

periodic_set = list(amd.CifReader('path/to/one_structure.cif'))[0]
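Because the reader skips structures it cannot read, indexing with [0] raises a bare IndexError when a file yields nothing valid. A minimal sketch of a more explicit alternative is given here; first_structure is a hypothetical helper, not part of amd, and it accepts any iterable, so a CifReader can be passed in directly.

```python
def first_structure(structures, source="<unknown>"):
    """Return the first item of an iterable, or raise a descriptive error.

    Hypothetical helper (not part of amd). `structures` can be any iterable,
    for example an amd.CifReader; `source` only appears in the error message.
    """
    for structure in structures:
        # Stop at the first item instead of building a full list.
        return structure
    raise ValueError(f"no readable structures found in {source}")
```

With amd installed, it could be used as periodic_set = first_structure(amd.CifReader(path), path).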

If you have a folder with many .cifs each with one structure:

import os
folder = 'path/to/cif_folder'
periodic_sets = [list(amd.CifReader(os.path.join(folder, filename)))[0]
                 for filename in os.listdir(folder)]
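os.listdir returns every entry in the folder in arbitrary order, so stray non-.cif files would be handed to the reader. The following sketch, which uses only the standard library, collects just the .cif files in a reproducible order; cif_paths is a hypothetical helper and not part of amd.

```python
import os


def cif_paths(folder):
    """Return sorted paths of the .cif files directly inside `folder`.

    Hypothetical helper (not part of amd). The extension check is
    case-insensitive, and subfolders are not searched.
    """
    paths = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # Keep regular files only, and only those ending in .cif.
        if os.path.isfile(path) and name.lower().endswith(".cif"):
            paths.append(path)
    return paths
```

Each returned path can then be passed to amd.CifReader in place of os.path.join(folder, filename).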

CifReader has several optional arguments:

import numpy as np
import amd

reader = amd.CifReader(filename,                  # path to the .cif file
                       reader='ase',
                       remove_hydrogens=False,
                       allow_disorder=False,
                       dtype=np.float64,
                       heaviest_component=False)

The most useful (and stable) option is remove_hydrogens. The rest are usually not needed and should be changed from their defaults with some caution.

reader ('ase' or 'ccdc') is the backend package used to read the .cif. ase is recommended and can be installed with pip. Choosing ccdc allows heaviest_component to be set to True, which removes solvents by discarding all but the heaviest connected component in the asymmetric unit. For some .cifs this can produce unintended results.

allow_disorder controls the handling of disordered structures. By default the reader skips them and prints a warning. Disordered structures don't make sense under the periodic set model; there is no good way to interpret them. Setting this to True ignores disorder and includes every atomic site regardless. Note: when disorder information is missing on a site, the reader assumes there is no disorder there.

dtype is the numpy datatype of the motif and cell returned by the reader. The default np.float64 should be fine for most cases. If memory is a constraint, setting dtype=np.float32 may help.

Calculating AMDs and PDDs

The functions amd.amd and amd.pdd calculate AMDs and PDDs respectively. Each has two required arguments:

  • either a PeriodicSet given by a reader or a tuple (motif, cell) of numpy arrays,
  • an integer k > 0.

The following creates a list of AMDs (with k=100) for structures in a .cif:

from amd import CifReader, amd
# amd now names the function, so call CifReader directly rather than amd.CifReader
amds = [amd(periodic_set, 100) for periodic_set in CifReader('path/to/file.cif')]

The functions also accept a tuple (motif, cell), which allows quick tests without a .cif. For example, this calculates the PDD (k=100) of a simple cubic lattice:

import numpy as np
from amd import pdd
motif = np.array([[0,0,0]]) # one point at the origin
cell = np.identity(3)       # unit cell = identity
cubic_pdd = pdd((motif, cell), 100)
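As a sanity check on the cubic example, the distances can be worked out by brute force. The sketch below assumes the usual definition, in which a set with one motif point has a single PDD row holding the distances to its k nearest neighbours. cubic_lattice_distances is a hypothetical helper written only for this check; it is not how amd computes anything.

```python
import itertools
import math


def cubic_lattice_distances(k):
    """Distances from the origin to its k nearest neighbours in the
    unit cubic lattice, in non-decreasing order (brute force)."""
    r = 1
    while True:
        # All non-zero integer points in the box [-r, r]^3.
        dists = sorted(
            math.sqrt(x * x + y * y + z * z)
            for x, y, z in itertools.product(range(-r, r + 1), repeat=3)
            if (x, y, z) != (0, 0, 0)
        )
        # Every point outside the box is further than r away, so once the
        # k-th distance is at most r the first k entries are final.
        if len(dists) >= k and dists[k - 1] <= r:
            return dists[:k]
        r += 1


nearest = cubic_lattice_distances(100)
```

The first six entries are 1 (the face neighbours) and the next twelve are sqrt(2) (the edge neighbours), which the distance columns of cubic_pdd should reproduce up to floating-point error.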

A PDD is returned as a single matrix: the first column holds the weights and the remaining columns hold the distances.
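To make this layout concrete, the sketch below splits a PDD-style matrix into its weights and distance rows, then takes the weighted average of the rows, which under the usual definitions gives the corresponding AMD. The numbers in toy_pdd are made up for illustration and do not come from a real crystal; split_pdd and weighted_average are hypothetical helpers, not part of amd.

```python
def split_pdd(pdd_matrix):
    """Split rows of the form [weight, d_1, ..., d_k] into (weights, distances)."""
    weights = [row[0] for row in pdd_matrix]
    distances = [list(row[1:]) for row in pdd_matrix]
    return weights, distances


def weighted_average(weights, distances):
    """Column-wise average of the distance rows, weighted by `weights`."""
    k = len(distances[0])
    return [sum(w * row[j] for w, row in zip(weights, distances))
            for j in range(k)]


# Made-up example: two rows with equal weight and k = 3.
toy_pdd = [[0.5, 1.0, 1.0, 1.5],
           [0.5, 1.0, 2.0, 2.5]]
weights, distances = split_pdd(toy_pdd)
averaged = weighted_average(weights, distances)
```

Both helpers iterate row by row, so they also accept a 2D numpy array such as the output of amd.pdd.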

Remember that amd and pdd always expect the Cartesian form of a motif and cell, with all points inside the cell. If you have unit cell parameters or fractional coordinates, first use amd.cellpar_to_cell to convert a, b, c, alpha, beta, gamma to a 3x3 Cartesian cell, then compute motif = np.matmul(frac_motif, cell) to get the motif in Cartesian form before passing it to amd or pdd.
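The conversion can be sketched without any dependencies, as below. This is not amd.cellpar_to_cell itself: cellpar_to_cartesian and frac_to_cartesian are hypothetical stand-ins that assume angles in degrees, cell vectors stored as rows, the first vector along x, and the second in the xy-plane. The library may orient the cell differently, which should not matter for isometry invariants.

```python
import math


def cellpar_to_cartesian(a, b, c, alpha, beta, gamma):
    """Build a 3x3 cell (rows are cell vectors) from lengths and angles in degrees.

    Hypothetical stand-in for amd.cellpar_to_cell, using the common
    convention: first vector along x, second vector in the xy-plane.
    """
    cos_a, cos_b, cos_g = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sin_g = math.sin(math.radians(gamma))
    cy = (cos_a - cos_b * cos_g) / sin_g
    cz = math.sqrt(1.0 - cos_b ** 2 - cy ** 2)
    return [[a, 0.0, 0.0],
            [b * cos_g, b * sin_g, 0.0],
            [c * cos_b, c * cy, c * cz]]


def frac_to_cartesian(frac_motif, cell):
    """Same result as np.matmul(frac_motif, cell), written with plain lists."""
    return [[sum(point[i] * cell[i][j] for i in range(3)) for j in range(3)]
            for point in frac_motif]


cell = cellpar_to_cartesian(2.0, 2.0, 2.0, 90.0, 90.0, 90.0)
motif = frac_to_cartesian([[0.5, 0.5, 0.5]], cell)  # body centre of the cube
```

For this cubic cell the result is, up to rounding, twice the identity matrix, and the motif point lands at (1, 1, 1).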

Comparing AMDs and PDDs

So far, AMDs are simply compared with the l-infinity (Chebyshev) distance. To compare AMDs, it is recommended that you use amd.compare. It has two required arguments, reference and comparison, each of which may be a single AMD, a list of AMDs, or a numpy array with AMDs in rows. In every case, all AMDs must have the same length. For m reference AMDs and n comparison AMDs, amd.compare returns a distance matrix with shape (m, n), where the (i,j)-th entry is the AMD distance between reference i and comparison j. For example, this code compares all structures in one .cif to all structures in another by AMD100:

import amd

k = 100
set_1_amds = [amd.amd(s, k) for s in amd.CifReader('set_1.cif')]
set_2_amds = [amd.amd(s, k) for s in amd.CifReader('set_2.cif')]

distance_matrix = amd.compare(set_1_amds, set_2_amds)
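To show what each entry of such a distance matrix means, here is a plain-Python sketch of the Chebyshev distance between equal-length vectors and the resulting (m, n) matrix. chebyshev and chebyshev_matrix are hypothetical names; this illustrates the metric only and makes no claim about how amd.compare is implemented.

```python
def chebyshev(u, v):
    """l-infinity distance: the largest absolute difference between entries."""
    if len(u) != len(v):
        raise ValueError("vectors must have equal length")
    return max(abs(x - y) for x, y in zip(u, v))


def chebyshev_matrix(reference, comparison):
    """Entry (i, j) is the distance between reference i and comparison j."""
    return [[chebyshev(r, c) for c in comparison] for r in reference]


# One reference vector against two comparison vectors gives a 1 x 2 matrix.
matrix = chebyshev_matrix([[1.0, 2.0, 3.0]],
                          [[1.0, 2.0, 5.0], [0.0, 2.0, 3.0]])
```

Here matrix is [[2.0, 1.0]]: the first pair differs most in the last entry (by 2) and the second pair in the first entry (by 1).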

To compare a collection pairwise, just pass it in twice, e.g.:

amds = [amd.amd(s, 100) for s in amd.CifReader('structures.cif')]
distance_matrix = amd.compare(amds, amds)

To compare two PDDs, use amd.emd. This gives the Earth mover's distance between crystals in two separate .cifs:

k = 100  # amd.pdd requires k; use the same value for both PDDs
pdd_1 = amd.pdd(list(amd.CifReader('crystal_1.cif'))[0], k)
pdd_2 = amd.pdd(list(amd.CifReader('crystal_2.cif'))[0], k)
dist = amd.emd(pdd_1, pdd_2)

A simple function for comparing many PDDs is not yet implemented, but this is easy enough to do with a loop over two lists of PDDs.
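One way to write that loop is sketched here. compare_many is a hypothetical helper, not part of amd; it takes the metric as an argument, so the same code runs with any two-argument distance function.

```python
def compare_many(items_1, items_2, metric):
    """Return a nested list whose (i, j) entry is metric(items_1[i], items_2[j]).

    Hypothetical helper (not part of amd). `metric` is any function of two
    arguments that returns a number.
    """
    return [[metric(x, y) for y in items_2] for x in items_1]
```

With amd installed and two lists of PDDs computed with the same k, the call would be compare_many(pdds_1, pdds_2, amd.emd). Each pair is compared independently, so the cost grows with the product of the two list lengths.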

