average-minimum-distance: isometrically invariant crystal fingerprints

Implements fingerprints (isometry invariants) of crystals based on geometry: average minimum distances (AMD) and point-wise distance distributions (PDD). Includes .cif reading tools.

If you use our code in your work, please cite us. The bib reference is at the bottom of this page.

What's amd?

A crystal is an arrangement of atoms which periodically repeats according to some lattice. The atoms and lattice defining a crystal are typically recorded in a .CIF file, but this representation is ambiguous, i.e. different .CIF files can define the same crystal. This package implements new isometric invariants called AMD (average minimum distance) and PDD (point-wise distance distribution) based on inter-point distances, which are guaranteed to take the same value for all equivalent representations of a crystal. They do this in a continuous way; crystals which are similar have similar AMDs and PDDs.
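To make the invariance claim concrete, here is a minimal from-scratch sketch, not this package's implementation. It computes an AMD by brute force for a 2D periodic point set, taking the i-th AMD value to be the distance to the i-th nearest neighbour averaged over the points in one unit cell. The name brute_force_amd and its reach parameter are hypothetical, and the result is only correct when reach generates enough neighbouring cells to contain the true k nearest neighbours. The sketch describes the unit square lattice with two different unit cells and checks that both give the same values.

```python
import itertools
import math


def brute_force_amd(motif, cell, k, reach=4):
    """Brute-force AMD of a 2D periodic point set (illustration only).

    motif: Cartesian coordinates of the points in one unit cell.
    cell:  the two lattice vectors of the unit cell.
    reach: how many cells to extend in each direction; must be large
           enough that the true k nearest neighbours are all generated.
    """
    (ax, ay), (bx, by) = cell
    # Copy the motif into a (2*reach + 1) x (2*reach + 1) block of cells.
    cloud = [
        (px + i * ax + j * bx, py + i * ay + j * by)
        for i, j in itertools.product(range(-reach, reach + 1), repeat=2)
        for px, py in motif
    ]
    rows = []
    for point in motif:
        dists = sorted(math.dist(point, other) for other in cloud)
        rows.append(dists[1:k + 1])  # skip the zero distance to itself
    # Average the i-th nearest-neighbour distance over all motif points.
    return [sum(column) / len(rows) for column in zip(*rows)]


# The unit square lattice, described with two different unit cells.
amd_small = brute_force_amd([(0.0, 0.0)], ((1.0, 0.0), (0.0, 1.0)), k=8)
amd_double = brute_force_amd(
    [(0.0, 0.0), (1.0, 0.0)], ((2.0, 0.0), (0.0, 1.0)), k=8
)

# Both descriptions should give the same AMD values.
same = all(math.isclose(x, y) for x, y in zip(amd_small, amd_double))
print(same)
```

The two inputs differ in both motif size and cell shape, which is the kind of ambiguity a .CIF can carry, yet the computed values agree.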

For a technical description of AMD, see our paper on arXiv. Detailed documentation of this package is available on readthedocs.

Use pip to install average-minimum-distance:

pip install average-minimum-distance

Then import average-minimum-distance with import amd.

Getting started

The central functions of this package are amd.AMD() and amd.PDD(), which take a crystal and a positive integer k, returning the crystal's AMD/PDD up to k. An AMD is a 1D numpy array, whereas PDDs are 2D arrays. The AMDs or PDDs can then be passed to functions to compare them.

Reading crystals

The following example reads a .CIF with amd.CifReader and computes the AMDs (k=100):

import amd

# read all structures in a .cif and put their amds (k=100) in a list
reader = amd.CifReader('path/to/file.cif')
amds = [amd.AMD(crystal, 100) for crystal in reader]

Note: CifReader accepts optional arguments, e.g. for removing hydrogen and handling disorder. See the documentation for details.

A crystal can also be read from the CSD using amd.CSDReader (if csd-python-api is installed), or created manually.

Comparing AMDs or PDDs

The package includes functions for comparing sets of AMDs or PDDs.

They behave like scipy's function scipy.spatial.distance.pdist, which takes a set of points and compares them pairwise, returning a condensed distance matrix, a 1D vector containing the distances. This vector is the upper triangle of the 2D distance matrix flattened into one list; the lower triangle is not stored since, for pairwise comparisons, the matrix is symmetric. The function amd.AMD_pdist similarly takes a list of AMDs and compares them pairwise, returning the condensed distance matrix:

cdm = amd.AMD_pdist(amds)

The default metric for comparison is chebyshev (l-infinity), though it can be changed to anything accepted by scipy's pdist, e.g. euclidean.

It is preferable to store the condensed matrix, though if you want the symmetric 2D distance matrix, use scipy's squareform:

from scipy.spatial.distance import squareform
dm = squareform(cdm)
# now dm[i][j] is the AMD distance between amds[i] and amds[j].
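A single entry can also be looked up in the condensed vector without building the square matrix. The following pure-Python sketch illustrates the layout using toy vectors in place of real AMDs. The helper names chebyshev, condensed_pdist and condensed_index are hypothetical and not part of this package; the index formula is the one given in scipy's pdist documentation.

```python
from itertools import combinations


def chebyshev(u, v):
    """L-infinity distance between two equal-length vectors."""
    return max(abs(a - b) for a, b in zip(u, v))


def condensed_pdist(vectors):
    """All pairwise distances (i < j) in row-major order, as one flat list."""
    return [chebyshev(u, v) for u, v in combinations(vectors, 2)]


def condensed_index(i, j, n):
    """Position of the pair (i, j), i != j, in a condensed matrix of n items."""
    if i == j:
        raise ValueError("the diagonal is not stored; the distance is 0")
    if i > j:
        i, j = j, i  # the matrix is symmetric, so order does not matter
    return n * i + j - (i + 1) * (i + 2) // 2


# Toy stand-ins for three AMDs with k=3.
toy_amds = [[1.0, 2.0, 3.0], [1.0, 2.0, 5.0], [0.0, 2.0, 3.0]]
toy_cdm = condensed_pdist(toy_amds)

# Distance between items 2 and 1, read directly from the condensed vector.
d_12 = toy_cdm[condensed_index(2, 1, len(toy_amds))]
```

For n items the condensed vector holds n(n-1)/2 entries rather than n², which is why storing it is preferable for large sets.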

The function amd.AMD_pdist has an equivalent for PDDs, amd.PDD_pdist. There are also equivalents of scipy.spatial.distance.cdist, amd.AMD_cdist and amd.PDD_cdist, which take two sets and compare one against the other, returning a 2D distance matrix.
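As a rough illustration of the shape of a cdist-style result, the sketch below compares two small sets of toy vectors with the Chebyshev metric in pure Python. The name chebyshev_cdist is hypothetical and this is not how the package computes its result; it only shows that entry [i][j] relates item i of the first set to item j of the second.

```python
def chebyshev_cdist(set_a, set_b):
    """2D list whose entry [i][j] is the L-infinity distance
    from set_a[i] to set_b[j]. Vectors must have equal length."""
    return [
        [max(abs(x - y) for x, y in zip(u, v)) for v in set_b]
        for u in set_a
    ]


# Toy stand-ins for AMDs with k=2: two items in one set, three in the other.
toy_set1 = [[1.0, 2.0], [3.0, 4.0]]
toy_set2 = [[1.0, 2.5], [0.0, 0.0], [3.0, 4.0]]

toy_dm = chebyshev_cdist(toy_set1, toy_set2)
# toy_dm has len(toy_set1) rows and len(toy_set2) columns.
```

Unlike the pairwise case, this matrix is generally neither square nor symmetric, so there is no condensed form.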

Example: PDD-based dendrogram of crystals in a .CIF

This example reads crystals from a .CIF, compares them by PDD and plots a single linkage dendrogram:

import amd
import matplotlib.pyplot as plt
from scipy.cluster import hierarchy

crystals = list(amd.CifReader('crystals.cif'))
names = [crystal.name for crystal in crystals]
pdds = [amd.PDD(crystal, 100) for crystal in crystals]
cdm = amd.PDD_pdist(pdds)
Z = hierarchy.linkage(cdm, 'single')
dn = hierarchy.dendrogram(Z, labels=names)
plt.show()

Example: Finding n nearest neighbours in one set from another

Here is an example showing how to read two sets of crystals from .CIFs set1.cif and set2.cif and find the 10 nearest PDD-neighbours in set 2 for every crystal in set 1.

import numpy as np
import amd

n = 10
k = 100

set1 = list(amd.CifReader('set1.cif'))
set2 = list(amd.CifReader('set2.cif'))

set1_pdds = [amd.PDD(s, k) for s in set1]
set2_pdds = [amd.PDD(s, k) for s in set2]

dm = amd.PDD_cdist(set1_pdds, set2_pdds)

# the following uses np.argpartition (like argsort but not for the whole list)
# and np.take_along_axis to find nearest neighbours of each item given the
# distance matrix. np.argpartition requires n < len(set2).
# nn_dists[i][j] = distance from set1[i] to its (j+1)st nearest neighbour in set2
# nn_inds[i][j] = index of set1[i]'s (j+1)st nearest neighbour in set2
# it's (j+1)st as index 0 refers to the first nearest neighbour

nn_inds = np.array([np.argpartition(row, n)[:n] for row in dm])
nn_dists = np.take_along_axis(dm, nn_inds, axis=-1)
sorted_inds = np.argsort(nn_dists, axis=-1)
nn_inds = np.take_along_axis(nn_inds, sorted_inds, axis=-1)
nn_dists = np.take_along_axis(nn_dists, sorted_inds, axis=-1)

# now to print the names of these nearest neighbours and their distances:
set1_names = [s.name for s in set1]
set2_names = [s.name for s in set2]

for i in range(len(set1)):
    print('neighbours of', set1_names[i])
    for j in range(n):
        jth_nn_index = nn_inds[i][j]
        print('neighbour', j+1, set2_names[jth_nn_index], 'dist:', nn_dists[i][j])
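The neighbour search in the example above can also be written with the standard library alone, which may be easier to follow when the distance matrix is small. This is a sketch under that assumption, not part of the package; the function name nearest_neighbours is hypothetical. Unlike np.argpartition, heapq.nsmallest simply returns every entry when n is at least the row length.

```python
import heapq


def nearest_neighbours(dm, n):
    """For each row of a 2D distance matrix, return the column indices
    and distances of the n smallest entries, sorted nearest first."""
    nn_inds, nn_dists = [], []
    for row in dm:
        # Pair each distance with its column index, keep the n smallest.
        best = heapq.nsmallest(n, enumerate(row), key=lambda pair: pair[1])
        nn_inds.append([index for index, _ in best])
        nn_dists.append([dist for _, dist in best])
    return nn_inds, nn_dists


# Toy distance matrix: 2 items in set 1 compared against 3 items in set 2.
toy_matrix = [[0.5, 0.1, 0.9],
              [0.3, 0.7, 0.2]]
toy_inds, toy_dists = nearest_neighbours(toy_matrix, 2)
```

The returned lists follow the same convention as nn_inds and nn_dists: position j in row i refers to the (j+1)st nearest neighbour of item i.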

Cite us

The paper for this package is available on arXiv. Use the following bib reference to cite us:

@article{amd2022,
  title = {Average Minimum Distances of periodic point sets - foundational invariants for mapping all periodic crystals},
  author = {Daniel Widdowson and Marco M Mosca and Angeles Pulido and Vitaliy Kurlin and Andrew I Cooper},
  journal = {MATCH Communications in Mathematical and in Computer Chemistry},
  doi = {10.46793/match.87-3.529W},
  volume = {87},
  number = {3},
  pages = {529-559},
  year = {2022}
}
