Python package to cluster molecular structures into groups of similar ones.

structure_clustering – Cluster Molecular Structures Into Groups of Similar Ones

structure_clustering is a Python package to cluster molecular structures into groups of similar ones. It analyses the interatomic distances of each structure to represent its connectivity as an undirected, vertex-labelled graph, and then uses graph isomorphism to identify structures that belong to the same group. The package offers a command-line interface for clustering a multi-XYZ file and can also be used from within your Python code.

Figure: exemplary clusters from Ag⁺(H₂O)₄ structures.
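
The idea behind the approach can be illustrated in plain Python. The sketch below is not the package's implementation (which, judging by the installation notes, builds against the Boost Graph Library). The radii table, the tolerance factor of 1.3, and the helper names connectivity_graph and is_isomorphic are assumptions made purely for illustration. Two atoms are linked when their distance is below the scaled sum of their covalent radii, and two small graphs are then compared by brute force over label-preserving vertex permutations.

```python
from itertools import combinations, permutations
from math import dist

# Illustrative covalent radii in Angstrom (assumed values, not the package defaults)
RADII = {"O": 0.66, "H": 0.31}
TOLERANCE = 1.3  # assumed scaling factor for the link cutoff


def connectivity_graph(atoms):
    """Return (labels, edges) for a list of (symbol, x, y, z) tuples."""
    labels = [symbol for symbol, *_ in atoms]
    edges = set()
    for i, j in combinations(range(len(atoms)), 2):
        cutoff = TOLERANCE * (RADII[labels[i]] + RADII[labels[j]])
        if dist(atoms[i][1:], atoms[j][1:]) <= cutoff:
            edges.add(frozenset((i, j)))
    return labels, edges


def is_isomorphic(graph_a, graph_b):
    """Brute-force, label-preserving isomorphism test (small graphs only)."""
    labels_a, edges_a = graph_a
    labels_b, edges_b = graph_b
    if sorted(labels_a) != sorted(labels_b) or len(edges_a) != len(edges_b):
        return False
    n = len(labels_a)
    for perm in permutations(range(n)):
        # perm maps vertex i of graph A onto vertex perm[i] of graph B
        if any(labels_a[i] != labels_b[perm[i]] for i in range(n)):
            continue
        mapped = {frozenset(perm[v] for v in edge) for edge in edges_a}
        if mapped == edges_b:
            return True
    return False


water = [("O", 0.0, 0.0, 0.0), ("H", 0.0, 0.76, 0.59), ("H", 0.0, -0.76, 0.59)]
# Same molecule with a different atom order
water_reordered = [water[1], water[0], water[2]]
# One hydrogen moved far away, so one O-H link is lost
stretched = [water[0], water[1], ("H", 0.0, -3.0, 0.59)]

print(is_isomorphic(connectivity_graph(water), connectivity_graph(water_reordered)))
print(is_isomorphic(connectivity_graph(water), connectivity_graph(stretched)))
```

The reordered water molecule ends up in the same group as the original, while the stretched one does not. The permutation search grows factorially with the number of atoms, so it is only suitable for demonstrating the principle.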

Installation

You can install structure_clustering via pip:

pip install structure_clustering

Prebuilt wheels are available for most platforms. If you prefer to compile and build the wheel yourself, ensure that the Boost Graph Library is installed system-wide.

Using the Command-Line Interface

After installation, run the tool with the structure_clustering command.

If the command is not found

On some systems, scripts installed via pip are not added to the system's PATH. You can either add the pip scripts directory to your PATH or run the module directly with python3 -m structure_clustering.

usage: structure_clustering <xyz_file> [--config CONFIG] [--output OUTPUT] [--disconnected]

Cluster molecular structures into groups.

positional arguments:
  xyz_file         path of the multi-xyz-file containing the structures

options:
  --config CONFIG  path of the config TOML file
  --output OUTPUT  path of the resulting output file, defaults to <xyz_file>.sc.dat
  --disconnected   if you want to include disconnected graphs
  -h, --help       show this help message and exit

For example, to cluster an xyz file:

structure_clustering my_structures.xyz

To specify a custom distance for recognising O-H connectivity (see the next section), use a TOML config file:

structure_clustering my_structures.xyz --config sc_config.toml

In both cases, a file named my_structures.xyz.sc.dat will be created, which you can import at https://photophys.github.io/cluster-vis/ to visualise the results of your clustering process.

The terminal output will look like this:

Loading configuration from demo_config.toml
Using covalent radius of 1.59 for Ag
Using pair distance of 2.3 for O-H
Clustering does not include disconnected graphs

Using 437 structures from structures.xyz
Clustering finished <structure_clustering._core.Result object at 0x7f7c949c37b0>
  14 clusters (total 318 structures)
  13 unique single structures
  132 (30.21%) structures sorted out (305 remaining)
  cluster size: Avg=22.7 Med=4.5 Q1=2.2 Q3=23.5
  connections/structure: Avg=12.2 Med=12.0 Q1=12.0 Q3=12.0 (all 437)
  connections/structure: Avg=12.4 Med=12.0 Q1=12.0 Q3=12.0 (remaining 305)
Writing output file to structures.xyz.sc.dat ...

🚀 Open https://photophys.github.io/cluster-vis/ to visualize your results
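
The commands above take a multi-XYZ file as input. If your structures are not yet stored in that form, such a file can be written with a few lines of Python. The sketch below assumes the common XYZ layout (atom count, comment line, then one "symbol x y z" line per atom, with frames simply concatenated). The exact dialect accepted by structure_clustering is not documented here, and write_multi_xyz is a hypothetical helper, not part of the package.

```python
from pathlib import Path


def write_multi_xyz(path, frames):
    """Write frames (lists of (symbol, x, y, z) tuples) as concatenated XYZ blocks."""
    lines = []
    for index, atoms in enumerate(frames):
        lines.append(str(len(atoms)))   # first line of a frame: number of atoms
        lines.append(f"frame {index}")  # second line: free-form comment
        for symbol, x, y, z in atoms:
            lines.append(f"{symbol} {x:.6f} {y:.6f} {z:.6f}")
    Path(path).write_text("\n".join(lines) + "\n")


water = [("O", 0.0, 0.0, 0.0), ("H", 0.0, 0.76, 0.59), ("H", 0.0, -0.76, 0.59)]
write_multi_xyz("my_structures.xyz", [water, water])
print(Path("my_structures.xyz").read_text())
```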

Configuration File

You can use a TOML file to control the parameters of the command-line interface. The [covalent] section allows you to override the algorithm's default covalent radii. In the [pair] section, you can specify a maximum distance for pairs of atoms.

[covalent]
He = 0.9
Ag = 1.59

[pair]
O-H = 2.3

[options]
only_connected_graphs = true

All settings are optional. Distances are given in Angstrom. Element symbols are case-sensitive. If you specify only_connected_graphs in the config file, it overrides the --disconnected command-line switch.
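
To check a config file before a long run, you can parse it with Python's TOML reader. This is only a sketch of how such a file can be inspected. It does not reproduce how structure_clustering itself reads the config, and splitting the pair key on "-" is an assumption based on the O-H example above.

```python
try:
    import tomllib  # standard library since Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport for older versions

CONFIG_TEXT = """
[covalent]
He = 0.9
Ag = 1.59

[pair]
O-H = 2.3

[options]
only_connected_graphs = true
"""

config = tomllib.loads(CONFIG_TEXT)

# Every section is optional, so fall back to empty tables
covalent = config.get("covalent", {})
pairs = {tuple(key.split("-")): value for key, value in config.get("pair", {}).items()}
options = config.get("options", {})

for symbol, radius in covalent.items():
    print(f"covalent radius {symbol}: {radius} Angstrom")
for (first, second), distance in pairs.items():
    print(f"pair distance {first}-{second}: {distance} Angstrom")
print("only connected graphs:", options.get("only_connected_graphs"))
```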

Example Code

import structure_clustering
from structure_clustering import Structure, Atom

sc_machine = structure_clustering.Machine()

sc_machine.setCovalentRadius(1, 0.42)  # change hydrogen covalent radius to 0.42
sc_machine.addPairDistance(8, 1, 2.3)  # extend max distance for O-H pairs to 2.3 Ang

sc_machine.setOnlyConnectedGraphs(True)  # only include fully connected graphs (default)

# you will need some structures
population = structure_clustering.import_multi_xyz("structs.xyz")

# you can also create your structures programmatically
structure = Structure()
structure.addAtom(Atom(8, -1.674872668, 0.0, -0.984966492))
structure.addAtom(Atom(1, -1.674872668, 0.759337, -0.388923492))
structure.addAtom(Atom(1, -1.674872668, -0.759337, -0.388923492))
population += [structure]  # add this structure to our population

sc_result = sc_machine.cluster(population)

print("clusters", sc_result.clusters)
print("singles", sc_result.singles)

# Output (indices from the original structure list):
# clusters [[0, 11], [1, 2, 4, 6, 12, 13, 14, 15, 19], [3, 17, 18, 23]]
# singles [9, 16, 22]
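
The result can be post-processed with ordinary Python. The sketch below assumes that clusters and singles behave like plain lists of integer indices, as the printed output suggests; it uses the example lists directly so that it runs without the package installed.

```python
# Lists copied from the example output above
clusters = [[0, 11], [1, 2, 4, 6, 12, 13, 14, 15, 19], [3, 17, 18, 23]]
singles = [9, 16, 22]

sizes = [len(cluster) for cluster in clusters]
largest = max(clusters, key=len)

# Map every structure index to its cluster id; singles are mapped to None
membership = {index: cid for cid, cluster in enumerate(clusters) for index in cluster}
membership.update({index: None for index in singles})

print("cluster sizes:", sizes)
print("structures in clusters:", sum(sizes))
print("largest cluster:", largest)
print("cluster of structure 12:", membership[12])
```

A lookup table like membership is convenient when you want to annotate the original structure list with its cluster assignment.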

License

The structure_clustering package is licensed under the MIT License. See the LICENSE file for more details.

Download files

Source distribution:

  • structure_clustering-1.1.1.tar.gz (15.9 kB)

Built distributions (manylinux, glibc 2.17+, x86-64):

  • structure_clustering-1.1.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (143.1 kB), CPython 3.12
  • structure_clustering-1.1.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (143.5 kB), CPython 3.11
  • structure_clustering-1.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (142.0 kB), CPython 3.10
  • structure_clustering-1.1.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (142.3 kB), CPython 3.9
  • structure_clustering-1.1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (141.9 kB), CPython 3.8
  • structure_clustering-1.1.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (143.4 kB), CPython 3.7m
