Python package to cluster molecular structures into groups of similar ones.
structure_clustering – Cluster Molecular Structures Into Groups of Similar Ones
structure_clustering is a Python package that clusters molecular structures into groups of similar ones. It analyses interatomic distances to represent each structure's connectivity as an undirected, vertex-labelled graph, then uses graph isomorphism to identify structures that belong to the same group. The package offers a command-line interface for clustering a multi-XYZ file and can also be used from within your own Python code.
Figure: exemplary clusters from Ag⁺(H₂O)₄ structures.
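The idea of comparing connectivity graphs can be illustrated with a small pure-Python sketch. It is not the package's implementation: the radii, the `tolerance` factor, the connection rule (distance at most `tolerance` times the sum of the covalent radii) and the helper names `build_graph` and `is_isomorphic` are all assumptions made for this example, and the brute-force isomorphism test is only practical for very small structures.

```python
from itertools import combinations, permutations
from math import dist

# Illustrative covalent radii in Angstrom (assumed values, not the package defaults).
COVALENT_RADII = {"O": 0.66, "H": 0.31}


def build_graph(atoms, radii=COVALENT_RADII, tolerance=1.3):
    """Return (labels, edges) for a list of (element, x, y, z) tuples.

    Assumption: two atoms are connected when their distance is at most
    `tolerance` times the sum of their covalent radii.
    """
    labels = [element for element, *_ in atoms]
    edges = set()
    for i, j in combinations(range(len(atoms)), 2):
        cutoff = tolerance * (radii[labels[i]] + radii[labels[j]])
        if dist(atoms[i][1:], atoms[j][1:]) <= cutoff:
            edges.add(frozenset((i, j)))
    return labels, edges


def is_isomorphic(graph_a, graph_b):
    """Brute-force test for label-preserving isomorphism (small graphs only)."""
    labels_a, edges_a = graph_a
    labels_b, edges_b = graph_b
    if sorted(labels_a) != sorted(labels_b) or len(edges_a) != len(edges_b):
        return False
    n = len(labels_a)
    for perm in permutations(range(n)):
        # perm maps vertex i of graph A onto vertex perm[i] of graph B.
        if any(labels_a[i] != labels_b[perm[i]] for i in range(n)):
            continue
        mapped = {frozenset(perm[v] for v in edge) for edge in edges_a}
        if mapped == edges_b:
            return True
    return False


# A water molecule, the same molecule shifted and with its atoms reordered,
# and a variant in which one hydrogen has been moved far away.
water = [
    ("O", -1.674872668, 0.0, -0.984966492),
    ("H", -1.674872668, 0.759337, -0.388923492),
    ("H", -1.674872668, -0.759337, -0.388923492),
]
shifted = [(el, x + 5.0, y, z) for el, x, y, z in water]
reordered = [shifted[1], shifted[0], shifted[2]]
dissociated = water[:2] + [("H", 10.0, 0.0, 0.0)]

print(is_isomorphic(build_graph(water), build_graph(reordered)))    # True
print(is_isomorphic(build_graph(water), build_graph(dissociated)))  # False
```

Atom order and absolute position do not matter in this comparison; only the element labels and which atoms are connected do.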
Installation
You can install structure_clustering via pip:
pip install structure_clustering
Prebuilt wheels are available for most platforms. If you prefer to compile and build the wheel yourself, ensure that the Boost Graph Library is installed system-wide.
Using the Command-Line Interface
After installation, the command-line interface is available as the structure_clustering command.
Use this method if the command does not work
On some systems, scripts installed via pip are not added to the system's PATH. You can either add them to your PATH, or run the script directly by invoking python3 -m structure_clustering.
usage: structure_clustering <xyz_file> [--config CONFIG] [--output OUTPUT] [--disconnected]
Cluster molecular structures into groups.
positional arguments:
xyz_file path of the multi-xyz-file containing the structures
options:
--config CONFIG path of the config TOML file
--output OUTPUT path of the resulting output file, defaults to <xyz_file>.sc.dat
--disconnected if you want to include disconnected graphs
-h, --help show this help message and exit
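For readers unfamiliar with the input format, the sketch below shows how a multi-XYZ file is commonly laid out: each frame consists of an atom count, a comment line, and one line per atom, and frames are simply concatenated. The function `parse_multi_xyz` and the coordinates are made up for illustration; the package ships its own reader, so this only serves to explain the file layout under that assumed convention.

```python
def parse_multi_xyz(text):
    """Split multi-XYZ text into a list of structures.

    Each structure is a list of (element, x, y, z) tuples. Assumes the
    common XYZ layout: atom count, comment line, then one atom per line.
    """
    lines = text.splitlines()
    structures = []
    pos = 0
    while pos < len(lines):
        if not lines[pos].strip():
            pos += 1  # skip blank lines between frames
            continue
        n_atoms = int(lines[pos])
        atom_lines = lines[pos + 2 : pos + 2 + n_atoms]
        if len(atom_lines) != n_atoms:
            raise ValueError("truncated frame")
        atoms = []
        for line in atom_lines:
            element, x, y, z = line.split()[:4]
            atoms.append((element, float(x), float(y), float(z)))
        structures.append(atoms)
        pos += 2 + n_atoms
    return structures


# Two hypothetical frames with made-up coordinates.
EXAMPLE = """\
3
water
O 0.000000 0.000000 0.000000
H 0.000000 0.759337 0.596043
H 0.000000 -0.759337 0.596043
2
hydroxyl fragment
O 0.000000 0.000000 0.000000
H 0.000000 0.000000 0.970000
"""

frames = parse_multi_xyz(EXAMPLE)
print([len(frame) for frame in frames])  # [3, 2]
```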
For example, to cluster an xyz file:
structure_clustering my_structures.xyz
To specify a custom distance for recognising O-H connectivity (see the next section), use a TOML config file:
structure_clustering my_structures.xyz --config sc_config.toml
In both cases, a file named my_structures.xyz.sc.dat will be created, which you can import at https://photophys.github.io/cluster-vis/ to visualise the results of your clustering process.
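The terminal output will look similar to this (the run shown here used an input file named structures.xyz and a config file named demo_config.toml):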
Loading configuration from demo_config.toml
Using covalent radius of 1.59 for Ag
Using pair distance of 2.3 for O-H
Clustering does not include disconnected graphs
Using 437 structures from structures.xyz
Clustering finished <structure_clustering._core.Result object at 0x7f7c949c37b0>
14 clusters (total 318 structures)
13 unique single structures
132 (30.21%) structures sorted out (305 remaining)
cluster size: Avg=22.7 Med=4.5 Q1=2.2 Q3=23.5
connections/structure: Avg=12.2 Med=12.0 Q1=12.0 Q3=12.0 (all 437)
connections/structure: Avg=12.4 Med=12.0 Q1=12.0 Q3=12.0 (remaining 305)
Writing output file to structures.xyz.sc.dat ...
🚀 Open https://photophys.github.io/cluster-vis/ to visualize your results
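If you want to drive the command-line interface from a script, the invocations from this section can be assembled as argument lists. The following is a minimal sketch: `build_command` and `default_output_path` are hypothetical helpers, not part of the package, and they only use the flags listed in the help text above.

```python
import sys


def build_command(xyz_file, config=None, output=None, disconnected=False):
    """Assemble the argument list for `python -m structure_clustering`."""
    command = [sys.executable, "-m", "structure_clustering", str(xyz_file)]
    if config is not None:
        command += ["--config", str(config)]
    if output is not None:
        command += ["--output", str(output)]
    if disconnected:
        command.append("--disconnected")
    return command


def default_output_path(xyz_file):
    """Output path used when --output is omitted, per the help text."""
    return f"{xyz_file}.sc.dat"


command = build_command("my_structures.xyz", config="sc_config.toml")
print(command[1:])
# ['-m', 'structure_clustering', 'my_structures.xyz', '--config', 'sc_config.toml']
print(default_output_path("my_structures.xyz"))  # my_structures.xyz.sc.dat

# To actually run it (requires the package to be installed):
# import subprocess
# subprocess.run(command, check=True)
```

Using `sys.executable -m structure_clustering` mirrors the `python3 -m structure_clustering` fallback and avoids depending on the script being on your PATH.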
Configuration File
You can use a TOML file to control the parameters of the command-line interface. The [covalent] section allows you to override the algorithm's default covalent radii. In the [pair] section, you can specify a maximum distance for specific pairs of elements. The [options] section holds general switches such as only_connected_graphs.
[covalent]
He = 0.9
Ag = 1.59
[pair]
O-H = 2.3
[options]
only_connected_graphs = true
All settings are optional. Distances are given in Angstrom. Element symbols are case-sensitive. If you specify only_connected_graphs in the config file, it overrides the setting from the --disconnected command-line switch.
Example Code
import structure_clustering
from structure_clustering import Structure, Atom
sc_machine = structure_clustering.Machine()
sc_machine.setCovalentRadius(1, 0.42) # change hydrogen covalent radius to 0.42
sc_machine.addPairDistance(8, 1, 2.3) # extend max distance for O-H pairs to 2.3 Ang
sc_machine.setOnlyConnectedGraphs(True) # only include fully connected graphs (default)
# you will need some structures
population = structure_clustering.import_multi_xyz("structs.xyz")
# you can also create your structures programmatically
structure = Structure()
structure.addAtom(Atom(8, -1.674872668, 0.0, -0.984966492))
structure.addAtom(Atom(1, -1.674872668, 0.759337, -0.388923492))
structure.addAtom(Atom(1, -1.674872668, -0.759337, -0.388923492))
population += [structure] # add this structure to our population
sc_result = sc_machine.cluster(population)
print("clusters", sc_result.clusters)
print("singles", sc_result.singles)
# Output (indices from the original structure list):
# clusters [[0, 11], [1, 2, 4, 6, 12, 13, 14, 15, 19], [3, 17, 18, 23]]
# singles [9, 16, 22]
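The index lists can be post-processed with plain Python. The sketch below takes the lists printed in the example output and derives a few summary numbers similar to those in the terminal output. The population size `n_structures` is a hypothetical value chosen for illustration (the example does not state it), and the reading that indices in neither list were sorted out is likewise an assumption.

```python
from statistics import mean, median

# Values copied from the example output above.
clusters = [[0, 11], [1, 2, 4, 6, 12, 13, 14, 15, 19], [3, 17, 18, 23]]
singles = [9, 16, 22]

# Hypothetical population size; the example does not state it.
n_structures = 24

sizes = [len(cluster) for cluster in clusters]
assigned = {index for cluster in clusters for index in cluster} | set(singles)

# Indices that appear neither in a cluster nor as a single structure
# (presumably structures that were sorted out, e.g. disconnected graphs).
unassigned = sorted(set(range(n_structures)) - assigned)

print(f"{len(clusters)} clusters (total {sum(sizes)} structures)")
# 3 clusters (total 15 structures)
print(f"{len(singles)} unique single structures")
# 3 unique single structures
print(f"cluster size: Avg={mean(sizes):.1f} Med={median(sizes):.1f}")
# cluster size: Avg=5.0 Med=4.0
print(unassigned)
# [5, 7, 8, 10, 20, 21]
```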
License
The structure_clustering package is licensed under the MIT License. See the LICENSE file for more details.