Graph ID

Graph ID is a universal identifier system for atomistic structures including crystals and molecules. It generates unique, deterministic identifiers based on the topological and compositional properties of atomic structures, enabling efficient structure comparison, database indexing, and materials discovery.

Overview

Graph ID works by:

  1. Converting atomic structures into graph representations where atoms are nodes and bonds are edges
  2. Analyzing the local chemical environment around each atom using compositional sequences
  3. Computing a hash-based identifier that captures both topology and composition
  4. Supporting various modes including topology-only comparisons and Wyckoff position analysis
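The steps above can be illustrated with a small toy sketch. It is not Graph ID's actual algorithm: the function name `toy_graph_id`, the label-refinement scheme, and the choice of SHA-256 truncated to 16 hex characters are all assumptions made for illustration. It only shows the general idea of hashing each atom's bonded environment into an order-independent identifier, with an optional topology-only mode.

```python
import hashlib


def toy_graph_id(elements, bonds, rounds=2, topology_only=False):
    """Toy structure identifier (illustration only, not Graph ID's algorithm).

    elements: element symbol per atom (graph nodes)
    bonds: (i, j) index pairs between bonded atoms (graph edges)
    """
    neighbors = {i: [] for i in range(len(elements))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Topology-only mode hides the chemistry by giving every atom the same label
    labels = ["X"] * len(elements) if topology_only else list(elements)

    for _ in range(rounds):
        # Each atom's new label combines its own label with its neighbors' labels
        labels = [
            labels[i] + "(" + ",".join(sorted(labels[j] for j in neighbors[i])) + ")"
            for i in range(len(elements))
        ]

    # Sorting the labels makes the hash independent of atom ordering
    digest = hashlib.sha256("|".join(sorted(labels)).encode()).hexdigest()
    return digest[:16]


# The same water molecule written with two different atom orderings
water_a = toy_graph_id(["O", "H", "H"], [(0, 1), (0, 2)])
water_b = toy_graph_id(["H", "O", "H"], [(1, 0), (1, 2)])
print(water_a == water_b)  # True

# H2O and H2S share a bonding pattern but differ in composition
h2s = toy_graph_id(["S", "H", "H"], [(0, 1), (0, 2)])
print(water_a == h2s)  # False
```

With `topology_only=True`, the toy function gives H2O and H2S the same identifier, which mirrors the purpose of the topology-only mode described in step 4.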

Features

  • Universal Structure Identification: Generate unique IDs for any crystal or molecular structure
  • Topological Analysis: Option to generate topology-only IDs for structure type comparison
  • Wyckoff Position Support: Include crystallographic symmetry information in ID generation
  • Distance Clustering: Advanced clustering-based analysis for complex structures
  • C++ Performance: High-performance C++ backend with Python bindings
  • Multiple Neighbor Detection: Support for various neighbor-finding algorithms (MinimumDistanceNN, CrystalNN, etc.)

Installation

From PyPI

pip install graph-id-core
pip install graph-id-db  # optional database component

From Source

git clone https://github.com/kmu/graph-id-core.git
cd graph-id-core
git submodule update --init --recursive
pip install -e .

Quick Start

Basic Usage

from pymatgen.core import Structure, Lattice
from graph_id import GraphIDMaker

# Create a structure (NaCl)
structure = Structure.from_spacegroup(
    "Fm-3m",
    Lattice.cubic(5.692),
    ["Na", "Cl"],
    [[0, 0, 0], [0.5, 0.5, 0.5]]
)

# Generate Graph ID
maker = GraphIDMaker()
graph_id = maker.get_id(structure)
print(graph_id)  # Output: NaCl-88c8e156db1b0fd9

Loading from Files

from pymatgen.core import Structure
from graph_id_cpp import GraphIDGenerator

# Load structure from file
structure = Structure.from_file("path/to/structure.cif")
generator = GraphIDGenerator()
graph_id = generator.get_id(structure)

Advanced Configuration

from graph_id_cpp import GraphIDGenerator
from pymatgen.analysis.local_env import CrystalNN

# `structure` is a pymatgen Structure, e.g. the one loaded in the previous snippet

# Topology-only comparison (ignores composition)
topo_gen = GraphIDGenerator(topology_only=True)
topo_id = topo_gen.get_id(structure)

# Include Wyckoff positions
wyckoff_gen = GraphIDGenerator(wyckoff=True)
wyckoff_id = wyckoff_gen.get_id(structure)

# Use different neighbor detection
crystal_gen = GraphIDGenerator(nn=CrystalNN())  # Faster CrystalNN using C++ is also available
crystal_id = crystal_gen.get_id(structure)
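To see how these options affect whether two structures are treated as the same, one possible helper is sketched below. `compare_ids` is a hypothetical function, not part of graph-id-core; it only assumes that each generator exposes `get_id(structure)`, as in the snippets above.

```python
def compare_ids(generators, structure_a, structure_b):
    """Map each generator name to whether both structures receive the same ID.

    generators: dict of name -> object exposing get_id(structure)
    """
    return {
        name: gen.get_id(structure_a) == gen.get_id(structure_b)
        for name, gen in generators.items()
    }


# Intended usage with the generators configured above (not executed here):
# generators = {
#     "default": GraphIDGenerator(),
#     "topology_only": GraphIDGenerator(topology_only=True),
# }
# print(compare_ids(generators, structure_a, structure_b))
```

Two structures that share a structure type but differ in composition would be expected to match only under the topology-only generator.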

Search Structures from Database

Use graph-id-db to search for Materials Project structures by Graph ID. The package stores precomputed Graph IDs for these structures.

# pip install graph-id-db
from graph_id_cpp import GraphIDGenerator

from pymatgen.core import Structure, Lattice

structure = Structure.from_spacegroup(
    "Fm-3m",
    Lattice.cubic(5.692),
    ["Na", "Cl"],
    [[0, 0, 0], [0.5, 0.5, 0.5]]
).get_primitive_structure()
gen = GraphIDGenerator()
graph_id = gen.get_id(structure)
print(f"Graph ID of NaCl is {graph_id}")

from graph_id_db import Finder

# Search for structures in graph-id-db using GraphID
finder = Finder()
results = finder.find(graph_id)
print(results)

Examples

More comprehensive examples can be found in the tests/ and examples/ directories.

Applications

Graph ID is particularly useful for:

  • Materials Databases: Efficient indexing and deduplication of structure databases
  • High-throughput Screening: Rapid identification of unique structures in computational workflows
  • Polymorph Identification: Distinguishing between different polymorphs of the same composition
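As a minimal sketch of the deduplication use case, structures can be grouped by their identifier so that any group with more than one entry is a duplicate candidate. The helpers `group_by_id` and `duplicates` are hypothetical and not part of graph-id-core; `get_id` is any callable that returns an ID for a structure, such as the `get_id` method of a generator.

```python
from collections import defaultdict


def group_by_id(structures, get_id):
    """Group structure keys by the ID that get_id assigns to each structure.

    structures: dict of key (e.g. a file name) -> structure object
    get_id: callable returning an identifier for a structure
    """
    groups = defaultdict(list)
    for key, structure in structures.items():
        groups[get_id(structure)].append(key)
    return dict(groups)


def duplicates(groups):
    """Keep only the IDs shared by more than one structure."""
    return {gid: keys for gid, keys in groups.items() if len(keys) > 1}


# Intended usage (not executed here), assuming `structures` maps names
# to pymatgen Structure objects:
# groups = group_by_id(structures, GraphIDGenerator().get_id)
# print(duplicates(groups))
```

Because the identifiers are deterministic strings, the grouping needs only one pass over the structures rather than pairwise comparisons.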

Web Service (experimental)

You can search materials using Graph ID at matfinder.net.

Developer's notes

This repository is managed with Poetry.

Installation

  1. Clone the repository:
     git clone https://github.com/kmu/graph-id-core.git
     cd graph-id-core
  2. Initialize the git submodules (required for the C++ build):
     git submodule update --init --recursive
  3. Install the package and its dependencies using Poetry:
     poetry install
  4. Install the pre-commit hooks:
     pre-commit install

Note: The git submodules (library/pybind11, library/eigen, library/gtl) are required for building the C++ extension. Without them, the installation will fail during the CMake build step.

Testing

poetry run pytest

If you have changed the C++ code, run poetry run pip install -e . --force-reinstall to apply the changes before running the tests.

Releasing
