Performs peak-to-structure matching of GID patterns

mlgidMATCH

mlgidMATCH performs peak-to-structure matching of GID patterns.

The package identifies crystal phases in (generally multi-phase) GID patterns from Bragg peak positions and their intensities. To validate a measured pattern, the framework requires a set of candidate crystal structures; in general, an entire crystal database can be supplied.

The framework returns one or more sets of crystal structures that explain all or most of the measured peaks. A full description of the matching algorithm will be available in the accompanying paper (link to be added).

Installation

Install from PyPI

pip install mlgidmatch

Install from source

First, clone the repository:

git clone https://github.com/mlgid-project/mlgidMATCH.git

Then, to install all required modules, navigate to the cloned directory and execute:

cd mlgidMATCH
pip install -e .

Usage

Preprocessing

Before validation, a preprocessing step is required to convert candidate crystal structures into a neural network-friendly format. It is recommended to perform this step in advance (e.g., before the experiment), as the preprocessing may take several minutes.

To preprocess candidate structures, use the mlgidmatch.preprocess.cif_preprocess.CifPattern class. This class prepares all data required for the neural matching stage and the subsequent peak-to-structure matching.

The class requires a folder containing CIF files, specified by the folder_path argument. If only a subset of CIF files from the folder should be used, the cifs argument can be provided.
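As a minimal sketch of how a list for the cifs argument could be assembled, the helper below collects CIF file names from a folder. The function list_cif_files and its name_contains filter are hypothetical and not part of mlgidmatch; the sketch also assumes that cifs expects bare file names relative to folder_path, as the example names in this README suggest.

```python
from pathlib import Path


def list_cif_files(folder_path, name_contains=None):
    """Return sorted CIF file names found directly inside folder_path.

    name_contains is a hypothetical filter: when given, only file names
    containing that substring are kept.
    """
    names = sorted(
        p.name
        for p in Path(folder_path).iterdir()
        if p.is_file() and p.suffix.lower() == ".cif"
    )
    if name_contains is not None:
        names = [n for n in names if name_contains in n]
    return names
```

The resulting list could then be passed as cifs=list_cif_files('./cifs/') when only part of a folder should be preprocessed.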

The argument create_all=True enables precomputation of patterns for all unique crystal orientations. This option is recommended only when the number of candidate structures is small (up to ~1000), as it may otherwise lead to excessive memory usage.

The class also requires experimental parameters for correct preprocessing. These parameters can be created using the experiment.ExpParameters class from the pygidsim package, which is available on PyPI.

from mlgidmatch.preprocess.cif_preprocess import CifPattern
from pygidsim.experiment import ExpParameters

# path to the folder with CIF files
folder_path = './cifs/'

# list of CIF files to preprocess (if not provided, all CIFs from the folder will be used)
all_cifs = ['struct1.cif', 'struct2.cif', ...]  # optional

params = ExpParameters(q_xy_max=5, q_z_max=5, en=18_000)  # experimental parameters
cif_prepr = CifPattern(
    params=params,
    folder_path=folder_path,
    cifs=all_cifs,  # optional
    create_all=True,  # optional, default: False
)

For future use, it is recommended to save the preprocessed data using pickle format:

import pickle

with open('./mlgidmatch/data/prepr_cifs.pickle', 'wb') as file:
    pickle.dump(cif_prepr, file)

To load the preprocessed data later, use the following code:

with open('./mlgidmatch/data/prepr_cifs.pickle', 'rb') as file:
    cif_prepr = pickle.load(file)
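The save and load steps above can be combined into a small caching helper, sketched below. The function load_or_build is hypothetical and not part of mlgidmatch; it takes any zero-argument callable that builds the object, so the slow preprocessing runs only when no cache file exists yet.

```python
import pickle
from pathlib import Path


def load_or_build(cache_path, build):
    """Load a pickled object from cache_path, or build it and save it there.

    build is a zero-argument callable, invoked only when the cache is missing.
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        with cache_path.open("rb") as file:
            return pickle.load(file)
    obj = build()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as file:
        pickle.dump(obj, file)
    return obj
```

A possible call is load_or_build('./prepr_cifs.pickle', lambda: CifPattern(params=params, folder_path=folder_path)). Note that this sketch does not detect changed experimental parameters or CIF files, so the cache file should be deleted when those change. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.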

Neural Matching

To receive only probabilities for the candidate structures from the neural matching stage, use the following example:

from mlgidmatch.matching import Match

match_class = Match(
    model_path='./cif_matching/models/ResNet18_newimage_14ch_state99999.pt',
    cif_prepr=cif_prepr,
    device='cuda',
)

probabilities = match_class.match_cifs(
    peaks=q_2d,  # np.ndarray, shape (peaks_num, 2)
    q_range=(q_xy_max, q_z_max),  # upper limits of q-range
    candidates=['struct1.cif', 'struct5.cif'],  # candidate structures for the measurement (optional)
)
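The peaks argument above is an array of shape (peaks_num, 2). A minimal sketch of how such an array might be assembled from separate coordinate lists is shown below. The helper build_peak_array is hypothetical, the column order (q_xy, q_z) is an assumption based on the order used for q_range, and whether the library itself requires peaks outside the q-range to be dropped is not stated here.

```python
import numpy as np


def build_peak_array(q_xy, q_z, q_range):
    """Stack peak coordinates into shape (peaks_num, 2), keeping peaks inside q_range.

    Assumed column order: (q_xy, q_z); q_range = (q_xy_max, q_z_max).
    """
    peaks = np.column_stack(
        [np.asarray(q_xy, dtype=float), np.asarray(q_z, dtype=float)]
    )
    q_xy_max, q_z_max = q_range
    inside = (peaks[:, 0] <= q_xy_max) & (peaks[:, 1] <= q_z_max)
    return peaks[inside]
```

For example, q_2d = build_peak_array(q_xy_values, q_z_values, (q_xy_max, q_z_max)) yields an array that has the documented shape.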

Peak-to-structure matching

To perform full matching, including neural matching, phase identification and peak-to-structure assignment, use the following example:

from mlgidmatch.matching import Match

match_class = Match(
    cif_prepr=cif_prepr,
    model_path='./cif_matching/models/ResNet18_newimage_14ch_state99999.pt',  # optional
    device='cuda',  # optional
)

# names of the measurements
measurements = ['meas1', 'meas2', ...]

# Peak positions and intensities (own ArrayLike per measurement)
peak_list = [q_2d_1, q_2d_2, ...]
intensities_real_list = [intens1, intens2, ...]

# Upper limits of the q-range (q_xy, q_z)
q_range_list = [(2.7, 2.7), (3.1, 2.5), ...]

# type of the peaks - 'segments' or 'rings'
peaks_type = 'segments'

# Probability threshold (optional)
threshold = 0.5

# Candidate structures for each measurement (optional)
candidates_list = [
    ['struct1.cif', 'struct5.cif'],
    ['struct2.cif', 'struct3.cif', 'struct7.cif'],
    ...
]  # Leave empty to use all structures from cif_prepr.cifs

# Matching process
data_matched = match_class.match_all(
    measurements=measurements,
    peak_list=peak_list,
    intensities_real_list=intensities_real_list,
    q_range_list=q_range_list,
    threshold=threshold,  # optional, default: 0.5
    candidates_list=candidates_list,  # optional, Leave empty to use all structures from cif_prepr.cifs
    peaks_type=peaks_type,
)

# Build a user-friendly output by removing duplicated solutions (e.g. [DIP + HATCH] and [HATCH + DIP]);
# the format is described in the Output section below.
unique_solutions = match_class.unique_solutions(data_matched)

To skip the neural matching stage and perform only peak-to-structure assignment, set threshold = 0.
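Because match_all() takes several parallel lists with one entry per measurement, a quick consistency check before the call can catch mismatched inputs early. The sketch below is a hypothetical helper, not part of mlgidmatch, and it assumes that each measurement has exactly one intensity per peak.

```python
def check_match_inputs(measurements, peak_list, intensities_real_list,
                       q_range_list, candidates_list=None):
    """Raise ValueError if the per-measurement inputs are inconsistent."""
    n = len(measurements)
    per_measurement = {
        "peak_list": peak_list,
        "intensities_real_list": intensities_real_list,
        "q_range_list": q_range_list,
    }
    if candidates_list:  # an empty or missing list means "use all structures"
        per_measurement["candidates_list"] = candidates_list
    for name, values in per_measurement.items():
        if len(values) != n:
            raise ValueError(f"{name} has {len(values)} entries, expected {n}")
    # Assumption: one intensity value per peak in every measurement
    for name, peaks, intensities in zip(measurements, peak_list, intensities_real_list):
        if len(peaks) != len(intensities):
            raise ValueError(
                f"{name}: {len(peaks)} peaks but {len(intensities)} intensities"
            )
```

Calling check_match_inputs(measurements, peak_list, intensities_real_list, q_range_list, candidates_list) returns silently when the lists line up and raises otherwise.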

Output

After the matching process, data_matched is a dictionary with the following hierarchical structure:

data_matched/
├── <measurement_name>/                # e.g. "meas_1"
│    ├── peaks                         # list of peak positions
│    ├── <phase_1_option_id>/          # integer, first phase index (option 1)
│    │   ├── cif                       # CIF file name of the structure
│    │   ├── orient                    # crystal orientation
│    │   ├── probability               # phase probability
│    │   ├── indices_real_matched_all  # indices of the peaks matched to the structure
│    │   │
│    │   ├── <phase_2_option_id>/...   # integer, second phase index (option 1)
│    │   │           └──...
│    │   └── <phase_2_option_id>/...   # integer, second phase index (option 2)
│    │               └──...
│    ├── <phase_1_option_id>/          # integer, first phase index (option 2)
│    │   └──...
│    └── ...
└── ...

This output contains complete information about the peak-to-structure matching process. If no valid solutions are found, the output tree contains only the peaks entry.

An example output is shown below:

data_matched = {
    'meas_1': {
        'peaks': np.array(
            [[0.0310, 0.7514],
             [1.7270, 0.9246],
             [0.3772, 2.5963],
             ...]
        ),
        '0': {
            'cif': 'DIP.cif',
            'orient': np.array([0, 0, 1]),
            'probability': 0.985,
            'indices_real_matched_all': np.array([...]),
            '0': {
                'cif': 'HATCH.cif',
                'orient': np.array([1, 0, 1]),
                'probability': 0.685,
                'indices_real_matched_all': np.array([...])
            },
            '1': {
                'cif': 'ZnPc.cif',
                'orient': np.array([1, 1, 1]),
                'probability': 0.792,
                'indices_real_matched_all': np.array([...]),
                '0': {
                    'cif': 'HATCH.cif',
                    'orient': np.array([1, 0, 1]),
                    'probability': 0.582,
                    'indices_real_matched_all': np.array([...])
                }
            }
        },
        '1': {
            'cif': 'HATCH.cif',
            'orient': np.array([1, 0, 1]),
            'probability': 0.991,
            'indices_real_matched_all': np.array([...]),
            '0': {
                'cif': 'DIP.cif',
                'orient': np.array([0, 0, 1]),
                'probability': 0.911,
                'indices_real_matched_all': np.array([...])
            }
        }
    },
    'meas_2': {
        ...
    }
}

This result indicates that the framework found three valid phase combinations:

00: DIP + HATCH
010: DIP + ZnPc + HATCH
10: HATCH + DIP
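The way these combinations are read off the nested dictionary can be sketched with a short recursive traversal. The helper iter_solutions is hypothetical and not part of mlgidmatch. It assumes that every phase node is a dictionary containing a 'cif' key and that only leaf nodes (phases without nested phases) mark a complete solution, which is how the three combinations above are listed.

```python
def iter_solutions(node, path=(), cifs=()):
    """Yield (path_id, cif_names) for every leaf phase below a measurement node."""
    # Phase nodes are the nested dicts that carry a 'cif' entry;
    # 'peaks', 'orient' and the other entries are skipped.
    children = {
        key: value
        for key, value in node.items()
        if isinstance(value, dict) and "cif" in value
    }
    if not children and cifs:
        yield "".join(path), list(cifs)
    for key, child in children.items():
        yield from iter_solutions(child, path + (str(key),), cifs + (child["cif"],))


# Reduced copy of the 'meas_1' example, keeping only the keys the traversal needs
meas_1 = {
    "peaks": [[0.0310, 0.7514], [1.7270, 0.9246]],
    "0": {
        "cif": "DIP.cif",
        "0": {"cif": "HATCH.cif"},
        "1": {"cif": "ZnPc.cif", "0": {"cif": "HATCH.cif"}},
    },
    "1": {"cif": "HATCH.cif", "0": {"cif": "DIP.cif"}},
}

for path_id, names in iter_solutions(meas_1):
    print(f"{path_id}: {' + '.join(names)}")
```

Keys are converted with str(), so the traversal works whether the option identifiers are stored as integers or as strings.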

Finally, duplicated solutions (e.g. [DIP + HATCH] and [HATCH + DIP]) can be removed using the unique_solutions() method:

unique_solutions = match_class.unique_solutions(data_matched)
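The idea behind removing such duplicates can be illustrated with an order-insensitive comparison of the structure names. This is only a sketch of the concept, not the implementation used by unique_solutions(), which may also take orientations or matched peaks into account; drop_permutations is a hypothetical name.

```python
def drop_permutations(solutions):
    """Keep the first occurrence of each combination, ignoring phase order."""
    seen = set()
    unique = []
    for cifs in solutions:
        key = tuple(sorted(cifs))  # same structures in any order give the same key
        if key not in seen:
            seen.add(key)
            unique.append(list(cifs))
    return unique


combos = [
    ["DIP.cif", "HATCH.cif"],
    ["DIP.cif", "ZnPc.cif", "HATCH.cif"],
    ["HATCH.cif", "DIP.cif"],
]
print(drop_permutations(combos))
```

With the three combinations from the example above, the permutation HATCH + DIP is dropped and two combinations remain.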

An example of the final output, where 'meas_1' contains two unique solutions (DIP + HATCH and DIP + ZnPc + HATCH), is shown below:

unique_solutions = {
    'meas_1': {
        0: [
            {
                'cif': 'DIP.cif',
                'orientation': np.array([0, 0, 1]),
                'matched_peaks': np.array([0.985, 0, 0, ..., 0.985, 0]),
            },
            {
                'cif': 'HATCH.cif',
                'orientation': np.array([1, 0, 1]),
                'matched_peaks': np.array([0, 0.685, 0, ..., 0.685, 0]),
            },
        ],
        1: [
            {
                'cif': 'DIP.cif',
                'orientation': np.array([0, 0, 1]),
                'matched_peaks': np.array([0.985, 0, 0, ..., 0.985, 0]),
            },
            {
                'cif': 'ZnPc.cif',
                'orientation': np.array([1, 1, 1]),
                'matched_peaks': np.array([0.792, 0, 0.792, ..., 0.792, 0]),
            },
            {
                'cif': 'HATCH.cif',
                'orientation': np.array([1, 0, 1]),
                'matched_peaks': np.array([0, 0, 0.582, ..., 0.582, 0.582]),
            }
        ],
    },

    'meas_2': {...}
}
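One way to compare the solutions of a measurement is to check how many peaks each of them explains. The sketch below is a hypothetical helper, not part of mlgidmatch. It assumes, based on the example above, that matched_peaks holds one entry per measured peak and that an entry is non-zero only when the peak is assigned to that phase.

```python
def explained_fraction(solution):
    """Fraction of peaks with a non-zero matched_peaks entry in at least one phase."""
    # zip(*) walks over the peaks, giving one value per phase for each peak
    columns = zip(*(phase["matched_peaks"] for phase in solution))
    flags = [any(value > 0 for value in column) for column in columns]
    return sum(flags) / len(flags) if flags else 0.0


# Shortened, made-up arrays in the same layout as the example above
solution = [
    {"cif": "DIP.cif", "matched_peaks": [0.985, 0, 0, 0.985, 0]},
    {"cif": "HATCH.cif", "matched_peaks": [0, 0.685, 0, 0.685, 0]},
]
print(explained_fraction(solution))
```

The function iterates over the entries, so it accepts plain lists as well as one-dimensional NumPy arrays.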

Release history

0.1.2 (Apr 7, 2026)
0.1.1 (Mar 20, 2026), this version
