Chemistry evaluation metrics (SMILES, Markush) for Docling metrics

Owner

Docling Project

Project description

docling-metrics-chemistry

Chemistry evaluation metrics (SMILES and Markush structures) for the Docling metrics framework.

Installation

pip install docling-metrics-chemistry

Package Structure

docling_metrics_chemistry/
├── __init__.py            # Public API exports
├── smiles_metric.py       # BaseMetric implementation (SmilesMetric)
├── molecule_scores.py     # Score computation (Tanimoto, InChI, Markush)
├── smiles_utils.py        # Molecule parsing, canonicalization, wildcard replacement
└── cxsmiles_parser.py     # CXSMILES section parsing (M-sections, Sg-sections)

Key Classes

`SmilesMetric`

The main metric class, implementing the `BaseMetric` interface. It provides three methods:

- `evaluate_sample(sample)`: evaluates a single predicted/ground-truth SMILES pair
- `aggregate(results)`: computes summary statistics from multiple sample results
- `evaluate_dataset(samples)`: evaluates an entire dataset by calling both of the above
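How the three methods relate can be pictured with a small stand-in. The sketch below is not the package's code: `ToyMetric`, `ToySample`, and `ToyResult` are hypothetical names, and exact string comparison stands in for the real scoring. It only illustrates the shape described above, where `evaluate_dataset` runs `evaluate_sample` on each item and hands the results to `aggregate`.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ToySample:
    id: str
    predicted: str
    gt: str


@dataclass
class ToyResult:
    id: str
    match: bool


class ToyMetric:
    """Hypothetical stand-in mirroring the three-method shape described above."""

    def evaluate_sample(self, sample: ToySample) -> ToyResult:
        # Placeholder scoring: exact string comparison only.
        return ToyResult(id=sample.id, match=sample.predicted == sample.gt)

    def aggregate(self, results: Iterable[ToyResult]) -> dict:
        results = list(results)
        matches = sum(r.match for r in results)
        rate = matches / len(results) if results else 0.0
        return {"sample_count": len(results), "match_rate": rate}

    def evaluate_dataset(self, samples: Iterable[ToySample]) -> dict:
        # Per-sample evaluation first, then a single aggregation pass.
        return self.aggregate([self.evaluate_sample(s) for s in samples])


summary = ToyMetric().evaluate_dataset(
    [ToySample("1", "CCO", "CCO"), ToySample("2", "CCN", "CCO")]
)
print(summary)  # {'sample_count': 2, 'match_rate': 0.5}
```

Keeping per-sample evaluation separate from aggregation means per-sample results can be stored and re-aggregated later without recomputing scores.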

`SmilesInputSample`

Input model for a single evaluation sample.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `id` | `str` | - | Unique sample identifier |
| `predicted_smiles` | `str` | - | Predicted SMILES or CXSMILES string |
| `gt_smiles` | `str` | - | Ground truth SMILES or CXSMILES string |
| `is_markush` | `bool` | `False` | Use Markush/CXSMILES evaluation mode |
| `remove_stereo` | `bool` | `True` | Remove stereochemistry before comparison |

`SmilesSampleResult`

Per-sample result with all computed scores.

| Field | Type | Description |
| --- | --- | --- |
| `valid` | `bool` | Whether the predicted SMILES is chemically valid |
| `tanimoto` | `float` | Tanimoto fingerprint similarity (0–1) |
| `tanimoto1` | `bool` | Whether Tanimoto equals 1.0 |
| `inchi_equality` | `bool` | Whether InChI representations match |
| `string_equality` | `bool` | Whether SMILES strings are identical |
| `r` | `Optional[float]` | R-group label accuracy (Markush only) |
| `m` | `Optional[float]` | M-section accuracy (Markush only) |
| `sg` | `Optional[float]` | Sg-section accuracy (Markush only) |
| `num_fragments_gt` | `int` | Fragment count in ground truth |
| `num_fragments_pred` | `int` | Fragment count in prediction |
| `num_fragments_equal` | `bool` | Whether fragment counts match |
| `cxsmi_equality` | `bool` | Overall CXSMILES equality (Markush only) |

`SmilesAggregateResult`

Aggregated statistics across a dataset.

| Field | Type | Description |
| --- | --- | --- |
| `sample_count` | `int` | Number of evaluated samples |
| `mean_tanimoto` | `float` | Mean Tanimoto similarity |
| `validity_rate` | `float` | Fraction of valid predictions |
| `inchi_equality_rate` | `float` | Fraction with matching InChI |
| `string_equality_rate` | `float` | Fraction with exact string match |
| `mean_r` | `Optional[float]` | Mean R-group accuracy (Markush samples only) |
| `mean_m` | `Optional[float]` | Mean M-section accuracy (Markush samples only) |
| `mean_sg` | `Optional[float]` | Mean Sg-section accuracy (Markush samples only) |
| `cxsmi_equality_rate` | `Optional[float]` | Fraction with full CXSMILES equality |
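The aggregate fields are rates over boolean per-sample fields and means over numeric ones, with the Markush means being optional. A minimal sketch of that arithmetic follows. The helper names `rate` and `mean_present` are hypothetical, and two behaviours are assumptions rather than documented package behaviour: skipping `None` scores when averaging, and returning `0.0` for an empty list of flags.

```python
from statistics import fmean
from typing import Iterable, Optional


def rate(flags: Iterable[bool]) -> float:
    """Fraction of True values; 0.0 for empty input (assumed convention)."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def mean_present(values: Iterable[Optional[float]]) -> Optional[float]:
    """Mean over the non-None values, or None when there are none."""
    present = [v for v in values if v is not None]
    return fmean(present) if present else None


# Made-up per-sample values for three samples, the first not a Markush sample.
valid_flags = [True, True, False]
r_scores = [None, 1.0, 0.5]

print(rate(valid_flags))       # 2 of 3 predictions are valid
print(mean_present(r_scores))  # 0.75
```

Under these assumptions a dataset with no Markush samples yields `None` for the Markush means, which matches their `Optional[float]` type.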

Metrics

- Tanimoto similarity: RDKit fingerprint-based molecular similarity (0–1)
- InChI equality: International Chemical Identifier comparison
- String equality: Exact canonical SMILES string match
- Validity: Whether a SMILES string parses into a valid molecule
- Markush evaluation: R-group, M-section, Sg-section accuracy for CXSMILES
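For binary fingerprints, Tanimoto similarity is the number of bits set in both fingerprints divided by the number of bits set in either. The sketch below shows only that formula, using plain Python sets of on-bit indices in place of RDKit fingerprints. The bit values are made up, and returning `0.0` for two empty fingerprints is a convention chosen for this sketch.

```python
from typing import Set


def tanimoto(bits_a: Set[int], bits_b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two sets of on-bit indices."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0  # convention for this sketch only
    return len(bits_a & bits_b) / union


fp_pred = {1, 4, 7, 9}  # hypothetical on-bits of a predicted molecule
fp_gt = {1, 4, 8}       # hypothetical on-bits of the ground truth

print(tanimoto(fp_pred, fp_gt))  # 2 shared bits / 5 distinct bits = 0.4
print(tanimoto(fp_gt, fp_gt))    # 1.0
```

A score of 1.0 means the fingerprints are identical, not necessarily the molecules, which is presumably why InChI and string equality are reported as separate metrics.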

Usage

Simple molecule evaluation

from docling_metrics_chemistry import SmilesMetric, SmilesInputSample

metric = SmilesMetric()

# Evaluate a single pair
sample = SmilesInputSample(
    id="sample_1",
    predicted_smiles="CCO",
    gt_smiles="CCO",
)
result = metric.evaluate_sample(sample)
print(result.tanimoto)        # 1.0
print(result.inchi_equality)  # True
print(result.valid)           # True

Dataset evaluation

from docling_metrics_chemistry import SmilesMetric, SmilesInputSample

metric = SmilesMetric()

samples = [
    SmilesInputSample(id="1", predicted_smiles="CCO", gt_smiles="CCO"),
    # Aromatic vs. Kekulé notation for benzene
    SmilesInputSample(id="2", predicted_smiles="c1ccccc1", gt_smiles="C1=CC=CC=C1"),
    # A prediction that does not parse as SMILES
    SmilesInputSample(id="3", predicted_smiles="INVALID", gt_smiles="CCO"),
]

aggregate = metric.evaluate_dataset(samples)
print(aggregate.mean_tanimoto)        # Mean Tanimoto across all samples
print(aggregate.validity_rate)        # Fraction of valid predictions
print(aggregate.inchi_equality_rate)  # Fraction with matching InChI

Markush structure evaluation

from docling_metrics_chemistry import SmilesMetric, SmilesInputSample

metric = SmilesMetric()

# Evaluate CXSMILES with R-groups, M-sections, and Sg-sections
sample = SmilesInputSample(
    id="markush_1",
    predicted_smiles="*C(=O)O.*Cl |m:4:10.11.12.9|",
    gt_smiles="*C(=O)O.*Cl |m:4:10.11.12.9|",
    is_markush=True,
)
result = metric.evaluate_sample(sample)
print(result.r)               # R-group label accuracy (0-1 or None)
print(result.m)               # M-section accuracy (0-1 or None)
print(result.sg)              # Sg-section accuracy (0-1 or None)
print(result.cxsmi_equality)  # Overall CXSMILES match
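A CXSMILES string is a SMILES string followed by an extension block between `|` characters, and an `m:` entry in that block names an attachment atom and the atoms it spans. The sketch below pulls those pieces apart with plain string handling to show what an M-section contains. It is not the package's `cxsmiles_parser`: the function names are hypothetical, and splitting the extension block on commas is a simplification that would break on labels containing commas.

```python
from typing import Dict, List, Tuple


def split_cxsmiles(cxsmiles: str) -> Tuple[str, str]:
    """Split 'SMILES |ext|' into the SMILES part and the extension text."""
    smiles, sep, rest = cxsmiles.partition(" |")
    if not sep:
        return cxsmiles.strip(), ""  # plain SMILES, no extension block
    return smiles.strip(), rest.rstrip().rstrip("|")


def parse_m_sections(extension: str) -> Dict[int, List[int]]:
    """Map each 'm:' attachment atom index to the atom indices it spans."""
    sections: Dict[int, List[int]] = {}
    for field in extension.split(","):  # simplification, see note above
        if not field.startswith("m:"):
            continue
        _, center, members = field.split(":", 2)
        sections[int(center)] = [int(i) for i in members.split(".")]
    return sections


smiles, extension = split_cxsmiles("*C(=O)O.*Cl |m:4:10.11.12.9|")
print(smiles)                       # *C(=O)O.*Cl
print(parse_m_sections(extension))  # {4: [10, 11, 12, 9]}
```

Comparing such parsed sections between prediction and ground truth is one plausible way to arrive at an M-section score, though how the package actually scores them is not specified here.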

Project details

Release history

This version

0.10.0

Apr 24, 2026

Download files

Source Distribution

docling_metrics_chemistry-0.10.0.tar.gz (98.8 kB)

Uploaded Apr 24, 2026 Source

Built Distribution

docling_metrics_chemistry-0.10.0-py3-none-any.whl (14.5 kB)

Uploaded Apr 24, 2026 Python 3

File details

Details for the file docling_metrics_chemistry-0.10.0.tar.gz.

File metadata

Download URL: docling_metrics_chemistry-0.10.0.tar.gz
Upload date: Apr 24, 2026
Size: 98.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for docling_metrics_chemistry-0.10.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `952fa487d98f4cc6468d80fd1986ddf5318a680c921f5c6416b523d68cca56d8` |
| MD5 | `6e558d34716dba63787e2064ab4368a6` |
| BLAKE2b-256 | `23dc677a0891cb97a562218283220a9382c3bb40d8ddf54f139c1d3f86fe50bd` |

Provenance

The following attestation bundles were made for docling_metrics_chemistry-0.10.0.tar.gz:

Publisher: pypi.yml on docling-project/docling-metrics

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: docling_metrics_chemistry-0.10.0.tar.gz
- Subject digest: 952fa487d98f4cc6468d80fd1986ddf5318a680c921f5c6416b523d68cca56d8
- Sigstore transparency entry: 1367728495
- Sigstore integration time: Apr 24, 2026
Source repository:
- Permalink: docling-project/docling-metrics@eb1bd72fdbc44f41c4da6d56ad3313129b199364
- Branch / Tag: refs/tags/docling-metrics-chemistry-v0.10.0
- Owner: https://github.com/docling-project
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yml@eb1bd72fdbc44f41c4da6d56ad3313129b199364
- Trigger Event: release

File details

Details for the file docling_metrics_chemistry-0.10.0-py3-none-any.whl.

File metadata

Download URL: docling_metrics_chemistry-0.10.0-py3-none-any.whl
Upload date: Apr 24, 2026
Size: 14.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for docling_metrics_chemistry-0.10.0-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `dbb2135245d3ebb179af0a347970ee8ad73e0c0c47b0bbddef38fa62999b59cf` |
| MD5 | `1f50d6aff6577e6829249cc191dac350` |
| BLAKE2b-256 | `26ece9dc74e11363f3ac6d24420f330fa9505f821343ebf31b3c5e2f34041701` |

Provenance

The following attestation bundles were made for docling_metrics_chemistry-0.10.0-py3-none-any.whl:

Publisher: pypi.yml on docling-project/docling-metrics

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: docling_metrics_chemistry-0.10.0-py3-none-any.whl
- Subject digest: dbb2135245d3ebb179af0a347970ee8ad73e0c0c47b0bbddef38fa62999b59cf
- Sigstore transparency entry: 1367728542
- Sigstore integration time: Apr 24, 2026
Source repository:
- Permalink: docling-project/docling-metrics@eb1bd72fdbc44f41c4da6d56ad3313129b199364
- Branch / Tag: refs/tags/docling-metrics-chemistry-v0.10.0
- Owner: https://github.com/docling-project
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yml@eb1bd72fdbc44f41c4da6d56ad3313129b199364
- Trigger Event: release
