MOLRAPTOR: Molecular Learning via Rapid Processing of Topological Representations

MOLRAPTOR is a pre-stable modular pipeline for fetching, curating, and encoding molecular datasets using PubChem data and RDKit's Morgan fingerprinting algorithm, designed for cheminformatics workflows and phase 1 machine learning applications in computational drug discovery.

Project Structure

MOLRAPTOR/
├── .github/workflows/
│   ├── ci.yml
│   ├── docs.yml
│   └── publish-to-pypi.yml
├── docs/
│   ├── stylesheets/
│   │   └── extra.css
│   ├── api.md
│   ├── cli.md
│   ├── configuration.md
│   ├── index.md
│   ├── installation.md
│   ├── quickstart.md
│   └── release.md
├── examples/
│   └── example_config.yaml
├── molraptor/
│   ├── __init__.py
│   ├── cli.py
│   ├── config.py
│   ├── curate.py
│   ├── fetch.py
│   ├── fingerprint.py
│   ├── fp_integrity.py
│   ├── pipeline.py
│   ├── pubchem.py
│   ├── result_manager.py
│   ├── validators.py
│   └── version.py
├── tests/
│   ├── __init__.py
│   ├── conftest.py
│   ├── test_public_api.py
│   └── test_version.py
├── .gitignore
├── CHANGELOG.md
├── CITATION.cff
├── COPYING
├── COPYING.LESSER
├── environment.yml
├── LICENSE
├── mkdocs.yml
├── pyproject.toml
└── README.md

Project Identity

Project: MOLRAPTOR
PyPI distribution: molraptor
Import package: molraptor
CLI: molraptor
Version: 0.1.1
License: LGPL-3.0-or-later
Status: alpha / pre-stable
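
To confirm which release is actually installed, the distribution name listed above can be looked up with the standard library. This is a minimal sketch: the helper name `installed_version` is hypothetical and is not part of MOLRAPTOR.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "molraptor") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "molraptor is not installed")
```

Looking up the distribution metadata avoids importing the package itself, so the check also works in environments where RDKit or other dependencies are missing.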

Documentation

The live documentation is published at:

https://nanobiostructuresrg.github.io/molraptor/

Installation

From PyPI:

python -m pip install molraptor

For local development:

git clone https://github.com/NanoBiostructuresRG/molraptor.git
cd molraptor
python -m pip install -e .

For development and documentation tools:

python -m pip install -e ".[dev]"
python -m pip install -e ".[docs]"

Quick Start

Run the pipeline with the bundled example configuration:

molraptor run --config examples/example_config.yaml

Run from Python:

from molraptor import MolraptorConfig, run

config = MolraptorConfig.load("examples/example_config.yaml")
run(config)

Scope

| MOLRAPTOR does | MOLRAPTOR does not |
|---|---|
| Fetch molecular properties from PubChem. | Train machine learning models. |
| Curate and validate chemical datasets. | Perform dimensionality reduction. |
| Generate Morgan fingerprints via RDKit. | Support non-PubChem data sources (yet). |
| Output ML-ready .npy and .csv artifacts. | Handle 3D molecular structures. |
| Log failed CIDs for reproducibility. | Support alternative fingerprint types (yet). |

CLI

molraptor --help
molraptor run --help
molraptor --version

Common commands:

molraptor run
molraptor run --config examples/example_config.yaml
molraptor run --config examples/example_config.yaml --verbose
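
When several configurations need to be run in sequence, the commands above can be driven from a script. The sketch below uses only the `--config` and `--verbose` flags shown in this section; the helper names `build_run_command` and `run_pipeline` are hypothetical, and treating a zero exit code as success is an assumption rather than documented behaviour.

```python
import subprocess
from typing import List


def build_run_command(config_path: str, verbose: bool = False) -> List[str]:
    """Build the argument list for `molraptor run` from the flags shown above."""
    cmd = ["molraptor", "run", "--config", str(config_path)]
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_pipeline(config_path: str, verbose: bool = False) -> int:
    """Invoke the CLI and return its exit code (requires molraptor on PATH)."""
    completed = subprocess.run(build_run_command(config_path, verbose), check=False)
    return completed.returncode


if __name__ == "__main__":
    code = run_pipeline("examples/example_config.yaml", verbose=True)
    print(f"molraptor exited with code {code}")
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a configuration path contains spaces.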

Public API

from molraptor import MolraptorConfig
from molraptor import validate_config
from molraptor import run
from molraptor import DataValidator
from molraptor import __version__

Modules not listed above can be imported directly, but they are not part of the public contract and may change before 1.0.

Input Format

data/
└── dataset.csv      <- CSV with PubChem CIDs and labels

Minimum required columns: PubChem CID, Label.
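
As an illustration of the layout above, the following sketch writes a tiny dataset and checks that the required columns are present. It assumes the header names are spelled exactly `PubChem CID` and `Label`; the label values are arbitrary placeholders, and the helper names are hypothetical rather than part of MOLRAPTOR.

```python
import csv
from pathlib import Path

# Column names are taken from the text above; the rows are illustrative only.
REQUIRED_COLUMNS = ["PubChem CID", "Label"]
EXAMPLE_ROWS = [
    {"PubChem CID": 2244, "Label": 1},  # aspirin
    {"PubChem CID": 3672, "Label": 0},  # ibuprofen
    {"PubChem CID": 2519, "Label": 0},  # caffeine
]


def write_example_dataset(path):
    """Write a minimal CSV with the two required columns and return its path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=REQUIRED_COLUMNS)
        writer.writeheader()
        writer.writerows(EXAMPLE_ROWS)
    return path


def missing_columns(path):
    """Return the required column names that are absent from the CSV header."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])
    return [name for name in REQUIRED_COLUMNS if name not in header]


if __name__ == "__main__":
    dataset = write_example_dataset("data/dataset.csv")
    print("missing columns:", missing_columns(dataset))
```

Checking the header before starting a run catches misspelled column names early, before any requests are sent to PubChem.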

Outputs

artifacts/
├── morgan_fp.csv          # Morgan fingerprints (human-readable)
├── morgan_db_*.npy        # Morgan fingerprints (NumPy array, shape: (N, size))
├── labels.npy             # Target labels (NumPy array, shape: (N,))
└── summary.txt            # Execution report

Local inputs and generated artifacts such as data/, artifacts/, and logs/ are intentionally ignored by Git.
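
The arrays listed above can be loaded for downstream modelling roughly as follows. This is a sketch under stated assumptions: the part of the `morgan_db_*.npy` name matched by the wildcard is not specified here, so the file is located with a glob, and the helper names `find_fingerprint_array` and `load_artifacts` are hypothetical. It also assumes both arrays are plain numeric arrays that `numpy.load` can read without pickling.

```python
from pathlib import Path


def find_fingerprint_array(artifacts_dir="artifacts"):
    """Return the single morgan_db_*.npy file, raising if none or several match."""
    matches = sorted(Path(artifacts_dir).glob("morgan_db_*.npy"))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected exactly one morgan_db_*.npy in {artifacts_dir}, "
            f"found {len(matches)}"
        )
    return matches[0]


def load_artifacts(artifacts_dir="artifacts"):
    """Load fingerprints (N, size) and labels (N,) and check that rows align."""
    import numpy as np  # imported lazily; only needed to read the arrays

    fingerprints = np.load(find_fingerprint_array(artifacts_dir))
    labels = np.load(Path(artifacts_dir) / "labels.npy")
    if fingerprints.shape[0] != labels.shape[0]:
        raise ValueError(
            f"row mismatch: {fingerprints.shape[0]} fingerprints "
            f"vs {labels.shape[0]} labels"
        )
    return fingerprints, labels


if __name__ == "__main__":
    X, y = load_artifacts()
    print("fingerprints:", X.shape, "labels:", y.shape)
```

The row-count check guards against pairing a fingerprint matrix with labels from a different run, which is easy to do when artifacts from several runs share a directory.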

Validation

The current dev/v0.1.1 branch is expected to pass the following checks:

python -m pytest tests/ -v
mkdocs build --strict
python -m build --no-isolation
python -m twine check dist/*
molraptor --help
molraptor run --help
molraptor --version

Citation

If you use MOLRAPTOR in your research, please cite it using the metadata in CITATION.cff.

Author

Developed by Flavio F. Contreras-Torres, Tecnologico de Monterrey.

License

This project is licensed under the terms of the GNU Lesser General Public License v3.0 or later.

SPDX identifier: LGPL-3.0-or-later
