
Molecular Machine Learning for Chemical Applications - A comprehensive Python package for molecular representation learning and property prediction using Graph Neural Networks


MoML-CA: Molecular Machine Learning for Chemical Applications

MoML-CA is a Python package for molecular representation learning and property prediction using Graph Neural Networks. The package provides a comprehensive set of tools for converting molecular structures to graph representations, training GNN models, and predicting molecular properties.

Features

  • Molecular Graph Creation: Convert SMILES and RDKit molecules to graph representations with extensive feature extraction
  • Hierarchical Graph Representations: Create multi-level graph representations for improved model performance
  • Modular Model Architecture: Flexible and extensible GNN architectures with easy configuration
  • Training Utilities: Comprehensive training pipelines with callbacks and monitoring
  • Evaluation Tools: Metrics calculation and visualization of predictions
  • Example Scripts: Ready-to-use examples for common molecular machine learning tasks
  • Command-Line Tools: Easy-to-use CLI for model training and prediction
  • Data Processing: Efficient batch processing of molecular datasets
  • Visualization: Tools for visualizing molecular graphs and model predictions

Large Files Handling

Large data files (over 100 MB), such as training datasets and saved models, are not stored in the Git repository. They are excluded through the .gitignore file and should be shared by other means (cloud storage, direct transfer, etc.).

In particular, large files in the data/qm9/processed/ directory (especially *.pt files) are excluded from Git automatically.
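Before committing, it can help to check which files exceed the size limit. The following is a minimal sketch that uses only the Python standard library. The helper name `find_large_files` and its default threshold are illustrative assumptions and are not part of MoML-CA.

```python
from pathlib import Path


def find_large_files(root, limit_bytes=100 * 1024 * 1024):
    """Return (path, size) pairs for files under `root` larger than `limit_bytes`."""
    large = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            size = path.stat().st_size
            if size > limit_bytes:
                large.append((path, size))
    # Sort so the largest files are listed first
    return sorted(large, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    data_root = Path("data")  # assumed location, based on data/qm9/processed/
    if data_root.is_dir():
        for path, size in find_large_files(data_root):
            print(f"{size / 1024**2:8.1f} MB  {path}")
```

Any file this reports should either match a pattern in .gitignore or be shared outside the repository.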

Installation

# Clone the repository (choose HTTPS or SSH)
git clone https://github.com/SAKETH11111/MoML-CA.git
# or, if you have SSH keys configured:
# git clone git@github.com:SAKETH11111/MoML-CA.git
cd MoML-CA

# Create a conda environment
conda env create -f environment.yml

# Activate the environment
conda activate moml-ca

# Install dependencies
pip install -r requirements.txt

# Install the package in development mode
pip install -e .
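To confirm that the development install succeeded, one option is to check that the interpreter can locate the package. This is a sketch that uses only the standard library. `is_importable` is a hypothetical helper, and `moml` is assumed to be the import name because the Quick Start imports from `moml.core`.

```python
import importlib.util


def is_importable(package_name):
    """Return True if a top-level package can be found on the current Python path."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    status = "found" if is_importable("moml") else "not found"
    print(f"moml: {status}")
```

If the package is reported as not found, check that the moml-ca conda environment is active in the shell where the script runs.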

Quick Start

import torch
from rdkit import Chem
from moml.core import create_graph_processor
from moml.models.mgnn.training import initialize_model, MGNNConfig, create_trainer
from moml.models.mgnn.evaluation.predictor import create_predictor

# Create molecular graph
processor = create_graph_processor({'use_partial_charges': True})
smiles = "C(C(F)(F)F)(C(F)(F)F)(F)F"  # Perfluoropropane (C3F8)
graph = processor.smiles_to_graph(smiles)

# Initialize model with configuration
config = MGNNConfig({
    'model_type': 'multi_task_djmgnn',
    'hidden_dim': 64,
    'n_blocks': 3
})
model = initialize_model(config, graph.x.shape[1], graph.edge_attr.shape[1])

# Train model with dataloaders
# train_loader and val_loader must be PyTorch DataLoader objects wrapping your
# training and validation datasets, and must be defined before this point.
# See the training and quickstart examples in the examples directory for how
# to build them. For example:
# from torch.utils.data import DataLoader
# train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
# val_loader = DataLoader(val_dataset, batch_size=32)
trainer = create_trainer(config=config, train_loader=train_loader, val_loader=val_loader)
history = trainer.train(epochs=50)

# Make predictions
predictor = create_predictor(model_path="path/to/saved_model.pt")  # Or pass model directly
predictions = predictor.predict_from_dataloader(val_loader)  # Or predictor.predict([graph])

See the examples directory for more comprehensive examples.

Generating force field labels

After running ORCA calculations, you can generate a JSON file containing atom types, partial charges, and other force field parameters for each PFAS molecule:

python scripts/generate_force_field_labels.py

The output force_field_labels.json will be placed in orca_results_b3lyp_sto3g/.
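The layout of force_field_labels.json is not described here, so the sketch below only loads the file and reports what its top level contains, without assuming any particular keys. The helper name `summarize_labels` is hypothetical. The path simply combines the directory and file name given above.

```python
import json
from pathlib import Path


def summarize_labels(path):
    """Load a JSON file and return (top-level type name, entry count, sample keys or indices)."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    if isinstance(data, dict):
        sample = list(data)[:5]  # first few keys, in file order
    elif isinstance(data, list):
        sample = list(range(min(5, len(data))))  # first few indices
    else:
        # A bare scalar at the top level: report it as a single entry
        return type(data).__name__, 1, []
    return type(data).__name__, len(data), sample


if __name__ == "__main__":
    labels_path = Path("orca_results_b3lyp_sto3g") / "force_field_labels.json"
    if labels_path.exists():
        kind, count, sample = summarize_labels(labels_path)
        print(f"{kind} with {count} entries; sample: {sample}")
```

Inspecting the file this way before writing any downstream parsing code avoids hard-coding a schema that the script may not actually produce.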

Project Structure

MoML-CA/
├── moml/                        # Main package directory
│   ├── core/                    # Core functionality
│   │   ├── graph_coarsening.py      # Graph coarsening algorithms
│   │   └── molecular_graph.py       # Molecular graph representation
│   ├── models/                  # Model implementations
│   │   ├── mgnn/                    # MGNN models
│   │   │   ├── djmgnn.py               # DJMGNN implementation
│   │   │   ├── training/               # Training utilities
│   │   │   └── evaluation/             # Evaluation utilities
│   │   └── lstm/                    # LSTM models
│   ├── data/                    # Data handling utilities
│   │   ├── dataset.py               # Dataset implementations
│   │   └── processors.py            # Data processors
│   ├── utils/                   # Utility functions
│   │   ├── visualization/           # Visualization tools
│   │   ├── molecular/               # Molecular utilities
│   │   └── graph/                   # Graph utilities
│   ├── pipeline/                # Pipeline orchestration
│   ├── simulation/              # Simulation utilities
│   └── __init__.py              # Package initialization
├── examples/                    # Example scripts
│   ├── quickstart/              # Quickstart examples
│   ├── training/                # Training examples
│   ├── prediction/              # Prediction examples
│   ├── molecular_graph/         # Molecular graph examples
│   └── preprocess/              # Preprocessing examples
└── tests/                       # Test directory

Recent Improvements

  • Enhanced Model Architecture: Improved hierarchical graph representations and attention mechanisms
  • Streamlined API: Simplified interface with factory functions and better error handling
  • Advanced Training Features: Added support for mixed precision training and gradient accumulation
  • Improved Data Processing: Enhanced batch processing and memory efficiency
  • Better Visualization: New tools for visualizing molecular graphs and model attention
  • Command-Line Interface: Added CLI tools for common tasks
  • Documentation: Comprehensive documentation with examples and tutorials

Documentation

See the docs directory for comprehensive documentation.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For guidelines on contributing, see CONTRIBUTING.md.

License

This project is licensed under the terms of the MIT license.
