Deep learning framework for predicting enzyme-catalyzed reactions from protein sequences

RXNRECer

Python 3.10+ | PyTorch | License: MIT | PyPI

RXNRECer v1.3.4 is a deep learning framework for predicting enzyme-catalyzed reactions from protein sequences. It is the official implementation of "RXNRECer: Active Learning with Protein Language Models for Fine-Grained Enzyme Reaction Prediction."

🎉 Now available on PyPI for easy installation!

🚀 Features

  • Multi-Stage Prediction: S1 (reaction prediction), S2 (reaction integration), S3 (LLM reasoning)
  • Protein Sequence Analysis: Process protein sequences in FASTA format
  • Deep Learning Models: ESM-2 embeddings with advanced neural architectures
  • GPU Acceleration: CUDA support for faster inference
  • Easy-to-use CLI: Simple command-line interface with comprehensive options
  • Smart Caching: Automatic result caching for faster repeated predictions

📋 Requirements

  • Python 3.10+
  • PyTorch 2.0+
  • CUDA 11.0+ (recommended)
  • 32GB+ RAM
  • 40GB+ disk space
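
The Python and CUDA requirements above can be checked before installing the larger downloads. The following is a minimal sketch, not part of RXNRECer: the function name `check_environment` is hypothetical, and it only reports what it finds instead of enforcing anything.

```python
import sys


def check_environment(min_python=(3, 10)):
    """Return a dict describing whether this machine meets the listed requirements."""
    report = {"python_ok": sys.version_info >= min_python}
    try:
        import torch  # optional at check time; absence is reported, not raised
    except ImportError:
        report["torch_version"] = None
        report["cuda_available"] = False
    else:
        report["torch_version"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

CUDA is listed as recommended, so a `cuda_available` value of `False` is a warning rather than a blocker.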

🚀 Quick Start

1. Install (Recommended)

# Install from PyPI (recommended)
pip install rxnrecer

# Or install from GitHub
pip install git+https://github.com/kingstdio/RXNRECer.git

2. Download Data

# Download required data and model files (~35.8GB total)
rxnrecer-download-data

# Or download separately
rxnrecer-download-data --data-only      # ~8.8GB
rxnrecer-download-data --models-only    # ~14GB
rxnrecer-download-data --extools-only   # ~13GB
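
Because the downloads are large, it can help to confirm free disk space first. The sketch below sums the approximate sizes quoted above and compares them with the free space at a given path. The helper names are hypothetical, the sizes are the README's rounded figures, and the check treats 1 GB as 10^9 bytes, so leave some headroom.

```python
import shutil

# Approximate sizes in GB, as listed in the download step above.
COMPONENT_SIZES_GB = {"data": 8.8, "models": 14.0, "extools": 13.0}


def required_gb(components=("data", "models", "extools")):
    """Sum the listed download sizes for the chosen components."""
    return round(sum(COMPONENT_SIZES_GB[name] for name in components), 1)


def has_enough_space(path=".", components=("data", "models", "extools")):
    """Compare free space at `path` with the listed download size."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb(components)


if __name__ == "__main__":
    print(f"need about {required_gb()} GB; enough space here: {has_enough_space()}")
```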

3. Run Prediction

# Basic prediction (S1 mode)
rxnrecer -i input.fasta -o output.tsv -m s1

# Detailed prediction (S2 mode)
rxnrecer -i input.fasta -o output.tsv -m s2

# LLM reasoning (S3 mode, requires API key)
rxnrecer -i input.fasta -o output.json -m s3 -f json

🔧 Usage

Command Line Options

rxnrecer [OPTIONS]

Options:
  -i, --input_fasta    Input FASTA file path (required)
  -o, --output_file    Output file path
  -f, --format         Output format: tsv or json (default: tsv)
  -m, --mode           Prediction mode: s1, s2, or s3 (default: s1)
  -b, --batch_size     Batch size for processing (default: 100)
  -v, --version        Show version

Examples

# Basic usage
rxnrecer -i proteins.fasta -o results.tsv

# Custom batch size
rxnrecer -i proteins.fasta -o results.tsv -b 50

# JSON output
rxnrecer -i proteins.fasta -o results.json -f json

# Use default output path
rxnrecer -i proteins.fasta -m s1
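
When calling the CLI from a script, the documented options can be assembled into an argument list. This is a sketch only: `build_command` is a hypothetical helper, and it uses just the flags listed under Command Line Options.

```python
import shlex


def build_command(input_fasta, output_file=None, mode="s1", fmt="tsv", batch_size=100):
    """Assemble an rxnrecer argument list from the documented options."""
    if mode not in {"s1", "s2", "s3"}:
        raise ValueError(f"unknown mode: {mode!r}")
    if fmt not in {"tsv", "json"}:
        raise ValueError(f"unknown format: {fmt!r}")
    cmd = ["rxnrecer", "-i", str(input_fasta), "-m", mode, "-f", fmt, "-b", str(batch_size)]
    if output_file is not None:
        cmd += ["-o", str(output_file)]
    return cmd


cmd = build_command("proteins.fasta", "results.tsv", batch_size=50)
print(shlex.join(cmd))
# To run it (requires rxnrecer and its data files to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with file paths that contain spaces.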

Input Format

FASTA file with protein sequences:

>P12345|Sample protein 1
MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG
>P67890|Sample protein 2
MKLIVWALLLLAAWAVERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG
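
To sanity-check an input file before a long run, a small FASTA reader is enough. The sketch below uses only the standard library and is independent of RXNRECer's own parser; `read_fasta` is a hypothetical name.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)


example = """>P12345|Sample protein 1
MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG
>P67890|Sample protein 2
MKLIVWALLLLAAWAVERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG
"""

records = list(read_fasta(example.splitlines()))
for header, seq in records:
    print(header.split("|")[0], len(seq))
```

For a file on disk, pass the open file handle directly: `read_fasta(open("input.fasta"))`.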

Output Formats

TSV Output (S1/S2):

input_id	RXNRECer	RXNRECer_with_prob	rxn_details
P12345	RHEA:24076;RHEA:14709	0.9999;0.9999	[reaction details]
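
The TSV can be loaded with the standard `csv` module. The sketch below assumes, based only on the example row above, that `RXNRECer` and `RXNRECer_with_prob` hold semicolon-separated values in matching order; real output may contain other cases (for example, proteins with no predicted reaction) that this does not handle.

```python
import csv
import io

sample_tsv = (
    "input_id\tRXNRECer\tRXNRECer_with_prob\trxn_details\n"
    "P12345\tRHEA:24076;RHEA:14709\t0.9999;0.9999\t[reaction details]\n"
)


def parse_predictions(handle):
    """Map each input_id to a list of (reaction_id, probability) pairs."""
    results = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        ids = row["RXNRECer"].split(";")
        probs = [float(p) for p in row["RXNRECer_with_prob"].split(";")]
        if len(ids) != len(probs):
            raise ValueError(f"mismatched columns for {row['input_id']}")
        results[row["input_id"]] = list(zip(ids, probs))
    return results


predictions = parse_predictions(io.StringIO(sample_tsv))
print(predictions["P12345"])
```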

JSON Output (S3):

[
  {
    "reaction_id": "RHEA:24076",
    "prediction_confidence": 0.9999,
    "reaction_details": {...}
  }
]
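
JSON results can be filtered by confidence with a few lines of Python. This sketch uses the field names from the example above. The second record and its 0.42 score are invented for illustration, and `confident_reactions` is a hypothetical helper.

```python
import json

# The second entry is made up so the filter has something to drop.
sample_json = """
[
  {"reaction_id": "RHEA:24076", "prediction_confidence": 0.9999, "reaction_details": {}},
  {"reaction_id": "RHEA:14709", "prediction_confidence": 0.42, "reaction_details": {}}
]
"""


def confident_reactions(text, threshold=0.5):
    """Return reaction IDs whose confidence is at or above `threshold`."""
    records = json.loads(text)
    return [
        rec["reaction_id"]
        for rec in records
        if rec["prediction_confidence"] >= threshold
    ]


print(confident_reactions(sample_json))
```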

🆕 Advanced Features

Smart Caching

Results are automatically cached for faster repeated predictions:

# Check cache status
rxnrecer-cache status

# Clear cache
rxnrecer-cache clear --all
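
The general idea behind result caching can be illustrated with a content-addressed lookup. The sketch below is not RXNRECer's implementation, which this README does not describe. It assumes a key derived from the sequence and the prediction mode, and every name in it is hypothetical.

```python
import hashlib


def cache_key(sequence, mode):
    """Derive a stable key from the sequence and prediction mode."""
    payload = f"{mode}:{sequence.strip().upper()}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class PredictionCache:
    """In-memory stand-in for an on-disk result cache."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    def get_or_compute(self, sequence, mode, predict):
        key = cache_key(sequence, mode)
        if key in self._store:
            self.hits += 1  # repeated request: reuse the stored result
        else:
            self._store[key] = predict(sequence)
        return self._store[key]


def fake_predict(sequence):
    """Placeholder for a real model call."""
    return ["RHEA:24076"]


cache = PredictionCache()
cache.get_or_compute("MKTV", "s1", fake_predict)
cache.get_or_compute("MKTV", "s1", fake_predict)  # served from the cache
print(cache.hits)
```

Including the mode in the key keeps S1 and S2 results for the same sequence from overwriting each other.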

Data Management

Easy data and model file management:

# Download data
rxnrecer-download-data

# Force re-download
rxnrecer-download-data --force

📁 Project Structure

RXNRECer/                               # Project root (release)
├── .github/                            # CI/CD workflows
│   └── workflows/
├── rxnrecer/                           # Main Python package
│   ├── cli/                            # Command-line interface
│   ├── config/                         # Configuration
│   ├── lib/                            # Core libraries
│   │   ├── datasource/                 # Data source handling
│   │   ├── embedding/                  # Protein embeddings
│   │   ├── evaluation/                 # Evaluation helpers
│   │   ├── llm/                        # Language model integration
│   │   ├── ml/                         # Machine learning utilities
│   │   ├── model/                      # Model architectures
│   │   ├── rxn/                        # Reaction processing
│   │   └── smi/                        # SMILES handling
│   ├── models/                         # Model wrappers
│   └── utils/                          # Utility functions
│
├── extools/                            # External tools (downloaded)
│   ├── ec/                             # EC-related resources
│   └── msa/                            # MSA binaries (e.g., diamond)
│
├── data/                               # Data files (download required)
│   ├── chebi/                          # ChEBI database
│   ├── cpd_svg/                        # Compound SVG files
│   ├── datasets/                       # Training datasets
│   ├── dict/                           # Dictionary files
│   ├── feature_bank/                   # Feature bank
│   ├── rhea/                           # RHEA database
│   ├── rxn_json/                       # Reaction JSON files
│   ├── sample/                         # Sample data
│   └── uniprot/                        # UniProt database
│
├── ckpt/                               # Model checkpoints (download required)
│   ├── esm/                            # ESM models
│   ├── prostt5/                        # ProSTT5 models
│   └── rxnrecer/                       # RXNRECer model files
│
├── results/                            # Output results
│   ├── cache/                          # Prediction cache
│   ├── logs/                           # Log files
│   ├── predictions/                    # Prediction outputs
│   └── sample/                         # Sample results
│
├── docs/                               # Documentation
├── scripts/                            # Build and utility scripts
├── MANIFEST.in                         # Package data manifest
├── pyproject.toml                      # Build and dependencies for PyPI
├── environment_rxnrecer-release.yml    # Conda environment
├── LICENSE                             # MIT License
├── README.md                           # This file
└── .gitignore                          # Git ignore rules

🔧 Configuration

For S3 mode (LLM reasoning), set your API key and the API endpoint URL:

export LLM_API_KEY="your_api_key_here"
export LLM_API_URL="your_api_url_here"

Examples:

# OpenRouter
export LLM_API_KEY="sk-or-v1-your_openrouter_key_here"
export LLM_API_URL="https://openrouter.ai/api/v1"

# OpenAI
export LLM_API_KEY="sk-your_openai_key_here"
export LLM_API_URL="https://api.openai.com/v1"

# Anthropic
export LLM_API_KEY="sk-ant-your_anthropic_key_here"
export LLM_API_URL="https://api.anthropic.com"
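
A missing variable is easier to diagnose before a run than in the middle of one. The sketch below checks for the two variables named above. `llm_settings` is a hypothetical helper, and whether RXNRECer performs a similar check itself is not stated here.

```python
import os


def llm_settings(env=None):
    """Return (api_key, api_url), or raise if either variable is missing."""
    env = os.environ if env is None else env
    missing = [name for name in ("LLM_API_KEY", "LLM_API_URL") if not env.get(name)]
    if missing:
        raise RuntimeError(
            "set these variables before using S3 mode: " + ", ".join(missing)
        )
    return env["LLM_API_KEY"], env["LLM_API_URL"]


# Demonstration with an explicit mapping instead of the real environment.
demo_env = {"LLM_API_KEY": "your_api_key_here", "LLM_API_URL": "your_api_url_here"}
key, url = llm_settings(demo_env)
print("LLM endpoint:", url)
```

Calling `llm_settings()` with no argument reads from the real process environment.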

Jupyter Notebook Setup

import os
from rxnrecer.config import config as cfg

# Set your API credentials
cfg.LLM_API_KEY = "your_api_key_here"
cfg.LLM_API_URL = "your_api_url_here"

📦 Installation Options

PyPI Installation (Recommended)

pip install rxnrecer

GitHub Installation (Latest)

pip install git+https://github.com/kingstdio/RXNRECer.git

  • 🔧 Development: Latest development version
  • 🔧 Custom: For advanced users

📚 Documentation

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Open a Pull Request

📄 License

MIT License - see LICENSE file for details.

📞 Contact


🎯 Get started now with: pip install rxnrecer
