Deep learning framework for predicting enzyme-catalyzed reactions from protein sequences
RXNRECer
RXNRECer v1.3.2 is a deep learning framework for predicting enzyme-catalyzed reactions from protein sequences. It is the official implementation of "RXNRECer: Active Learning with Protein Language Models for Fine-Grained Enzyme Reaction Prediction."
Now available on PyPI for easy installation!
Features
- Multi-Stage Prediction: S1 (reaction prediction), S2 (reaction integration), S3 (LLM reasoning)
- Protein Sequence Analysis: Process protein sequences in FASTA format
- Deep Learning Models: ESM-2 embeddings with advanced neural architectures
- GPU Acceleration: CUDA support for faster inference
- Easy-to-use CLI: Simple command-line interface with comprehensive options
- Smart Caching: Automatic result caching for faster repeated predictions
Requirements
- Python 3.10+
- PyTorch 2.0+
- CUDA 11.0+ (recommended)
- 32GB+ RAM
- 40GB+ disk space
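Before installing, a quick check against the requirements above can save a failed run later. The sketch below is not part of RXNRECer. It assumes "GB" means 10^9 bytes, checks only the filesystem that holds `path` (where the data files actually end up is not stated here), and does not check RAM. The function name `check_environment` is hypothetical.

```python
import importlib.util
import shutil
import sys


def check_environment(path=".", min_python=(3, 10), min_free_gb=40.0):
    """Compare this machine against the listed requirements.

    Checks the Python version, whether PyTorch is installed and sees CUDA,
    and free disk space on the filesystem holding `path` (1 GB = 1e9 bytes).
    RAM is not checked.
    """
    free_gb = shutil.disk_usage(path).free / 1e9
    report = {
        "python_version": ".".join(str(v) for v in sys.version_info[:3]),
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        "torch_version": None,
        "cuda_available": False,
        "free_disk_gb": round(free_gb, 1),
        "disk_ok": free_gb >= min_free_gb,
    }
    # Import torch only if it is installed, so the check itself never fails.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch_version"] = str(torch.__version__)
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Importing PyTorch lazily keeps the check usable on a fresh machine where only Python is installed.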
Quick Start
1. Install (Recommended)
# Install from PyPI (recommended)
pip install rxnrecer
# Or install from GitHub
pip install git+https://github.com/kingstdio/RXNRECer.git
2. Download Data
# Download required data and model files (~35.8GB total)
rxnrecer-download-data
# Or download separately
rxnrecer-download-data --data-only # ~8.8GB
rxnrecer-download-data --models-only # ~14GB
rxnrecer-download-data --extools-only # ~13GB
3. Run Prediction
# Basic prediction (S1 mode)
rxnrecer -i input.fasta -o output.tsv -m s1
# Detailed prediction (S2 mode)
rxnrecer -i input.fasta -o output.tsv -m s2
# LLM reasoning (S3 mode, requires API key)
rxnrecer -i input.fasta -o output.json -m s3 -f json
Usage
Command Line Options
rxnrecer [OPTIONS]
Options:
  -i, --input_fasta   Input FASTA file path (required)
  -o, --output_file   Output file path (optional; a default path is used if omitted)
  -f, --format        Output format: tsv or json (default: tsv)
  -m, --mode          Prediction mode: s1, s2, or s3 (default: s1)
  -b, --batch_size    Batch size for processing (default: 100)
  -v, --version       Show version
Examples
# Basic usage
rxnrecer -i proteins.fasta -o results.tsv
# Custom batch size
rxnrecer -i proteins.fasta -o results.tsv -b 50
# JSON output
rxnrecer -i proteins.fasta -o results.json -f json
# Use default output path
rxnrecer -i proteins.fasta -m s1
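To script several runs from Python, one option is to build the command line from the documented flags and hand it to `subprocess`. This is a minimal sketch that assumes the `rxnrecer` entry point is on `PATH` after `pip install rxnrecer`; the helper names `build_rxnrecer_cmd` and `run_rxnrecer` are hypothetical and not part of the package.

```python
import subprocess

VALID_MODES = {"s1", "s2", "s3"}
VALID_FORMATS = {"tsv", "json"}


def build_rxnrecer_cmd(input_fasta, output_file=None, mode="s1", fmt="tsv", batch_size=100):
    """Build an argv list using only the CLI options documented above."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    if fmt not in VALID_FORMATS:
        raise ValueError(f"format must be one of {sorted(VALID_FORMATS)}, got {fmt!r}")
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    cmd = ["rxnrecer", "-i", str(input_fasta), "-m", mode, "-f", fmt, "-b", str(batch_size)]
    if output_file is not None:  # omit -o to fall back to the default output path
        cmd += ["-o", str(output_file)]
    return cmd


def run_rxnrecer(**kwargs):
    """Run the CLI; raises subprocess.CalledProcessError on a non-zero exit."""
    return subprocess.run(build_rxnrecer_cmd(**kwargs), check=True)


if __name__ == "__main__":
    # Same as: rxnrecer -i proteins.fasta -o results.json -f json (defaults made explicit)
    print(" ".join(build_rxnrecer_cmd("proteins.fasta", "results.json", fmt="json")))
```

Validating mode and format before launching avoids discovering a typo only after the models have loaded.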
Input Format
FASTA file with protein sequences:
>P12345|Sample protein 1
MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG
>P67890|Sample protein 2
MKLIVWALLLLAAWAVERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG
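A small pre-flight check can catch malformed FASTA before a long run. The TSV sample below reports `P12345` for the header `>P12345|Sample protein 1`, which suggests the ID is the text before the first `|`; that is an inference from the sample, not documented behavior. The helpers `read_fasta` and `record_id` are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples.

    Multi-line sequences are joined and blank lines are ignored.
    """
    records, header, chunks = [], None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def record_id(header):
    """ID as it appears in the sample output: the text before the first '|'."""
    return header.split("|", 1)[0].strip()


if __name__ == "__main__":
    sample = ">P12345|Sample protein 1\nMKTVRQERLK\nSIVRILERSK\n>P67890|Sample protein 2\nMKLIVWALLL\n"
    for header, seq in read_fasta(sample):
        print(record_id(header), len(seq))
```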
Output Formats
TSV Output (S1/S2):
input_id RXNRECer RXNRECer_with_prob rxn_details
P12345 RHEA:24076;RHEA:14709 0.9999;0.9999 [reaction details]
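To work with S1/S2 results downstream, the semicolon-separated reaction and probability columns can be paired up. This sketch assumes the file is tab-separated with the header shown above and that the `RXNRECer_with_prob` column lists probabilities in the same order as `RXNRECer`, as in the sample; rows with placeholder values (for example, when no reaction is predicted) would need extra handling. `parse_tsv_predictions` is a hypothetical helper.

```python
import csv
import io


def parse_tsv_predictions(handle):
    """Yield (input_id, [(reaction_id, probability), ...]) for each TSV row.

    Assumes tab-separated columns named as in the sample, with ';' separating
    multiple reactions and their probabilities listed in matching order.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        rxns = [r for r in row["RXNRECer"].split(";") if r]
        probs = [float(p) for p in row["RXNRECer_with_prob"].split(";") if p]
        if len(rxns) != len(probs):
            raise ValueError(
                f"{row['input_id']}: {len(rxns)} reactions but {len(probs)} probabilities"
            )
        yield row["input_id"], list(zip(rxns, probs))


if __name__ == "__main__":
    sample = (
        "input_id\tRXNRECer\tRXNRECer_with_prob\trxn_details\n"
        "P12345\tRHEA:24076;RHEA:14709\t0.9999;0.9999\t[reaction details]\n"
    )
    for input_id, pairs in parse_tsv_predictions(io.StringIO(sample)):
        print(input_id, pairs)
```

For a real file, open it with `open("output.tsv", newline="")` and pass the handle in. `QUOTE_NONE` keeps any quote characters in `rxn_details` from being treated as CSV quoting.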
JSON Output (S3):
[
  {
    "reaction_id": "RHEA:24076",
    "prediction_confidence": 0.9999,
    "reaction_details": {...}
  }
]
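The JSON output can be loaded with the standard library and filtered by confidence. This sketch assumes the list-of-objects shape shown above; with several input sequences the real file may group results per protein, which this does not handle. The 0.9 threshold is an arbitrary example, and the helper names are hypothetical.

```python
import json


def load_predictions(path):
    """Read a JSON output file into a list of prediction dicts."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def filter_by_confidence(predictions, threshold=0.9):
    """Keep entries whose prediction_confidence is >= threshold, highest first.

    Entries without a confidence value are dropped.
    """
    kept = [
        p for p in predictions
        if p.get("prediction_confidence") is not None and p["prediction_confidence"] >= threshold
    ]
    return sorted(kept, key=lambda p: p["prediction_confidence"], reverse=True)


if __name__ == "__main__":
    preds = load_predictions("output.json")  # file name from the Quick Start S3 example
    for p in filter_by_confidence(preds):
        print(p["reaction_id"], p["prediction_confidence"])
```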
Advanced Features
Smart Caching
Results are automatically cached for faster repeated predictions:
# Check cache status
rxnrecer-cache status
# Clear cache
rxnrecer-cache clear --all
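RXNRECer's cache format and keying are not described here. Purely as an illustration of why a repeated prediction can be served from cache, the sketch below keys results on a SHA-256 hash of the prediction mode plus the normalized sequence. `cache_key` and `PredictionCache` are hypothetical names, and this is not the package's implementation.

```python
import hashlib


def cache_key(sequence, mode="s1"):
    """Hypothetical key: SHA-256 of the mode plus the sequence with
    whitespace removed and letters upper-cased."""
    normalized = "".join(sequence.split()).upper()
    return hashlib.sha256(f"{mode}:{normalized}".encode("utf-8")).hexdigest()


class PredictionCache:
    """In-memory stand-in for a persistent result cache (illustration only)."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, sequence, mode, predict):
        """Return a cached result, calling predict(sequence) only on a miss."""
        key = cache_key(sequence, mode)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = predict(sequence)
        return self._store[key]

    def clear(self):
        """Drop every entry, similar in spirit to `rxnrecer-cache clear --all`."""
        self._store.clear()
```

Including the mode in the key matters: an S1 result should not be returned for an S2 request on the same sequence.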
Data Management
Easy data and model file management:
# Download data
rxnrecer-download-data
# Force re-download
rxnrecer-download-data --force
Project Structure
RXNRECer/                            # Project root (release)
├── .github/                         # CI/CD workflows
│   └── workflows/
├── rxnrecer/                        # Main Python package
│   ├── cli/                         # Command-line interface
│   ├── config/                      # Configuration
│   ├── lib/                         # Core libraries
│   │   ├── datasource/              # Data source handling
│   │   ├── embedding/               # Protein embeddings
│   │   ├── evaluation/              # Evaluation helpers
│   │   ├── llm/                     # Language model integration
│   │   ├── ml/                      # Machine learning utilities
│   │   ├── model/                   # Model architectures
│   │   ├── rxn/                     # Reaction processing
│   │   └── smi/                     # SMILES handling
│   ├── models/                      # Model wrappers
│   └── utils/                       # Utility functions
│
├── extools/                         # External tools (downloaded)
│   ├── ec/                          # EC-related resources
│   └── msa/                         # MSA binaries (e.g., DIAMOND)
│
├── data/                            # Data files (download required)
│   ├── chebi/                       # ChEBI database
│   ├── cpd_svg/                     # Compound SVG files
│   ├── datasets/                    # Training datasets
│   ├── dict/                        # Dictionary files
│   ├── feature_bank/                # Feature bank
│   ├── rhea/                        # Rhea database
│   ├── rxn_json/                    # Reaction JSON files
│   ├── sample/                      # Sample data
│   └── uniprot/                     # UniProt database
│
├── ckpt/                            # Model checkpoints (download required)
│   ├── esm/                         # ESM models
│   ├── prostt5/                     # ProstT5 models
│   └── rxnrecer/                    # RXNRECer model files
│
├── results/                         # Output results
│   ├── cache/                       # Prediction cache
│   ├── logs/                        # Log files
│   ├── predictions/                 # Prediction outputs
│   └── sample/                      # Sample results
│
├── docs/                            # Documentation
├── scripts/                         # Build and utility scripts
├── MANIFEST.in                      # Package data manifest
├── pyproject.toml                   # Build and dependencies for PyPI
├── environment_rxnrecer-release.yml # Conda environment
├── LICENSE                          # MIT License
├── README.md                        # This file
└── .gitignore                       # Git ignore rules
Configuration
For S3 mode (LLM reasoning), set your API key and API endpoint URL:
export LLM_API_KEY="your_api_key_here"
export LLM_API_URL="your_api_url_here"
Examples:
# OpenRouter
export LLM_API_KEY="sk-or-v1-your_openrouter_key_here"
export LLM_API_URL="https://openrouter.ai/api/v1"
# OpenAI
export LLM_API_KEY="sk-your_openai_key_here"
export LLM_API_URL="https://api.openai.com/v1"
# Anthropic
export LLM_API_KEY="sk-ant-your_anthropic_key_here"
export LLM_API_URL="https://api.anthropic.com"
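Because S3 mode needs both variables, a short check before a long S3 run can fail fast. This is a minimal sketch with a hypothetical helper name; it only confirms the variables are set and non-empty, not that the key or URL is valid, and it deliberately never prints the key.

```python
import os

REQUIRED_S3_VARS = ("LLM_API_KEY", "LLM_API_URL")


def missing_llm_settings(env=None):
    """Return the names of required S3 variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_S3_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_llm_settings()
    if missing:
        raise SystemExit(f"S3 mode needs: {', '.join(missing)}")
    print("LLM settings found; URL =", os.environ["LLM_API_URL"])
```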
Jupyter Notebook Setup
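import os
from rxnrecer.config import config as cfg
# Set your API credentials (reuses exported environment variables if present)
cfg.LLM_API_KEY = os.environ.get("LLM_API_KEY", "your_api_key_here")
cfg.LLM_API_URL = os.environ.get("LLM_API_URL", "your_api_url_here")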
Installation Options
PyPI Installation (Recommended)
pip install rxnrecer
GitHub Installation (Latest)
pip install git+https://github.com/kingstdio/RXNRECer.git
- Development: Latest development version
- Custom: For advanced users
Documentation
- Installation Guide - Detailed setup instructions
- Release Notes - Version information
Contributing
- Fork the repository
- Create a feature branch
- Commit your changes
- Open a Pull Request
License
MIT License - see LICENSE file for details.
Contact
- Author: Zhenkun Shi
- Email: zhenkun.shi@tib.cas.cn
- Project: https://github.com/kingstdio/RXNRECer
- PyPI: https://pypi.org/project/rxnrecer/
Get started now with: pip install rxnrecer
Download files
File details
Details for the file rxnrecer-1.3.2.tar.gz.
File metadata
- Download URL: rxnrecer-1.3.2.tar.gz
- Upload date:
- Size: 69.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5813ea9c54dc13c309b28289ec99877a8fb3d3f74fdd3214860db3144370a9b6 |
| MD5 | 009a4408c8887cd15ccc2d8d88d23077 |
| BLAKE2b-256 | c75e6d5bf2009f96b83c406ac0aea64210559007f7baf8ef12a40601060fa225 |
File details
Details for the file rxnrecer-1.3.2-py3-none-any.whl.
File metadata
- Download URL: rxnrecer-1.3.2-py3-none-any.whl
- Upload date:
- Size: 77.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e3b18c0fe704434674b6b9b25559368ead383d19f55d751eb6d4e2a270243fd5 |
| MD5 | 9195e4f6b7c671db1d71bc79962c97e7 |
| BLAKE2b-256 | ab7cd0e2c9c0593a78c8a2d7d48bce01f9839fd808b4b5a84238cf1961ac9658 |
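If you download a distribution file manually, its SHA256 digest can be compared against the values listed above. This is a minimal sketch with hypothetical helper names; it looks up the expected digest by file name, so a renamed file will not match any entry.

```python
import hashlib
import os
import sys

# SHA256 digests listed above for the v1.3.2 distribution files.
EXPECTED_SHA256 = {
    "rxnrecer-1.3.2.tar.gz": "5813ea9c54dc13c309b28289ec99877a8fb3d3f74fdd3214860db3144370a9b6",
    "rxnrecer-1.3.2-py3-none-any.whl": "e3b18c0fe704434674b6b9b25559368ead383d19f55d751eb6d4e2a270243fd5",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so it never has to fit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """True if the file's SHA256 matches the published digest for its name."""
    name = os.path.basename(path)
    expected = EXPECTED_SHA256.get(name)
    if expected is None:
        raise KeyError(f"no published digest for {name}")
    return sha256_of(path) == expected


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, "OK" if verify(path) else "MISMATCH")
```

Pass the downloaded file paths as command-line arguments. Alternatively, `pip hash <file>` prints a file's SHA256 digest for a manual comparison.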