Deep learning framework for predicting enzyme-catalyzed reactions from protein sequences
RXNRECer
RXNRECer v1.2.0 is a deep learning framework for predicting enzyme-catalyzed reactions from protein sequences. It is the official implementation of "RXNRECer: Active Learning with Protein Language Models for Fine-Grained Enzyme Reaction Prediction."
Features
- Multi-Stage Prediction: S1 (reaction prediction), S2 (reaction integration), S3 (LLM reasoning)
- Protein Sequence Analysis: Process protein sequences in FASTA format
- Deep Learning Models: ESM-2 embeddings with advanced neural architectures
- GPU Acceleration: CUDA support for faster inference
- Easy-to-use CLI: Simple command-line interface with comprehensive options
Requirements
- Python 3.10+
- PyTorch 2.0+
- CUDA 11.0+ (recommended)
- 32GB+ RAM
- 40GB+ disk space
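The software prerequisites above can be checked before fetching the large data files. The sketch below is a hypothetical helper (`check_environment` is not part of RXNRECer). It only checks the Python version, whether PyTorch 2.x is importable, and whether PyTorch can see a CUDA device. RAM and disk space are not checked.

```python
import importlib.util
import sys

# Minimums taken from the Requirements list above.
MIN_PYTHON = (3, 10)
MIN_TORCH_MAJOR = 2


def check_environment() -> dict:
    """Return a small report on the documented software prerequisites."""
    report = {
        "python_ok": sys.version_info[:2] >= MIN_PYTHON,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_ok": False,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so this script also runs without PyTorch

        # torch.__version__ looks like "2.1.0" or "2.1.0+cu118"; keep the major part.
        major = int(str(torch.__version__).split(".")[0])
        report["torch_ok"] = major >= MIN_TORCH_MAJOR
        report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A missing CUDA device is not fatal, because CUDA is listed as recommended rather than required. Inference on CPU will simply be slower.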
Quick Start
1. Install
# Install from PyPI (recommended)
pip install rxnrecer
# Or install from GitHub
pip install git+https://github.com/kingstdio/RXNRECer.git
2. Download Data
# Download required data and model files (~20.5GB total)
rxnrecer-download-data
# Or download separately
rxnrecer-download-data --data-only # ~8.6GB
rxnrecer-download-data --models-only # ~11.9GB
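Because the full download is about 20.5GB, a quick free-space check can prevent a failed transfer. This is a minimal sketch that assumes the sizes quoted above are decimal gigabytes. `has_space_for` and its target names are illustrative and do not reflect how `rxnrecer-download-data` works internally.

```python
import shutil

# Approximate download sizes quoted above, in GB (assumed to be decimal gigabytes).
DOWNLOAD_SIZES_GB = {"all": 20.5, "data-only": 8.6, "models-only": 11.9}


def free_gb(path: str = ".") -> float:
    """Free space on the filesystem that holds `path`, in decimal GB."""
    return shutil.disk_usage(path).free / 1e9


def has_space_for(target: str = "all", path: str = ".", headroom_gb: float = 0.0) -> bool:
    """True if `path` has room for the chosen download plus optional headroom.

    Raises KeyError for an unknown target name.
    """
    return free_gb(path) >= DOWNLOAD_SIZES_GB[target] + headroom_gb


if __name__ == "__main__":
    print(f"Free space: {free_gb():.1f} GB")
    print(f"Enough for the full download: {has_space_for('all')}")
    # The Requirements list asks for 40GB+ overall, which is a stricter bar.
    print(f"Meets the 40GB+ recommendation: {free_gb() >= 40}")
```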
3. Run Prediction
# Basic prediction (S1 mode)
rxnrecer -i input.fasta -o output.tsv -m s1
# Detailed prediction (S2 mode)
rxnrecer -i input.fasta -o output.tsv -m s2
# LLM reasoning (S3 mode, requires API key)
rxnrecer -i input.fasta -o output.json -m s3 -f json
Usage
Command Line Options
rxnrecer [OPTIONS]
Options:
-i, --input_fasta Input FASTA file path (required)
-o, --output_file Output file path
-f, --format Output format: tsv or json (default: tsv)
-m, --mode Prediction mode: s1, s2, or s3 (default: s1)
-b, --batch_size Batch size for processing (default: 100)
-c, --cache Enable caching (default: enabled)
-v, --version Show version
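When RXNRECer is called from a Python pipeline, the documented flags can be assembled into an argument list and validated before launch. `build_command` is a hypothetical helper that uses only the options listed above. It does not inspect the installed CLI, so any newer flags or changed defaults would have to be added by hand.

```python
import shlex

# Allowed values as listed in the options table above.
VALID_MODES = ("s1", "s2", "s3")
VALID_FORMATS = ("tsv", "json")


def build_command(input_fasta: str, output_file: str, mode: str = "s1",
                  fmt: str = "tsv", batch_size: int = 100) -> list:
    """Assemble an rxnrecer argument list using only the documented flags."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    if fmt not in VALID_FORMATS:
        raise ValueError(f"format must be one of {VALID_FORMATS}, got {fmt!r}")
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return ["rxnrecer", "-i", input_fasta, "-o", output_file,
            "-m", mode, "-f", fmt, "-b", str(batch_size)]


if __name__ == "__main__":
    cmd = build_command("proteins.fasta", "results.json", mode="s3", fmt="json")
    # Print a shell-ready string. To execute it, use subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Passing the list directly to `subprocess.run` avoids shell quoting problems with paths that contain spaces.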
Examples
# Basic usage
rxnrecer -i proteins.fasta -o results.tsv
# Custom batch size
rxnrecer -i proteins.fasta -o results.tsv -b 50
# JSON output
rxnrecer -i proteins.fasta -o results.json -f json
# Caching is enabled by default; to force fresh predictions, clear the cache first
rxnrecer-cache clear --all
rxnrecer -i proteins.fasta -o results.tsv
Input Format
FASTA file with protein sequences:
>P12345|Sample protein 1
MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG
>P67890|Sample protein 2
MKLIVWALLLLAAWAVERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG
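Malformed FASTA files are a common cause of failed runs, so parsing the input before submitting it can help. This sketch infers from the sample above that the identifier reported as `input_id` is the header text before the first `|`. RXNRECer's own header-parsing rules are not documented here, and the allowed residue alphabet is also an assumption.

```python
from typing import Iterable, Iterator, Tuple

# 20 standard residues plus common ambiguity/rare codes (B, Z, X, U, O).
ALLOWED_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYBZXUO")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; a sequence may span several lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def record_id(header: str) -> str:
    """Text before the first '|' (assumed ID convention, based on the sample)."""
    return header.split("|", 1)[0].strip()


def invalid_residues(sequence: str) -> set:
    """Characters in `sequence` outside the assumed amino-acid alphabet."""
    return set(sequence) - ALLOWED_RESIDUES
```

For example, `for header, seq in read_fasta(open("proteins.fasta")): ...` walks the records. Note that UniProt-style headers such as `>sp|P12345|NAME` would yield `sp` under this rule.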
Output Formats
TSV Output (S1/S2):
input_id RXNRECer RXNRECer_with_prob rxn_details
P12345 RHEA:24076;RHEA:14709 0.9999;0.9999 [reaction details]
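To post-process the TSV output, the semicolon-separated `RXNRECer` and `RXNRECer_with_prob` columns can be paired per protein. This sketch assumes the file is tab-delimited with the header shown above, and that both columns list the same number of entries in the same order. How proteins without a prediction are written is not specified here; a non-numeric placeholder in the probability column would make `float()` fail.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def parse_predictions(handle: TextIO) -> Dict[str, List[Tuple[str, float]]]:
    """Map each input_id to (reaction_id, probability) pairs from the TSV."""
    reader = csv.DictReader(handle, delimiter="\t")
    results = {}
    for row in reader:
        ids = [x for x in row["RXNRECer"].split(";") if x]
        probs = [float(x) for x in row["RXNRECer_with_prob"].split(";") if x]
        if len(ids) != len(probs):
            raise ValueError(f"{row['input_id']}: {len(ids)} IDs but {len(probs)} probabilities")
        results[row["input_id"]] = list(zip(ids, probs))
    return results


if __name__ == "__main__":
    sample = (
        "input_id\tRXNRECer\tRXNRECer_with_prob\trxn_details\n"
        "P12345\tRHEA:24076;RHEA:14709\t0.9999;0.9999\t[reaction details]\n"
    )
    print(parse_predictions(io.StringIO(sample)))
```

For a real file, open it with `open("results.tsv", newline="")` and pass the handle in place of the `StringIO` object.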
JSON Output (S3):
[
{
"reaction_id": "RHEA:24076",
"prediction_confidence": 0.9999,
"reaction_details": {...}
}
]
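The JSON output can be filtered by confidence in a few lines. This sketch assumes a flat list of objects with the `reaction_id` and `prediction_confidence` keys shown above. If the real S3 output nests predictions under each input protein, the loop needs one more level. The 0.5 default threshold is arbitrary.

```python
import json
from typing import List


def confident_reactions(json_text: str, threshold: float = 0.5) -> List[str]:
    """Return reaction IDs with prediction_confidence >= threshold, highest first."""
    records = json.loads(json_text)
    kept = [r for r in records if r.get("prediction_confidence", 0.0) >= threshold]
    kept.sort(key=lambda r: r.get("prediction_confidence", 0.0), reverse=True)
    return [r["reaction_id"] for r in kept]
```

Usage: `with open("results.json") as fh: ids = confident_reactions(fh.read(), threshold=0.9)`.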
More Features
Smart Caching
Results are automatically cached for faster repeated predictions:
# Check cache status
rxnrecer-cache status
# Clear cache
rxnrecer-cache clear --all
Data Management
Easy data and model file management:
# Download data
rxnrecer-download-data
# Force re-download
rxnrecer-download-data --force
Project Structure
rxnrecer/ # Main Python package
├── cli/ # Command-line interface
├── config/ # Configuration
├── lib/ # Core libraries
│   ├── datasource/ # Data source handling
│   ├── embedding/ # Protein embeddings
│   ├── llm/ # Language model integration
│   ├── ml/ # Machine learning utilities
│   ├── model/ # Model architectures
│   ├── rxn/ # Reaction processing
│   └── smi/ # SMILES handling
├── models/ # Neural network models
└── utils/ # Utility functions
data/ # Data files (download required)
├── chebi/ # ChEBI database
├── cpd_svg/ # Compound SVG files
├── datasets/ # Training datasets
├── dict/ # Dictionary files
├── feature_bank/ # Feature bank
├── rhea/ # RHEA database
├── rxn_json/ # Reaction JSON files
├── sample/ # Sample data
└── uniprot/ # UniProt database
ckpt/ # Model checkpoints (download required)
├── prostt5/ # ProstT5 model files
└── rxnrecer/ # RXNRECer model files
results/ # Output results
├── cache/ # Prediction cache
├── logs/ # Log files
├── predictions/ # Prediction outputs
└── sample/ # Sample results
docs/ # Documentation
scripts/ # Build and utility scripts
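After running `rxnrecer-download-data`, the downloaded layout can be sanity-checked against the tree above. This hypothetical helper assumes that every listed subdirectory of `data/` and `ckpt/` should exist under a single root directory. Which of them RXNRECer actually reads at runtime, and where it expects that root to be, are not stated here.

```python
from pathlib import Path
from typing import List

# Subdirectories listed under data/ and ckpt/ in the tree above.
EXPECTED_DIRS = [
    *(f"data/{name}" for name in ("chebi", "cpd_svg", "datasets", "dict",
                                  "feature_bank", "rhea", "rxn_json",
                                  "sample", "uniprot")),
    "ckpt/prostt5",
    "ckpt/rxnrecer",
]


def missing_dirs(root: str = ".") -> List[str]:
    """Return the expected subdirectories that do not exist under `root`."""
    base = Path(root)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(".")
    print("All expected directories present" if not missing else "Missing: " + ", ".join(missing))
```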
Configuration
For S3 mode (LLM reasoning), set your API key and the API endpoint URL:
export LLM_API_KEY="your_api_key_here"
export LLM_API_URL="https://openrouter.ai/api/v1"
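An S3 run without credentials will fail, so it can be worth checking the environment first. This sketch assumes both variables above are required, although the text explicitly mentions only the API key. `missing_s3_settings` is a hypothetical helper, not part of the package.

```python
import os
from typing import List, Mapping, Optional

# Variable names from the Configuration section; treating both as required is an assumption.
S3_ENV_VARS = ("LLM_API_KEY", "LLM_API_URL")


def missing_s3_settings(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Names of S3 variables that are unset or blank in `env` (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in S3_ENV_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_s3_settings()
    print("S3 settings found" if not missing else "Set before using -m s3: " + ", ".join(missing))
```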
Documentation
- Installation Guide - Detailed setup instructions
- Release Notes - Version information
Contributing
- Fork the repository
- Create a feature branch
- Commit your changes
- Open a Pull Request
License
MIT License - see LICENSE file for details.
Contact
- Author: Zhenkun Shi
- Email: zhenkun.shi@tib.cas.cn
- Project: https://github.com/kingstdio/RXNRECer
- PyPI: https://pypi.org/project/rxnrecer/
File details
Details for the file rxnrecer-1.2.0.tar.gz.
File metadata
- Download URL: rxnrecer-1.2.0.tar.gz
- Upload date:
- Size: 74.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5a7942ee813ba489273655f766bf101af3a427827645ab7e6cc857747217512c |
| MD5 | 4b8fa211bc0c353cfb1a751901b6ed22 |
| BLAKE2b-256 | 58e8027a3c9b21d08acb204e7862cb833d1cca8321f5cafd287ff50466994091 |
File details
Details for the file rxnrecer-1.2.0-py3-none-any.whl.
File metadata
- Download URL: rxnrecer-1.2.0-py3-none-any.whl
- Upload date:
- Size: 73.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 97133cc47adee18acf32be52d739992f15502ac19bc81baa75bba7f05770a9c7 |
| MD5 | d99bd90e673d4952ea396a66d402c9c1 |
| BLAKE2b-256 | a3aba00085647373af9dc01b8492c14d1d76440d5ab30031fd00508974858936 |
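The published SHA256 digests let you verify a downloaded distribution before installing it. Below is a minimal sketch using Python's `hashlib`. The digests are copied from the tables above, and `verify_download` is an illustrative helper, not part of RXNRECer.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Iterator

# SHA256 digests copied from the File hashes tables above.
EXPECTED_SHA256 = {
    "rxnrecer-1.2.0.tar.gz": "5a7942ee813ba489273655f766bf101af3a427827645ab7e6cc857747217512c",
    "rxnrecer-1.2.0-py3-none-any.whl": "97133cc47adee18acf32be52d739992f15502ac19bc81baa75bba7f05770a9c7",
}


def iter_chunks(path: Path, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Read a file in 1 MiB binary chunks so it is never loaded all at once."""
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            yield chunk


def sha256_hex(chunks: Iterable[bytes]) -> str:
    """Hex SHA256 digest of the concatenated chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str) -> bool:
    """Compare a downloaded file against its published SHA256 digest."""
    p = Path(path)
    expected = EXPECTED_SHA256.get(p.name)
    if expected is None:
        raise KeyError(f"no published digest for {p.name}")
    return sha256_hex(iter_chunks(p)) == expected
```

For example, after `pip download rxnrecer==1.2.0 --no-deps -d dist/`, calling `verify_download("dist/rxnrecer-1.2.0-py3-none-any.whl")` should return `True` for an unmodified wheel.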