Deep learning framework for predicting enzyme-catalyzed reactions from protein sequences

Project description

RXNRECer

RXNRECer v1.2.0 is a deep learning framework for predicting enzyme-catalyzed reactions from protein sequences. It is the official implementation of "RXNRECer: Active Learning with Protein Language Models for Fine-Grained Enzyme Reaction Prediction."

🚀 Features

Multi-Stage Prediction: S1 (reaction prediction), S2 (reaction integration), S3 (LLM reasoning)
Protein Sequence Analysis: Process protein sequences in FASTA format
Deep Learning Models: ESM-2 embeddings with advanced neural architectures
GPU Acceleration: CUDA support for faster inference
Easy-to-use CLI: Simple command-line interface with comprehensive options

📋 Requirements

Python 3.10+
PyTorch 2.0+
CUDA 11.0+ (recommended)
32GB+ RAM
40GB+ disk space
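
The Python and disk-space requirements above can be checked before downloading the ~20.5GB of data. The following is a minimal sketch, not part of RXNRECer: the function name `find_problems` and the constants are illustrative, and RAM and CUDA checks are left out because they need third-party packages.

```python
import shutil
import sys

# Thresholds copied from the requirements list above.
MIN_PYTHON = (3, 10)
MIN_FREE_DISK_GB = 40


def find_problems(python_version, free_disk_gb):
    """Return a list of unmet requirements; an empty list means both checks pass."""
    problems = []
    if tuple(python_version) < MIN_PYTHON:
        problems.append("Python 3.10+ is required")
    if free_disk_gb < MIN_FREE_DISK_GB:
        problems.append("at least 40 GB of free disk space is required")
    return problems


if __name__ == "__main__":
    # Free space on the drive holding the current directory, in decimal gigabytes.
    free_gb = shutil.disk_usage(".").free / 1e9
    for problem in find_problems(sys.version_info[:2], free_gb):
        print("WARNING:", problem)
```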

🚀 Quick Start

1. Install

# Install from PyPI (recommended)
pip install rxnrecer

# Or install from GitHub
pip install git+https://github.com/kingstdio/RXNRECer.git

2. Download Data

# Download required data and model files (~20.5GB total)
rxnrecer-download-data

# Or download separately
rxnrecer-download-data --data-only      # ~8.6GB
rxnrecer-download-data --models-only    # ~11.9GB

3. Run Prediction

# Basic prediction (S1 mode)
rxnrecer -i input.fasta -o output.tsv -m s1

# Detailed prediction (S2 mode)
rxnrecer -i input.fasta -o output.tsv -m s2

# LLM reasoning (S3 mode, requires API key)
rxnrecer -i input.fasta -o output.json -m s3 -f json

🔧 Usage

Command Line Options

rxnrecer [OPTIONS]

Options:
  -i, --input_fasta    Input FASTA file path (required)
  -o, --output_file    Output file path
  -f, --format         Output format: tsv or json (default: tsv)
  -m, --mode           Prediction mode: s1, s2, or s3 (default: s1)
  -b, --batch_size     Batch size for processing (default: 100)
  -c, --cache          Enable caching (default: enabled)
  -v, --version        Show version
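
When predictions are launched from a script, the options above map directly onto an argument list. The sketch below is a hypothetical helper (`build_rxnrecer_command` is not part of the package) that uses only the flags documented in this list and validates the mode and format values before anything is run.

```python
import shlex

VALID_MODES = {"s1", "s2", "s3"}
VALID_FORMATS = {"tsv", "json"}


def build_rxnrecer_command(input_fasta, output_file, mode="s1", fmt="tsv", batch_size=100):
    """Build the argument list for one rxnrecer call from the documented options."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if fmt not in VALID_FORMATS:
        raise ValueError(f"format must be one of {sorted(VALID_FORMATS)}")
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [
        "rxnrecer",
        "-i", str(input_fasta),
        "-o", str(output_file),
        "-m", mode,
        "-f", fmt,
        "-b", str(batch_size),
    ]


cmd = build_rxnrecer_command("proteins.fasta", "results.json", mode="s2", fmt="json", batch_size=50)
print(shlex.join(cmd))
# To execute the command: import subprocess; subprocess.run(cmd, check=True)
```

Passing the list to `subprocess.run` rather than a single string avoids shell-quoting problems with file paths that contain spaces.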

Examples

# Basic usage
rxnrecer -i proteins.fasta -o results.tsv

# Custom batch size
rxnrecer -i proteins.fasta -o results.tsv -b 50

# JSON output
rxnrecer -i proteins.fasta -o results.json -f json

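# Caching is enabled by default, so this command (identical to basic usage) reuses cached results;
# the option list above documents no flag for turning the cache off
rxnrecer -i proteins.fasta -o results.tsv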

Input Format

FASTA file with protein sequences:

>P12345|Sample protein 1
MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG
>P67890|Sample protein 2
MKLIVWALLLLAAWAVERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG
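
Before a long run it can help to confirm that the input parses as FASTA. This is a minimal stand-alone sketch, not RXNRECer's own reader. Treating the text before the first `|` in each header as the identifier is an assumption based on the sample above.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


sample = """>P12345|Sample protein 1
MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG
>P67890|Sample protein 2
MKLIVWALLLLAAWAVERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG
"""

records = read_fasta(sample)
for header, sequence in records:
    accession = header.split("|")[0]  # assumed identifier convention
    print(accession, len(sequence))
```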

Output Formats

TSV Output (S1/S2):

input_id	RXNRECer	RXNRECer_with_prob	rxn_details
P12345	RHEA:24076;RHEA:14709	0.9999;0.9999	[reaction details]
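
The TSV output can be loaded with the standard library. The sketch below assumes, based only on the sample row above, that `RXNRECer` and `RXNRECer_with_prob` are parallel semicolon-separated lists. The helper name `parse_predictions` is illustrative, and rows without predictions may need extra handling.

```python
import csv
import io

# Sample content copied from the TSV example above (tab-separated).
sample_tsv = (
    "input_id\tRXNRECer\tRXNRECer_with_prob\trxn_details\n"
    "P12345\tRHEA:24076;RHEA:14709\t0.9999;0.9999\t[reaction details]\n"
)


def parse_predictions(handle):
    """Map each input_id to a list of (reaction_id, probability) pairs."""
    results = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        reaction_ids = row["RXNRECer"].split(";")
        probabilities = [float(p) for p in row["RXNRECer_with_prob"].split(";")]
        results[row["input_id"]] = list(zip(reaction_ids, probabilities))
    return results


predictions = parse_predictions(io.StringIO(sample_tsv))
print(predictions["P12345"])
# prints [('RHEA:24076', 0.9999), ('RHEA:14709', 0.9999)]
```

For a real results file, replace the `io.StringIO` object with `open("results.tsv", newline="")`.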

JSON Output (S3):

[
  {
    "reaction_id": "RHEA:24076",
    "prediction_confidence": 0.9999,
    "reaction_details": {...}
  }
]
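
JSON results can be filtered by confidence after loading. In this sketch the field names come from the example above, while the second record, the 0.5 threshold, and the helper name `confident_reactions` are illustrative assumptions. The `{...}` placeholder is replaced by an empty object so that the sample is valid JSON.

```python
import json

# First record mirrors the example above; the second is made up to show filtering.
sample_json = """
[
  {"reaction_id": "RHEA:24076", "prediction_confidence": 0.9999, "reaction_details": {}},
  {"reaction_id": "RHEA:14709", "prediction_confidence": 0.42, "reaction_details": {}}
]
"""


def confident_reactions(records, threshold=0.5):
    """Return the reaction IDs whose confidence is at or above the threshold."""
    return [r["reaction_id"] for r in records if r["prediction_confidence"] >= threshold]


records = json.loads(sample_json)
print(confident_reactions(records))
# prints ['RHEA:24076']
```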

🆕 New Features

Smart Caching

Results are automatically cached for faster repeated predictions:

# Check cache status
rxnrecer-cache status

# Clear cache
rxnrecer-cache clear --all

Data Management

Easy data and model file management:

# Download data
rxnrecer-download-data

# Force re-download
rxnrecer-download-data --force

📁 Project Structure

rxnrecer/                    # Main Python package
├── cli/                     # Command-line interface
├── config/                  # Configuration
├── lib/                     # Core libraries
│   ├── datasource/          # Data source handling
│   ├── embedding/           # Protein embeddings
│   ├── llm/                 # Language model integration
│   ├── ml/                  # Machine learning utilities
│   ├── model/               # Model architectures
│   ├── rxn/                 # Reaction processing
│   └── smi/                 # SMILES handling
├── models/                  # Neural network models
└── utils/                   # Utility functions

data/                        # Data files (download required)
├── chebi/                   # ChEBI database
├── cpd_svg/                 # Compound SVG files
├── datasets/                # Training datasets
├── dict/                    # Dictionary files
├── feature_bank/            # Feature bank
├── rhea/                    # RHEA database
├── rxn_json/                # Reaction JSON files
├── sample/                  # Sample data
└── uniprot/                 # UniProt database

ckpt/                        # Model checkpoints (download required)
├── prostt5/                 # ProSTT5 model files
└── rxnrecer/                # RXNRECer model files

results/                     # Output results
├── cache/                   # Prediction cache
├── logs/                    # Log files
├── predictions/             # Prediction outputs
└── sample/                  # Sample results

docs/                        # Documentation
scripts/                     # Build and utility scripts

🔧 Configuration

For S3 mode (LLM reasoning), set your API key and the API endpoint URL:

export LLM_API_KEY="your_api_key_here"
export LLM_API_URL="https://openrouter.ai/api/v1"
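
A script that wraps S3 runs can fail early when these variables are missing. This is a small sketch with a hypothetical helper name. Treating both variables as mandatory is an assumption, since the text above only says to set them.

```python
import os

REQUIRED_VARS = ("LLM_API_KEY", "LLM_API_URL")


def missing_llm_settings(environ=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_llm_settings()
if missing:
    print("Set these variables before using -m s3:", ", ".join(missing))
```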

📚 Documentation

Installation Guide - Detailed setup instructions
Release Notes - Version information

🤝 Contributing

Fork the repository
Create a feature branch
Commit your changes
Open a Pull Request

📄 License

MIT License - see LICENSE file for details.

📞 Contact

Author: Zhenkun Shi
Email: zhenkun.shi@tib.cas.cn
Project: https://github.com/kingstdio/RXNRECer
PyPI: https://pypi.org/project/rxnrecer/

Project details

Release history

1.3.7 (Nov 6, 2025)
1.3.4 (Sep 26, 2025)
1.3.3 (Sep 26, 2025)
1.3.2 (Sep 26, 2025)
1.3.1 (Sep 26, 2025)
1.3.0 (Sep 26, 2025)
1.2.0 (Aug 29, 2025, this version)

Download files

Source Distribution

rxnrecer-1.2.0.tar.gz (74.6 kB)

Uploaded Aug 29, 2025 Source

Built Distribution

rxnrecer-1.2.0-py3-none-any.whl (73.6 kB)

Uploaded Aug 29, 2025 Python 3

File details

Details for the file rxnrecer-1.2.0.tar.gz.

File metadata

Download URL: rxnrecer-1.2.0.tar.gz
Upload date: Aug 29, 2025
Size: 74.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.18

File hashes

Hashes for rxnrecer-1.2.0.tar.gz
Algorithm	Hash digest
SHA256	`5a7942ee813ba489273655f766bf101af3a427827645ab7e6cc857747217512c`
MD5	`4b8fa211bc0c353cfb1a751901b6ed22`
BLAKE2b-256	`58e8027a3c9b21d08acb204e7862cb833d1cca8321f5cafd287ff50466994091`

File details

Details for the file rxnrecer-1.2.0-py3-none-any.whl.

File metadata

Download URL: rxnrecer-1.2.0-py3-none-any.whl
Upload date: Aug 29, 2025
Size: 73.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.18

File hashes

Hashes for rxnrecer-1.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`97133cc47adee18acf32be52d739992f15502ac19bc81baa75bba7f05770a9c7`
MD5	`d99bd90e673d4952ea396a66d402c9c1`
BLAKE2b-256	`a3aba00085647373af9dc01b8492c14d1d76440d5ab30031fd00508974858936`
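
A downloaded file can be compared against the SHA256 digests listed above using Python's standard `hashlib` module. This is a minimal sketch: the helper names are illustrative, and the file is read in chunks so that large archives are not loaded into memory at once.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the wheel, this would be called as `matches_published_digest("rxnrecer-1.2.0-py3-none-any.whl", "97133cc47adee18acf32be52d739992f15502ac19bc81baa75bba7f05770a9c7")`, assuming the file sits in the current directory.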

