
Deep learning framework for predicting enzyme-catalyzed reactions from protein sequences


RXNRECer

Python 3.10+ | PyTorch | License: MIT

RXNRECer v1.2.0 is a deep learning framework for predicting enzyme-catalyzed reactions from protein sequences. It is the official implementation of "RXNRECer: Active Learning with Protein Language Models for Fine-Grained Enzyme Reaction Prediction."

🚀 Features

  • Multi-Stage Prediction: S1 (reaction prediction), S2 (reaction integration), S3 (LLM reasoning)
  • Protein Sequence Analysis: Process protein sequences in FASTA format
  • Deep Learning Models: ESM-2 protein language model embeddings combined with neural network architectures
  • GPU Acceleration: CUDA support for faster inference
  • Easy-to-use CLI: a single rxnrecer command with options for prediction mode, output format, batch size, and caching

📋 Requirements

  • Python 3.10+
  • PyTorch 2.0+
  • CUDA 11.0+ (recommended)
  • 32GB+ RAM
  • 40GB+ disk space
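Before installing, a quick interpreter check can confirm the Python and PyTorch minimums listed above. The sketch below is a hypothetical helper (check_environment is not part of rxnrecer); it only reports versions and whether CUDA is visible to PyTorch, and does not check RAM or disk space.

```python
import importlib.util
import sys

# Minimums taken from the Requirements list above.
MIN_PYTHON = (3, 10)
MIN_TORCH_MAJOR = 2


def check_environment() -> dict:
    """Report whether this interpreter meets the listed Python/PyTorch minimums."""
    report = {
        "python": ".".join(map(str, sys.version_info[:3])),
        "python_ok": sys.version_info[:2] >= MIN_PYTHON,
        "torch_ok": False,
        "cuda_available": False,
    }
    # find_spec tells us whether torch is importable without importing it yet.
    if importlib.util.find_spec("torch") is not None:
        import torch

        # torch.__version__ looks like "2.1.0+cu118"; keep the major number.
        major = int(torch.__version__.split(".")[0])
        report["torch_ok"] = major >= MIN_TORCH_MAJOR
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```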

🚀 Quick Start

1. Install

# Install from PyPI (recommended)
pip install rxnrecer

# Or install from GitHub
pip install git+https://github.com/kingstdio/RXNRECer.git

2. Download Data

# Download required data and model files (~20.5GB total)
rxnrecer-download-data

# Or download separately
rxnrecer-download-data --data-only      # ~8.6GB
rxnrecer-download-data --models-only    # ~11.9GB
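The download targets add up: 8.6 GB of data plus 11.9 GB of models gives the ~20.5 GB full download. A minimal pre-download check might compare free disk space against these figures. has_room_for and margin_gb are illustrative names, not rxnrecer options; the sizes are the approximate values quoted above, and treating 1 GB as 10^9 bytes is an assumption. Since the Requirements list recommends 40GB+ overall, leave generous headroom.

```python
import shutil

# Approximate download sizes in GB, as quoted for each rxnrecer-download-data target.
DOWNLOAD_SIZES_GB = {
    "all": 20.5,          # rxnrecer-download-data
    "data-only": 8.6,     # rxnrecer-download-data --data-only
    "models-only": 11.9,  # rxnrecer-download-data --models-only
}


def has_room_for(target: str, path: str = ".", margin_gb: float = 1.0) -> bool:
    """Hypothetical check: is there enough free space at `path` for `target`?"""
    free_gb = shutil.disk_usage(path).free / 1e9  # assumes 1 GB = 10**9 bytes
    return free_gb >= DOWNLOAD_SIZES_GB[target] + margin_gb


if __name__ == "__main__":
    print("room for full download:", has_room_for("all"))
```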

3. Run Prediction

# Basic prediction (S1 mode)
rxnrecer -i input.fasta -o output.tsv -m s1

# Reaction integration (S2 mode)
rxnrecer -i input.fasta -o output.tsv -m s2

# LLM reasoning (S3 mode, requires API key)
rxnrecer -i input.fasta -o output.json -m s3 -f json

🔧 Usage

Command Line Options

rxnrecer [OPTIONS]

Options:
  -i, --input_fasta    Input FASTA file path (required)
  -o, --output_file    Output file path
  -f, --format         Output format: tsv or json (default: tsv)
  -m, --mode           Prediction mode: s1, s2, or s3 (default: s1)
  -b, --batch_size     Batch size for processing (default: 100)
  -c, --cache          Enable caching (default: enabled)
  -v, --version        Show version
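When driving the CLI from a Python pipeline, it can help to build the argument list from the documented options and validate it before launching a long run. This sketch uses only the flags listed above; build_command and run_prediction are hypothetical wrappers, and they assume the rxnrecer executable is on PATH.

```python
import subprocess
from typing import List

VALID_MODES = {"s1", "s2", "s3"}
VALID_FORMATS = {"tsv", "json"}


def build_command(input_fasta: str, output_file: str, mode: str = "s1",
                  fmt: str = "tsv", batch_size: int = 100) -> List[str]:
    """Assemble an rxnrecer argument list from the documented options."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    if fmt not in VALID_FORMATS:
        raise ValueError(f"format must be one of {sorted(VALID_FORMATS)}, got {fmt!r}")
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [
        "rxnrecer",
        "-i", input_fasta,
        "-o", output_file,
        "-m", mode,
        "-f", fmt,
        "-b", str(batch_size),
    ]


def run_prediction(*args, **kwargs) -> None:
    """Run rxnrecer; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(build_command(*args, **kwargs), check=True)
```

For example, run_prediction("proteins.fasta", "results.json", mode="s3", fmt="json") corresponds to the S3 command in Quick Start.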

Examples

# Basic usage
rxnrecer -i proteins.fasta -o results.tsv

# Custom batch size
rxnrecer -i proteins.fasta -o results.tsv -b 50

# JSON output
rxnrecer -i proteins.fasta -o results.json -f json

# Caching is enabled by default; the options above list no flag to turn it off.
# To discard cached results instead, clear the cache:
rxnrecer-cache clear --all
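The -b option sets how many sequences are processed per batch. The generic fixed-size batching below illustrates what a batch size of 100 means for a 250-sequence input; it is an assumption about the option's effect, not rxnrecer's internal code, and batched is a hypothetical helper. Smaller batches generally lower peak memory per step at the cost of more steps, which is one reason to try -b 50 on a memory-constrained machine.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int = 100) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 250 sequences with a batch size of 100 -> batches of 100, 100 and 50
sizes = [len(batch) for batch in batched(list(range(250)), batch_size=100)]
print(sizes)  # [100, 100, 50]
```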

Input Format

FASTA file with protein sequences:

>P12345|Sample protein 1
MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG
>P67890|Sample protein 2
MKLIVWALLLLAAWAVERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG
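A small reader can sanity-check an input file before running a prediction. The sketch below parses FASTA records, including sequences wrapped over several lines, and derives an ID from each header. record_id is hypothetical: it assumes the ID is the text before the first "|", which matches how P12345 appears as input_id in the sample output, but rxnrecer's own header handling is not documented here.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; sequence lines are joined without breaks."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def record_id(header: str) -> str:
    """Hypothetical ID rule: the first token before any '|' in the header."""
    tokens = header.split("|", 1)[0].split()
    return tokens[0] if tokens else ""


SAMPLE = """\
>P12345|Sample protein 1
MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG
>P67890|Sample protein 2
MKLIVWALLLLAAWAVERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG
"""

for header, seq in read_fasta(SAMPLE.splitlines()):
    print(record_id(header), len(seq))
```

With a real file, pass the open handle directly: read_fasta(open("input.fasta")).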

Output Formats

TSV Output (S1/S2):

input_id	RXNRECer	RXNRECer_with_prob	rxn_details
P12345	RHEA:24076;RHEA:14709	0.9999;0.9999	[reaction details]
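Each TSV row lists predicted Rhea reaction IDs in the RXNRECer column and matching probabilities in RXNRECer_with_prob, both separated by semicolons. The sketch below pairs them up with the standard csv module. It assumes the two lists are aligned position by position, as in the sample row, and parse_predictions is a hypothetical helper.

```python
import csv
import io
from typing import Dict, List, Tuple


def parse_predictions(tsv_text: str) -> Dict[str, List[Tuple[str, float]]]:
    """Map each input_id to a list of (reaction ID, probability) pairs."""
    results = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        rxn_ids = [r for r in row["RXNRECer"].split(";") if r]
        probs = [float(p) for p in row["RXNRECer_with_prob"].split(";") if p]
        if len(rxn_ids) != len(probs):
            raise ValueError(
                f"{row['input_id']}: {len(rxn_ids)} IDs but {len(probs)} probabilities"
            )
        results[row["input_id"]] = list(zip(rxn_ids, probs))
    return results


SAMPLE_TSV = (
    "input_id\tRXNRECer\tRXNRECer_with_prob\trxn_details\n"
    "P12345\tRHEA:24076;RHEA:14709\t0.9999;0.9999\t[reaction details]\n"
)
print(parse_predictions(SAMPLE_TSV))
```

If rxn_details fields are very large, csv.field_size_limit may need to be raised before parsing a real output file.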

JSON Output (S3):

[
  {
    "reaction_id": "RHEA:24076",
    "prediction_confidence": 0.9999,
    "reaction_details": {...}
  }
]
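For S3 JSON output, a typical follow-up is to keep only high-confidence predictions. In the sketch below the 0.9 threshold is an arbitrary example value, confident_reactions is a hypothetical helper, and RHEA:00000 is a placeholder rather than a real Rhea entry.

```python
import json
from typing import List


def confident_reactions(json_text: str, threshold: float = 0.9) -> List[str]:
    """Return reaction IDs whose prediction_confidence is at least `threshold`."""
    records = json.loads(json_text)
    return [
        rec["reaction_id"]
        for rec in records
        if rec.get("prediction_confidence", 0.0) >= threshold
    ]


SAMPLE_JSON = """
[
  {"reaction_id": "RHEA:24076", "prediction_confidence": 0.9999, "reaction_details": {}},
  {"reaction_id": "RHEA:00000", "prediction_confidence": 0.42, "reaction_details": {}}
]
"""
print(confident_reactions(SAMPLE_JSON))  # ['RHEA:24076']
```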

🆕 Additional Features

Smart Caching

Results are automatically cached for faster repeated predictions:

# Check cache status
rxnrecer-cache status

# Clear cache
rxnrecer-cache clear --all
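How rxnrecer keys its cache entries is not documented here. One common approach, sketched purely as an illustration, is to key each result on a hash of the prediction mode and the sequence, so re-running identical input skips recomputation while a different mode produces a separate entry. PredictionCache and cache_key are hypothetical and in-memory only; the project structure lists results/cache/ as the real on-disk prediction cache.

```python
import hashlib
from typing import Callable, Dict


def cache_key(sequence: str, mode: str) -> str:
    """Hypothetical key: SHA-256 of the mode and sequence together."""
    return hashlib.sha256(f"{mode}:{sequence}".encode("utf-8")).hexdigest()


class PredictionCache:
    """In-memory memoization of per-sequence predictions (illustration only)."""

    def __init__(self) -> None:
        self._store: Dict[str, object] = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, sequence: str, mode: str,
                       predict: Callable[[str], object]) -> object:
        key = cache_key(sequence, mode)
        if key in self._store:
            self.hits += 1
        else:
            # Only call the (expensive) predictor on a cache miss.
            self.misses += 1
            self._store[key] = predict(sequence)
        return self._store[key]

    def clear(self) -> None:
        """Drop every entry, analogous in spirit to clearing the cache."""
        self._store.clear()
```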

Data Management

Download and manage data and model files:

# Download data
rxnrecer-download-data

# Force re-download
rxnrecer-download-data --force

๐Ÿ“ Project Structure

rxnrecer/                    # Main Python package
├── cli/                     # Command-line interface
├── config/                  # Configuration
├── lib/                     # Core libraries
│   ├── datasource/          # Data source handling
│   ├── embedding/           # Protein embeddings
│   ├── llm/                 # Language model integration
│   ├── ml/                  # Machine learning utilities
│   ├── model/               # Model architectures
│   ├── rxn/                 # Reaction processing
│   └── smi/                 # SMILES handling
├── models/                  # Neural network models
└── utils/                   # Utility functions

data/                        # Data files (download required)
├── chebi/                   # ChEBI database
├── cpd_svg/                 # Compound SVG files
├── datasets/                # Training datasets
├── dict/                    # Dictionary files
├── feature_bank/            # Feature bank
├── rhea/                    # RHEA database
├── rxn_json/                # Reaction JSON files
├── sample/                  # Sample data
└── uniprot/                 # UniProt database

ckpt/                        # Model checkpoints (download required)
├── prostt5/                 # ProSTT5 model files
└── rxnrecer/                # RXNRECer model files

results/                     # Output results
├── cache/                   # Prediction cache
├── logs/                    # Log files
├── predictions/             # Prediction outputs
└── sample/                  # Sample results

docs/                        # Documentation
scripts/                     # Build and utility scripts
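After rxnrecer-download-data finishes, a quick check can confirm that the data/ and ckpt/ subdirectories in the layout above are present. This sketch assumes both directories sit under a single root (the download location is not stated here), and missing_dirs is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# Subdirectories listed under data/ and ckpt/ in the project layout.
EXPECTED_DIRS = [
    "data/chebi", "data/cpd_svg", "data/datasets", "data/dict",
    "data/feature_bank", "data/rhea", "data/rxn_json", "data/sample",
    "data/uniprot", "ckpt/prostt5", "ckpt/rxnrecer",
]


def missing_dirs(root: str = ".") -> List[str]:
    """Return expected data/checkpoint directories that do not exist under root."""
    base = Path(root)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    print("all present" if not missing else f"missing: {missing}")
```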

🔧 Configuration

For S3 mode (LLM reasoning), set your API key and API endpoint URL:

export LLM_API_KEY="your_api_key_here"
export LLM_API_URL="https://openrouter.ai/api/v1"
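A pipeline that invokes S3 mode can fail early when the LLM settings are absent, rather than partway through a run. The sketch reads the two environment variables shown above. Treating LLM_API_URL as required is an assumption, and llm_settings is a hypothetical helper that accepts an optional mapping so it can be tested without touching the real environment.

```python
import os
from typing import Mapping, Optional, Tuple

REQUIRED_VARS = ("LLM_API_KEY", "LLM_API_URL")


def llm_settings(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (api_key, api_url), raising if either variable is unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            f"S3 mode needs these environment variables: {', '.join(missing)}"
        )
    return env["LLM_API_KEY"], env["LLM_API_URL"]
```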

📚 Documentation

๐Ÿค Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Open a Pull Request

📄 License

MIT License. See the LICENSE file for details.

📞 Contact



Download files


Source Distribution

rxnrecer-1.2.0.tar.gz (74.6 kB)

Uploaded Source

Built Distribution


rxnrecer-1.2.0-py3-none-any.whl (73.6 kB)

Uploaded Python 3

File details

Details for the file rxnrecer-1.2.0.tar.gz.

File metadata

  • Download URL: rxnrecer-1.2.0.tar.gz
  • Upload date:
  • Size: 74.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.18

File hashes

Hashes for rxnrecer-1.2.0.tar.gz
Algorithm Hash digest
SHA256 5a7942ee813ba489273655f766bf101af3a427827645ab7e6cc857747217512c
MD5 4b8fa211bc0c353cfb1a751901b6ed22
BLAKE2b-256 58e8027a3c9b21d08acb204e7862cb833d1cca8321f5cafd287ff50466994091
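The published SHA256 digest lets you confirm that a downloaded archive is intact. The sketch streams the file through hashlib in 1 MiB chunks so large files need not fit in memory; sha256_of and verify are hypothetical helpers. pip can also enforce digests itself in hash-checking mode (--require-hashes).

```python
import hashlib

# SHA256 digest listed above for rxnrecer-1.2.0.tar.gz.
EXPECTED_SHA256 = "5a7942ee813ba489273655f766bf101af3a427827645ab7e6cc857747217512c"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in fixed-size chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare case-insensitively against the published digest."""
    return sha256_of(path) == expected.lower()
```

For example, verify("rxnrecer-1.2.0.tar.gz") should return True for an unmodified download of the source distribution.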


File details

Details for the file rxnrecer-1.2.0-py3-none-any.whl.

File metadata

  • Download URL: rxnrecer-1.2.0-py3-none-any.whl
  • Upload date:
  • Size: 73.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.18

File hashes

Hashes for rxnrecer-1.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 97133cc47adee18acf32be52d739992f15502ac19bc81baa75bba7f05770a9c7
MD5 d99bd90e673d4952ea396a66d402c9c1
BLAKE2b-256 a3aba00085647373af9dc01b8492c14d1d76440d5ab30031fd00508974858936

