Deep learning framework for predicting enzyme-catalyzed reactions from protein sequences

RXNRECer

RXNRECer v1.3.6 is a deep learning framework for predicting enzyme-catalyzed reactions from protein sequences. It is the official implementation of "RXNRECer: Active Learning with Protein Language Models for Fine-Grained Enzyme Reaction Prediction."

🎉 Now available on PyPI for easy installation!

🚀 Features

  • Multi-Stage Prediction: S1 (reaction prediction), S2 (reaction integration), S3 (LLM reasoning)
  • Protein Sequence Analysis: Process protein sequences in FASTA format
  • Deep Learning Models: ESM-2 embeddings with advanced neural architectures
  • GPU Acceleration: CUDA support for faster inference
  • Easy-to-use CLI: Simple command-line interface with comprehensive options
  • Smart Caching: Automatic result caching for faster repeated predictions

📋 Requirements

  • Python 3.10+
  • PyTorch 2.0+
  • CUDA 11.0+ (recommended)
  • 32GB+ RAM
  • 40GB+ disk space

🚀 Quick Start

1. Install (Recommended)

# Install from PyPI (recommended)
pip install rxnrecer

# Or install from GitHub
pip install git+https://github.com/kingstdio/RXNRECer.git

2. Download Data

# Download required data and model files (~35.8GB total)
rxnrecer-download-data

# Or download separately
rxnrecer-download-data --data-only      # ~8.8GB
rxnrecer-download-data --models-only    # ~14GB
rxnrecer-download-data --extools-only   # ~13GB
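
Because the downloads are large, it can help to check free disk space before starting. The following is a minimal sketch, not part of RXNRECer: the sizes are copied from the comments above, the names `DOWNLOAD_SIZES_GB`, `required_gb`, and `has_enough_space` are hypothetical, and the sizes are assumed to be decimal gigabytes.

```python
import shutil

# Approximate download sizes in GB, taken from the comments above.
DOWNLOAD_SIZES_GB = {"data": 8.8, "models": 14.0, "extools": 13.0}


def required_gb(components=None):
    """Sum the approximate size of the selected components (all by default)."""
    names = DOWNLOAD_SIZES_GB if components is None else components
    return sum(DOWNLOAD_SIZES_GB[name] for name in names)


def has_enough_space(path=".", components=None, headroom_gb=2.0):
    """Return True if `path` has room for the components plus some headroom.

    The 2 GB headroom is an arbitrary illustrative margin.
    """
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb(components) + headroom_gb


if __name__ == "__main__":
    print(f"Full download needs about {required_gb():.1f} GB")
    print("Enough space here:", has_enough_space("."))
```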

3. Run Prediction

# Basic prediction (S1 mode)
rxnrecer -i input.fasta -o output.tsv -m s1

# Detailed prediction (S2 mode)
rxnrecer -i input.fasta -o output.tsv -m s2

# LLM reasoning (S3 mode, requires API key)
rxnrecer -i input.fasta -o output.json -m s3 -f json

🔧 Usage

Command Line Options

rxnrecer [OPTIONS]

Options:
  -i, --input_fasta    Input FASTA file path (required)
  -o, --output_file    Output file path
  -f, --format         Output format: tsv or json (default: tsv)
  -m, --mode           Prediction mode: s1, s2, or s3 (default: s1)
  -b, --batch_size     Batch size for processing (default: 100)
  -v, --version        Show version

Examples

# Basic usage
rxnrecer -i proteins.fasta -o results.tsv

# Custom batch size
rxnrecer -i proteins.fasta -o results.tsv -b 50

# JSON output
rxnrecer -i proteins.fasta -o results.json -f json

# Use default output path
rxnrecer -i proteins.fasta -m s1
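
To call the CLI from a Python script, one option is to assemble the argument list and pass it to `subprocess.run`. This is a sketch that uses only the flags documented above; the helper names `build_command` and `run_prediction` are hypothetical and are not provided by the package.

```python
import subprocess
from typing import List, Optional


def build_command(
    input_fasta: str,
    output_file: Optional[str] = None,
    mode: str = "s1",
    fmt: str = "tsv",
    batch_size: int = 100,
) -> List[str]:
    """Build an argument list for the `rxnrecer` CLI from the documented options."""
    if mode not in {"s1", "s2", "s3"}:
        raise ValueError(f"unknown mode: {mode!r}")
    if fmt not in {"tsv", "json"}:
        raise ValueError(f"unknown format: {fmt!r}")
    cmd = ["rxnrecer", "-i", input_fasta, "-m", mode, "-f", fmt, "-b", str(batch_size)]
    if output_file is not None:
        # Omitting -o lets the tool fall back to its default output path.
        cmd += ["-o", output_file]
    return cmd


def run_prediction(**kwargs) -> subprocess.CompletedProcess:
    """Run the CLI and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(**kwargs), check=True)


cmd = build_command("proteins.fasta", "results.tsv", batch_size=50)
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with file paths that contain spaces.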

Input Format

FASTA file with protein sequences:

>P12345|Sample protein 1
MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG
>P67890|Sample protein 2
MKLIVWALLLLAAWAVERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG
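
Before submitting a large file, it may be worth checking that it parses as FASTA. The sketch below is a standalone, standard-library check and does not reflect how RXNRECer itself reads input; `read_fasta` is a hypothetical helper, and it keeps the full header text rather than guessing how the tool derives `input_id`.

```python
from typing import Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA lines into (header, sequence) pairs, joining wrapped sequences."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    empty = [h for h, seq in records if not seq]
    if empty:
        raise ValueError(f"records without a sequence: {empty}")
    return records


example = """>P12345|Sample protein 1
MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG
>P67890|Sample protein 2
MKLIVWALLLLAAWAVERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG
"""
records = read_fasta(example.splitlines())
for header, sequence in records:
    print(header, len(sequence))
```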

Output Formats

TSV Output (S1/S2):

input_id	RXNRECer	RXNRECer_with_prob	rxn_details
P12345	RHEA:24076;RHEA:14709	0.9999;0.9999	[reaction details]
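
The sample row suggests that the `RXNRECer` and `RXNRECer_with_prob` columns hold semicolon-separated lists in matching order. Assuming that layout holds for real output (it is inferred only from the example above), the predictions can be paired with their probabilities roughly as follows; `parse_predictions` is a hypothetical helper.

```python
import csv
import io
from typing import Dict, List, Tuple


def parse_predictions(tsv_text: str) -> Dict[str, List[Tuple[str, float]]]:
    """Map each input_id to a list of (reaction_id, probability) pairs."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    result: Dict[str, List[Tuple[str, float]]] = {}
    for row in reader:
        reaction_ids = row["RXNRECer"].split(";")
        probs = [float(p) for p in row["RXNRECer_with_prob"].split(";")]
        if len(reaction_ids) != len(probs):
            raise ValueError(f"column length mismatch for {row['input_id']}")
        result[row["input_id"]] = list(zip(reaction_ids, probs))
    return result


# Sample text mirroring the header and row shown above.
sample = (
    "input_id\tRXNRECer\tRXNRECer_with_prob\trxn_details\n"
    "P12345\tRHEA:24076;RHEA:14709\t0.9999;0.9999\t[reaction details]\n"
)
parsed = parse_predictions(sample)
print(parsed)
```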

JSON Output (S3):

[
  {
    "reaction_id": "RHEA:24076",
    "prediction_confidence": 0.9999,
    "reaction_details": {...}
  }
]
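
A JSON result with the shape shown above can be filtered by confidence using the standard library. This is an illustrative sketch: `confident_reactions` is a hypothetical helper, the 0.5 default threshold is arbitrary, and the sample uses an empty object for `reaction_details` because its contents are not shown here.

```python
import json
from typing import List


def confident_reactions(json_text: str, threshold: float = 0.5) -> List[str]:
    """Return reaction IDs whose prediction_confidence meets the threshold."""
    entries = json.loads(json_text)
    return [
        entry["reaction_id"]
        for entry in entries
        if entry.get("prediction_confidence", 0.0) >= threshold
    ]


# Placeholder entry; reaction_details is left empty on purpose.
sample = (
    '[{"reaction_id": "RHEA:24076", '
    '"prediction_confidence": 0.9999, '
    '"reaction_details": {}}]'
)
print(confident_reactions(sample, threshold=0.9))
```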

🆕 Advanced Features

Smart Caching

Results are automatically cached for faster repeated predictions:

# Check cache status
rxnrecer-cache status

# Clear cache
rxnrecer-cache clear --all

Data Management

Easy data and model file management:

# Download data
rxnrecer-download-data

# Force re-download
rxnrecer-download-data --force

📁 Project Structure

RXNRECer/                               # Project root (release)
├── .github/                            # CI/CD workflows
│   └── workflows/
├── rxnrecer/                           # Main Python package
│   ├── cli/                            # Command-line interface
│   ├── config/                         # Configuration
│   ├── lib/                            # Core libraries
│   │   ├── datasource/                 # Data source handling
│   │   ├── embedding/                  # Protein embeddings
│   │   ├── evaluation/                 # Evaluation helpers
│   │   ├── llm/                        # Language model integration
│   │   ├── ml/                         # Machine learning utilities
│   │   ├── model/                      # Model architectures
│   │   ├── rxn/                        # Reaction processing
│   │   └── smi/                        # SMILES handling
│   ├── models/                         # Model wrappers
│   └── utils/                          # Utility functions
│
├── extools/                            # External tools (downloaded)
│   ├── ec/                             # EC-related resources
│   └── msa/                            # MSA binaries (e.g., diamond)
│
├── data/                               # Data files (download required)
│   ├── chebi/                          # ChEBI database
│   ├── cpd_svg/                        # Compound SVG files
│   ├── datasets/                       # Training datasets
│   ├── dict/                           # Dictionary files
│   ├── feature_bank/                   # Feature bank
│   ├── rhea/                           # RHEA database
│   ├── rxn_json/                       # Reaction JSON files
│   ├── sample/                         # Sample data
│   └── uniprot/                        # UniProt database
│
├── ckpt/                               # Model checkpoints (download required)
│   ├── esm/                            # ESM models
│   ├── prostt5/                        # ProSTT5 models
│   └── rxnrecer/                       # RXNRECer model files
│
├── results/                            # Output results
│   ├── cache/                          # Prediction cache
│   ├── logs/                           # Log files
│   ├── predictions/                    # Prediction outputs
│   └── sample/                         # Sample results
│
├── docs/                               # Documentation
├── scripts/                            # Build and utility scripts
├── MANIFEST.in                         # Package data manifest
├── pyproject.toml                      # Build and dependencies for PyPI
├── environment_rxnrecer-release.yml    # Conda environment
├── LICENSE                             # MIT License
├── README.md                           # This file
└── .gitignore                          # Git ignore rules

🔧 Configuration

For S3 mode (LLM reasoning), set your API key and endpoint URL as environment variables:

export LLM_API_KEY="your_api_key_here"
export LLM_API_URL="your_api_url_here"

Examples:

# OpenRouter
export LLM_API_KEY="sk-or-v1-your_openrouter_key_here"
export LLM_API_URL="https://openrouter.ai/api/v1"

# OpenAI
export LLM_API_KEY="sk-your_openai_key_here"
export LLM_API_URL="https://api.openai.com/v1"

# Anthropic
export LLM_API_KEY="sk-ant-your_anthropic_key_here"
export LLM_API_URL="https://api.anthropic.com"

Jupyter Notebook Setup

from rxnrecer.config import config as cfg

# Set your API credentials
cfg.LLM_API_KEY = "your_api_key_here"
cfg.LLM_API_URL = "your_api_url_here"
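
Before launching an S3 run from a script, a quick check that both environment variables are set can catch a missing credential early. The sketch below relies only on the variable names documented above; `missing_llm_settings` is a hypothetical helper and does not verify that the key or URL is valid.

```python
import os
from typing import List, Mapping, Optional

# Variable names as documented in the Configuration section.
REQUIRED_VARS = ("LLM_API_KEY", "LLM_API_URL")


def missing_llm_settings(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_llm_settings()
    if missing:
        print("Set these variables before using S3 mode:", ", ".join(missing))
    else:
        print("LLM settings found.")
```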

📦 Installation Options

PyPI Installation (Recommended)

pip install rxnrecer

GitHub Installation (Latest)

pip install git+https://github.com/kingstdio/RXNRECer.git
  • 🔧 Development: Latest development version
  • 🔧 Custom: For advanced users

📚 Documentation

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Open a Pull Request

📄 License

MIT License - see LICENSE file for details.

📞 Contact


🎯 Get started now with: pip install rxnrecer
