
Professional chemistry reaction extraction using fine-tuned LLMs


RxNExtract

A professional-grade system for extracting chemical reaction information from procedure texts using fine-tuned LLMs with dynamic prompting and self-grounding.


✨ Key Features

  • Advanced AI: Fine-tuned LLM with dynamic prompting and self-grounding
  • Modular Architecture: Clean, maintainable codebase with separation of concerns
  • Multiple Interfaces: CLI, interactive mode, batch processing, and programmatic API
  • Memory Efficient: 4-bit quantization support for deployment on various hardware
  • Comprehensive Analysis: Error analysis, ablation studies, statistical testing, and uncertainty quantification
  • Easy Installation: One-command installation via PyPI, Conda, or Docker

🚀 Quick Start

30-Second Demo

# Install
pip install rxnextract

# Use
python -c "
from chemistry_llm import ChemistryReactionExtractor
extractor = ChemistryReactionExtractor.from_pretrained('chemplusx/rxnextract-complete')
procedure = 'Add 5g NaCl to 100mL water and stir for 30 minutes at room temperature.'
results = extractor.analyze_procedure(procedure)
print('Reactants:', results['extracted_data']['reactants'])
print('Conditions:', results['extracted_data']['conditions'])
"

Try Without Installation

  • Open in Colab
  • Try on HuggingFace Spaces

📦 Installation Options

Option 1: PyPI (Recommended)

pip install rxnextract                # Basic installation
pip install "rxnextract[gpu]"         # GPU support
pip install "rxnextract[full]"        # All features (quotes stop shells such as zsh from expanding the brackets)

Option 2: Conda

conda install -c conda-forge rxnextract

Option 3: Docker

docker pull chemplusx/rxnextract:latest
docker run -it --gpus all chemplusx/rxnextract:latest

Option 4: From Source

git clone https://github.com/chemplusx/RxNExtract.git
cd RxNExtract
pip install -e .
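Whichever option you choose, you can confirm what was installed by querying the package metadata. The sketch below is a minimal example that uses the standard-library importlib.metadata module (Python 3.8+). It relies on one assumption: the distribution name is rxnextract, while the import name used in the examples is chemistry_llm. For that reason it queries the distribution name. The helper name installed_version is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "rxnextract") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper. Note that the *distribution* name (what pip installs)
    differs from the *import* name (chemistry_llm) used in the examples.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("rxnextract is not installed in this environment")
    else:
        print(f"rxnextract {version} is installed")
```

In the Docker option, run the same check inside the container, because the host environment is separate.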

🎯 Performance Highlights

Our complete framework achieves significant improvements over baseline methods:

Metric                         Baseline   RxNExtract   Improvement (relative)
Complete Reaction Accuracy     23.4%      52.1%        +122.6%
Entity F1 Score                0.674      0.856        +27.0%
Role Classification Accuracy   68.2%      85.9%        +25.9%
Condition F1 Score             0.421      0.689        +63.7%

Error Reduction: 47.8% to 55.2% across all major error categories.

Statistical Significance: McNemar's χ² = 134.67 (p < 0.001), Cohen's d = 0.82.
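You can recompute the figures above with a few lines of code. The "Improvement" column is the relative change (new − baseline) / baseline. Recomputing it from the published rows matches each entry to within rounding. The README does not say which McNemar variant (with or without continuity correction) or which Cohen's d variant (independent or paired samples) was used. The sketch below therefore uses the textbook forms and should not be read as the paper's exact procedure. The function names are hypothetical, and none of the counts or samples used in testing come from the paper's data.

```python
import math
import statistics
from typing import Sequence


def relative_improvement(baseline: float, new: float) -> float:
    """Percentage change of `new` relative to `baseline`."""
    return (new - baseline) / baseline * 100.0


def mcnemar_chi2(b: int, c: int, continuity: bool = False) -> float:
    """McNemar's chi-squared statistic from the two discordant-pair counts.

    b: items the baseline got right and the new system got wrong
    c: items the baseline got wrong and the new system got right
    continuity=True applies Edwards' correction: (|b - c| - 1)^2 / (b + c).
    """
    if b + c == 0:
        raise ValueError("McNemar's test needs at least one discordant pair")
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)
    return diff ** 2 / (b + c)


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    pooled_var = (
        (n_a - 1) * statistics.variance(a) + (n_b - 1) * statistics.variance(b)
    ) / (n_a + n_b - 2)
    return (statistics.mean(b) - statistics.mean(a)) / math.sqrt(pooled_var)


if __name__ == "__main__":
    # Complete Reaction Accuracy row: 23.4% -> 52.1%
    print(f"{relative_improvement(23.4, 52.1):.1f}%")  # 122.6%
```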

📚 Documentation

  • Installation & Setup Guide: detailed installation instructions, system requirements, and configuration
  • Usage Guide & Examples: comprehensive usage examples, API reference, and advanced features
  • Analysis & Evaluation: complete analysis framework, metrics, and research reproducibility
  • Changelog: version history and release notes

🔬 Research Applications

Perfect for:

  • Chemical Literature Mining: Extract structured reaction data from papers
  • Procedure Standardization: Convert natural language to structured formats
  • Database Curation: Automated reaction database construction
  • Educational Tools: Teaching reaction analysis and extraction
  • Research Reproducibility: Systematic evaluation of extraction methods

🤝 Community & Support

Getting Help

For Experimental Chemists

  • 🎯 One-command installation via PyPI and Conda
  • 🐳 Docker containers for consistent environments
  • 📖 User-friendly tutorials and examples
  • 🎓 Video tutorials and webinars

For Developers

  • 🔧 Extensive API documentation
  • 🧪 Comprehensive test suite
  • 🏗️ Modular architecture for easy extension
  • 📋 Contributing guidelines and code standards

🔑 Quick Examples

Basic Usage

from chemistry_llm import ChemistryReactionExtractor

# Initialize extractor
extractor = ChemistryReactionExtractor.from_pretrained("chemplusx/rxnextract-complete")

# Analyze procedure
procedure = """
Dissolve 5.0 g of benzoic acid in 100 mL of hot water.
Add 10 mL of concentrated HCl and cool the solution.
Filter the precipitated product and wash with cold water.
"""

results = extractor.analyze_procedure(procedure)
print(results['extracted_data'])
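For database curation, you will often want to save the output rather than print it. The helper below is a minimal sketch. It assumes only what the examples show: analyze_procedure() returns a dict with an 'extracted_data' entry. The internal layout of that entry is not documented here, so the helper writes it out unchanged. The function name save_extraction and the output file name are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict


def save_extraction(results: Dict[str, Any], path: str) -> Dict[str, Any]:
    """Write results['extracted_data'] to a JSON file and return it.

    Hypothetical helper: assumes `results` is the dict returned by
    analyze_procedure(); any other top-level keys are ignored.
    """
    data = results.get("extracted_data", {})
    # default=str keeps json.dumps from failing on values it cannot encode natively
    Path(path).write_text(json.dumps(data, indent=2, default=str), encoding="utf-8")
    return data


# Usage with the extractor above (not run here):
#   save_extraction(results, "benzoic_acid.json")
```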

Command Line Interface

# Interactive mode
rxnextract --interactive

# Batch processing
rxnextract --input procedures.txt --output results.json

# Single procedure
rxnextract --procedure "Add 2g NaCl to 50mL water"
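This section does not document the input format for --input. Assuming one procedure per line, the sketch below shows a rough programmatic counterpart of the batch command. The output layout (a JSON list of procedure/extracted_data records) was chosen for this sketch and is not necessarily what the CLI writes. process_file and the Extractor protocol are hypothetical names. Passing the extractor in as a parameter lets you test the function with a stub.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Protocol


class Extractor(Protocol):
    """Anything with an analyze_procedure(str) -> dict method (hypothetical protocol)."""

    def analyze_procedure(self, procedure: str) -> Dict[str, Any]: ...


def process_file(extractor: Extractor, input_path: str, output_path: str) -> List[Dict[str, Any]]:
    """Analyze each non-blank line of input_path and save the results as a JSON list.

    Assumes one procedure per line; blank lines are skipped.
    """
    text = Path(input_path).read_text(encoding="utf-8")
    procedures = [line.strip() for line in text.splitlines() if line.strip()]
    records = []
    for procedure in procedures:
        results = extractor.analyze_procedure(procedure)
        records.append({"procedure": procedure, "extracted_data": results.get("extracted_data")})
    Path(output_path).write_text(json.dumps(records, indent=2, default=str), encoding="utf-8")
    return records


# Usage with the real package (not run here):
#   from chemistry_llm import ChemistryReactionExtractor
#   extractor = ChemistryReactionExtractor.from_pretrained("chemplusx/rxnextract-complete")
#   process_file(extractor, "procedures.txt", "results.json")
```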

Analysis & Research

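from chemistry_llm.analysis import ErrorAnalyzer, AblationStudy

# predictions, ground_truth and test_data are not defined here; load them first
# (see the Analysis & Evaluation guide)

# Error analysis
analyzer = ErrorAnalyzer()
error_results = analyzer.analyze_prediction_errors(predictions, ground_truth)

# Ablation study (replace "./model" with the path to your model)
ablation = AblationStudy(model_path="./model")
study_results = ablation.run_complete_study(test_data, ground_truth)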
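This README does not describe the matching rules behind metrics such as Entity F1. The sketch below shows a generic exact-match, set-based precision/recall/F1 for a single procedure, to illustrate what such a score measures. It is not necessarily how ErrorAnalyzer computes it: the package may use fuzzy or role-aware matching. Case-folding and collapsing duplicate entities are assumptions of this sketch, and entity_prf is a hypothetical name. To micro-average across many procedures, sum the true-positive, predicted, and gold counts before dividing.

```python
from typing import Iterable, Tuple


def entity_prf(predicted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over sets of normalized entity strings.

    Normalization here is only strip() + lower(); duplicates collapse into one entity.
    """
    pred = {e.strip().lower() for e in predicted}
    ref = {e.strip().lower() for e in gold}
    tp = len(pred & ref)  # entities found in both prediction and reference
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    p, r, f = entity_prf(["benzoic acid", "HCl", "NaOH"], ["Benzoic acid", "HCl", "water"])
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.667 R=0.667 F1=0.667
```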

🏗️ System Requirements

Component     Minimum   Recommended
Python        3.8+      3.9+
RAM           8 GB      16 GB+
GPU Memory    4 GB      12 GB+
Storage       20 GB     50 GB+
CPU           4 cores   8+ cores

Note: Requirements are for inference only. Fine-tuning requires additional resources.
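The 4-bit quantization listed under Key Features matters for the GPU memory figures because weight storage scales with bits per parameter. The back-of-envelope sketch below makes that arithmetic concrete. The README does not state the model's parameter count, so the 7-billion figure is purely illustrative. The estimate also covers weights only and leaves out activations, the KV cache, quantization scales, and framework overhead, so real usage will be higher. weight_memory_gb is a hypothetical name.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes.

    Ignores activations, KV cache, quantization metadata and runtime overhead.
    """
    return n_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    hypothetical_params = 7e9  # illustrative only; not the RxNExtract model size
    for bits in (16, 8, 4):
        print(f"{bits}-bit: {weight_memory_gb(hypothetical_params, bits):.1f} GB")
    # 16-bit: 14.0 GB
    # 8-bit: 7.0 GB
    # 4-bit: 3.5 GB
```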

📊 Data and Software Availability

Code Repository: All code used in this study is available under the MIT License at https://github.com/chemplusx/RxNExtract. The MIT License permits unrestricted use, modification, and distribution, making it suitable for both academic research and commercial applications.

Pre-trained Models:

Package Distribution:

  • PyPI: pip install rxnextract
  • Conda-Forge: conda install -c conda-forge rxnextract
  • Docker Hub: docker pull chemplusx/rxnextract:latest

Datasets: Training and evaluation datasets are available at Zenodo DOI: 10.5281/zenodo.XXXXXX

Reproducibility: Complete analysis scripts and configuration files are provided to reproduce all results presented in the paper.

🤝 Contributing

We welcome contributions from the community! Whether you're fixing bugs, adding features, improving documentation, or sharing use cases, your help makes RxNExtract better.

Quick Contributing Guide

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes and add tests
  4. Ensure all tests pass (python -m pytest)
  5. Submit a pull request

See our Contributing Guidelines for detailed instructions.

📄 License & Citation

License: This project is licensed under the MIT License; see the LICENSE file for complete terms.

Citation: If you use RxNExtract in your research, please cite our paper:

@article{rxnextract2025,
  title={RxNExtract: A Professional-Grade System for Chemical Reaction Extraction using Fine-tuned LLMs},
  author={[Your Authors]},
  journal={[Journal Name]},
  year={2025},
  doi={[DOI]}
}





Download files


Source Distribution

rxnextract-1.2.2.tar.gz (48.7 kB)

Uploaded Source

Built Distribution


rxnextract-1.2.2-py3-none-any.whl (50.0 kB)

Uploaded Python 3

File details

Details for the file rxnextract-1.2.2.tar.gz.

File metadata

  • Download URL: rxnextract-1.2.2.tar.gz
  • Upload date:
  • Size: 48.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for rxnextract-1.2.2.tar.gz
Algorithm Hash digest
SHA256 71887f3c97287b0578019a0488ad3932e371a7079420d7c2f5a6b9803db7a4f4
MD5 ee4239f5ed684e73e49795b980b55696
BLAKE2b-256 fad74c6519ea34135e481fc74576a6146c844140ddaea5228f268d2096465a18


File details

Details for the file rxnextract-1.2.2-py3-none-any.whl.

File metadata

  • Download URL: rxnextract-1.2.2-py3-none-any.whl
  • Upload date:
  • Size: 50.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for rxnextract-1.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 98faa12ef77ce33b4644939388ff1cb6d8ea7f36df10b4d42a181ab246097b18
MD5 df5c0bfc44ba3f7a7101dda7da017f8b
BLAKE2b-256 0c5208f31de6939d96c7c2d5c836dc64e90a37ea397f594c3127aff74340ea8e
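pip can enforce these digests itself in hash-checking mode (--require-hashes), although that mode also requires pinned hashes for every dependency. To check a file you downloaded manually, you can compare it against the SHA256 digests above with the standard library alone. The sketch below is a minimal example: the two expected digests are copied from the tables above, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large files use little memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_sha256: str) -> bool:
    """True if the file's SHA-256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


# SHA256 digests listed above for release 1.2.2
EXPECTED = {
    "rxnextract-1.2.2.tar.gz": "71887f3c97287b0578019a0488ad3932e371a7079420d7c2f5a6b9803db7a4f4",
    "rxnextract-1.2.2-py3-none-any.whl": "98faa12ef77ce33b4644939388ff1cb6d8ea7f36df10b4d42a181ab246097b18",
}

if __name__ == "__main__":
    # Checks whichever of the two files are present in the current directory
    for name, expected in EXPECTED.items():
        if Path(name).exists():
            print(name, "OK" if verify(name, expected) else "MISMATCH")
```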

