
Professional chemistry reaction extraction using fine-tuned LLMs


RxNExtract

A professional-grade system for extracting chemical reaction information from procedure texts using a fine-tuned LLM with dynamic prompting and self-grounding.


✨ Key Features

  • Advanced AI: Fine-tuned LLM with dynamic prompting and self-grounding
  • Modular Architecture: Clean, maintainable codebase with separation of concerns
  • Multiple Interfaces: CLI, interactive mode, batch processing, and programmatic API
  • Memory Efficient: 4-bit quantization support for deployment on various hardware
  • Comprehensive Analysis: Error analysis, ablation studies, statistical testing, and uncertainty quantification
  • Easy Installation: One-command installation via PyPI, Conda, or Docker

🚀 Quick Start

30-Second Demo

# Install
pip install rxnextract

# Use
python -c "
from chemistry_llm import ChemistryReactionExtractor
extractor = ChemistryReactionExtractor.from_pretrained('chemplusx/rxnextract-complete')
procedure = 'Add 5g NaCl to 100mL water and stir for 30 minutes at room temperature.'
results = extractor.analyze_procedure(procedure)
print('Reactants:', results['extracted_data']['reactants'])
print('Conditions:', results['extracted_data']['conditions'])
"

Try Without Installation

  • Open in Colab
  • Try on HuggingFace Spaces

📦 Installation Options

Option 1: PyPI (Recommended)

pip install rxnextract                  # Basic installation
pip install "rxnextract[gpu]"           # GPU support
pip install "rxnextract[full]"          # All features
# Quoting the extras stops shells such as zsh from treating [...] as a glob pattern

Option 2: Conda

conda install -c conda-forge rxnextract

Option 3: Docker

docker pull chemplusx/rxnextract:latest
docker run -it --gpus all chemplusx/rxnextract:latest

Option 4: From Source

git clone https://github.com/chemplusx/RxNExtract.git
cd RxNExtract
pip install -e .

🎯 Performance Highlights

Compared with the baseline, the complete framework improves every reported metric:

Metric                          Baseline   RxNExtract   Improvement (relative)
Complete Reaction Accuracy      23.4%      52.1%        +122.6%
Entity F1 Score                 0.674      0.856        +27.0%
Role Classification Accuracy    68.2%      85.9%        +25.9%
Condition F1 Score              0.421      0.689        +63.7%

Error Reduction: 47.8% to 55.2% across all major error categories

Statistical Significance: McNemar's χ² = 134.67 (p < 0.001), Cohen's d = 0.82
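The Improvement column is a relative change, (RxNExtract − Baseline) / Baseline × 100, and McNemar's test compares two systems on the same test items using only the items where they disagree. The sketch below recomputes the column from the table values and implements the textbook McNemar statistic; it is not the project's evaluation code, and because this page does not say whether the reported χ² uses a continuity correction, both variants are exposed.

```python
from typing import Sequence


def relative_improvement(baseline: float, new: float) -> float:
    """Percentage change of `new` relative to `baseline`."""
    return (new - baseline) / baseline * 100.0


def mcnemar_chi2(baseline_correct: Sequence[bool],
                 new_correct: Sequence[bool],
                 continuity: bool = True) -> float:
    """McNemar's chi-squared statistic for paired per-item correctness flags."""
    if len(baseline_correct) != len(new_correct):
        raise ValueError("inputs must be paired (same length)")
    pairs = list(zip(baseline_correct, new_correct))
    b = sum(1 for base, new in pairs if base and not new)  # only baseline right
    c = sum(1 for base, new in pairs if new and not base)  # only new system right
    if b + c == 0:
        return 0.0  # no disagreements, so no evidence of a difference
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)  # Edwards' continuity correction
    return diff ** 2 / (b + c)


# Values copied from the performance table
rows = {
    "Complete Reaction Accuracy": (23.4, 52.1),
    "Entity F1 Score": (0.674, 0.856),
    "Role Classification Accuracy": (68.2, 85.9),
    "Condition F1 Score": (0.421, 0.689),
}
for name, (base, new) in rows.items():
    print(f"{name}: +{relative_improvement(base, new):.1f}%")
```

Recomputed from the rounded table values, Role Classification Accuracy comes out at +26.0% rather than +25.9%; the small gap presumably reflects rounding of the underlying scores.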

📚 Documentation

  • Installation & Setup Guide: Detailed installation instructions, system requirements, and configuration
  • Usage Guide & Examples: Comprehensive usage examples, API reference, and advanced features
  • Analysis & Evaluation: Complete analysis framework, metrics, and research reproducibility
  • Changelog: Version history and release notes

🔬 Research Applications

Typical applications:

  • Chemical Literature Mining: Extract structured reaction data from papers
  • Procedure Standardization: Convert natural language to structured formats
  • Database Curation: Automated reaction database construction
  • Educational Tools: Teaching reaction analysis and extraction
  • Research Reproducibility: Systematic evaluation of extraction methods

🤝 Community & Support

Getting Help

For Experimental Chemists

  • 🎯 One-click installations via PyPI and Conda
  • 🐳 Docker containers for consistent environments
  • 📖 User-friendly tutorials and examples
  • 🎓 Video tutorials and webinars

For Developers

  • 🔧 Extensive API documentation
  • 🧪 Comprehensive test suite
  • 🏗️ Modular architecture for easy extension
  • 📋 Contributing guidelines and code standards

🔑 Quick Examples

Basic Usage

from chemistry_llm import ChemistryReactionExtractor

# Initialize extractor
extractor = ChemistryReactionExtractor.from_pretrained("chemplusx/rxnextract-complete")

# Analyze procedure
procedure = """
Dissolve 5.0 g of benzoic acid in 100 mL of hot water.
Add 10 mL of concentrated HCl and cool the solution.
Filter the precipitated product and wash with cold water.
"""

results = extractor.analyze_procedure(procedure)
print(results['extracted_data'])
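For database curation it can help to store each result as one JSON line. The helper below is a hypothetical sketch: this page only shows that results['extracted_data'] has 'reactants' and 'conditions' keys, so to_jsonl_record assumes nothing beyond a JSON-serializable dict and passes any other fields through untouched.

```python
import json
from typing import Any, Dict


def to_jsonl_record(procedure: str, extracted: Dict[str, Any]) -> str:
    """Serialize one procedure and its extracted fields as a single JSON line.

    Assumes `extracted` is a JSON-serializable dict such as
    results['extracted_data']; absent keys are stored as null.
    """
    record = {
        "procedure": procedure.strip(),
        "reactants": extracted.get("reactants"),
        "conditions": extracted.get("conditions"),
        # Pass through any further fields without interpreting them
        "other": {k: v for k, v in extracted.items()
                  if k not in ("reactants", "conditions")},
    }
    # ensure_ascii=False keeps characters such as "°" readable in the output
    return json.dumps(record, ensure_ascii=False)


# Illustrative stand-in for results['extracted_data']; real contents may differ
example = {"reactants": ["benzoic acid", "HCl"], "conditions": ["cool"]}
print(to_jsonl_record("Add 10 mL of concentrated HCl and cool.", example))
```

Appending each line to a .jsonl file gives a simple append-only store that pandas can load with read_json(..., lines=True).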

Command Line Interface

# Interactive mode
rxnextract --interactive

# Batch processing
rxnextract --input procedures.txt --output results.json

# Single procedure
rxnextract --procedure "Add 2g NaCl to 50mL water"
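The same batch workflow can also be driven from Python. This sketch assumes procedures in the input file are separated by blank lines (this page does not specify the format that rxnextract --input expects) and accepts any callable, so extractor.analyze_procedure from the Basic Usage example can be passed in directly. split_procedures and run_batch are hypothetical helper names.

```python
import re
from typing import Any, Callable, Dict, List


def split_procedures(text: str) -> List[str]:
    """Split text into procedures, assuming blank lines separate them."""
    blocks = re.split(r"\n\s*\n", text)
    return [b.strip() for b in blocks if b.strip()]


def run_batch(text: str,
              analyze: Callable[[str], Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Apply `analyze` to every procedure, recording failures instead of aborting."""
    out: List[Dict[str, Any]] = []
    for i, proc in enumerate(split_procedures(text)):
        try:
            out.append({"index": i, "procedure": proc, "result": analyze(proc)})
        except Exception as exc:  # one bad input should not stop the whole batch
            out.append({"index": i, "procedure": proc, "error": str(exc)})
    return out
```

For example, run_batch(Path("procedures.txt").read_text(), extractor.analyze_procedure) followed by json.dump(..., indent=2) writes a results file similar in spirit to the CLI output, though not necessarily in the same schema.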

Analysis & Research

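from chemistry_llm.analysis import ErrorAnalyzer, AblationStudy

# Assumes predictions, ground_truth, and test_data were loaded beforehand
# (they are not defined in this snippet)

# Error analysis: compare model predictions against reference annotations
analyzer = ErrorAnalyzer()
error_results = analyzer.analyze_prediction_errors(predictions, ground_truth)

# Ablation study: model_path points to a local copy of the fine-tuned model
ablation = AblationStudy(model_path="./model")
study_results = ablation.run_complete_study(test_data, ground_truth)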
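Entity F1, one of the headline metrics, can be illustrated with exact matching over (entity, role) pairs, micro-averaged across procedures. This is a generic sketch, not the ErrorAnalyzer implementation; the matching and normalization rules used in the paper are not stated here. Pairing each entity with its role means a correctly found entity with the wrong role counts as both a false positive and a false negative.

```python
from typing import Dict, Iterable, Tuple

Entity = Tuple[str, str]  # (normalized name, role), e.g. ("hcl", "reactant")


def micro_prf(docs: Iterable[Tuple[Iterable[Entity], Iterable[Entity]]]) -> Dict[str, float]:
    """Micro-averaged precision/recall/F1 over (predicted, gold) pairs per procedure."""
    tp = fp = fn = 0
    for predicted, gold in docs:
        pred, ref = set(predicted), set(gold)
        tp += len(pred & ref)   # exact (name, role) matches
        fp += len(pred - ref)   # predicted but not in the reference
        fn += len(ref - pred)   # in the reference but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# One procedure where water is found but assigned the wrong role
scores = micro_prf([
    ([("nacl", "reactant"), ("water", "reactant")],
     [("nacl", "reactant"), ("water", "solvent")]),
])
print(scores)
```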

🏗️ System Requirements

Component     Minimum    Recommended
Python        3.8+       3.9+
RAM           8 GB       16 GB+
GPU Memory    4 GB       12 GB+
Storage       20 GB      50 GB+
CPU           4 cores    8+ cores

Note: Requirements are for inference only. Fine-tuning requires additional resources.
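A rough way to relate the GPU memory row to the 4-bit quantization feature: weight memory is about parameters × bits per parameter ÷ 8 bytes. The model's parameter count is not given on this page, so the 7-billion figure below is purely hypothetical, and the estimate covers weights only; activations, KV cache, and framework overhead come on top.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes (1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Hypothetical 7B-parameter model at common precisions
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gb(7e9, bits):.1f} GB")
```

Blockwise 4-bit formats such as NF4 also store per-block scaling constants, so real footprints run slightly above the 4-bit figure.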

📊 Data and Software Availability

Code Repository: All code used in this study is available under the MIT License at https://github.com/chemplusx/RxNExtract. The MIT License permits unrestricted use, modification, and distribution, making it suitable for both academic research and commercial applications.

Pre-trained Models:

  • HuggingFace Hub: chemplusx/rxnextract-complete

Package Distribution:

  • PyPI: pip install rxnextract
  • Conda-Forge: conda install -c conda-forge rxnextract
  • Docker Hub: docker pull chemplusx/rxnextract:latest

Datasets: Training and evaluation datasets are available at Zenodo DOI: 10.5281/zenodo.XXXXXX

Reproducibility: Complete analysis scripts and configuration files are provided to reproduce all results presented in the paper.

🤝 Contributing

We welcome contributions from the community! Whether you're fixing bugs, adding features, improving documentation, or sharing use cases, your help makes RxNExtract better.

Quick Contributing Guide

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes and add tests
  4. Ensure all tests pass (python -m pytest)
  5. Submit a pull request

See our Contributing Guidelines for detailed instructions.

📄 License & Citation

License: This project is licensed under the MIT License - see the LICENSE file for complete terms.

Citation: If you use RxNExtract in your research, please cite our paper:

@article{rxnextract2025,
  title={RxNExtract: A Professional-Grade System for Chemical Reaction Extraction using Fine-tuned LLMs},
  author={[Your Authors]},
  journal={[Journal Name]},
  year={2025},
  doi={[DOI]}
}

🔗 Links




Download files


Source Distribution

rxnextract-1.2.1.tar.gz (48.6 kB)

Built Distribution

rxnextract-1.2.1-py3-none-any.whl (50.0 kB)

File details

Details for the file rxnextract-1.2.1.tar.gz.

File metadata

  • Download URL: rxnextract-1.2.1.tar.gz
  • Size: 48.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for rxnextract-1.2.1.tar.gz
Algorithm Hash digest
SHA256 f4132d887e8e0d697eae3a8a3cd3132e09d5b052bfc5acd5bf78b9cf22315f33
MD5 bdb8c4d3c164a565976afc4fe9124d84
BLAKE2b-256 b1e77c534448fd220564571171999127b3991823aa0bcc0b37b352a898612ec1
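To check a downloaded archive against the SHA256 digest above, hash the file in chunks with Python's standard hashlib module. sha256_of and verify are illustrative names; the chunked loop avoids loading the whole file into memory and works on Python 3.8 (hashlib.file_digest only exists from 3.11).

```python
import hashlib

# SHA256 of rxnextract-1.2.1.tar.gz, copied from the hash table above
EXPECTED = "f4132d887e8e0d697eae3a8a3cd3132e09d5b052bfc5acd5bf78b9cf22315f33"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it 1 MiB at a time."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected: str = EXPECTED) -> bool:
    """True if the file's digest matches `expected` (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

For example, verify("rxnextract-1.2.1.tar.gz") should return True for an untampered download. pip can enforce the same check at install time: list rxnextract==1.2.1 --hash=sha256:<digest> in a requirements file and run pip install --require-hashes -r requirements.txt, noting that every dependency must then be pinned with a hash as well.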


File details

Details for the file rxnextract-1.2.1-py3-none-any.whl.

File metadata

  • Download URL: rxnextract-1.2.1-py3-none-any.whl
  • Size: 50.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for rxnextract-1.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 78be64177fd1234e1ac55f0fb941ac60f2fae7d87b3b3dfde9926a62175112cd
MD5 1a54abc8ae3b67457b89cd63ca8aa56c
BLAKE2b-256 c0de47dcc54a1b0e154750a451ec91acab9f8f7ab6cf1841dd571f58a797c89e

