Professional chemistry reaction extraction using fine-tuned LLMs

RxNExtract

A professional-grade system for extracting chemical reaction information from procedure texts using a fine-tuned LLM with dynamic prompting and self-grounding.

✨ Key Features

  • Advanced AI: Fine-tuned LLM with dynamic prompting and self-grounding
  • Modular Architecture: Clean, maintainable codebase with separation of concerns
  • Multiple Interfaces: CLI, interactive mode, batch processing, and programmatic API
  • Memory Efficient: 4-bit quantization support to reduce memory requirements at inference time
  • Comprehensive Analysis: Error analysis, ablation studies, statistical testing, and uncertainty quantification
  • Easy Installation: One-command installation via PyPI, Conda, or Docker

🚀 Quick Start

30-Second Demo

# Install
pip install rxnextract

# Use
python -c "
from chemistry_llm import ChemistryReactionExtractor
extractor = ChemistryReactionExtractor.from_pretrained('chemplusx/rxnextract-complete')
procedure = 'Add 5g NaCl to 100mL water and stir for 30 minutes at room temperature.'
results = extractor.analyze_procedure(procedure)
print('Reactants:', results['extracted_data']['reactants'])
print('Conditions:', results['extracted_data']['conditions'])
"

Try Without Installation

  • Open in Colab
  • Try on HuggingFace Spaces

📦 Installation Options

Option 1: PyPI (Recommended)

pip install rxnextract                # Basic installation
pip install "rxnextract[gpu]"         # GPU support (quotes keep shells such as zsh from expanding the brackets)
pip install "rxnextract[full]"        # All features

Option 2: Conda

conda install -c conda-forge rxnextract

Option 3: Docker

docker pull chemplusx/rxnextract:latest
docker run -it --gpus all chemplusx/rxnextract:latest

Option 4: From Source

git clone https://github.com/chemplusx/RxNExtract.git
cd RxNExtract
pip install -e .

🎯 Performance Highlights

Our complete framework achieves significant improvements over baseline methods:

| Metric | Baseline | RxNExtract | Relative Improvement |
|---|---|---|---|
| Complete Reaction Accuracy | 23.4% | 52.1% | +122.6% |
| Entity F1 Score | 0.674 | 0.856 | +27.0% |
| Role Classification Accuracy | 68.2% | 85.9% | +25.9% |
| Condition F1 Score | 0.421 | 0.689 | +63.7% |
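
The improvement column is a relative gain over the baseline, not a difference in percentage points. A minimal sketch of that arithmetic follows, using only the figures from the table; the helper name `relative_improvement` is introduced here for illustration and is not part of the package. Recomputing from the rounded table entries reproduces the reported column to within 0.1 percentage points, so small last-digit differences are expected.

```python
# Baseline score, RxNExtract score, and reported improvement, copied from the table.
# Accuracies (in %) and F1 scores are handled the same way: only the ratio matters.
SCORES = {
    "Complete Reaction Accuracy": (23.4, 52.1, 122.6),
    "Entity F1 Score": (0.674, 0.856, 27.0),
    "Role Classification Accuracy": (68.2, 85.9, 25.9),
    "Condition F1 Score": (0.421, 0.689, 63.7),
}


def relative_improvement(baseline: float, improved: float) -> float:
    """Return the gain over the baseline as a percentage of the baseline."""
    return (improved - baseline) / baseline * 100.0


for metric, (baseline, improved, reported) in SCORES.items():
    computed = relative_improvement(baseline, improved)
    print(f"{metric}: computed {computed:.2f}%, reported +{reported}%")
```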

Error Reduction: 47.8-55.2% across all major error categories

Statistical Significance: McNemar's χ² = 134.67 (p < 0.001), Cohen's d = 0.82
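
For readers unfamiliar with McNemar's test, the sketch below shows how the statistic is computed from paired per-example outcomes of two systems. It is an illustration under assumptions, not the project's evaluation code: the function name `mcnemar_test` is hypothetical, the uncorrected form of the statistic is used (the source does not say whether a continuity correction was applied), and the toy data is made up rather than taken from the paper.

```python
import math
from typing import Sequence, Tuple


def mcnemar_test(baseline_correct: Sequence[bool],
                 system_correct: Sequence[bool]) -> Tuple[float, float]:
    """Return (chi-square statistic, p-value) for paired binary outcomes."""
    if len(baseline_correct) != len(system_correct):
        raise ValueError("both sequences must cover the same examples")
    pairs = list(zip(baseline_correct, system_correct))
    # Only discordant pairs carry information about a difference between systems.
    b = sum(1 for base, new in pairs if base and not new)   # baseline right, system wrong
    c = sum(1 for base, new in pairs if not base and new)   # baseline wrong, system right
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of a difference
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Toy data, invented for illustration: 12 examples scored by both systems.
baseline = [True] * 3 + [False] * 9
system = [False] + [True] * 9 + [False] * 2
chi2, p = mcnemar_test(baseline, system)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```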

📚 Documentation

| Document | Description |
|---|---|
| Installation & Setup Guide | Detailed installation instructions, system requirements, and configuration |
| Usage Guide & Examples | Comprehensive usage examples, API reference, and advanced features |
| Analysis & Evaluation | Complete analysis framework, metrics, and research reproducibility |
| Changelog | Version history and release notes |

🔬 Research Applications

Perfect for:

  • Chemical Literature Mining: Extract structured reaction data from papers
  • Procedure Standardization: Convert natural language to structured formats
  • Database Curation: Automated reaction database construction
  • Educational Tools: Teaching reaction analysis and extraction
  • Research Reproducibility: Systematic evaluation of extraction methods

🤝 Community & Support

Getting Help

For Experimental Chemists

  • 🎯 One-click installations via PyPI and Conda
  • 🐳 Docker containers for consistent environments
  • 📖 User-friendly tutorials and examples
  • 🎓 Video tutorials and webinars

For Developers

  • 🔧 Extensive API documentation
  • 🧪 Comprehensive test suite
  • 🏗️ Modular architecture for easy extension
  • 📋 Contributing guidelines and code standards

🔑 Quick Examples

Basic Usage

from chemistry_llm import ChemistryReactionExtractor

# Initialize extractor
extractor = ChemistryReactionExtractor.from_pretrained("chemplusx/rxnextract-complete")

# Analyze procedure
procedure = """
Dissolve 5.0 g of benzoic acid in 100 mL of hot water.
Add 10 mL of concentrated HCl and cool the solution.
Filter the precipitated product and wash with cold water.
"""

results = extractor.analyze_procedure(procedure)
print(results['extracted_data'])
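
The examples in this document read `results['extracted_data']['reactants']` and `results['extracted_data']['conditions']`. A small helper such as the following sketch can turn that dictionary into readable lines while tolerating missing or empty fields. The helper name `summarize_extraction` is hypothetical, the value types (lists or plain strings) are an assumption, and `example_results` is a hand-written stand-in rather than real model output.

```python
from typing import Any, Dict, List


def summarize_extraction(results: Dict[str, Any]) -> List[str]:
    """Build one display line per field, without failing on absent keys."""
    extracted = results.get("extracted_data") or {}
    lines = []
    for field in ("reactants", "conditions"):
        value = extracted.get(field)
        if not value:
            lines.append(f"{field}: (none extracted)")
        elif isinstance(value, (list, tuple)):
            lines.append(f"{field}: " + "; ".join(str(item) for item in value))
        else:
            lines.append(f"{field}: {value}")
    return lines


# Hand-written stand-in for a result dictionary (assumed shape, not real output).
example_results = {
    "extracted_data": {"reactants": ["benzoic acid (5.0 g)"], "conditions": []}
}
for line in summarize_extraction(example_results):
    print(line)
```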

Command Line Interface

# Interactive mode
rxnextract --interactive

# Batch processing
rxnextract --input procedures.txt --output results.json

# Single procedure
rxnextract --procedure "Add 2g NaCl to 50mL water"
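
Batch processing can also be driven from Python. The sketch below is not the CLI's implementation: the function `run_batch` is hypothetical, and it assumes one procedure per non-empty line of the input file, which the source does not specify. The extraction step is passed in as a callable so that a failure on one procedure is recorded without stopping the run.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, List


def run_batch(input_path: str, output_path: str,
              analyze: Callable[[str], Dict[str, Any]]) -> int:
    """Analyze each non-empty line of input_path and write the records as JSON."""
    text = Path(input_path).read_text(encoding="utf-8")
    procedures = [line.strip() for line in text.splitlines() if line.strip()]

    records: List[Dict[str, Any]] = []
    for procedure in procedures:
        try:
            records.append({"procedure": procedure, "result": analyze(procedure)})
        except Exception as exc:  # keep going; record the failure for later review
            records.append({"procedure": procedure, "error": str(exc)})

    # default=str keeps the dump from failing on values that are not JSON-native.
    Path(output_path).write_text(
        json.dumps(records, indent=2, default=str), encoding="utf-8"
    )
    return len(records)
```

With an extractor created as in Basic Usage, the call would be `run_batch("procedures.txt", "results.json", extractor.analyze_procedure)`.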

Analysis & Research

from chemistry_llm.analysis import ErrorAnalyzer, AblationStudy

# NOTE: predictions, ground_truth, and test_data are not defined in this snippet;
# load them from your own evaluation set before running it.

# Error analysis
analyzer = ErrorAnalyzer()
error_results = analyzer.analyze_prediction_errors(predictions, ground_truth)

# Ablation study
ablation = AblationStudy(model_path="./model")
study_results = ablation.run_complete_study(test_data, ground_truth)

🏗️ System Requirements

| Component | Minimum | Recommended |
|---|---|---|
| Python | 3.8+ | 3.9+ |
| RAM | 8GB | 16GB+ |
| GPU Memory | 4GB | 12GB+ |
| Storage | 20GB | 50GB+ |
| CPU | 4 cores | 8+ cores |

Note: Requirements are for inference only. Fine-tuning requires additional resources.
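
Some of the minimums above can be checked with the Python standard library before installing. The following is a rough sketch rather than an official check: the function name `check_minimums` is hypothetical, "Storage" is interpreted as free disk space at the given path, and `os.cpu_count()` reports logical CPUs, which may exceed the number of physical cores. RAM and GPU memory are left out because the standard library has no portable way to query them.

```python
import os
import shutil
import sys
from typing import Dict

# Minimum values taken from the requirements table.
MIN_PYTHON = (3, 8)
MIN_CPU_CORES = 4
MIN_FREE_DISK_GB = 20


def check_minimums(path: str = ".") -> Dict[str, bool]:
    """Return a pass/fail flag for each requirement that can be checked portably."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return {
        "python": sys.version_info[:2] >= MIN_PYTHON,
        "cpu_cores": (os.cpu_count() or 0) >= MIN_CPU_CORES,
        "storage": free_gb >= MIN_FREE_DISK_GB,
    }


for name, ok in check_minimums().items():
    print(f"{name}: {'ok' if ok else 'below minimum'}")
```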

📊 Data and Software Availability

Code Repository: All code used in this study is available under the MIT License at https://github.com/chemplusx/RxNExtract. The MIT License permits use, modification, and distribution, including in commercial applications, provided the copyright and license notice are retained, making it suitable for both academic research and commercial use.

Pre-trained Models:

Package Distribution:

  • PyPI: pip install rxnextract
  • Conda-Forge: conda install -c conda-forge rxnextract
  • Docker Hub: docker pull chemplusx/rxnextract:latest

Datasets: Training and evaluation datasets are available at Zenodo DOI: 10.5281/zenodo.XXXXXX

Reproducibility: Complete analysis scripts and configuration files are provided to reproduce all results presented in the paper.

🤝 Contributing

We welcome contributions from the community! Whether you're fixing bugs, adding features, improving documentation, or sharing use cases, your help makes RxNExtract better.

Quick Contributing Guide

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes and add tests
  4. Ensure all tests pass (python -m pytest)
  5. Submit a pull request

See our Contributing Guidelines for detailed instructions.

📄 License & Citation

License: This project is licensed under the MIT License - see the LICENSE file for complete terms.

Citation: If you use RxNExtract in your research, please cite our paper:

@article{rxnextract2025,
  title={RxNExtract: A Professional-Grade System for Chemical Reaction Extraction using Fine-tuned LLMs},
  author={[Your Authors]},
  journal={[Journal Name]},
  year={2025},
  doi={[DOI]}
}

Download files

| File | Type | Size | Uploaded via |
|---|---|---|---|
| rxnextract-1.2.0.tar.gz | Source distribution | 48.4 kB | twine/6.2.0 CPython/3.12.10 |
| rxnextract-1.2.0-py3-none-any.whl | Built distribution (Python 3) | 49.9 kB | twine/6.2.0 CPython/3.12.10 |

Neither file was uploaded using Trusted Publishing.

Hashes for rxnextract-1.2.0.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | 07440514612fbcd8a1bf94caabf7e54884305db425774f08b97c3f32eab9ec5c |
| MD5 | 827c17e6c13dddbbd0fa534e23fee7bc |
| BLAKE2b-256 | cda4719dfe1f5e66d945a0ccd5c7d25f9f95f711ec06c02d93f24c66bd681630 |

Hashes for rxnextract-1.2.0-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 48a009f562026d7f2f81516c603420c3ec6a024741cbf78f3603b3add9659133 |
| MD5 | e25ef21fba3ce4cbc4315367cdb8de31 |
| BLAKE2b-256 | 5c32ac0cb1699f445f85aabcfeab9bd45ed7b0d89b98bdb7252d53ec5a3f915b |
