Professional chemistry reaction extraction using fine-tuned LLMs
RxNExtract
A professional-grade system for extracting chemical reaction information from procedure texts using a fine-tuned LLM with dynamic prompting and self-grounding.
✨ Key Features
- Advanced AI: Fine-tuned LLM with dynamic prompting and self-grounding
- Modular Architecture: Clean, maintainable codebase with separation of concerns
- Multiple Interfaces: CLI, interactive mode, batch processing, and programmatic API
- Memory Efficient: 4-bit quantization support for deployment on various hardware
- Comprehensive Analysis: Error analysis, ablation studies, statistical testing, and uncertainty quantification
- Easy Installation: One-command installation via PyPI, Conda, or Docker
🚀 Quick Start
30-Second Demo
```bash
# Install
pip install rxnextract
# Use
python -c "
from chemistry_llm import ChemistryReactionExtractor
extractor = ChemistryReactionExtractor.from_pretrained('chemplusx/rxnextract-complete')
procedure = 'Add 5g NaCl to 100mL water and stir for 30 minutes at room temperature.'
results = extractor.analyze_procedure(procedure)
print('Reactants:', results['extracted_data']['reactants'])
print('Conditions:', results['extracted_data']['conditions'])
"
```
📦 Installation Options
Option 1: PyPI (Recommended)
```bash
pip install rxnextract            # Basic installation
pip install "rxnextract[gpu]"     # GPU support (quotes keep shells like zsh from expanding the brackets)
pip install "rxnextract[full]"    # All features
```
Option 2: Conda
```bash
conda install -c conda-forge rxnextract
```
Option 3: Docker
```bash
docker pull chemplusx/rxnextract:latest
# --gpus all requires the NVIDIA Container Toolkit on the host
docker run -it --gpus all chemplusx/rxnextract:latest
```
Option 4: From Source
```bash
git clone https://github.com/chemplusx/RxNExtract.git
cd RxNExtract
pip install -e .
```
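Whichever option you choose, a quick standard-library check can confirm that the package is visible to your Python interpreter. This is a minimal sketch, assuming the distribution name is `rxnextract` (as in the `pip install` commands); the helper `installed_version` is hypothetical and not part of the package. Note that the import name used in the examples, `chemistry_llm`, differs from the distribution name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "rxnextract") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    print(f"rxnextract {v}" if v else "rxnextract is not installed in this environment")
```

For the Docker option, run the check inside the container rather than on the host.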
🎯 Performance Highlights
Our complete framework outperforms the baseline on every reported metric (improvement is relative to the baseline score):
| Metric | Baseline | RxNExtract | Relative Improvement |
|---|---|---|---|
| Complete Reaction Accuracy | 23.4% | 52.1% | +122.6% |
| Entity F1 Score | 0.674 | 0.856 | +27.0% |
| Role Classification Accuracy | 68.2% | 85.9% | +25.9% |
| Condition F1 Score | 0.421 | 0.689 | +63.7% |
- Error Reduction: 47.8–55.2% across all major error categories
- Statistical Significance: McNemar's χ² = 134.67 (p < 0.001), Cohen's d = 0.82
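The improvement column is the relative change over the baseline, (RxNExtract - Baseline) / Baseline × 100, and McNemar's χ² for a paired comparison is computed from the two discordant-pair counts, (b - c)² / (b + c). The sketch below shows both calculations. It is illustrative only, not the package's own analysis code: the discordant counts `b` and `c` are hypothetical, since the counts behind the reported χ² = 134.67 (and whether a continuity correction was applied) are not given here. Because the table inputs are themselves rounded, recomputed percentages can differ from the table in the last digit.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Percentage change of `new` relative to `baseline`."""
    return (new - baseline) / baseline * 100.0


def mcnemar_chi2(b: int, c: int, continuity: bool = False) -> float:
    """McNemar's chi-squared statistic from discordant-pair counts.

    b: items the baseline got right and the new system got wrong
    c: items the baseline got wrong and the new system got right
    continuity: apply Edwards' continuity correction, (|b - c| - 1)^2 / (b + c)
    """
    if b + c == 0:
        raise ValueError("McNemar's test needs at least one discordant pair")
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)
    return diff ** 2 / (b + c)


if __name__ == "__main__":
    # (baseline, RxNExtract) pairs copied from the performance table
    table = {
        "Complete Reaction Accuracy": (23.4, 52.1),
        "Entity F1 Score": (0.674, 0.856),
        "Role Classification Accuracy": (68.2, 85.9),
        "Condition F1 Score": (0.421, 0.689),
    }
    for metric, (base, new) in table.items():
        print(f"{metric}: +{relative_improvement(base, new):.1f}%")

    # Hypothetical discordant counts, for illustration only
    print(mcnemar_chi2(b=10, c=2), mcnemar_chi2(b=10, c=2, continuity=True))
```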
📚 Documentation
| Document | Description |
|---|---|
| Installation & Setup Guide | Detailed installation instructions, system requirements, and configuration |
| Usage Guide & Examples | Comprehensive usage examples, API reference, and advanced features |
| Analysis & Evaluation | Complete analysis framework, metrics, and research reproducibility |
| Changelog | Version history and release notes |
🔬 Research Applications
Perfect for:
- Chemical Literature Mining: Extract structured reaction data from papers
- Procedure Standardization: Convert natural language to structured formats
- Database Curation: Automated reaction database construction
- Educational Tools: Teaching reaction analysis and extraction
- Research Reproducibility: Systematic evaluation of extraction methods
🤝 Community & Support
Getting Help
- 📚 Documentation: docs.rxnextract.org
- 🐛 Bug Reports: GitHub Issues
- 💬 Discussions: GitHub Discussions
- 📧 Email: support@rxnextract.org
For Experimental Chemists
- 🎯 One-command installation via PyPI and Conda
- 🐳 Docker containers for consistent environments
- 📖 User-friendly tutorials and examples
- 🎓 Video tutorials and webinars
For Developers
- 🔧 Extensive API documentation
- 🧪 Comprehensive test suite
- 🏗️ Modular architecture for easy extension
- 📋 Contributing guidelines and code standards
🔑 Quick Examples
Basic Usage
```python
from chemistry_llm import ChemistryReactionExtractor

# Initialize extractor
extractor = ChemistryReactionExtractor.from_pretrained("chemplusx/rxnextract-complete")

# Analyze procedure
procedure = """
Dissolve 5.0 g of benzoic acid in 100 mL of hot water.
Add 10 mL of concentrated HCl and cool the solution.
Filter the precipitated product and wash with cold water.
"""
results = extractor.analyze_procedure(procedure)

# The structured extraction is stored under the 'extracted_data' key
print(results['extracted_data'])
```
Command Line Interface
```bash
# Interactive mode
rxnextract --interactive
# Batch processing
rxnextract --input procedures.txt --output results.json
# Single procedure
rxnextract --procedure "Add 2g NaCl to 50mL water"
```
Analysis & Research
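```python
from chemistry_llm.analysis import ErrorAnalyzer, AblationStudy

# `predictions`, `ground_truth`, and `test_data` must be prepared beforehand;
# see the Analysis & Evaluation guide for the expected formats.

# Error analysis
analyzer = ErrorAnalyzer()
error_results = analyzer.analyze_prediction_errors(predictions, ground_truth)

# Ablation study (model_path points to a local model directory)
ablation = AblationStudy(model_path="./model")
study_results = ablation.run_complete_study(test_data, ground_truth)
```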
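`ErrorAnalyzer` compares predictions against ground truth. As a rough illustration of the kind of entity-level scoring behind a metric like Entity F1, here is a minimal exact-match precision/recall/F1 sketch. It is a simplified stand-in, not RxNExtract's actual metric: the helper `entity_prf` is hypothetical, matches lowercased entity strings only, and ignores roles, quantities, and partial matches.

```python
from typing import Iterable, Tuple


def entity_prf(predicted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from exact (case-insensitive) entity matches."""
    pred_set = {e.strip().lower() for e in predicted}
    gold_set = {e.strip().lower() for e in gold}
    tp = len(pred_set & gold_set)  # entities found in both sets
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical prediction vs. annotation for the benzoic acid procedure
    predicted = ["benzoic acid", "HCl", "NaCl"]
    gold = ["benzoic acid", "HCl", "water"]
    p, r, f = entity_prf(predicted, gold)
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

Here two of three predicted entities are correct and two of three gold entities are found, so precision, recall, and F1 are all 2/3.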
🏗️ System Requirements
| Component | Minimum | Recommended |
|---|---|---|
| Python | 3.8+ | 3.9+ |
| RAM | 8GB | 16GB+ |
| GPU Memory | 4GB | 12GB+ |
| Storage | 20GB | 50GB+ |
| CPU | 4 cores | 8+ cores |
Note: Requirements are for inference only. Fine-tuning requires additional resources.
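The GPU memory figures tie in with the 4-bit quantization support listed under Key Features. A back-of-the-envelope estimate for the weights alone is parameters × bits per parameter ÷ 8 bytes. The sketch below assumes a hypothetical 7B-parameter model (the model size is not stated in this README) and ignores activations, the KV cache, and framework overhead, which all add to the total.

```python
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in GiB (2**30 bytes)."""
    return n_params * bits_per_param / 8 / 2**30


if __name__ == "__main__":
    n = 7e9  # hypothetical parameter count, for illustration only
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights: {weight_memory_gib(n, bits):.2f} GiB")
```

Under that assumption, 4-bit weights need roughly 3.3 GiB versus about 13 GiB at 16-bit, which is consistent with a 4 GB GPU being workable only with quantization.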
📊 Data and Software Availability
Code Repository: All code used in this study is available under the MIT License at https://github.com/chemplusx/RxNExtract. The MIT License permits use, modification, and distribution, subject only to retaining the copyright and license notice, making it suitable for both academic research and commercial applications.
Pre-trained Models:
- HuggingFace Hub: chemplusx/rxnextract-complete
- Model cards with training details, performance metrics, and usage guidelines
Package Distribution:
- PyPI: `pip install rxnextract`
- Conda-Forge: `conda install -c conda-forge rxnextract`
- Docker Hub: `docker pull chemplusx/rxnextract:latest`
Datasets: Training and evaluation datasets are available at Zenodo DOI: 10.5281/zenodo.XXXXXX
Reproducibility: Complete analysis scripts and configuration files are provided to reproduce all results presented in the paper.
🤝 Contributing
We welcome contributions from the community! Whether you're fixing bugs, adding features, improving documentation, or sharing use cases, your help makes RxNExtract better.
Quick Contributing Guide
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes and add tests
- Ensure all tests pass (`python -m pytest`)
- Submit a pull request
See our Contributing Guidelines for detailed instructions.
📄 License & Citation
License: This project is licensed under the MIT License - see the LICENSE file for complete terms.
Citation: If you use RxNExtract in your research, please cite our paper:
```bibtex
@article{rxnextract2025,
  title={RxNExtract: A Professional-Grade System for Chemical Reaction Extraction using Fine-tuned LLMs},
  author={[Your Authors]},
  journal={[Journal Name]},
  year={2025},
  doi={[DOI]}
}
```
🔗 Links
- Homepage: https://github.com/chemplusx/RxNExtract
- Documentation: https://docs.rxnextract.org
- PyPI Package: https://pypi.org/project/rxnextract/
- Docker Images: https://hub.docker.com/r/chemplusx/rxnextract
- HuggingFace Models: https://huggingface.co/chemplusx/rxnextract-complete
- Paper: [Link to published paper]
File details
Details for the file rxnextract-1.2.5.tar.gz.
File metadata
- Download URL: rxnextract-1.2.5.tar.gz
- Upload date:
- Size: 48.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a78252090f49b34434a0938d65adb5bf926da8cee1af0fda77dce2ce6b483fe6 |
| MD5 | c4e14ae7587ea00948e1bce41f6224dc |
| BLAKE2b-256 | 809bc92a784567a4fa1c32d9edb7a7101436f21ea80dd85eab434e699575e1e1 |
File details
Details for the file rxnextract-1.2.5-py3-none-any.whl.
File metadata
- Download URL: rxnextract-1.2.5-py3-none-any.whl
- Upload date:
- Size: 50.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4dc345c10e2281c66c185b832e178b31756951a8471b654d15733be13d6fd476 |
| MD5 | 85d75ed710cd15a57bb6b61defc9bd5a |
| BLAKE2b-256 | 87f115431d7e8b17c20db99355a6da50caf93b734b94eef4d8ca2e522fba98e9 |