A comprehensive toolkit for fine-tuning medical large language models with RAG capabilities
Project description
Medical LLM Fine-tuning with RAG System
A comprehensive PyPI package for fine-tuning large language models on medical literature with entity relationship extraction capabilities, featuring Qwen3-4B-Thinking model integration and advanced RAG-Anything multimodal document processing.
๐ฆ Quick Start
# Install the package
pip install medllm-finetune-rag[all]
# Manual Unsloth installation (for 2x faster training)
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
# Quick training
medllm train medical_data.json
# Or use Python API
python -c "from medllm import quick_train; quick_train('medical_data.json')"
๐ Features
๐ค Advanced Model Support
- Qwen3-4B-Thinking Integration: State-of-the-art reasoning model with thinking capabilities
- Unsloth Acceleration: 2x faster training with reduced VRAM usage
- Thinking Mode: Support for reasoning-based inference with
<think></think>tags - Multiple Training Formats: Instruction fine-tuning and NER training formats
๐ RAG-Anything Integration
- Multimodal Document Processing: PDF, DOCX, images, tables, equations
- Advanced Parsers: MinerU and Docling for robust document extraction
- LightRAG Knowledge Graph: Graph-based retrieval for enhanced context
- Multiple Search Modes: Naive, local, global, and hybrid retrieval strategies
๐ฅ Medical Specialization
- Medical Entity Extraction: Specialized for Bacteria, Disease, Evidence entities
- Relationship Extraction: Complex medical relationship understanding
- Domain-Specific Processing: Optimized for medical literature analysis
- Evaluation Metrics: Built-in medical NER evaluation tools
โก Platform Optimization
- Apple Silicon Support: Optimized for Mac M2/M3 with MPS backend
- CUDA Acceleration: Full GPU acceleration for NVIDIA cards
- CPU Fallback: Reliable CPU-only training option
- Universal Configuration: Single config system for all platforms
๐๏ธ Project Structure
medllm-finetune-rag/
โโโ core/ # Core training modules
โ โโโ medical_llm_trainer.py # Main training class with Unsloth integration
โ โโโ data_processing.py # Data preprocessing and format conversion
โ โโโ medical_rag_system.py # RAG-Anything system implementation
โ โโโ evaluation_metrics.py # Model evaluation tools
โ โโโ huggingface_uploader.py # HuggingFace model upload utility
โโโ config/ # Configuration files
โ โโโ config_mac_m2.yaml # Mac M2/M3 optimized configuration
โ โโโ config_cuda.yaml # CUDA GPU configuration
โ โโโ config_cpu.yaml # CPU-only configuration
โโโ scripts/ # Utility scripts
โ โโโ setup_environment.py # Environment setup and verification
โ โโโ run_training_pipeline.py # Legacy training pipeline runner
โ โโโ quick_start.sh # Quick setup script
โ โโโ setup_qwen3.sh # Qwen3 specific environment setup
โโโ docs/ # Documentation
โ โโโ UNIVERSAL_TRAINER_GUIDE.md # Universal trainer documentation
โ โโโ README_QWEN3.md # Qwen3 integration guide
โ โโโ README_MAC_M2.md # Mac M2 setup guide
โโโ examples/ # Example scripts and demos
โ โโโ medical_rag_demo.py # Comprehensive RAG-Anything demo
โ โโโ quick_rag_test.py # Quick RAG functionality test
โ โโโ qwen3_example.py # Qwen3 model example
โ โโโ run_mac_m2.py # Mac M2 optimized runner
โโโ universal_trainer.py # Universal training script
โโโ run.py # Simple command wrapper
โโโ requirements.txt # Python dependencies
โโโ .env # Environment variables (API keys)
โโโ README.md # This file
๐ง Installation
Prerequisites
- Python 3.9-3.13
- PyTorch with appropriate backend (CUDA/MPS/CPU)
- Git and GitHub CLI (optional, for repository management)
- OpenAI API key (optional, for RAG-Anything functionality)
Quick Setup
- Clone the repository:
git clone <repository-url>
cd medllm-finetune-rag
2. **Automated setup** (recommended):
```bash
chmod +x scripts/setup_qwen3.sh
./scripts/setup_qwen3.sh
This will automatically:
- Create and activate a virtual environment
- Install all required dependencies including RAG-Anything
- Set up platform-specific optimizations (Mac M2/CUDA/CPU)
- Verify the installation
-
Configure API keys (optional, for RAG functionality):
# Create .env file with your API keys echo "OPENAI_API_KEY=your_openai_api_key_here" > .env echo "HF_TOKEN=your_huggingface_token_here" >> .env
-
Quick test:
# Test basic functionality python examples/quick_rag_test.py # Test model inference python run.py test
Manual Installation
# Core dependencies
pip install torch torchvision torchaudio
pip install transformers>=4.36.0
pip install datasets>=2.14.0
pip install peft>=0.6.0
pip install bitsandbytes>=0.41.0
# Unsloth for efficient training (if compatible)
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install unsloth_zoo
pip install trl>=0.7.0
# Additional utilities
pip install wandb
pip install scikit-learn
pip install numpy pandas
pip install tqdm
pip install PyYAML
๐ Quick Start
1. Universal Trainer (Recommended)
The universal trainer provides a unified interface for all training operations:
# Quick inference test
python run.py test
# Training with different configurations
python run.py train # Mac M2 optimized
python run.py cuda # CUDA GPU training
python run.py cpu # CPU-only training
# Training with HuggingFace upload
python run.py train-upload
# Interactive mode
python run.py interactive
# Full pipeline with upload
python run.py full-upload
2. RAG-Anything Demo
# Test RAG functionality
python examples/quick_rag_test.py
# Run comprehensive RAG demo
python examples/medical_rag_demo.py
3. Configuration-Based Training
# Use specific configuration files
python universal_trainer.py --config config/config_mac_m2.yaml --mode train
python universal_trainer.py --config config/config_cuda.yaml --mode inference
python universal_trainer.py --config config/config_cpu.yaml --mode eval
4. Medical Training Pipeline
For research and development with full control over each stage:
# Complete medical pipeline (all stages)
python scripts/run_training_pipeline.py --stage all
# Individual stages
python scripts/run_training_pipeline.py --stage 1 # Data preprocessing
python scripts/run_training_pipeline.py --stage 2 # Model training
python scripts/run_training_pipeline.py --stage 3 # RAG system building
python scripts/run_training_pipeline.py --stage 4 # Evaluation
python scripts/run_training_pipeline.py --stage 5 # Demo inference
# Custom parameters
python scripts/run_training_pipeline.py --stage 2 --model Qwen/Qwen3-4B-Thinking-2507 --epochs 5
5. Direct Training API
from core.medical_llm_trainer import MedicalLLMTrainer
# Initialize trainer with Qwen3-4B-Thinking
trainer = MedicalLLMTrainer(
model_name="Qwen/Qwen3-4B-Thinking-2507",
use_unsloth=True,
use_qlora=True,
max_seq_length=2048
)
# Train on your data
trainer.train("path/to/your/training_data.json")
# Inference with thinking mode
result = trainer.inference(
"Hepatitis C virus causes chronic liver infection.",
enable_thinking=True
)
print(result)
๐ ๏ธ Training Tools Comparison
This project provides multiple ways to train and use medical LLMs, each optimized for different use cases:
๐ Tools Overview
| Tool | Purpose | Best For | Complexity |
|---|---|---|---|
scripts/run.py |
Simple command wrapper | Quick tasks, daily use | โญ Low |
scripts/run_training_pipeline.py |
Complete medical pipeline | Research, development | โญโญโญ High |
universal_trainer.py |
Configuration-driven trainer | Production, customization | โญโญ Medium |
| Direct API | Python integration | Custom applications | โญโญ Medium |
๐ run.py - Quick & Easy Commands
Perfect for: Daily use, quick testing, simple training
# Quick commands - just works!
python run.py test # Quick inference test
python run.py train # Full training (Mac M2 optimized)
python run.py train-upload # Train + upload to HuggingFace
python run.py cuda # CUDA optimized inference
python run.py interactive # Interactive chat mode
python run.py full-upload # Complete pipeline + upload
How it works: Simple wrapper that translates commands to universal_trainer.py calls
- 113 lines of code
- No dependencies beyond subprocess
- Instant gratification
๐ฅ run_training_pipeline.py - Medical Research Pipeline
Perfect for: Medical NLP research, stage-by-stage development, detailed analysis
# Stage-based execution with full control
python scripts/run_training_pipeline.py --stage all # Complete pipeline
python scripts/run_training_pipeline.py --stage 1 # Data preprocessing only
python scripts/run_training_pipeline.py --stage 2 --epochs 10 # Custom training
python scripts/run_training_pipeline.py --stage 3 # RAG system building
python scripts/run_training_pipeline.py --stage 4 # Evaluation with metrics
python scripts/run_training_pipeline.py --stage 5 # Demo inference
# Advanced options
python scripts/run_training_pipeline.py --stage 2 \
--model Qwen/Qwen3-4B-Thinking-2507 \
--device cuda \
--batch_size 4 \
--lr 3e-4
Pipeline Stages:
- Data Preprocessing: Load, clean, augment medical data
- Model Training: Fine-tune Qwen3-4B-Thinking for medical tasks
- RAG System Building: Create RAG-Anything enhanced retrieval system
- Evaluation: Comprehensive metrics and performance analysis
- Demo Inference: Multi-mode RAG inference demonstration
Features:
- 673 lines of comprehensive functionality
- Async RAG-Anything integration
- Detailed error handling and recovery
- Medical-specific evaluation metrics
- Stage-by-stage execution control
โ๏ธ universal_trainer.py - Configuration-Driven
Perfect for: Production deployments, custom configurations, reproducible experiments
# Configuration-based training
python universal_trainer.py --config config/config_mac_m2.yaml --mode train
python universal_trainer.py --config config/config_cuda.yaml --mode inference
python universal_trainer.py --config config/config_cpu.yaml --mode eval
# Custom configurations
python universal_trainer.py --config my_custom_config.yaml --mode full --upload-to-hf
Features:
- YAML-based configuration
- Platform optimization (Mac M2/CUDA/CPU)
- HuggingFace integration
- RAG-Anything support
๐ฏ When to Use Which Tool?
| Scenario | Recommended Tool | Command Example |
|---|---|---|
| Quick test | run.py |
python run.py test |
| Daily training | run.py |
python run.py train |
| Research experiment | run_training_pipeline.py |
python scripts/run_training_pipeline.py --stage all |
| Custom evaluation | run_training_pipeline.py |
python scripts/run_training_pipeline.py --stage 4 |
| RAG development | run_training_pipeline.py |
python scripts/run_training_pipeline.py --stage 3 |
| Production deployment | universal_trainer.py |
python universal_trainer.py --config prod_config.yaml |
| Custom integration | Direct API | See Python examples below |
2. Data Processing
from core.data_processing import MedicalDataProcessor
# Process raw medical data
processor = MedicalDataProcessor("raw_data.json")
processor.load_data()
processor.save_processed_data("processed_data/")
3. Using the Mock Trainer (for development)
from examples.english_stable_solution import MockMedicalTrainer
# Use when network issues prevent model download
trainer = MockMedicalTrainer()
result = trainer.inference(
"Streptococcus pneumoniae causes pneumonia.",
enable_thinking=True
)
๐ Model Configuration
Supported Models
- Qwen3-4B-Thinking-2507 (Recommended): Advanced reasoning capabilities
- Qwen2.5-7B-Instruct: General instruction following
- Custom models: Compatible with HuggingFace transformers
Training Parameters
model:
name: "Qwen/Qwen3-4B-Thinking-2507"
use_unsloth: true
use_qlora: true
torch_dtype: "bfloat16"
max_seq_length: 2048
training:
batch_size: 4
gradient_accumulation_steps: 4
learning_rate: 2.0e-4
num_train_epochs: 3
warmup_steps: 10
save_steps: 100
๐ง Entity Types and Relations
Entity Types
- Bacteria: Bacteria, viruses, and other pathogens
- Disease: Diseases, symptoms, and pathological conditions
- Evidence: Research evidence, conclusions, and findings
Relationship Types
- is_a: Hierarchical relationship
- biomarker_for: Biomarker relationship
- correlated_with: Correlation relationship
- has_relationship: General relationship
๐ Mac M2/M3 Support
Special optimizations for Apple Silicon:
- MPS Backend: Automatic detection and usage of Metal Performance Shaders
- Virtual Environment: Automatic setup to avoid system conflicts
- Fallback Mechanisms: Graceful degradation when Unsloth is incompatible
- Memory Optimization: Efficient memory usage for limited RAM
See README_MAC_M2.md for detailed setup instructions.
๐ Thinking Mode
The Qwen3-4B-Thinking model supports reasoning mode with <think></think> tags:
# Enable thinking mode for complex reasoning
result = trainer.inference(
"Analyze the relationship between H. pylori and gastric cancer.",
enable_thinking=True
)
# Output includes reasoning process:
# <think>
# I need to analyze the relationship between H. pylori and gastric cancer...
# </think>
#
# Final analysis in JSON format...
๐ Evaluation
Built-in evaluation metrics:
- Entity Recognition: Precision, Recall, F1-score
- Relation Extraction: Accuracy and relationship-specific metrics
- Medical Accuracy: Domain-specific evaluation criteria
from core.evaluation_metrics import MedicalEvaluator
evaluator = MedicalEvaluator()
results = evaluator.evaluate_model(trainer, test_data)
print(f"F1 Score: {results['f1_score']:.4f}")
๐ ๏ธ Troubleshooting
Common Issues
-
Network Download Errors:
- Use the mock trainer for development:
examples/english_stable_solution.py - Try different networks or download during off-peak hours
- Use
examples/download_step_by_step.pyfor manual downloads
- Use the mock trainer for development:
-
Mac M2 Compatibility:
- Use virtual environment to avoid system conflicts
- Fallback to standard transformers if Unsloth fails
- Check
docs/README_MAC_M2.mdfor specific solutions
-
Memory Issues:
- Reduce batch size and max sequence length
- Enable gradient checkpointing
- Use QLoRA for memory-efficient fine-tuning
Environment Issues
# Reset virtual environment
rm -rf venv
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
๐ Quick Reference
๐ Most Common Commands
# For everyday use (recommended)
python run.py test # Quick test
python run.py train # Train model
python run.py interactive # Chat mode
# For research and development
python scripts/run_training_pipeline.py --stage all # Full pipeline
python scripts/run_training_pipeline.py --stage 4 # Evaluation only
# For RAG functionality
python examples/quick_rag_test.py # Test RAG
python examples/medical_rag_demo.py # Full RAG demo
# For setup and configuration
python scripts/setup_environment.py # Auto setup
python scripts/setup_environment.py --mode install # Install deps only
๐ง File Structure Quick Guide
medllm-finetune-rag/
โโโ scripts/run.py # ๐ Start here - simple commands
โโโ scripts/run_training_pipeline.py # ๐ Research pipeline
โโโ universal_trainer.py # Configuration-driven trainer
โโโ examples/ # Demo scripts and examples
โโโ config/ # Platform-specific configs
โโโ core/ # Core modules (advanced use)
๐ก Troubleshooting Quick Fixes
# Dependencies missing?
python scripts/setup_environment.py --mode install
# Import errors?
pip install -r requirements.txt
# Mac M2 issues?
python scripts/setup_qwen3.sh
# Want to start fresh?
rm -rf venv && python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
๐ค Contributing
- Fork the repository
- Create a feature branch:
git checkout -b feature-name - Commit changes:
git commit -am 'Add feature' - Push to branch:
git push origin feature-name - Submit a Pull Request
๐ License
This project is licensed under the MIT License - see the LICENSE file for details.
๐ Acknowledgments
- Unsloth for efficient LLM training
- Qwen Team for the Qwen3-4B-Thinking model
- HuggingFace for the transformers ecosystem
- Medical research community for domain expertise
๐ Support
- ๐ง Issues: Use GitHub Issues for bug reports and feature requests
- ๐ Documentation: Check the
docs/directory for detailed guides - ๐ฌ Discussions: Use GitHub Discussions for questions and community support
Note: This project is designed for research and educational purposes. Ensure compliance with relevant medical data regulations and ethical guidelines when working with medical literature and patient data.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file medllm_finetune_rag-0.1.2.tar.gz.
File metadata
- Download URL: medllm_finetune_rag-0.1.2.tar.gz
- Upload date:
- Size: 110.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.6
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
6bb1baed65ebff27d69bfb611257a0600cc088377cf28f17d2323b24dd7bd9b6
|
|
| MD5 |
1178584b9dfcc29019c31f49f8c553cb
|
|
| BLAKE2b-256 |
567794bb9c39bb15acb6ea856b781028b8a6013a005d03b3d63b555a6738d138
|
File details
Details for the file medllm_finetune_rag-0.1.2-py3-none-any.whl.
File metadata
- Download URL: medllm_finetune_rag-0.1.2-py3-none-any.whl
- Upload date:
- Size: 98.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.6
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e0da0b0f80d13ac2bba34104542e4719da408651125158ca53ab6f971a50867b
|
|
| MD5 |
ed8dab5a1e3c4a5420fe31cd00480a8c
|
|
| BLAKE2b-256 |
05f9c9cf8c3ba4a9777f3f05dae6e912500e75180194ed4eb9df6dfe1662baff
|