Convert HuggingFace Trainer notebooks to HTML with preserved training data
Project description
🤗 HuggingFace Trainer Notebook Converter
A specialized Python package for converting Jupyter notebooks that use the HuggingFace Trainer API to HTML while preserving and enhancing training progress visualization.
🧹 Clean Conversion Process
This package provides a clean, direct conversion with no intermediate files left behind:
- ✅ Direct Conversion: Converts directly from .ipynb to HTML in one step
- ✅ No Residual Files: All temporary files are automatically cleaned up
- ✅ Efficient Processing: Uses memory for intermediate steps where possible
- ✅ PyPI Ready: Designed for simple installation via pip
🌐 Universal Converter Update
The converter works with a wide variety of HuggingFace Trainer API usage patterns:
- ✅ Multiple Model Types: Automatically detects and enhances different model architectures (SequenceClassification, TokenClassification, QuestionAnswering, etc.)
- ✅ Diverse Metrics: Identifies and visualizes various evaluation metrics such as accuracy, F1, precision, and recall (see the sketch after this list)
- ✅ Flexible Training Patterns: Works with different training approaches and visualization styles
- ✅ Enhanced Analysis: Generates detailed reports about training configurations and results
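As an illustration of the metric handling mentioned above, here is a minimal sketch that lists whichever evaluation metrics appear in trainer.state.log_history and plots them. It assumes a trainer object using the standard HuggingFace log format; it is not the package's internal implementation:
# Illustrative sketch only: assumes a trainer whose state.log_history follows
# the standard HuggingFace format (a list of dicts, one entry per logging event).
import matplotlib.pyplot as plt

logs = getattr(trainer.state, 'log_history', [])

# Collect every eval_* key that appears in the logs (e.g. eval_accuracy, eval_f1).
metric_names = sorted({key for log in logs for key in log if key.startswith('eval_')})
print(f"Detected evaluation metrics: {metric_names}")

# Plot each detected metric against the step at which it was logged.
for name in metric_names:
    points = [(log['step'], log[name]) for log in logs if name in log and 'step' in log]
    if points:
        steps, values = zip(*points)
        plt.plot(steps, values, marker='o', label=name)

if metric_names:
    plt.xlabel('Training Steps')
    plt.ylabel('Metric value')
    plt.legend()
    plt.show()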
🚀 Quick Start
Installation
pip install hf-trainer-nbconvert
Usage
Command Line
# Convert a notebook to HTML (no residual files)
hf-trainer-nbconvert your_notebook.ipynb
# Specify output file
hf-trainer-nbconvert your_notebook.ipynb -o output.html
Python API
# Simple, direct API (recommended)
from hf_trainer_nbconvert import convert_notebook_to_html
# One-line conversion with automatic cleanup
html_path = convert_notebook_to_html("your_notebook.ipynb")
print(f"HTML saved to: {html_path}")
# Or with custom output path
html_path = convert_notebook_to_html(
"your_notebook.ipynb",
output_html_path="custom_output.html"
)
🧪 Test Models
The test_models directory contains sample notebooks that demonstrate different usages of the HuggingFace Trainer API:
- Text Classification: Simple sentiment analysis with BERT
- Named Entity Recognition: Token classification with CoNLL-2003 dataset
Use these to test the universality of the converter:
python test_universal_converter.py -a -v
🎯 Features
- Smart Trainer Detection: Automatically identifies cells containing Hugging Face Trainer API usage
- Enhanced Training Visualization: Adds improved progress tracking and loss visualization
- Multiple Output Formats: Convert to Python, HTML, Markdown with enhanced formatting
- Training Analysis: Comprehensive analysis of Trainer usage patterns
- Progress Enhancement: Automatically enhances training cells with better progress tracking
- Custom Styling: Adds custom CSS for HTML output to highlight training components
🚀 Installation (from source)
- Clone or download this repository
- Install the required dependencies:
pip install -r requirements.txt
📁 Files
- hf_trainer_nbconvert.py - Main converter module
- example_usage.py - Usage examples and demonstrations
- requirements.txt - Required dependencies
- README.md - This documentation
🔧 Usage
Python API
from hf_trainer_nbconvert import HuggingFaceTrainerConverter
# Initialize converter
converter = HuggingFaceTrainerConverter("your_notebook.ipynb")
# Convert to Python with enhancements
python_code = converter.convert_to_python("enhanced_script.py")
# Convert to HTML with custom styling
html_content = converter.convert_to_html("enhanced_notebook.html")
# Convert to Markdown
markdown_content = converter.convert_to_markdown("enhanced_docs.md")
# Analyze Trainer usage
analysis = converter.analyze_trainer_usage()
print(f"Found {len(analysis['trainer_cells'])} Trainer cells")
# Generate training report
report = converter.generate_training_report()
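The sample report shown later in this page suggests that generate_training_report() returns the report as Markdown text; assuming that, saving it is a one-liner (hypothetical usage, the output filename is arbitrary):
# Assumes the report is returned as a Markdown string, as the sample report
# and the CLI's --report -o report.md option suggest.
from pathlib import Path

Path("training_report.md").write_text(report, encoding="utf-8")
print("Report written to training_report.md")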
Command Line Interface
# Basic conversion to Python
python hf_trainer_nbconvert.py notebook.ipynb -f python
# Convert to HTML with custom output path
python hf_trainer_nbconvert.py notebook.ipynb -f html -o output.html
# Convert to all formats
python hf_trainer_nbconvert.py notebook.ipynb -f all
# Generate analysis report
python hf_trainer_nbconvert.py notebook.ipynb --analyze
# Generate training report
python hf_trainer_nbconvert.py notebook.ipynb --report -o report.md
# Convert without enhancements (standard nbconvert)
python hf_trainer_nbconvert.py notebook.ipynb --no-enhance
🎨 Enhanced Features
1. Training Progress Visualization
The converter automatically enhances training cells with:
# Enhanced training visualization
if hasattr(trainer.state, 'log_history'):
    logs = trainer.state.log_history
    if logs:
        steps = [log.get('step', 0) for log in logs if 'loss' in log]
        losses = [log.get('loss', 0) for log in logs if 'loss' in log]
        if steps and losses:
            plt.figure(figsize=(10, 6))
            plt.plot(steps, losses, 'b-', linewidth=2, label='Training Loss')
            plt.xlabel('Training Steps')
            plt.ylabel('Loss')
            plt.title('Training Progress - Loss Over Time')
            plt.legend()
            plt.grid(True, alpha=0.3)
            plt.show()
2. Training Documentation
Trainer instantiation cells are enhanced with detailed documentation:
# Enhanced Training Configuration
# This cell initializes the Hugging Face Trainer with the following key components:
# - Model: The transformer model to be fine-tuned
# - Training Arguments: Configuration for training hyperparameters
# - Dataset: Training and evaluation datasets
# - Tokenizer: For text preprocessing
3. Custom HTML Styling
HTML output includes custom CSS for better visualization:
.trainer-cell {
    border-left: 4px solid #ff6b6b;
    padding-left: 10px;
    background-color: #fff5f5;
}

.training-progress {
    border-left: 4px solid #4ecdc4;
    padding-left: 10px;
    background-color: #f0fffe;
}
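How this CSS reaches the HTML is an internal detail of the package. Purely as an illustrative sketch (not the package's actual code), custom CSS can be injected into the output of nbconvert's standard HTMLExporter like this:
# Illustrative sketch: append custom CSS to standard nbconvert HTML output.
# Not necessarily how hf-trainer-nbconvert does it internally.
import nbformat
from nbconvert import HTMLExporter

CUSTOM_CSS = """
.trainer-cell { border-left: 4px solid #ff6b6b; padding-left: 10px; background-color: #fff5f5; }
.training-progress { border-left: 4px solid #4ecdc4; padding-left: 10px; background-color: #f0fffe; }
"""

nb = nbformat.read("your_notebook.ipynb", as_version=4)
body, _resources = HTMLExporter().from_notebook_node(nb)

# Insert the style block just before </head> so it sits alongside the default theme.
body = body.replace("</head>", f"<style>{CUSTOM_CSS}</style></head>", 1)

with open("enhanced_notebook.html", "w", encoding="utf-8") as f:
    f.write(body)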
📊 Analysis Features
Trainer Usage Analysis
The converter can analyze your notebook and provide insights:
analysis = converter.analyze_trainer_usage()
# Returns:
{
    'trainer_cells': [6],        # Cells with Trainer instantiation
    'training_cells': [7],       # Cells with training execution
    'visualization_cells': [8],  # Cells with training visualization
    'imports': [...],            # Detected imports
    'models_used': [...],        # Models found in the notebook
    'datasets_used': [...]       # Datasets detected
}
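The returned cell indices can be used directly; for example, a small usage sketch that prints the source of every detected Trainer cell (assuming the indices refer to positions in the notebook's cell list):
# Usage sketch: inspect the cells flagged by analyze_trainer_usage().
# Assumes the reported indices are positions in nb.cells.
import nbformat

nb = nbformat.read("your_notebook.ipynb", as_version=4)
for idx in analysis['trainer_cells']:
    print(f"--- Trainer cell {idx} ---")
    print(nb.cells[idx].source)

print("Models used:", analysis['models_used'])
print("Datasets used:", analysis['datasets_used'])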
Training Report Generation
Generate comprehensive reports about your training setup:
# Hugging Face Trainer Analysis Report
**Notebook:** example_notebook.ipynb
## Summary
- Total cells: 15
- Code cells: 12
- Trainer-related cells: 3
- Training execution cells: 1
- Visualization cells: 2
## Trainer API Usage
- Trainer instantiation found in cells: [6]
- Training execution found in cells: [7]
- Training visualization found in cells: [8, 10]
## Models Detected
- BertForQuestionAnswering (Cell 6)
## Recommendations
- Consider adding more comprehensive evaluation metrics
- Add early stopping callback for better training control
🔍 Detected Patterns
The converter automatically detects:
- Trainer Instantiation: Trainer(...), trainer = Trainer(...)
- Training Execution: trainer.train(), training_log = trainer.train()
- Visualization: plt.plot(...loss...), matplotlib...training
- Model Usage: BertForQuestionAnswering, GPT2LMHeadModel, etc.
- Import Statements: from transformers import ...
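Conceptually, this detection amounts to pattern matching over each code cell's source. The following is only an illustrative sketch of that idea; the package's actual rules may differ:
# Illustrative sketch of pattern-based detection over a cell's source code.
# The real package may use different or more elaborate rules.
import re

PATTERNS = {
    'trainer_instantiation': re.compile(r"\bTrainer\s*\("),
    'training_execution': re.compile(r"\btrainer\.train\s*\("),
    'visualization': re.compile(r"plt\.plot\(.*loss", re.IGNORECASE),
    'model_usage': re.compile(r"\b\w+For(SequenceClassification|TokenClassification|QuestionAnswering)\b|\bGPT2LMHeadModel\b"),
    'transformers_import': re.compile(r"from\s+transformers\s+import"),
}

def classify_cell(source: str) -> list:
    """Return the names of every pattern that matches the cell's source."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(source)]

print(classify_cell("trainer = Trainer(model=model, args=training_args)"))
# -> ['trainer_instantiation']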
🎯 Use Cases
- Converting Training Notebooks: Transform research notebooks into production-ready scripts
- Documentation Generation: Create comprehensive HTML/Markdown documentation
- Training Analysis: Analyze and optimize your training setup
- Progress Visualization: Enhance training progress tracking
- Code Sharing: Generate clean Python scripts from experimental notebooks
🛠️ Customization
You can extend the converter by:
- Adding Custom Preprocessors: Create new pattern detection logic
- Custom Templates: Modify output templates for different formats
- Enhanced Analysis: Add more sophisticated training analysis
- Custom Styling: Modify CSS for HTML output
Example custom preprocessor:
class CustomTrainerPreprocessor(HuggingFaceTrainerPreprocessor):
    def _is_custom_pattern(self, source: str) -> bool:
        # Add your custom detection logic
        return 'your_pattern' in source

    def _enhance_custom_cell(self, cell, index: int):
        # Add your custom enhancement
        return cell
🤝 Contributing
Feel free to contribute by:
- Reporting issues
- Suggesting new features
- Submitting pull requests
- Adding support for more training frameworks
- Improving visualization capabilities
- Adding new output formats
- Enhancing analysis features
- Improving documentation
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
👨‍💻 Author
K N S Sri Harshith
Email: knssriharshith@gmail.com
🔗 Related Tools
- nbconvert - The underlying conversion library
- Hugging Face Transformers - The ML library this tool specializes in
- Jupyter - The notebook environment
📞 Support
If you encounter issues or have suggestions:
- Check the analysis output for insights
- Use the --analyze flag to understand your notebook structure
- Try the --no-enhance flag if you encounter compatibility issues
- Review the generated training report for optimization suggestions
Happy Training! 🚀
Download files
File details
Details for the file hf_trainer_nbconvert-2.0.0.tar.gz.

File metadata
- Download URL: hf_trainer_nbconvert-2.0.0.tar.gz
- Upload date:
- Size: 22.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.11

File hashes
Algorithm | Hash digest
---|---
SHA256 | 59cdd121930af63ec7bdffe04af102002722b50942642a8d76c1272db8a4bdc3
MD5 | fb353d0e27a3c8d64ffbafaafb9e8711
BLAKE2b-256 | 38d2f5cd9993872951aff0120f7b6f10674e8c625e3939c16489b065e3107e6f
File details
Details for the file hf_trainer_nbconvert-2.0.0-py3-none-any.whl.

File metadata
- Download URL: hf_trainer_nbconvert-2.0.0-py3-none-any.whl
- Upload date:
- Size: 18.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.11

File hashes
Algorithm | Hash digest
---|---
SHA256 | 22780b3794f6d8bdacb67f278bccbb7c17a9d8f7246b28d8f0aa6b31ff048e4c
MD5 | 37b263f35e319879d51857ea69e95c59
BLAKE2b-256 | 3b33fad08141c23eb3e5dac8b51d80fc2c636e70faa5455bc0f9c8c2d9f1ae11