Zettelkasten-inspired ML model catalog with intelligent knowledge management
Athelas: Zettelkasten-Inspired ML Model Catalog
Athelas is a comprehensive machine learning repository organized according to Zettelkasten knowledge management principles. It provides a unified catalog of ML models, data processing components, and intelligent knowledge management tools designed to facilitate model discovery, comparison, and innovation.
Overview
Athelas turns a traditional ML repository into a living knowledge system, built on a dual-layer architecture and maintained by intelligent agents:
- Implementation Notes: Atomic, reusable ML components (models, processors, utilities)
- Literature Notes: Contextual documentation that connects and explains components
- Intelligent Agents: Knowledge orchestrator and retriever for automated knowledge management
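A minimal sketch of the two layers as plain data, assuming an implementation note is identified by a component ID and a literature note records the IDs it links to. `ImplementationNote`, `LiteratureNote`, `backlinks`, and the file path are hypothetical names chosen for illustration, not Athelas's API. The backlink lookup mirrors the Zettelkasten habit of asking which notes point at a given component.

```python
from dataclasses import dataclass, field


@dataclass
class ImplementationNote:
    """An atomic, executable component (model, processor, or utility)."""
    component_id: str
    module_path: str  # hypothetical location of the code


@dataclass
class LiteratureNote:
    """A contextual document that explains and links implementation notes."""
    title: str
    links: list[str] = field(default_factory=list)  # component IDs it discusses


def backlinks(component_id: str, notes: list[LiteratureNote]) -> list[str]:
    """Return the titles of literature notes that link to the given component."""
    return [note.title for note in notes if component_id in note.links]


# Hypothetical example data for illustration only.
bert = ImplementationNote("bert_classifier", "src/models/lightning/bert_classifier.py")
notes = [
    LiteratureNote("Text classification overview", ["bert_classifier", "text_cnn"]),
    LiteratureNote("Gradient boosting for tabular data", ["xgboost", "lightgbm"]),
]
print(backlinks(bert.component_id, notes))  # ['Text classification overview']
```

Keeping links on the literature side keeps implementation notes atomic: code never needs to know which documents describe it.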
Key Features
Intelligent Knowledge Management
- Knowledge Orchestrator: Automatically maintains connections between components, generates documentation, and validates relationships
- Knowledge Retriever: Enables semantic search and discovery through RAG-based interfaces and knowledge graph exploration
- Dual-Layer Architecture: Combines executable code with rich contextual documentation
Explicit Component Connectivity
- Connection registries document relationships between models and processors
- Cross-reference metadata enables discovery of compatible components
- Knowledge graph visualization of component relationships
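One way to picture a connection registry and the knowledge graph built from it: store typed (source, relation, target) edges, build an adjacency map, and walk it breadth-first to find related components. The registry contents and the function names below are hypothetical; Athelas's actual registry format is not specified here.

```python
from collections import deque

# Hypothetical registry entries: (source, relation, target) triples.
REGISTRY = [
    ("bert_classifier", "requires", "bert_tokenize_processor"),
    ("bert_classifier", "alternative", "text_cnn"),
    ("text_cnn", "alternative", "lstm"),
    ("multimodal_bert", "requires", "bert_tokenize_processor"),
    ("xgboost", "alternative", "lightgbm"),
]


def build_graph(registry):
    """Turn (source, relation, target) triples into an undirected adjacency map."""
    graph = {}
    for source, relation, target in registry:
        graph.setdefault(source, []).append((relation, target))
        graph.setdefault(target, []).append((relation, source))
    return graph


def related_within(graph, start, max_hops):
    """Breadth-first search: components reachable from `start` in at most max_hops edges."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for _relation, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return {node: hops for node, hops in seen.items() if node != start}


graph = build_graph(REGISTRY)
print(related_within(graph, "bert_classifier", max_hops=1))
# {'bert_tokenize_processor': 1, 'text_cnn': 1}
```

Treating edges as undirected lets a processor surface every model that depends on it; a directed graph is the better fit if you only want downstream dependencies.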
Multi-Framework Support
- PyTorch Lightning: Advanced neural network implementations
- PyTorch: Native PyTorch models and components
- XGBoost & LightGBM: Gradient boosting models
- Reinforcement Learning: Actor-critic and bandit algorithms
- AWS Bedrock: Cloud-based model integrations
Comprehensive Processing Pipeline
- Text Processing: BERT tokenization, Gensim processing, text augmentation
- Tabular Processing: Numerical imputation, categorical encoding, feature engineering
- Image Processing: Computer vision preprocessing and augmentation
- Multimodal Processing: Cross-modal fusion and attention mechanisms
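The Quick Start calls a processor directly on its input (`tokenizer("...")`), which suggests processors behave as callables. Assuming that holds, a pipeline is simply function composition. The sketch below uses hypothetical stand-in processors; real Athelas processors would slot into `compose` the same way only if they share that callable interface.

```python
from functools import reduce
from typing import Any, Callable

Processor = Callable[[Any], Any]


def compose(*processors: Processor) -> Processor:
    """Chain processors left to right: the output of one step feeds the next."""
    return lambda data: reduce(lambda acc, step: step(acc), processors, data)


# Hypothetical stand-in processors for illustration.
def normalize_whitespace(text: str) -> str:
    return " ".join(text.split())


def lowercase(text: str) -> str:
    return text.lower()


def whitespace_tokenize(text: str) -> list[str]:
    return text.split(" ")


pipeline = compose(normalize_whitespace, lowercase, whitespace_tokenize)
print(pipeline("  Example   TEXT for classification "))
# ['example', 'text', 'for', 'classification']
```

Because each step has a single responsibility, steps can be reordered or swapped without touching the others, which matches the atomicity principle described below.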
Architecture
Repository Structure
src/
├── models/ # ML Model Implementations
│   ├── lightning/ # PyTorch Lightning models
│   ├── pytorch/ # Native PyTorch implementations
│   ├── xgboost/ # XGBoost models
│   ├── lightgbm/ # LightGBM models
│   └── actor_critic/ # Reinforcement learning models
├── processing/ # Data Processing Components
│   ├── text/ # Text processing (BERT, Gensim, etc.)
│   ├── tabular/ # Tabular data processing
│   ├── image/ # Image processing
│   ├── feature/ # Feature engineering
│   └── augmentation/ # Data augmentation
├── knowledge/ # Knowledge Management System
│   ├── orchestrator/ # Knowledge orchestration agents
│   ├── retriever/ # Intelligent retrieval system
│   ├── connections/ # Component relationship registries
│   └── demonstrations/ # Usage examples and tutorials
└── utils/ # Shared utilities
slipbox/ # Knowledge Documentation
├── models/ # Model documentation and analysis
├── processing/ # Processing component documentation
└── knowledge/ # Knowledge system documentation
Core Design Principles
- Atomicity: Each component focuses on a single, well-defined responsibility
- Explicit Connectivity: Relationships between components are explicitly documented
- Emergent Organization: Structure evolves naturally from content relationships
- Knowledge Preservation: Implementation and context are preserved together
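The component IDs used in connection metadata (for example `processing.text.bert_tokenize_processor.BertTokenizeProcessor` in the contributing guide) appear to mirror the `src/` layout above. Assuming every segment except the last is a directory or module and the last is a class name, an ID can be resolved to a source file as sketched here. `resolve_component` is a hypothetical helper, not part of Athelas.

```python
from pathlib import PurePosixPath


def resolve_component(component_id: str, root: str = "src") -> tuple[str, str]:
    """Split a dotted component ID into (source file path, class name).

    Assumes the ID mirrors the src/ layout: all segments but the last name
    directories and a module, and the last segment names the class.
    """
    *module_parts, class_name = component_id.split(".")
    if not module_parts:
        raise ValueError(f"expected 'package.module.Class', got {component_id!r}")
    path = PurePosixPath(root, *module_parts[:-1], module_parts[-1] + ".py")
    return str(path), class_name


print(resolve_component("processing.text.bert_tokenize_processor.BertTokenizeProcessor"))
# ('src/processing/text/bert_tokenize_processor.py', 'BertTokenizeProcessor')
```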
Installation
pip install athelas
Development Installation
git clone https://github.com/TianpeiLuke/athelas.git
cd athelas
pip install -e .
Quick Start
Basic Model Usage
import pytorch_lightning as pl
from athelas.models.lightning import BertClassifier
from athelas.processing.text import BertTokenizeProcessor
# Initialize components
tokenizer = BertTokenizeProcessor()
model = BertClassifier(config={
    'num_classes': 2,
    'learning_rate': 2e-5
})
# Process data
processed_text = tokenizer("Example text for classification")
# Train model; train_dataloader is a torch.utils.data.DataLoader that you
# build from your own tokenized training data (not shown here)
trainer = pl.Trainer(max_epochs=3)
trainer.fit(model, train_dataloader)
Knowledge System Queries
from athelas.knowledge.retriever import KnowledgeRetriever
# Initialize knowledge retriever
retriever = KnowledgeRetriever()
# Semantic search for components
results = retriever.search("text classification with BERT")
# Explore component relationships
related = retriever.find_related_components("bert_classifier")
# Get recommendations based on context
recommendations = retriever.recommend_components({
    'task': 'text_classification',
    'data_type': 'text',
    'framework': 'lightning'
})
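The retriever is described as semantic and RAG-based, but its internals are not documented here. As a toy stand-in, the sketch below ranks component descriptions (paraphrased from Available Components) by bag-of-words cosine similarity. This is lexical matching, not embedding-based semantic search, and `search` and the catalog dict are illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, catalog: dict[str, str]) -> list[tuple[str, float]]:
    """Rank catalog entries by cosine similarity to the query, best first."""
    q = tokenize(query)
    scores = [(name, cosine(q, tokenize(desc))) for name, desc in catalog.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


CATALOG = {
    "bert_classifier": "BERT Classifier: Transformer-based classification with Hugging Face integration",
    "text_cnn": "Text CNN: Convolutional neural networks for text classification",
    "lstm": "LSTM: Recurrent neural networks for sequence classification",
    "xgboost": "XGBoost: Gradient boosting for tabular data",
}
print([name for name, _ in search("text classification with BERT", CATALOG)])
# ['bert_classifier', 'text_cnn', 'lstm', 'xgboost']
```

Swapping `tokenize` and `cosine` for a sentence-embedding model is the usual step from this lexical baseline toward genuine semantic search.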
Component Discovery
from athelas.knowledge.orchestrator import KnowledgeOrchestrator
# Initialize orchestrator
orchestrator = KnowledgeOrchestrator()
# Discover compatible processors for a model
compatible = orchestrator.find_compatible_processors("bert_classifier")
# Get alternative models for a task
alternatives = orchestrator.find_alternatives("text_classification")
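The two orchestrator queries above can be approximated over metadata records like those declared in component docstrings (see Adding a New Component). The free functions below share names with the orchestrator methods only for readability; they are a hypothetical sketch, not the orchestrator's implementation, and the catalog entries (including the `tabular_classification` task) are made up.

```python
# Hypothetical metadata records; names, tasks, and dependencies are illustrative only.
CATALOG = {
    "bert_classifier": {"component_type": "model", "task": "text_classification",
                        "requires": ["bert_tokenize_processor"]},
    "text_cnn": {"component_type": "model", "task": "text_classification", "requires": []},
    "xgboost": {"component_type": "model", "task": "tabular_classification",
                "requires": ["numerical_imputation"]},
    "bert_tokenize_processor": {"component_type": "processor", "task": None, "requires": []},
    "numerical_imputation": {"component_type": "processor", "task": None, "requires": []},
}


def find_compatible_processors(model_id: str, catalog: dict) -> list[str]:
    """Processors that the model's metadata declares it requires."""
    return [dep for dep in catalog[model_id]["requires"]
            if catalog.get(dep, {}).get("component_type") == "processor"]


def find_alternatives(task: str, catalog: dict) -> list[str]:
    """All models registered for the given task, sorted by name."""
    return sorted(name for name, meta in catalog.items()
                  if meta["component_type"] == "model" and meta["task"] == task)


print(find_compatible_processors("bert_classifier", CATALOG))  # ['bert_tokenize_processor']
print(find_alternatives("text_classification", CATALOG))  # ['bert_classifier', 'text_cnn']
```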
Available Components
Models
Text Classification
- BERT Classifier: Transformer-based classification with Hugging Face integration
- Text CNN: Convolutional neural networks for text classification
- LSTM: Recurrent neural networks for sequence classification
Multimodal Models
- Multimodal BERT: Text and tabular data fusion
- Cross-Attention: Attention-based multimodal fusion
- Gate Fusion: Gated multimodal feature combination
- Mixture of Experts: Sparse multimodal processing
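Gated fusion is commonly written as a learned, per-dimension gate g = sigmoid(W[t; x] + b) that mixes two modality vectors as h = g * t + (1 - g) * x. The sketch below implements that common formulation with plain Python lists; it is an assumption about the general technique, and Athelas's Gate Fusion module may parameterize the gate differently (and would operate on tensors).

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gate_fusion(text_vec, tab_vec, weights, bias):
    """Fuse two same-sized feature vectors with a learned gate.

    g = sigmoid(W @ [text; tab] + b);  h = g * text + (1 - g) * tab  (element-wise).
    `weights` has one row per output dimension and len(text) + len(tab) columns.
    """
    if len(text_vec) != len(tab_vec):
        raise ValueError("gate fusion expects vectors of equal length")
    concat = list(text_vec) + list(tab_vec)
    fused = []
    for i, (t, x) in enumerate(zip(text_vec, tab_vec)):
        logit = sum(w * c for w, c in zip(weights[i], concat)) + bias[i]
        g = sigmoid(logit)  # how much of the text feature to keep in dimension i
        fused.append(g * t + (1.0 - g) * x)
    return fused


# With zero weights and bias the gate is 0.5 everywhere, so the result is the average.
print(gate_fusion([1.0, 3.0], [3.0, 1.0], [[0.0] * 4, [0.0] * 4], [0.0, 0.0]))
# [2.0, 2.0]
```

Unlike plain concatenation, the gate lets the model decide per dimension and per example which modality to trust.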
Traditional ML
- XGBoost: Gradient boosting for tabular data
- LightGBM: Fast gradient boosting implementation
Processing Components
Text Processing
- BERT Tokenizer: Hugging Face BERT tokenization
- Gensim Processor: Word2Vec and Doc2Vec processing
- Text Augmentation: Data augmentation for text
Tabular Processing
- Numerical Imputation: Missing value handling
- Categorical Encoding: Label and one-hot encoding
- Feature Engineering: Automated feature creation
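For intuition about the tabular processors, here is a minimal sketch of two common strategies: median imputation for numeric columns and one-hot encoding for categorical ones. The function names are hypothetical, and Athelas's processors may choose other strategies (mean or constant fill, label encoding) or operate on DataFrames rather than lists.

```python
import statistics
from typing import Optional


def impute_median(values: list[Optional[float]]) -> list[float]:
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def one_hot(values: list[str]) -> tuple[list[str], list[list[int]]]:
    """Encode categories as one-hot rows; the vocabulary is sorted for determinism."""
    vocab = sorted(set(values))
    index = {category: i for i, category in enumerate(vocab)}
    rows = []
    for v in values:
        row = [0] * len(vocab)
        row[index[v]] = 1
        rows.append(row)
    return vocab, rows


print(impute_median([1.0, None, 3.0]))  # [1.0, 2.0, 3.0]
print(one_hot(["red", "blue", "red"]))  # (['blue', 'red'], [[0, 1], [1, 0], [0, 1]])
```

In a real pipeline the median and vocabulary would be fitted on training data only and reused at inference time to avoid leakage.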
Knowledge Management
Intelligent Discovery
Athelas includes intelligent agents that help you discover and connect components:
from athelas.knowledge.retriever import KnowledgeRetriever
retriever = KnowledgeRetriever()
# Ask natural language questions about the catalog
answer = retriever.ask("What models work best for multimodal classification?")
# Explore the knowledge graph
graph = retriever.get_knowledge_graph()
connections = graph.get_connections("bert_classifier")
Automatic Documentation
The Knowledge Orchestrator automatically:
- Extracts metadata from component implementations
- Maintains connection registries between components
- Generates and updates documentation
- Validates component relationships
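Metadata extraction can be sketched with the standard library, assuming metadata lives in the module docstring between `---` lines as in the Adding a New Component example. The parser handles only the small YAML subset used there (scalars, nested keys, `- item` lists); `extract_front_matter` and `parse_front_matter` are hypothetical helpers, and a production version would more likely hand the text to a full YAML parser such as PyYAML's `yaml.safe_load`.

```python
import ast


def extract_front_matter(source: str) -> str:
    """Return the text between the first two '---' lines of a module docstring."""
    doc = ast.get_docstring(ast.parse(source)) or ""
    lines = doc.splitlines()
    markers = [i for i, line in enumerate(lines) if line.strip() == "---"]
    if len(markers) < 2:
        return ""
    return "\n".join(lines[markers[0] + 1:markers[1]])


def parse_front_matter(text: str) -> dict:
    """Parse a YAML subset: 'key: value' scalars, nested 'key:' mappings, '- item' lists."""
    root: dict = {}
    stack = [(-1, root)]  # (indentation, mapping) pairs for currently open keys
    parent, key = root, None  # most recent key opened with an empty value
    for raw in text.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip())
        line = raw.strip()
        if line.startswith("- "):
            # List items attach to the most recently opened empty key.
            if not isinstance(parent.get(key), list):
                parent[key] = []
            parent[key].append(line[2:].strip().strip('"'))
            continue
        while stack[-1][0] >= indent:  # close mappings at the same or deeper indent
            stack.pop()
        current = stack[-1][1]
        name, _, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if value:
            current[name] = value.strip('"')
        else:
            current[name] = {}
            stack.append((indent, current[name]))
            parent, key = current, name
    return root


SAMPLE_MODULE = '''"""
---
component_type: model
framework: lightning
task: text_classification
connections:
  requires:
    - "processing.text.bert_tokenize_processor.BertTokenizeProcessor"
  alternatives:
    - "models.lightning.text_cnn.TextCNN"
---
"""
class YourModel:
    pass
'''

metadata = parse_front_matter(extract_front_matter(SAMPLE_MODULE))
print(metadata["connections"]["requires"])
# ['processing.text.bert_tokenize_processor.BertTokenizeProcessor']
```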
Contributing
Athelas follows Zettelkasten principles for contributions:
- Atomic Components: Create focused, single-purpose implementations
- Explicit Connections: Document relationships with other components
- Rich Metadata: Include structured metadata in component docstrings
- Knowledge Documentation: Provide contextual documentation in the slipbox
Adding a New Component
"""
---
component_type: model
framework: lightning
task: text_classification
connections:
requires:
- "processing.text.bert_tokenize_processor.BertTokenizeProcessor"
alternatives:
- "models.lightning.text_cnn.TextCNN"
---
"""
class YourModel(LightningModule):
"""Your model implementation with metadata."""
pass
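Validating relationships can be as simple as checking that required fields exist and that every `requires` and `alternatives` target names a registered component. This sketch assumes the metadata has already been parsed into a dict shaped like the docstring above; `validate_metadata`, `REQUIRED_FIELDS`, and the registered set are hypothetical.

```python
REQUIRED_FIELDS = ("component_type", "framework", "task")  # assumed minimum fields


def validate_metadata(metadata: dict, registered: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the metadata looks consistent."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in metadata]
    connections = metadata.get("connections", {})
    for relation in ("requires", "alternatives"):
        for target in connections.get(relation, []):
            if target not in registered:
                problems.append(f"unknown {relation} target: {target}")
    return problems


# Metadata mirroring the example docstring, checked against a hypothetical registry.
metadata = {
    "component_type": "model",
    "framework": "lightning",
    "task": "text_classification",
    "connections": {
        "requires": ["processing.text.bert_tokenize_processor.BertTokenizeProcessor"],
        "alternatives": ["models.lightning.text_cnn.TextCNN"],
    },
}
registered = {"processing.text.bert_tokenize_processor.BertTokenizeProcessor"}
print(validate_metadata(metadata, registered))
# ['unknown alternatives target: models.lightning.text_cnn.TextCNN']
```

Running such a check in CI keeps connection registries from drifting as components are renamed or removed.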
License
MIT License - see LICENSE for details.
Citation
If you use Athelas in your research, please cite:
@software{athelas2025,
title={Athelas: Zettelkasten-Inspired ML Model Catalog},
author={Xie, Tianpei},
year={2025},
url={https://github.com/TianpeiLuke/athelas}
}
Related Projects
- Zettelkasten Method: Knowledge management methodology
- PyTorch Lightning: Framework for professional AI research
- Hugging Face Transformers: State-of-the-art NLP models
Athelas: From the Greek word meaning "healing", helping researchers and practitioners heal the fragmentation in ML model development through
File details
Details for the file athelas-0.2.1.tar.gz.
File metadata
- Download URL: athelas-0.2.1.tar.gz
- Upload date:
- Size: 924.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 951a568f7b18c8a31c5c062204d1839696138ee98251feb328e1cf2f81376cc8 |
| MD5 | cfc7ecb2afdd7502c96a86815fa39c76 |
| BLAKE2b-256 | 6a21a064fc738083dec173c611adf07d4214e7fa0c484e01bc70a2ac1c03404e |
File details
Details for the file athelas-0.2.1-py3-none-any.whl.
File metadata
- Download URL: athelas-0.2.1-py3-none-any.whl
- Upload date:
- Size: 434.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e137d273042af5e711565c3907a024468d79256b6b48d6fba133e091c20b700a |
| MD5 | f80b5b1cf8aeaba302ae0a2944cdc82c |
| BLAKE2b-256 | 885649390a6d99a10b9a8be0da7cfb04a31f0f8f353c5a1ac44f2227e91d4e2e |