Zettelkasten-inspired ML model catalog with intelligent knowledge management

Athelas: Zettelkasten-Inspired ML Model Catalog

Athelas is a comprehensive machine learning repository organized according to Zettelkasten knowledge management principles. It provides a unified catalog of ML models, data processing components, and intelligent knowledge management tools designed to facilitate model discovery, comparison, and innovation.

Overview

Athelas transforms traditional ML repositories into a living knowledge system by implementing a dual-layer architecture (implementation notes and literature notes), supported by intelligent agents:

  • Implementation Notes: Atomic, reusable ML components (models, processors, utilities)
  • Literature Notes: Contextual documentation that connects and explains components
  • Intelligent Agents: Knowledge orchestrator and retriever for automated knowledge management

Key Features

🧠 Intelligent Knowledge Management

  • Knowledge Orchestrator: Automatically maintains connections between components, generates documentation, and validates relationships
  • Knowledge Retriever: Enables semantic search and discovery through RAG-based interfaces and knowledge graph exploration
  • Dual-Layer Architecture: Combines executable code with rich contextual documentation

🔗 Explicit Component Connectivity

  • Connection registries document relationships between models and processors
  • Cross-reference metadata enables discovery of compatible components
  • Knowledge graph visualization of component relationships
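
A connection registry of this kind can be pictured as a small directed graph with typed edges. The sketch below is illustrative only: `ConnectionRegistry`, its methods, and the short component names are assumptions made for this example and are not the Athelas API; the relation names `requires` and `alternatives` are borrowed from the metadata example later in this README.

```python
from collections import defaultdict


class ConnectionRegistry:
    """Hypothetical in-memory registry of typed links between components."""

    def __init__(self):
        # relation name -> source component -> set of target components
        self._links = defaultdict(lambda: defaultdict(set))

    def connect(self, source, relation, target):
        """Record that `source` has `relation` to `target`."""
        self._links[relation][source].add(target)

    def targets(self, source, relation):
        """Return the components that `source` points to via `relation`."""
        # .get() avoids creating empty entries for unknown names
        return sorted(self._links.get(relation, {}).get(source, set()))

    def sources(self, target, relation):
        """Return the components that point to `target` via `relation`."""
        by_source = self._links.get(relation, {})
        return sorted(s for s, ts in by_source.items() if target in ts)


registry = ConnectionRegistry()
registry.connect("bert_classifier", "requires", "bert_tokenize_processor")
registry.connect("bert_classifier", "alternatives", "text_cnn")
registry.connect("multimodal_bert", "requires", "bert_tokenize_processor")

# Which processors does the model need, and which models share a processor?
print(registry.targets("bert_classifier", "requires"))
print(registry.sources("bert_tokenize_processor", "requires"))
```

Keeping the relation type as the outer key makes "who depends on this processor?" a scan over a single relation rather than over every edge.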

⚡ Multi-Framework Support

  • PyTorch Lightning: Advanced neural network implementations
  • PyTorch: Native PyTorch models and components
  • XGBoost & LightGBM: Gradient boosting models
  • Reinforcement Learning: Actor-critic and bandit algorithms
  • AWS Bedrock: Cloud-based model integrations

🛠 Comprehensive Processing Pipeline

  • Text Processing: BERT tokenization, Gensim processing, text augmentation
  • Tabular Processing: Numerical imputation, categorical encoding, feature engineering
  • Image Processing: Computer vision preprocessing and augmentation
  • Multimodal Processing: Cross-modal fusion and attention mechanisms

Architecture

Repository Structure

src/
├── models/                # ML Model Implementations
│   ├── lightning/        # PyTorch Lightning models
│   ├── pytorch/          # Native PyTorch implementations
│   ├── xgboost/          # XGBoost models
│   ├── lightgbm/         # LightGBM models
│   └── actor_critic/     # Reinforcement learning models
├── processing/           # Data Processing Components
│   ├── text/             # Text processing (BERT, Gensim, etc.)
│   ├── tabular/          # Tabular data processing
│   ├── image/            # Image processing
│   ├── feature/          # Feature engineering
│   └── augmentation/     # Data augmentation
├── knowledge/            # Knowledge Management System
│   ├── orchestrator/     # Knowledge orchestration agents
│   ├── retriever/        # Intelligent retrieval system
│   ├── connections/      # Component relationship registries
│   └── demonstrations/   # Usage examples and tutorials
└── utils/                # Shared utilities

slipbox/                  # Knowledge Documentation
├── models/               # Model documentation and analysis
├── processing/           # Processing component documentation
└── knowledge/            # Knowledge system documentation

Core Design Principles

  1. Atomicity: Each component focuses on a single, well-defined responsibility
  2. Explicit Connectivity: Relationships between components are explicitly documented
  3. Emergent Organization: Structure evolves naturally from content relationships
  4. Knowledge Preservation: Implementation and context are preserved together

Installation

pip install athelas

Development Installation

git clone https://github.com/TianpeiLuke/athelas.git
cd athelas
pip install -e .

Quick Start

Basic Model Usage

# Assumed import; newer installs may use `import lightning.pytorch as pl` instead
import pytorch_lightning as pl

from athelas.models.lightning import BertClassifier
from athelas.processing.text import BertTokenizeProcessor

# Initialize components
tokenizer = BertTokenizeProcessor()
model = BertClassifier(config={
    'num_classes': 2,
    'learning_rate': 2e-5
})

# Process data
processed_text = tokenizer("Example text for classification")

# Train model (train_dataloader is a DataLoader you supply; it is not defined here)
trainer = pl.Trainer(max_epochs=3)
trainer.fit(model, train_dataloader)

Knowledge System Queries

from athelas.knowledge.retriever import KnowledgeRetriever

# Initialize knowledge retriever
retriever = KnowledgeRetriever()

# Semantic search for components
results = retriever.search("text classification with BERT")

# Explore component relationships
related = retriever.find_related_components("bert_classifier")

# Get recommendations based on context
recommendations = retriever.recommend_components({
    'task': 'text_classification',
    'data_type': 'text',
    'framework': 'lightning'
})
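
To make the idea of context-based recommendation concrete, here is a minimal sketch that ranks components by how many context fields their metadata matches. It does not describe how `recommend_components` is implemented: the `recommend` function, the toy `catalog`, and its entries are all hypothetical.

```python
def recommend(catalog, context):
    """Rank components by the number of context fields their metadata matches."""
    scored = []
    for name, meta in catalog.items():
        score = sum(1 for key, value in context.items() if meta.get(key) == value)
        if score:
            scored.append((score, name))
    # Highest score first; ties broken alphabetically for stable output
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in ranked]


# Toy metadata, invented for this example
catalog = {
    "bert_classifier": {
        "task": "text_classification", "data_type": "text", "framework": "lightning",
    },
    "lstm_classifier": {
        "task": "text_classification", "data_type": "text", "framework": "pytorch",
    },
    "xgboost_model": {
        "task": "tabular_classification", "data_type": "tabular", "framework": "xgboost",
    },
}

matches = recommend(catalog, {
    "task": "text_classification",
    "data_type": "text",
    "framework": "lightning",
})
print(matches)
```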

Component Discovery

from athelas.knowledge.orchestrator import KnowledgeOrchestrator

# Initialize orchestrator
orchestrator = KnowledgeOrchestrator()

# Discover compatible processors for a model
compatible = orchestrator.find_compatible_processors("bert_classifier")

# Get alternative models for a task
alternatives = orchestrator.find_alternatives("text_classification")

Available Components

Models

Text Classification

  • BERT Classifier: Transformer-based classification with Hugging Face integration
  • Text CNN: Convolutional neural networks for text classification
  • LSTM: Recurrent neural networks for sequence classification

Multimodal Models

  • Multimodal BERT: Text and tabular data fusion
  • Cross-Attention: Attention-based multimodal fusion
  • Gate Fusion: Gated multimodal feature combination
  • Mixture of Experts: Sparse multimodal processing

Traditional ML

  • XGBoost: Gradient boosting for tabular data
  • LightGBM: Fast gradient boosting implementation

Processing Components

Text Processing

  • BERT Tokenizer: Hugging Face BERT tokenization
  • Gensim Processor: Word2Vec and Doc2Vec processing
  • Text Augmentation: Data augmentation for text

Tabular Processing

  • Numerical Imputation: Missing value handling
  • Categorical Encoding: Label and one-hot encoding
  • Feature Engineering: Automated feature creation
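
For readers unfamiliar with these two steps, the sketch below shows median imputation and one-hot encoding on plain Python lists. It uses only the standard library and is not the Athelas processors themselves; the function names `impute_median` and `one_hot` are made up for this example.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def one_hot(labels):
    """Encode labels as one-hot rows; columns follow sorted category order."""
    categories = sorted(set(labels))
    column = {category: i for i, category in enumerate(categories)}
    rows = []
    for label in labels:
        row = [0] * len(categories)
        row[column[label]] = 1
        rows.append(row)
    return categories, rows


ages = impute_median([31, None, 45, 27])
categories, encoded = one_hot(["red", "blue", "red"])
print(ages)
print(categories, encoded)
```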

Knowledge Management

Intelligent Discovery

Athelas includes intelligent agents that help you discover and connect components:

from athelas.knowledge.retriever import KnowledgeRetriever

retriever = KnowledgeRetriever()

# Ask natural language questions about the catalog
answer = retriever.ask("What models work best for multimodal classification?")

# Explore the knowledge graph
graph = retriever.get_knowledge_graph()
connections = graph.get_connections("bert_classifier")

Automatic Documentation

The Knowledge Orchestrator automatically:

  • Extracts metadata from component implementations
  • Maintains connection registries between components
  • Generates and updates documentation
  • Validates component relationships
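
As an illustration of what relationship validation might involve, the sketch below flags connections that point at components missing from the catalog. The function name `find_dangling_references` and the metadata layout are assumptions that mirror the docstring example in this README; the orchestrator's actual checks may differ.

```python
def find_dangling_references(components):
    """Return (source, relation, target) triples whose target is not registered.

    `components` maps a component name to its metadata dict (assumed layout).
    """
    known = set(components)
    dangling = []
    for name, meta in components.items():
        for relation, targets in meta.get("connections", {}).items():
            for target in targets:
                if target not in known:
                    dangling.append((name, relation, target))
    return dangling


# Toy catalog, invented for this example: text_cnn is referenced but never registered
catalog = {
    "bert_classifier": {
        "connections": {
            "requires": ["bert_tokenize_processor"],
            "alternatives": ["text_cnn"],
        }
    },
    "bert_tokenize_processor": {"connections": {}},
}

problems = find_dangling_references(catalog)
print(problems)
```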

Contributing

Athelas follows Zettelkasten principles for contributions:

  1. Atomic Components: Create focused, single-purpose implementations
  2. Explicit Connections: Document relationships with other components
  3. Rich Metadata: Include structured metadata in component docstrings
  4. Knowledge Documentation: Provide contextual documentation in the slipbox

Adding a New Component

"""
---
component_type: model
framework: lightning
task: text_classification
connections:
  requires:
    - "processing.text.bert_tokenize_processor.BertTokenizeProcessor"
  alternatives:
    - "models.lightning.text_cnn.TextCNN"
---
"""

from pytorch_lightning import LightningModule  # assumed import path


class YourModel(LightningModule):
    """Your model implementation with metadata."""
    pass
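
One way such a metadata block could be read back from a docstring is sketched below. It is a hand-rolled parser for exactly the subset used above (`key: value` pairs, one level of nesting, and `- item` lists), not a general YAML parser and not the parser Athelas uses; `parse_front_matter` is a hypothetical name.

```python
import inspect


def parse_front_matter(docstring):
    """Parse a '---'-delimited metadata block in the restricted format above."""
    lines = [ln.rstrip() for ln in inspect.cleandoc(docstring).splitlines()]
    if len(lines) < 2 or lines[0] != "---" or "---" not in lines[1:]:
        raise ValueError("expected a block delimited by '---' lines")
    body = lines[1:lines.index("---", 1)]

    meta = {}
    section = None  # nested dict being filled, e.g. meta["connections"]
    items = None    # list being filled, e.g. section["requires"]
    for line in body:
        text = line.strip()
        if not text:
            continue
        if text.startswith("- "):
            if items is None:
                raise ValueError(f"list item without a parent key: {text!r}")
            items.append(text[2:].strip().strip('"'))
            continue
        key, _, value = text.partition(":")
        value = value.strip().strip('"')
        indented = line[0] == " "
        if not indented:
            items = None
            if value:
                meta[key], section = value, None
            else:
                section = meta[key] = {}
        elif section is None:
            raise ValueError(f"nested key without a parent section: {text!r}")
        elif value:
            section[key], items = value, None
        else:
            items = section[key] = []
    return meta


doc = '''
---
component_type: model
framework: lightning
connections:
  requires:
    - "processing.text.bert_tokenize_processor.BertTokenizeProcessor"
  alternatives:
    - "models.lightning.text_cnn.TextCNN"
---
'''
meta = parse_front_matter(doc)
print(meta["connections"]["requires"])
```

In a real project a YAML library would be the more robust choice; the manual parser is used here only to keep the example dependency-free.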

License

MIT License - see LICENSE for details.

Citation

If you use Athelas in your research, please cite:

@software{athelas2025,
  title={Athelas: Zettelkasten-Inspired ML Model Catalog},
  author={Xie, Tianpei},
  year={2025},
  url={https://github.com/TianpeiLuke/athelas}
}

Related Projects


Athelas: From the Greek word meaning "healing" - helping researchers and practitioners heal the fragmentation in ML model development through
