A Hierarchical Pruning Architecture for On-Device Language Intelligence

PruneForge

PruneForge is a research prototype (v0.1, 2026) that replaces the standard transformer decode-from-scratch approach with a retrieve → prune → assemble pipeline, designed to run efficiently on CPU-class hardware.

Architecture

Query
  │
  ▼
Knowledge Pool          ← compressed semantic store (contrastive pre-training)
  │  TopK retrieval
  ▼
Stage 1: Morphological Decomposition   ← BiLSTM, character-level, language-agnostic
  │
Stage 2: Topical Relevance Filter      ← bilinear scorer, prunes ~70% of candidates
  │
Stage 3: Composability Assessment      ← pairwise MLP, O(k²)
  │
Stage 4: Coherence Validation          ← small Transformer encoder
  │
Stage 5: Handoff Formatting            ← sort by relevance, no parameters
  │
  ▼
Composition Motor                      ← Transformer decoder + self-correction loop
  │
  ▼
Output text
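The retrieve → prune → handoff flow in the diagram can be illustrated with a minimal, dependency-free sketch. It is not the package's implementation: cosine similarity stands in for the learned Knowledge Pool and the Stage 2 bilinear scorer, the toy 2-D vectors are made up, and the function names (`retrieve_top_k`, `prune`, `handoff`) and the `keep_percent=30` default (mirroring the "~70% pruned" figure) are assumptions for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(query_vec, pool, k):
    """Return the k pool entries most similar to the query as (name, score) pairs."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in pool.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def prune(candidates, keep_percent=30):
    """Keep the highest-scoring keep_percent of candidates (always at least one)."""
    keep = max(1, len(candidates) * keep_percent // 100)
    ranked = sorted(candidates, key=lambda item: item[1], reverse=True)
    return ranked[:keep]


def handoff(candidates):
    """Parameter-free final step: order by relevance and pass on the names."""
    ranked = sorted(candidates, key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked]


# Toy 2-D "knowledge pool"; real embeddings would be learned, not hand-written.
pool = {
    "weather": [1.0, 0.0],
    "forecast": [0.9, 0.1],
    "rain": [0.8, 0.3],
    "tomorrow": [0.7, 0.5],
    "umbrella": [0.6, 0.6],
    "calendar": [0.4, 0.7],
    "recipe": [0.2, 0.9],
    "football": [0.1, 1.0],
    "guitar": [0.0, 1.0],
    "invoice": [-0.2, 1.0],
}

top = retrieve_top_k([1.0, 0.0], pool, k=10)
kept = prune(top)
print(handoff(kept))
```

Stages 1, 3 and 4 are omitted here because they depend on trained components; the sketch only shows how the candidate set shrinks between retrieval and handoff.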

Symbolic Content Flag (SCF): when the candidate set is predominantly symbolic (code, math, formulas), stages 2–4 switch to operator-aware modes rather than semantic modes.
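How "predominantly symbolic" is decided is not specified above. One plausible reading is a majority vote over per-candidate symbol density, sketched below. The character inventory, both thresholds, and the function names are hypothetical and chosen only to make the idea concrete.

```python
# Hypothetical symbol inventory; the project's actual criteria are not documented here.
SYMBOL_CHARS = set("=+-*/^<>{}()[];:\\_|&%")


def symbol_density(text):
    """Fraction of non-whitespace characters that are operators, brackets, or digits."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hits = sum(1 for c in chars if c in SYMBOL_CHARS or c.isdigit())
    return hits / len(chars)


def symbolic_content_flag(candidates, density_threshold=0.25, majority=0.5):
    """True when more than `majority` of the candidates look symbolic."""
    if not candidates:
        return False
    symbolic = sum(1 for c in candidates if symbol_density(c) >= density_threshold)
    return symbolic / len(candidates) > majority


code_like = ["x = a + b * 2", "for (i = 0; i < n; i++)", "E = m * c^2", "the weather is nice"]
prose = ["tomorrow will be sunny", "bring an umbrella", "2 + 2 = 4"]

print(symbolic_content_flag(code_like))
print(symbolic_content_flag(prose))
```

Under this reading, a pipeline would compute the flag once per candidate set and use it to select the operator-aware or semantic mode for stages 2–4.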

Installation

pip install pruneforge

With HNSW retrieval support (optional, requires faiss):

pip install "pruneforge[hnsw]"

From source (editable):

git clone https://github.com/astromini/pruneforge
cd pruneforge
pip install -e .

Quick Start

Inference

from pruneforge import PruneForgeModel

model = PruneForgeModel.from_pretrained("output/")
response = model.generate("Yarın hava nasıl olacak?")
print(response)

Training

# Plain-text corpus only (self-supervised)
python train.py --data corpus.txt --lang tr --output output/

# With morphological annotations (Stage 1 supervised)
python train.py --data corpus.txt --morph morph.tsv --lang tr --output output/

# GPU
python train.py --data corpus.txt --lang multilingual --device cuda --output output/

# Quick smoke test (one epoch each for the pool, the stages, and the motor)
python train.py --data corpus.txt --pool_epochs 1 --stage_epochs 1 --motor_epochs 1

Training runs seven sequential phases automatically:

Phase	Component	Notes
1	Knowledge Pool	Contrastive pre-training (NT-Xent)
2	Stage 1	Morphological decomposition (supervised or self-supervised)
3	Stage 2	Topical relevance filter
4	Stage 3	Composability assessment
5	Stage 4	Coherence validation
6	Motor	Supervised CE + REINFORCE
7	Joint	End-to-end fine-tuning
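For readers unfamiliar with the NT-Xent objective named in phase 1, here is a small pure-Python version of the standard loss. It is a reference sketch, not the code in train.py: the pairing convention (`views_a[i]` with `views_b[i]`), the default temperature, and the function names are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nt_xent(views_a, views_b, temperature=0.5):
    """NT-Xent loss over N positive pairs (views_a[i], views_b[i]).

    Each of the 2N embeddings acts as an anchor once; its positive is the
    other view of the same item, and the remaining 2N - 2 embeddings are negatives.
    """
    z = list(views_a) + list(views_b)
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive partner of anchor i
        positive = cosine(z[i], z[j]) / temperature
        # The denominator covers every embedding except the anchor itself.
        logits = [cosine(z[i], z[k]) / temperature for k in range(2 * n) if k != i]
        log_denominator = math.log(sum(math.exp(v) for v in logits))
        total += log_denominator - positive
    return total / (2 * n)


views_a = [[1.0, 0.0], [0.0, 1.0]]
views_b = [[0.9, 0.1], [0.1, 0.9]]

aligned = nt_xent(views_a, views_b)
misaligned = nt_xent(views_a, list(reversed(views_b)))
print(f"aligned={aligned:.3f} misaligned={misaligned:.3f}")
```

The loss is lower when each embedding sits closer to its own positive than to the other items in the batch, which is the property contrastive pre-training relies on.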

Morphological Corpus Format

For Stage 1 supervised training (--morph), provide a TSV file:

yemekler	yemek	ler
koşuyorum	koş	uyor,um
running	run	ing

Columns: token \t root \t affix_1,affix_2,...
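A file in this format can be read with a few lines of standard-library Python. The sketch below is independent of the package's own loader. The `MorphEntry` type and function names are hypothetical, and treating an empty third column as "no affixes" is an assumption.

```python
from typing import List, NamedTuple


class MorphEntry(NamedTuple):
    token: str
    root: str
    affixes: List[str]


def parse_morph_line(line):
    """Parse one 'token<TAB>root<TAB>affix_1,affix_2,...' line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        raise ValueError(f"expected 3 tab-separated columns, got {len(fields)}: {line!r}")
    token, root, affix_field = fields
    # An empty affix column is read as "no affixes" (assumption).
    affixes = [a for a in affix_field.split(",") if a]
    return MorphEntry(token, root, affixes)


def parse_morph_tsv(text):
    """Parse a whole TSV document, skipping blank lines."""
    return [parse_morph_line(line) for line in text.splitlines() if line.strip()]


sample = "yemekler\tyemek\tler\nkoşuyorum\tkoş\tuyor,um\nrunning\trun\ting\n"
for entry in parse_morph_tsv(sample):
    print(entry.token, entry.root, entry.affixes)
```

Note that root plus affixes does not always concatenate back to the token ("run" + "ing" versus "running"), so a loader should not validate entries by string concatenation.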

If no morphological corpus is available, omit --morph — Stage 1 falls back to self-supervised character-level reconstruction.

Configuration

All hyperparameters live in pruneforge/config.py. Pass a PruneForgeConfig to any component:

from pruneforge.config import PruneForgeConfig

cfg = PruneForgeConfig()
cfg.knowledge_pool.num_concepts = 500_000   # reduce for low-RAM devices
cfg.knowledge_pool.embedding_dim = 128
cfg.motor.num_layers = 4
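To judge how far to reduce num_concepts on a low-RAM device, a rough upper bound on the pool's embedding table can be computed as below. This assumes a dense, uncompressed float32 table; since the Knowledge Pool is described as a compressed store, the real footprint may be smaller. The helper name is hypothetical.

```python
def pool_embedding_bytes(num_concepts, embedding_dim, bytes_per_value=4):
    """Size in bytes of a dense num_concepts x embedding_dim table (float32 by default)."""
    return num_concepts * embedding_dim * bytes_per_value


for concepts in (500_000, 100_000):
    size_mib = pool_embedding_bytes(concepts, 128) / 2**20
    print(f"{concepts:>7} concepts x 128 dims ~ {size_mib:.0f} MiB")
```

With the values from the snippet above (500,000 concepts, 128 dimensions) this bound is 256,000,000 bytes, roughly 244 MiB.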

Package Structure

pruneforge/
├── pruneforge/
│   ├── __init__.py
│   ├── config.py               ← all hyperparameters
│   ├── inference.py            ← PruneForgeModel (end-to-end)
│   ├── knowledge_pool/
│   │   ├── __init__.py
│   │   └── pool.py             ← KnowledgePool + QueryEncoder
│   ├── pipeline/
│   │   ├── __init__.py
│   │   ├── pipeline.py         ← HierarchicalPruningPipeline
│   │   ├── stage1_morphology.py
│   │   ├── stage2_relevance.py
│   │   ├── stage3_composability.py
│   │   ├── stage4_coherence.py
│   │   └── stage5_handoff.py
│   ├── motor/
│   │   ├── __init__.py
│   │   └── motor.py            ← CompositionMotor
│   └── utils/
│       ├── __init__.py
│       └── data.py             ← tokenizer + datasets
├── train.py                    ← single-command training script
├── tests/
│   └── test_e2e.py
├── pyproject.toml
├── setup.py
└── requirements.txt

Running Tests

pip install -e .
python tests/test_e2e.py

Expected output:

PruneForge End-to-End Smoke Test
========================================
[ 5/5 ] Tokenizer ... OK
[ 1/5 ] Knowledge Pool ... OK
[ 2/5 ] Full Pipeline (Stages 1-5) ... OK  (10→N→N→N tokens)
[ 3/5 ] Composition Motor ... OK  (iterations used: [1, 1])
[ 4/5 ] Training losses ... OK  (CE=...  RL=...)

✓ All tests passed.

Requirements

Python ≥ 3.9
PyTorch ≥ 2.0.0
NumPy ≥ 1.24.0
(optional) faiss-cpu ≥ 1.7.4 for HNSW retrieval

License

Apache 2.0 — see LICENSE.

Citation

@software{pruneforge2026,
  author  = {astromini},
  title   = {PruneForge: A Hierarchical Pruning Architecture for On-Device Language Intelligence},
  year    = {2026},
  version = {0.1.0},
  url     = {https://github.com/astromini/pruneforge},
}

