Topological Entity Similarity Structure for Emergent Response Analysis
TESSERA
A domain-general framework for detecting and exploiting geometric structure in multi-dimensional feature spaces for predictive classification.
Targeting: Nature Computational Science
Core Hypothesis
When entities are characterized by multi-dimensional features and outcomes depend on relational position in feature space, similarity network topology carries predictive information that direct feature-based classifiers miss.
Three Failure Geometries
TESSERA detects geometrically distinct response regions in entity similarity space:
- Ecotone — responses cluster at boundaries between stable operational domains (analogous to ecological transition zones)
- Isolated cluster — responses occupy a distinct region, separate from main domains
- Diffuse — responses scattered within domains, driven by individual feature thresholds
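As a rough illustration of how the three geometries differ, the sketch below encodes each one as a rule on a toy 2-D feature space. The domain centres, the `width`, `radius`, and `threshold` parameters, and all function names are hypothetical choices made for this example; they are not taken from `tessera.synthetic.outcomes`.

```python
import math

# Hypothetical 2-D feature space: two stable domains and one detached region.
DOMAIN_A = (0.0, 0.0)
DOMAIN_B = (4.0, 0.0)
DETACHED = (8.0, 5.0)


def dist(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def ecotone_response(p, width=0.5):
    """True when p is almost equidistant from both domain centres (boundary zone)."""
    return abs(dist(p, DOMAIN_A) - dist(p, DOMAIN_B)) <= width


def isolated_response(p, radius=1.0):
    """True when p lies inside the detached region, away from both domains."""
    return dist(p, DETACHED) <= radius


def diffuse_response(p, feature=1, threshold=1.5):
    """True when a single feature exceeds its threshold, wherever p sits."""
    return p[feature] > threshold


print(ecotone_response((2.0, 0.3)))   # midway between the two domains
print(isolated_response((8.2, 5.1)))  # inside the detached region
print(diffuse_response((0.3, 2.0)))   # inside domain A, feature 1 above threshold
```

The ecotone and isolated-cluster rules depend on where an entity sits relative to the domains, while the diffuse rule looks at one feature in isolation. That distinction is the reason a similarity network could plausibly help with the first two patterns more than with the third.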
Three Demonstration Domains
| | Ecology | Manufacturing | HPC |
|---|---|---|---|
| Entity | Reef survey | Production run | Job |
| Features | Environmental conditions | Sensor measurements | Resource usage |
| Outcome | Bleaching | Defect | Failure |
| Dataset | Global Coral Bleaching DB | UCI SECOM | Synthetic / NØMAD |
| N | ~35,000 | 1,567 | 5,000+ |
Validation Protocol
- Phase 1 — Ablation: GNN vs. MLP, RF, XGBoost, LogReg (justify network)
- Phase 2 — Similarity: 7 measures compared (Simpson, Cosine, Bray-Curtis, Jaccard, Pearson, Euclidean, Mahalanobis)
- Phase 3 — Bin sensitivity: n_bins = 2, 3, 4, 5 (Simpson only)
- Phase 4 — Temporal: 5-fold temporal cross-validation
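One possible reading of the Phase 4 protocol is an expanding-window split, in which each fold trains only on samples that precede its test block in time. The sketch below assumes that reading; `temporal_folds` is a hypothetical helper, not part of the package, and the block layout (`n_folds + 1` contiguous blocks) is an assumption. scikit-learn's `TimeSeriesSplit` provides a comparable splitter for data that is already in time order.

```python
def temporal_folds(timestamps, n_folds=5):
    """Yield (train_idx, test_idx) pairs for expanding-window validation.

    Samples are sorted by timestamp and cut into n_folds + 1 contiguous
    blocks. Fold k trains on the first k blocks and tests on block k + 1,
    so no training sample is later than any test sample.
    """
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    n_blocks = n_folds + 1
    if len(order) < n_blocks:
        raise ValueError("need at least n_folds + 1 samples")
    # Block boundaries in the time-sorted order.
    bounds = [i * len(order) // n_blocks for i in range(n_blocks + 1)]
    for k in range(1, n_blocks):
        train_idx = order[: bounds[k]]
        test_idx = order[bounds[k] : bounds[k + 1]]
        yield train_idx, test_idx


# Toy survey years, deliberately out of order.
years = [2015, 2011, 2013, 2010, 2012, 2014, 2019, 2016, 2018, 2017, 2021, 2020]
folds = list(temporal_folds(years, n_folds=5))
for train_idx, test_idx in folds:
    print(len(train_idx), [years[i] for i in test_idx])
```

With tied timestamps, a tie can straddle a block boundary, so a stricter version would move block edges to the nearest timestamp change.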
Project Structure
tessera/
├── __init__.py # Package init (v0.1.0)
├── core/
│ ├── __init__.py
│ ├── similarity.py # 7 similarity measures
│ └── network.py # Graph construction (threshold, kNN, weighted)
├── synthetic/
│ ├── __init__.py
│ ├── landscape.py # Feature space topology builder
│ ├── outcomes.py # Failure pattern assignment (3 types + mixed)
│ └── visualize.py # Diagnostic plots
├── data_acquisition.py # Real dataset download & preprocessing
├── test_generator.py # Synthetic data validation
└── test_pipeline.py # End-to-end pipeline test
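The data → similarity → graph flow that `core/similarity.py` and `core/network.py` cover can be sketched in a few lines. This is a minimal pure-Python illustration under stated assumptions: it uses cosine similarity only, the function names are hypothetical rather than the package's API, and the kNN graph is symmetrised by union, which is one of several reasonable choices.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(features):
    """Dense pairwise similarity matrix as a list of lists."""
    return [[cosine_similarity(u, v) for v in features] for u in features]


def threshold_graph(sim, tau):
    """Undirected edges (i, j) with i < j whose similarity is at least tau."""
    n = len(sim)
    return {(i, j) for i in range(n) for j in range(i + 1, n) if sim[i][j] >= tau}


def knn_graph(sim, k):
    """Link each node to its k most similar other nodes; union makes it undirected."""
    edges = set()
    for i, row in enumerate(sim):
        others = [j for j in range(len(row)) if j != i]
        nearest = sorted(others, key=lambda j: row[j], reverse=True)[:k]
        edges.update((min(i, j), max(i, j)) for j in nearest)
    return edges


def weighted_graph(sim, edges):
    """Attach the similarity value to each edge as its weight."""
    return {(i, j): sim[i][j] for i, j in edges}


# Four toy entities forming two obvious groups.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
sim = similarity_matrix(features)
thr_edges = threshold_graph(sim, tau=0.9)
knn_edges = knn_graph(sim, k=1)
weights = weighted_graph(sim, thr_edges)
print(sorted(thr_edges), sorted(knn_edges))
```

A vectorized implementation would compute the matrix in one array operation; the loops here are kept explicit so that each construction rule is easy to read.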
Current Status
- Synthetic data generator (3 patterns + mixed, with entity injection)
- Similarity engine (7 measures, vectorized)
- Network construction (threshold, kNN, weighted)
- Pipeline validation (data → similarity → graph)
- GNN model
- Real dataset acquisition (coral bleaching, SECOM)
- Validation pipeline (4 phases)
- Methods section
- Results
- Introduction, Discussion, Conclusions
Paper Writing Order
- Methods
- Results
- Introduction
- Discussion
- Conclusions
- Abstract (last)
References
- van Woesik, R. & Kratochwill, C. (2022). A global coral-bleaching database, 1980–2020. Scientific Data, 9(20).
- McCann, M. & Johnston, A. (2008). SECOM Dataset. UCI ML Repository.
- Tonini, J. (2025). NØMAD-HPC. Journal of Open Research Software.
Download files
Source Distribution
File details
Details for the file tessera_ml-0.2.0.tar.gz.
File metadata
- Download URL: tessera_ml-0.2.0.tar.gz
- Upload date:
- Size: 90.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.21
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1952d60925b9b7931d9ff9b79a6ba9b20f2de3ba7ed108546a3173d67632afe5 |
| MD5 | 96aa3aa159af9d2ac72aeaa9e2aaf88d |
| BLAKE2b-256 | 9a7126f5513f8bb1649615fb387a86c8ad9f00075ae454b49fbba49926f87a28 |