
Diagnostic profiling of labeled embeddings for classification model complexity guidance.

Project description


separatix

separatix profiles labeled feature spaces before classifier training and returns transparent, confidence-aware guidance about apparent classification complexity.

The intended use cases include learned embeddings, but the package is not restricted to them. It also works on raw feature matrices when you want a coarse diagnostic of whether the observed class geometry looks mostly linear, smoothly nonlinear, local or kernel-like, fragmented, bottlenecked, or too unreliable to trust.

separatix does not claim to pick the optimal classifier. It is a pretraining diagnostic and auditing tool designed to make its reasoning visible.

Installation

pip install separatix

To install the latest development version directly from GitHub:

pip install "git+https://github.com/NiklasMelton/Separatix.git@develop"

Quick start

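import numpy as np
from separatix import diagnose

rng = np.random.default_rng(0)  # toy data: X is (n_samples, n_features), y holds class labels
X = rng.normal(size=(200, 16))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
recommendation = diagnose(X, y, random_state=0)
print(recommendation)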

For a structured audit:

from separatix import diagnose

report = diagnose(X, y, return_report=True, random_state=0)
print(report.recommendation_text)
print(report.decision_path)
print(report.scores)
print(report.to_json())

What It Accepts

  • Dense NumPy arrays
  • SciPy sparse matrices
  • pandas DataFrames and Series when pandas is installed
  • Binary and multiclass classification targets
  • String or numeric labels treated as categorical class identifiers

Regression, multilabel classification, and multioutput classification are not supported.
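As a rough illustration of treating string or numeric labels as categorical identifiers and rejecting unsupported target shapes, the sketch below encodes a 1-D label vector into integer codes and per-class counts. The function name `encode_labels` and its error messages are hypothetical; this is not separatix's internal code.

```python
import numpy as np


def encode_labels(y):
    """Hypothetical helper: map class labels to integer codes 0..k-1.

    Labels are treated purely as categorical identifiers; their numeric
    values, if any, are never used as magnitudes.
    """
    y = np.asarray(y)
    if y.ndim != 1:
        # A 2-D target usually means multilabel or multioutput
        # classification, which is listed as unsupported.
        raise ValueError("expected a 1-D array of class labels")
    classes, codes = np.unique(y, return_inverse=True)
    if classes.size < 2:
        raise ValueError("classification needs at least two classes")
    # Per-class counts are the raw input for a class-balance audit.
    counts = np.bincount(codes, minlength=classes.size)
    return classes, codes, counts
```

Note that a shape check alone cannot distinguish a continuous regression target from a target with many classes; a real validator would need an additional heuristic for that.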

What It Returns

By default, diagnose(...) returns a plain-text recommendation. With return_report=True, it returns a DiagnosticReport that includes:

  • the recommendation label
  • plain-text recommendation text
  • confidence level
  • underlying metric groups
  • normalized summary scores
  • a visible decision path
  • warnings and skipped diagnostics
  • sampling and densification events
  • preprocessing and runtime metadata

The report is JSON-serializable through report.to_dict() and report.to_json().
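A report exposing both `to_dict()` and `to_json()` is commonly built on plain dataclasses. The sketch below shows that pattern with a hypothetical `ReportSketch` class whose fields loosely mirror a few items from the list above; it is not the actual `DiagnosticReport` definition.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ReportSketch:
    """Hypothetical stand-in for a diagnostic report (not separatix's class)."""

    recommendation: str
    confidence: str
    scores: dict = field(default_factory=dict)
    decision_path: list = field(default_factory=list)
    warnings: list = field(default_factory=list)

    def to_dict(self):
        # asdict recursively copies nested dataclasses, dicts, and lists.
        return asdict(self)

    def to_json(self, **kwargs):
        # Only JSON-native values are stored, so no custom encoder is needed.
        return json.dumps(self.to_dict(), **kwargs)
```

One practical detail with this design: NumPy scalars such as `np.int64` or `np.float32` are rejected by `json.dumps`, so metric values would need converting to plain Python numbers before they are stored.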

Recommendation Categories

  • linear_likely_sufficient
  • smooth_nonlinear_recommended
  • kernel_or_local_recommended
  • high_capacity_or_partitioning_recommended
  • feature_or_label_bottleneck_likely
  • insufficient_data_or_unreliable_geometry
  • inconclusive

These categories are intentionally coarse. They describe the apparent geometry and difficulty of the labeled feature space, not a guaranteed best model choice.

Decision Pipeline

The recommendation is produced by a fixed, inspectable pipeline:

  1. Validate inputs and encode labels.
  2. Audit class counts, imbalance, sparsity, and basic dataset conditions.
  3. Compute geometry, neighborhood, and boundary-related diagnostics.
  4. Run simple probe models and compare them to a dummy baseline.
  5. Aggregate the raw metrics into normalized scores such as signal, linearity, nonlinearity, overlap, fragmentation, and reliability.
  6. Apply explicit rule-based branching to map those scores to a recommendation category and confidence level.
  7. Render both a plain-language summary and a structured report.

The full rationale and decision rules are documented in docs/decision_pipeline.md.
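Step 6 maps normalized scores to one of the recommendation categories through explicit rules. The sketch below shows what such branching can look like while recording a decision path; the score names follow step 5, but the thresholds, the rule order, the confidence logic, and the `recommend` function itself are invented for illustration and do not reproduce the documented rules.

```python
def recommend(scores):
    """Hypothetical rule-based branching over normalized scores in [0, 1].

    Returns (category, confidence, decision_path). All thresholds are made up.
    """
    path = []

    def done(category):
        # Assumption: confidence depends only on the reliability score here.
        confidence = "high" if scores["reliability"] >= 0.7 else "medium"
        return category, confidence, path

    if scores["reliability"] < 0.3:
        path.append("reliability < 0.3")
        return "insufficient_data_or_unreliable_geometry", "low", path
    if scores["signal"] < 0.1:
        path.append("signal < 0.1: probes barely beat the dummy baseline")
        return done("feature_or_label_bottleneck_likely")
    if scores["linearity"] >= 0.8:
        path.append("linearity >= 0.8")
        return done("linear_likely_sufficient")
    if scores["fragmentation"] >= 0.6:
        path.append("fragmentation >= 0.6")
        return done("high_capacity_or_partitioning_recommended")
    if scores["nonlinearity"] >= 0.5 and scores["overlap"] < 0.4:
        path.append("nonlinearity >= 0.5 and overlap < 0.4")
        return done("kernel_or_local_recommended")
    if scores["nonlinearity"] >= 0.2:
        path.append("nonlinearity >= 0.2")
        return done("smooth_nonlinear_recommended")
    path.append("no rule matched")
    return done("inconclusive")
```

Returning the path alongside the label is what keeps the reasoning inspectable: a reader can see exactly which rule fired, in the same spirit as `report.decision_path`.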

Sparse Inputs And Memory Behavior

Sparse matrices are accepted directly. Diagnostics that need dense data use a shared densification policy rather than a separate dense-only code path. When a step would require densification, separatix can fail, skip, or warn and subsample before densifying, depending on configuration. These events are recorded in the report.
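The fail, skip, or warn-and-subsample behaviour can be sketched as a single policy function that every dense-only diagnostic calls. The policy names, the `max_dense_bytes` budget, the `densify` function, and the event dictionaries are hypothetical; separatix's actual configuration options may differ.

```python
import warnings

import numpy as np
import scipy.sparse as sp


def densify(X, policy="warn_subsample", max_dense_bytes=50_000_000,
            events=None, random_state=0):
    """Hypothetical shared densification policy for dense-only diagnostics.

    Returns (dense_array_or_None, row_indices). Decisions are appended to
    `events` so a caller could record them in a report.
    """
    if events is None:
        events = []
    if not sp.issparse(X):
        # Dense inputs are already materialized, so no policy applies.
        X = np.asarray(X)
        return X, np.arange(X.shape[0])

    n_rows, n_cols = X.shape
    itemsize = X.dtype.itemsize
    needed = n_rows * n_cols * itemsize
    if needed <= max_dense_bytes:
        events.append({"event": "densified", "rows": n_rows})
        return X.toarray(), np.arange(n_rows)

    if policy == "fail":
        raise MemoryError(f"densifying would need {needed} bytes")
    if policy == "skip":
        events.append({"event": "skipped", "bytes_needed": needed})
        return None, np.arange(0)
    if policy == "warn_subsample":
        # Largest row count whose dense copy fits in the budget.
        max_rows = max(1, max_dense_bytes // (n_cols * itemsize))
        rng = np.random.default_rng(random_state)
        rows = np.sort(rng.choice(n_rows, size=max_rows, replace=False))
        warnings.warn(f"subsampling {max_rows} of {n_rows} rows before densifying")
        events.append({"event": "subsampled", "rows": int(max_rows)})
        return X.tocsr()[rows].toarray(), rows
    raise ValueError(f"unknown policy: {policy!r}")
```

Routing every dense-only step through one function like this is what makes a single event log possible: the caller passes the same `events` list to each diagnostic and attaches it to the report afterwards.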

Examples

Related Work

This package is not an implementation of a published dataset-complexity procedure, but it is adjacent to and inspired by prior work on classification complexity and data geometry. In particular, we would like to acknowledge:

  • Ho and Basu, "Complexity Measures of Supervised Classification Problems"
  • Lorena, Garcia, Lehmann, Souto, and Ho, "How Complex Is Your Classification Problem? A Survey on Measuring Classification Complexity"

We do not follow those procedures directly, but they are relevant background for why geometry-aware pretraining diagnostics are useful.



Download files


Source Distribution

separatix-0.1.0a1.tar.gz (23.3 kB)

Uploaded Source

Built Distribution


separatix-0.1.0a1-py3-none-any.whl (31.8 kB)

Uploaded Python 3

File details

Details for the file separatix-0.1.0a1.tar.gz.

File metadata

  • Download URL: separatix-0.1.0a1.tar.gz
  • Upload date:
  • Size: 23.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for separatix-0.1.0a1.tar.gz
Algorithm Hash digest
SHA256 8af2251748a9b46b21169f4a578329f199b9b2fd8877686f2db8a4c5048ced5b
MD5 ef9a073576aa8011b9140568a9c198a7
BLAKE2b-256 bea3a5308304e56ffb5c0308aab4a7030a164b84816d140a8b354f37572096f4


Provenance

The following attestation bundles were made for separatix-0.1.0a1.tar.gz:

Publisher: pypi-publish.yml on NiklasMelton/Separatix

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file separatix-0.1.0a1-py3-none-any.whl.

File metadata

  • Download URL: separatix-0.1.0a1-py3-none-any.whl
  • Upload date:
  • Size: 31.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for separatix-0.1.0a1-py3-none-any.whl
Algorithm Hash digest
SHA256 b20af5da79e20bf08dbe3d91d92a9135531c72b82fa8bd95b1e981f0464cb32e
MD5 58bf5f7a832b0ba57f5b40dfc76ba4dc
BLAKE2b-256 be584db905c4fd0ee63fd1787bffb6f7a35aa4b931cfe29844e48f2e4d8a4faa


Provenance

The following attestation bundles were made for separatix-0.1.0a1-py3-none-any.whl:

Publisher: pypi-publish.yml on NiklasMelton/Separatix

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
