Density Yields Features - Rust core for structure discovery in embedding spaces

Project description

DYF - Outlier Classification

Fast outlier classification for embedding spaces using PCA-based locality-sensitive hashing (LSH). Each item is assigned one of three types:

  • Dense: Items in well-populated semantic buckets
  • Bridge: Sparse items that find neighbors in the recovery PCA pass; these tend to connect clusters
  • Orphan: Truly unique items with no semantic neighbors
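The three categories can be illustrated with a small NumPy sketch. It is not dyf's Rust implementation. It assumes that "PCA-based LSH" means using the sign of each item's projection onto the top principal components as one hash bit, and that the recovery pass re-runs the same hashing on the sparse items only. The helper names and label strings are hypothetical. The parameter names and defaults mirror the constructor shown in the API section, and the intra-bucket outlier check (`intra_outlier_std`) is omitted.

```python
import numpy as np


def pca_lsh_codes(x: np.ndarray, n_bits: int) -> np.ndarray:
    """Hash each row to an integer bucket id from the signs of its
    projections onto the top principal components (one bit each)."""
    centered = x - x.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    k = min(n_bits, vt.shape[0])          # cannot use more components than exist
    proj = centered @ vt[:k].T            # shape (n_samples, k)
    bits = (proj > 0).astype(np.int64)    # one sign bit per component
    weights = 2 ** np.arange(k, dtype=np.int64)
    return bits @ weights                 # pack the k bits into one integer


def bucket_sizes(codes: np.ndarray) -> np.ndarray:
    """For every item, the number of items sharing its bucket."""
    _, inverse, counts = np.unique(codes, return_inverse=True, return_counts=True)
    return counts[inverse.ravel()]


def classify(x, initial_bits=14, recovery_bits=8,
             dense_threshold=10, recovery_cluster_min=3):
    """Label each row 'dense', 'bridge' or 'orphan' (hypothetical labels)."""
    status = np.full(len(x), "orphan", dtype=object)
    # Pass 1: items in large enough initial buckets are dense.
    dense = bucket_sizes(pca_lsh_codes(x, initial_bits)) >= dense_threshold
    status[dense] = "dense"
    # Pass 2: re-fit PCA on the sparse items only and hash them more coarsely.
    sparse_idx = np.flatnonzero(~dense)
    if sparse_idx.size > 0:
        rec_sizes = bucket_sizes(pca_lsh_codes(x[sparse_idx], recovery_bits))
        # Sparse items that land in a big enough recovery bucket become bridges;
        # the rest stay orphans.
        status[sparse_idx[rec_sizes >= recovery_cluster_min]] = "bridge"
    return status
```

Sign-of-projection hashing keeps each bit cheap to compute. Using principal components rather than random hyperplanes means the first bits split the data along its directions of greatest variance.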

Installation

pip install dyf

Quick Start

import numpy as np
from dyf import OutlierClassifier

# Random embeddings for illustration; real ones might come from, e.g., sentence-transformers
embeddings = np.random.randn(10000, 384).astype(np.float32)

# Classify outliers
classifier = OutlierClassifier(embedding_dim=384)
classifier.fit(embeddings)

# Get results
print(classifier.report())
bridge = classifier.get_bridge()    # Indices of bridge items
orphans = classifier.get_orphans()  # Indices of orphan items

Performance

About 60 ms to classify 60K embeddings of 384 dimensions, roughly 3.8x faster than a pure Python/scikit-learn implementation.
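For scale, the quoted figures work out as follows. This is simple arithmetic on the numbers above, not an additional measurement:

```latex
\frac{60\ \text{ms}}{60{,}000\ \text{embeddings}} = 1\ \mu\text{s per embedding} \approx 10^{6}\ \text{embeddings/s},
\qquad
3.8 \times 60\ \text{ms} \approx 228\ \text{ms for the pure Python/sklearn baseline}
```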

API

OutlierClassifier

OutlierClassifier(
    embedding_dim: int,
    initial_bits: int = 14,           # Bits for the initial PCA LSH pass
    recovery_bits: int = 8,           # Bits for the recovery PCA pass
    dense_threshold: int = 10,        # Min bucket size for "dense"
    intra_outlier_std: float = 2.0,   # Std threshold for intra-bucket outliers
    recovery_cluster_min: int = 3,    # Min cluster size for "recovered"
    seed: int = 31,                   # Random seed
)
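The parameter comments above are terse. The following sketch shows two things. The first is how many buckets a given bit count allows: a b-bit sign hash has at most 2^b codes, so `initial_bits=14` allows 16,384 buckets and `recovery_bits=8` allows 256. The second is one plausible reading of `intra_outlier_std`, in which a bucket member is flagged when its distance to the bucket centroid exceeds the mean plus `intra_outlier_std` standard deviations of all members' distances. That rule and both helper names are hypothetical and are not taken from dyf's source.

```python
import numpy as np


def max_buckets(n_bits: int) -> int:
    """Number of distinct codes an n_bits sign hash can produce."""
    return 2 ** n_bits


def intra_bucket_outliers(members: np.ndarray, intra_outlier_std: float = 2.0) -> np.ndarray:
    """Boolean mask of bucket members unusually far from the bucket centroid.

    Assumed rule (hypothetical): distance > mean + intra_outlier_std * std.
    """
    centroid = members.mean(axis=0)
    dist = np.linalg.norm(members - centroid, axis=1)
    return dist > dist.mean() + intra_outlier_std * dist.std()
```

With the Quick Start's 10,000 embeddings and 16,384 possible initial buckets, average occupancy is under one item per bucket (10,000 / 16,384 ≈ 0.61). A bucket therefore only reaches `dense_threshold=10` where embeddings genuinely concentrate.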

Methods:

  • fit(embeddings) - Fit on a NumPy array of shape (n_samples, embedding_dim)
  • fit_arrow(arrow_array) - Fit on a PyArrow FixedSizeListArray (zero-copy)
  • get_bridge() - Get indices of bridge items
  • get_orphans() - Get indices of orphan items
  • get_statuses() - Get the status of every item
  • report() - Get a classification report
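If you need all three groups at once, the per-item output of `get_statuses()` can be grouped in a single pass. The sketch below assumes only that it yields one label per input row, in row order. The label values in the example ("dense", "bridge", "orphan") are placeholders, since the exact type dyf returns is not documented here, and `group_by_status` is a hypothetical helper.

```python
from collections import defaultdict
from collections.abc import Hashable, Iterable

import numpy as np


def group_by_status(statuses: Iterable[Hashable]) -> dict[Hashable, np.ndarray]:
    """Map each distinct status label to an ascending array of row indices."""
    groups: dict[Hashable, list[int]] = defaultdict(list)
    for idx, label in enumerate(statuses):
        groups[label].append(idx)
    return {label: np.asarray(idxs, dtype=np.int64) for label, idxs in groups.items()}


# Placeholder labels standing in for classifier.get_statuses() output.
groups = group_by_status(["dense", "orphan", "dense", "bridge"])
```

Because labels are used only as dictionary keys, the helper works whether the statuses are strings, integers, or enum members.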

License

MIT

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • dyf_rs-0.2.1-cp311-cp311-win_amd64.whl (544.7 kB): CPython 3.11, Windows x86-64
  • dyf_rs-0.2.1-cp311-cp311-manylinux_2_28_x86_64.whl (13.0 MB): CPython 3.11, manylinux (glibc 2.28+) x86-64
  • dyf_rs-0.2.1-cp311-cp311-manylinux_2_28_aarch64.whl (5.9 MB): CPython 3.11, manylinux (glibc 2.28+) ARM64
  • dyf_rs-0.2.1-cp311-cp311-macosx_11_0_arm64.whl (654.5 kB): CPython 3.11, macOS 11.0+ ARM64
  • dyf_rs-0.2.1-cp311-cp311-macosx_10_12_x86_64.whl (691.7 kB): CPython 3.11, macOS 10.12+ x86-64

File details

Details for the file dyf_rs-0.2.1-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: dyf_rs-0.2.1-cp311-cp311-win_amd64.whl
  • Upload date:
  • Size: 544.7 kB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dyf_rs-0.2.1-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 a0109ab73c6d68f8194b048a5cdb0bed93480510948b72110167a0da74be8774
MD5 910b44703436ab60d508cfebc3f4c09c
BLAKE2b-256 9c3691a35d40c73086603fc0cc66e1a87ccaf5aa42543a88eb7c82dc3d78df12

Provenance

The following attestation bundles were made for dyf_rs-0.2.1-cp311-cp311-win_amd64.whl:

Publisher: build-wheels.yml on jdonaldson/dyf-core

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dyf_rs-0.2.1-cp311-cp311-manylinux_2_28_x86_64.whl.

File hashes

Hashes for dyf_rs-0.2.1-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 ff5851011a3e68683569e5e3c9281cf127033ea4f7c862a4c41ea69935204d3c
MD5 247d4c243d1b5a65d09c0faac13b54e2
BLAKE2b-256 903b1022413c3c970f0fa369d4ba7ef462c7112d4c46f9102f69ff202a619afc

Provenance

The following attestation bundles were made for dyf_rs-0.2.1-cp311-cp311-manylinux_2_28_x86_64.whl:

Publisher: build-wheels.yml on jdonaldson/dyf-core

File details

Details for the file dyf_rs-0.2.1-cp311-cp311-manylinux_2_28_aarch64.whl.

File hashes

Hashes for dyf_rs-0.2.1-cp311-cp311-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 200e3abc06b29647332a260410da5f97d71af3b6e7b6d8d0f7a3e471d1f16daa
MD5 68ca54ec0c0dc871075689d2b45e6c80
BLAKE2b-256 0c713b6bcb3ea71c161fc68aebd2a40c24249fbd2d6671fb3cb7f7c0c881a837

Provenance

The following attestation bundles were made for dyf_rs-0.2.1-cp311-cp311-manylinux_2_28_aarch64.whl:

Publisher: build-wheels.yml on jdonaldson/dyf-core

File details

Details for the file dyf_rs-0.2.1-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for dyf_rs-0.2.1-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 6e380dd8b3c23426d441adbe81148f84ce80941c4059087d0124b23f4d13cadd
MD5 1575e968ed92571bfc6a44a5db1f2127
BLAKE2b-256 b97fb74a51285620bcb28f00f9f894c6725479b479933f01c407ce28856bebf8

Provenance

The following attestation bundles were made for dyf_rs-0.2.1-cp311-cp311-macosx_11_0_arm64.whl:

Publisher: build-wheels.yml on jdonaldson/dyf-core

File details

Details for the file dyf_rs-0.2.1-cp311-cp311-macosx_10_12_x86_64.whl.

File hashes

Hashes for dyf_rs-0.2.1-cp311-cp311-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 27001adbad95f878ca8ad2f320a16a1fd8aa2eb0e1a6bbe8971e0ca67351395a
MD5 3a723ca4244f2f8c1559bafe57a80170
BLAKE2b-256 05bea981269d0a28a80be1ab47d42d0ac0b9b4cc8d07b3a6baa1227b09ece85f

Provenance

The following attestation bundles were made for dyf_rs-0.2.1-cp311-cp311-macosx_10_12_x86_64.whl:

Publisher: build-wheels.yml on jdonaldson/dyf-core
