Density Yields Features - Rust core for structure discovery in embedding spaces
DYF - Outlier Classification
Fast outlier classification using PCA-based locality-sensitive hashing (LSH). It sorts the items of an embedding space into three types:
- Dense: Items in well-populated semantic buckets
- Bridge: Sparse items that find community via recovery PCA (connect clusters)
- Orphan: Truly unique items with no semantic neighbors
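To make the idea of "well-populated buckets" concrete, here is a minimal NumPy sketch of PCA-based LSH bucketing. It is an illustration under stated assumptions, not the Rust core's actual implementation: the helper names `pca_lsh_codes` and `dense_mask` are hypothetical, and the sketch assumes each hash bit is the sign of a projection onto one principal component. It covers only the dense/sparse split and does not attempt the recovery step that separates bridge items from orphans.

```python
import numpy as np

def pca_lsh_codes(embeddings: np.ndarray, n_bits: int) -> np.ndarray:
    """Hash each row to an integer code built from the signs of its top PCA projections."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projections = centered @ vt[:n_bits].T            # shape (n_samples, n_bits)
    bits = (projections > 0).astype(np.int64)         # one sign bit per component
    weights = 1 << np.arange(n_bits, dtype=np.int64)  # 1, 2, 4, ...
    return bits @ weights                             # pack the bits into a bucket id

def dense_mask(codes: np.ndarray, dense_threshold: int) -> np.ndarray:
    """True for items whose bucket holds at least `dense_threshold` items."""
    _, inverse, counts = np.unique(codes, return_inverse=True, return_counts=True)
    return counts[inverse] >= dense_threshold

rng = np.random.default_rng(0)
# Two tight synthetic clusters plus a handful of scattered points.
cluster_a = rng.normal(loc=5.0, scale=0.1, size=(200, 16))
cluster_b = rng.normal(loc=-5.0, scale=0.1, size=(200, 16))
scattered = rng.normal(loc=0.0, scale=10.0, size=(5, 16))
data = np.vstack([cluster_a, cluster_b, scattered]).astype(np.float32)

codes = pca_lsh_codes(data, n_bits=4)
is_dense = dense_mask(codes, dense_threshold=10)
print(f"{is_dense.sum()} of {len(data)} items fall in dense buckets")
```

Items outside dense buckets are the candidates that a second, coarser pass could try to regroup, which is roughly the role the description above gives to recovery PCA.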
Installation
pip install dyf
Quick Start
import numpy as np
from dyf import OutlierClassifier
# Create embeddings (e.g., from sentence-transformers)
embeddings = np.random.randn(10000, 384).astype(np.float32)
# Classify outliers
classifier = OutlierClassifier(embedding_dim=384)
classifier.fit(embeddings)
# Get results
print(classifier.report())
bridge = classifier.get_bridge()    # Indices of bridge items
orphans = classifier.get_orphans()  # Indices of orphan items
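Once the indices are in hand, a common next step is to pull out the matching rows. The sketch below assumes that `get_bridge()` and `get_orphans()` return integer positions into the original array (as the comments above suggest) and that they can be passed to `np.asarray`; the exact return type is not documented here. The helpers `select_rows` and `remaining_indices` are hypothetical, and hard-coded lists stand in for the classifier output so the example runs on its own.

```python
import numpy as np

def select_rows(embeddings: np.ndarray, indices) -> np.ndarray:
    """Return the rows of `embeddings` at the given integer positions."""
    idx = np.asarray(indices, dtype=np.int64)
    return embeddings[idx]

def remaining_indices(n_items: int, *index_groups) -> np.ndarray:
    """Positions that appear in none of the given index groups."""
    mask = np.ones(n_items, dtype=bool)
    for group in index_groups:
        mask[np.asarray(group, dtype=np.int64)] = False
    return np.flatnonzero(mask)

embeddings = np.arange(24, dtype=np.float32).reshape(8, 3)
bridge = [1, 4]   # stand-in for classifier.get_bridge()
orphans = [6]     # stand-in for classifier.get_orphans()

bridge_vectors = select_rows(embeddings, bridge)
orphan_vectors = select_rows(embeddings, orphans)
rest = remaining_indices(len(embeddings), bridge, orphans)
print(bridge_vectors.shape, orphan_vectors.shape, rest)
```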
Performance
About 60 ms for 60K embeddings of 384 dimensions, roughly 3.8x faster than a pure Python/sklearn implementation.
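Timings depend on hardware, so it can be worth measuring on your own machine. The helper below is a generic sketch, not part of dyf: `median_ms` is a hypothetical name, and a cheap matrix product stands in for the real workload so the snippet runs without the package installed. To time the classifier, pass a callable that constructs `OutlierClassifier` and calls `fit` instead.

```python
import statistics
import time

import numpy as np

def median_ms(fn, repeats: int = 5) -> float:
    """Run `fn` several times and return the median wall-clock time in milliseconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)

rng = np.random.default_rng(0)
# Smaller than the 60K benchmark above to keep the sketch light.
embeddings = rng.standard_normal((6_000, 384), dtype=np.float32)

# Stand-in workload; replace with e.g.
#   lambda: OutlierClassifier(embedding_dim=384).fit(embeddings)
elapsed = median_ms(lambda: embeddings @ embeddings[:16].T)
print(f"median: {elapsed:.2f} ms")
```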
API
OutlierClassifier
OutlierClassifier(
    embedding_dim: int,
    initial_bits: int = 14,          # Bits for initial PCA LSH
    recovery_bits: int = 8,          # Bits for recovery PCA
    dense_threshold: int = 10,       # Min bucket size for "dense"
    intra_outlier_std: float = 2.0,  # Std threshold for intra-bucket outliers
    recovery_cluster_min: int = 3,   # Min cluster size for "recovered"
    seed: int = 31
)
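The bit parameters are easier to reason about as bucket counts. The arithmetic below assumes each bit is one binary hash bit, so `b` bits give at most 2^b buckets; that is the usual reading of an LSH bit width, though it is not confirmed here for this library. The function names are hypothetical.

```python
def bucket_count(n_bits: int) -> int:
    """Upper bound on the number of distinct buckets for a given bit width."""
    return 2 ** n_bits

def mean_bucket_load(n_items: int, n_bits: int) -> float:
    """Average items per bucket if items were spread evenly over all buckets."""
    return n_items / bucket_count(n_bits)

initial_buckets = bucket_count(14)    # default initial_bits
recovery_buckets = bucket_count(8)    # default recovery_bits
even_load = mean_bucket_load(60_000, 14)

print(f"initial_bits=14  -> up to {initial_buckets} buckets")
print(f"recovery_bits=8  -> up to {recovery_buckets} buckets")
print(f"60K items spread evenly over the initial buckets: {even_load:.2f} per bucket")
```

Under an even spread, 60K items would average fewer than four per initial bucket, below the default `dense_threshold` of 10. Buckets only qualify as dense where the data actually concentrates, and the coarser recovery pass uses far fewer buckets, which gives sparse items a better chance of landing together.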
Methods:
- fit(embeddings) - Fit on numpy array (n_samples, embedding_dim)
- fit_arrow(arrow_array) - Fit on PyArrow FixedSizeListArray (zero-copy)
- get_bridge() - Get indices of bridge items
- get_orphans() - Get indices of orphan items
- get_statuses() - Get status for all items
- report() - Get classification report
License
MIT