Density Yields Features - Rust core for structure discovery in embedding spaces

DYF-RS - Density Yields Features (Rust Core)

Rust-accelerated core for DYF. Discover structure in embedding spaces using density-based locality-sensitive hashing (LSH). Each item is classified as one of:

  • Dense: Core items in well-populated semantic regions
  • Bridge: Transitional items connecting different clusters
  • Orphan: Unique items with no semantic neighbors
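To make the three categories concrete, here is a minimal NumPy sketch of one way density-based LSH classification could work. It is not the dyf-rs implementation: the function name `classify_by_density`, the random-hyperplane hashing, and the rule "dense if the fine bucket is well populated, bridge if only the coarse bucket is, orphan otherwise" are all assumptions made for illustration. Only the parameter names and defaults are taken from the API section.

```python
import numpy as np


def classify_by_density(embeddings, initial_bits=14, recovery_bits=8,
                        dense_threshold=10, seed=31):
    """Hypothetical sketch, not the dyf-rs algorithm.

    Returns one label per row: 'dense', 'bridge', or 'orphan'.
    """
    rng = np.random.default_rng(seed)
    n_items, dim = embeddings.shape

    def bucket_sizes(bits):
        # Random-hyperplane LSH: one sign bit per hyperplane,
        # packed into a single integer bucket id per item.
        planes = rng.standard_normal((dim, bits))
        signs = (embeddings @ planes) > 0
        ids = signs.astype(np.int64) @ (1 << np.arange(bits, dtype=np.int64))
        # For every item, the number of items sharing its bucket.
        _, inverse, counts = np.unique(ids, return_inverse=True,
                                       return_counts=True)
        return counts[inverse.ravel()]

    fine = bucket_sizes(initial_bits)      # high-resolution pass
    coarse = bucket_sizes(recovery_bits)   # coarser "recovery" pass

    # Assumed rule: start as orphan, promote to bridge when the coarse
    # bucket is well populated, and to dense when the fine bucket is.
    labels = np.full(n_items, "orphan", dtype=object)
    labels[coarse >= dense_threshold] = "bridge"
    labels[fine >= dense_threshold] = "dense"
    return labels
```

Under these assumptions, identical vectors always share a bucket, so a group of at least `dense_threshold` duplicates is labelled dense, while a corpus smaller than `dense_threshold` can only contain orphans.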

Installation

pip install dyf-rs

For the full Python package with serialization, embedding generation, and LLM labeling:

pip install dyf

Quick Start

import numpy as np
from dyf_rs import DensityClassifier

# Your embeddings (e.g., from sentence-transformers)
embeddings = np.random.randn(10000, 384).astype(np.float32)

# Find structure
classifier = DensityClassifier(embedding_dim=384)
classifier.fit(embeddings)

# What did we find?
print(classifier.report())
# Corpus: 10000 items
#   Dense: 9500 (95.0%)
#   Bridge: 450 (4.5%)
#   Orphan: 50 (0.5%)

# Get indices
bridges = classifier.get_bridge()  # Transitional items
orphans = classifier.get_orphans() # Unique items
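The index lists are often easier to work with as a single per-item label. The helper below is a small sketch of that conversion. `labels_from_indices` is a hypothetical name, not part of dyf-rs, and it assumes the three getters return integer row indices that are disjoint and together cover the corpus, as the counts in the report above suggest.

```python
def labels_from_indices(n_items, dense, bridge, orphans):
    """Build one label per item from three index collections.

    Hypothetical helper: raises ValueError if an index appears in two
    categories or if some item is left without a category.
    """
    labels = [None] * n_items
    groups = (("dense", dense), ("bridge", bridge), ("orphan", orphans))
    for name, indices in groups:
        for i in indices:
            i = int(i)  # accept NumPy integers as well as plain ints
            if labels[i] is not None:
                raise ValueError(
                    f"item {i} is in both {labels[i]!r} and {name!r}")
            labels[i] = name
    missing = sum(label is None for label in labels)
    if missing:
        raise ValueError(f"{missing} items have no category")
    return labels


# Stand-in index lists; with a fitted classifier these would come from
# classifier.get_dense(), classifier.get_bridge(), classifier.get_orphans().
print(labels_from_indices(4, [0, 1], [2], [3]))
```

The explicit checks are there because the partition property is an assumption here; if it does not hold for your data, the error message says which item is affected.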

Performance

Dataset                  Time     Per item
60K embeddings (384d)    ~60 ms   1.0 µs

Roughly 4x faster than a pure Python/sklearn implementation.
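The per-item figure is the total time divided by the number of items (60 ms / 60,000 = 1.0 µs). To reproduce a measurement like this on your own hardware, a small timing harness is enough. The sketch below is an assumption about how one might measure it, not the benchmark behind the numbers above; `per_item_us` and `time_per_item` are hypothetical helper names.

```python
import time


def per_item_us(total_seconds, n_items):
    """Convert a total wall-clock time into microseconds per item."""
    return total_seconds / n_items * 1e6


def time_per_item(fit, data, n_items, repeats=5):
    """Call fit(data) several times and report the fastest run.

    Returns (best_seconds, microseconds_per_item).
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fit(data)
        best = min(best, time.perf_counter() - start)
    return best, per_item_us(best, n_items)


# The table's arithmetic: 60 ms over 60,000 items.
print(round(per_item_us(0.060, 60_000), 3))
```

To time the classifier, pass a callable that builds and fits a fresh instance, for example `lambda x: DensityClassifier(embedding_dim=384).fit(x)`, so that every repeat starts from the same state.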

API

DensityClassifier

DensityClassifier(
    embedding_dim: int,
    initial_bits: int = 14,      # LSH resolution
    recovery_bits: int = 8,      # Coarser recovery resolution
    dense_threshold: int = 10,   # Min bucket size for "dense"
    seed: int = 31
)

# Methods
classifier.fit(embeddings)
classifier.fit_arrow(arrow_array)  # Zero-copy from PyArrow
classifier.get_dense()             # Dense item indices
classifier.get_bridge()            # Bridge item indices
classifier.get_orphans()           # Orphan item indices
classifier.get_bucket_id(idx)      # Which bucket is item in?
classifier.report()                # Summary statistics
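When tuning initial_bits and recovery_bits, it helps to compare the number of buckets with the corpus size. The sketch below assumes that a resolution of b bits yields up to 2^b buckets, which is the usual reading of an LSH bit count but is not confirmed by this README. `mean_bucket_occupancy` is a hypothetical helper.

```python
def mean_bucket_occupancy(n_items, bits):
    """Average items per bucket if items spread evenly over 2**bits buckets."""
    return n_items / (1 << bits)


# Defaults from the constructor above, for a 10,000-item corpus.
for bits in (14, 8):
    print(bits, 1 << bits, round(mean_bucket_occupancy(10_000, bits), 2))
```

Under that assumption, 14 bits gives 16,384 buckets (about 0.61 items per bucket on average for 10,000 items) and 8 bits gives 256 buckets (about 39 per bucket). Real embeddings are not spread evenly, so these averages are only a rough guide when choosing dense_threshold.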

See Also

  • dyf - Full Python package with serialization, configs, and LLM labeling
  • Curvo FDA Navigator - DYF in action on 2.69M FDA medical devices

License

Proprietary

Built Distributions

  • dyf_rs-0.3.0-cp311-cp311-win_amd64.whl (545.0 kB): CPython 3.11, Windows x86-64
  • dyf_rs-0.3.0-cp311-cp311-manylinux_2_28_x86_64.whl (13.0 MB): CPython 3.11, manylinux (glibc 2.28+), x86-64
  • dyf_rs-0.3.0-cp311-cp311-manylinux_2_28_aarch64.whl (5.9 MB): CPython 3.11, manylinux (glibc 2.28+), ARM64
  • dyf_rs-0.3.0-cp311-cp311-macosx_11_0_arm64.whl (655.2 kB): CPython 3.11, macOS 11.0+, ARM64
  • dyf_rs-0.3.0-cp311-cp311-macosx_10_12_x86_64.whl (691.9 kB): CPython 3.11, macOS 10.12+, x86-64
