Density Yields Features - Rust core for structure discovery in embedding spaces

DYF-RS - Density Yields Features (Rust Core)

Rust-accelerated core for DYF. It discovers structure in embedding spaces using density-based locality-sensitive hashing (LSH), sorting each item into one of three categories:

  • Dense: Core items in well-populated semantic regions
  • Bridge: Transitional items connecting different clusters
  • Orphan: Unique items with no semantic neighbors
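
To make the three categories concrete, here is a minimal sketch of density-based LSH using random hyperplanes. It is not the dyf-rs implementation: this page does not describe how bridges and orphans are actually decided, so the two-resolution rule below (dense at fine resolution, otherwise "bridge" if the coarser bucket is well populated, otherwise "orphan") is an assumption. The function names `make_hyperplanes`, `bucket_id`, and `classify` are hypothetical; only the parameter names and defaults are borrowed from the `DensityClassifier` signature.

```python
import random
from collections import Counter


def make_hyperplanes(dim, bits, seed):
    """Draw `bits` random Gaussian hyperplanes in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def bucket_id(vec, planes):
    """Pack the sign of each projection into one integer bucket id."""
    bucket = 0
    for plane in planes:
        dot = sum(v * p for v, p in zip(vec, plane))
        bucket = (bucket << 1) | (1 if dot >= 0.0 else 0)
    return bucket


def classify(vectors, initial_bits=14, recovery_bits=8,
             dense_threshold=10, seed=31):
    """Label each vector 'dense', 'bridge', or 'orphan' (assumed rule)."""
    if recovery_bits > initial_bits:
        raise ValueError("recovery_bits must not exceed initial_bits")
    planes = make_hyperplanes(len(vectors[0]), initial_bits, seed)
    fine = [bucket_id(v, planes) for v in vectors]
    # Dropping the low bits keeps only the first `recovery_bits` hyperplanes,
    # which merges neighbouring fine buckets into coarser ones.
    coarse = [b >> (initial_bits - recovery_bits) for b in fine]
    fine_counts, coarse_counts = Counter(fine), Counter(coarse)

    labels = []
    for f, c in zip(fine, coarse):
        if fine_counts[f] >= dense_threshold:
            labels.append("dense")
        elif coarse_counts[c] >= dense_threshold:
            labels.append("bridge")   # assumption: recovered at coarse resolution
        else:
            labels.append("orphan")   # assumption: sparse at both resolutions
    return labels


# Twenty identical vectors share one bucket; their mirror image lands alone.
cluster = [[1.0, 2.0, 3.0]] * 20
outlier = [[-1.0, -2.0, -3.0]]
labels = classify(cluster + outlier)
print(Counter(labels))
```

The toy data is chosen so the outcome does not depend on the random hyperplanes: identical vectors always hash together, and a negated vector flips every sign bit, so it cannot share a bucket with the cluster at either resolution.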

Installation

pip install dyf-rs

For the full Python package with serialization, embedding generation, and LLM labeling:

pip install dyf

Quick Start

import numpy as np
from dyf_rs import DensityClassifier

# Your embeddings (e.g., from sentence-transformers)
embeddings = np.random.randn(10000, 384).astype(np.float32)

# Find structure
classifier = DensityClassifier(embedding_dim=384)
classifier.fit(embeddings)

# Summarize the classification
print(classifier.report())
# Example output (illustrative; actual counts depend on your data):
# Corpus: 10000 items
#   Dense: 9500 (95.0%)
#   Bridge: 450 (4.5%)
#   Orphan: 50 (0.5%)

# Get indices
bridges = classifier.get_bridge()  # Transitional items
orphans = classifier.get_orphans() # Unique items
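
The index lists are most useful once mapped back to the items the embeddings came from. The sketch below assumes the getters return a sequence of integer row positions (the page only says "indices") and uses hard-coded stand-in lists instead of a fitted classifier; `take` is a hypothetical helper, not part of dyf-rs.

```python
def take(items, indices):
    """Return the items at the given positions, preserving index order."""
    return [items[int(i)] for i in indices]


# Stand-in data. In practice `texts` is the list the embeddings were built
# from, and the index lists would come from classifier.get_bridge() and
# classifier.get_orphans().
texts = ["alpha", "beta", "gamma", "delta", "epsilon"]
bridge_idx = [1, 3]   # hypothetical result
orphan_idx = [4]      # hypothetical result

bridge_texts = take(texts, bridge_idx)
orphan_texts = take(texts, orphan_idx)
print(bridge_texts)
print(orphan_texts)
```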

Performance

Dataset                  Time      Per item
60K embeddings (384d)    ~60 ms    1.0 µs

Roughly 4x faster than a pure Python/sklearn implementation.
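
The per-item figure is the total time divided by the item count: 60 ms / 60,000 items = 1.0 µs. A small harness along these lines can check the number on your own hardware. The helper names `best_of` and `per_item_microseconds` are hypothetical, and the workload here is a stand-in; to time the real thing, pass something like `lambda: classifier.fit(embeddings)`.

```python
import time


def per_item_microseconds(total_seconds, n_items):
    """Convert a total wall-clock time into microseconds per item."""
    return total_seconds / n_items * 1e6


def best_of(fn, repeats=5):
    """Run `fn` several times and return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


# The figure quoted in the table: ~60 ms for 60K embeddings.
quoted = per_item_microseconds(0.060, 60_000)
print(f"quoted: {quoted:.1f} µs per item")

# Stand-in workload so the sketch runs without dyf-rs installed.
n_items = 60_000
elapsed = best_of(lambda: sum(range(n_items)))
print(f"stand-in: {per_item_microseconds(elapsed, n_items):.3f} µs per item")
```

Taking the best of several runs reduces noise from warm-up and background load; results will still vary by machine.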

API

DensityClassifier

DensityClassifier(
    embedding_dim: int,
    initial_bits: int = 14,      # LSH resolution
    recovery_bits: int = 8,      # Coarser recovery resolution
    dense_threshold: int = 10,   # Min bucket size for "dense"
    seed: int = 31
)

# Methods
classifier.fit(embeddings)
classifier.fit_arrow(arrow_array)  # Zero-copy from PyArrow
classifier.get_dense()             # Dense item indices
classifier.get_bridge()            # Bridge item indices
classifier.get_orphans()           # Orphan item indices
classifier.get_bucket_id(idx)      # Which bucket is item in?
classifier.report()                # Summary statistics
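
When one label per item is more convenient than three index lists, the getters' results can be merged. This sketch assumes the three categories partition the corpus, which the example report suggests (9500 + 450 + 50 = 10000) but this page does not state outright; `labels_from_indices` is a hypothetical helper and the index lists are stand-ins for `get_dense()`, `get_bridge()`, and `get_orphans()`.

```python
def labels_from_indices(n_items, dense, bridge, orphan):
    """Build one label per item and verify the categories form a partition."""
    labels = [None] * n_items
    groups = (("dense", dense), ("bridge", bridge), ("orphan", orphan))
    for name, indices in groups:
        for i in indices:
            i = int(i)
            if labels[i] is not None:
                raise ValueError(f"item {i} is in both {labels[i]!r} and {name!r}")
            labels[i] = name
    missing = [i for i, label in enumerate(labels) if label is None]
    if missing:
        raise ValueError(f"{len(missing)} items have no category")
    return labels


# Stand-in index lists for a five-item corpus.
labels = labels_from_indices(5, dense=[0, 1, 2], bridge=[3], orphan=[4])
print(labels)
```

If the partition assumption turns out not to hold for your version of the library, the `ValueError` makes that visible immediately.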

See Also

  • dyf - Full Python package with serialization, configs, and LLM labeling
  • Curvo FDA Navigator - DYF in action on 2.69M FDA medical devices

License

Proprietary

Download files

Source Distribution

dyf_rs-0.5.0.tar.gz (47.3 kB)

Uploaded Source

Built Distribution

dyf_rs-0.5.0-cp312-cp312-macosx_11_0_arm64.whl (748.5 kB)

Uploaded CPython 3.12, macOS 11.0+ ARM64

File details

Details for the file dyf_rs-0.5.0.tar.gz.

File metadata

  • Download URL: dyf_rs-0.5.0.tar.gz
  • Upload date:
  • Size: 47.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.12.4

File hashes

Hashes for dyf_rs-0.5.0.tar.gz
Algorithm      Hash digest
SHA256         16b6793bbc7036a2caab205f0e7ee5ea1728897b2741e08b6c942ed9243116df
MD5            d58639371b27209f712cb647a83c7e3c
BLAKE2b-256    25c5327b8fdbe2a711fd5f734419dcc76c6eb3fb56512b8819dbde03c36d5eca

File details

Details for the file dyf_rs-0.5.0-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for dyf_rs-0.5.0-cp312-cp312-macosx_11_0_arm64.whl
Algorithm      Hash digest
SHA256         a8d3e2c72bc5a5c97c5ad17dd12105ccf3755f00c5f995b958890300778b84d1
MD5            1d17948a867fdbe5211cab1d2d94982a
BLAKE2b-256    4cda3407e346e3fd711c0c83c29049b85c8ea0b7f27db3f6f09f07c5a5467ab2
