
Density Yields Features - Rust core for structure discovery in embedding spaces


DYF-RS - Density Yields Features (Rust Core)

Rust-accelerated core for DYF. It discovers structure in embedding spaces using density-based locality-sensitive hashing (LSH) and sorts each item into one of three classes:

  • Dense: Core items in well-populated semantic regions
  • Bridge: Transitional items connecting different clusters
  • Orphan: Unique items with no semantic neighbors
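
To make the three classes concrete, here is a minimal NumPy sketch of one way a density-based LSH classifier could work. It is not the dyf-rs implementation: the random-hyperplane hashing and the two-pass rule (fine buckets first, then a coarser "recovery" pass) are assumptions, and the parameter names are borrowed from the constructor defaults listed under API only for readability.

```python
import numpy as np


def lsh_codes(x: np.ndarray, n_bits: int, seed: int) -> np.ndarray:
    """Hash each row to an integer bucket id using random hyperplanes."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((x.shape[1], n_bits))
    bits = (x @ planes) > 0                       # sign pattern, shape (n, n_bits)
    weights = 1 << np.arange(n_bits, dtype=np.int64)
    return bits.astype(np.int64) @ weights        # pack the bits into one integer


def classify(x, initial_bits=14, recovery_bits=8, dense_threshold=10, seed=31):
    """Return 'dense' / 'bridge' / 'orphan' labels (illustrative rule only)."""
    labels = np.full(len(x), "orphan", dtype=object)

    # Pass 1: fine-grained buckets. Items in well-populated buckets are dense.
    fine = lsh_codes(x, initial_bits, seed)
    _, inverse, counts = np.unique(fine, return_inverse=True, return_counts=True)
    dense = counts[inverse] >= dense_threshold
    labels[dense] = "dense"

    # Pass 2: coarser buckets. Leftover items that share a well-populated
    # coarse bucket are treated as bridges; the rest stay orphans.
    coarse = lsh_codes(x, recovery_bits, seed + 1)
    _, inverse, counts = np.unique(coarse, return_inverse=True, return_counts=True)
    recovered = ~dense & (counts[inverse] >= dense_threshold)
    labels[recovered] = "bridge"
    return labels
```

Under this assumed rule, more bits mean more and smaller buckets, so fewer items clear the dense threshold on the first pass; the coarser second pass gives near-misses a chance to be picked up as bridges.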

Installation

pip install dyf-rs

For the full Python package with serialization, embedding generation, and LLM labeling:

pip install dyf

Quick Start

import numpy as np
from dyf_rs import DensityClassifier

# Your embeddings (e.g., from sentence-transformers)
embeddings = np.random.randn(10000, 384).astype(np.float32)

# Find structure
classifier = DensityClassifier(embedding_dim=384)
classifier.fit(embeddings)

# What did we find?
print(classifier.report())
# Example output (illustrative; actual counts depend on your data):
# Corpus: 10000 items
#   Dense: 9500 (95.0%)
#   Bridge: 450 (4.5%)
#   Orphan: 50 (0.5%)

# Get indices
bridges = classifier.get_bridge()  # Transitional items
orphans = classifier.get_orphans() # Unique items
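
Index lists like these are often easier to work with as one label per item. The helper below is a hypothetical convenience, not part of dyf-rs. It assumes the returned values are integer positions into the embeddings array and that every item not listed as bridge or orphan is dense, which matches the report totals above.

```python
from collections import Counter
from typing import Iterable, List


def labels_from_indices(n_items: int, bridge: Iterable[int], orphans: Iterable[int]) -> List[str]:
    """Build one label per item; anything not listed as bridge or orphan is dense."""
    labels = ["dense"] * n_items
    for i in bridge:
        labels[int(i)] = "bridge"
    for i in orphans:
        labels[int(i)] = "orphan"
    return labels


# Stand-in index lists; in practice these would come from
# classifier.get_bridge() and classifier.get_orphans().
labels = labels_from_indices(10, bridge=[2, 5], orphans=[9])
print(Counter(labels))
```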

Performance

Dataset               | Time   | Per item
60K embeddings (384d) | ~60 ms | ~1.0 µs

Roughly 4x faster than a pure Python/scikit-learn implementation.
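
To check figures like these on your own hardware, a small timing helper is enough. The function below is a hypothetical sketch, not part of dyf-rs: it times any zero-argument callable and reports the best wall-clock time per item.

```python
import time
from typing import Callable


def per_item_microseconds(fn: Callable[[], object], n_items: int, repeats: int = 5) -> float:
    """Run fn several times and return the best wall-clock time per item, in µs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best / n_items * 1e6


# Sanity check on the table: 60 ms spread over 60,000 items, in µs per item.
table_per_item_us = 0.060 / 60_000 * 1e6
```

To time a fit, pass something like `lambda: DensityClassifier(embedding_dim=384).fit(embeddings)` as `fn` and the number of rows as `n_items`.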

API

DensityClassifier

DensityClassifier(
    embedding_dim: int,
    initial_bits: int = 14,      # LSH resolution
    recovery_bits: int = 8,      # Coarser recovery resolution
    dense_threshold: int = 10,   # Min bucket size for "dense"
    seed: int = 31
)

# Methods
classifier.fit(embeddings)
classifier.fit_arrow(arrow_array)  # Zero-copy from PyArrow
classifier.get_dense()             # Dense item indices
classifier.get_bridge()            # Bridge item indices
classifier.get_orphans()           # Orphan item indices
classifier.get_bucket_id(idx)      # Which bucket is item in?
classifier.report()                # Summary statistics
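
As a usage sketch for `get_bucket_id`, the helper below groups item indices by bucket. It is hypothetical and not part of dyf-rs, and it assumes only that the bucket id is a hashable value such as an integer. A toy function stands in for `classifier.get_bucket_id` so the example runs without a fitted classifier.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List


def group_by_bucket(
    indices: Iterable[int], bucket_of: Callable[[int], Hashable]
) -> Dict[Hashable, List[int]]:
    """Map each bucket id to the list of item indices that fall in it."""
    groups = defaultdict(list)
    for i in indices:
        groups[bucket_of(int(i))].append(int(i))
    return dict(groups)


def toy_bucket_of(idx: int) -> int:
    """Stand-in for classifier.get_bucket_id: a toy rule, idx // 3."""
    return idx // 3


groups = group_by_bucket(range(7), toy_bucket_of)
```

With a fitted classifier, `group_by_bucket(classifier.get_bridge(), classifier.get_bucket_id)` would show which bridge items share a bucket, provided the assumption about the return type holds.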

See Also

  • dyf - Full Python package with serialization, configs, and LLM labeling
  • Curvo FDA Navigator - DYF in action on 2.69M FDA medical devices

License

Proprietary

Download files


Source Distribution

dyf_rs-0.4.0.tar.gz (40.2 kB)

Built Distribution

dyf_rs-0.4.0-cp311-cp311-macosx_11_0_arm64.whl (654.9 kB): CPython 3.11, macOS 11.0+, ARM64

