
sdf-sampler

Auto-analysis and sampling of point clouds for SDF (Signed Distance Field) training data generation.

A lightweight, standalone Python package for generating SDF training hints from point clouds. Automatically detects SOLID (inside) and EMPTY (outside) regions and generates training samples suitable for SDF regression models.

Installation

pip install sdf-sampler

For additional I/O format support (PLY, LAS/LAZ):

pip install sdf-sampler[io]

Command-Line Interface

sdf-sampler provides a CLI for common workflows:

# Run as module
python -m sdf_sampler --help

# Or use the installed command
sdf-sampler --help

Commands

pipeline - Full workflow (recommended)

Run the complete pipeline: analyze point cloud → generate samples → export.

# Basic usage
sdf-sampler pipeline scan.ply -o training_data.parquet

# With options
sdf-sampler pipeline scan.ply \
    -o training_data.parquet \
    -n 50000 \
    -s inverse_square \
    --save-constraints constraints.json \
    -v

Options:

  • -o, --output: Output parquet file (default: <input>_samples.parquet)
  • -n, --total-samples: Number of samples to generate (default: 10000)
  • -s, --strategy: Sampling strategy: constant, density, inverse_square (default: inverse_square)
  • -a, --algorithms: Specific algorithms to run (default: all)
  • --save-constraints: Also save constraints to JSON
  • --seed: Random seed for reproducibility
  • -v, --verbose: Verbose output

analyze - Detect regions

Analyze a point cloud to detect SOLID/EMPTY regions.

sdf-sampler analyze scan.ply -o constraints.json -v

Options:

  • -o, --output: Output JSON file (default: <input>_constraints.json)
  • -a, --algorithms: Algorithms to run (see below)
  • --no-hull-filter: Disable hull filtering
  • -v, --verbose: Verbose output

sample - Generate training samples

Generate training samples from a constraints file.

sdf-sampler sample scan.ply constraints.json -o samples.parquet -n 50000

Options:

  • -o, --output: Output parquet file
  • -n, --total-samples: Number of samples (default: 10000)
  • -s, --strategy: Sampling strategy (default: inverse_square)
  • --seed: Random seed
  • -v, --verbose: Verbose output

info - Inspect files

Show information about point clouds, constraints, or sample files.

sdf-sampler info scan.ply
sdf-sampler info constraints.json
sdf-sampler info samples.parquet

Python SDK

Quick Start

from sdf_sampler import SDFAnalyzer, SDFSampler, load_point_cloud

# 1. Load point cloud (supports PLY, LAS, CSV, NPZ, Parquet)
xyz, normals = load_point_cloud("scan.ply")

# 2. Auto-analyze to detect EMPTY/SOLID regions
analyzer = SDFAnalyzer()
result = analyzer.analyze(xyz=xyz, normals=normals)
print(f"Generated {len(result.constraints)} constraints")

# 3. Generate training samples
sampler = SDFSampler()
samples = sampler.generate(
    xyz=xyz,
    constraints=result.constraints,
    strategy="inverse_square",
    total_samples=50000,
)

# 4. Export to parquet
sampler.export_parquet(samples, "training_data.parquet")

SDFAnalyzer

Analyzes point clouds to detect SOLID and EMPTY regions.

from sdf_sampler import SDFAnalyzer
from sdf_sampler.config import AnalyzerConfig, AutoAnalysisOptions

# With default config
analyzer = SDFAnalyzer()

# With custom config
analyzer = SDFAnalyzer(config=AnalyzerConfig(
    min_gap_size=0.10,      # Minimum gap for flood fill
    max_grid_dim=200,       # Maximum voxel grid dimension
    cone_angle=15.0,        # Ray propagation cone angle
    hull_filter_enabled=True,  # Filter outside X-Y hull
))

# Run analysis
result = analyzer.analyze(
    xyz=xyz,                    # (N, 3) point positions
    normals=normals,            # (N, 3) point normals (optional)
    algorithms=["flood_fill", "voxel_regions"],  # Which algorithms to run
)

# Access results
print(f"Total constraints: {result.summary.total_constraints}")
print(f"SOLID: {result.summary.solid_constraints}")
print(f"EMPTY: {result.summary.empty_constraints}")

# Get constraint dicts for sampling
constraints = result.constraints

Analysis Algorithms

| Algorithm | Description | Output |
| --- | --- | --- |
| `flood_fill` | Detects EMPTY (outside) regions by ray propagation from the sky | Box or SamplePoint constraints |
| `voxel_regions` | Detects SOLID (underground) regions | Box or SamplePoint constraints |
| `normal_offset` | Generates paired SOLID/EMPTY boxes along surface normals | Box constraints |
| `normal_idw` | Inverse-distance-weighted sampling along normals | SamplePoint constraints |
| `pocket` | Detects interior cavities | Pocket constraints |

SDFSampler

Generates training samples from constraints.

from sdf_sampler import SDFSampler
from sdf_sampler.config import SamplerConfig

# With default config
sampler = SDFSampler()

# With custom config
sampler = SDFSampler(config=SamplerConfig(
    total_samples=10000,
    inverse_square_base_samples=100,
    inverse_square_falloff=2.0,
    near_band=0.02,
))

# Generate samples
samples = sampler.generate(
    xyz=xyz,                     # Point cloud for distance computation
    constraints=constraints,      # From analyzer.analyze().constraints
    strategy="inverse_square",    # Sampling strategy
    seed=42,                      # For reproducibility
)

# Export
sampler.export_parquet(samples, "output.parquet")

# Or get DataFrame
df = sampler.to_dataframe(samples)

Sampling Strategies

| Strategy | Description |
| --- | --- |
| `constant` | Fixed number of samples per constraint |
| `density` | Samples proportional to constraint volume |
| `inverse_square` | More samples near the surface, fewer far away (recommended) |
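The intuition behind `inverse_square` can be illustrated with a short NumPy sketch. This is an assumption about the general shape of the scheme (base samples scaled by an inverse power of each constraint's distance from the surface, using the documented `inverse_square_base_samples`, `inverse_square_falloff`, and `near_band` defaults), not the package's exact implementation:

```python
import numpy as np

def inverse_square_allocation(distances, base_samples=100, falloff=2.0, near_band=0.02):
    """Illustrative allocation: constraints near the surface get more samples.

    Distances are clamped to `near_band` so the weight stays finite at the surface.
    """
    d = np.maximum(np.asarray(distances, dtype=float), near_band)
    weights = base_samples / d ** falloff
    return np.round(weights).astype(int)

# Four hypothetical constraints at increasing distance from the surface (meters).
counts = inverse_square_allocation([0.02, 0.1, 0.5, 1.0])
# The nearest constraint dominates the sample budget; the farthest gets the base count.
```

In practice the per-constraint counts would also be rescaled so they sum to `total_samples`; the sketch shows only the relative falloff.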

Constraint Types

The analyzer generates various constraint types:

  • BoxConstraint: Axis-aligned bounding box
  • SphereConstraint: Spherical region
  • SamplePointConstraint: Direct point with signed distance
  • PocketConstraint: Detected cavity region

Each constraint has:

  • sign: "solid" (negative SDF) or "empty" (positive SDF)
  • weight: Sample weight (default 1.0)
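Because each constraint carries a `sign` and `weight`, the constraint list can be partitioned with plain Python. The dict layout below is hypothetical (mirroring the documented fields; the exact JSON written by `--save-constraints` may differ):

```python
# Hypothetical constraint dicts using the documented "sign" and "weight" fields.
constraints = [
    {"type": "box", "sign": "solid", "weight": 1.0},
    {"type": "box", "sign": "empty", "weight": 1.0},
    {"type": "sample_point", "sign": "empty", "weight": 0.5},
]

# Split by sign: "solid" regions map to negative SDF, "empty" to positive SDF.
solid = [c for c in constraints if c["sign"] == "solid"]
empty = [c for c in constraints if c["sign"] == "empty"]
print(f"{len(solid)} solid / {len(empty)} empty")
```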

I/O Helpers

from sdf_sampler import load_point_cloud, export_parquet

# Load various formats
xyz, normals = load_point_cloud("scan.ply")    # PLY (requires trimesh)
xyz, normals = load_point_cloud("scan.las")    # LAS/LAZ (requires laspy)
xyz, normals = load_point_cloud("scan.csv")    # CSV with x,y,z columns
xyz, normals = load_point_cloud("scan.npz")    # NumPy archive
xyz, normals = load_point_cloud("scan.parquet") # Parquet

# Export samples
export_parquet(samples, "output.parquet")
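For the NPZ path, a point cloud can be prepared with plain NumPy. Note the archive key names (`xyz`, `normals`) are an assumption about what `load_point_cloud` expects; check your data against the loader before relying on them:

```python
import os
import tempfile
import numpy as np

# Build a tiny synthetic cloud: random positions plus upward-facing normals.
rng = np.random.default_rng(0)
xyz = rng.uniform(-1.0, 1.0, size=(100, 3))
normals = np.tile([0.0, 0.0, 1.0], (100, 1))

# Save as an NPZ archive (key names assumed, not documented here).
path = os.path.join(tempfile.mkdtemp(), "scan.npz")
np.savez(path, xyz=xyz, normals=normals)

with np.load(path) as data:
    print(data["xyz"].shape, data["normals"].shape)
```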

Output Format

The exported parquet file contains columns:

| Column | Type | Description |
| --- | --- | --- |
| `x`, `y`, `z` | float | 3D position |
| `phi` | float | Signed distance (negative = solid, positive = empty) |
| `nx`, `ny`, `nz` | float | Normal vector (if available) |
| `weight` | float | Sample weight |
| `source` | string | Sample origin (e.g., `"box_solid"`, `"flood_fill_empty"`) |
| `is_surface` | bool | Whether the sample lies on the surface |
| `is_free` | bool | Whether the sample is in free space (EMPTY) |
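Because the output follows this schema, the exported file can be sliced with standard pandas tooling. A sketch on a synthetic frame with the documented columns (not real sdf-sampler output):

```python
import pandas as pd

# Three synthetic rows following the documented schema (subset of columns).
df = pd.DataFrame({
    "x": [0.0, 1.0, 2.0], "y": [0.0, 0.0, 0.0], "z": [0.0, 0.0, 0.0],
    "phi": [-0.5, 0.01, 0.8],
    "weight": [1.0, 1.0, 1.0],
    "source": ["box_solid", "flood_fill_empty", "flood_fill_empty"],
})

solid = df[df["phi"] < 0]                   # negative phi => inside (SOLID)
near_surface = df[df["phi"].abs() < 0.02]   # within the default near_band width
```

The same filters work unchanged on a real export via `pd.read_parquet("training_data.parquet")`.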

Configuration Reference

AnalyzerConfig

| Option | Default | Description |
| --- | --- | --- |
| `min_gap_size` | 0.10 | Minimum gap size for flood fill (meters) |
| `max_grid_dim` | 200 | Maximum voxel grid dimension |
| `cone_angle` | 15.0 | Ray propagation cone half-angle (degrees) |
| `normal_offset_pairs` | 40 | Number of box pairs for `normal_offset` |
| `idw_sample_count` | 1000 | Total IDW samples |
| `idw_max_distance` | 0.5 | Maximum IDW distance (meters) |
| `hull_filter_enabled` | True | Filter points outside the X-Y alpha shape |
| `hull_alpha` | 1.0 | Alpha shape parameter |
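`max_grid_dim` caps the voxel grid resolution. One plausible way such a cap works (an assumption for illustration, not the package's actual code) is to derive the voxel size from the largest bounding-box extent so no axis exceeds the limit:

```python
import numpy as np

def voxel_grid_shape(xyz, max_grid_dim=200):
    """Illustrative: pick a voxel size so no axis exceeds max_grid_dim cells."""
    extents = xyz.max(axis=0) - xyz.min(axis=0)
    voxel = extents.max() / max_grid_dim   # longest axis gets exactly the cap
    return np.ceil(extents / voxel).astype(int)

# A 10 m x 5 m x 2 m bounding box, represented by its two corner points.
cloud = np.array([[0.0, 0.0, 0.0], [10.0, 5.0, 2.0]])
shape = voxel_grid_shape(cloud)  # longest axis capped at 200 cells
```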

SamplerConfig

| Option | Default | Description |
| --- | --- | --- |
| `total_samples` | 10000 | Default total number of samples |
| `samples_per_primitive` | 100 | Samples per constraint (`constant` strategy) |
| `samples_per_cubic_meter` | 10000 | Sample density (`density` strategy) |
| `inverse_square_base_samples` | 100 | Base samples (`inverse_square` strategy) |
| `inverse_square_falloff` | 2.0 | Falloff exponent |
| `near_band` | 0.02 | Near-band width |
| `seed` | 0 | Random seed |

Integration with Ubik

sdf-sampler is the core analysis engine for Ubik, an interactive web application for SDF labeling. Use sdf-sampler directly for:

  • Automated batch processing pipelines
  • Integration into ML training workflows
  • Custom analysis scripts

License

MIT
