Ultra-fast Gaussian Splatting PLY I/O library - pure Python with NumPy and Numba

Maintainers

opsiclear

gsply

Ultra-Fast Gaussian Splatting PLY I/O Library

93M Gaussians/sec read | 57M Gaussians/sec write | Auto-optimized

Quick API Preview

from gsply import plyread, plywrite

# Read PLY file (auto-detects format, zero-copy)
data = plyread("model.ply")

# Unpack to individual arrays
means, scales, quats, opacities, sh0, shN = data.unpack()

# Write PLY file (automatically optimized)
plywrite("output.ply", data)

# Or write with individual arrays
plywrite("output.ply", means, scales, quats, opacities, sh0, shN)

Performance: 93M Gaussians/sec read, 57M Gaussians/sec write (400K Gaussians in 6-7ms)

Overview

Ultra-fast Gaussian Splatting PLY I/O for Python. Zero-copy reads, auto-optimized writes, optional GPU acceleration.

Key Features:

Fast: 93M Gaussians/sec read, 57M Gaussians/sec write (zero-copy)
Auto-optimized: Writes are 2.6-2.8x faster via automatic consolidation
Pure Python: NumPy + Numba (no C++ compilation)
Format support: Uncompressed PLY + PlayCanvas compressed (71-74% smaller)
GPU ready: Optional PyTorch integration with GSTensor

Features

Performance

Peak throughput: 93M Gaussians/sec read, 57M Gaussians/sec write
Auto-optimized writes: 2.6-2.8x faster automatically via consolidation
Zero-copy paths: Additional 2.8x speedup for data from plyread() (total 7-8x)
Benchmarks (400K Gaussians):
- SH0: Read 5.7ms (70 M/s), Write 7-22ms (18-57 M/s)
- SH3: Read 31ms (13 M/s), Write 35-96ms (4-11 M/s)
- Compressed: 71-74% smaller, 15-110ms writes

Capabilities

Format support: Uncompressed PLY + PlayCanvas compressed format
SH degrees: Supports SH0-SH3 (14-59 properties)
Auto-detection: Automatically detects format and SH degree
GPU acceleration: Optional PyTorch integration (GSTensor)
In-memory compression: Compress/decompress without disk I/O
Type-safe: Full type hints for Python 3.10+

Installation

pip install gsply

Dependencies: NumPy and Numba (auto-installed)

Optional GPU acceleration:

pip install torch  # For GSTensor GPU features

Quick Start

Basic Usage

from gsply import plyread, plywrite

# Read PLY file (auto-detects format)
data = plyread("model.ply")

# Access fields
positions = data.means    # (N, 3) xyz coordinates
colors = data.sh0         # (N, 3) RGB colors
scales = data.scales      # (N, 3) scale parameters
rotations = data.quats    # (N, 4) quaternions

# Unpack to individual arrays
means, scales, quats, opacities, sh0, shN = data.unpack()

# Write (automatically optimized)
plywrite("output.ply", data)

# Write compressed (71-74% smaller)
plywrite("output.ply", data, compressed=True)

Advanced Features

from gsply import detect_format, compress_to_bytes, decompress_from_bytes

# Detect format before reading
is_compressed, sh_degree = detect_format("model.ply")

# In-memory compression
compressed_bytes = compress_to_bytes(data)
data_restored = decompress_from_bytes(compressed_bytes)

# GPU acceleration (requires PyTorch)
from gsply import GSTensor
gstensor = GSTensor.from_gsdata(data, device='cuda')

API Reference

Quick Navigation:

Core I/O
- plyread() - Read PLY files
- plywrite() - Write PLY files
- detect_format() - Detect format and SH degree
GSData - CPU dataclass container
- data.unpack() - Unpack to tuple
- data.to_dict() - Convert to dictionary
- data.copy() - Deep copy
- data.consolidate() - Optimize for slicing
- data[index] - Indexing and slicing
- len(data) - Get number of Gaussians
Compression APIs
- compress_to_bytes() - Compress to bytes
- compress_to_arrays() - Compress to arrays
- decompress_from_bytes() - Decompress bytes
Utility Functions
- sh2rgb() - SH to RGB conversion
- rgb2sh() - RGB to SH conversion
- SH_C0 - SH normalization constant
GSTensor (GPU) - PyTorch integration
- GSTensor.from_gsdata() - Convert to GPU
- gstensor.to_gsdata() - Convert to CPU
- gstensor.to() - Device/dtype transfer
- gstensor.cpu() / cuda() - Device shortcuts
- gstensor.half() / float() / double() - Precision conversion
- gstensor.consolidate() - Optimize for slicing
- gstensor.clone() - Deep copy
- gstensor.unpack() - Unpack to tuple
- gstensor.to_dict() - Convert to dictionary
- gstensor[index] - Indexing and slicing
- len(gstensor) - Get number of Gaussians
- Properties & Helpers - device, dtype, get_sh_degree(), has_high_order_sh()

Core I/O

`plyread(file_path)`

Read Gaussian Splatting PLY file (auto-detects format).

Always uses zero-copy optimization for maximum performance.

Parameters:

file_path (str | Path): Path to PLY file

Returns: GSData dataclass with Gaussian parameters:

means: (N, 3) - Gaussian centers
scales: (N, 3) - Log scales
quats: (N, 4) - Rotations as quaternions (wxyz)
opacities: (N,) - Logit opacities
sh0: (N, 3) - DC spherical harmonics
shN: (N, K, 3) - Higher-order SH coefficients (K=0 for degree 0, K=3 for degree 1, K=8 for degree 2, K=15 for degree 3)
masks: (N,) - Boolean mask for filtering Gaussians
_base: (N, P) - Internal array for zero-copy views (private)
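
Because scales and opacities are stored as log scales and logit opacities, a renderer usually applies an activation before using them. The following is a minimal NumPy sketch of that step, assuming the common Gaussian Splatting conventions (exp for scales, sigmoid for opacities, unit-normalized quaternions). The helper name `activate` is hypothetical and is not part of gsply.

```python
import numpy as np

def activate(scales_log, opacities_logit, quats):
    """Convert raw stored parameters to the values a renderer typically consumes."""
    scales = np.exp(scales_log)                          # log scale -> positive scale
    opacities = 1.0 / (1.0 + np.exp(-opacities_logit))   # logit -> opacity in (0, 1)
    quats_unit = quats / np.linalg.norm(quats, axis=1, keepdims=True)
    return scales, opacities, quats_unit

# Stand-in arrays with the shapes listed above (N = 2).
scales_log = np.zeros((2, 3), dtype=np.float32)
opacities_logit = np.array([0.0, 2.0], dtype=np.float32)
quats = np.array([[2.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]], dtype=np.float32)

scales, opacities, quats_unit = activate(scales_log, opacities_logit, quats)
print(scales[0], opacities, quats_unit[0])
```

Under this convention, a threshold such as `opacities > 0.5` applied to the raw array compares logits, which corresponds to an activated opacity above roughly 0.62.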

Performance:

Uncompressed: 5.7ms for 400K Gaussians (70M/sec), 12.8ms for 1M (78M/sec peak)
Compressed: 8.5ms for 400K Gaussians (47M/sec), 16.7ms for 1M (60M/sec)
Scales linearly with data size

Example:

from gsply import plyread

# Zero-copy reading - up to 78M Gaussians/sec
data = plyread("model.ply")
print(f"Loaded {data.means.shape[0]} Gaussians, shN shape {data.shN.shape}")

# Access via attributes
positions = data.means
colors = data.sh0

# Unpack for standard GS workflows
means, scales, quats, opacities, sh0, shN = data.unpack()

# Or exclude shN for SH0 data
means, scales, quats, opacities, sh0 = data.unpack(include_shN=False)

# Or get as dictionary
props = data.to_dict()

`plywrite(file_path, means, scales, quats, opacities, sh0, shN=None, compressed=False)`

Write Gaussian Splatting PLY file.

Parameters:

file_path (str | Path): Output PLY file path (auto-adjusted to .compressed.ply if compressed=True)
means (np.ndarray): Shape (N, 3) - Gaussian centers
scales (np.ndarray): Shape (N, 3) - Log scales
quats (np.ndarray): Shape (N, 4) - Rotations as quaternions (wxyz)
opacities (np.ndarray): Shape (N,) - Logit opacities
sh0 (np.ndarray): Shape (N, 3) - DC spherical harmonics
shN (np.ndarray, optional): Shape (N, K, 3) or (N, K*3) - Higher-order SH
compressed (bool): If True, write compressed format and auto-adjust extension

Format Selection:

compressed=False or .ply extension -> Uncompressed format (fast)
compressed=True -> Compressed format, saves as .compressed.ply automatically
.compressed.ply or .ply_compressed extension -> Compressed format
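
The selection rules above can be sketched as a small path-resolution helper. This is an illustration only: `resolve_output` is a hypothetical name, and how gsply treats paths that do not end in `.ply` is an assumption here.

```python
from pathlib import Path

COMPRESSED_SUFFIXES = (".compressed.ply", ".ply_compressed")

def resolve_output(file_path, compressed=False):
    """Return (path, use_compressed) following the selection rules listed above."""
    path = Path(file_path)
    name = path.name
    if name.endswith(COMPRESSED_SUFFIXES):
        return path, True                  # extension alone selects compressed format
    if compressed:
        # Assumption: a trailing ".ply" is swapped for ".compressed.ply".
        stem = name[:-len(".ply")] if name.endswith(".ply") else name
        return path.with_name(stem + ".compressed.ply"), True
    return path, False                     # plain .ply -> uncompressed

print(resolve_output("output.ply"))
print(resolve_output("output.ply", compressed=True))
print(resolve_output("scene.ply_compressed"))
```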

Performance:

Uncompressed SH0: 3.9ms for 100K (26M/s), 19.3ms for 400K (21M/s), 62.2ms for 1M (16M/s)
Uncompressed SH3: 24.6ms for 100K (4.1M/s), 121.5ms for 400K (3.3M/s), 316.5ms for 1M (3.2M/s)
Compressed SH0: 3.4ms for 100K (29M/s), 15.0ms for 400K (27M/s), 35.5ms for 1M (28M/s) - 71% smaller
Compressed SH3: 22.5ms for 100K (4.5M/s), 110.5ms for 400K (3.6M/s), 210ms for 1M (4.8M/s) - 74% smaller
Up to 2.9x faster when writing data loaded from PLY (zero-copy optimization)

Example:

from gsply import plywrite

# Write uncompressed (fast, ~8ms for 400K Gaussians)
plywrite("output.ply", means, scales, quats, opacities, sh0, shN)

# Write compressed (saves as "output.compressed.ply", ~63ms, 3.4x smaller)
plywrite("output.ply", means, scales, quats, opacities, sh0, shN, compressed=True)

`detect_format(file_path)`

Detect PLY format type and SH degree.

Parameters:

file_path (str | Path): Path to PLY file

Returns: Tuple of (is_compressed, sh_degree):

is_compressed (bool): True if compressed format
sh_degree (int | None): 0-3 for uncompressed, None for compressed/unknown
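
To make the return values concrete, here is a rough sketch of how such a detection could work on a header string. It is not gsply's implementation. It assumes that PlayCanvas compressed files declare a separate `chunk` element and that higher-order SH properties are named `f_rest_*`; the f_rest counts come from the format table in the Format Support section.

```python
F_REST_TO_DEGREE = {0: 0, 9: 1, 24: 2, 45: 3}  # f_rest property count -> SH degree

def detect_from_header(header_text):
    """Classify a PLY header string as (is_compressed, sh_degree)."""
    lines = [line.strip() for line in header_text.splitlines()]
    # Assumption: PlayCanvas compressed files declare a separate "chunk" element.
    if any(line.startswith("element chunk") for line in lines):
        return True, None
    n_rest = sum(
        1 for line in lines
        if line.startswith("property") and line.split()[-1].startswith("f_rest_")
    )
    # Unknown property counts map to None, mirroring the "unknown" case above.
    return False, F_REST_TO_DEGREE.get(n_rest)

header = "\n".join(
    ["ply", "format binary_little_endian 1.0", "element vertex 1"]
    + [f"property float {name}" for name in ("x", "y", "z")]
    + [f"property float f_rest_{i}" for i in range(9)]
    + ["end_header"]
)
print(detect_from_header(header))
```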

Example:

from gsply import detect_format

is_compressed, sh_degree = detect_format("model.ply")
if is_compressed:
    print("Compressed PlayCanvas format")
else:
    print(f"Uncompressed format with SH degree {sh_degree}")

GSData

Container dataclass for Gaussian Splatting data with zero-copy optimization.

GSData is returned by plyread() and provides efficient access to Gaussian parameters through both direct attributes and convenience methods. All arrays are mutable and can be modified in-place. Arrays can be views into a shared _base array for maximum performance (zero memory overhead).

Attributes:

means (np.ndarray): Shape (N, 3) - Gaussian centers (xyz positions)
scales (np.ndarray): Shape (N, 3) - Log scales for each axis
quats (np.ndarray): Shape (N, 4) - Rotations as quaternions (wxyz order)
opacities (np.ndarray): Shape (N,) - Logit opacities (before sigmoid)
sh0 (np.ndarray): Shape (N, 3) - DC spherical harmonics (RGB color basis)
shN (np.ndarray | None): Shape (N, K, 3) - Higher-order SH coefficients
- K=0 for SH degree 0 (no higher-order)
- K=3 for SH degree 1 (9 f_rest values)
- K=8 for SH degree 2 (24 f_rest values)
- K=15 for SH degree 3 (45 f_rest values)
masks (np.ndarray): Shape (N,) boolean - Mask for filtering (initialized to all True)
_base (np.ndarray | None): Shape (N, P) - Private base array (auto-managed, do not modify)
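
The idea of attributes being views into a shared base array can be illustrated with plain NumPy. The column layout below is an assumption for SH0 data, following the property order given in the Format Support section; it is a sketch of the concept, not gsply's internal code.

```python
import numpy as np

n = 4
# Assumed column layout for SH0 (14 floats per Gaussian):
# xyz, f_dc, opacity, scales, quats.
base = np.zeros((n, 14), dtype=np.float32)

means = base[:, 0:3]
sh0 = base[:, 3:6]
opacities = base[:, 6]
scales = base[:, 7:10]
quats = base[:, 10:14]

# Basic slices are views: no extra memory, and writes land in the base array.
means[0] = [1.0, 2.0, 3.0]
sh0 *= 1.5

print(np.shares_memory(means, base))
print(base[0, :3])
```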

Example:

from gsply import plyread

data = plyread("scene.ply")
print(f"Loaded {len(data)} Gaussians")

# Direct attribute access
positions = data.means
colors = data.sh0

# Mutable - modify in place
data.means[0] = [1, 2, 3]
data.sh0 *= 1.5  # Make brighter

`data.unpack(include_shN=True)`

Unpack Gaussian data into tuple of individual arrays.

Most useful for passing data to rendering functions that expect separate arrays rather than a container object.

Parameters:

include_shN (bool): If True, include shN in output (default: True)

Returns:

If include_shN=True: (means, scales, quats, opacities, sh0, shN)
If include_shN=False: (means, scales, quats, opacities, sh0)

Example:

data = plyread("scene.ply")

# Full unpacking (recommended for SH1-3)
means, scales, quats, opacities, sh0, shN = data.unpack()
render(means, scales, quats, opacities, sh0, shN)

# Without higher-order SH (recommended for SH0)
means, scales, quats, opacities, sh0 = data.unpack(include_shN=False)
render(means, scales, quats, opacities, sh0)

# Tuple unpacking for plywrite
plywrite("output.ply", *data.unpack())

`data.to_dict()`

Convert Gaussian data to dictionary for keyword argument unpacking.

Useful when calling functions that accept keyword arguments matching the Gaussian parameter names.

Returns:

Dictionary with keys: means, scales, quats, opacities, sh0, shN

Example:

data = plyread("scene.ply")

# Dictionary unpacking
props = data.to_dict()
render(**props)  # Unpack as kwargs

# Access by key
positions = props['means']
colors = props['sh0']

`data.copy()`

Create deep copy of GSData with independent arrays.

Modifications to the copy will not affect the original data. Optimized to use _base array when available (faster than copying individual arrays).

Returns:

GSData: New GSData object with copied arrays

Example:

data = plyread("scene.ply")

# Create independent copy
data_copy = data.copy()
data_copy.means[0] = 0  # Doesn't affect original

# Use for creating variations
bright = data.copy()
bright.sh0 *= 1.5  # Make brighter

`data.consolidate()`

Consolidate separate arrays into single base array for faster slicing operations.

Creates a _base array from separate arrays, which improves performance for boolean masking operations (1.5x faster). Only beneficial if you plan to perform many boolean mask operations on the same data.

Returns:

GSData: New GSData with _base array, or self if already consolidated

Performance:

One-time cost: ~2ms per 100K Gaussians
Benefit: 1.5x faster boolean masking
Most useful before multiple filter operations
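
As a rough illustration of why a single base array helps with masking, the NumPy sketch below compares filtering five separate arrays with filtering one consolidated array. The array layout is assumed, and the sketch makes no claim about the actual speedup.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Stand-ins for separately allocated arrays (SH0 only).
means = rng.normal(size=(n, 3)).astype(np.float32)
sh0 = rng.random((n, 3), dtype=np.float32)
opacities = rng.normal(size=n).astype(np.float32)
scales = rng.normal(size=(n, 3)).astype(np.float32)
quats = rng.normal(size=(n, 4)).astype(np.float32)

# One-time cost: copy everything into a single (N, 14) array.
base = np.concatenate([means, sh0, opacities[:, None], scales, quats], axis=1)

mask = opacities > 0.5

# Unconsolidated: one boolean gather per array.
separate = [a[mask] for a in (means, sh0, opacities, scales, quats)]

# Consolidated: a single gather, then cheap column views.
filtered = base[mask]
filtered_means = filtered[:, 0:3]
filtered_quats = filtered[:, 10:14]

print(filtered.shape, len(separate))
```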

Example:

data = plyread("scene.ply")

# Consolidate for faster filtering
data_consolidated = data.consolidate()

# Now boolean masking is 1.5x faster
high_opacity = data_consolidated[data_consolidated.opacities > 0.5]
low_opacity = data_consolidated[data_consolidated.opacities <= 0.5]

`data[index]`

Slice GSData using standard Python indexing.

Supports integers, slices, boolean masks, and fancy indexing. Returns views when possible (zero-copy).

Indexing Modes:

Integer: data[0] - Returns tuple of (means, scales, quats, opacities, sh0, shN, masks)
Slice: data[100:200] - Returns new GSData with subset
Step: data[::10] - Returns every 10th Gaussian
Boolean mask: data[mask] - Filter by boolean array
Fancy: data[[0, 10, 20]] - Select specific indices
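
"Views when possible" can be read in terms of NumPy's own indexing rules, which the sketch below demonstrates on a plain array: basic and strided slices share memory with the source, while boolean masks and integer-array indices produce copies. That GSData follows the same rules for its underlying arrays is an assumption.

```python
import numpy as np

means = np.arange(30, dtype=np.float32).reshape(10, 3)

sliced = means[2:5]                 # basic slice
stepped = means[::2]                # strided slice
masked = means[means[:, 0] > 10]    # boolean mask
fancy = means[[0, 3, 6]]            # integer-array ("fancy") index

for name, result in [("slice", sliced), ("step", stepped),
                     ("mask", masked), ("fancy", fancy)]:
    print(f"{name:5s} shares memory with source: {np.shares_memory(result, means)}")
```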

Example:

data = plyread("scene.ply")

# Single Gaussian (returns tuple)
means, scales, quats, opacities, sh0, shN, masks = data[0]

# Slice (returns GSData)
subset = data[100:200]

# Boolean mask (returns GSData)
high_opacity = data[data.opacities > 0.5]

# Step slicing (returns GSData)
every_10th = data[::10]

`len(data)`

Get number of Gaussians in the dataset.

Returns:

int: Number of Gaussians (equivalent to data.means.shape[0])

Example:

data = plyread("scene.ply")
print(f"Loaded {len(data)} Gaussians")

Compression APIs

`compress_to_bytes(data)`

Compress Gaussian splatting data to bytes (PlayCanvas format) without writing to disk.

Useful for network transfer, streaming, or custom storage solutions.

Parameters:

data (GSData): Gaussian data from plyread() or created manually
- Alternative: Pass individual arrays for backward compatibility

Returns: bytes: Complete compressed PLY file as bytes

Example:

from gsply import plyread, compress_to_bytes

# Method 1: Clean API with GSData (recommended)
data = plyread("model.ply")
compressed_bytes = compress_to_bytes(data)  # Simple!

# Method 2: Individual arrays (backward compatible)
means, scales, quats, opacities, sh0, shN = data.unpack()
compressed_bytes = compress_to_bytes(
    means, scales, quats, opacities, sh0, shN
)

# Send over the network, store in a database, or write to disk
with open("output.compressed.ply", "wb") as f:
    f.write(compressed_bytes)

`compress_to_arrays(data)`

Compress Gaussian splatting data to component arrays (PlayCanvas format).

Returns separate components for custom processing or partial updates.

Parameters:

data (GSData): Gaussian data from plyread() or created manually
- Alternative: Pass individual arrays for backward compatibility

Returns: Tuple containing:

header_bytes (bytes): PLY header as bytes
chunk_bounds (np.ndarray): Shape (num_chunks, 18) float32 - Chunk boundary array
packed_data (np.ndarray): Shape (N, 4) uint32 - Main compressed data
packed_sh (np.ndarray | None): Shape varies, uint8 - Compressed SH data if present

Example:

from gsply import plyread, compress_to_arrays
from io import BytesIO

# Method 1: Clean API with GSData (recommended)
data = plyread("model.ply")
header, chunks, packed, sh = compress_to_arrays(data)  # Simple!

# Method 2: Individual arrays (backward compatible)
means, scales, quats, opacities, sh0, shN = data.unpack()
header, chunks, packed, sh = compress_to_arrays(
    means, scales, quats, opacities, sh0, shN
)

# Process components individually
print(f"Header size: {len(header)} bytes")
print(f"Chunks: {chunks.shape[0]} chunks")
print(f"Packed data: {packed.nbytes} bytes")

# Manually assemble if needed
buffer = BytesIO()
buffer.write(header)
buffer.write(chunks.tobytes())
buffer.write(packed.tobytes())
if sh is not None:
    buffer.write(sh.tobytes())

compressed_bytes = buffer.getvalue()

`decompress_from_bytes(compressed_bytes)`

Decompress Gaussian splatting data from bytes (PlayCanvas format) without reading from disk.

Symmetric with compress_to_bytes() - perfect for network transfer, streaming, or custom storage.

Parameters:

compressed_bytes (bytes): Complete compressed PLY file as bytes

Returns: GSData dataclass with decompressed Gaussian parameters:

means: (N, 3) - Gaussian centers
scales: (N, 3) - Log scales
quats: (N, 4) - Rotations as quaternions (wxyz)
opacities: (N,) - Logit opacities
sh0: (N, 3) - DC spherical harmonics
shN: (N, K, 3) - Higher-order SH coefficients
masks: (N,) - Boolean mask (all True for decompressed data)
_base: None (not applicable for decompressed data)

Example:

from gsply import compress_to_bytes, decompress_from_bytes, plyread

# Example 1: Round-trip without disk I/O
data = plyread("model.ply")
compressed = compress_to_bytes(data)
data_restored = decompress_from_bytes(compressed)
# data_restored is ready to use!

# Example 2: Network transfer
# Sender side
compressed_bytes = compress_to_bytes(data)
# send compressed_bytes over network...

# Receiver side
# ...receive compressed_bytes from network
data = decompress_from_bytes(compressed_bytes)
# No temporary files needed!

# Example 3: Database storage
import sqlite3
conn = sqlite3.connect('gaussians.db')
conn.execute('CREATE TABLE IF NOT EXISTS models (id INTEGER, data BLOB)')
# Store
compressed = compress_to_bytes(data)
conn.execute('INSERT INTO models VALUES (?, ?)', (1, compressed))
# Retrieve
row = conn.execute('SELECT data FROM models WHERE id = 1').fetchone()
data_restored = decompress_from_bytes(row[0])

Note: PlayCanvas compression is lossy (quantization). Decompressed data will be very close to but not exactly identical to the original.

Utility Functions

`sh2rgb(sh)`

Convert spherical harmonic DC coefficients to RGB colors.

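Converts the DC component (sh0) of spherical harmonics to RGB color values using rgb = sh * SH_C0 + 0.5. Typical values fall in the range [0, 1], but the formula itself does not clamp, so clip the result if a strict range is needed. Useful for visualization and color manipulation.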

Parameters:

sh (np.ndarray | float): SH DC coefficients - Shape (N, 3) or scalar

Returns:

np.ndarray | float: RGB colors in [0, 1] range

Example:

import numpy as np
from gsply import plyread, sh2rgb, rgb2sh

data = plyread("scene.ply")

# Convert SH to RGB for visualization
rgb_colors = sh2rgb(data.sh0)
print(f"First color: RGB({rgb_colors[0, 0]:.3f}, {rgb_colors[0, 1]:.3f}, {rgb_colors[0, 2]:.3f})")

# Modify colors in RGB space
rgb_colors *= 1.5  # Make brighter
data.sh0 = rgb2sh(np.clip(rgb_colors, 0, 1))  # Convert back

`rgb2sh(rgb)`

Convert RGB colors to spherical harmonic DC coefficients.

Converts standard RGB color values in the range [0, 1] to the DC component (sh0) of spherical harmonics. Inverse of sh2rgb().

Parameters:

rgb (np.ndarray | float): RGB colors in [0, 1] range - Shape (N, 3) or scalar

Returns:

np.ndarray | float: SH DC coefficients

Example:

from gsply import rgb2sh, plywrite
import numpy as np

# Create Gaussians with specific RGB colors
n = 1000
means = np.random.randn(n, 3).astype(np.float32)
scales = np.ones((n, 3), dtype=np.float32) * 0.01
quats = np.tile([1, 0, 0, 0], (n, 1)).astype(np.float32)
opacities = np.ones(n, dtype=np.float32)

# Set colors in RGB space
rgb_colors = np.random.rand(n, 3).astype(np.float32)  # Random colors
sh0 = rgb2sh(rgb_colors)  # Convert to SH

plywrite("colored.ply", means, scales, quats, opacities, sh0, None)

`SH_C0`

Constant for spherical harmonic DC coefficient normalization.

This constant (0.28209479177387814), equal to 1 / (2 * sqrt(pi)), is the value of the 0th-order spherical harmonic basis function. It is used in the conversion between SH DC coefficients and RGB colors.

Type: float

Value: 0.28209479177387814

Example:

import numpy as np
from gsply import SH_C0

sh = np.array([0.0, 1.0, -1.0])  # example DC coefficients

# Manual conversion (equivalent to sh2rgb/rgb2sh)
rgb = sh * SH_C0 + 0.5  # SH to RGB
sh_back = (rgb - 0.5) / SH_C0  # RGB to SH (recovers sh)

GPU Support (PyTorch)

Optional GPU acceleration with PyTorch tensors for training and inference workflows.

Installation

PyTorch is optional. GSTensor features are always included in gsply but only work when PyTorch is installed.

# Install gsply first
pip install gsply

# Then install PyTorch if you need GPU acceleration
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128

gsply will automatically detect PyTorch and enable GSTensor if available. Without PyTorch, gsply works normally for CPU-only workflows.

GSTensor - GPU-Accelerated Dataclass

GSTensor is a PyTorch-backed version of GSData that enables GPU-accelerated operations:

from gsply import plyread, GSTensor

# Load data from disk (CPU NumPy)
data = plyread("model.ply")

# Convert to GPU tensors (11x faster with _base optimization)
gstensor = GSTensor.from_gsdata(data, device='cuda')

# Access GPU tensors
positions_gpu = gstensor.means  # torch.Tensor on GPU
colors_gpu = gstensor.sh0       # torch.Tensor on GPU

# Unpack for rendering functions
means, scales, quats, opacities, sh0, shN = gstensor.unpack()
rendered = render_gaussians(means, scales, quats, opacities, sh0)

# Or use dict unpacking
rendered = render_gaussians(**gstensor.to_dict())

# Slice on GPU (zero-cost views)
subset = gstensor[100:200]      # Returns GSTensor view

# Training workflow
gstensor_trainable = GSTensor.from_gsdata(data, device='cuda', requires_grad=True)
loss = render_loss(gstensor_trainable.means, ...)
loss.backward()

# Convert back to CPU NumPy
data_cpu = gstensor.to_gsdata()

Key Features

11x Faster GPU Transfer: When data has _base (from plyread() or consolidate()), GPU transfer is 11x faster than manual stacking
Zero-Copy Views: GPU slicing creates views (no memory overhead)
Device Management: Seamless transfer between CPU/GPU with .to(), .cpu(), .cuda()
Training Support: Optional gradient tracking with requires_grad=True
Type Conversions: half(), float(), double() for precision control
Optimized Slicing: 25x faster boolean masking with consolidate()

Performance

GPU Transfer (400K Gaussians, SH0, RTX 3090 Ti):

With _base optimization: 1.99 ms (zero CPU copy overhead)
Without _base (fallback): 22.78 ms (requires CPU stacking)
Speedup: 11.4x faster with _base

Memory Efficiency:

Single tensor transfer vs 5 separate transfers
50% less I/O (no CPU copy when using _base)
GPU views are free (zero additional memory)

API Reference

`GSTensor.from_gsdata(data, device='cuda', dtype=torch.float32, requires_grad=False)`

Convert GSData to GSTensor.

Parameters:

data (GSData): Input Gaussian data
device (str | torch.device): Target device ('cuda', 'cpu', or torch.device)
dtype (torch.dtype): Target dtype (default: torch.float32)
requires_grad (bool): Enable gradient tracking (default: False)

Returns:

GSTensor: GPU-accelerated tensor container

Example:

# Fast path (uses _base if available)
gstensor = GSTensor.from_gsdata(data, device='cuda')

# For training
gstensor = GSTensor.from_gsdata(data, device='cuda', requires_grad=True)

# Half precision for memory savings
gstensor = GSTensor.from_gsdata(data, device='cuda', dtype=torch.float16)

`gstensor.to_gsdata()`

Convert GSTensor back to GSData (CPU NumPy).

Returns:

GSData: CPU NumPy container

Example:

gstensor = GSTensor.from_gsdata(data, device='cuda')
# ... GPU operations ...
data_cpu = gstensor.to_gsdata()  # Back to NumPy

`gstensor.to(device=None, dtype=None)`

Move tensors to different device and/or dtype.

Parameters:

device (str | torch.device, optional): Target device
dtype (torch.dtype, optional): Target dtype

Returns:

GSTensor: New GSTensor on target device/dtype

Example:

gstensor_gpu = gstensor.to('cuda')
gstensor_half = gstensor.to(dtype=torch.float16)
gstensor_gpu_half = gstensor.to('cuda', dtype=torch.float16)

`gstensor.consolidate()`

Create _base tensor for 25x faster slicing.

Returns:

GSTensor: New GSTensor with _base tensor

Example:

# Consolidate for faster slicing
gstensor = gstensor.consolidate()

# Boolean masking is now 25x faster
mask = gstensor.opacities > 0.5
subset = gstensor[mask]  # Fast with _base

`gstensor.clone()`

Create independent deep copy.

Returns:

GSTensor: Cloned GSTensor

Example:

gstensor_copy = gstensor.clone()
gstensor_copy.means[0] = 0  # Doesn't affect original

`gstensor.cpu()`

Move tensors to CPU.

Shorthand for gstensor.to('cpu').

Returns:

GSTensor: GSTensor on CPU

Example:

gstensor_gpu = GSTensor.from_gsdata(data, device='cuda')
gstensor_cpu = gstensor_gpu.cpu()  # Now on CPU

`gstensor.cuda(device=None)`

Move tensors to GPU.

Shorthand for gstensor.to('cuda').

Parameters:

device (int | None): GPU device index (default: None = cuda:0)

Returns:

GSTensor: GSTensor on GPU

Example:

gstensor_gpu = gstensor.cuda()  # Move to cuda:0
gstensor_gpu1 = gstensor.cuda(1)  # Move to cuda:1

`gstensor.half()`, `gstensor.float()`, `gstensor.double()`

Convert tensor precision.

Convenience methods for dtype conversion:

half() - Convert to torch.float16
float() - Convert to torch.float32
double() - Convert to torch.float64

Returns:

GSTensor: GSTensor with new dtype

Example:

# Half precision for memory savings (half the VRAM of float32)
gstensor_fp16 = gstensor.half()

# Back to full precision
gstensor_fp32 = gstensor_fp16.float()

# Double precision for high accuracy
gstensor_fp64 = gstensor.double()

`gstensor.unpack(include_shN=True)`

Unpack GSTensor into tuple of individual tensors.

Identical to GSData.unpack() but returns PyTorch tensors instead of NumPy arrays.

Parameters:

include_shN (bool): If True, include shN in output (default: True)

Returns:

If include_shN=True: (means, scales, quats, opacities, sh0, shN)
If include_shN=False: (means, scales, quats, opacities, sh0)

Example:

gstensor = GSTensor.from_gsdata(data, device='cuda')

# Full unpacking for rendering
means, scales, quats, opacities, sh0, shN = gstensor.unpack()
rendered = render_gaussians(means, scales, quats, opacities, sh0, shN)

# Without higher-order SH
means, scales, quats, opacities, sh0 = gstensor.unpack(include_shN=False)

`gstensor.to_dict()`

Convert GSTensor to dictionary for keyword argument unpacking.

Identical to GSData.to_dict() but returns PyTorch tensors instead of NumPy arrays.

Returns:

Dictionary with keys: means, scales, quats, opacities, sh0, shN

Example:

gstensor = GSTensor.from_gsdata(data, device='cuda')

# Dictionary unpacking
props = gstensor.to_dict()
rendered = render_gaussians(**props)

`gstensor[index]`

Slice GSTensor using standard Python indexing.

Supports integers, slices, boolean masks, and fancy indexing. Returns views when possible (zero-copy on GPU).

Indexing Modes:

Integer: gstensor[0] - Returns tuple of tensors
Slice: gstensor[100:200] - Returns new GSTensor with subset
Step: gstensor[::10] - Returns every 10th Gaussian
Boolean mask: gstensor[mask] - Filter by boolean tensor
Fancy: gstensor[[0, 10, 20]] - Select specific indices

Example:

gstensor = GSTensor.from_gsdata(data, device='cuda')

# Single Gaussian (returns tuple)
means, scales, quats, opacities, sh0, shN, masks = gstensor[0]

# Slice (returns GSTensor view - zero memory cost)
subset = gstensor[100:200]

# Boolean mask (returns GSTensor)
high_opacity = gstensor[gstensor.opacities > 0.5]

# Step slicing (returns GSTensor)
every_10th = gstensor[::10]

`len(gstensor)`

Get number of Gaussians.

Returns:

int: Number of Gaussians (equivalent to gstensor.means.shape[0])

Example:

gstensor = GSTensor.from_gsdata(data, device='cuda')
print(f"Processing {len(gstensor)} Gaussians on GPU")

`gstensor.device` (property)

Get current device of tensors.

Returns:

torch.device: Current device (e.g., torch.device('cuda:0') or torch.device('cpu'))

Example:

print(f"Tensors are on {gstensor.device}")
if gstensor.device.type == 'cuda':
    print(f"Using GPU {gstensor.device.index}")

`gstensor.dtype` (property)

Get current dtype of tensors.

Returns:

torch.dtype: Current dtype (e.g., torch.float32, torch.float16)

Example:

print(f"Using precision: {gstensor.dtype}")

`gstensor.get_sh_degree()`

Get spherical harmonic degree from data shape.

Returns:

int: SH degree (0-3)

Example:

sh_degree = gstensor.get_sh_degree()
print(f"Data has SH degree {sh_degree}")

`gstensor.has_high_order_sh()`

Check if data has higher-order spherical harmonics.

Returns:

bool: True if SH degree > 0

Example:

if gstensor.has_high_order_sh():
    print("Has higher-order SH coefficients")
else:
    print("Only DC component (SH0)")

Complete Workflow Examples

Training Workflow

import gsply
from gsply import GSTensor
import torch

# Load from disk
data = gsply.plyread("scene.ply")  # Has _base -> fast GPU transfer

# Transfer to GPU (11x faster with _base)
gstensor = GSTensor.from_gsdata(data, device='cuda', requires_grad=True)

# Training loop
optimizer = torch.optim.Adam([gstensor.means, gstensor.scales], lr=0.01)

for epoch in range(100):
    optimizer.zero_grad()

    # Unpack for rendering (cleaner API)
    means, scales, quats, opacities, sh0, shN = gstensor.unpack()
    loss = render_gaussians(means, scales, quats, opacities, sh0)

    loss.backward()
    optimizer.step()

# Save optimized results
optimized_data = gstensor.to_gsdata()
gsply.plywrite("optimized.ply", optimized_data.means, optimized_data.scales,
               optimized_data.quats, optimized_data.opacities,
               optimized_data.sh0, optimized_data.shN)

Inference Workflow

import gsply
from gsply import GSTensor
import torch

# Load scene
data = gsply.plyread("scene.ply")

# Transfer to GPU (inference mode, no gradients)
gstensor = GSTensor.from_gsdata(data, device='cuda', requires_grad=False)

# Filter Gaussians by opacity threshold
high_opacity_mask = gstensor.opacities > 0.5
filtered = gstensor[high_opacity_mask]

# Render filtered scene with unpacking
with torch.no_grad():
    means, scales, quats, opacities, sh0, shN = filtered.unpack()
    rendered = render_gaussians(means, scales, quats, opacities, sh0)

# Save filtered version
filtered_data = filtered.to_gsdata()
gsply.plywrite("filtered.ply", filtered_data.means, filtered_data.scales,
               filtered_data.quats, filtered_data.opacities,
               filtered_data.sh0, filtered_data.shN)

Performance

Benchmark Results

Comprehensive performance benchmarks (source: BENCHMARK_SUMMARY.md):

Uncompressed Format Performance

Gaussians	SH	Read (ms)	Write (ms)	Read (M/s)	Write (M/s)
100K	0	1.5	3.9	68.1	26.0
400K	0	5.7	19.3	70.0	21.0
1M	0	12.8	62.2	78.0	16.1
100K	3	6.9	24.6	14.4	4.1
400K	3	31.1	121.5	12.9	3.3
1M	3	81.8	316.5	12.2	3.2

Compressed Format Performance

Gaussians	SH	Read (ms)	Write (ms)	Read (M/s)	Write (M/s)	Size Reduction
100K	0	2.8	3.4	35.4	29.4	71%
400K	0	8.5	15.0	47.0	26.6	71%
1M	0	16.7	35.5	60.0	28.2	71%
100K	3	30.5	22.5	3.3	4.5	74%
400K	3	25.1	110.5	16.0	3.6	74%
1M	3	256.4	210.0	3.9	4.8	74%

Key Performance Highlights

Peak Read Speed: 78M Gaussians/sec (1M Gaussians, SH0, uncompressed)
Peak Write Speed: 29M Gaussians/sec (100K Gaussians, SH0, compressed)
Uncompressed Read (SH0): 68M/s (100K), 70M/s (400K), 78M/s (1M)
Uncompressed Write (SH0): 26M/s (100K), 21M/s (400K), 16M/s (1M)
Uncompressed SH3: Read 12-14M/s, Write 3-4M/s (scales linearly)
Compressed Read (SH0): 35M/s (100K), 47M/s (400K), 60M/s (1M)
Compressed Write (SH0): 29M/s (100K), 27M/s (400K), 28M/s (1M)
Compressed SH3: Read 16M/s (400K), Write 3.6M/s (400K) with 74% size reduction
Compression Benefits: 71-74% file size reduction across all SH degrees
Scalability: Linear scaling verified up to 1M Gaussians
Real-World Validation: Benchmarks verified on both synthetic and real 4D Gaussian Splatting PLY files

Optimization Details

Zero-copy reads: Direct memory views without data duplication
Zero-copy writes: When data has _base array (from plyread), use directly without copying
Parallel processing: Numba JIT compilation with parallel chunk operations
Smart caching: LRU cache for frequently used headers
Lookup tables: Eliminate branching for SH degree detection
Fast-path checks: Skip unnecessary dtype conversions
Single file handle: Reduce file open/close syscall overhead

Why gsply is Faster

Read Performance (4.3-8x speedup over plyfile):

gsply: Optimized bulk header read + np.fromfile() + zero-copy views
- Bulk header reading: Single 8KB read + decode (vs. N readline() calls)
- Reads entire binary data as contiguous block in one system call
- Creates memory views directly into the data array (no copies)
- Base array kept alive via GSData container's reference counting
- Consistent performance: Works equally well on real-world and random data
plyfile: Line-by-line header + individual property accesses per element
- Multiple readline() + decode operations for header parsing
- Accesses each property separately through PLY structure
- Stacks columns together requiring multiple memory allocations and copies
- Generic PLY parser handles arbitrary formats with overhead
- Data-dependent performance: 10x slower on random/synthetic data vs real-world structured data
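
The gsply read path described above can be sketched roughly as follows. This is not the library's reader. It is a simplified illustration that assumes an SH0 file with 14 float32 properties, a header that fits in 8 KB, and hypothetical property names.

```python
import os
import tempfile

import numpy as np

SH0_PROPERTIES = 14  # property count for SH degree 0


def read_sh0_ply(path):
    """Read a binary little-endian SH0 PLY as one (N, 14) float32 array."""
    # One bulk read covers the whole header (assumed to fit in 8 KB).
    with open(path, "rb") as f:
        head = f.read(8192)
    marker = b"end_header\n"
    end = head.find(marker)
    if end < 0:
        raise ValueError("PLY header not found in the first 8 KB")
    data_offset = end + len(marker)

    # Parse the vertex count from the decoded header text.
    num_vertices = None
    for line in head[:end].decode("ascii").splitlines():
        if line.startswith("element vertex"):
            num_vertices = int(line.split()[2])
    if num_vertices is None:
        raise ValueError("missing 'element vertex' line")

    # Read the whole payload as one contiguous block.
    flat = np.fromfile(path, dtype="<f4",
                       count=num_vertices * SH0_PROPERTIES, offset=data_offset)
    return flat.reshape(num_vertices, SH0_PROPERTIES)


# Build a tiny file to exercise the reader (property names are illustrative).
names = ["x", "y", "z", "f_dc_0", "f_dc_1", "f_dc_2", "opacity",
         "scale_0", "scale_1", "scale_2", "rot_0", "rot_1", "rot_2", "rot_3"]
header = "ply\nformat binary_little_endian 1.0\nelement vertex 4\n"
header += "".join(f"property float {n}\n" for n in names) + "end_header\n"
payload = np.arange(4 * 14, dtype="<f4").reshape(4, 14)

path = os.path.join(tempfile.mkdtemp(), "tiny.ply")
with open(path, "wb") as f:
    f.write(header.encode("ascii"))
    f.write(payload.tobytes())

data = read_sh0_ply(path)
means = data[:, 0:3]  # a view into `data`, not a copy
```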

Write Performance:

gsply: Pre-computed templates + pre-allocated array + buffered I/O
- Pre-computed header templates: Avoids dynamic string building in loops
- Buffered I/O: 2MB buffer for large files reduces system call overhead
- Allocates single contiguous array with exact dtype needed
- Fills array via direct slice assignment (no intermediate structures)
- Used when data is created from scratch (no _base array) or for SH1-3
- Performance (SH0): 30M Gaussians/sec (100K), 19M Gaussians/sec (400K), 16M Gaussians/sec (1M)
- Performance (SH3): 1.9M Gaussians/sec (100K), 1.4M Gaussians/sec (1M)
plyfile: Dynamic header + per-property assignments + PLY construction
- Builds header dynamically with loop + f-string formatting
- Creates PLY element structure with per-property descriptors
- Assigns each property individually through PLY abstraction layer
- Additional overhead from generic format handling
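
A minimal sketch of the gsply write strategy described above (header template, single pre-allocated array, buffered output), limited to SH0. The function name, property names, and column order are assumptions for illustration. gsply's own writer is plywrite.

```python
import os
import tempfile

import numpy as np

# Illustrative SH0 property names (14 float32 values per Gaussian).
SH0_NAMES = ["x", "y", "z", "f_dc_0", "f_dc_1", "f_dc_2", "opacity",
             "scale_0", "scale_1", "scale_2", "rot_0", "rot_1", "rot_2", "rot_3"]

# Header template built once at import time; only the vertex count varies.
HEADER_SH0 = (
    "ply\nformat binary_little_endian 1.0\nelement vertex {count}\n"
    + "".join(f"property float {n}\n" for n in SH0_NAMES)
    + "end_header\n"
)


def write_sh0_ply(path, means, sh0, opacities, scales, quats):
    """Write SH0 Gaussians; opacities is expected with shape (N,)."""
    n = means.shape[0]
    # Single contiguous allocation with the exact on-disk dtype.
    out = np.empty((n, 14), dtype="<f4")
    # Direct slice assignment, no intermediate structures.
    out[:, 0:3] = means
    out[:, 3:6] = sh0
    out[:, 6] = opacities
    out[:, 7:10] = scales
    out[:, 10:14] = quats
    # A 2 MB buffer reduces the number of write system calls.
    with open(path, "wb", buffering=2 * 1024 * 1024) as f:
        f.write(HEADER_SH0.format(count=n).encode("ascii"))
        out.tofile(f)


rng = np.random.default_rng(0)
n = 5
means = rng.standard_normal((n, 3)).astype(np.float32)
sh0 = rng.standard_normal((n, 3)).astype(np.float32)
opacities = rng.standard_normal(n).astype(np.float32)
scales = rng.standard_normal((n, 3)).astype(np.float32)
quats = rng.standard_normal((n, 4)).astype(np.float32)

path = os.path.join(tempfile.mkdtemp(), "out.ply")
write_sh0_ply(path, means, sh0, opacities, scales, quats)
```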

Key Insight: gsply's performance comes from recognizing that Gaussian Splatting PLY files follow a fixed format, allowing bulk operations and zero-copy views instead of generic PLY parsing.

Format Support

Uncompressed PLY

Standard binary little-endian PLY format with Gaussian Splatting properties:

SH Degree	Properties	Description
0	14	xyz, f_dc(3), opacity, scales(3), quats(4)
1	23	SH0 properties + 9 f_rest coefficients
2	38	SH0 properties + 24 f_rest coefficients
3	59	SH0 properties + 45 f_rest coefficients
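
The property counts in this table follow from the number of spherical harmonic coefficients: a degree-d expansion has (d+1)^2 coefficients per colour channel, one of which is the f_dc term. The sketch below reproduces the counts and builds an inverse lookup table. The names are hypothetical, and gsply's own detection code may be organised differently.

```python
def property_count(sh_degree: int) -> int:
    """14 base properties plus 3 channels x ((d+1)^2 - 1) higher-order terms."""
    if sh_degree not in (0, 1, 2, 3):
        raise ValueError("SH degree must be between 0 and 3")
    return 14 + 3 * ((sh_degree + 1) ** 2 - 1)


# Inverse lookup table: property count -> SH degree.
SH_DEGREE_BY_PROPERTY_COUNT = {property_count(d): d for d in range(4)}

print(SH_DEGREE_BY_PROPERTY_COUNT)  # {14: 0, 23: 1, 38: 2, 59: 3}
```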

Compressed PLY (PlayCanvas)

Chunk-based quantized format with automatic extension handling:

File extension: Automatically saves as .compressed.ply when compressed=True
Compression ratio: 3.4x for SH0 (3.8-14.5x depending on SH degree)
Chunk size: 256 Gaussians per chunk
Bit-packed data: 11-10-11 bits (position/scale), 2+10-10-10 bits (quaternion)
Parallel decompression: 14.74ms for 400K Gaussians (27M Gaussians/sec)
Parallel compression: 63ms for 400K Gaussians (6.3M Gaussians/sec) with radix sort
Compatible with: PlayCanvas, SuperSplat, other WebGL viewers

For format details, see docs/COMPRESSED_FORMAT.md.
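
To give a feel for the 11-10-11 bit packing, here is a rough NumPy sketch that quantizes positions relative to one chunk's bounding box and packs them into a single 32-bit word. The bit order (x in the high bits) and the function names are assumptions. docs/COMPRESSED_FORMAT.md remains the authoritative description.

```python
import numpy as np


def pack_11_10_11(x, y, z):
    """Pack three arrays of values in [0, 1] into one uint32 per element."""
    xi = np.round(np.clip(x, 0.0, 1.0) * 2047).astype(np.uint32)  # 11 bits
    yi = np.round(np.clip(y, 0.0, 1.0) * 1023).astype(np.uint32)  # 10 bits
    zi = np.round(np.clip(z, 0.0, 1.0) * 2047).astype(np.uint32)  # 11 bits
    return (xi << np.uint32(21)) | (yi << np.uint32(11)) | zi


def unpack_11_10_11(packed):
    """Recover the three normalized components from packed uint32 values."""
    x = ((packed >> np.uint32(21)) & np.uint32(0x7FF)) / 2047.0
    y = ((packed >> np.uint32(11)) & np.uint32(0x3FF)) / 1023.0
    z = (packed & np.uint32(0x7FF)) / 2047.0
    return x, y, z


# One chunk of 256 positions, normalized to the chunk's own bounding box.
rng = np.random.default_rng(0)
positions = rng.uniform(-5.0, 5.0, size=(256, 3))
lo, hi = positions.min(axis=0), positions.max(axis=0)
norm = (positions - lo) / (hi - lo)

packed = pack_11_10_11(norm[:, 0], norm[:, 1], norm[:, 2])
x, y, z = unpack_11_10_11(packed)

# Map back to world coordinates using the stored chunk bounds.
restored = np.stack([x, y, z], axis=1) * (hi - lo) + lo
```

Storing only a min/max pair per chunk plus one packed word per Gaussian is what makes the format compact. The price is a quantization error of at most half a step per component.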

Development

Setup

# Clone repository
git clone https://github.com/OpsiClear/gsply.git
cd gsply

# Install in development mode
pip install -e .[dev]

# Run tests
pytest tests/ -v

# Run with coverage
pytest tests/ -v --cov=gsply --cov-report=html

Project Structure

gsply/
├── src/
│   └── gsply/
│       ├── __init__.py        # Public API
│       ├── gsdata.py          # GSData dataclass
│       ├── reader.py          # PLY reading (uncompressed + compressed)
│       ├── writer.py          # PLY writing (uncompressed + compressed)
│       ├── formats.py         # Format detection and specs
│       ├── torch/             # Optional PyTorch integration
│       │   ├── __init__.py
│       │   └── gstensor.py    # GSTensor GPU dataclass
│       └── py.typed           # PEP 561 type marker
├── tests/                     # Unit tests (169 tests)
├── benchmarks/                # Performance benchmarks
├── docs/                      # Documentation
│   ├── CHANGELOG.md           # Version changelog
│   └── archive/               # Historical documentation
├── .github/                   # CI/CD workflows
├── pyproject.toml             # Package configuration
└── README.md                  # This file

Benchmarking

Compare gsply performance against other PLY libraries:

# Install benchmark dependencies
pip install -e .[benchmark]

# Run benchmark with default settings
python benchmarks/benchmark.py

# Custom test file and iterations
python benchmarks/benchmark.py --config.file path/to/model.ply --config.iterations 20

# Skip write benchmarks
python benchmarks/benchmark.py --config.skip-write

The benchmark measures:

Read performance: Time to load PLY file into numpy arrays
Write performance: Time to write numpy arrays to PLY file
File sizes: Comparison of output file sizes
Verification: Output equivalence between libraries

Example output:

READ PERFORMANCE (50K Gaussians, SH degree 3)
Library         Time            Speedup
gsply (fast)    2.89ms          baseline (FASTEST)
gsply (safe)    4.75ms          0.61x (1.6x slower than fast)
plyfile         18.23ms         0.16x (6.3x SLOWER)
Open3D          43.10ms         0.07x (14.9x slower)

WRITE PERFORMANCE
Library         Time            Speedup         File Size
gsply           8.72ms          baseline (FASTEST)    11.34MB
plyfile         12.18ms         0.72x (1.4x slower)   11.34MB
Open3D          35.69ms         0.24x (4.1x slower)   1.15MB (XYZ only)
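
The Speedup column is the baseline time divided by each library's time. The sketch below shows one way to collect such timings and derive the column. It is not the code in benchmarks/benchmark.py, and the use of the median and of warm-up runs is an assumption.

```python
import statistics
import time


def time_ms(fn, iterations=20, warmup=3):
    """Return the median wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up absorbs one-off costs such as JIT compilation
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup_vs_baseline(baseline_ms, other_ms):
    """Value for the Speedup column: below 1.0 means slower than the baseline."""
    return baseline_ms / other_ms


# Reproduce the plyfile row of the read table above.
print(f"{speedup_vs_baseline(2.89, 18.23):.2f}x")  # 0.16x
print(f"{18.23 / 2.89:.1f}x slower")               # 6.3x slower
```

With gsply installed, a read could be timed with `time_ms(lambda: plyread("model.ply"))`.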

Testing

gsply has comprehensive test coverage with 169 passing tests:

# Run all tests (NumPy/Numba core)
pytest tests/ -v

# Run PyTorch tests (requires torch installed)
pytest tests/ -v -k "torch or gstensor"

# Run specific test file
pytest tests/test_reader.py -v

# Run with coverage report
pytest tests/ -v --cov=gsply --cov-report=html

Test categories:

Core I/O: Format detection, reading, writing, round-trip consistency
GSData: Dataclass operations, slicing, masking, consolidation
Compressed format: PlayCanvas compression/decompression
GSTensor (PyTorch): GPU transfer, slicing, device management, conversions
Performance: Optimization verification, benchmark validation
Error handling: Invalid files, malformed data, edge cases

Documentation

gsply includes comprehensive documentation:

docs/CHANGELOG.md - Version changelog and release notes
benchmarks/TRANSFER_OPTIMIZATION_ANALYSIS.md - GPU transfer optimization analysis
benchmarks/QUICK_REFERENCE.md - Performance quick reference
docs/archive/ - Historical documentation from development phases

CI/CD

gsply includes a complete GitHub Actions CI/CD pipeline:

Multi-platform testing: Ubuntu, Windows, macOS
Multi-version testing: Python 3.10, 3.11, 3.12, 3.13
Core + PyTorch testing: Separate test jobs for NumPy/Numba core and PyTorch integration
Automated benchmarking: Performance tracking on PRs
Build verification: Wheel building and installation testing
PyPI publishing: Trusted publishing on GitHub Release
Pip caching: Fast CI runs with dependency caching

Contributing

Contributions are welcome! Please see .github/CONTRIBUTING.md for guidelines.

Quick start:

Fork the repository
Create a feature branch
Make your changes with tests
Run tests and benchmarks
Submit a pull request

License

MIT License - see LICENSE file for details.

Citation

If you use gsply in your research, please cite:

@software{gsply2024,
  author = {OpsiClear},
  title = {gsply: Ultra-Fast Gaussian Splatting PLY I/O},
  year = {2024},
  url = {https://github.com/OpsiClear/gsply}
}

Related Projects

gsplat: CUDA-accelerated Gaussian Splatting rasterizer
nerfstudio: NeRF training framework with Gaussian Splatting support
PlayCanvas SuperSplat: Web-based Gaussian Splatting viewer
3D Gaussian Splatting: Original paper and implementation

Made with Python and NumPy

Report Bug | Request Feature | Documentation

Release history

Version	Release date
0.2.16	Feb 27, 2026
0.2.15	Feb 27, 2026
0.2.14	Feb 26, 2026
0.2.13	Dec 6, 2025
0.2.12	Dec 5, 2025
0.2.11	Nov 30, 2025
0.2.10	Nov 25, 2025
0.2.8	Nov 21, 2025
0.2.7	Nov 19, 2025
0.2.6	Nov 19, 2025
0.2.5	Nov 19, 2025
0.2.4	Nov 19, 2025
0.2.2	Nov 17, 2025
0.2.1	Nov 16, 2025
0.2.0	Nov 15, 2025 (this version)
0.1.0	Nov 10, 2025

Download files

Source Distribution

gsply-0.2.0.tar.gz (93.3 kB)

Uploaded Nov 15, 2025 Source

Built Distribution

gsply-0.2.0-py3-none-any.whl (49.4 kB)

Uploaded Nov 15, 2025 Python 3

File details

Details for the file gsply-0.2.0.tar.gz.

File metadata

Download URL: gsply-0.2.0.tar.gz
Upload date: Nov 15, 2025
Size: 93.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for gsply-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`c57c0e59061235b8ffecde4b4c44672b924e5e12b7414ad43a30c7dd5e4fa00b`
MD5	`1c49a2cbd110ede60b86cfb6ca1f7bcd`
BLAKE2b-256	`7058264f236df12dd4fa21e378fd0f817dc0a7f6e89d2b4ccfa58da27eb4d5cd`

Provenance

The following attestation bundles were made for gsply-0.2.0.tar.gz:

Publisher: publish.yml on OpsiClear/gsply

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gsply-0.2.0.tar.gz
- Subject digest: c57c0e59061235b8ffecde4b4c44672b924e5e12b7414ad43a30c7dd5e4fa00b
- Sigstore transparency entry: 701725708
- Sigstore integration time: Nov 15, 2025
Source repository:
- Permalink: OpsiClear/gsply@fc27eefdeae082e5aebd1c07c511b98c9847a590
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/OpsiClear
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@fc27eefdeae082e5aebd1c07c511b98c9847a590
- Trigger Event: release

File details

Details for the file gsply-0.2.0-py3-none-any.whl.

File metadata

Download URL: gsply-0.2.0-py3-none-any.whl
Upload date: Nov 15, 2025
Size: 49.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for gsply-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`222c29a7b85c0c5cfbb10d1d0cc32461277522552abfa098d5b56680a486dead`
MD5	`0aa4b6b6430b5a0c7b790d7cdbd7fa9f`
BLAKE2b-256	`acddd22cc8b97e2143fef31d021574bbaf362a3bc90485a55baa46fd9fd52782`

Provenance

The following attestation bundles were made for gsply-0.2.0-py3-none-any.whl:

Publisher: publish.yml on OpsiClear/gsply

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gsply-0.2.0-py3-none-any.whl
- Subject digest: 222c29a7b85c0c5cfbb10d1d0cc32461277522552abfa098d5b56680a486dead
- Sigstore transparency entry: 701725709
- Sigstore integration time: Nov 15, 2025
Source repository:
- Permalink: OpsiClear/gsply@fc27eefdeae082e5aebd1c07c511b98c9847a590
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/OpsiClear
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@fc27eefdeae082e5aebd1c07c511b98c9847a590
- Trigger Event: release
