Ultra-fast Gaussian Splatting PLY I/O library - pure Python with NumPy and Numba
gsply
Ultra-Fast Gaussian Splatting PLY I/O Library
93M Gaussians/sec read | 57M Gaussians/sec write | Auto-optimized
Quick API Preview
from gsply import plyread, plywrite
# Read PLY file (auto-detects format, zero-copy)
data = plyread("model.ply")
# Unpack to individual arrays
means, scales, quats, opacities, sh0, shN = data.unpack()
# Write PLY file (automatically optimized)
plywrite("output.ply", data)
# Or write with individual arrays
plywrite("output.ply", means, scales, quats, opacities, sh0, shN)
Performance: 93M Gaussians/sec read, 57M Gaussians/sec write (400K Gaussians in 6-7ms)
Overview
Ultra-fast Gaussian Splatting PLY I/O for Python. Zero-copy reads, auto-optimized writes, optional GPU acceleration.
Key Features:
- Fast: 93M Gaussians/sec read, 57M Gaussians/sec write (zero-copy)
- Auto-optimized: Writes are 2.6-2.8x faster automatically
- Pure Python: NumPy + Numba (no C++ compilation)
- Format support: Uncompressed PLY + PlayCanvas compressed (71-74% smaller)
- GPU ready: Optional PyTorch integration with GSTensor
Features
Performance
- Peak throughput: 93M Gaussians/sec read, 57M Gaussians/sec write
- Auto-optimized writes: 2.6-2.8x faster automatically via consolidation
- Zero-copy paths: Additional 2.8x speedup for data from plyread() (total 7-8x)
- Benchmarks (400K Gaussians):
  - SH0: Read 5.7ms (70 M/s), Write 7-22ms (18-57 M/s)
  - SH3: Read 31ms (13 M/s), Write 35-96ms (4-11 M/s)
  - Compressed: 71-74% smaller, 15-110ms writes
Capabilities
- Format support: Uncompressed PLY + PlayCanvas compressed format
- SH degrees: Supports SH0-SH3 (14-59 properties)
- Auto-detection: Automatically detects format and SH degree
- GPU acceleration: Optional PyTorch integration (GSTensor)
- In-memory compression: Compress/decompress without disk I/O
- Type-safe: Full type hints for Python 3.10+
Installation
pip install gsply
Dependencies: NumPy and Numba (auto-installed)
Optional GPU acceleration:
pip install torch # For GSTensor GPU features
Quick Start
Basic Usage
from gsply import plyread, plywrite
# Read PLY file (auto-detects format)
data = plyread("model.ply")
# Access fields
positions = data.means # (N, 3) xyz coordinates
colors = data.sh0 # (N, 3) RGB colors
scales = data.scales # (N, 3) scale parameters
rotations = data.quats # (N, 4) quaternions
# Unpack to individual arrays
means, scales, quats, opacities, sh0, shN = data.unpack()
# Write (automatically optimized)
plywrite("output.ply", data)
# Write compressed (71-74% smaller)
plywrite("output.ply", data, compressed=True)
Advanced Features
from gsply import detect_format, compress_to_bytes, decompress_from_bytes
# Detect format before reading
is_compressed, sh_degree = detect_format("model.ply")
# In-memory compression
compressed_bytes = compress_to_bytes(data)
data_restored = decompress_from_bytes(compressed_bytes)
# GPU acceleration (requires PyTorch)
from gsply import GSTensor
gstensor = GSTensor.from_gsdata(data, device='cuda')
API Reference
Quick Navigation:
- Core I/O
  - plyread() - Read PLY files
  - plywrite() - Write PLY files
  - detect_format() - Detect format and SH degree
- GSData - CPU dataclass container
  - data.unpack() - Unpack to tuple
  - data.to_dict() - Convert to dictionary
  - data.copy() - Deep copy
  - data.consolidate() - Optimize for slicing
  - data[index] - Indexing and slicing
  - len(data) - Get number of Gaussians
- Compression APIs
  - compress_to_bytes() - Compress to bytes
  - compress_to_arrays() - Compress to arrays
  - decompress_from_bytes() - Decompress bytes
- Utility Functions
- GSTensor (GPU) - PyTorch integration
  - GSTensor.from_gsdata() - Convert to GPU
  - gstensor.to_gsdata() - Convert to CPU
  - gstensor.to() - Device/dtype transfer
  - gstensor.cpu()/cuda() - Device shortcuts
  - gstensor.half()/float()/double() - Precision conversion
  - gstensor.consolidate() - Optimize for slicing
  - gstensor.clone() - Deep copy
  - gstensor.unpack() - Unpack to tuple
  - gstensor.to_dict() - Convert to dictionary
  - gstensor[index] - Indexing and slicing
  - len(gstensor) - Get number of Gaussians
  - Properties & Helpers - device, dtype, get_sh_degree(), has_high_order_sh()
Core I/O
plyread(file_path)
Read Gaussian Splatting PLY file (auto-detects format).
Always uses zero-copy optimization for maximum performance.
Parameters:
- file_path (str | Path): Path to PLY file
Returns:
GSData dataclass with Gaussian parameters:
- means: (N, 3) - Gaussian centers
- scales: (N, 3) - Log scales
- quats: (N, 4) - Rotations as quaternions (wxyz)
- opacities: (N,) - Logit opacities
- sh0: (N, 3) - DC spherical harmonics
- shN: (N, K, 3) - Higher-order SH coefficients (K=0 for degree 0, K=9 for degree 1, etc.)
- masks: (N,) - Boolean mask for filtering Gaussians
- _base: (N, P) - Internal array for zero-copy views (private)
Performance:
- Uncompressed: 5.7ms for 400K Gaussians (70M/sec), 12.8ms for 1M (78M/sec peak)
- Compressed: 8.5ms for 400K Gaussians (47M/sec), 16.7ms for 1M (60M/sec)
- Scales linearly with data size
Example:
from gsply import plyread
# Zero-copy reading - up to 78M Gaussians/sec
data = plyread("model.ply")
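# shN.shape[1] is K, the number of higher-order coefficients, not the SH degree itself
print(f"Loaded {data.means.shape[0]} Gaussians with K={data.shN.shape[1]} higher-order SH coefficients")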
# Access via attributes
positions = data.means
colors = data.sh0
# Unpack for standard GS workflows
means, scales, quats, opacities, sh0, shN = data.unpack()
# Or exclude shN for SH0 data
means, scales, quats, opacities, sh0 = data.unpack(include_shN=False)
# Or get as dictionary
props = data.to_dict()
plywrite(file_path, means, scales, quats, opacities, sh0, shN=None, compressed=False)
Write Gaussian Splatting PLY file.
Parameters:
- file_path (str | Path): Output PLY file path (auto-adjusted to .compressed.ply if compressed=True)
- means (np.ndarray): Shape (N, 3) - Gaussian centers
- scales (np.ndarray): Shape (N, 3) - Log scales
- quats (np.ndarray): Shape (N, 4) - Rotations as quaternions (wxyz)
- opacities (np.ndarray): Shape (N,) - Logit opacities
- sh0 (np.ndarray): Shape (N, 3) - DC spherical harmonics
- shN (np.ndarray, optional): Shape (N, K, 3) or (N, K*3) - Higher-order SH
- compressed (bool): If True, write compressed format and auto-adjust extension
Format Selection:
- compressed=False or .ply extension -> Uncompressed format (fast)
- compressed=True -> Compressed format, saves as .compressed.ply automatically
- .compressed.ply or .ply_compressed extension -> Compressed format
Performance:
- Uncompressed SH0: 3.9ms for 100K (26M/s), 19.3ms for 400K (21M/s), 62.2ms for 1M (16M/s)
- Uncompressed SH3: 24.6ms for 100K (4.1M/s), 121.5ms for 400K (3.3M/s), 316.5ms for 1M (3.2M/s)
- Compressed SH0: 3.4ms for 100K (29M/s), 15.0ms for 400K (27M/s), 35.5ms for 1M (28M/s) - 71% smaller
- Compressed SH3: 22.5ms for 100K (4.5M/s), 110.5ms for 400K (3.6M/s), 210ms for 1M (4.8M/s) - 74% smaller
- Up to 2.9x faster when writing data loaded from PLY (zero-copy optimization)
Example:
from gsply import plywrite
# Write uncompressed (fast, ~8ms for 400K Gaussians)
plywrite("output.ply", means, scales, quats, opacities, sh0, shN)
# Write compressed (saves as "output.compressed.ply", ~63ms, 3.4x smaller)
plywrite("output.ply", means, scales, quats, opacities, sh0, shN, compressed=True)
detect_format(file_path)
Detect PLY format type and SH degree.
Parameters:
- file_path (str | Path): Path to PLY file
Returns: Tuple of (is_compressed, sh_degree):
- is_compressed (bool): True if compressed format
- sh_degree (int | None): 0-3 for uncompressed, None for compressed/unknown
Example:
from gsply import detect_format
is_compressed, sh_degree = detect_format("model.ply")
if is_compressed:
    print("Compressed PlayCanvas format")
else:
    print(f"Uncompressed format with SH degree {sh_degree}")
GSData
Container dataclass for Gaussian Splatting data with zero-copy optimization.
GSData is returned by plyread() and provides efficient access to Gaussian parameters through both direct attributes and convenience methods. All arrays are mutable and can be modified in-place. Arrays can be views into a shared _base array for maximum performance (zero memory overhead).
Attributes:
- means (np.ndarray): Shape (N, 3) - Gaussian centers (xyz positions)
- scales (np.ndarray): Shape (N, 3) - Log scales for each axis
- quats (np.ndarray): Shape (N, 4) - Rotations as quaternions (wxyz order)
- opacities (np.ndarray): Shape (N,) - Logit opacities (before sigmoid)
- sh0 (np.ndarray): Shape (N, 3) - DC spherical harmonics (RGB color basis)
- shN (np.ndarray | None): Shape (N, K, 3) - Higher-order SH coefficients
  - K=0 for SH degree 0 (no higher-order)
  - K=9 for SH degree 1
  - K=24 for SH degree 2
  - K=45 for SH degree 3
- masks (np.ndarray): Shape (N,) boolean - Mask for filtering (initialized to all True)
- _base (np.ndarray | None): Shape (N, P) - Private base array (auto-managed, do not modify)
Example:
from gsply import plyread
data = plyread("scene.ply")
print(f"Loaded {len(data)} Gaussians")
# Direct attribute access
positions = data.means
colors = data.sh0
# Mutable - modify in place
data.means[0] = [1, 2, 3]
data.sh0 *= 1.5 # Make brighter
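The attributes above are stored in their raw form: scales as logarithms and opacities as logits. The following is a minimal NumPy sketch of how such values are commonly mapped to render-space quantities. The helper name activate is hypothetical and not part of gsply, and normalizing the quaternions is an assumption based on common Gaussian Splatting practice rather than something this README states.

```python
import numpy as np

def activate(scales_log, opacities_logit, quats_wxyz):
    """Map raw stored parameters to render-space values (illustrative helper)."""
    scales = np.exp(scales_log)                           # log scale -> positive scale
    opacities = 1.0 / (1.0 + np.exp(-opacities_logit))    # logit -> value in (0, 1)
    norms = np.linalg.norm(quats_wxyz, axis=1, keepdims=True)
    quats = quats_wxyz / norms                            # assumed: unit quaternions
    return scales, opacities, quats

# Hand-made inputs standing in for data.scales, data.opacities, data.quats
scales_log = np.log(np.array([[0.01, 0.02, 0.04]], dtype=np.float32))
opacities_logit = np.array([0.0], dtype=np.float32)
quats_wxyz = np.array([[2.0, 0.0, 0.0, 0.0]], dtype=np.float32)

scales, opacities, quats = activate(scales_log, opacities_logit, quats_wxyz)
```

One consequence worth keeping in mind: a threshold applied to raw opacities compares logits, so a raw value of 0.0 corresponds to an opacity of 0.5 after the sigmoid.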
data.unpack(include_shN=True)
Unpack Gaussian data into tuple of individual arrays.
Most useful for passing data to rendering functions that expect separate arrays rather than a container object.
Parameters:
- include_shN (bool): If True, include shN in output (default: True)
Returns:
- If include_shN=True: (means, scales, quats, opacities, sh0, shN)
- If include_shN=False: (means, scales, quats, opacities, sh0)
Example:
data = plyread("scene.ply")
# Full unpacking (recommended for SH1-3)
means, scales, quats, opacities, sh0, shN = data.unpack()
render(means, scales, quats, opacities, sh0, shN)
# Without higher-order SH (recommended for SH0)
means, scales, quats, opacities, sh0 = data.unpack(include_shN=False)
render(means, scales, quats, opacities, sh0)
# Tuple unpacking for plywrite
plywrite("output.ply", *data.unpack())
data.to_dict()
Convert Gaussian data to dictionary for keyword argument unpacking.
Useful when calling functions that accept keyword arguments matching the Gaussian parameter names.
Returns:
- Dictionary with keys: means, scales, quats, opacities, sh0, shN
Example:
data = plyread("scene.ply")
# Dictionary unpacking
props = data.to_dict()
render(**props) # Unpack as kwargs
# Access by key
positions = props['means']
colors = props['sh0']
data.copy()
Create deep copy of GSData with independent arrays.
Modifications to the copy will not affect the original data. Optimized to use _base array when available (faster than copying individual arrays).
Returns:
GSData: New GSData object with copied arrays
Example:
data = plyread("scene.ply")
# Create independent copy
data_copy = data.copy()
data_copy.means[0] = 0 # Doesn't affect original
# Use for creating variations
bright = data.copy()
bright.sh0 *= 1.5 # Make brighter
data.consolidate()
Consolidate separate arrays into single base array for faster slicing operations.
Creates a _base array from separate arrays, which improves performance for boolean masking operations (1.5x faster). Only beneficial if you plan to perform many boolean mask operations on the same data.
Returns:
GSData: New GSData with _base array, or self if already consolidated
Performance:
- One-time cost: ~2ms per 100K Gaussians
- Benefit: 1.5x faster boolean masking
- Most useful before multiple filter operations
Example:
data = plyread("scene.ply")
# Consolidate for faster filtering
data_consolidated = data.consolidate()
# Now boolean masking is 1.5x faster
high_opacity = data_consolidated[data_consolidated.opacities > 0.5]
low_opacity = data_consolidated[data_consolidated.opacities <= 0.5]
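Why a single base array can speed up boolean masking may be easier to see in plain NumPy. The sketch below is not gsply's implementation. It only contrasts one gather per field with a single gather on a packed array followed by column views. The column layout is an assumption chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
means = rng.standard_normal((n, 3)).astype(np.float32)
scales = rng.standard_normal((n, 3)).astype(np.float32)
opacities = rng.standard_normal(n).astype(np.float32)

mask = opacities > 0.0

# Separate arrays: one boolean-mask gather per field.
separate = (means[mask], scales[mask], opacities[mask])

# Consolidated: pack the fields as columns of one array (assumed layout),
# gather once, then slice column views out of the result.
base = np.concatenate([means, scales, opacities[:, None]], axis=1)  # shape (n, 7)
picked = base[mask]                                                 # single gather
consolidated = (picked[:, 0:3], picked[:, 3:6], picked[:, 6])

same = all(np.array_equal(a, b) for a, b in zip(separate, consolidated))
```

Both routes select the same rows. The consolidated route pays the packing cost once and then performs one gather per mask instead of one per field.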
data[index]
Slice GSData using standard Python indexing.
Supports integers, slices, boolean masks, and fancy indexing. Returns views when possible (zero-copy).
Indexing Modes:
- Integer: data[0] - Returns tuple of (means, scales, quats, opacities, sh0, shN, masks)
- Slice: data[100:200] - Returns new GSData with subset
- Step: data[::10] - Returns every 10th Gaussian
- Boolean mask: data[mask] - Filter by boolean array
- Fancy: data[[0, 10, 20]] - Select specific indices
Example:
data = plyread("scene.ply")
# Single Gaussian (returns tuple)
means, scales, quats, opacities, sh0, shN, masks = data[0]
# Slice (returns GSData)
subset = data[100:200]
# Boolean mask (returns GSData)
high_opacity = data[data.opacities > 0.5]
# Step slicing (returns GSData)
every_10th = data[::10]
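"Views when possible" follows from NumPy's own indexing rules: basic and step slices return views, while boolean masks and fancy indexing return copies. The sketch below demonstrates this on a plain array. Whether GSData follows exactly the same split for every indexing mode is an assumption here.

```python
import numpy as np

arr = np.arange(12, dtype=np.float32).reshape(6, 2)

window = arr[1:4]              # basic slice -> view
strided = arr[::2]             # step slice -> view
masked = arr[arr[:, 0] > 4]    # boolean mask -> copy
fancy = arr[[0, 2, 4]]         # fancy indexing -> copy

window[0, 0] = -1.0            # writes through to arr[1, 0]
```

Edits made through a sliced subset therefore reach the original array, while edits to a masked or fancy-indexed subset do not.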
len(data)
Get number of Gaussians in the dataset.
Returns:
int: Number of Gaussians (equivalent to data.means.shape[0])
Example:
data = plyread("scene.ply")
print(f"Loaded {len(data)} Gaussians")
Compression APIs
compress_to_bytes(data)
Compress Gaussian splatting data to bytes (PlayCanvas format) without writing to disk.
Useful for network transfer, streaming, or custom storage solutions.
Parameters:
- data (GSData): Gaussian data from plyread() or created manually
- Alternative: Pass individual arrays for backward compatibility
Returns:
bytes: Complete compressed PLY file as bytes
Example:
from gsply import plyread, compress_to_bytes
# Method 1: Clean API with GSData (recommended)
data = plyread("model.ply")
compressed_bytes = compress_to_bytes(data) # Simple!
# Method 2: Individual arrays (backward compatible)
means, scales, quats, opacities, sh0, shN = data.unpack()
compressed_bytes = compress_to_bytes(
    means, scales, quats, opacities, sh0, shN
)
# Send over network or store in database
with open("output.compressed.ply", "wb") as f:
    f.write(compressed_bytes)
compress_to_arrays(data)
Compress Gaussian splatting data to component arrays (PlayCanvas format).
Returns separate components for custom processing or partial updates.
Parameters:
- data (GSData): Gaussian data from plyread() or created manually
- Alternative: Pass individual arrays for backward compatibility
Returns: Tuple containing:
- header_bytes (bytes): PLY header as bytes
- chunk_bounds (np.ndarray): Shape (num_chunks, 18) float32 - Chunk boundary array
- packed_data (np.ndarray): Shape (N, 4) uint32 - Main compressed data
- packed_sh (np.ndarray | None): Shape varies, uint8 - Compressed SH data if present
Example:
from gsply import plyread, compress_to_arrays
from io import BytesIO
# Method 1: Clean API with GSData (recommended)
data = plyread("model.ply")
header, chunks, packed, sh = compress_to_arrays(data) # Simple!
# Method 2: Individual arrays (backward compatible)
means, scales, quats, opacities, sh0, shN = data.unpack()
header, chunks, packed, sh = compress_to_arrays(
    means, scales, quats, opacities, sh0, shN
)
# Process components individually
print(f"Header size: {len(header)} bytes")
print(f"Chunks: {chunks.shape[0]} chunks")
print(f"Packed data: {packed.nbytes} bytes")
# Manually assemble if needed
buffer = BytesIO()
buffer.write(header)
buffer.write(chunks.tobytes())
buffer.write(packed.tobytes())
if sh is not None:
    buffer.write(sh.tobytes())
compressed_bytes = buffer.getvalue()
decompress_from_bytes(compressed_bytes)
Decompress Gaussian splatting data from bytes (PlayCanvas format) without reading from disk.
Symmetric with compress_to_bytes() - perfect for network transfer, streaming, or custom storage.
Parameters:
- compressed_bytes (bytes): Complete compressed PLY file as bytes
Returns:
GSData dataclass with decompressed Gaussian parameters:
- means: (N, 3) - Gaussian centers
- scales: (N, 3) - Log scales
- quats: (N, 4) - Rotations as quaternions (wxyz)
- opacities: (N,) - Logit opacities
- sh0: (N, 3) - DC spherical harmonics
- shN: (N, K, 3) - Higher-order SH coefficients
- masks: (N,) - Boolean mask (all True for decompressed data)
- _base: None (not applicable for decompressed data)
Example:
from gsply import compress_to_bytes, decompress_from_bytes, plyread
# Example 1: Round-trip without disk I/O
data = plyread("model.ply")
compressed = compress_to_bytes(data)
data_restored = decompress_from_bytes(compressed)
# data_restored is ready to use!
# Example 2: Network transfer
# Sender side
compressed_bytes = compress_to_bytes(data)
# send compressed_bytes over network...
# Receiver side
# ...receive compressed_bytes from network
data = decompress_from_bytes(compressed_bytes)
# No temporary files needed!
# Example 3: Database storage
import sqlite3
conn = sqlite3.connect('gaussians.db')
conn.execute('CREATE TABLE IF NOT EXISTS models (id INTEGER, data BLOB)')
# Store
compressed = compress_to_bytes(data)
conn.execute('INSERT INTO models VALUES (?, ?)', (1, compressed))
# Retrieve
row = conn.execute('SELECT data FROM models WHERE id = 1').fetchone()
data_restored = decompress_from_bytes(row[0])
Note: PlayCanvas compression is lossy (quantization). Decompressed data will be very close to but not exactly identical to the original.
Utility Functions
sh2rgb(sh)
Convert spherical harmonic DC coefficients to RGB colors.
Converts the DC component (sh0) of spherical harmonics to standard RGB color values in the range [0, 1]. Useful for visualization and color manipulation.
Parameters:
- sh (np.ndarray | float): SH DC coefficients - Shape (N, 3) or scalar
Returns:
np.ndarray | float: RGB colors in [0, 1] range
Example:
import numpy as np
from gsply import plyread, sh2rgb, rgb2sh
data = plyread("scene.ply")
# Convert SH to RGB for visualization
rgb_colors = sh2rgb(data.sh0)
print(f"First color: RGB({rgb_colors[0, 0]:.3f}, {rgb_colors[0, 1]:.3f}, {rgb_colors[0, 2]:.3f})")
# Modify colors in RGB space
rgb_colors *= 1.5 # Make brighter
data.sh0 = rgb2sh(np.clip(rgb_colors, 0, 1)) # Convert back
rgb2sh(rgb)
Convert RGB colors to spherical harmonic DC coefficients.
Converts standard RGB color values in the range [0, 1] to the DC component (sh0) of spherical harmonics. Inverse of sh2rgb().
Parameters:
- rgb (np.ndarray | float): RGB colors in [0, 1] range - Shape (N, 3) or scalar
Returns:
np.ndarray | float: SH DC coefficients
Example:
from gsply import rgb2sh, plywrite
import numpy as np
# Create Gaussians with specific RGB colors
n = 1000
means = np.random.randn(n, 3).astype(np.float32)
scales = np.ones((n, 3), dtype=np.float32) * 0.01
quats = np.tile([1, 0, 0, 0], (n, 1)).astype(np.float32)
opacities = np.ones(n, dtype=np.float32)
# Set colors in RGB space
rgb_colors = np.random.rand(n, 3).astype(np.float32) # Random colors
sh0 = rgb2sh(rgb_colors) # Convert to SH
plywrite("colored.ply", means, scales, quats, opacities, sh0, None)
SH_C0
Constant for spherical harmonic DC coefficient normalization.
This constant (0.28209479177387814) is used in the conversion between SH coefficients and RGB colors. It represents the normalization factor for the 0th order spherical harmonic.
Type: float
Value: 0.28209479177387814
Example:
from gsply import SH_C0
# Manual conversion (equivalent to sh2rgb/rgb2sh)
rgb = sh * SH_C0 + 0.5 # SH to RGB
sh = (rgb - 0.5) / SH_C0 # RGB to SH
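As a self-contained check of the two formulas above, the sketch below redefines the constant locally instead of importing it and uses hypothetical helper names (sh_to_rgb, rgb_to_sh) that are not gsply functions. It also shows where the value comes from: the 0th-order real spherical harmonic is 1 / (2 * sqrt(pi)).

```python
import math

SH_C0 = 0.28209479177387814  # value quoted in this section

def sh_to_rgb(sh):
    """SH DC coefficient -> RGB, using the formula from the example above."""
    return sh * SH_C0 + 0.5

def rgb_to_sh(rgb):
    """RGB -> SH DC coefficient (inverse of sh_to_rgb)."""
    return (rgb - 0.5) / SH_C0

derived = 1.0 / (2.0 * math.sqrt(math.pi))   # analytic value of the constant
grey = sh_to_rgb(0.0)                        # a zero DC coefficient maps to mid-grey
round_trip = sh_to_rgb(rgb_to_sh(0.8))       # should recover 0.8
```

The formula itself does not clamp, so DC coefficients outside roughly [-1.77, 1.77] map to values outside [0, 1].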
GPU Support (PyTorch)
Optional GPU acceleration with PyTorch tensors for training and inference workflows.
Installation
PyTorch is optional. GSTensor features are always included in gsply but only work when PyTorch is installed.
# Install gsply first
pip install gsply
# Then install PyTorch if you need GPU acceleration
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
gsply will automatically detect PyTorch and enable GSTensor if available. Without PyTorch, gsply works normally for CPU-only workflows.
GSTensor - GPU-Accelerated Dataclass
GSTensor is a PyTorch-backed version of GSData that enables GPU-accelerated operations:
from gsply import plyread, GSTensor
# Load data from disk (CPU NumPy)
data = plyread("model.ply")
# Convert to GPU tensors (11x faster with _base optimization)
gstensor = GSTensor.from_gsdata(data, device='cuda')
# Access GPU tensors
positions_gpu = gstensor.means # torch.Tensor on GPU
colors_gpu = gstensor.sh0 # torch.Tensor on GPU
# Unpack for rendering functions (NEW!)
means, scales, quats, opacities, sh0, shN = gstensor.unpack()
rendered = render_gaussians(means, scales, quats, opacities, sh0)
# Or use dict unpacking
rendered = render_gaussians(**gstensor.to_dict())
# Slice on GPU (zero-cost views)
subset = gstensor[100:200] # Returns GSTensor view
# Training workflow
gstensor_trainable = GSTensor.from_gsdata(data, device='cuda', requires_grad=True)
loss = render_loss(gstensor_trainable.means, ...)
loss.backward()
# Convert back to CPU NumPy
data_cpu = gstensor.to_gsdata()
Key Features
- 11x Faster GPU Transfer: When data has _base (from plyread() or consolidate()), GPU transfer is 11x faster than manual stacking
- Zero-Copy Views: GPU slicing creates views (no memory overhead)
- Device Management: Seamless transfer between CPU/GPU with .to(), .cpu(), .cuda()
- Training Support: Optional gradient tracking with requires_grad=True
- Type Conversions: half(), float(), double() for precision control
- Optimized Slicing: 25x faster boolean masking with consolidate()
Performance
GPU Transfer (400K Gaussians, SH0, RTX 3090 Ti):
- With _base optimization: 1.99 ms (zero CPU copy overhead)
- Without _base (fallback): 22.78 ms (requires CPU stacking)
- Speedup: 11.4x faster with _base
Memory Efficiency:
- Single tensor transfer vs 5 separate transfers
- 50% less I/O (no CPU copy when using _base)
- GPU views are free (zero additional memory)
API Reference
GSTensor.from_gsdata(data, device='cuda', dtype=torch.float32, requires_grad=False)
Convert GSData to GSTensor.
Parameters:
- data (GSData): Input Gaussian data
- device (str | torch.device): Target device ('cuda', 'cpu', or torch.device)
- dtype (torch.dtype): Target dtype (default: torch.float32)
- requires_grad (bool): Enable gradient tracking (default: False)
Returns:
GSTensor: GPU-accelerated tensor container
Example:
# Fast path (uses _base if available)
gstensor = GSTensor.from_gsdata(data, device='cuda')
# For training
gstensor = GSTensor.from_gsdata(data, device='cuda', requires_grad=True)
# Half precision for memory savings
gstensor = GSTensor.from_gsdata(data, device='cuda', dtype=torch.float16)
gstensor.to_gsdata()
Convert GSTensor back to GSData (CPU NumPy).
Returns:
GSData: CPU NumPy container
Example:
gstensor = GSTensor.from_gsdata(data, device='cuda')
# ... GPU operations ...
data_cpu = gstensor.to_gsdata() # Back to NumPy
gstensor.to(device=None, dtype=None)
Move tensors to different device and/or dtype.
Parameters:
- device (str | torch.device, optional): Target device
- dtype (torch.dtype, optional): Target dtype
Returns:
GSTensor: New GSTensor on target device/dtype
Example:
gstensor_gpu = gstensor.to('cuda')
gstensor_half = gstensor.to(dtype=torch.float16)
gstensor_gpu_half = gstensor.to('cuda', dtype=torch.float16)
gstensor.consolidate()
Create _base tensor for 25x faster slicing.
Returns:
GSTensor: New GSTensor with _base tensor
Example:
# Consolidate for faster slicing
gstensor = gstensor.consolidate()
# Boolean masking is now 25x faster
mask = gstensor.opacities > 0.5
subset = gstensor[mask] # Fast with _base
gstensor.clone()
Create independent deep copy.
Returns:
GSTensor: Cloned GSTensor
Example:
gstensor_copy = gstensor.clone()
gstensor_copy.means[0] = 0 # Doesn't affect original
gstensor.cpu()
Move tensors to CPU.
Shorthand for gstensor.to('cpu').
Returns:
GSTensor: GSTensor on CPU
Example:
gstensor_gpu = GSTensor.from_gsdata(data, device='cuda')
gstensor_cpu = gstensor_gpu.cpu() # Now on CPU
gstensor.cuda(device=None)
Move tensors to GPU.
Shorthand for gstensor.to('cuda').
Parameters:
- device (int | None): GPU device index (default: None = cuda:0)
Returns:
GSTensor: GSTensor on GPU
Example:
gstensor_gpu = gstensor.cuda() # Move to cuda:0
gstensor_gpu1 = gstensor.cuda(1) # Move to cuda:1
gstensor.half(), gstensor.float(), gstensor.double()
Convert tensor precision.
Convenience methods for dtype conversion:
- half() - Convert to torch.float16
- float() - Convert to torch.float32
- double() - Convert to torch.float64
Returns:
GSTensor: GSTensor with new dtype
Example:
# Half precision for memory savings (2x less VRAM)
gstensor_fp16 = gstensor.half()
# Back to full precision
gstensor_fp32 = gstensor_fp16.float()
# Double precision for high accuracy
gstensor_fp64 = gstensor.double()
gstensor.unpack(include_shN=True)
Unpack GSTensor into tuple of individual tensors.
Identical to GSData.unpack() but returns PyTorch tensors instead of NumPy arrays.
Parameters:
- include_shN (bool): If True, include shN in output (default: True)
Returns:
- If include_shN=True: (means, scales, quats, opacities, sh0, shN)
- If include_shN=False: (means, scales, quats, opacities, sh0)
Example:
gstensor = GSTensor.from_gsdata(data, device='cuda')
# Full unpacking for rendering
means, scales, quats, opacities, sh0, shN = gstensor.unpack()
rendered = render_gaussians(means, scales, quats, opacities, sh0, shN)
# Without higher-order SH
means, scales, quats, opacities, sh0 = gstensor.unpack(include_shN=False)
gstensor.to_dict()
Convert GSTensor to dictionary for keyword argument unpacking.
Identical to GSData.to_dict() but returns PyTorch tensors instead of NumPy arrays.
Returns:
- Dictionary with keys: means, scales, quats, opacities, sh0, shN
Example:
gstensor = GSTensor.from_gsdata(data, device='cuda')
# Dictionary unpacking
props = gstensor.to_dict()
rendered = render_gaussians(**props)
gstensor[index]
Slice GSTensor using standard Python indexing.
Supports integers, slices, boolean masks, and fancy indexing. Returns views when possible (zero-copy on GPU).
Indexing Modes:
- Integer: gstensor[0] - Returns tuple of tensors
- Slice: gstensor[100:200] - Returns new GSTensor with subset
- Step: gstensor[::10] - Returns every 10th Gaussian
- Boolean mask: gstensor[mask] - Filter by boolean tensor
- Fancy: gstensor[[0, 10, 20]] - Select specific indices
Example:
gstensor = GSTensor.from_gsdata(data, device='cuda')
# Single Gaussian (returns tuple)
means, scales, quats, opacities, sh0, shN, masks = gstensor[0]
# Slice (returns GSTensor view - zero memory cost)
subset = gstensor[100:200]
# Boolean mask (returns GSTensor)
high_opacity = gstensor[gstensor.opacities > 0.5]
# Step slicing (returns GSTensor)
every_10th = gstensor[::10]
len(gstensor)
Get number of Gaussians.
Returns:
int: Number of Gaussians (equivalent to gstensor.means.shape[0])
Example:
gstensor = GSTensor.from_gsdata(data, device='cuda')
print(f"Processing {len(gstensor)} Gaussians on GPU")
gstensor.device (property)
Get current device of tensors.
Returns:
torch.device: Current device (e.g., torch.device('cuda:0') or torch.device('cpu'))
Example:
print(f"Tensors are on {gstensor.device}")
if gstensor.device.type == 'cuda':
    print(f"Using GPU {gstensor.device.index}")
gstensor.dtype (property)
Get current dtype of tensors.
Returns:
torch.dtype: Current dtype (e.g., torch.float32, torch.float16)
Example:
print(f"Using precision: {gstensor.dtype}")
gstensor.get_sh_degree()
Get spherical harmonic degree from data shape.
Returns:
int: SH degree (0-3)
Example:
sh_degree = gstensor.get_sh_degree()
print(f"Data has SH degree {sh_degree}")
gstensor.has_high_order_sh()
Check if data has higher-order spherical harmonics.
Returns:
bool: True if SH degree > 0
Example:
if gstensor.has_high_order_sh():
    print("Has higher-order SH coefficients")
else:
    print("Only DC component (SH0)")
Complete Workflow Examples
Training Workflow
import gsply
from gsply import GSTensor
import torch
# Load from disk
data = gsply.plyread("scene.ply") # Has _base -> fast GPU transfer
# Transfer to GPU (11x faster with _base)
gstensor = GSTensor.from_gsdata(data, device='cuda', requires_grad=True)
# Training loop
optimizer = torch.optim.Adam([gstensor.means, gstensor.scales], lr=0.01)
for epoch in range(100):
    optimizer.zero_grad()
    # Unpack for rendering (cleaner API)
    means, scales, quats, opacities, sh0, shN = gstensor.unpack()
    # render_gaussians is a placeholder for your own differentiable renderer + loss
    loss = render_gaussians(means, scales, quats, opacities, sh0)
    loss.backward()
    optimizer.step()
# Save optimized results
optimized_data = gstensor.to_gsdata()
gsply.plywrite("optimized.ply", optimized_data.means, optimized_data.scales,
               optimized_data.quats, optimized_data.opacities,
               optimized_data.sh0, optimized_data.shN)
Inference Workflow
import gsply
from gsply import GSTensor
import torch
# Load scene
data = gsply.plyread("scene.ply")
# Transfer to GPU (inference mode, no gradients)
gstensor = GSTensor.from_gsdata(data, device='cuda', requires_grad=False)
# Filter Gaussians by opacity threshold
high_opacity_mask = gstensor.opacities > 0.5
filtered = gstensor[high_opacity_mask]
# Render filtered scene with unpacking
with torch.no_grad():
    means, scales, quats, opacities, sh0, shN = filtered.unpack()
    rendered = render_gaussians(means, scales, quats, opacities, sh0)
# Save filtered version
filtered_data = filtered.to_gsdata()
gsply.plywrite("filtered.ply", filtered_data.means, filtered_data.scales,
               filtered_data.quats, filtered_data.opacities,
               filtered_data.sh0, filtered_data.shN)
Performance
Benchmark Results
Comprehensive performance benchmarks (source: BENCHMARK_SUMMARY.md):
Uncompressed Format Performance
| Gaussians | SH | Read (ms) | Write (ms) | Read (M/s) | Write (M/s) |
|---|---|---|---|---|---|
| 100K | 0 | 1.5 | 3.9 | 68.1 | 26.0 |
| 400K | 0 | 5.7 | 19.3 | 70.0 | 21.0 |
| 1M | 0 | 12.8 | 62.2 | 78.0 | 16.1 |
| 100K | 3 | 6.9 | 24.6 | 14.4 | 4.1 |
| 400K | 3 | 31.1 | 121.5 | 12.9 | 3.3 |
| 1M | 3 | 81.8 | 316.5 | 12.2 | 3.2 |
Compressed Format Performance
| Gaussians | SH | Read (ms) | Write (ms) | Read (M/s) | Write (M/s) | Size Reduction |
|---|---|---|---|---|---|---|
| 100K | 0 | 2.8 | 3.4 | 35.4 | 29.4 | 71% |
| 400K | 0 | 8.5 | 15.0 | 47.0 | 26.6 | 71% |
| 1M | 0 | 16.7 | 35.5 | 60.0 | 28.2 | 71% |
| 100K | 3 | 30.5 | 22.5 | 3.3 | 4.5 | 74% |
| 400K | 3 | 25.1 | 110.5 | 16.0 | 3.6 | 74% |
| 1M | 3 | 256.4 | 210.0 | 3.9 | 4.8 | 74% |
Key Performance Highlights
- Peak Read Speed: 78M Gaussians/sec (1M Gaussians, SH0, uncompressed)
- Peak Write Speed: 29M Gaussians/sec (100K Gaussians, SH0, compressed)
- Uncompressed Read (SH0): 68M/s (100K), 70M/s (400K), 78M/s (1M)
- Uncompressed Write (SH0): 26M/s (100K), 21M/s (400K), 16M/s (1M)
- Uncompressed SH3: Read 12-14M/s, Write 3-4M/s (scales linearly)
- Compressed Read (SH0): 35M/s (100K), 47M/s (400K), 60M/s (1M)
- Compressed Write (SH0): 29M/s (100K), 27M/s (400K), 28M/s (1M)
- Compressed SH3: Read 16M/s (400K), Write 3.6M/s (400K) with 74% size reduction
- Compression Benefits: 71-74% file size reduction across all SH degrees
- Scalability: Linear scaling verified up to 1M Gaussians
- Real-World Validation: Benchmarks verified on both synthetic and real 4D Gaussian Splatting PLY files
Optimization Details
- Zero-copy reads: Direct memory views without data duplication
- Zero-copy writes: When data has _base array (from plyread), use directly without copying
- Parallel processing: Numba JIT compilation with parallel chunk operations
- Smart caching: LRU cache for frequently used headers
- Lookup tables: Eliminate branching for SH degree detection
- Fast-path checks: Skip unnecessary dtype conversions
- Single file handle: Reduce file open/close syscall overhead
Why gsply is Faster
Read Performance (4.3-8x speedup):
- gsply: Optimized bulk header read + np.fromfile() + zero-copy views
  - Bulk header reading: Single 8KB read + decode (vs. N readline() calls)
  - Reads entire binary data as contiguous block in one system call
  - Creates memory views directly into the data array (no copies)
  - Base array kept alive via GSData container's reference counting
  - Consistent performance: Works equally well on real-world and random data
- plyfile: Line-by-line header + individual property accesses per element
  - Multiple readline() + decode operations for header parsing
  - Accesses each property separately through PLY structure
  - Stacks columns together requiring multiple memory allocations and copies
  - Generic PLY parser handles arbitrary formats with overhead
  - Data-dependent performance: 10x slower on random/synthetic data vs real-world structured data
Write Performance:
- gsply: Pre-computed templates + pre-allocated array + buffered I/O
- Pre-computed header templates: Avoids dynamic string building in loops
- Buffered I/O: 2MB buffer for large files reduces system call overhead
- Allocates single contiguous array with exact dtype needed
- Fills array via direct slice assignment (no intermediate structures)
- Used when data created from scratch (no _base array) or for SH1-3
- Performance (SH0): 30M Gaussians/sec (100K), 19M Gaussians/sec (400K), 16M Gaussians/sec (1M)
- Performance (SH3): 1.9M Gaussians/sec (100K), 1.4M Gaussians/sec (1M)
- plyfile: Dynamic header + per-property assignments + PLY construction
- Builds header dynamically with loop + f-string formatting
- Creates PLY element structure with per-property descriptors
- Assigns each property individually through PLY abstraction layer
- Additional overhead from generic format handling
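The gsply write steps above (cached header template, one pre-allocated array, slice assignment, buffered I/O) can be sketched as follows. The names `header_template` and `write_sh0` are hypothetical and the property order is assumed from the format table in this README; the real writer may differ in detail.

```python
import os
import tempfile
from functools import lru_cache

import numpy as np

SH0_PROPS = ("x", "y", "z", "f_dc_0", "f_dc_1", "f_dc_2", "opacity",
             "scale_0", "scale_1", "scale_2", "rot_0", "rot_1", "rot_2", "rot_3")


@lru_cache(maxsize=8)
def header_template(props: tuple) -> str:
    """Build the header text once per layout; only {n} changes per file."""
    body = "".join(f"property float {p}\n" for p in props)
    return ("ply\nformat binary_little_endian 1.0\n"
            "element vertex {n}\n" + body + "end_header\n")


def write_sh0(path, means, scales, quats, opacities, sh0) -> np.ndarray:
    n = means.shape[0]
    out = np.empty((n, len(SH0_PROPS)), dtype="<f4")  # single exact-size allocation
    out[:, 0:3] = means       # direct slice assignment,
    out[:, 3:6] = sh0         # no intermediate structures
    out[:, 6] = opacities
    out[:, 7:10] = scales
    out[:, 10:14] = quats
    header = header_template(SH0_PROPS).format(n=n).encode("ascii")
    with open(path, "wb", buffering=2 * 1024 * 1024) as f:  # 2MB buffer
        f.write(header)
        f.write(out)  # a C-contiguous ndarray is written via the buffer protocol
    return out


rng = np.random.default_rng(0)
n = 1000
means = rng.random((n, 3), dtype=np.float32)
scales = rng.random((n, 3), dtype=np.float32)
quats = rng.random((n, 4), dtype=np.float32)
opacities = rng.random(n, dtype=np.float32)
sh0 = rng.random((n, 3), dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "out.ply")
    written = write_sh0(out_path, means, scales, quats, opacities, sh0)
    size = os.path.getsize(out_path)
    with open(out_path, "rb") as f:
        raw = f.read()

print(size)  # header bytes plus n * 14 * 4 data bytes
```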
Key Insight: gsply's performance comes from recognizing that Gaussian Splatting PLY files follow a fixed format, allowing bulk operations and zero-copy views instead of generic PLY parsing.
Format Support
Uncompressed PLY
Standard binary little-endian PLY format with Gaussian Splatting properties:
| SH Degree | Properties | Description |
|---|---|---|
| 0 | 14 | xyz, f_dc(3), opacity, scales(3), quats(4) |
| 1 | 23 | + 9 f_rest coefficients |
| 2 | 38 | + 24 f_rest coefficients |
| 3 | 59 | + 45 f_rest coefficients |
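The property counts in the table follow from the spherical harmonics coefficient count: degree d has (d + 1)^2 coefficients per color channel, one of which is the DC term already counted in f_dc. A short sketch of a count-to-degree lookup table built from the table above; the function names are hypothetical and not part of gsply's API.

```python
# Property count -> SH degree, taken from the table above.
SH_DEGREE_BY_PROPERTY_COUNT = {14: 0, 23: 1, 38: 2, 59: 3}


def property_count(sh_degree: int) -> int:
    """14 base properties plus 3 channels for each non-DC SH coefficient."""
    return 14 + 3 * ((sh_degree + 1) ** 2 - 1)


def detect_sh_degree(num_properties: int) -> int:
    """Dictionary lookup instead of a chain of if/elif branches."""
    try:
        return SH_DEGREE_BY_PROPERTY_COUNT[num_properties]
    except KeyError:
        raise ValueError(f"unsupported property count: {num_properties}") from None


for count, degree in SH_DEGREE_BY_PROPERTY_COUNT.items():
    print(degree, count, property_count(degree) == count)
```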
Compressed PLY (PlayCanvas)
Chunk-based quantized format with automatic extension handling:
- File extension: Automatically saves as .compressed.ply when compressed=True
- Compression ratio: 3.4x for SH0 (3.8-14.5x depending on SH degree)
- Chunk size: 256 Gaussians per chunk
- Bit-packed data: 11-10-11 bits (position/scale), 2+10-10-10 bits (quaternion)
- Parallel decompression: 14.74ms for 400K Gaussians (27M Gaussians/sec)
- Parallel compression: 63ms for 400K Gaussians (6.3M Gaussians/sec) with radix sort
- Compatible with: PlayCanvas, SuperSplat, other WebGL viewers
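The chunked 11-10-11 bit packing listed above can be sketched in NumPy. This is an illustration only: the bit order (x in the high bits, z in the low bits) and the per-chunk min/max normalization are assumptions modeled on the PlayCanvas convention, and all function names are hypothetical. The format document referenced below is the authoritative description.

```python
import numpy as np

CHUNK = 256  # Gaussians per chunk, as listed above


def pack_11_10_11(unit_xyz: np.ndarray) -> np.ndarray:
    """Pack values in [0, 1] into one uint32 per row: 11 bits x, 10 bits y, 11 bits z."""
    x = np.rint(unit_xyz[:, 0] * 2047).astype(np.uint32)
    y = np.rint(unit_xyz[:, 1] * 1023).astype(np.uint32)
    z = np.rint(unit_xyz[:, 2] * 2047).astype(np.uint32)
    return (x << np.uint32(21)) | (y << np.uint32(11)) | z


def unpack_11_10_11(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_11_10_11, returning values in [0, 1]."""
    x = ((packed >> np.uint32(21)) & np.uint32(0x7FF)) / 2047.0
    y = ((packed >> np.uint32(11)) & np.uint32(0x3FF)) / 1023.0
    z = (packed & np.uint32(0x7FF)) / 2047.0
    return np.stack([x, y, z], axis=1)


def compress_chunk(xyz: np.ndarray):
    """Normalize one chunk to its own bounding box, then quantize and pack."""
    lo, hi = xyz.min(axis=0), xyz.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero on flat chunks
    return lo, hi, pack_11_10_11((xyz - lo) / span)


def decompress_chunk(lo, hi, packed):
    return lo + unpack_11_10_11(packed) * (hi - lo)


rng = np.random.default_rng(0)
positions = rng.uniform(-5.0, 5.0, size=(CHUNK, 3))
lo, hi, packed = compress_chunk(positions)
restored = decompress_chunk(lo, hi, packed)
max_error = np.abs(restored - positions).max(axis=0)
print(packed.dtype, packed.shape)
print(max_error)  # bounded by half a quantization step per axis
```

Storing bounds per chunk keeps the quantization step proportional to the local extent of each group of 256 Gaussians rather than to the extent of the whole scene.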
For format details, see docs/COMPRESSED_FORMAT.md.
Development
Setup
# Clone repository
git clone https://github.com/OpsiClear/gsply.git
cd gsply
# Install in development mode
pip install -e ".[dev]"
# Run tests
pytest tests/ -v
# Run with coverage
pytest tests/ -v --cov=gsply --cov-report=html
Project Structure
gsply/
├── src/
│ └── gsply/
│ ├── __init__.py # Public API
│ ├── gsdata.py # GSData dataclass
│ ├── reader.py # PLY reading (uncompressed + compressed)
│ ├── writer.py # PLY writing (uncompressed + compressed)
│ ├── formats.py # Format detection and specs
│ ├── torch/ # Optional PyTorch integration
│ │ ├── __init__.py
│ │ └── gstensor.py # GSTensor GPU dataclass
│ └── py.typed # PEP 561 type marker
├── tests/ # Unit tests (169 tests)
├── benchmarks/ # Performance benchmarks
├── docs/ # Documentation
│ ├── CHANGELOG.md # Version changelog
│ └── archive/ # Historical documentation
├── .github/ # CI/CD workflows
├── pyproject.toml # Package configuration
└── README.md # This file
Benchmarking
Compare gsply performance against other PLY libraries:
# Install benchmark dependencies
pip install -e ".[benchmark]"
# Run benchmark with default settings
python benchmarks/benchmark.py
# Custom test file and iterations
python benchmarks/benchmark.py --config.file path/to/model.ply --config.iterations 20
# Skip write benchmarks
python benchmarks/benchmark.py --config.skip-write
The benchmark measures:
- Read performance: Time to load a PLY file into NumPy arrays
- Write performance: Time to write NumPy arrays to a PLY file
- File sizes: Comparison of output file sizes
- Verification: Output equivalence between libraries
Example output:
READ PERFORMANCE (50K Gaussians, SH degree 3)
Library         Time       Speedup
gsply (fast)    2.89ms     baseline (FASTEST)
gsply (safe)    4.75ms     0.61x (1.6x slower than fast)
plyfile         18.23ms    0.16x (6.3x SLOWER)
Open3D          43.10ms    0.07x (14.9x slower)
WRITE PERFORMANCE
Library         Time       Speedup                  File Size
gsply           8.72ms     baseline (FASTEST)       11.34MB
plyfile         12.18ms    0.72x (1.4x slower)      11.34MB
Open3D          35.69ms    0.24x (4.1x slower)      1.15MB (XYZ only)
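The Speedup column in the example output is the baseline time divided by each library's time, and the "slower" factor is the inverse. A small sketch that recomputes the read rows from the timings shown; the helper name `relative_speed` is hypothetical and is not taken from benchmarks/benchmark.py.

```python
def relative_speed(baseline_ms: float, other_ms: float) -> tuple[float, float]:
    """Return (speedup relative to baseline, slowdown factor)."""
    return baseline_ms / other_ms, other_ms / baseline_ms


baseline_read_ms = 2.89  # gsply (fast) from the example output
read_times_ms = {"gsply (safe)": 4.75, "plyfile": 18.23, "Open3D": 43.10}

rows = {}
for name, time_ms in read_times_ms.items():
    speedup, slowdown = relative_speed(baseline_read_ms, time_ms)
    rows[name] = (f"{speedup:.2f}x", f"{slowdown:.1f}x")
    print(f"{name:<14}{time_ms:>8.2f}ms  {rows[name][0]} ({rows[name][1]} slower)")
```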
Testing
gsply has comprehensive test coverage with 169 passing tests:
# Run all tests (NumPy/Numba core)
pytest tests/ -v
# Run PyTorch tests (requires torch installed)
pytest tests/ -v -k "torch or gstensor"
# Run specific test file
pytest tests/test_reader.py -v
# Run with coverage report
pytest tests/ -v --cov=gsply --cov-report=html
Test categories:
- Core I/O: Format detection, reading, writing, round-trip consistency
- GSData: Dataclass operations, slicing, masking, consolidation
- Compressed format: PlayCanvas compression/decompression
- GSTensor (PyTorch): GPU transfer, slicing, device management, conversions
- Performance: Optimization verification, benchmark validation
- Error handling: Invalid files, malformed data, edge cases
Documentation
gsply includes comprehensive documentation:
- docs/CHANGELOG.md - Version changelog and release notes
- benchmarks/TRANSFER_OPTIMIZATION_ANALYSIS.md - GPU transfer optimization analysis
- benchmarks/QUICK_REFERENCE.md - Performance quick reference
- docs/archive/ - Historical documentation from development phases
CI/CD
gsply includes a complete GitHub Actions CI/CD pipeline:
- Multi-platform testing: Ubuntu, Windows, macOS
- Multi-version testing: Python 3.10, 3.11, 3.12, 3.13
- Core + PyTorch testing: Separate test jobs for NumPy/Numba core and PyTorch integration
- Automated benchmarking: Performance tracking on PRs
- Build verification: Wheel building and installation testing
- PyPI publishing: Trusted publishing on GitHub Release
- Pip caching: Fast CI runs with dependency caching
Contributing
Contributions are welcome! Please see .github/CONTRIBUTING.md for guidelines.
Quick start:
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Run tests and benchmarks
- Submit a pull request
License
MIT License - see LICENSE file for details.
Citation
If you use gsply in your research, please cite:
@software{gsply2024,
author = {OpsiClear},
title = {gsply: Ultra-Fast Gaussian Splatting PLY I/O},
year = {2024},
url = {https://github.com/OpsiClear/gsply}
}
Related Projects
- gsplat: CUDA-accelerated Gaussian Splatting rasterizer
- nerfstudio: NeRF training framework with Gaussian Splatting support
- PlayCanvas SuperSplat: Web-based Gaussian Splatting viewer
- 3D Gaussian Splatting: Original paper and implementation
Made with Python and NumPy