A Python package for U-Statistics calculation
Project description
U-Statistics-python u-stats
U-statistics are fundamental tools in statistics, probability theory, theoretical computer science, economics, statistical physics, and machine learning. Named after Wassily Hoeffding, U-statistics provide unbiased estimators for population parameters and form the foundation for many statistical tests and methods. However, computing U-statistics can be computationally demanding, especially for high-order cases where the number of combinations grows exponentially.
This package provides a high-performance, tensor-based implementation for computing U-statistics and V-statistics with significant computational advantages:
- Leverages the underlying combinatorial structure of kernel functions to significantly reduce computational complexity in many cases
- Utilizes optimized einsum engines—
numpy.einsumandtorch.einsum—to enable efficient computation on both CPU and GPU
Table of Contents
1. Installation
Install the package from PyPI:
pip install u-stats
For development installation:
git clone https://github.com/Amedar-Asterisk/U-Statistics-python.git
cd U-Statistics-python
pip install -e .
2. Requirements
Required Dependencies
- Python 3.11+
- NumPy >= 1.20.0
- opt_einsum >= 3.3.0
Optional Dependencies
- torch >= 1.9.0: GPU acceleration and parallel CPU computation
3. Example Usage
3.1 Selection of Backend
The package supports two computation backends for different performance needs:
from u_stats import set_backend, get_backend
# Set backend to NumPy (default)
set_backend("numpy") # CPU computation, deterministic results
# Set backend to PyTorch (optional)
set_backend("torch") # GPU acceleration and parallel CPU computation
# Check current backend
current_backend = get_backend()
print(f"Current backend: {current_backend}")
3.2 Computing U-statistics
ustat is the main function for computing U-statistics.
Here we take a 7th-order U-statistic with kernel function
$$ h(x_1,x_2,\dots,x_7) = h_1(x_1, x_2) h_2(x_2, x_3) \dots h_6(x_6, x_7) $$
as an example. For samples $X = (X_1, \dots, X_n)$, the U-statistic takes the form
$$ U = \frac{1}{n(n-1)\cdots (n-6)} \sum_{(i_1,\dots,i_7) \in P_7}h_1(X_{i_1},X_{i_2})\cdots h_6(X_{i_6},X_{i_7}) $$
where $P_7$ denotes all 7-tuples of distinct indices.
3.2.1 Tensor Assembly
We assume that the kernel function values on samples $X$ have been precomputed and assembled into matrices $H_1, H_2, \dots, H_6 \in \mathbb{R}^{n \times n}$:
$$ H_k[i,j] = h_k(X_i, X_j) $$
3.2.2 Expression Formats
The expression defines how kernel matrices are connected in the computation. We take this U-statistic as an example to explain how to construct expression.
To express the structure of the kernel function $h(x_1, x_2, \dots, x_7) = h_1(x_1, x_2) \cdot h_2(x_2, x_3) \cdots h_{6}(x_{6}, x_7)$, we assign a unique index to each distinct variable $x_1, x_2, \dots, x_7$. For each factor $h_k(x_{k}, x_{k+1})$, we collect the indices of the variables it depends on into a pair. The sequence of pairs is then ordered according to the order of the factors in the product.
We can use the following notation to represent this structure:
Einstein Summation Notation:
expression = "ab,bc,cd,de,ef,fg->"
Nested List Notation:
expression = [[1,2],[2,3],[3,4],[4,5],[5,6],[6,7]]
Format Explanation:
- In Einstein notation: each letter represents an index, each string like
"ab"represents a factor of the kernel - In list notation: each sub-list
[i,j]represents a factor of the kernel Both formats are equivalent and specify the same computation pattern in our package
3.2.3 Complete Example
from u_stats import ustat, set_backend
import numpy as np
# Choose computation backend
set_backend("torch") # Use torch if available
# The default is numpy
# Set number of samples
n = 100
# Create precomputed kernel matrices
# In practice, these would be computed from your actual data
H1 = np.random.rand(n, n)
H2 = np.random.rand(n, n)
H3 = np.random.rand(n, n)
H4 = np.random.rand(n, n)
H5 = np.random.rand(n, n)
H6 = np.random.rand(n, n)
tensors = [H1, H2, H3, H4, H5, H6]
expression = "ab,bc,cd,de,ef,fg->"
# Compute the U-statistic
result = ustat(tensors=tensors, expression=expression, average=True)
print(f"7th-order U-statistic: {result}")
# You can also compute the unaveraged sum
sum_result = ustat(tensors=tensors, expression=expression, average=False)
print(f"Sum (before averaging): {sum_result}")
4. API Reference
4.1 Main Functions
ustat(tensors, expression, average=True, optimize="greedy", **kwargs)
Compute U-statistics from input tensors.
Parameters:
tensors(List[np.ndarray]): List of input tensors (numpy arrays or torch tensors)expression(str | List | Tuple): Tensor contraction expression- String format: Einstein summation notation (e.g., "ij,jk->ik")
- List format: Nested indices (e.g., [[1,2],[2,3]])
average(bool, default=True): Whether to compute average (True) or sum (False)optimize(str, default="greedy"): Optimization strategy for tensor contraction- "greedy": Fast heuristic optimization
- "optimal": Exhaustive search for optimal contraction order
- "dp": Dynamic programming approach
**kwargs: Additional keyword arguments passed toopt_einsum.contract
Returns:
float: Computed U-statistic value
Example:
result = ustat([H1, H2], "ij,jk->", average=True, optimize="optimal")
vstat(tensors, expression, average=True, optimize="greedy", **kwargs)
Compute V-statistics from input tensors.
Parameters: Same as ustat
Returns: Computed V-statistic value
u_stats_loop(tensors, expression)
Reference implementation using explicit loops (for validation and small computations).
Note: This class is primarily for testing and educational purposes. Use ustats for production code.
set_backend(backend_name)
Set the tensor computation backend.
Parameters:
backend_name(str): Backend identifier"numpy": Use NumPy backend"torch": Use PyTorch backend
Example:
set_backend("torch") # Switch to PyTorch backend
get_backend()
Get the current tensor computation backend.
Returns:
str: Current backend name ("numpy" or "torch")
4.2 Classes
UStats(expression)
Class-based interface for U-statistics computation with advanced features.
Parameters:
expression: Tensor contraction expression (same format as function interface)
Methods:
compute(tensors, average=True, **kwargs): Compute U-statisticcomplexity_analysis(): Analyze computational complexityget_contraction_path(): Get optimized contraction path
Example:
ustats_obj = UStats("ij,jk->")
result = ustats_obj.compute([H1, H2], average=True)
complexity = ustats_obj.complexity_analysis()
VStats(expression)
Class-based interface for V-statistics computation with advanced features.
Parameters and Methods: Same as UStats
5. Changelog
See CHANGELOG.md for detailed version history and changes.
6. License
This project is licensed under the MIT License – see the LICENSE file for details.
7. Contributing
We welcome contributions! Here's how you can help:
Reporting Issues
- Use the GitHub issue tracker
- Include minimal reproducible examples
- Specify your environment (Python version, OS, backend)
Development Setup
# Clone the repository
git clone https://github.com/Amedar-Asterisk/U-Statistics-python.git
cd U-Statistics-python
# Install in development mode
pip install -e ".[test]"
Pull Requests
- Fork the repository and create a feature branch
- Add tests for new functionality
- Ensure all tests pass and type checking succeeds
- Update documentation as needed
- Follow the existing code style
For questions or discussions, feel free to open an issue or reach out to the maintainers.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file u_stats-0.7.6.tar.gz.
File metadata
- Download URL: u_stats-0.7.6.tar.gz
- Upload date:
- Size: 35.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f10deabb81f1c47072a2a37c6f5f919c4b7b0331955f40013a5f6142007e168e
|
|
| MD5 |
4922d0a53983b7bcd8fd31671875100f
|
|
| BLAKE2b-256 |
1dfcd90a316598de8778367a89b4be20f67f04334f6b20d9b63acf4b3d8ef359
|
File details
Details for the file u_stats-0.7.6-py3-none-any.whl.
File metadata
- Download URL: u_stats-0.7.6-py3-none-any.whl
- Upload date:
- Size: 36.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
11d31033acfbeecd5eb1e13966575524a04b6f3d5ea7cbedefe4bc599de0ce15
|
|
| MD5 |
649fd8e8ff5f6fdb5fbb6da139d62274
|
|
| BLAKE2b-256 |
13ff2d437f49b7a5a124fe8fb205dbeeed6cb888e58eead6321dc0f68a8d7428
|