Skip to main content

A Python package for U-Statistics calculation

Project description

U-Statistics-python u-stats

PyPI version Python 3.11+ License: MIT Style Status

U-statistics are fundamental tools in statistics, probability theory, theoretical computer science, economics, statistical physics, and machine learning. Named after Wassily Hoeffding, U-statistics provide unbiased estimators for population parameters and form the foundation for many statistical tests and methods. However, computing U-statistics can be computationally demanding, especially for high-order cases where the number of combinations grows exponentially.

This package provides a high-performance, tensor-based implementation for computing U-statistics and V-statistics with significant computational advantages:

  • Leverages the underlying combinatorial structure of kernel functions to significantly reduce computational complexity in many cases
  • Utilizes optimized einsum engines—numpy.einsum and torch.einsum—to enable efficient computation on both CPU and GPU

Table of Contents

  1. Installation
  2. Requirements
  3. Example Usage
  4. API Reference
  5. Changelog
  6. License
  7. Contributing

1. Installation

Install the package from PyPI:

pip install u-stats

For development installation:

git clone https://github.com/Amedar-Asterisk/U-Statistics-python.git
cd U-Statistics-python
pip install -e .

2. Requirements

Required Dependencies

  • Python 3.11+
  • NumPy >= 1.20.0
  • opt_einsum >= 3.3.0

Optional Dependencies

  • torch >= 1.9.0: GPU acceleration and parallel CPU computation

3. Example Usage

3.1 Selection of Backend

The package supports two computation backends for different performance needs:

from u_stats import set_backend, get_backend

# Set backend to NumPy (default)
set_backend("numpy")  # CPU computation, deterministic results

# Set backend to PyTorch (optional)
set_backend("torch")  # GPU acceleration and parallel CPU computation

# Check current backend
current_backend = get_backend()
print(f"Current backend: {current_backend}")

3.2 Computing U-statistics

ustat is the main function for computing U-statistics.

Here we take a 7th-order U-statistic with kernel function

$$ h(x_1,x_2,\dots,x_7) = h_1(x_1, x_2) h_2(x_2, x_3) \dots h_6(x_6, x_7) $$

as an example. For samples $X = (X_1, \dots, X_n)$, the U-statistic takes the form

$$ U = \frac{1}{n(n-1)\cdots (n-6)} \sum_{(i_1,\dots,i_7) \in P_7}h_1(X_{i_1},X_{i_2})\cdots h_6(X_{i_6},X_{i_7}) $$

where $P_7$ denotes all 7-tuples of distinct indices.

3.2.1 Tensor Assembly

We assume that the kernel function values on samples $X$ have been precomputed and assembled into matrices $H_1, H_2, \dots, H_6 \in \mathbb{R}^{n \times n}$:

$$ H_k[i,j] = h_k(X_i, X_j) $$

3.2.2 Expression Formats

The expression defines how kernel matrices are connected in the computation. We take this U-statistic as an example to explain how to construct expression.

To express the structure of the kernel function $h(x_1, x_2, \dots, x_7) = h_1(x_1, x_2) \cdot h_2(x_2, x_3) \cdots h_{6}(x_{6}, x_7)$, we assign a unique index to each distinct variable $x_1, x_2, \dots, x_7$. For each factor $h_k(x_{k}, x_{k+1})$, we collect the indices of the variables it depends on into a pair. The sequence of pairs is then ordered according to the order of the factors in the product.

We can use the following notation to represent this structure:

Einstein Summation Notation:

expression = "ab,bc,cd,de,ef,fg->"

Nested List Notation:

expression = [[1,2],[2,3],[3,4],[4,5],[5,6],[6,7]]

Format Explanation:

  • In Einstein notation: each letter represents an index, each string like "ab" represents a factor of the kernel
  • In list notation: each sub-list [i,j] represents a factor of the kernel Both formats are equivalent and specify the same computation pattern in our package

3.2.3 Complete Example

from u_stats import ustat, set_backend
import numpy as np

# Choose computation backend
set_backend("torch")  # Use torch if available
# The default is numpy


# Set number of samples
n = 100

# Create precomputed kernel matrices
# In practice, these would be computed from your actual data
H1 = np.random.rand(n, n)
H2 = np.random.rand(n, n)
H3 = np.random.rand(n, n)
H4 = np.random.rand(n, n)
H5 = np.random.rand(n, n)
H6 = np.random.rand(n, n)

tensors = [H1, H2, H3, H4, H5, H6]
expression = "ab,bc,cd,de,ef,fg->"

# Compute the U-statistic
result = ustat(tensors=tensors, expression=expression, average=True)
print(f"7th-order U-statistic: {result}")

# You can also compute the unaveraged sum
sum_result = ustat(tensors=tensors, expression=expression, average=False)
print(f"Sum (before averaging): {sum_result}")

4. API Reference

4.1 Main Functions

ustat(tensors, expression, average=True, optimize="greedy", **kwargs)

Compute U-statistics from input tensors.

Parameters:

  • tensors (List[np.ndarray]): List of input tensors (numpy arrays or torch tensors)
  • expression (str | List | Tuple): Tensor contraction expression
    • String format: Einstein summation notation (e.g., "ij,jk->ik")
    • List format: Nested indices (e.g., [[1,2],[2,3]])
  • average (bool, default=True): Whether to compute average (True) or sum (False)
  • optimize (str, default="greedy"): Optimization strategy for tensor contraction
    • "greedy": Fast heuristic optimization
    • "optimal": Exhaustive search for optimal contraction order
    • "dp": Dynamic programming approach
  • **kwargs: Additional keyword arguments passed to opt_einsum.contract

Returns:

  • float: Computed U-statistic value

Example:

result = ustat([H1, H2], "ij,jk->", average=True, optimize="optimal")

vstat(tensors, expression, average=True, optimize="greedy", **kwargs)

Compute V-statistics from input tensors.

Parameters: Same as ustat

Returns: Computed V-statistic value

u_stats_loop(tensors, expression)

Reference implementation using explicit loops (for validation and small computations).

Note: This class is primarily for testing and educational purposes. Use ustats for production code.

set_backend(backend_name)

Set the tensor computation backend.

Parameters:

  • backend_name (str): Backend identifier
    • "numpy": Use NumPy backend
    • "torch": Use PyTorch backend

Example:

set_backend("torch")  # Switch to PyTorch backend

get_backend()

Get the current tensor computation backend.

Returns:

  • str: Current backend name ("numpy" or "torch")

4.2 Classes

UStats(expression)

Class-based interface for U-statistics computation with advanced features.

Parameters:

  • expression: Tensor contraction expression (same format as function interface)

Methods:

  • compute(tensors, average=True, **kwargs): Compute U-statistic
  • complexity_analysis(): Analyze computational complexity
  • get_contraction_path(): Get optimized contraction path

Example:

ustats_obj = UStats("ij,jk->")
result = ustats_obj.compute([H1, H2], average=True)
complexity = ustats_obj.complexity_analysis()

VStats(expression)

Class-based interface for V-statistics computation with advanced features.

Parameters and Methods: Same as UStats

5. Changelog

See CHANGELOG.md for detailed version history and changes.

6. License

This project is licensed under the MIT License – see the LICENSE file for details.

7. Contributing

We welcome contributions! Here's how you can help:

Reporting Issues

  • Use the GitHub issue tracker
  • Include minimal reproducible examples
  • Specify your environment (Python version, OS, backend)

Development Setup

# Clone the repository
git clone https://github.com/Amedar-Asterisk/U-Statistics-python.git
cd U-Statistics-python

# Install in development mode
pip install -e ".[test]"

Pull Requests

  • Fork the repository and create a feature branch
  • Add tests for new functionality
  • Ensure all tests pass and type checking succeeds
  • Update documentation as needed
  • Follow the existing code style

For questions or discussions, feel free to open an issue or reach out to the maintainers.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

u_stats-0.7.6.tar.gz (35.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

u_stats-0.7.6-py3-none-any.whl (36.5 kB view details)

Uploaded Python 3

File details

Details for the file u_stats-0.7.6.tar.gz.

File metadata

  • Download URL: u_stats-0.7.6.tar.gz
  • Upload date:
  • Size: 35.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for u_stats-0.7.6.tar.gz
Algorithm Hash digest
SHA256 f10deabb81f1c47072a2a37c6f5f919c4b7b0331955f40013a5f6142007e168e
MD5 4922d0a53983b7bcd8fd31671875100f
BLAKE2b-256 1dfcd90a316598de8778367a89b4be20f67f04334f6b20d9b63acf4b3d8ef359

See more details on using hashes here.

File details

Details for the file u_stats-0.7.6-py3-none-any.whl.

File metadata

  • Download URL: u_stats-0.7.6-py3-none-any.whl
  • Upload date:
  • Size: 36.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for u_stats-0.7.6-py3-none-any.whl
Algorithm Hash digest
SHA256 11d31033acfbeecd5eb1e13966575524a04b6f3d5ea7cbedefe4bc599de0ce15
MD5 649fd8e8ff5f6fdb5fbb6da139d62274
BLAKE2b-256 13ff2d437f49b7a5a124fe8fb205dbeeed6cb888e58eead6321dc0f68a8d7428

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page