
A Python package for estimating tail parameters of heavy-tailed distributions, which is useful for analyzing power-law behavior in complex networks.


tailestim

GitHub | PyPI | conda-forge | Documentation


A Python package for estimating tail parameters of heavy-tailed distributions, including the power-law exponent. The package is in alpha and under active development, so updates may introduce breaking changes. For changelogs, see the releases page.

Note: The original estimation implementations come from ivanvoitalov/tail-estimation, which is based on the paper "Scale-free networks well done" (Voitalov et al. 2019). tailestim is a wrapper package that provides a more convenient, modern interface with logging, and is installable through pip and conda.

Features

  • Multiple estimation methods including Hill, Moments, Kernel, Pickands, and Smooth Hill estimators
  • Double-bootstrap procedure for optimal threshold selection
  • Built-in example datasets

Installation

The package can be installed from PyPI and conda-forge.

pip install tailestim
conda install conda-forge::tailestim

Quick Start

Using Built-in Datasets

from tailestim import TailData
from tailestim import HillEstimator, KernelTypeEstimator, MomentsEstimator

# Load a sample dataset
data = TailData(name='CAIDA_KONECT').data

# Initialize and fit the Hill estimator
estimator = HillEstimator()
estimator.fit(data)

# Get the estimated results
result = estimator.get_result()

# Get the power law exponent
gamma = result.gamma_

# Print full results
print(result)

Using a degree sequence from a NetworkX graph

import networkx as nx
from tailestim import HillEstimator, KernelTypeEstimator, MomentsEstimator

# Create or load your network
G = nx.barabasi_albert_graph(10000, 2)
degree = list(dict(G.degree()).values()) # Degree sequence

# Initialize and fit the Hill estimator
estimator = HillEstimator()
estimator.fit(degree)

# Get the estimated results
result = estimator.get_result()

# Get the power law exponent
gamma = result.gamma_

# Print full results
print(result)

Available Estimators

The package provides several estimators of the tail index. For details on the parameters each estimator accepts, refer to the original repository ivanvoitalov/tail-estimation, the original paper, or the source code.

  1. Hill Estimator (HillEstimator)
    • Classical Hill estimator with double-bootstrap for optimal threshold selection
    • Generally recommended for power law analysis
  2. Moments Estimator (MomentsEstimator)
    • Moments-based estimation with double-bootstrap
    • More robust to certain types of deviations from pure power law
  3. Kernel-type Estimator (KernelTypeEstimator)
    • Kernel-based estimation with double-bootstrap and bandwidth selection
  4. Pickands Estimator (PickandsEstimator)
    • Pickands-based estimation (no bootstrap)
    • Provides arrays of estimates across different thresholds
  5. Smooth Hill Estimator (SmoothHillEstimator)
    • Smoothed version of the Hill estimator (no bootstrap)
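
To make the idea behind the first estimator concrete, here is a minimal standard-library sketch of the classical Hill estimator at a fixed number of upper order statistics k. It is not tailestim's implementation: the function name `hill_xi` and the choice of k are hypothetical, and the double-bootstrap step that selects the optimal k is omitted.

```python
import math
import random


def hill_xi(data, k):
    """Hill estimate of the tail index xi from the k largest observations."""
    x = sorted(v for v in data if v > 0)  # ascending order, positive values only
    n = len(x)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of positive samples")
    log_threshold = math.log(x[n - k - 1])  # log of the (k+1)-th largest value
    top_logs = (math.log(v) for v in x[n - k:])  # logs of the k largest values
    return sum(top_logs) / k - log_threshold


# Synthetic Pareto sample with shape alpha = 1.5, so xi = 1/alpha and gamma = 2.5.
rng = random.Random(0)
sample = [rng.paretovariate(1.5) for _ in range(20000)]

xi = hill_xi(sample, k=2000)  # k is fixed by hand here (illustrative choice)
gamma = 1 + 1 / xi
print(f"xi = {xi:.3f}, gamma = {gamma:.3f}")  # should be close to 2/3 and 2.5
```

On real data the estimate can vary strongly with k, which is why the bootstrap-based estimators above select it automatically.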

Results

Calling estimator.get_result() returns the full result as a TailEstimatorResult object. Its attributes include:

  • gamma_: Power law exponent (γ = 1 + 1/ξ)
  • xi_star_: Tail index (ξ)
  • k_star_: Optimal order statistic
  • Bootstrap results (when applicable):
    • First and second bootstrap AMSE values
    • Optimal bandwidths or minimum AMSE fractions
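
The relation γ = 1 + 1/ξ between the two reported quantities can be checked by hand. The helpers below are a small illustrative sketch; the names `gamma_from_xi` and `xi_from_gamma` are hypothetical and not part of tailestim.

```python
def gamma_from_xi(xi):
    """Power-law exponent gamma = 1 + 1/xi for a positive tail index xi."""
    if xi <= 0:
        raise ValueError("xi must be positive for a power-law tail")
    return 1.0 + 1.0 / xi


def xi_from_gamma(gamma):
    """Inverse relation: xi = 1 / (gamma - 1), valid for gamma > 1."""
    if gamma <= 1:
        raise ValueError("gamma must exceed 1")
    return 1.0 / (gamma - 1.0)


gamma = gamma_from_xi(0.5942)
print(round(gamma, 3))  # 2.683
```

Because printed values of ξ are rounded, a γ recomputed from them may differ from the reported γ in the last digit.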

Example Output

Calling print(result) after fitting produces output similar to the following.

--------------------------------------------------
Result
--------------------------------------------------
Order statistics: Array of shape (200,) [1.0000, 1.0000, 1.0000, ...]
Tail index estimates: Array of shape (200,) [1614487461647431761920.0000, 1249994621547387551744.0000, 967791073562264862720.0000, ...]
Optimal order statistic (k*): 25153
Tail index (ξ): 0.5942
Power law exponent (γ): 2.6828
Bootstrap Results: 
  First Bootstrap: 
    Fraction of order statistics: None
    AMSE values: None
    H Min: 0.9059
    Maximum index: None
  Second Bootstrap: 
    Fraction of order statistics: None
    AMSE values: None
    H Min: 0.9090
    Maximum index: None

Built-in Datasets

The package includes several example datasets:

  • CAIDA_KONECT
  • Libimseti_in_KONECT
  • Pareto (follows a power law with γ = 2.5)

Load any example dataset using:

from tailestim import TailData
data = TailData(name='dataset_name').data

Testing

The package includes unit tests and validation tests that check correctness and numerical accuracy.

Running Tests

Run the test suite using pytest:

pytest tests/

For verbose output:

pytest tests/ -v

Test Structure

Unit Tests

Located in tests/test_*.py, these tests verify:

  • Individual estimator functionality (Hill, Moments, Kernel, Pickands)
  • Noise generation and random seed handling
  • Edge cases and error handling
  • Result data structures and attributes

Validation Tests

tests/test_tailestimation_validation.py provides cross-package validation against the original tail-estimation implementation:

  • Validates numerical equivalence for each estimator (Hill, Moments, Kernel, Pickands)
  • Comprehensive multi-dataset validation across all estimators
  • Reproducibility tests with various random seeds
  • Plot data comparison (PDF, CCDF, bootstrap AMSE)

The validation tests ensure that tailestim produces identical results to the original implementation when using the same base_seed parameter.
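
The general principle behind seed-based reproducibility can be illustrated with the standard library alone. This is a generic sketch, not tailestim's code: `bootstrap_means` and its `seed` argument are hypothetical stand-ins for any resampling routine driven by a seeded generator.

```python
import random


def bootstrap_means(data, n_resamples, seed):
    """Means of bootstrap resamples drawn with a dedicated, seeded generator."""
    rng = random.Random(seed)  # local generator, independent of global state
    n = len(data)
    return [sum(rng.choices(data, k=n)) / n for _ in range(n_resamples)]


data = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

run_a = bootstrap_means(data, n_resamples=5, seed=42)
run_b = bootstrap_means(data, n_resamples=5, seed=42)

# Same seed means the same resamples are drawn, so the results match exactly.
print(run_a == run_b)  # True
```

Using a local generator rather than the module-level one keeps the result independent of any other code that draws random numbers.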

Example datasets tested:

  • CAIDA_KONECT (26,475 samples)
  • Libimseti_in_KONECT (168,791 samples)
  • Pareto distributions (synthetic, various sizes)
  • Complete graphs (synthetic): both implementations raise an error, as intended

Run validation tests:

pytest tests/test_tailestimation_validation.py -v

Run quick validation (smaller datasets):

pytest tests/test_tailestimation_validation.py -k "quick" -v

Interactive Validation

The examples/validation.ipynb notebook provides an interactive demonstration of the validation process with visualizations comparing tailestim and tail-estimation outputs side-by-side.

References

Citations

If you use tailestim in your research or projects, I would greatly appreciate it if you cited this package, the original implementation, and the original paper (Voitalov et al. 2019).

@article{voitalov2019scalefree,
  title = {Scale-free networks well done},
  author = {Voitalov, Ivan and van der Hoorn, Pim and van der Hofstad, Remco and Krioukov, Dmitri},
  journal = {Phys. Rev. Res.},
  volume = {1},
  issue = {3},
  pages = {033034},
  numpages = {30},
  year = {2019},
  month = {Oct},
  publisher = {American Physical Society},
  doi = {10.1103/PhysRevResearch.1.033034},
  url = {https://link.aps.org/doi/10.1103/PhysRevResearch.1.033034}
}

@software{voitalov2018tailestimation,
  author       = {Voitalov, Ivan},
  title        = {tail-estimation},
  month        = mar,
  year         = 2018,
  publisher    = {GitHub},
  url          = {https://github.com/ivanvoitalov/tail-estimation}
}

@software{ueda2025tailestim,
  author       = {Ueda, Minami},
  title        = {tailestim: A Python package for estimating tail parameters of heavy-tailed distributions},
  month        = mar,
  year         = 2025,
  publisher    = {GitHub},
  url          = {https://github.com/mu373/tailestim}
}

License

tailestim is distributed under the terms of the MIT license.
