Hierarchical partitioning and variation partitioning for canonical analyses in Python.

rdacca_hp

Python implementation of hierarchical partitioning and variation partitioning for canonical analyses, inspired by the R package rdacca.hp.

rdacca_hp provides hierarchical partitioning and variation partitioning for:

  • RDA (Redundancy Analysis)
  • CCA (Canonical Correspondence Analysis)
  • dbRDA (distance-based Redundancy Analysis)

It is designed for users who want a Python workflow similar to rdacca.hp, while supporting mixed predictor types such as:

  • numeric variables
  • unordered categorical variables
  • ordered categorical variables
  • grouped predictor sets

The package also provides:

  • permutation-based significance testing
  • plotting utilities for hierarchical partitioning and variation partitioning results

Features

  • Hierarchical partitioning (hier_part)
  • Variation partitioning (var_part)
  • Support for:
    • numeric predictors
    • unordered factors
    • ordered factors
    • grouped predictors
  • Permutation testing with permu_hp()
  • Plotting utilities for single results and result comparison
  • Baseline validation against R outputs for key RDA use cases

Current status

This package is in an early public release. At this stage:

  • the RDA workflow has been checked carefully against the R package rdacca.hp
  • mixed predictor inputs (numeric + unordered factor + ordered factor) are supported
  • permutation testing is available
  • baseline tests against R outputs are included for selected cases

Notes:

  • results for RDA are expected to closely match the R implementation in validated scenarios
  • CCA and dbRDA are implemented and tested; broader benchmarks are planned for future releases
  • permutation p-values may show small Monte Carlo differences relative to R because random permutation sequences differ across platforms

Installation

Install from local source

pip install .

Install in editable mode for development

pip install -e ".[dev]"

Install optional plotting dependencies

pip install -e ".[plot]"

Install from a published package

pip install rdacca_hp

Public API

The main public functions can be imported directly from the package top level:

from rdacca_hp import rdacca_hp, permu_hp, plot_rdaccahp, plot_comparison

Main public objects include:

  • rdacca_hp
  • RdaccaHpResult
  • calculate_rda
  • calculate_cca
  • calculate_dbrda
  • create_test_data
  • create_cca_test_data
  • create_distance_test_data
  • permu_hp
  • plot_rdaccahp
  • plot_comparison

Quick start

1. Numeric predictors only

from rdacca_hp import create_test_data, rdacca_hp

dv, iv = create_test_data()

result = rdacca_hp(
    dv=dv,
    iv=iv,
    method="RDA",
    type="adjR2",
    scale=False,
    var_part=True
)

print(result.total_explained_variation)
print(result.hier_part)
print(result.var_part)

2. Mixed predictors: numeric + unordered factor + ordered factor

If your predictors contain mixed types, you can explicitly specify factor handling.

import pandas as pd
from rdacca_hp import rdacca_hp

dv = pd.DataFrame({
    "sp1": [2, 3, 5, 4, 6, 7],
    "sp2": [1, 2, 2, 3, 4, 5],
})

iv = pd.DataFrame({
    "WatrCont": [10.1, 9.8, 8.7, 7.5, 6.9, 6.1],
    "Substrate": ["A", "B", "A", "C", "B", "A"],
    "Shrub": ["None", "Few", "Few", "Many", "Many", "Few"],
})

result = rdacca_hp(
    dv=dv,
    iv=iv,
    method="RDA",
    type="adjR2",
    scale=False,
    var_part=True,
    categorical_factors=["Substrate"],
    ordered_factors={"Shrub": ["None", "Few", "Many"]}
)

print(result.hier_part)
print(result.var_part)

3. Grouped predictors

import pandas as pd
from rdacca_hp import create_test_data, rdacca_hp

dv, iv = create_test_data(n_predictors=4)

groups = {
    "Climate": pd.DataFrame(iv[:, :2], columns=["Temp", "Rain"]),
    "Soil": pd.DataFrame(iv[:, 2:], columns=["N", "C"]),
}

result = rdacca_hp(
    dv=dv,
    iv=groups,
    method="RDA",
    type="R2",
    var_part=True
)

print(result.hier_part)
print(result.var_part)

4. Permutation test

from rdacca_hp import create_test_data, permu_hp

dv, iv = create_test_data()

perm_result = permu_hp(
    dv=dv,
    iv=iv,
    method="RDA",
    type="adjR2",
    permutations=99,
    scale=False,
    random_state=123,
    verbose=False
)

print(perm_result)

5. Plotting

from rdacca_hp import create_test_data, rdacca_hp, plot_rdaccahp

dv, iv = create_test_data()
result = rdacca_hp(dv=dv, iv=iv, method="RDA", type="R2", var_part=True)

fig = plot_rdaccahp(result, plot_type="bar")

You can also use the convenience method on the result object:

fig = result.plot(plot_type="bar")

Main functions

rdacca_hp()

Main function for hierarchical partitioning and variation partitioning.

permu_hp()

Permutation test for hierarchical partitioning results.

plot_rdaccahp()

Plot a single hierarchical partitioning result.

plot_comparison()

Compare multiple hierarchical partitioning results in one figure.


Input conventions

Response matrix (dv)

dv can be:

  • a NumPy array
  • a pandas DataFrame

For RDA, users often apply Hellinger transformation before analysis when working with community data.
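
As a rough illustration of that preprocessing step, the sketch below applies the Hellinger transformation (the square root of each row's relative abundances) using only the standard library. The function name `hellinger` and the toy abundance values are assumptions made for this example; they are not part of rdacca_hp.

```python
import math


def hellinger(rows):
    """Hellinger-transform a site-by-species abundance table.

    Each value becomes sqrt(x_ij / row_total_i); all-zero rows stay zero.
    NOTE: hypothetical helper for illustration, not part of rdacca_hp.
    """
    transformed = []
    for row in rows:
        total = sum(row)
        if total == 0:
            # Avoid dividing by zero for empty sites
            transformed.append([0.0 for _ in row])
        else:
            transformed.append([math.sqrt(x / total) for x in row])
    return transformed


# Toy abundances (made up): 3 sites x 2 species
abundances = [[2, 2], [9, 16], [0, 5]]
dv_hel = hellinger(abundances)

for row in dv_hel:
    print([round(v, 3) for v in row])
```

After the transformation, the squared values in each non-empty row sum to 1. The result could then be converted to a NumPy array or pandas DataFrame before being passed as dv.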

For dbRDA, dv should be a square symmetric distance matrix.
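
A quick sanity check before running dbRDA might look like the sketch below. The helper `check_distance_matrix` is hypothetical and not part of rdacca_hp. Besides the square and symmetric requirements stated above, it also checks for a zero diagonal and non-negative entries, which are added here as assumptions about what a typical distance matrix looks like.

```python
def check_distance_matrix(d, tol=1e-9):
    """Return True if d is square, symmetric, non-negative, with a zero diagonal.

    NOTE: hypothetical helper for illustration, not part of rdacca_hp.
    """
    n = len(d)
    # Square: every row must have as many entries as there are rows
    if any(len(row) != n for row in d):
        return False
    for i in range(n):
        # Zero diagonal: the distance from a site to itself
        if abs(d[i][i]) > tol:
            return False
        for j in range(i + 1, n):
            # Non-negative and symmetric off-diagonal entries
            if d[i][j] < 0 or abs(d[i][j] - d[j][i]) > tol:
                return False
    return True


good = [[0.0, 0.4, 0.7], [0.4, 0.0, 0.5], [0.7, 0.5, 0.0]]
bad = [[0.0, 0.4], [0.3, 0.0]]  # not symmetric

print(check_distance_matrix(good))
print(check_distance_matrix(bad))
```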

Predictor matrix (iv)

iv can be:

  • a NumPy array
  • a pandas DataFrame
  • a dict mapping group names to predictor tables
  • a list of predictor tables, one per group

Supported predictor types include:

  • continuous numeric columns
  • unordered categorical columns
  • ordered categorical columns

Predictor handling

rdacca_hp supports several predictor formats.

1. Numeric matrix or array

If iv is given as a numeric array or numeric matrix, all predictors are treated as numeric variables.

result = rdacca_hp(dv=dv, iv=iv_numeric)

2. pandas DataFrame with mixed predictor types

If iv is given as a pandas DataFrame, the package can handle mixed predictor types, including:

  • continuous numeric variables
  • unordered categorical factors
  • ordered factors

Numeric columns are handled directly as numeric predictors.

For non-numeric predictors, users can explicitly specify variable types when needed:

  • use categorical_factors=[...] for unordered categorical variables
  • use ordered_factors={...} for ordered variables with a declared level order

result = rdacca_hp(
    dv=dv,
    iv=iv_df,
    categorical_factors=["Substrate"],
    ordered_factors={"Shrub": ["None", "Few", "Many"]},
)

For mixed-type DataFrames, explicit specification is recommended, especially when:

  • the dataset contains string-based predictors
  • factor level order matters
  • reproducible encoding behavior is important

In practice:

  • numeric variables are supported directly
  • unordered factors should be declared with categorical_factors
  • ordered factors should be declared with ordered_factors

This makes the package easy to use for standard numeric analyses, while still allowing precise control over how mixed predictor data are encoded.

3. Grouped predictors as a dictionary

result = rdacca_hp(dv=dv, iv={"Climate": climate_df, "Soil": soil_df})

4. Grouped predictors as a list

result = rdacca_hp(dv=dv, iv=[group1_df, group2_df, group3_df])

Returned object

rdacca_hp() returns an RdaccaHpResult object containing at least:

  • method_type
  • total_explained_variation
  • hier_part

and optionally:

  • var_part

It also provides:

  • summary()
  • plot()

Example:

result = rdacca_hp(dv=dv, iv=iv)
result.summary()
fig = result.plot(plot_type="bar")

hier_part

A table containing:

  • Unique
  • Average.share
  • Individual
  • I.perc(%)
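
In the R package rdacca.hp, whose output this table mirrors, Individual is the sum of Unique and Average.share, and I.perc(%) is Individual expressed as a percentage of the total explained variation. Assuming the Python port keeps that convention, the two-predictor case can be worked through by hand. The R2 values below are made up for illustration and do not come from any real dataset.

```python
# Made-up R2 values for predictors A and B (illustration only)
r2_a = 0.30   # model with A alone
r2_b = 0.25   # model with B alone
r2_ab = 0.45  # model with A and B together

# Unique: what a predictor adds on top of the other one
unique = {"A": r2_ab - r2_b, "B": r2_ab - r2_a}

# Shared fraction, split equally between the two predictors
common = r2_a + r2_b - r2_ab
average_share = {name: common / 2 for name in unique}

# Individual = Unique + Average.share
individual = {name: unique[name] + average_share[name] for name in unique}

# I.perc(%) = Individual as a percentage of the total
total = sum(individual.values())
i_perc = {name: 100 * individual[name] / total for name in unique}

for name in unique:
    print(
        name,
        round(unique[name], 3),
        round(average_share[name], 3),
        round(individual[name], 3),
        round(i_perc[name], 2),
    )
```

With two predictors, the individual effects add up to the R2 of the full model, which is the property that makes the percentages interpretable.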

var_part

A table containing:

  • Fractions
  • % Total

Running tests

Run all tests:

pytest -q

Run coverage:

pytest --cov=rdacca_hp --cov-report=term-missing

Run only R baseline tests:

pytest tests/test_r_baselines.py -q

R baseline validation

This project includes a benchmark workflow against R outputs.

Benchmark directories

  • benchmark/data/: fixed input data
  • benchmark/expected/: expected outputs exported from R
  • benchmark/r_scripts/: scripts used to generate expected R outputs

Current validated RDA baselines

  • rda_numeric_2vars
  • rda_unordered_factor
  • rda_mite_full_mixed
  • rda_ordered_factor_mixed

These baselines are used to check that Python results remain aligned with the corresponding R workflow for validated RDA scenarios.


Important notes

1. Small p-value differences are normal

Permutation p-values may differ slightly from R because:

  • permutation sequences differ
  • R and Python use different random number generators, so identical seeds do not yield identical permutations
  • permutation p-values are Monte Carlo estimates
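
The Monte Carlo nature of these p-values can be seen in a small stand-alone sketch. It runs a permutation test on the squared correlation between two short vectors and repeats it with different seeds. The helper names and the (count + 1) / (permutations + 1) formula are a common convention used here as an assumption; this is not a description of how permu_hp() is implemented.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def permutation_p_value(x, y, permutations=99, seed=None):
    """Share of permuted statistics >= observed, with the +1 correction."""
    rng = random.Random(seed)
    observed = r_squared(x, y)
    y_perm = list(y)
    exceed = 0
    for _ in range(permutations):
        rng.shuffle(y_perm)  # break the x-y pairing
        if r_squared(x, y_perm) >= observed:
            exceed += 1
    return (exceed + 1) / (permutations + 1)


x = [10.1, 9.8, 8.7, 7.5, 6.9, 6.1]
y = [2, 3, 5, 4, 6, 7]

# Different seeds give different permutation sequences, hence slightly different p-values
for seed in (1, 2, 3):
    print(seed, permutation_p_value(x, y, permutations=99, seed=seed))
```

With 99 permutations the smallest attainable p-value is 0.01, and increasing the number of permutations narrows the spread between runs.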

2. Ordered factors matter

Ordered factors should not be treated the same way as ordinary categorical variables. If you have ordered predictor levels, specify them explicitly.
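
To see why the declared order matters, the sketch below contrasts three views of the same Shrub column: rank codes based on the declared order, indicator (dummy) columns that imply no order, and the alphabetical order that a naive sort would produce. This is only an illustration of the idea; the encoding that rdacca_hp applies internally is not specified here and may differ.

```python
shrub = ["None", "Few", "Few", "Many", "Many", "Few"]
levels = ["None", "Few", "Many"]  # declared order, lowest to highest

# Ordered view: each level becomes its rank in the declared order
rank = {level: i for i, level in enumerate(levels)}
ordered_codes = [rank[v] for v in shrub]

# Unordered view: one indicator column per non-reference level, no ranking implied
dummies = {level: [int(v == level) for v in shrub] for level in levels[1:]}

# Alphabetical sorting would silently produce a different, meaningless order
alphabetical = sorted(levels)

print(ordered_codes)
print(dummies)
print(alphabetical)
```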

3. CSV reading and "None"

If a valid category level is literally "None", make sure it is not accidentally parsed as missing data when reading CSV files.

For example:

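import pandas as pd

# keep_default_na=False stops pandas from converting strings such as "None" or "NA" to missing values
iv = pd.read_csv("file.csv", keep_default_na=False)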

Limitations

  • RDA is currently the most thoroughly validated workflow
  • CCA and dbRDA are available, but more benchmark expansion is still desirable
  • very large permutation jobs may be slow in pure Python workflows

Recommended usage for reproducibility

For the most reproducible results:

  1. keep benchmark datasets fixed
  2. explicitly specify unordered and ordered factors when needed
  3. use baseline tests against R outputs
  4. report the package version and analysis settings
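
For the last point, one possible way to capture the package version and analysis settings alongside the results is sketched below. The helper `analysis_record` and the layout of the record are assumptions made for this example; the settings shown simply reuse the parameter values from the quick start.

```python
import json
import platform
from importlib import metadata


def analysis_record(settings, package="rdacca_hp"):
    """Bundle the installed package version with the analysis settings.

    NOTE: hypothetical helper for illustration, not part of rdacca_hp.
    """
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in this environment
        version = "not installed"
    return {
        "package": package,
        "package_version": version,
        "python_version": platform.python_version(),
        "settings": settings,
    }


record = analysis_record(
    {
        "method": "RDA",
        "type": "adjR2",
        "scale": False,
        "permutations": 99,
        "random_state": 123,
    }
)
print(json.dumps(record, indent=2))
```

Saving this record next to the output tables makes it easier to state exactly which version and settings produced a given result.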

Package structure

rdacca_hp/
├── rdacca_hp/
│   ├── __init__.py
│   ├── core.py
│   ├── utils.py
│   ├── permutation.py
│   └── plotting.py
│
├── tests/
│   ├── test_r_baselines.py
│   ├── test_assertions.py
│   ├── test_core.py
│   ├── test_cca.py
│   ├── test_dbrda.py
│   ├── test_permutation.py
│   ├── test_plotting.py
│   └── test_public_api.py
│
├── benchmark/
│   ├── data/
│   ├── expected/
│   └── r_scripts/
│
├── scripts/
│   └── test_time.py
│
├── README.md
├── pyproject.toml
└── LICENSE

Citation / inspiration

This Python project is inspired by the R package rdacca.hp and its hierarchical partitioning framework for canonical analyses.

If you use this package in academic work, you should also cite the original methodological and/or R package sources as appropriate.


License

This project is licensed under the MIT License.


Contact

Author: Jiangshan Lai

Email: lai@njfu.edu.cn

Repository: https://github.com/peony-peo/rdacca_hp
