Hierarchical partitioning and variation partitioning for canonical analyses in Python.
rdacca_hp
Python implementation of hierarchical partitioning and variation partitioning for canonical analyses, inspired by the R package rdacca.hp.
rdacca_hp provides hierarchical partitioning and variation partitioning for:
- RDA (Redundancy Analysis)
- CCA (Canonical Correspondence Analysis)
- dbRDA (distance-based Redundancy Analysis)
It is designed for users who want a Python workflow similar to rdacca.hp, while supporting mixed predictor types such as:
- numeric variables
- unordered categorical variables
- ordered categorical variables
- grouped predictor sets
The package also provides:
- permutation-based significance testing
- plotting utilities for hierarchical partitioning and variation partitioning results
Features
- Hierarchical partitioning (hier_part)
- Variation partitioning (var_part)
- Support for:
  - numeric predictors
  - unordered factors
  - ordered factors
  - grouped predictors
- Permutation testing with permu_hp()
- Plotting utilities for single results and result comparison
- Baseline validation against R outputs for key RDA use cases
Current status
This package is in an early public release stage.
At present:
- the RDA workflow has been checked carefully against the R package rdacca.hp
- mixed predictor inputs (numeric + unordered factor + ordered factor) are supported
- permutation testing is available
- baseline tests against R outputs are included for selected cases
Notes:
- results for RDA are expected to closely match the R implementation in validated scenarios
- CCA and dbRDA are implemented and tested; broader benchmarks against R are planned for future releases
- permutation p-values may show small Monte Carlo differences relative to R because random permutation sequences differ across platforms
Installation
Install from local source
pip install .
Install in editable mode for development
pip install -e ".[dev]"
Install optional plotting dependencies
pip install -e ".[plot]"
Install from a published package
pip install rdacca_hp
Public API
The main public functions can be imported directly from the package top level:
from rdacca_hp import rdacca_hp, permu_hp, plot_rdaccahp, plot_comparison
Main public objects include:
- rdacca_hp
- RdaccaHpResult
- calculate_rda
- calculate_cca
- calculate_dbrda
- create_test_data
- create_cca_test_data
- create_distance_test_data
- permu_hp
- plot_rdaccahp
- plot_comparison
Quick start
1. Numeric predictors only
from rdacca_hp import create_test_data, rdacca_hp
dv, iv = create_test_data()
result = rdacca_hp(
    dv=dv,
    iv=iv,
    method="RDA",
    type="adjR2",
    scale=False,
    var_part=True,
)
print(result.total_explained_variation)
print(result.hier_part)
print(result.var_part)
2. Mixed predictors: numeric + unordered factor + ordered factor
If your predictors contain mixed types, you can explicitly specify factor handling.
import pandas as pd
from rdacca_hp import rdacca_hp
dv = pd.DataFrame({
    "sp1": [2, 3, 5, 4, 6, 7],
    "sp2": [1, 2, 2, 3, 4, 5],
})
iv = pd.DataFrame({
    "WatrCont": [10.1, 9.8, 8.7, 7.5, 6.9, 6.1],
    "Substrate": ["A", "B", "A", "C", "B", "A"],
    "Shrub": ["None", "Few", "Few", "Many", "Many", "Few"],
})
result = rdacca_hp(
    dv=dv,
    iv=iv,
    method="RDA",
    type="adjR2",
    scale=False,
    var_part=True,
    categorical_factors=["Substrate"],
    ordered_factors={"Shrub": ["None", "Few", "Many"]},
)
print(result.hier_part)
print(result.var_part)
3. Grouped predictors
import pandas as pd
from rdacca_hp import create_test_data, rdacca_hp
dv, iv = create_test_data(n_predictors=4)
groups = {
    "Climate": pd.DataFrame(iv[:, :2], columns=["Temp", "Rain"]),
    "Soil": pd.DataFrame(iv[:, 2:], columns=["N", "C"]),
}
result = rdacca_hp(
    dv=dv,
    iv=groups,
    method="RDA",
    type="R2",
    var_part=True,
)
print(result.hier_part)
print(result.var_part)
4. Permutation test
from rdacca_hp import create_test_data, permu_hp
dv, iv = create_test_data()
perm_result = permu_hp(
    dv=dv,
    iv=iv,
    method="RDA",
    type="adjR2",
    permutations=99,
    scale=False,
    random_state=123,
    verbose=False,
)
print(perm_result)
5. Plotting
from rdacca_hp import create_test_data, rdacca_hp, plot_rdaccahp
dv, iv = create_test_data()
result = rdacca_hp(dv=dv, iv=iv, method="RDA", type="R2", var_part=True)
fig = plot_rdaccahp(result, plot_type="bar")
You can also use the convenience method on the result object:
fig = result.plot(plot_type="bar")
Main functions
rdacca_hp()
Main function for hierarchical partitioning and variation partitioning.
permu_hp()
Permutation test for hierarchical partitioning results.
plot_rdaccahp()
Plot a single hierarchical partitioning result.
plot_comparison()
Compare multiple hierarchical partitioning results in one figure.
Input conventions
Response matrix (dv)
dv can be:
- a NumPy array
- a pandas DataFrame
For RDA on community (species abundance) data, users often apply a Hellinger transformation to dv before analysis.
For dbRDA, dv should be a square symmetric distance matrix.
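As a minimal sketch of these two preparation steps, the NumPy code below applies a Hellinger transformation (square root of row-wise relative abundances) to a small made-up abundance table, then builds and checks a distance matrix. The helper names hellinger and is_valid_distance_matrix are hypothetical and are not part of rdacca_hp; the zero-diagonal check is an extra assumption beyond the "square symmetric" requirement stated above.

```python
import numpy as np


def hellinger(abundance):
    """Hellinger transformation: square root of row-wise relative abundances."""
    x = np.asarray(abundance, dtype=float)
    row_sums = x.sum(axis=1, keepdims=True)
    # Rows that sum to zero (empty sites) stay all-zero instead of becoming NaN.
    safe_sums = np.where(row_sums == 0, 1.0, row_sums)
    return np.sqrt(x / safe_sums)


def is_valid_distance_matrix(d, tol=1e-8):
    """Check that d is square, symmetric, and has a zero diagonal."""
    d = np.asarray(d, dtype=float)
    if d.ndim != 2 or d.shape[0] != d.shape[1]:
        return False
    symmetric = np.allclose(d, d.T, atol=tol)
    zero_diagonal = np.allclose(np.diag(d), 0.0, atol=tol)
    return bool(symmetric and zero_diagonal)


# Made-up abundance table: 3 sites (rows) x 3 species (columns).
counts = np.array([[4, 0, 12], [1, 9, 0], [0, 0, 0]])
dv_hel = hellinger(counts)

# Euclidean distances between the transformed rows (the Hellinger distance).
diff = dv_hel[:, None, :] - dv_hel[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))

print(is_valid_distance_matrix(dist))
```

Under these assumptions, dv_hel would be a candidate dv for method="RDA" and dist a candidate dv for dbRDA.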
Predictor matrix (iv)
iv can be:
- a NumPy array
- a pandas DataFrame
- a grouped structure such as a dict
- a grouped structure such as a list
Supported predictor types include:
- continuous numeric columns
- unordered categorical columns
- ordered categorical columns
Predictor handling
rdacca_hp supports several predictor formats.
1. Numeric matrix or array
If iv is given as a numeric array or numeric matrix, all predictors are treated as numeric variables.
result = rdacca_hp(dv=dv, iv=iv_numeric)
2. pandas DataFrame with mixed predictor types
If iv is given as a pandas DataFrame, the package can handle mixed predictor types, including:
- continuous numeric variables
- unordered categorical factors
- ordered factors
Numeric columns are handled directly as numeric predictors.
For non-numeric predictors, users can explicitly specify variable types when needed:
- use categorical_factors=[...] for unordered categorical variables
- use ordered_factors={...} for ordered variables with a declared level order
result = rdacca_hp(
    dv=dv,
    iv=iv_df,
    categorical_factors=["Substrate"],
    ordered_factors={"Shrub": ["None", "Few", "Many"]},
)
For mixed-type DataFrames, explicit specification is recommended, especially when:
- the dataset contains string-based predictors
- factor level order matters
- reproducible encoding behavior is important
In practice:
- numeric variables are supported directly
- unordered factors should be declared with categorical_factors
- ordered factors should be declared with ordered_factors
This makes the package easy to use for standard numeric analyses, while still allowing precise control over how mixed predictor data are encoded.
3. Grouped predictors as a dictionary
result = rdacca_hp(dv=dv, iv={"Climate": climate_df, "Soil": soil_df})
4. Grouped predictors as a list
result = rdacca_hp(dv=dv, iv=[group1_df, group2_df, group3_df])
Returned object
rdacca_hp() returns an RdaccaHpResult object containing at least:
- method_type
- total_explained_variation
- hier_part
and optionally:
var_part
It also provides:
- summary()
- plot()
Example:
result = rdacca_hp(dv=dv, iv=iv)
result.summary()
fig = result.plot(plot_type="bar")
hier_part
A table containing:
- Unique
- Average.share
- Individual
- I.perc(%)
var_part
A table containing:
- Fractions
- % Total
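To illustrate how the two tables relate, here is a minimal NumPy sketch for the simplest case of two predictors, using plain (unadjusted) R2 on simulated data. It assumes the columns follow the convention of the R package rdacca.hp, where Individual = Unique + Average.share and I.perc(%) is the individual share of the total. The helper r_squared and all variable names are illustrative; this is not the package's own implementation.

```python
import numpy as np


def r_squared(y, x):
    """Multivariate R2: fitted sum of squares over total sum of squares."""
    y_c = y - y.mean(axis=0)
    x_c = x - x.mean(axis=0)
    coef, *_ = np.linalg.lstsq(x_c, y_c, rcond=None)
    fitted = x_c @ coef
    return float((fitted ** 2).sum() / (y_c ** 2).sum())


# Simulated data: two correlated predictors and a two-column response.
rng = np.random.default_rng(0)
n = 30
x1 = rng.normal(size=(n, 1))
x2 = 0.6 * x1 + rng.normal(size=(n, 1))
y = np.hstack([x1 + rng.normal(size=(n, 1)), x2 + rng.normal(size=(n, 1))])

r2_1 = r_squared(y, x1)
r2_2 = r_squared(y, x2)
r2_12 = r_squared(y, np.hstack([x1, x2]))

# Variation partitioning: two unique fractions and one common fraction.
unique_1 = r2_12 - r2_2
unique_2 = r2_12 - r2_1
common = r2_1 + r2_2 - r2_12

# Hierarchical partitioning: the common fraction is split equally
# between the two predictors that share it (the "Average.share" part).
individual_1 = unique_1 + common / 2
individual_2 = unique_2 + common / 2
i_perc = 100 * np.array([individual_1, individual_2]) / r2_12

print(unique_1, unique_2, common)
print(individual_1, individual_2, i_perc)
```

The three fractions sum to the total R2, and so do the two individual contributions, which is why the I.perc(%) values add up to 100.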
Running tests
Run all tests:
pytest -q
Run coverage:
pytest --cov=rdacca_hp --cov-report=term-missing
Run only R baseline tests:
pytest tests/test_r_baselines.py -q
R baseline validation
This project includes a benchmark workflow against R outputs.
Benchmark directories
- benchmark/data/: fixed input data
- benchmark/expected/: expected outputs exported from R
- benchmark/r_scripts/: scripts used to generate expected R outputs
Current validated RDA baselines
- rda_numeric_2vars
- rda_unordered_factor
- rda_mite_full_mixed
- rda_ordered_factor_mixed
These baselines are used to check that Python results remain aligned with the corresponding R workflow for validated RDA scenarios.
Important notes
1. Small p-value differences are normal
Permutation p-values may differ slightly from R because:
- permutation sequences differ
- random seeds differ across platforms
- permutation p-values are Monte Carlo estimates
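The Monte Carlo nature of these p-values can be seen in a generic permutation test, sketched below with the standard library only. It uses the squared correlation between one predictor and one response as the statistic and the common estimator p = (hits + 1) / (permutations + 1). The functions r2_simple and permutation_p_value are hypothetical illustrations and do not reproduce how permu_hp() works internally.

```python
import random


def r2_simple(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def permutation_p_value(x, y, permutations=99, seed=None):
    """Estimate a p-value by shuffling y and recomputing the statistic."""
    rng = random.Random(seed)
    observed = r2_simple(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(y_perm)
        if r2_simple(x, y_perm) >= observed:
            hits += 1
    # The observed value counts as one of the permutations, so p is never 0.
    return (hits + 1) / (permutations + 1)


x = [10.1, 9.8, 8.7, 7.5, 6.9, 6.1]
y = [2, 3, 5, 4, 6, 7]

# Different seeds draw different permutations, so estimates may differ slightly.
p_a = permutation_p_value(x, y, permutations=99, seed=1)
p_b = permutation_p_value(x, y, permutations=99, seed=2)
print(p_a, p_b)
```

With 99 permutations the smallest attainable p-value is 0.01, and the same seed reproduces the same estimate on a given platform.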
2. Ordered factors matter
Ordered factors should not be treated the same way as ordinary categorical variables. If you have ordered predictor levels, specify them explicitly.
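The difference is easy to see with pandas alone. The sketch below is independent of rdacca_hp and only shows what information a declared level order carries; how the package encodes ordered factors internally is not described here.

```python
import pandas as pd

shrub = pd.Series(["None", "Few", "Few", "Many", "Many", "Few"], name="Shrub")

# Unordered view: levels are inferred and sorted alphabetically,
# so their position carries no ecological meaning.
unordered = pd.Categorical(shrub)

# Ordered view: the declared order None < Few < Many is preserved.
ordered = pd.Categorical(shrub, categories=["None", "Few", "Many"], ordered=True)

print(list(unordered.categories))
print(list(ordered.categories))
print(list(ordered.codes))
```

Without an explicit order, "Few" sorts before "Many" and "None" only by accident of spelling, which is why passing the level order through ordered_factors is the safer choice.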
3. CSV reading and "None"
If a valid category level is literally "None", make sure it is not accidentally parsed as missing data when reading CSV files.
For example:
import pandas as pd
pd.read_csv("file.csv", keep_default_na=False)
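One side effect of keep_default_na=False is that genuinely empty fields are no longer treated as missing either. A minimal sketch with made-up in-memory CSV text shows one way to keep "None" as a level while still marking empty fields as missing, by passing na_values explicitly:

```python
import io

import pandas as pd

# Made-up CSV: "None" is a real level, the last Shrub field is truly empty.
csv_text = "Site,Shrub\ns1,None\ns2,Few\ns3,\n"

# keep_default_na=False alone: "None" survives, but the empty field
# is read as an empty string rather than as missing.
no_default = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

# Adding na_values=[""] restores missing-value handling for empty fields only.
selective = pd.read_csv(
    io.StringIO(csv_text),
    keep_default_na=False,
    na_values=[""],
)

print(no_default["Shrub"].tolist())
print(selective["Shrub"].isna().tolist())
```

If the file uses other missing-value codes such as "NA", they would need to be added to na_values as well.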
Limitations
- RDA is currently the most thoroughly validated workflow
- CCA and dbRDA are available, but more benchmark expansion is still desirable
- very large permutation jobs may be slow in pure Python workflows
Recommended usage for reproducibility
For the most reproducible results:
- keep benchmark datasets fixed
- explicitly specify unordered and ordered factors when needed
- use baseline tests against R outputs
- report the package version and analysis settings
Package structure
rdacca_hp/
├── rdacca_hp/
│ ├── __init__.py
│ ├── core.py
│ ├── utils.py
│ ├── permutation.py
│ └── plotting.py
│
├── tests/
│ ├── test_r_baselines.py
│ ├── test_assertions.py
│ ├── test_core.py
│ ├── test_cca.py
│ ├── test_dbrda.py
│ ├── test_permutation.py
│ ├── test_plotting.py
│ └── test_public_api.py
│
├── benchmark/
│ ├── data/
│ ├── expected/
│ └── r_scripts/
│
├── scripts/
│ └── test_time.py
│
├── README.md
├── pyproject.toml
└── LICENSE
Citation / inspiration
This Python project is inspired by the R package rdacca.hp and its hierarchical partitioning framework for canonical analyses.
If you use this package in academic work, you should also cite the original methodological and/or R package sources as appropriate.
License
This project is licensed under the MIT License.
Contact
Author: Jiangshan Lai
Email: lai@njfu.edu.cn
Repository: https://github.com/peony-peo/rdacca_hp