Diagnostics for pre- and post-harmonisation

These details have not been verified by PyPI

Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3
- Python :: 3 :: Only

Project description

DiagnoseHarmonisation (DHARM)

Installation

DiagnoseHarmonisation is an in-development library for the streamlined application and assessment of harmonisation algorithms at the summary-measure level. It also serves as a centralised location for popular, well-validated harmonisation methods from the literature. Full documentation is available on the DiagnoseHarmonisation documentation site.

In an upcoming paper, we plan to demonstrate that systematic evaluation and reporting of different components of batch effects is not only beneficial for choosing an appropriate harmonisation strategy, but essential for evaluating how well harmonisation has worked.

Installation and Usage

Install by downloading directly or by running: pip install DiagnoseHarmonisation in the terminal.
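
After installing, one way to confirm that the package is visible to the current interpreter is to query its installed version. The sketch below uses only the Python standard library; the helper name installed_version is illustrative and is not part of DiagnoseHarmonisation.

```python
from importlib import metadata


def installed_version(dist_name="DiagnoseHarmonisation"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the install succeeded, otherwise None.
print(installed_version())
```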

Load different components of the module by calling

from DiagnoseHarmonisation import ModuleName

The commands can then be run using ModuleName.FunctionName()

The main commands are those in the DiagnosticReport module:

DiagnosticReport.CrossSectionalReport()
DiagnosticReport.CrossSectionalComparisonReport()
DiagnosticReport.LongitudinalReport()

CrossSectionalReport() is the single-dataset workflow. CrossSectionalComparisonReport() compares multiple harmonised outputs side by side and generates a category-based scorecard across additive, multiplicative, linear-modelling, distributional, and PCA diagnostics.

Support and Contact

If you find any issues or bugs in the code, please raise an issue or contact one of the following:

Jake Turnbull: jacob.turnbull@ndcn.ox.ac.uk
Gaurav Bhalerao: gaurav.bhalerao@ndcn.ox.ac.uk

Overview

This library is intended to support the streamlined analysis and application of harmonisation for MRI data. Consistent reporting of different components of batch differences should be carried out both pre- and post-harmonisation, both to confirm that harmonisation was needed and to verify that it was successful.

While this tool was developed for MRI data, there is no inherent reason it cannot be used in other research scenarios.

The purpose of harmonisation is to remove technical variation driven by differences in data acquisition (e.g. across sites), while preserving meaningful biological signals of interest.

Harmonisation efficacy should therefore be assessed across two broad categories:

Reduction or removal of batch effects, i.e. unwanted technical differences between datasets.
Preservation of biological signal, ensuring that meaningful variability is retained.

This library provides a set of functions to assess the severity, nature, and distribution of batch effects across features in multi-batch data. These diagnostics are intended to provide guidance on the most appropriate harmonisation strategy to apply.
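
To make the terms concrete, the following is a minimal pure-Python sketch of what an additive and a multiplicative batch effect look like in a single feature. It is not the library's implementation: the function names, site labels, and effect sizes are all made up for illustration. An additive effect shows up as a shift in the per-batch mean, a multiplicative effect as a change in the per-batch variance.

```python
import random
import statistics


def simulate_feature(n, shift=0.0, scale=1.0, seed=0):
    """One feature for one batch: standard normal signal, scaled and then shifted."""
    rng = random.Random(seed)
    return [shift + scale * rng.gauss(0.0, 1.0) for _ in range(n)]


def batch_summary(values_by_batch):
    """Per-batch (mean, variance) for a single feature."""
    return {
        name: (statistics.fmean(v), statistics.variance(v))
        for name, v in values_by_batch.items()
    }


feature = {
    "siteA": simulate_feature(500, shift=0.0, scale=1.0, seed=1),  # reference batch
    "siteB": simulate_feature(500, shift=2.0, scale=1.0, seed=2),  # additive effect
    "siteC": simulate_feature(500, shift=0.0, scale=3.0, seed=3),  # multiplicative effect
}

for name, (mean, var) in batch_summary(feature).items():
    print(f"{name}: mean={mean:.2f}, variance={var:.2f}")
```

In this toy setting siteB differs from siteA mainly in its mean and siteC mainly in its variance, which is the kind of distinction the diagnostics are meant to surface before a harmonisation strategy is chosen.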

Harmonisation is goal-specific, so its integration into experimental design should be carefully considered. Diagnostic reports can serve as a practical method for informing experimental design decisions.

DiagnosticReport.py

Main set of callable functions. These take data, batch labels, and covariates, run a statistical analysis of batch differences and covariate effects within the data, and return a structured report that assesses each component of the data.

The library currently offers three main report entry points: one for a single cross-sectional dataset, one for multi-method cross-sectional comparison, and one for longitudinal data.

CrossSectionalReport():

A single callable function that takes a dataset and its batch labels and returns a fully organised analysis as a single, easily understood HTML file.

Arguments:  
    data (np.ndarray): Data matrix (samples x features).
    batch (list or np.ndarray): Batch labels for each sample.
    
Optional arguments:
    covariates (np.ndarray, optional): Covariate matrix (samples x covariates).
    covariate_names (list of str, optional): Names of covariates.
    save_data (bool, optional): Whether to save input data and results. Default = True
    save_data_name (str, optional): Filename for saved data. Default = Report name
    save_dir (str or os.PathLike, optional): Directory to save report and data. Default = current working directory
    report_name (str, optional): Name of the report file. Default = CrossSectionalReport_timestamp
    SaveArtifacts (bool, optional): Whether to save plots as PNG files. Default = False
    rep (StatsReporter, optional): Existing report object to use. Default = a new report object is generated
    show (bool, optional): Whether to display plots interactively. Default = False (keeping this as False is recommended)
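
A call using the arguments listed above might look like the sketch below. The synthetic data, the site names, the "age" covariate, and the helper names make_batch_labels and run_example_report are all invented for illustration. It is also an assumption that data and batch can be passed by keyword under the names shown in the argument list. The return value is not described here, so the sketch simply passes it back.

```python
def make_batch_labels(sizes):
    """Expand {"siteA": 2, "siteB": 1} into one label per sample."""
    return [name for name, n in sizes.items() for _ in range(n)]


def run_example_report(save_dir="."):
    # Imported lazily so make_batch_labels works without these packages installed.
    import numpy as np
    from DiagnoseHarmonisation import DiagnosticReport

    batch = make_batch_labels({"siteA": 30, "siteB": 30})
    rng = np.random.default_rng(0)
    data = rng.normal(size=(len(batch), 10))          # samples x features
    data[30:] += 0.5                                  # synthetic additive shift for siteB
    age = rng.uniform(20, 80, size=(len(batch), 1))   # samples x covariates

    # Keyword names are taken from the argument list above (assumed to be accepted as keywords).
    return DiagnosticReport.CrossSectionalReport(
        data=data,
        batch=batch,
        covariates=age,
        covariate_names=["age"],
        save_dir=save_dir,
        report_name="example_report",
    )
```

Calling run_example_report() with the package installed should write the HTML report into save_dir.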

CrossSectionalComparisonReport(): Compares multiple harmonised datasets against the same batch and covariate structure.

Arguments:
    datasets (dict[str, np.ndarray]): Mapping of method name to data matrix (samples x features).
    batch (list or np.ndarray): Batch labels for each sample.

Optional arguments:
    covariates (np.ndarray, optional): Covariate matrix (samples x covariates).
    covariate_names (list of str, optional): Names of covariates.
    save_data (bool, optional): Whether to save input data and results. Default = True
    save_data_name (str, optional): Filename prefix for saved data.
    feature_names (list of str, optional): Feature names.
    save_dir (str or os.PathLike, optional): Directory to save report and data. Default = current working directory
    report_name (str, optional): Name of the report file. Default = CrossSectionalComparisonReport_timestamp
    include_raw (bool, optional): Whether to keep the raw input alongside harmonised methods. Default = True
    raw_name (str, optional): Name assigned to the raw dataset if present. Default = Raw
    scoring_config (dict, optional): Optional score weighting configuration.
    rep (StatsReporter, optional): Existing report object to use.
    SaveArtifacts (bool, optional): Whether to save plots as PNG files. Default = False
    show (bool, optional): Whether to display plots interactively. Default = False
    timestamped_reports (bool, optional): Whether to append a timestamp to the report filename. Default = True
    covariate_types (list, optional): Covariate type hints.
    ratio_type (str, optional): Variance-ratio comparison mode.
    UMAP_embedding (bool, optional): Whether to include UMAP comparison plots. Default = True
    UMAP_tuning (str, optional): UMAP tuning mode.
    plot_covariate_embeddings (bool, optional): Whether to colour embeddings by covariates. Default = True
    allow_many_covariate_embeddings (bool, optional): Allow more than five covariate-coloured embedding plots. Default = False
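
Because every dataset is compared against the same batch vector, it can help to check that each matrix has one row per batch label before generating the report. The sketch below shows one way to do this. The helper names check_same_samples and run_example_comparison are hypothetical, and it is an assumption that the raw data is supplied as an entry of datasets under the key given by raw_name.

```python
def check_same_samples(datasets, batch):
    """Raise ValueError unless every dataset has one row per batch label."""
    for name, matrix in datasets.items():
        if len(matrix) != len(batch):
            raise ValueError(
                f"{name!r} has {len(matrix)} rows but batch has {len(batch)} labels"
            )
    return True


def run_example_comparison(raw, harmonised_by_method, batch, save_dir="."):
    # Imported lazily so check_same_samples can be used on its own.
    from DiagnoseHarmonisation import DiagnosticReport

    # Assumption: the raw matrix lives in the same mapping as the harmonised outputs.
    datasets = {"Raw": raw, **harmonised_by_method}
    check_same_samples(datasets, batch)

    # Keyword names are taken from the argument list above.
    return DiagnosticReport.CrossSectionalComparisonReport(
        datasets=datasets,
        batch=batch,
        raw_name="Raw",
        save_dir=save_dir,
        report_name="example_comparison",
    )
```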

LongitudinalReport(): Requires an additional vector of subject IDs. Longitudinal harmonisation has the added goal of ensuring that between-subject variability is preserved or recovered after harmonisation. This report assesses additive, multiplicative, and distributional components of batch effects under the assumption that batch effects affect all observations of a participant similarly across features. It also evaluates consistency of subject ranking across sites (e.g. if subject A has larger ROI values than subject B at one site, this ordering should be preserved across sites).

Arguments: 
    data (np.ndarray): Data matrix (samples x features).
    batch (list or np.ndarray): Batch labels for each sample.
    subject_ids (list or np.ndarray): Subject IDs for each sample.

Optional arguments:
    covariates (np.ndarray, optional): Covariate matrix (samples x covariates).
    covariate_names (list of str, optional): Names of covariates.
    save_data (bool, optional): Whether to save input data and results.
    save_data_name (str, optional): Filename for saved data.
    save_dir (str or os.PathLike, optional): Directory to save report and data.
    report_name (str, optional): Name of the report file.
    SaveArtifacts (bool, optional): Whether to save intermediate artifacts.
    rep (StatsReporter, optional): Existing report object to use.
    show (bool, optional): Whether to display plots interactively.
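
The subject-ranking idea described for LongitudinalReport() can be illustrated with a rank correlation. The sketch below uses Spearman correlation as a stand-in: the statistic the report actually computes is not specified here, and the ROI values and site names are invented. A value of 1 means the ordering of subjects is identical at both sites.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# One ROI value per subject, measured at each site (same subject order).
site_1 = [10.2, 11.5, 9.8, 12.1, 10.9]
site_2 = [13.0, 14.6, 12.5, 15.3, 13.9]  # shifted and scaled, same ordering
site_3 = [13.0, 12.5, 14.6, 15.3, 13.9]  # second and third subjects swapped

print(spearman(site_1, site_2))
print(spearman(site_1, site_3))
```

The first pair keeps the subject ordering despite a batch shift, while the second pair does not, so its correlation falls below 1.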

LoggingTool.py

Enhanced logging and HTML report generation for diagnostic reports. Provides the StatsReporter class, which allows logging text and plots, organising them into sections, and writing a structured HTML report with a table of contents. If you would like to use this library in your own analysis scripts, we suggest using the logging tool as an easy way to organise and return results (see the script for more detail).

DiagnosticFunctions.py

Definitions for each of the functions called by the different reporting tools in DiagnosticReport.py are written here. These functions can be called independently of the main diagnostic reports if the user would prefer to focus on a single test.

PlotDiagnosticResults.py

Complementary plotting functions for the functions in DiagnosticFunctions.py. Some of these require the output of the corresponding diagnostic function in order to run, so keep this in mind if using them outside of the reports.

PlotComparisonResults.py

Comparison plotting helpers for the multi-method cross-sectional report. These produce side-by-side figures for the same diagnostics across methods, including compact per-panel colourbars for PCA, UMAP, and covariance visualisations.

HarmonisationFunctions.py

While not the main purpose of this library, we also provide access to some well-validated harmonisation methods for derived measures. These have all been tested and confirmed to agree, to within machine precision, with the more widely used publicly available implementations.
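
For readers new to the idea, the sketch below shows the simplest form of harmonisation for one feature: rescaling each batch to the pooled mean and standard deviation. It is a deliberately naive stand-in, not ComBat and not one of the methods shipped in HarmonisationFunctions.py. The function name and data are hypothetical. It ignores covariates, so genuine biological differences between batches would be removed along with the batch effect.

```python
import statistics


def align_location_scale(values, batch):
    """Rescale each batch to the pooled mean and standard deviation.

    Illustrative only: no covariates are modelled, and a batch with zero
    variance would cause a division by zero.
    """
    pooled_mean = statistics.fmean(values)
    pooled_sd = statistics.pstdev(values)
    out = list(values)
    for label in set(batch):
        idx = [i for i, b in enumerate(batch) if b == label]
        group = [values[i] for i in idx]
        mean, sd = statistics.fmean(group), statistics.pstdev(group)
        for i in idx:
            out[i] = pooled_mean + pooled_sd * (values[i] - mean) / sd
    return out


values = [1.0, 2.0, 3.0, 11.0, 14.0, 17.0]
batch = ["siteA", "siteA", "siteA", "siteB", "siteB", "siteB"]
adjusted = align_location_scale(values, batch)
print([round(v, 3) for v in adjusted])
```

After the adjustment both batches share the same mean and spread, which is what a pre- and post-harmonisation diagnostic report would be expected to confirm.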

Simulator.py

Batch effect simulator that opens an interactive app in the web browser and lets the user generate simulated datasets with varying numbers of unique batches, severity of batch effects (additive and multiplicative), and different covariate effects.

The user can then visualise feature-wise differences between batches using histograms and box plots, generate a cross-sectional diagnostic report to view the effects in more detail, and apply harmonisation (using ComBat). Comparing the reports before and after harmonisation gives a direct comparison in a semi-realistic scenario.

To run the simulator, run streamlit run simulator.py in the terminal.

Project details

Release history

- 1.1.2: Jun 26, 2026
- 1.1.1: Jun 25, 2026 (this version)
- 1.1.0: Jun 24, 2026
- 1.0.1: May 11, 2026
- 1.0.0.post6: May 11, 2026

Download files

Source Distribution

diagnoseharmonisation-1.1.1.tar.gz (277.3 kB)

Uploaded Jun 25, 2026 Source

Built Distribution

diagnoseharmonisation-1.1.1-py3-none-any.whl (175.8 kB)

Uploaded Jun 25, 2026 Python 3

File details

Details for the file diagnoseharmonisation-1.1.1.tar.gz.

File metadata

Download URL: diagnoseharmonisation-1.1.1.tar.gz
Upload date: Jun 25, 2026
Size: 277.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for diagnoseharmonisation-1.1.1.tar.gz
Algorithm	Hash digest
SHA256	`618f7ff202c66507f302d7b0835f6e17acb2db2ae3d259d995a7e78462afa937`
MD5	`768e1a4aeed8387dc3caca5e3abf32f4`
BLAKE2b-256	`7a44a2f5b3cdfeafb07d01043391d345f61ceb27fcea9d53b4e09679c8cc0a91`
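
A downloaded archive can be checked against the SHA256 digest listed above using the standard library. The sketch below is one way to do this. The helper names are illustrative, and the file path in the usage note assumes the archive was saved under its published name in the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare the file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 digest copied from the table above for the source distribution.
EXPECTED_SDIST_SHA256 = "618f7ff202c66507f302d7b0835f6e17acb2db2ae3d259d995a7e78462afa937"
```

After downloading, matches_published_hash("diagnoseharmonisation-1.1.1.tar.gz", EXPECTED_SDIST_SHA256) should return True for an intact file.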

Provenance

The following attestation bundles were made for diagnoseharmonisation-1.1.1.tar.gz:

Publisher: publish.yml on Jake-Turnbull/HarmonisationDiagnostics

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: diagnoseharmonisation-1.1.1.tar.gz
- Subject digest: 618f7ff202c66507f302d7b0835f6e17acb2db2ae3d259d995a7e78462afa937
- Sigstore transparency entry: 1954277636
- Sigstore integration time: Jun 25, 2026
Source repository:
- Permalink: Jake-Turnbull/HarmonisationDiagnostics@a12cdf4a1b7375583c2ce8955be2101b59849628
- Branch / Tag: refs/tags/v1.1.1
- Owner: https://github.com/Jake-Turnbull
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a12cdf4a1b7375583c2ce8955be2101b59849628
- Trigger Event: release

File details

Details for the file diagnoseharmonisation-1.1.1-py3-none-any.whl.

File metadata

Download URL: diagnoseharmonisation-1.1.1-py3-none-any.whl
Upload date: Jun 25, 2026
Size: 175.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for diagnoseharmonisation-1.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a18fbafbcf333f2a63978071a7299b050d900ad962f966fca955479e8342b2fe`
MD5	`612035f7d0bb63ad9498e61b9159aab6`
BLAKE2b-256	`98fb86c836d6858b8d57d0ddbcd2693d1cad49be2d6254738171d370bad172f2`

Provenance

The following attestation bundles were made for diagnoseharmonisation-1.1.1-py3-none-any.whl:

Publisher: publish.yml on Jake-Turnbull/HarmonisationDiagnostics

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: diagnoseharmonisation-1.1.1-py3-none-any.whl
- Subject digest: a18fbafbcf333f2a63978071a7299b050d900ad962f966fca955479e8342b2fe
- Sigstore transparency entry: 1954277760
- Sigstore integration time: Jun 25, 2026
Source repository:
- Permalink: Jake-Turnbull/HarmonisationDiagnostics@a12cdf4a1b7375583c2ce8955be2101b59849628
- Branch / Tag: refs/tags/v1.1.1
- Owner: https://github.com/Jake-Turnbull
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a12cdf4a1b7375583c2ce8955be2101b59849628
- Trigger Event: release

DiagnoseHarmonisation 1.1.1
