Git-integrated CLI toolkit for scaffolding and running Nextflow-based data analysis pipelines

AnalysisToolbox

A modular framework for automated data processing and statistical analysis pipelines. Built on Nextflow for scalable, reproducible workflows with automatic result synchronization.

Overview

The AnalysisToolbox provides infrastructure for building data processing pipelines that:

  • Process multiple datasets in parallel with automatic participant discovery
  • Handle diverse data types through a generic reader/processor/analyzer architecture
  • Track progress via per-participant logging visible live in the web UI
  • Recover gracefully from failures without losing completed work

The framework is domain-agnostic — modules follow simple input/output conventions (Parquet/FIF files) and can implement any processing logic.

Repository Structure

AnalysisToolbox/
├── gitatbx/               # pip-installable package (installed via pip install GitAtbx)
│   ├── bin/               # workflow_wrapper.nf, log_to_parquet.py, nextflow.config, ...
│   ├── modules/           # analyzers/, processors/, readers/, utils/
│   ├── templates/         # workflow_template.nf, modules_template.nf, parameters_template.config
│   └── utils/             # serve_html.ps1, reinject.sh, result_collector.py
├── pyproject.toml
└── README.md

On first run, gitatbx creates a symlink at ~/Documents/GitAtbxModules that points to <site-packages>/gitatbx/modules/. The symlink always reflects the live installed version: after pip install --upgrade GitAtbx, the updated modules appear through the same symlink path.

Prerequisites

Java Runtime (required by Nextflow)

sudo apt update && sudo apt install default-jre
java -version

Nextflow

curl -s https://get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/
nextflow -version
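
Before installing, it can help to confirm that both prerequisites are on PATH. The following is a minimal sketch, not part of GitAtbx: the helper name missing_tools is hypothetical, and it only checks that the executables can be found, not their versions.

```python
import shutil


def missing_tools(tools=("java", "nextflow"), which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH.

    `which` is injectable so the check can be exercised without the
    tools actually being installed.
    """
    return [name for name in tools if which(name) is None]
```

Calling missing_tools() with no arguments returns an empty list when both java and nextflow are installed and visible to the current shell.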

Install

pip install GitAtbx

Python dependencies (numpy, scipy, polars, mne, neurokit2, …) are installed automatically.

On first run, gitatbx creates a symlink at ~/Documents/GitAtbxModules that points to <site-packages>/gitatbx/modules/ and saves the path to ~/.gitatbx_config.

Usage

Commands

Command                                  What it does
gitatbx init <dir>                       Scaffold a new analysis project
gitatbx run [pattern]                    Find and run a pipeline by project name pattern
gitatbx serve [--dir DIR] [--port PORT]  Serve results HTML locally in browser
gitatbx reinject <PID> [options]         Reinject a corrected output for one participant
gitatbx move <dest>                      Move the deployed modules folder to a new location
gitatbx config show                      Print current configuration (~/.gitatbx_config)

gitatbx init

Prompts for project name, raw data directory, Python executable, toolbox path, git author identity, and an optional GitHub remote URL for the results repo. Then creates:

<dir>/
├── {name}_analysis/
│   ├── {name}_pipeline.nf       ← edit your workflow here
│   ├── {name}_modules.nf        ← add IOInterface includes here
│   └── {name}_parameters.config ← pre-filled paths, params, and git identity
└── {name}_results/
    └── .git/                    ← initialised + remote added (if URL provided)

Git author name and email default to your global git config values if already set. The remote URL is validated immediately with git ls-remote — if authentication fails (e.g. SSH key not yet added to GitHub), a warning is printed with a link to the GitHub SSH setup guide.
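
The two git interactions described above (reading the global identity as a default, and probing the remote with git ls-remote) could be sketched as follows. This is an illustration under assumptions, not the gitatbx implementation: the function names and the injectable run parameter are hypothetical; only the git commands themselves are standard.

```python
import subprocess


def git_config_default(key, run=subprocess.run):
    """Return the global git config value for `key`, or None if it is unset."""
    result = run(
        ["git", "config", "--global", "--get", key],
        capture_output=True,
        text=True,
    )
    value = (result.stdout or "").strip()
    return value if result.returncode == 0 and value else None


def remote_is_reachable(url, run=subprocess.run, timeout=30):
    """Return True if `git ls-remote <url>` exits successfully."""
    try:
        result = run(
            ["git", "ls-remote", url],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # git is missing, or the remote did not answer in time.
        return False
    return result.returncode == 0
```

A caller would use git_config_default("user.name") to pre-fill the prompt and print a warning when remote_is_reachable(url) returns False.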

The pipeline uses the stamped params.git_user_name / params.git_user_email as the commit author for all automatic result syncs.

gitatbx run

GitAtbx searches the entire accessible filesystem (home directory and all drives on Windows) for a directory named {name}_analysis containing a *_pipeline.nf file, then runs it automatically. There is no need to cd anywhere.

gitatbx run <name> --resume   # continue a previous run
gitatbx run                   # no pattern: use current directory

Found paths are cached in ~/.gitatbx_config so subsequent calls are instant.
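
The search-then-cache behaviour can be pictured with a short sketch. Everything here is an assumption made for illustration: the function names, the exact-name match, and the JSON cache file are hypothetical, and the real layout of ~/.gitatbx_config is not documented in this README.

```python
import json
from pathlib import Path


def find_pipeline(root, name):
    """Return the first *_pipeline.nf inside a '<name>_analysis' directory under root."""
    for analysis_dir in sorted(Path(root).rglob(f"{name}_analysis")):
        if analysis_dir.is_dir():
            pipelines = sorted(analysis_dir.glob("*_pipeline.nf"))
            if pipelines:
                return pipelines[0]
    return None


def cached_find(root, name, cache_file):
    """Look `name` up in a JSON cache first; fall back to a filesystem search."""
    cache_file = Path(cache_file)
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    hit = cache.get(name)
    if hit and Path(hit).is_file():
        return Path(hit)  # cached path is still valid, skip the search
    found = find_pipeline(root, name)
    if found is not None:
        cache[name] = str(found)
        cache_file.write_text(json.dumps(cache, indent=2))
    return found
```

Checking that the cached path still exists before returning it keeps a stale entry from masking a project that has been moved.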

gitatbx serve

Starts a local HTTP server to browse results HTML generated by the pipeline.

gitatbx serve --dir ../EV_results --port 8080
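
For readers who want to see what serving a results folder involves, here is a rough standard-library equivalent. It is a sketch only: make_server is a hypothetical helper, and the packaged command may work differently (the repository also ships a serve_html.ps1 utility).

```python
import functools
import http.server


def make_server(directory, port=8080, host="127.0.0.1"):
    """Build an HTTP server that serves static files from `directory`."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    return http.server.ThreadingHTTPServer((host, port), handler)
```

Calling make_server("../EV_results", 8080).serve_forever() would then make the folder browsable at http://127.0.0.1:8080/ until interrupted.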

gitatbx reinject

Places a corrected parquet into corrections/<script_name>/, marks the participant for replay, invalidates relevant Nextflow cache entries, and resumes the pipeline for that participant only.

gitatbx reinject EV_002 --corrected-file fixed.parquet --script-name filtering_processor
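
The first step of a reinject, placing the corrected file under corrections/<script_name>/, might look like the sketch below. The function name, the results_dir parameter, and the choice to keep the original file name are assumptions; marking the participant for replay and invalidating the Nextflow cache are not shown.

```python
import shutil
from pathlib import Path


def place_correction(results_dir, script_name, corrected_file):
    """Copy a corrected file into <results_dir>/corrections/<script_name>/.

    Returns the path of the copy. Where corrections/ actually lives is an
    assumption of this sketch.
    """
    src = Path(corrected_file)
    if not src.is_file():
        raise FileNotFoundError(src)
    target_dir = Path(results_dir) / "corrections" / script_name
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src, target_dir / src.name))
```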

gitatbx move

To move the symlink to a custom location:

gitatbx move /mnt/d/repoShaggy/GitAtbxModules

gitatbx moves the folder and updates ~/.gitatbx_config automatically. gitatbx init will then default toolbox_dir to that path when scaffolding new projects.

Key Components

bin/workflow_wrapper.nf

Discovers participant directories, manages per-participant output folders, runs log_to_parquet.py and interactive_plotter.py automatically, and handles per-participant git sync on completion.

IOInterface

Generic Nextflow process that runs any script (reader/processor/analyzer) with automatic logging. Every module in modules/ is called through IOInterface.

Modules

The modules/ folder is a curated but non-exhaustive starting collection of commonly useful scripts. The first time gitatbx is run it mirrors the full collection to ~/Documents/GitAtbxModules (Windows) or ~/GitAtbxModules (Linux/macOS) automatically. You can move that folder anywhere with gitatbx move. The local copy is yours to extend: add domain-specific scripts, modify existing ones, or organise them into subfolders. Any script there can be called via IOInterface identically to the built-in modules, as long as it follows the same convention: positional CLI arguments in, Parquet or FIF outputs, non-zero exit on failure.
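
A custom module that follows this convention could be structured roughly as below. This is a sketch under assumptions: the script name drop_nulls_processor.py, the two-argument layout (input path, then output path), and the processing step are all hypothetical; only the convention itself (positional arguments, Parquet output, non-zero exit on failure) comes from the description above.

```python
import sys


def process(input_path, output_path):
    """Read a Parquet file, drop rows containing nulls, write the result as Parquet."""
    import polars as pl  # polars is listed among the GitAtbx dependencies

    df = pl.read_parquet(input_path)
    df.drop_nulls().write_parquet(output_path)


def main(argv):
    """Return a process exit code: 0 on success, non-zero on any failure."""
    if len(argv) != 2:
        print(
            "usage: drop_nulls_processor.py <input.parquet> <output.parquet>",
            file=sys.stderr,
        )
        return 2
    try:
        process(argv[0], argv[1])
    except Exception as exc:
        # Report the problem and signal failure to the calling pipeline.
        print(f"error: {exc}", file=sys.stderr)
        return 1
    return 0
```

To use it as a script, end the file with sys.exit(main(sys.argv[1:])) so that the return value becomes the exit status the pipeline sees.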

Authors

Cagatay Özcan Jagiello Gutt — Lead Developer ORCID: https://orcid.org/0000-0002-1774-532X

Download files

Source Distribution

gitatbx-0.1.1.tar.gz (122.8 kB)

Uploaded Source

Built Distribution

gitatbx-0.1.1-py3-none-any.whl (159.9 kB)

Uploaded Python 3

File details

Details for the file gitatbx-0.1.1.tar.gz.

File metadata

  • Download URL: gitatbx-0.1.1.tar.gz
  • Size: 122.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for gitatbx-0.1.1.tar.gz
Algorithm    Hash digest
SHA256       39535ef7f33b3320d6368cf5e5674a18e3dbcbee646cb45c4cae3a151e9ffa37
MD5          1c0d9099497a182bdb822e96bdbda03f
BLAKE2b-256  88cfdf069cff74af59b7666d8e3751bfb599235a8f9c2876fbcc27c564f96bf9

Provenance

The following attestation bundles were made for gitatbx-0.1.1.tar.gz:

Publisher: publish.yml on CGutt-hub/AnalysisToolbox

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file gitatbx-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: gitatbx-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 159.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for gitatbx-0.1.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       9f765c8d142bbdc8bc0df75eeae89a6753579eba574b9f6f8a75df870048ee7c
MD5          0c61770b5fa78c1d4e29b17a5bae5310
BLAKE2b-256  b31af51d53b21b6bcfcc9a2d093da153d72d4599351a57fcacb0199d72b63dba

Provenance

The following attestation bundles were made for gitatbx-0.1.1-py3-none-any.whl:

Publisher: publish.yml on CGutt-hub/AnalysisToolbox
