Git-integrated CLI toolkit for scaffolding and running Nextflow-based data analysis pipelines
AnalysisToolbox
A modular framework for automated data processing and statistical analysis pipelines. Built on Nextflow for scalable, reproducible workflows, with results automatically committed and synchronized to a Git repository.
Overview
The AnalysisToolbox provides infrastructure for building data processing pipelines that:
- Process multiple datasets in parallel with automatic participant discovery
- Handle diverse data types through a generic reader/processor/analyzer architecture
- Track progress via per-participant logging visible live in the web UI
- Recover gracefully from failures without losing completed work
The framework is domain-agnostic — modules follow simple input/output conventions (Parquet/FIF files) and can implement any processing logic.
Repository Structure
AnalysisToolbox/
├── gitatbx/ # pip-installable package (installed via pip install GitAtbx)
│ ├── bin/ # workflow_wrapper.nf, log_to_parquet.py, nextflow.config, ...
│ ├── modules/ # analyzers/, processors/, readers/, utils/
│ ├── templates/ # workflow_template.nf, modules_template.nf, parameters_template.config
│ └── utils/ # serve_html.ps1, reinject.sh, result_collector.py
├── pyproject.toml
└── README.md
On first run gitatbx creates a symlink ~/Documents/GitAtbxModules → <site-packages>/gitatbx/modules/. The symlink always reflects the live installed version — upgrading via pip install --upgrade GitAtbx automatically shows updated modules through the same symlink path.
Prerequisites
Java Runtime (required by Nextflow)
```bash
sudo apt update && sudo apt install default-jre
java -version   # recent Nextflow releases require Java 17 or later
```
Nextflow
```bash
curl -s https://get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/
nextflow -version
```
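Before installing GitAtbx, it can help to confirm that both tools are on PATH. The helper below is a minimal, hypothetical preflight check (check_tool is an illustrative name, not part of gitatbx); it only uses shutil.which and the same version flags shown above.

```python
from __future__ import annotations

import shutil
import subprocess


def check_tool(name: str, version_args: list[str]) -> str | None:
    """Return the first line of the tool's version output, or None if it is not on PATH."""
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run(
        [path, *version_args], capture_output=True, text=True, timeout=120
    )
    # `java -version` prints to stderr, so fall back to stderr when stdout is empty.
    output = result.stdout.strip() or result.stderr.strip()
    return output.splitlines()[0] if output else ""
```

For example, check_tool("java", ["-version"]) and check_tool("nextflow", ["-version"]) return None when the corresponding install step above has not been completed.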
Install
```bash
pip install GitAtbx
```
Python dependencies (numpy, scipy, polars, mne, neurokit2, …) are installed automatically.
On first run, gitatbx creates the ~/Documents/GitAtbxModules symlink described above and saves its path to ~/.gitatbx_config.
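The first-run step can be sketched with pathlib as follows. This is an illustration under stated assumptions, not gitatbx's own code: ensure_modules_link is a hypothetical name, and the JSON layout and modules_dir key of the config file are assumed, since the real format of ~/.gitatbx_config is not documented here.

```python
import json
from pathlib import Path


def ensure_modules_link(modules_src: Path, link_path: Path, config_path: Path) -> Path:
    """Create link_path -> modules_src if nothing exists there yet, then record it in config_path."""
    link_path.parent.mkdir(parents=True, exist_ok=True)
    # Path.exists() follows links, so also check is_symlink() to catch a dangling link.
    if not link_path.exists() and not link_path.is_symlink():
        # target_is_directory only matters on Windows and is ignored on POSIX.
        link_path.symlink_to(modules_src, target_is_directory=True)
    # Hypothetical JSON config: keep existing keys, update only modules_dir.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config["modules_dir"] = str(link_path)
    config_path.write_text(json.dumps(config, indent=2))
    return link_path
```

In an installed environment, modules_src could be derived from the package location, e.g. Path(gitatbx.__file__).parent / "modules", matching the gitatbx/modules/ layout under Repository Structure. Because the link points at the installed package, an upgrade is visible through it immediately. Creating symlinks on Windows may require Developer Mode or administrator rights.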
Usage
Commands
| Command | What it does |
|---|---|
| `gitatbx init <dir>` | Scaffold a new analysis project |
| `gitatbx run [pattern]` | Find and run a pipeline by project name pattern |
| `gitatbx serve [--dir DIR] [--port PORT]` | Serve results HTML locally in browser |
| `gitatbx reinject <PID> [options]` | Reinject a corrected output for one participant |
| `gitatbx move <dest>` | Move the deployed modules folder to a new location |
| `gitatbx config show` | Print current configuration (`~/.gitatbx_config`) |
gitatbx init
Prompts for project name, raw data directory, Python executable, toolbox path, git author identity, and an optional GitHub remote URL for the results repo. Then creates:
<dir>/
├── {name}_analysis/
│ ├── {name}_pipeline.nf ← edit your workflow here
│ ├── {name}_modules.nf ← add IOInterface includes here
│ └── {name}_parameters.config ← pre-filled paths, params, and git identity
└── {name}_results/
└── .git/ ← initialised + remote added (if URL provided)
Git author name and email default to your global git config values if already set. The remote URL is validated immediately with git ls-remote — if authentication fails (e.g. SSH key not yet added to GitHub), a warning is printed with a link to the GitHub SSH setup guide.
The pipeline uses the stamped params.git_user_name / params.git_user_email as the commit author for all automatic result syncs.
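Both Git steps described above can be reproduced with plain git commands. The sketch below is hypothetical (commit_command and remote_reachable are illustrative names, not gitatbx functions): it passes the identity through git's per-invocation -c overrides, so the stamped name and email apply without changing the global git config, and it treats a zero exit status from git ls-remote as a reachable remote.

```python
import os
import subprocess


def commit_command(name: str, email: str, message: str) -> list[str]:
    """Build a `git commit` call that uses the given identity for this commit only."""
    # `git -c key=value` overrides configuration for a single invocation, so the
    # identity is used for author and committer without editing the global config.
    return [
        "git",
        "-c", f"user.name={name}",
        "-c", f"user.email={email}",
        "commit",
        "-m", message,
    ]


def remote_reachable(url: str, timeout: float = 30.0) -> bool:
    """Return True if `git ls-remote <url>` succeeds (e.g. the SSH key is accepted)."""
    # Disable git's interactive credential prompts so a missing login fails instead of waiting.
    env = {**os.environ, "GIT_TERMINAL_PROMPT": "0"}
    try:
        result = subprocess.run(
            ["git", "ls-remote", url],
            capture_output=True, text=True, timeout=timeout, env=env,
        )
    except (OSError, subprocess.TimeoutExpired):
        # git is missing from PATH, or the remote did not answer in time.
        return False
    return result.returncode == 0
```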
gitatbx run
GitAtbx searches the entire accessible filesystem (the home directory, plus all drives on Windows) for a directory named {name}_analysis that contains a *_pipeline.nf file, then runs that pipeline. You do not need to cd into the project first.
```bash
gitatbx run <pattern> --resume   # continue a previous run
gitatbx run                      # no pattern: use the current directory
```
Found paths are cached in ~/.gitatbx_config so subsequent calls are instant.
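The search-and-run step might look roughly like this. Assumptions: the pattern is matched as a case-insensitive substring of the directory name, hidden directories are skipped, and the sibling {name}_parameters.config is passed to Nextflow with -c. find_pipelines and nextflow_command are hypothetical helpers; gitatbx's actual matching rules and its caching in ~/.gitatbx_config are not shown. -resume is Nextflow's own flag for reusing cached task results.

```python
import os
from pathlib import Path


def find_pipelines(root: Path, pattern: str) -> list[Path]:
    """Return every *_pipeline.nf inside a '*_analysis' directory whose name contains `pattern`."""
    matches: list[Path] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories (e.g. .git) in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        here = Path(dirpath)
        if here.name.endswith("_analysis") and pattern.lower() in here.name.lower():
            matches.extend(here / f for f in filenames if f.endswith("_pipeline.nf"))
    return sorted(matches)


def nextflow_command(pipeline: Path, resume: bool = False) -> list[str]:
    """Build a `nextflow run` call, adding the sibling *_parameters.config when it exists."""
    cmd = ["nextflow", "run", str(pipeline)]
    params = pipeline.with_name(pipeline.name.replace("_pipeline.nf", "_parameters.config"))
    if params.exists():
        cmd += ["-c", str(params)]
    if resume:
        cmd.append("-resume")  # Nextflow's flag for reusing cached task results
    return cmd
```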
gitatbx serve
Starts a local HTTP server to browse results HTML generated by the pipeline.
```bash
gitatbx serve --dir ../EV_results --port 8080
```
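For reference, an equivalent local server can be started with Python's standard library alone. This is not necessarily how gitatbx serve is implemented, just a sketch of the same idea; make_server is a hypothetical helper name.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


def make_server(directory: Path, port: int = 8080) -> ThreadingHTTPServer:
    """Create a server that serves files from `directory` on localhost only."""
    # SimpleHTTPRequestHandler accepts `directory` (Python 3.7+); partial binds it.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=str(directory))
    # Binding to 127.0.0.1 keeps the results off the local network.
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling make_server(Path("../EV_results"), 8080).serve_forever() serves the same directory as the command above; from a shell, python -m http.server 8080 --directory ../EV_results does much the same (but binds to all interfaces by default).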
gitatbx reinject
Places a corrected Parquet file into corrections/<script_name>/, marks the participant for replay, invalidates the relevant Nextflow cache entries, and resumes the pipeline for that participant only.
```bash
gitatbx reinject EV_002 --corrected-file fixed.parquet --script-name filtering_processor
```
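The file-staging half of reinject can be sketched as below. Everything beyond the corrections/<script_name>/ folder is an assumption: stage_correction, the results_dir base, the <PID>.parquet file name, and the .replay marker file are all hypothetical, and the Nextflow cache invalidation and resume steps are not shown.

```python
import shutil
from pathlib import Path


def stage_correction(results_dir: Path, pid: str, corrected_file: Path, script_name: str) -> Path:
    """Copy a corrected Parquet file into corrections/<script_name>/ and flag the participant."""
    if corrected_file.suffix != ".parquet":
        raise ValueError(f"expected a .parquet file, got {corrected_file.name}")
    target_dir = results_dir / "corrections" / script_name
    target_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming: one corrected file per participant, named after its PID.
    target = target_dir / f"{pid}.parquet"
    shutil.copy2(corrected_file, target)
    # Hypothetical replay marker that a pipeline could check before reusing cached output.
    (target_dir / f"{pid}.replay").touch()
    return target
```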
gitatbx move
To move the deployed modules folder to a custom location:
```bash
gitatbx move /mnt/d/repoShaggy/GitAtbxModules
```
gitatbx moves the folder and updates ~/.gitatbx_config automatically. gitatbx init will then default toolbox_dir to that path when scaffolding new projects.
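A minimal sketch of the move step, assuming a hypothetical JSON config with a modules_dir key (the real ~/.gitatbx_config format is not documented here; move_modules is likewise an illustrative name). shutil.move renames within one filesystem and falls back to copy-and-delete across filesystems, which matters for destinations such as /mnt/d under WSL.

```python
import json
import shutil
from pathlib import Path


def move_modules(current: Path, dest: Path, config_path: Path) -> Path:
    """Move the modules folder to `dest` and record the new location in a JSON config."""
    if dest.exists():
        raise FileExistsError(f"{dest} already exists")
    dest.parent.mkdir(parents=True, exist_ok=True)
    # A symlink is moved as a link (its target stays in place); a real folder is moved whole.
    new_path = Path(shutil.move(str(current), str(dest)))
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config["modules_dir"] = str(new_path)
    config_path.write_text(json.dumps(config, indent=2))
    return new_path
```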
Key Components
bin/workflow_wrapper.nf
Discovers participant directories, manages per-participant output folders, runs log_to_parquet.py and interactive_plotter.py automatically, and handles per-participant git sync on completion.
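Participant discovery and per-participant output folders can be expressed in a few lines of Python. This illustrates the idea only; it is not the Nextflow code in workflow_wrapper.nf, and both the discover_participants helper and the participant-ID pattern (letters, underscore, digits, modelled on the EV_002 example) are assumptions.

```python
import re
from pathlib import Path

# Hypothetical ID convention modelled on "EV_002": letters, an underscore, then digits.
PARTICIPANT_RE = re.compile(r"^[A-Za-z]+_\d+$")


def discover_participants(raw_dir: Path, results_dir: Path) -> dict[str, Path]:
    """Map each participant directory in raw_dir to a (created) output folder in results_dir."""
    outputs: dict[str, Path] = {}
    for entry in sorted(raw_dir.iterdir()):
        if entry.is_dir() and PARTICIPANT_RE.match(entry.name):
            out = results_dir / entry.name
            out.mkdir(parents=True, exist_ok=True)
            outputs[entry.name] = out
    return outputs
```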
IOInterface
Generic Nextflow process that runs any script (reader/processor/analyzer) with automatic logging. Every module in modules/ is called through IOInterface.
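The IOInterface contract (run a script, capture its output in a log, fail when the script fails) can be mimicked in Python to test a module locally before wiring it into a pipeline. run_module is a hypothetical helper, not part of gitatbx: it appends combined stdout and stderr to a log file and raises subprocess.CalledProcessError on a non-zero exit, which is how a failing module would surface to its caller.

```python
import subprocess
import sys
from pathlib import Path


def run_module(script: Path, args: list[str], log_file: Path, python: str = sys.executable) -> None:
    """Run a module with positional args, appending its stdout and stderr to log_file.

    Raises subprocess.CalledProcessError if the script exits non-zero.
    """
    log_file.parent.mkdir(parents=True, exist_ok=True)
    with log_file.open("a", encoding="utf-8") as log:
        log.write(f"$ {python} {script} {' '.join(args)}\n")
        # Flush our header before the child process writes to the same file.
        log.flush()
        subprocess.run(
            [python, str(script), *args],
            stdout=log,
            stderr=subprocess.STDOUT,
            check=True,
        )
```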
Modules
The modules/ folder is a curated but non-exhaustive starting collection of commonly useful scripts. The first time gitatbx is run, it mirrors the full collection to ~/Documents/GitAtbxModules (Windows) or ~/GitAtbxModules (Linux/macOS) automatically. You can move that folder anywhere with gitatbx move. The local copy is yours to extend: add domain-specific scripts, modify existing ones, or organise them into subfolders. Any script there can be called via IOInterface exactly like the built-in modules, as long as it follows the same convention: it takes positional CLI arguments, writes Parquet or FIF outputs, and exits with a non-zero status on failure.
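A skeleton for a custom module following that convention might look like this. It is a hypothetical example: the two-argument signature and the drop-nulls step are placeholders, and it uses polars, which GitAtbx already installs. main returns 2 for bad usage and 1 for any processing error, so the script exits non-zero on failure.

```python
import sys


def process(df):
    """Placeholder processing step: drop rows that contain any null value."""
    return df.drop_nulls()


def main(argv: list[str]) -> int:
    """Entry point: argv[1] is the input Parquet path, argv[2] the output Parquet path."""
    if len(argv) != 3:
        print(f"usage: {argv[0]} <input.parquet> <output.parquet>", file=sys.stderr)
        return 2
    input_path, output_path = argv[1], argv[2]
    try:
        # Imported here so a missing dependency is reported like any other failure.
        import polars as pl

        result = process(pl.read_parquet(input_path))
        result.write_parquet(output_path)
    except Exception as exc:  # report any failure and signal it via the exit status
        print(f"error: {exc}", file=sys.stderr)
        return 1
    return 0
```

Saved as, for example, example_processor.py, the script would end with if __name__ == "__main__": sys.exit(main(sys.argv)) so that the return value becomes the process exit status seen by IOInterface.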
Authors
Cagatay Özcan Jagiello Gutt — Lead Developer ORCID: https://orcid.org/0000-0002-1774-532X