A project about audio models and their fragility

These details have been verified by PyPI

Maintainers

pkitlo

These details have not been verified by PyPI

License
- OSI Approved :: MIT License
Programming Language
- Python :: 3
Typing
- Typed

Project description

Reference repo

The actual repository is Audio-XAI.

Milestone

02.04.2026: acquired access to the Athena supercomputer (A100 GPUs) 🌐🎉🎉

A project about audio models and their fragility

GitHub | PyPI | Documentation
Created by Piotr Kitłowski | GitHub @cncPomper | PyPI @pkitlo
MIT License

Features

TODO

Documentation

Documentation is built with Zensical and deployed to GitHub Pages.

Live site: https://cncPomper.github.io/Audio-XAI-Fragility/
Preview locally: just docs-serve (serves at http://localhost:8000)
Build: just docs-build

API documentation is auto-generated from docstrings using mkdocstrings.

Docs deploy automatically on push to master via GitHub Actions. To enable this, go to your repo's Settings > Pages and set the source to GitHub Actions.

Development

To set up for local development:

# Clone your fork
git clone git@github.com:your_username/Audio-XAI-Fragility.git
cd Audio-XAI-Fragility

# Install in editable mode with live updates
uv tool install --editable .

This installs the CLI globally in editable mode, so any changes you make to the source code are immediately available when you run audio_xai_fragility.

Run tests:

uv run pytest

Run quality checks (format, lint, type check, test):

just qa

Author

Audio XAI Fragility was created in 2026 by Piotr Kitłowski.

Built with Cookiecutter and the audreyfeldroy/cookiecutter-pypackage project template.

1. General Information and Project Objective

The main objective of the project is to investigate the perceptual fragility of explanations (XAI methods) for deep learning models in the audio domain while keeping predictions unchanged.
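One way to state this objective formally is sketched below. This is an illustrative formulation, not one taken from the project: the symbols are assumptions, with f the classifier, E the explanation method, D a distance between explanation maps, Q a perceptual quality metric, and τ a quality threshold.

```latex
\max_{\delta}\; D\bigl(E(f, x),\; E(f, x + \delta)\bigr)
\quad \text{subject to} \quad
\arg\max_{k} f_k(x + \delta) = \arg\max_{k} f_k(x),
\qquad
Q(x,\, x + \delta) \ge \tau
```

The first constraint keeps the prediction unchanged, and the second keeps the perturbed audio perceptually close to the original.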

2. Planned scope of experiments

Datasets: Public datasets such as the Speech Commands Dataset (speech) and Sonics (synthetic/real music) will be used. The project will strictly ensure the immutability of the original data.
Research models: Utilization and adaptation of audio recognition architectures: Audio Spectrogram Transformer, VGGish, Spectra, and ViT.
XAI methods: Investigation of the vulnerability of gradient-based methods such as Grad-CAM and Integrated Gradients.
Perceptual constraints: Instead of optimizing attacks against standard metrics, perceptual metrics will be considered (PESQ and STOI for speech, PEAQ for music).
Computational resources and training: The project will require hardware acceleration (GPUs with a minimum of 16 GB VRAM). The estimated training and fine-tuning time for the base models is approximately 15 hours, while the main process of optimizing perceptual perturbations (XAI attack) for the entire test set is estimated to take an additional 25–30 hours of computation.
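To make one of the XAI methods named above concrete, here is a minimal sketch of Integrated Gradients on a toy logistic model, written in plain Python. The model, its weights, and the function names are hypothetical stand-ins for illustration; the project itself targets deep audio models, where a framework's automatic differentiation would replace the hand-written gradient.

```python
import math


def model(x, w, b):
    """Toy classifier (assumption): sigmoid of a linear score."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def grad(x, w, b):
    """Analytic gradient of the toy model with respect to the input."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate the path integral of gradients with a midpoint Riemann sum."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight line from the baseline to the input.
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        g = grad(point, w, b)
        total = [t + gi for t, gi in zip(total, g)]
    # Scale the averaged gradient by the input-baseline difference.
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, total)]
```

A useful sanity check is the completeness property: the attributions should sum, up to discretization error, to the difference between the model output at the input and at the baseline.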

3. Planned Program Features

Classification and Attribution Module: Loading models and generating explanation maps for them.
Perturbation module: Generating subtle modifications to the audio signal with optimization that preserves high perceptual metrics (e.g., maintaining a PESQ score above 4.0).
Deployment and Automation: Scripted building, testing, and deployment of applications using tools such as just and Python scripts built with typer or argparse.
Final deliverables: The project will include clear documentation, user instructions, and tests relevant to the project’s scope.
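The perturbation module described above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the model is a toy tanh scorer rather than an audio network, the explanation is a plain input-gradient map, signal-to-noise ratio stands in for PESQ/STOI/PEAQ, and random search stands in for whatever optimizer the project ends up using.

```python
import math
import random


def score(x, w):
    """Toy differentiable classifier score (assumption): sum_i w_i * tanh(x_i)."""
    return sum(wi * math.tanh(xi) for wi, xi in zip(w, x))


def saliency(x, w):
    """Absolute input gradient of the score, used as the explanation map."""
    return [abs(wi * (1.0 - math.tanh(xi) ** 2)) for wi, xi in zip(w, x)]


def snr_db(clean, perturbed):
    """Signal-to-noise ratio in dB; a crude stand-in for a perceptual metric."""
    noise = sum((p - c) ** 2 for c, p in zip(clean, perturbed))
    if noise == 0.0:
        return math.inf
    signal = sum(c ** 2 for c in clean)
    if signal == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)


def l2(a, b):
    """Euclidean distance between two explanation maps."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def attack(x, w, min_snr_db=20.0, steps=500, step_size=0.01, seed=0):
    """Random search for a perturbation that shifts the explanation map
    while keeping the predicted label and the quality constraint."""
    rng = random.Random(seed)
    label = score(x, w) > 0
    base_map = saliency(x, w)
    best, best_dist = list(x), 0.0
    for _ in range(steps):
        cand = [v + rng.gauss(0.0, step_size) for v in best]
        if (score(cand, w) > 0) != label:
            continue  # prediction changed: reject
        if snr_db(x, cand) < min_snr_db:
            continue  # quality constraint violated: reject
        dist = l2(saliency(cand, w), base_map)
        if dist > best_dist:
            best, best_dist = cand, dist
    return best, best_dist
```

Calling `attack([0.5, -0.3, 0.8, 0.1], [1.0, -2.0, 0.5, 1.5])` returns the perturbed input and the explanation distance reached. Rejecting candidates outright keeps both constraints hard; a penalty term in the loss would be the softer alternative.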

4. Planned Technology Stack

The project will build on a base structure generated automatically by tools such as cookiecutter or copier.

Environment management: Use of an isolated virtual environment managed by uv or conda.
Code cleanliness: Enforced PEP8-compliant coding style with an increased line length limit. Formatting is handled by an autoformatter (e.g., black or ruff) and linting by ruff.
Version control: Rigorous use of a code repository, with commit messages following the Conventional Commits specification.
Frameworks and AI: Implementation of learning logic in dedicated frameworks such as PyTorch Lightning in conjunction with Hugging Face libraries. Code used for experiments will be continuously exported from JupyterLab notebooks into structured library code.
Experiments and configuration: Tracking progress, metrics, and logs using Weights & Biases or TensorBoard. The configuration of model parameters and experiments will be completely separated from the execution code.
Documentation: Use of mkdocs to write documentation quickly and simply.
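Separating configuration from execution code could look roughly like the sketch below. The class name, field names, and default values are hypothetical (only the PESQ threshold of 4.0 echoes the text above), and JSON is used only because it ships with the standard library; the project may well choose a different format or a dedicated configuration library.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackConfig:
    """Hypothetical experiment settings; field names are illustrative."""

    model_name: str = "ast"
    xai_method: str = "integrated_gradients"
    min_pesq: float = 4.0
    steps: int = 100


def load_config(text: str) -> AttackConfig:
    """Build a config from JSON text, rejecting keys the dataclass lacks."""
    raw = json.loads(text)
    unknown = set(raw) - set(AttackConfig.__dataclass_fields__)
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return AttackConfig(**raw)
```

With this shape, `load_config('{"steps": 250}')` overrides only the step count and leaves the other defaults in place, while a misspelled key fails loudly instead of being silently ignored.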

5. Project schedule

Deadline dates	Planned scope of work and progress
30.03.2026 - 05.04.2026	Repository configuration (Cookiecutter, Ruff, Uv). Defining the directory structure and ensuring that audio files remain immutable.
06.04.2026 - 12.04.2026	Connecting W&B/TensorBoard. Training base classifiers using the PyTorch Lightning framework. (Estimated resource requirements: 15 hours of GPU computation)
13.04.2026 - 19.04.2026	Implementation of explanation-generating (XAI) modules in clean code, after first exporting experiments from notebooks. Writing the first tests.
20.04.2026 - 26.04.2026	Separating configuration from executable code. Preparing baseline attacks on attribution maps using standard distance metrics.
27.04.2026 - 03.05.2026	Implementation of PESQ/STOI/PEAQ metric approximations directly into the attack optimization loop (generation of perceptual perturbations).
04.05.2026 - 10.05.2026	Launch of the main research experiments on a dedicated cluster. (Estimated resource requirements: 25–30 hours of GPU computing for iterative processes).
11.05.2026 - 17.05.2026	Scripting the execution of the entire experiment using the `just` tool and CLI libraries (e.g., `typer`). Aggregating tables containing the results.
18.05.2026 - 24.05.2026	Finalization of the work: creating documentation and clear instructions for using the finished system. Organizing the code in accordance with PEP8. Preparation of the paper(?)

6. Bibliography Review

Paper	Notes
Interpretation of neural networks is fragile	TODO
Explanations can be manipulated and geometry is to blame	TODO
Constructing adversarial examples to investigate the plausibility of explanations in deep audio and image classifiers	TODO
Perceptual Coding In Python	TODO
Evaluating Fake Music Detection Performance Under Audio Augmentations	TODO

Release history

0.0.5 (Apr 10, 2026)
0.0.4 (Apr 2, 2026)
0.0.3 (Apr 2, 2026, this version)
0.0.2 (Mar 30, 2026)

Download files

Source Distribution

audio_xai_fragility-0.0.3.tar.gz (111.4 kB)

Uploaded Apr 2, 2026 Source

Built Distribution

audio_xai_fragility-0.0.3-py3-none-any.whl (9.9 kB)

Uploaded Apr 2, 2026 Python 3

File details

Details for the file audio_xai_fragility-0.0.3.tar.gz.

File metadata

Download URL: audio_xai_fragility-0.0.3.tar.gz
Upload date: Apr 2, 2026
Size: 111.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for audio_xai_fragility-0.0.3.tar.gz
Algorithm	Hash digest
SHA256	`0b70bc7e478b55a0823fe0711af9236db37cc04565ab1379060933a9e78cd67e`
MD5	`19b502f7fca194032a3a269eb5ba28ad`
BLAKE2b-256	`6f9cdf326ce9025b2e063be9be95f6c05d23bf29636f1a301fdcef819f1e4ba3`
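A downloaded file can be checked against the SHA256 digest listed above with a few lines of standard-library Python. This is a minimal sketch; the function names are illustrative, and the file path is whatever location the archive was saved to.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True when the file's digest matches the published one."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `verify("audio_xai_fragility-0.0.3.tar.gz", "0b70bc7e478b55a0823fe0711af9236db37cc04565ab1379060933a9e78cd67e")` should return True for an intact download of the source distribution.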

Provenance

The following attestation bundles were made for audio_xai_fragility-0.0.3.tar.gz:

Publisher: publish.yml on cncPomper/Audio-XAI-Fragility

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: audio_xai_fragility-0.0.3.tar.gz
- Subject digest: 0b70bc7e478b55a0823fe0711af9236db37cc04565ab1379060933a9e78cd67e
- Sigstore transparency entry: 1217474558
- Sigstore integration time: Apr 2, 2026
Source repository:
- Permalink: cncPomper/Audio-XAI-Fragility@d1931f1f85dde62f798934dd8e13fadadb4c0436
- Branch / Tag: refs/tags/v0.0.3
- Owner: https://github.com/cncPomper
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d1931f1f85dde62f798934dd8e13fadadb4c0436
- Trigger Event: push

File details

Details for the file audio_xai_fragility-0.0.3-py3-none-any.whl.

File metadata

Download URL: audio_xai_fragility-0.0.3-py3-none-any.whl
Upload date: Apr 2, 2026
Size: 9.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for audio_xai_fragility-0.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`836214bb9117691663da175307e0ca075b681c14e02ddf9403bf3af021d01312`
MD5	`938365a1692202dee580661d6ef109f6`
BLAKE2b-256	`06cc9a4c5ea8c183f7ab1677409d8017480672226c43424fe480bd557d59f01b`

Provenance

The following attestation bundles were made for audio_xai_fragility-0.0.3-py3-none-any.whl:

Publisher: publish.yml on cncPomper/Audio-XAI-Fragility

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: audio_xai_fragility-0.0.3-py3-none-any.whl
- Subject digest: 836214bb9117691663da175307e0ca075b681c14e02ddf9403bf3af021d01312
- Sigstore transparency entry: 1217474679
- Sigstore integration time: Apr 2, 2026
Source repository:
- Permalink: cncPomper/Audio-XAI-Fragility@d1931f1f85dde62f798934dd8e13fadadb4c0436
- Branch / Tag: refs/tags/v0.0.3
- Owner: https://github.com/cncPomper
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d1931f1f85dde62f798934dd8e13fadadb4c0436
- Trigger Event: push

audio-xai-fragility 0.0.3
