A Python package for synthetic proteomics data augmentation using ProtoGAIN

License
- OSI Approved :: BSD License
Operating System
- OS Independent
Programming Language
- Python :: 3.10
- Python :: 3.11

Project description

GenerativeProteomics

WORK STILL IN PROGRESS

This repository provides a PyTorch implementation of Generative Adversarial Imputation Networks (GAIN) [1] for imputing missing iBAQ values in proteomics datasets.

Installation
How to Use
Demo
References

Installation

Clone this repository: git clone https://github.com/QuantitativeBiology/ProtoGain/
Create a Python environment (if you have conda installed): conda create -n proto python=3.10
Activate the previously created environment: conda activate proto
Install the necessary packages: pip install -r requirements.txt

How to Use

If you just want to impute a general dataset, the simplest way to run ProtoGain is: python protogain.py -i /path/to/file_to_impute.csv
Running it this way results in two separate training phases:

Evaluation run: A fraction of the values (10% by default) is concealed during training, and the dataset is then imputed. The RMSE is calculated using the hidden values as targets. At the end of training, a test_imputed.csv file is created containing the original hidden values and their imputed counterparts, which gives you an estimate of the imputation accuracy.
Imputation run: Then a proper training phase takes place using the entire dataset. An imputed.csv file will be created containing the imputed dataset.
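
The evaluation idea described above can be illustrated with a minimal, self-contained sketch. It is not ProtoGain's actual code: the function names (conceal_values, rmse_on_hidden, column_mean_impute) are hypothetical, missing values are represented as None by assumption, and a simple column-mean imputer stands in for the GAIN model so that the example runs without PyTorch.

```python
import math
import random


def conceal_values(rows, miss_rate=0.1, seed=0):
    """Hide a fraction of the observed entries.

    Returns a masked copy of `rows` and the list of hidden (row, col) positions.
    Missing entries are represented as None (an assumption of this sketch).
    """
    rng = random.Random(seed)
    observed = [(i, j) for i, row in enumerate(rows)
                for j, value in enumerate(row) if value is not None]
    hidden = rng.sample(observed, round(miss_rate * len(observed)))
    masked = [list(row) for row in rows]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def column_mean_impute(rows):
    """Placeholder imputer (NOT GAIN): fill each gap with its column mean."""
    means = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows if row[j] is not None]
        means.append(sum(column) / len(column) if column else 0.0)
    return [[means[j] if value is None else value
             for j, value in enumerate(row)] for row in rows]


def rmse_on_hidden(original, imputed, hidden):
    """RMSE computed only at the positions that were deliberately hidden."""
    squared = [(original[i][j] - imputed[i][j]) ** 2 for i, j in hidden]
    return math.sqrt(sum(squared) / len(squared))


# Toy dataset: 20 rows x 5 columns, fully observed.
data = [[float(i + j) for j in range(5)] for i in range(20)]
masked, hidden = conceal_values(data, miss_rate=0.1)
imputed = column_mean_impute(masked)
print(len(hidden), rmse_on_hidden(data, imputed, hidden))
```

Because the hidden values are known, the RMSE measures how well the imputer recovers them; the same reasoning applies when the placeholder imputer is replaced by a trained model.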

However, there are a few arguments which you may want to change. You can do this using a parameters.json file (you may find an example in GenerativeProteomics/breast/parameters.json) or you can choose them directly in the command line.

Run with a parameters.json file: python protogain.py --parameters /path/to/parameters.json
Run with command line arguments: python protogain.py -i /path/to/file_to_impute.csv -o imputed_name --ofolder ./results/ --it 2001

Arguments:

-i: Path to file to impute
-o: Name of imputed file
--ofolder: Path to the output folder
--it: Number of iterations to train the model
--miss: The fraction of values to be concealed during the evaluation run (from 0 to 1; the default corresponds to 10%)
--outall: Set this argument to 1 if you want to output every metric
--override: Set this argument to 1 if you want to delete the previously created files when writing the new output
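
If you launch ProtoGain from another Python script, the arguments listed above can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of the package; it uses only the flags documented in this section and assumes protogain.py is in the current working directory.

```python
def build_protogain_command(input_path, output_name=None, output_folder=None,
                            iterations=None, miss=None,
                            outall=False, override=False):
    """Build the argument list for `python protogain.py` (hypothetical helper)."""
    cmd = ["python", "protogain.py", "-i", input_path]
    if output_name is not None:
        cmd += ["-o", output_name]
    if output_folder is not None:
        cmd += ["--ofolder", output_folder]
    if iterations is not None:
        cmd += ["--it", str(iterations)]
    if miss is not None:
        # --miss is documented as a value from 0 to 1.
        if not 0 <= miss <= 1:
            raise ValueError("miss must be between 0 and 1")
        cmd += ["--miss", str(miss)]
    if outall:
        cmd += ["--outall", "1"]
    if override:
        cmd += ["--override", "1"]
    return cmd


command = build_protogain_command(
    "/path/to/file_to_impute.csv",
    output_name="imputed_name",
    output_folder="./results/",
    iterations=2001,
)
print(" ".join(command))
# To actually run it: import subprocess; subprocess.run(command, check=True)
```

Passing the command as a list to subprocess.run avoids shell quoting problems with paths that contain spaces.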

To test the accuracy of the imputation, you can supply a reference file containing a complete version of the dataset (without missing values): python protogain.py -i /path/to/file_to_impute.csv --ref /path/to/complete_dataset.csv

Running this way will calculate the RMSE of the imputation in relation to the complete dataset.
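
As a rough illustration of what such a comparison involves, the sketch below computes the RMSE between an imputed table and a complete reference, restricted to the cells that were missing in the input. This is an assumption about the metric, not a description of ProtoGain's internals: the helper names are hypothetical, the CSV layout (a header row, empty cells for missing values) is assumed, and the imputed values are made up for the example.

```python
import csv
import io
import math


def read_numeric_csv(text):
    """Parse CSV text with a header row; empty cells become None (assumed)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [[float(cell) if cell.strip() else None for cell in row]
            for row in reader]
    return header, rows


def rmse_vs_reference(incomplete, imputed, reference):
    """RMSE over the entries that were missing in `incomplete`."""
    squared = [(imputed[i][j] - reference[i][j]) ** 2
               for i, row in enumerate(incomplete)
               for j, value in enumerate(row) if value is None]
    if not squared:
        raise ValueError("no missing entries to evaluate")
    return math.sqrt(sum(squared) / len(squared))


# Tiny in-memory stand-ins for the three files (illustrative values only).
incomplete_csv = "a,b\n1.0,\n,4.0\n"
reference_csv = "a,b\n1.0,2.0\n3.0,4.0\n"
imputed_csv = "a,b\n1.0,2.5\n2.5,4.0\n"

_, incomplete = read_numeric_csv(incomplete_csv)
_, reference = read_numeric_csv(reference_csv)
_, imputed = read_numeric_csv(imputed_csv)

score = rmse_vs_reference(incomplete, imputed, reference)
print(score)
```

Restricting the error to the originally missing cells keeps the observed values, which any imputer leaves unchanged, from diluting the score.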

Demo

This repository includes a folder named breast, which contains a breast cancer diagnostic dataset [2] that you can use to try out the code.

breast.csv: complete dataset
breastMissing_20.csv: the same dataset but with 20% of its values taken out

To simply impute breastMissing_20.csv run: python protogain.py -i ./breast/breastMissing_20.csv
If you want to compare the imputation with the original dataset run: python protogain.py -i ./breast/breastMissing_20.csv --ref ./breast/breast.csv or python protogain.py --parameters ./breast/parameters.json

For a deeper analysis of every metric, either set --outall to 1 or run the code in an IPython console, where you can access any variable in the metrics object, e.g. metrics.loss_D.

References

[1] J. Yoon, J. Jordon & M. van der Schaar (2018). GAIN: Missing Data Imputation using Generative Adversarial Nets
[2] https://archive.ics.uci.edu/dataset/17/breast+cancer+wisconsin+diagnostic


Release history

0.2.1 (this version): Jun 22, 2025
0.2.0: Jun 22, 2025
0.1.0: Dec 28, 2024

Download files

Source Distribution: GenerativeProteomics-0.2.1.tar.gz (10.8 kB), uploaded Jun 22, 2025
Built Distribution: GenerativeProteomics-0.2.1-py3-none-any.whl (12.6 kB), uploaded Jun 22, 2025, Python 3

File details

Details for the file GenerativeProteomics-0.2.1.tar.gz.

File metadata

Download URL: GenerativeProteomics-0.2.1.tar.gz
Upload date: Jun 22, 2025
Size: 10.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.5

File hashes

Hashes for GenerativeProteomics-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`4da2bf87ef19d61b24489ec68a6522fab859d0aab7f60f36c45de183811353b9`
MD5	`e10a828364e8a023cd155c278a8706c5`
BLAKE2b-256	`03681260f63f241065814fa225413eef3d0e120a4c03fa6d48836de1d3766395`


File details

Details for the file GenerativeProteomics-0.2.1-py3-none-any.whl.

File metadata

Download URL: GenerativeProteomics-0.2.1-py3-none-any.whl
Upload date: Jun 22, 2025
Size: 12.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.5

File hashes

Hashes for GenerativeProteomics-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`bbc9b1ac8866c322f42f19967356c00aabe003f1b6ea9f81a13cb62c986ad75f`
MD5	`1cabeac2ba405ecb36f2f5fc72a783da`
BLAKE2b-256	`7bc984fd37f95fafa33c6a4a75b76555bdc86f54bd9a4520d85b9f78d32b43b7`

