A Python interface to proteomics data repositories

Overview

ppx provides a simple, programmatic means to access proteomics data that are publicly available in ProteomeXchange partner repositories. ppx allows users to easily find and download files associated with projects in PRIDE and MassIVE. In doing so, ppx promotes the reproducible analysis of proteomics data.

For full documentation and examples, visit: https://ppx.readthedocs.io

Installation

ppx requires Python 3.6+ and depends upon the requests and tqdm Python packages. ppx and any missing dependencies are easily installed with pip or with conda through the bioconda channel.

Install with conda:

conda install -c bioconda ppx

Or install with pip:

pip3 install ppx

Configuration

By default, ppx downloads project files to the .ppx directory under the current user's home directory (~/.ppx on Linux and macOS). There are several ways to specify a different data directory:

Change the ppx data directory for all future Python sessions by setting the PPX_DATA_DIR environment variable to your preferred directory.
Change the ppx data directory for a Python session using the ppx.set_data_dir() function.
Specify a data directory for a project using the local argument:

>>> import ppx

>>> proj = ppx.find_project("PXD000001", local="my/data/dir")
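
The options above imply some order in which a project's directory is chosen. The following is a minimal sketch of one plausible order, using only the standard library. The helper `resolve_project_dir` and its argument names are hypothetical and are not part of ppx, and the precedence ppx actually applies may differ.

```python
import os
from pathlib import Path


def resolve_project_dir(project_id, local=None, session_dir=None, environ=None):
    """Choose where a project's files live (hypothetical helper, not ppx API).

    Assumed precedence: an explicit per-project `local` directory, then a
    data directory set for the session, then the PPX_DATA_DIR environment
    variable, then ~/.ppx.
    """
    environ = os.environ if environ is None else environ
    if local is not None:
        # A per-project directory is used as given.
        return Path(local)
    if session_dir is not None:
        base = Path(session_dir)
    elif environ.get("PPX_DATA_DIR"):
        base = Path(environ["PPX_DATA_DIR"])
    else:
        base = Path.home() / ".ppx"
    # Each project gets its own subdirectory under the data directory.
    return base / project_id


print(resolve_project_dir("PXD000001", local="my/data/dir"))
print(resolve_project_dir("PXD000001", environ={"PPX_DATA_DIR": "/tmp/ppx-data"}))
print(resolve_project_dir("PXD000001", environ={}))
```

Passing `environ` explicitly keeps the sketch easy to test without touching the real environment.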

Why does ppx set a default data directory? We found that this makes it easier to reuse the same proteomics data files in multiple tasks that we're working on.

As of ppx v1.3.0, cloud paths can also be used as the data directory. This allows you to stream downloaded files to AWS S3, Google Cloud Storage, or Azure Blob Storage. To use a cloud storage provider, set the data directory to a cloud URI, such as s3://my-data-bucket/ppx, using any of the methods above. Note that you'll also need to set up credentials for your cloud provider; see the CloudPathLib documentation (https://cloudpathlib.drivendata.org/v0.6/authentication/) for details.
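
To illustrate the difference between a local data directory and a cloud URI, here is a small sketch that classifies a location by its URI scheme. The helper `is_cloud_uri` is hypothetical and is not part of ppx or CloudPathLib; it only assumes the `s3://`, `gs://`, and `az://` prefixes that CloudPathLib uses for the three providers.

```python
from urllib.parse import urlparse

# URI schemes for AWS S3, Google Cloud Storage, and Azure Blob Storage.
CLOUD_SCHEMES = {"s3", "gs", "az"}


def is_cloud_uri(location):
    """Return True if `location` looks like a cloud URI (hypothetical helper)."""
    return urlparse(str(location)).scheme.lower() in CLOUD_SCHEMES


for location in ["s3://my-data-bucket/ppx", "my/data/dir", "/home/user/.ppx"]:
    kind = "cloud" if is_cloud_uri(location) else "local"
    print(f"{location}: {kind}")
```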

Examples

First, find a project using its ProteomeXchange or MassIVE identifier:

>>> import ppx

>>> proj = ppx.find_project("PXD000001")

We can then view the files associated with the project in the repository (PRIDE in this case):

>>> proj.remote_files()
# ['F063721.dat',
# 'F063721.dat-mztab.txt',
# 'PRIDE_Exp_Complete_Ac_22134.xml.gz',
# 'PRIDE_Exp_mzData_Ac_22134.xml.gz',
# 'PXD000001_mztab.txt',
# 'README.txt',
# 'TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML',
# 'TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzXML',
# 'TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzXML',
# 'TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.raw',
# 'erwinia_carotovora.fasta',
# 'generated/PRIDE_Exp_Complete_Ac_22134.pride.mgf.gz',
# 'generated/PRIDE_Exp_Complete_Ac_22134.pride.mztab.gz']

We can also glob for specific types of files:

>>> proj.remote_files("*.mzML")
# ['TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML']
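
For intuition about glob patterns, the sketch below filters a few of the file names listed above with the standard library's `fnmatch` module. The helper `glob_files` is hypothetical, and this is not necessarily how ppx matches patterns internally.

```python
from fnmatch import fnmatchcase

# A subset of the remote file names shown earlier for PXD000001.
remote = [
    "F063721.dat",
    "README.txt",
    "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML",
    "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzXML",
    "erwinia_carotovora.fasta",
]


def glob_files(files, pattern=None):
    """Return the files matching a shell-style pattern (hypothetical helper)."""
    if pattern is None:
        return list(files)
    # fnmatchcase is case-sensitive on every platform.
    return [f for f in files if fnmatchcase(f, pattern)]


print(glob_files(remote, "*.mzML"))
print(glob_files(remote, "*.mz*"))
```

Because the match here is case-sensitive, `*.mzML` selects only the mzML file, while `*.mz*` selects both the mzML and mzXML files.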

Then we can download one or more files to the project's local data directory:

>>> proj.download("README.txt")
# [PosixPath('/Users/wfondrie/.ppx/PXD000001/README.txt')]

Once we've downloaded files, ppx no longer needs an internet connection to retrieve a project's local data. However, you will need to specify the repository in which the project data resides. If we start a new Python session, we can find our previous file easily:

>>> import ppx

>>> proj = ppx.find_project("PXD000001", repo="PRIDE")
>>> proj.local_files()
# [PosixPath('/Users/wfondrie/.ppx/PXD000001/README.txt')]
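
Listing local files needs no network access because it only inspects the project's directory on disk. The following sketch shows the idea with `pathlib` and a temporary directory standing in for the data directory. The helper `list_local_files` is hypothetical and is not ppx's implementation.

```python
import tempfile
from pathlib import Path


def list_local_files(project_dir, pattern="*"):
    """Return files under a project directory (hypothetical helper)."""
    project_dir = Path(project_dir)
    if not project_dir.is_dir():
        # Nothing has been downloaded for this project yet.
        return []
    # rglob also descends into subdirectories such as 'generated/'.
    return sorted(p for p in project_dir.rglob(pattern) if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a project directory under the data directory.
    proj_dir = Path(tmp) / "PXD000001"
    proj_dir.mkdir()
    (proj_dir / "README.txt").write_text("example")
    names = [p.name for p in list_local_files(proj_dir)]
    print(names)
```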

Downloading to a cloud storage backend

We use CloudPathLib to power support for AWS S3, Google Cloud Storage, and Azure Blob Storage. To use a cloud storage provider, create the bucket for ppx to use and set it as the ppx data directory.

For example, using AWS S3, we can save the files of a project to an S3 bucket:

>>> proj = ppx.find_project("PXD000001", local="s3://my-bucket/PXD000001")
>>> proj.download("README.txt")
# [S3Path('s3://my-bucket/PXD000001/README.txt')]

CloudPathLib then provides methods to download files from S3 when you need them:

>>> readme_on_s3 = proj.local_files("README.txt")[0]
>>> readme_on_s3.download_to("README.txt")
# PosixPath('README.txt')

If you are an R user...

ppx was inspired by the rpx R package by Laurent Gatto. Check it out on Bioconductor and GitHub.

Release history

1.4.4 (Apr 25, 2024)
1.4.2 (Apr 16, 2024)
1.3.0 (Apr 30, 2022), this version
1.2.6 (Mar 17, 2022)
1.2.5 (Jan 5, 2022)
1.2.4 (Nov 23, 2021)
1.2.3 (Nov 5, 2021)
1.2.2 (Oct 12, 2021)
1.2.1 (Oct 11, 2021)
1.2.0 (Sep 14, 2021)
1.1.1 (Jul 2, 2021)
1.1.0 (May 19, 2021)
1.0.0 (May 14, 2021)
0.5.0 (Nov 25, 2020)
0.4.2 (Jun 20, 2020)
0.4.1 (Jun 20, 2020)
0.3.0 (Apr 8, 2019)
0.2.1 (Oct 24, 2018)
0.1.3 (Sep 21, 2018)
0.1.2 (Sep 21, 2018)
0.1.1 (Sep 18, 2018)
0.1.0 (Sep 18, 2018)
0.0.1 (Sep 18, 2018)
0.0.0 (Jun 20, 2020)

Download files

Source Distribution

ppx-1.3.0.tar.gz (82.3 kB)

Uploaded Apr 30, 2022 Source

Built Distribution

ppx-1.3.0-py3-none-any.whl (27.9 kB)

Uploaded Apr 30, 2022 Python 3

Hashes for ppx-1.3.0.tar.gz
Algorithm	Hash digest
SHA256	`a94f94071fca29e98fc13e4a306fc0c4c1deee2306dea9680e8a292a5c8b4376`
MD5	`a8d70a94121e20c00795a39c53bbddd5`
BLAKE2b-256	`905bd31f309b5470b6f39dfa1870eb539a9af106bb7f8942f00cc9bcea9fb898`

Hashes for ppx-1.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5b6e107b86e20d91a1e07cb6b293f7befba4de10d0e8ba8a84547d234476d03d`
MD5	`9beff19aad91512f46fac020948490c7`
BLAKE2b-256	`caba6dbef69e4d5af47c63eda6d6a9b1b7ad3cac53ed207eb7ddb4e657b7770b`