Python Client library for PRIDE Rest API

pridepy: A Python package to download and search data from PRIDE database

Installation

From PyPI

To install, simply use pip:

$ pip install --upgrade pridepy

From Source

First, clone the repository on your local machine and then install the package using pip:

$ git clone https://github.com/PRIDE-Archive/pridepy
$ cd pridepy
$ poetry build
$ pip install dist/*.whl

Alternatively, build and install the source distribution:

$ git clone https://github.com/PRIDE-Archive/pridepy
$ cd pridepy
$ poetry build
$ pip install dist/pridepy-{version}.tar.gz
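
To confirm which version ended up in the active environment after either install route, a minimal sketch using only the standard library could look like this. The helper name `installed_version` is illustrative and not part of pridepy.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "pridepy") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"pridepy version: {found}" if found else "pridepy is not installed")
```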

Usage and Documentation

pridepy is a command-line tool built with Click, so every command already ships with detailed usage instructions. To avoid duplicating them in this README, access them directly from the CLI. Use the command below to list the available commands:

$ pridepy --help
Usage: pridepy [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  download-all-public-raw-files   Download all public raw files...
  download-file-by-name           Download a single file from a...
  get-files-by-filter             get paged files :return:
  get-files-by-project-accession  get files by project accession...
  get-private-files               Get private files by project...
  get-projects                    get paged projects :return:
  get-projects-by-accession       get projects by accession... 
  stream-files-metadata           Stream all files metadata in...
  stream-projects-metadata        Stream all projects metadata...
  search-projects-by-keywords-and-filters Search all projects by keywords...
    

[!NOTE] Make sure you are using Python 3, not Python 2.7.

Downloading a project from PRIDE Archive

The main purpose of this tool is to download data from the PRIDE Archive. Here is how to download all the raw files from a dataset (e.g., PXD012353):

$ pridepy download-all-public-raw-files -a PXD012353 -o /Users/yourname/Downloads/foldername/ -p aspera

  • -a flag is used to specify the project accession number.
  • -o flag is used to specify the output directory.
  • -p flag is used to specify the protocol (aspera, ftp, globus)

[!IMPORTANT] pridepy currently supports multiple download protocols: ftp, aspera, globus, and s3. With ftp and aspera, files are transferred over those protocols directly; pridepy bundles the Aspera client. With globus and s3, the tool downloads over the HTTPS endpoints of those services. Read the whitepaper to learn more about the performance of each protocol.

Additional options:

  • --skip-if-downloaded-already flag is used to skip files that already exist in the output directory. By default, files are re-downloaded even if they already exist. Use this flag to avoid re-downloading existing files.
  • --aspera-maximum-bandwidth flag is used to specify the maximum bandwidth for the Aspera download. The default value is 100M.
  • --checksum-check flag is used to check the checksum of the downloaded files. The default value is False.
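
When the same download has to be scripted for several datasets, it can help to assemble the command from Python. The sketch below only builds the argument list from the flags described above; it is not part of pridepy. The function name, the accession pattern (PXD followed by six digits), and the treatment of --skip-if-downloaded-already and --checksum-check as value-less switches are assumptions.

```python
import re
import shlex
from typing import List, Optional

# Assumption: accessions look like "PXD" followed by six digits (e.g. PXD012353).
ACCESSION_PATTERN = re.compile(r"PXD\d{6}")
# Protocol values listed for the -p flag above.
PROTOCOLS = ("aspera", "ftp", "globus")


def build_raw_download_command(
    accession: str,
    output_dir: str,
    protocol: str = "ftp",
    skip_existing: bool = False,
    checksum_check: bool = False,
    aspera_max_bandwidth: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for `pridepy download-all-public-raw-files`."""
    if not ACCESSION_PATTERN.fullmatch(accession):
        raise ValueError(f"unexpected accession format: {accession!r}")
    if protocol not in PROTOCOLS:
        raise ValueError(f"unsupported protocol: {protocol!r}")

    command = [
        "pridepy", "download-all-public-raw-files",
        "-a", accession, "-o", output_dir, "-p", protocol,
    ]
    # Assumption: both options below are switches that take no value.
    if skip_existing:
        command.append("--skip-if-downloaded-already")
    if checksum_check:
        command.append("--checksum-check")
    if aspera_max_bandwidth is not None:
        command += ["--aspera-maximum-bandwidth", aspera_max_bandwidth]
    return command


if __name__ == "__main__":
    cmd = build_raw_download_command(
        "PXD012353", "downloads/", "aspera", skip_existing=True
    )
    # Print the command; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Building a list rather than a single string avoids shell quoting problems when the output directory contains spaces.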

Downloading raw files from ProteomeXchange (PX)

You can download all raw files referenced by a ProteomeXchange dataset by passing only the accession:

$ pridepy download-px-raw-files -a PXD039236 -o /Users/yourname/Downloads/foldername/

  • The tool resolves the ProteomeXchange XML and downloads via FTP when available, otherwise HTTP(S).
  • Resume is supported. Use --skip-if-downloaded-already flag to skip files that have already been downloaded.

Download single file by name

Instead of downloading all the files of a project, users may want a single file whose name they already know. Here is how to download a single file by name:

$ pridepy download-file-by-name -a PXD022105 -o /Users/yourname/Downloads/foldername/ -f checksum.txt -p globus

Note that the additional options are the same as those described in Downloading a project from PRIDE Archive.
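
The example above fetches a file named checksum.txt. One plausible use of such a file is to compare its entries with digests computed locally. Its layout and hash algorithm are not described in this README, so the sketch below covers only the local half: computing a file digest in chunks with hashlib. The function name and the default algorithm are assumptions.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha1", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in chunks matters here because raw mass spectrometry files can be several gigabytes in size.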

Download project files by category

Users may be interested in downloading only the files of a given category. The PRIDE Archive defines the following categories; the command after the list downloads all files of one category:

  • RAW: Raw data files
  • PEAK: Peak list files
  • SEARCH: Search engine output files
  • OTHER: Other files
  • RESULT: Result files
  • SPECTRUM LIBRARIES: Spectrum libraries
  • FASTA: FASTA files

$ pridepy download-files-by-category -a PXD022105 -o /Users/yourname/Downloads/foldername/ -c RAW -p ftp

Note that the additional options are the same as those described in Downloading a project from PRIDE Archive.
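
To fetch several categories in one script, one command per category can be generated. This is a sketch, not pridepy's own API: the function name is hypothetical, and only RAW appears in an actual command above, so the exact spelling that -c accepts for the other categories (in particular SPECTRUM LIBRARIES) is an assumption.

```python
from typing import Dict, Iterable, List

# Category names and descriptions as listed above.
CATEGORIES: Dict[str, str] = {
    "RAW": "Raw data files",
    "PEAK": "Peak list files",
    "SEARCH": "Search engine output files",
    "OTHER": "Other files",
    "RESULT": "Result files",
    "SPECTRUM LIBRARIES": "Spectrum libraries",
    "FASTA": "FASTA files",
}


def build_category_commands(
    accession: str,
    output_dir: str,
    categories: Iterable[str],
    protocol: str = "ftp",
) -> List[List[str]]:
    """Return one `pridepy download-files-by-category` argument list per category."""
    requested = list(categories)
    unknown = [name for name in requested if name not in CATEGORIES]
    if unknown:
        raise ValueError(f"unknown categories: {unknown}")
    return [
        ["pridepy", "download-files-by-category",
         "-a", accession, "-o", output_dir, "-c", name, "-p", protocol]
        for name in requested
    ]


if __name__ == "__main__":
    for cmd in build_category_commands("PXD022105", "downloads/", ["RAW", "FASTA"]):
        print(cmd)
```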

[!IMPORTANT] Because downloading RAW files is the most common use case, there is also a dedicated command for it: download-all-public-raw-files (see Downloading a project from PRIDE Archive).

Download private files

Users and especially reviewers may be interested in downloading private files. Here is how to download private files.

First, the user can list the private files of a project:

$ pridepy list-private-files -a PXD022105 -u yourusername -p yourpassword

This command lists the private files of project PXD022105, including the file name, file size, and download link.

Then the user can download the private files:

$ pridepy download-file-by-name -a PXD022105 -o /Users/yourname/Downloads/foldername/ --username yourusername --password yourpassword -f checksum.txt 

[!WARNING] To download private files, use the same command as for downloading a single file by name; the only difference is that you must provide the username and password. The protocol option is unnecessary in this case because the tool always uses HTTPS for private files. At the moment this is the only protocol allowed, because of the infrastructure behind PRIDE private files (read the whitepaper for more information).
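
Typing a password directly into a command leaves it in the shell history. As a hedged sketch, the credentials can instead be read from environment variables when the command is assembled in a script. The variable names PRIDE_USERNAME and PRIDE_PASSWORD and the function name are hypothetical; only the --username, --password, -a, -o, and -f options come from the command above.

```python
import os
from typing import List, Mapping


def build_private_download_command(
    accession: str,
    output_dir: str,
    file_name: str,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Assemble `pridepy download-file-by-name` with credentials taken from env."""
    try:
        # Hypothetical variable names; choose whatever fits your setup.
        username = env["PRIDE_USERNAME"]
        password = env["PRIDE_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(
            f"environment variable {missing.args[0]} is not set"
        ) from None
    return [
        "pridepy", "download-file-by-name",
        "-a", accession, "-o", output_dir,
        "--username", username, "--password", password,
        "-f", file_name,
    ]
```

This keeps the password out of scripts and shell history, although it still appears in the argument list of the running process.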

Streaming metadata

One of the great features of PRIDE and pridepy is the ability to stream metadata of all projects and files. This is useful for users who want to analyze the metadata of all projects and files locally.

Stream metadata of all projects as JSON and write it to a file:

$ pridepy stream-projects-metadata -o all_pride_projects.json

Stream the metadata of all files as JSON and write it to a file:

$ pridepy stream-files-metadata -o all_pride_files_metadata.json

Stream the files metadata of a specific project as JSON and write it to a file:

$ pridepy stream-files-metadata -o PXD005011_files.json -a PXD005011
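
Once a metadata file has been written, it can be inspected locally. The sketch below assumes the output file holds a single JSON array of objects, which this README does not confirm; it makes no assumption about field names and simply reports how many records there are and which top-level keys occur. The function name is illustrative.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Tuple


def summarize_metadata(path: Path) -> Tuple[int, Counter]:
    """Return the record count and how often each top-level key occurs."""
    with path.open(encoding="utf-8") as handle:
        records = json.load(handle)
    # Assumption: the file is one JSON array; fail loudly if it is not.
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    key_counts: Counter = Counter()
    for record in records:
        if isinstance(record, dict):
            key_counts.update(record.keys())
    return len(records), key_counts


if __name__ == "__main__":
    count, keys = summarize_metadata(Path("PXD005011_files.json"))
    print(f"{count} records")
    for key, occurrences in keys.most_common():
        print(f"{key}: present in {occurrences} records")
```

json.load reads the whole file into memory, which is fine for a single project but may be too much for the metadata of all files; an incremental parser would be a better fit there.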

Search projects by keywords and filters

Get the Project metadata by keywords and filters

$ python -m pridepy.pridepy search-projects-by-keywords-and-filters -f projectTags==Proteometools,organismsPart==Pancreas -k human -sd DESC -sf accession -sf submissionDate
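
The value passed to -f in the command above is a comma-separated list of field==value pairs. A minimal sketch for building that string from a mapping follows. The function name is hypothetical, and because the escaping rules for commas inside values are not documented here, the sketch rejects such input rather than guessing.

```python
from typing import Mapping


def build_filter(filters: Mapping[str, str]) -> str:
    """Join field/value pairs into the `field==value,field==value` form used by -f."""
    for field, value in filters.items():
        # Escaping rules are unknown, so refuse separators inside names or values.
        if "," in field or "==" in field or "," in value:
            raise ValueError(f"cannot encode filter {field!r}: {value!r}")
    return ",".join(f"{field}=={value}" for field, value in filters.items())


if __name__ == "__main__":
    print(build_filter({"projectTags": "Proteometools", "organismsPart": "Pancreas"}))
```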

White paper

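A white paper is available here. It can be built as a PDF using pandoc, via the openjournals/inara Docker image (adjust the mounted path to the paper directory of your local checkout):

$ docker run --rm --platform linux/amd64 -v /Users/yperez/work/pridepy/paper/:/data -w /data openjournals/inara:latest paper.md -p -o pdf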

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement."

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Citation

Kamatchinathan, S., Hewapathirana, S., Bandla, C., Insua, S., Vizcaíno, J. A., & Perez-Riverol, Y. (2025). pridepy: A Python package to download and search data from PRIDE database. Journal of Open Source Software, 10(107), 7563. doi:10.21105/joss.07563
