Python client library for the PRIDE REST API

pridepy

pridepy is a Python client and CLI for the PRIDE Archive API.

You can:

  • download public and private PRIDE files
  • download by category (RAW, SEARCH, RESULT, etc.)
  • stream project and file metadata
  • search projects by keyword and filters
  • download raw files from ProteomeXchange XML metadata

The downloader supports ftp, aspera, s3, and globus.
By default it starts with FTP, falls back across the remaining protocols when needed, and validates each downloaded file: the file must be non-empty and, when checksum validation is enabled, must match its checksum.
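The fallback behaviour can be pictured with a minimal sketch. This is not pridepy's implementation: `download_with_fallback`, `is_valid`, and the `downloaders` mapping are hypothetical names introduced only for illustration, and only the protocol order and the non-empty check come from the text above.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional

# Order taken from the README text: ftp -> aspera -> s3 -> globus.
PROTOCOL_ORDER = ("ftp", "aspera", "s3", "globus")

# Hypothetical type: a function that fetches `file_name` into `target`.
Downloader = Callable[[str, Path], None]


def is_valid(path: Path) -> bool:
    """A file counts as valid in this sketch if it exists and is non-empty."""
    return path.is_file() and path.stat().st_size > 0


def download_with_fallback(
    file_name: str,
    out_dir: Path,
    downloaders: Dict[str, Downloader],
    order: Iterable[str] = PROTOCOL_ORDER,
) -> Optional[str]:
    """Try each protocol in order; return the one that succeeded, else None."""
    target = out_dir / file_name
    for protocol in order:
        fetch = downloaders.get(protocol)
        if fetch is None:
            continue  # protocol not configured in this sketch
        try:
            fetch(file_name, target)
        except Exception:
            continue  # transfer error: move on to the next protocol
        if is_valid(target):
            return protocol
        target.unlink(missing_ok=True)  # drop empty output before the next attempt
    return None
```

Each entry in `downloaders` stands in for a real transfer routine; the sketch only shows the control flow of trying protocols in order and rejecting empty results.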

Requirements

  • Python >=3.9

Installation

Option 1: Install from PyPI with uv (recommended)

Install as a CLI tool:

uv tool install pridepy
pridepy --help

Or run without installing globally:

uvx pridepy --help

Option 2: Install from PyPI with pip

pip install --upgrade pridepy
pridepy --help

Option 3: Install from source (development)

git clone https://github.com/PRIDE-Archive/pridepy
cd pridepy
uv sync --extra dev
uv run pridepy --help

Quick Start (New Users)

1) Download all raw files for a project (robust mode)

pridepy download-all-public-raw-files \
  -a PXD008644 \
  -o ./downloads/PXD008644 \
  --checksum-check

What this does:

  • by default the download starts with FTP and falls back in order (ftp -> aspera -> s3 -> globus)
  • --checksum-check downloads project checksums and validates files
  • empty/corrupt files are retried automatically
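To make the checksum step concrete, here is a minimal sketch of validating one file against an expected digest. It is an illustration only: `file_digest` and `matches_checksum` are hypothetical helpers, and the default of SHA-1 is an assumption, since this README does not state which algorithm the PRIDE checksums use.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha1", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large raw files are never fully loaded in memory."""
    digest = hashlib.new(algorithm)  # ASSUMPTION: sha1 default, not confirmed by the README
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: Path, expected: str, algorithm: str = "sha1") -> bool:
    """Return True only for a non-empty file whose digest equals `expected`."""
    if not path.is_file() or path.stat().st_size == 0:
        return False  # empty or missing files fail before any hashing
    return file_digest(path, algorithm) == expected.strip().lower()
```

A file that fails this kind of check is what the list above calls empty or corrupt; in the sketch, the caller would decide whether to retry.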

2) Continue interrupted downloads safely

pridepy download-all-public-raw-files \
  -a PXD008644 \
  -o ./downloads/PXD008644 \
  --skip-if-downloaded-already \
  --checksum-check
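One way to think about resuming is as a filter over the file list. The sketch below is an assumption about the general idea, not a description of what --skip-if-downloaded-already does internally: `files_still_needed` is a hypothetical helper that treats a file as already downloaded when it exists and is non-empty.

```python
from pathlib import Path
from typing import Iterable, List


def files_still_needed(file_names: Iterable[str], out_dir: Path) -> List[str]:
    """Return the names that are missing or empty in `out_dir`, in input order."""
    needed = []
    for name in file_names:
        target = out_dir / name
        if target.is_file() and target.stat().st_size > 0:
            continue  # already present and non-empty: skip in this sketch
        needed.append(name)
    return needed
```

A size check alone cannot detect a truncated file, which is one reason to combine skipping with --checksum-check as in the command above.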

3) Download only selected categories

pridepy download-all-public-category-files \
  -a PXD022105 \
  -o ./downloads/PXD022105 \
  -c RAW,SEARCH

4) Download one file by name

pridepy download-file-by-name \
  -a PXD022105 \
  -f checksum.txt \
  -o ./downloads/PXD022105 \
  --checksum-check

5) Download raw files from ProteomeXchange

pridepy download-px-raw-files \
  -a PXD039236 \
  -o ./downloads/PXD039236

6) Download a named subset of files (manifest)

pridepy download-files-by-list \
  -a PXD001819 \
  -F files.txt \
  -o ./downloads/PXD001819 \
  --checksum-check

files.txt is one filename per line (blank lines and # comments are ignored). Internally each filename is resolved against the project metadata API and downloaded via the same batch + protocol-fallback engine as download-all-public-raw-files. Use -f a.raw,b.raw,c.raw instead of -F for a small inline list.

Useful options:

  • -p globus — use the globus download strategy (HTTP Range + resume)
  • -w 3 — download up to 3 files in parallel (globus only, max 3)
  • --checksum-check — validate files against PRIDE checksums after download
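The manifest rules described above (one filename per line, blank lines and # comments ignored) can be sketched as follows. `parse_manifest` and `parse_inline_list` are hypothetical helpers, not pridepy functions, and the sketch assumes that only whole-line comments are meant; trailing comments after a filename are not handled.

```python
from typing import Iterable, List


def parse_manifest(lines: Iterable[str]) -> List[str]:
    """Keep one filename per line, dropping blank lines and '#' comment lines."""
    names = []
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue  # ASSUMPTION: only whole-line comments are ignored
        names.append(entry)
    return names


def parse_inline_list(value: str) -> List[str]:
    """Split a comma-separated list such as 'a.raw,b.raw,c.raw'."""
    return [part.strip() for part in value.split(",") if part.strip()]
```

For example, `parse_manifest(open("files.txt"))` would yield the names to request, and `parse_inline_list("a.raw,b.raw,c.raw")` mirrors the inline form given with -f.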

7) Download files from raw URLs

pridepy download-files-by-url \
  -F urls.txt \
  -o ./downloads/urls

urls.txt is one fully-qualified URL per line. Schemes http, https, and ftp are dispatched to the matching downloader. Use -u/--urls for one or more comma-separated URLs, e.g. --urls https://a.com/x.raw,ftp://b.com/y.raw. Note: URLs containing literal commas are not supported with --urls; use a manifest file (-F) instead.

Useful options:

  • -p globus — use globus download strategy for http/https URLs (resume-capable)
  • -w 3 — download up to 3 files in parallel (globus only, max 3)
  • --checksum-check — validate against PRIDE checksums (accession inferred from PRIDE URL paths; only PRIDE archive URLs are supported)
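The scheme dispatch and the comma limitation of --urls can be illustrated with a short sketch. `split_urls` and `group_by_scheme` are hypothetical helpers written for this example; only the supported schemes (http, https, ftp) and the comma-separated format come from the text above.

```python
from collections import defaultdict
from typing import Dict, List
from urllib.parse import urlsplit

# Schemes named in the README; anything else is rejected in this sketch.
SUPPORTED_SCHEMES = {"http", "https", "ftp"}


def split_urls(value: str) -> List[str]:
    """Split a comma-separated --urls style value into individual URLs."""
    return [url.strip() for url in value.split(",") if url.strip()]


def group_by_scheme(urls: List[str]) -> Dict[str, List[str]]:
    """Group URLs by scheme so each group can go to a matching downloader."""
    groups = defaultdict(list)
    for url in urls:
        scheme = urlsplit(url).scheme.lower()
        if scheme not in SUPPORTED_SCHEMES:
            raise ValueError(f"unsupported URL scheme: {url!r}")
        groups[scheme].append(url)
    return dict(groups)
```

Because the split is purely on commas, a URL such as `https://a.com/x,1.raw` would be cut into two pieces, and the second piece would have no scheme. That is the situation in which a manifest file (-F) is the safer choice.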

CLI Command Overview

pridepy --help

Main commands:

  • download-all-public-raw-files
  • download-all-public-category-files
  • download-file-by-name
  • download-files-by-list
  • download-files-by-url
  • download-px-raw-files
  • list-private-files
  • stream-files-metadata
  • stream-projects-metadata
  • search-projects-by-keywords-and-filters

More CLI Examples

Search projects

pridepy search-projects-by-keywords-and-filters \
  -k human \
  -f projectTags==ProteomeTools,organismsPart==Pancreas \
  -sd DESC \
  -sf accession \
  -sf submissionDate
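The -f value in the command above is a comma-separated list of field==value pairs. A minimal sketch for building such a string from a mapping follows; `build_filter` is a hypothetical helper, and how values that themselves contain commas should be written is not covered by this README, so the sketch does not attempt to escape them.

```python
from typing import Mapping


def build_filter(filters: Mapping[str, str]) -> str:
    """Join field/value pairs as field==value, separated by commas."""
    # ASSUMPTION: values contain no commas; no escaping is applied.
    return ",".join(f"{field}=={value}" for field, value in filters.items())
```

Calling `build_filter({"projectTags": "ProteomeTools", "organismsPart": "Pancreas"})` reproduces the filter string used in the example command.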

Stream all project metadata to JSON

pridepy stream-projects-metadata -o all_pride_projects.json

Stream all file metadata for one accession

pridepy stream-files-metadata -a PXD005011 -o PXD005011_files.json

Download private files

List files:

pridepy list-private-files -a PXD022105 -u YOUR_USER -p YOUR_PASSWORD

Download a private file:

pridepy download-file-by-name \
  -a PXD022105 \
  -f checksum.txt \
  -o ./downloads/private \
  --username YOUR_USER \
  --password YOUR_PASSWORD

Python API Examples

Example: get raw files for a project

from pridepy.files.files import Files

files = Files()
raw_files = files.get_all_raw_file_list("PXD008644")
print(f"RAW files: {len(raw_files)}")
if raw_files:  # guard against a project with no raw files
    print(raw_files[0]["fileName"])
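Once a file list is available, it can be summarised with plain Python. The sketch below assumes only what the example above shows, namely that each record is a mapping with a "fileName" key; `count_by_extension` is a hypothetical helper and the sample records are made up for illustration.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Dict, Iterable, Mapping


def count_by_extension(records: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Count records per lower-cased file extension, based on 'fileName'."""
    counts = Counter()
    for record in records:
        suffix = PurePosixPath(record["fileName"]).suffix.lower()
        counts[suffix or "(none)"] += 1
    return dict(counts)


# Made-up sample records; real ones would come from get_all_raw_file_list().
sample = [
    {"fileName": "run1.raw"},
    {"fileName": "run2.RAW"},
    {"fileName": "checksum.txt"},
]
summary = count_by_extension(sample)
```

With real data, `count_by_extension(raw_files)` gives a quick view of which file types a project contains before anything is downloaded.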

Example: search projects

from pridepy.project.project import Project

project = Project()
results = project.search_by_keywords_and_filters(
    keyword="PXD009476",
    query_filter="",
    page_size=25,
    page=0,
    sort_direction="DESC",
    sort_fields="accession",
)
print(f"Hits: {len(results)}")

Development and Release (uv)

Run tests:

uv run pytest

Lint:

uv run flake8 .

Build distributions:

uv build

pridepy is published via GitHub Actions (.github/workflows/python-publish.yml) using uv build and a PyPI API token secret (PYPI_API_TOKEN).

White Paper

A white paper is available in paper/paper.md.

Contributing

  1. Fork the repository
  2. Create a branch (git checkout -b feature/my-change)
  3. Install dev dependencies (uv sync --extra dev)
  4. Run tests and lint (uv run pytest, uv run flake8 .)
  5. Commit and push your branch
  6. Open a pull request

Citation

Kamatchinathan, S., Hewapathirana, S., Bandla, C., Insua, S., Vizcaíno, J. A., & Perez-Riverol, Y. (2025). pridepy: A Python package to download and search data from PRIDE database. Journal of Open Source Software, 10(107), 7563. doi:10.21105/joss.07563

