
Python client library for the PRIDE REST API


pridepy


pridepy is a Python client and CLI for the PRIDE Archive API.

You can:

  • download public and private PRIDE files
  • download by category (RAW, SEARCH, RESULT, etc.)
  • stream project and file metadata
  • search projects by keyword and filters
  • download raw files from ProteomeXchange XML metadata

The downloader supports ftp, aspera, s3, and globus.
By default it starts with FTP and falls back to the remaining protocols when needed. It also validates each downloaded file: the file must be non-empty, and its checksum is verified when checksum validation is enabled.
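The fallback behaviour described above can be pictured with a small, self-contained sketch. This is not pridepy's implementation: `download_with_fallback` and the `fetch` callable are hypothetical names used only for illustration, and the protocol order is taken from the list above.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# Order taken from the README; everything else in this sketch is hypothetical.
PROTOCOL_ORDER = ("ftp", "aspera", "s3", "globus")


def download_with_fallback(
    fetch: Callable[[str], Optional[bytes]],
    protocols: Iterable[str] = PROTOCOL_ORDER,
) -> Tuple[str, bytes]:
    """Try each protocol in turn and return the first non-empty payload."""
    errors: Dict[str, str] = {}
    for protocol in protocols:
        try:
            data = fetch(protocol)
        except Exception as exc:  # a failed protocol triggers the next one
            errors[protocol] = str(exc)
            continue
        if data:  # reject None and empty payloads
            return protocol, data
        errors[protocol] = "empty result"
    raise RuntimeError(f"all protocols failed: {errors}")
```

A caller would pass a function that performs the transfer for a given protocol name; the sketch only shows the ordering and the non-empty check.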

Requirements

  • Python >=3.9

Installation

Option 1: Install from PyPI with uv (recommended)

Install as a CLI tool:

uv tool install pridepy
pridepy --help

Or run without installing globally:

uvx pridepy --help

Option 2: Install from PyPI with pip

pip install --upgrade pridepy
pridepy --help

Option 3: Install from source (development)

git clone https://github.com/PRIDE-Archive/pridepy
cd pridepy
uv sync --extra dev
uv run pridepy --help

Quick Start (New Users)

1) Download all raw files for a project (robust mode)

pridepy download-all-public-raw-files \
  -a PXD008644 \
  -o ./downloads/PXD008644 \
  --checksum-check

What this does:

  • the default protocol (ftp) is tried first, with fallback in the order ftp -> aspera -> s3 -> globus
  • --checksum-check downloads project checksums and validates files
  • empty/corrupt files are retried automatically
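To illustrate what "validates files" means in practice, here is a minimal sketch of a non-empty check followed by an optional checksum comparison. The helper names are hypothetical, and the hash algorithm is an assumption (it is left as a parameter, with `sha1` as a placeholder default); check the project checksum file for the algorithm actually used.

```python
import hashlib
from pathlib import Path
from typing import Optional


def file_digest(path: Path, algorithm: str = "sha1", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large RAW files are not loaded into memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_valid_download(
    path: Path, expected: Optional[str] = None, algorithm: str = "sha1"
) -> bool:
    """Return True if the file exists, is non-empty, and matches `expected` if given."""
    if not path.is_file() or path.stat().st_size == 0:
        return False
    if expected is None:
        return True  # no checksum available: only the non-empty check applies
    return file_digest(path, algorithm) == expected.strip().lower()
```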

2) Continue interrupted downloads safely

pridepy download-all-public-raw-files \
  -a PXD008644 \
  -o ./downloads/PXD008644 \
  --skip-if-downloaded-already \
  --checksum-check
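After an interrupted run it can be useful to see which files still need to be fetched. The sketch below is a hypothetical helper, not part of pridepy; it treats a file as pending when it is missing from the output directory or has zero length.

```python
from pathlib import Path
from typing import Iterable, List


def pending_files(file_names: Iterable[str], out_dir: str) -> List[str]:
    """Return the names that are missing or empty in `out_dir`, in input order."""
    base = Path(out_dir)
    pending = []
    for name in file_names:
        target = base / name
        if not target.is_file() or target.stat().st_size == 0:
            pending.append(name)
    return pending
```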

3) Download only selected categories

pridepy download-all-public-category-files \
  -a PXD022105 \
  -o ./downloads/PXD022105 \
  -c RAW,SEARCH
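For scripted use, the same command can be assembled in Python and handed to `subprocess.run`. This is a minimal sketch: `build_category_command` is a hypothetical helper, and it uses only the flags shown in the command above (`-a`, `-o`, `-c`).

```python
from typing import List, Sequence


def build_category_command(
    accession: str, out_dir: str, categories: Sequence[str]
) -> List[str]:
    """Build the argument list for download-all-public-category-files."""
    if not categories:
        raise ValueError("at least one category is required")
    return [
        "pridepy",
        "download-all-public-category-files",
        "-a", accession,
        "-o", out_dir,
        "-c", ",".join(categories),  # categories are passed comma-separated
    ]


# Example (requires pridepy on PATH):
#   import subprocess
#   subprocess.run(build_category_command("PXD022105", "./downloads/PXD022105",
#                                         ["RAW", "SEARCH"]), check=True)
```

Passing a list rather than a single shell string avoids quoting problems with paths that contain spaces.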

4) Download one file by name

pridepy download-file-by-name \
  -a PXD022105 \
  -f checksum.txt \
  -o ./downloads/PXD022105 \
  --checksum-check

5) Download raw files from ProteomeXchange

pridepy download-px-raw-files \
  -a PXD039236 \
  -o ./downloads/PXD039236

CLI Command Overview

pridepy --help

Main commands:

  • download-all-public-raw-files
  • download-all-public-category-files
  • download-file-by-name
  • download-px-raw-files
  • list-private-files
  • stream-files-metadata
  • stream-projects-metadata
  • search-projects-by-keywords-and-filters

More CLI Examples

Search projects

pridepy search-projects-by-keywords-and-filters \
  -k human \
  -f projectTags==ProteomeTools,organismsPart==Pancreas \
  -sd DESC \
  -sf accession \
  -sf submissionDate
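The value passed to `-f` is a comma-separated list of `field==value` pairs. When the filters come from code, a small helper can build that string; `build_filter` below is a hypothetical name, and the sketch assumes field names and values contain no commas.

```python
from typing import Mapping


def build_filter(filters: Mapping[str, str]) -> str:
    """Join filters as 'field==value' pairs separated by commas."""
    for field, value in filters.items():
        if "," in field or "," in value:
            raise ValueError(f"comma not supported in filter: {field}=={value}")
    return ",".join(f"{field}=={value}" for field, value in filters.items())


query_filter = build_filter(
    {"projectTags": "ProteomeTools", "organismsPart": "Pancreas"}
)
print(query_filter)
```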

Stream all project metadata to JSON

pridepy stream-projects-metadata -o all_pride_projects.json

Stream all file metadata for one accession

pridepy stream-files-metadata -a PXD005011 -o PXD005011_files.json
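Once the metadata has been written, it can be loaded for further processing. The exact layout of the output file is not described here, so the sketch below is deliberately tolerant: it assumes the file is either a single JSON document or one JSON object per line. `load_records` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Any, List


def load_records(path: str) -> List[Any]:
    """Load records from a JSON array, a single JSON object, or JSON lines."""
    text = Path(path).read_text(encoding="utf-8").strip()
    if not text:
        return []
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to one JSON document per non-blank line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return data if isinstance(data, list) else [data]
```

This reads the whole file into memory, which is reasonable for a single accession but may be too heavy for the full project dump.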

Download private files

List files:

pridepy list-private-files -a PXD022105 -u YOUR_USER -p YOUR_PASSWORD

Download a private file:

pridepy download-file-by-name \
  -a PXD022105 \
  -f checksum.txt \
  -o ./downloads/private \
  --username YOUR_USER \
  --password YOUR_PASSWORD
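To keep credentials out of scripts and shell history, they can be read from the environment when the command is built. This is a sketch under stated assumptions: the variable names `PRIDE_USER` and `PRIDE_PASSWORD` and the helper name are hypothetical; the flags are the ones shown in the command above.

```python
import os
from typing import List, Mapping


def build_private_download_command(
    accession: str,
    file_name: str,
    out_dir: str,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Build the download-file-by-name command with credentials from `env`."""
    try:
        user = env["PRIDE_USER"]  # hypothetical variable name
        password = env["PRIDE_PASSWORD"]  # hypothetical variable name
    except KeyError as exc:
        raise RuntimeError(f"missing environment variable: {exc.args[0]}") from exc
    return [
        "pridepy", "download-file-by-name",
        "-a", accession,
        "-f", file_name,
        "-o", out_dir,
        "--username", user,
        "--password", password,
    ]
```

Note that the password is still passed as a command-line argument, so it may be visible to other users in the process list while the command runs.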

Python API Examples

Example: get raw files for a project

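from pridepy.files.files import Files

files = Files()
# Fetch the metadata records for every RAW file in the project.
raw_files = files.get_all_raw_file_list("PXD008644")
print(f"RAW files: {len(raw_files)}")
# Guard against an empty result before indexing.
if raw_files:
    print(raw_files[0]["fileName"])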
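The returned records can be summarised with the standard library. The sketch below counts files per extension and relies only on the `fileName` key used in the example above; `count_by_extension` is a hypothetical helper and the sample records are made up for illustration.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Iterable, Mapping


def count_by_extension(records: Iterable[Mapping[str, str]]) -> Counter:
    """Count records per lower-cased file extension of their 'fileName'."""
    return Counter(
        PurePosixPath(record["fileName"]).suffix.lower() or "(none)"
        for record in records
    )


# Made-up sample records, shaped like the example above.
sample = [{"fileName": "run1.raw"}, {"fileName": "run2.RAW"}, {"fileName": "README"}]
print(count_by_extension(sample))
```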

Example: search projects

from pridepy.project.project import Project

project = Project()
results = project.search_by_keywords_and_filters(
    keyword="PXD009476",
    query_filter="",
    page_size=25,
    page=0,
    sort_direction="DESC",
    sort_fields="accession",
)
print(f"Hits: {len(results)}")
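Because the search takes `page` and `page_size`, collecting every hit means requesting pages until one comes back short. The generator below is a generic sketch, not part of pridepy: `iter_all_hits` and `fetch_page` are hypothetical names, and it assumes each call returns a list of hits for that page.

```python
from typing import Any, Callable, Iterator, List


def iter_all_hits(
    fetch_page: Callable[[int, int], List[Any]],
    page_size: int = 25,
    max_pages: int = 100,
) -> Iterator[Any]:
    """Yield hits page by page until a page is empty or shorter than page_size."""
    for page in range(max_pages):  # max_pages is a safety cap against endless loops
        hits = fetch_page(page, page_size)
        if not hits:
            return
        yield from hits
        if len(hits) < page_size:
            return


# Possible wiring, assuming the search call returns a list for each page:
#   fetch = lambda page, size: project.search_by_keywords_and_filters(
#       keyword="human", query_filter="", page_size=size, page=page,
#       sort_direction="DESC", sort_fields="accession")
#   all_hits = list(iter_all_hits(fetch))
```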

Development and Release (uv)

Run tests:

uv run pytest

Lint:

uv run flake8 .

Build distributions:

uv build

pridepy is published via GitHub Actions (.github/workflows/python-publish.yml) using uv build and a PyPI API token secret (PYPI_API_TOKEN).

White Paper

A white paper is available in paper/paper.md.

Contributing

  1. Fork the repository
  2. Create a branch (git checkout -b feature/my-change)
  3. Install dev dependencies (uv sync --extra dev)
  4. Run tests and lint (uv run pytest, uv run flake8 .)
  5. Commit and push your branch
  6. Open a pull request

Citation

Kamatchinathan, S., Hewapathirana, S., Bandla, C., Insua, S., Vizcaíno, J. A., & Perez-Riverol, Y. (2025). pridepy: A Python package to download and search data from PRIDE database. Journal of Open Source Software, 10(107), 7563. doi:10.21105/joss.07563

