Python client library for the PRIDE REST API

pridepy

pridepy is a Python client and CLI for the PRIDE Archive API.

You can:

  • download public and private PRIDE files
  • download by category (RAW, SEARCH, RESULT, etc.)
  • stream project and file metadata
  • search projects by keyword and filters
  • download raw files from ProteomeXchange XML metadata

The downloader supports ftp, aspera, s3, and globus.
By default it starts with FTP, falls back across the remaining protocols when needed, and validates downloaded files (non-empty, and checksum validation when enabled).
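
The fallback behaviour described above can be pictured with a minimal sketch. This is not pridepy's implementation: the `Downloader` type, the `download_with_fallback` function, and the per-protocol callables are hypothetical names introduced only to illustrate "try each protocol in order and accept the first non-empty file".

```python
from pathlib import Path
from typing import Callable, Dict, Sequence

# Hypothetical type: a downloader writes the remote file to the given path.
Downloader = Callable[[Path], None]

# Order taken from the README: ftp -> aspera -> s3 -> globus.
PROTOCOL_ORDER = ("ftp", "aspera", "s3", "globus")


def download_with_fallback(
    dest: Path,
    downloaders: Dict[str, Downloader],
    order: Sequence[str] = PROTOCOL_ORDER,
) -> str:
    """Try each protocol in order and return the name of the first one
    that leaves a non-empty file at dest. Raise RuntimeError if none do."""
    errors = []
    for name in order:
        downloader = downloaders.get(name)
        if downloader is None:
            continue  # protocol not configured in this sketch
        try:
            downloader(dest)
        except Exception as exc:  # illustration only: any error means "try next"
            errors.append(f"{name}: {exc}")
            continue
        if dest.is_file() and dest.stat().st_size > 0:
            return name
        errors.append(f"{name}: empty or missing file")
    raise RuntimeError("all protocols failed: " + "; ".join(errors))
```

In this sketch an empty file counts as a failure, so the next protocol is tried instead of accepting a truncated download.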

Requirements

  • Python >=3.9

Installation

Option 1: Install from PyPI with uv (recommended)

Install as a CLI tool:

uv tool install pridepy
pridepy --help

Or run without installing globally:

uvx pridepy --help

Option 2: Install from PyPI with pip

pip install --upgrade pridepy
pridepy --help

Option 3: Install from source (development)

git clone https://github.com/PRIDE-Archive/pridepy
cd pridepy
uv sync --extra dev
uv run pridepy --help

Quick Start (New Users)

1) Download all raw files for a project (robust mode)

pridepy download-all-public-raw-files \
  -a PXD008644 \
  -o ./downloads/PXD008644 \
  --checksum-check

What this does:

  • the default protocol (ftp) is tried first, with fallback in the order ftp -> aspera -> s3 -> globus
  • --checksum-check downloads project checksums and validates files
  • empty/corrupt files are retried automatically
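
As a rough illustration of what checksum validation involves, the sketch below hashes a local file in chunks and compares the result with an expected digest. The function names are hypothetical, and SHA-1 is an assumption made for the example; this section does not state which algorithm pridepy uses.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha1", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks so that
    large RAW files do not have to fit in memory."""
    hasher = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def is_valid_download(path: Path, expected_digest: str, algorithm: str = "sha1") -> bool:
    """A file is valid if it exists, is non-empty, and matches the digest."""
    if not path.is_file() or path.stat().st_size == 0:
        return False
    return file_digest(path, algorithm) == expected_digest.strip().lower()
```

A caller could retry the download whenever `is_valid_download` returns False, which mirrors the "empty/corrupt files are retried" behaviour in spirit.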

2) Continue interrupted downloads safely

pridepy download-all-public-raw-files \
  -a PXD008644 \
  -o ./downloads/PXD008644 \
  --skip-if-downloaded-already \
  --checksum-check
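
The idea behind resuming can be sketched in a few lines: skip anything that already exists locally and is non-empty, and fetch only the rest. The helper name `files_still_needed` is hypothetical, and the real `--skip-if-downloaded-already` logic may apply different rules.

```python
from pathlib import Path
from typing import Iterable, List


def files_still_needed(file_names: Iterable[str], output_dir: Path) -> List[str]:
    """Return the file names that are missing or empty in output_dir."""
    needed = []
    for name in file_names:
        target = output_dir / name
        # An existing, non-empty file is treated as already downloaded.
        if target.is_file() and target.stat().st_size > 0:
            continue
        needed.append(name)
    return needed
```

Combining this kind of check with checksum validation, as the command above does, also catches files that exist but were only partially written.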

3) Download only selected categories

pridepy download-all-public-category-files \
  -a PXD022105 \
  -o ./downloads/PXD022105 \
  -c RAW,SEARCH
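
To show how a comma-separated category list such as RAW,SEARCH might be applied to file metadata, here is a small sketch. It assumes each record is a dict with a nested `fileCategory` entry holding a `value` string; that layout and both function names are assumptions made for illustration, so check the actual metadata before relying on it.

```python
from typing import Dict, Iterable, List, Set


def parse_categories(raw: str) -> Set[str]:
    """Turn 'RAW, search' into {'RAW', 'SEARCH'}."""
    return {part.strip().upper() for part in raw.split(",") if part.strip()}


def filter_by_category(records: Iterable[Dict], wanted: Set[str]) -> List[Dict]:
    """Keep records whose category is in the wanted set.
    Assumed layout: record['fileCategory']['value'] is the category name."""
    selected = []
    for record in records:
        category = str(record.get("fileCategory", {}).get("value", "")).upper()
        if category in wanted:
            selected.append(record)
    return selected
```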

4) Download one file by name

pridepy download-file-by-name \
  -a PXD022105 \
  -f checksum.txt \
  -o ./downloads/PXD022105 \
  --checksum-check

5) Download raw files from ProteomeXchange

pridepy download-px-raw-files \
  -a PXD039236 \
  -o ./downloads/PXD039236

CLI Command Overview

pridepy --help

Main commands:

  • download-all-public-raw-files
  • download-all-public-category-files
  • download-file-by-name
  • download-px-raw-files
  • list-private-files
  • stream-files-metadata
  • stream-projects-metadata
  • search-projects-by-keywords-and-filters

More CLI Examples

Search projects

pridepy search-projects-by-keywords-and-filters \
  -k human \
  -f projectTags==ProteomeTools,organismsPart==Pancreas \
  -sd DESC \
  -sf accession \
  -sf submissionDate
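
The `-f` value above is a comma-separated list of `field==value` pairs. When building such strings programmatically, it can help to validate them first. The parser below is a hypothetical helper, not part of pridepy, and it assumes that values contain neither commas nor `==`.

```python
from typing import Dict


def parse_filters(raw: str) -> Dict[str, str]:
    """Parse 'a==1,b==2' into {'a': '1', 'b': '2'}.
    Raise ValueError for items that are not field==value pairs."""
    filters: Dict[str, str] = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas
        field, sep, value = item.partition("==")
        field, value = field.strip(), value.strip()
        if not sep or not field or not value:
            raise ValueError(f"expected field==value, got {item!r}")
        filters[field] = value
    return filters


def format_filters(filters: Dict[str, str]) -> str:
    """Inverse of parse_filters: build the string passed to -f."""
    return ",".join(f"{field}=={value}" for field, value in filters.items())
```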

Stream all project metadata to JSON

pridepy stream-projects-metadata -o all_pride_projects.json

Stream all file metadata for one accession

pridepy stream-files-metadata -a PXD005011 -o PXD005011_files.json

Download private files

List files:

pridepy list-private-files -a PXD022105 -u YOUR_USER -p YOUR_PASSWORD

Download a private file:

pridepy download-file-by-name \
  -a PXD022105 \
  -f checksum.txt \
  -o ./downloads/private \
  --username YOUR_USER \
  --password YOUR_PASSWORD
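
When scripting private downloads, one option is to read the credentials from environment variables and assemble the argument list in Python. The sketch below uses only the command and flags shown above; the variable names PRIDE_USER and PRIDE_PASSWORD and the helper itself are hypothetical.

```python
import os
from typing import List, Mapping, Optional


def build_private_download_command(
    accession: str,
    file_name: str,
    output_dir: str,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Build the argument list for 'pridepy download-file-by-name'.
    PRIDE_USER and PRIDE_PASSWORD are hypothetical variable names."""
    env = os.environ if env is None else env
    try:
        user = env["PRIDE_USER"]
        password = env["PRIDE_PASSWORD"]
    except KeyError as exc:
        raise RuntimeError(f"missing environment variable: {exc.args[0]}") from exc
    return [
        "pridepy", "download-file-by-name",
        "-a", accession,
        "-f", file_name,
        "-o", output_dir,
        "--username", user,
        "--password", password,
    ]
```

The resulting list can be passed to `subprocess.run(command, check=True)`, which avoids shell quoting issues with unusual passwords.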

Python API Examples

Example: get raw files for a project

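from pridepy.files.files import Files

# Fetch the metadata records for every RAW file in the project
files = Files()
raw_files = files.get_all_raw_file_list("PXD008644")
print(f"RAW files: {len(raw_files)}")

# Guard against an empty result before indexing into the list
if raw_files:
    print(raw_files[0]["fileName"])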
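
Once the list of records is in hand, a short summary can be useful before starting a large download. The sketch below works on plain dicts. The `fileName` key comes from the example above, while `fileSizeBytes` is an assumed key that should be checked against the real records; `summarize_files` is a hypothetical helper.

```python
from typing import Dict, Iterable


def summarize_files(records: Iterable[Dict]) -> Dict:
    """Count records, list their names, and sum their sizes.
    'fileSizeBytes' is an assumed key; missing values count as 0."""
    names = []
    total_bytes = 0
    for record in records:
        names.append(record["fileName"])
        total_bytes += int(record.get("fileSizeBytes", 0))
    return {"count": len(names), "total_bytes": total_bytes, "names": sorted(names)}
```

For instance, `summarize_files(raw_files)` could be called on the list returned by `get_all_raw_file_list`, provided the records have the assumed shape.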

Example: search projects

from pridepy.project.project import Project

project = Project()
results = project.search_by_keywords_and_filters(
    keyword="PXD009476",
    query_filter="",
    page_size=25,
    page=0,
    sort_direction="DESC",
    sort_fields="accession",
)
print(f"Hits: {len(results)}")
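
Because the search takes `page` and `page_size`, collecting more than one page requires a loop. The generator below is a generic sketch: `fetch_page` is a hypothetical callable that returns the list of hits for a page number, and the stop condition assumes that a page shorter than `page_size` is the last one. Whether the search method returns a plain list is not confirmed here.

```python
from typing import Callable, Dict, Iterator, List


def iter_pages(
    fetch_page: Callable[[int], List[Dict]],
    page_size: int,
    max_pages: int = 1000,
) -> Iterator[Dict]:
    """Yield hits page by page until a short (or empty) page is seen.
    max_pages is a safety cap against endless loops."""
    for page in range(max_pages):
        batch = fetch_page(page)
        yield from batch
        if len(batch) < page_size:
            break
```

A `fetch_page` wrapper around `search_by_keywords_and_filters` would pass the page number through and return the hits for that page.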

Development and Release (uv)

Run tests:

uv run pytest

Lint:

uv run flake8 .

Build distributions:

uv build

pridepy is published via GitHub Actions (.github/workflows/python-publish.yml) using uv build and a PyPI API token secret (PYPI_API_TOKEN).

White Paper

A white paper is available in paper/paper.md.

Build the PDF with Docker, using the pandoc-based openjournals/inara image:

docker run --rm --platform linux/amd64 \
  -v "$(pwd)/paper:/data" \
  -w /data openjournals/inara:latest paper.md -p -o pdf

Contributing

  1. Fork the repository
  2. Create a branch (git checkout -b feature/my-change)
  3. Install dev dependencies (uv sync --extra dev)
  4. Run tests and lint (uv run pytest, uv run flake8 .)
  5. Commit and push your branch
  6. Open a pull request

Citation

Kamatchinathan, S., Hewapathirana, S., Bandla, C., Insua, S., Vizcaíno, J. A., & Perez-Riverol, Y. (2025). pridepy: A Python package to download and search data from PRIDE database. Journal of Open Source Software, 10(107), 7563. doi:10.21105/joss.07563
