Python client library for the PRIDE REST API
pridepy
pridepy is a Python client and CLI for the PRIDE Archive API.
You can:
- download public and private PRIDE files
- download files by category (RAW, SEARCH, RESULT, etc.)
- stream project and file metadata
- search projects by keywords and filters
- download raw files from ProteomeXchange XML metadata
The downloader supports ftp, aspera, s3, and globus.
By default it starts with FTP, falls back across the remaining protocols when needed, and validates downloaded files: each file must be non-empty and, when checksum validation is enabled, must match the project checksums.
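The fallback-and-validate behaviour can be pictured with the minimal sketch below. It is not pridepy's implementation: download_with_fallback and the per-protocol downloader callables are hypothetical stand-ins, and the protocol order is the one given in the Quick Start notes (ftp -> aspera -> s3 -> globus).

```python
from typing import Callable, Dict, Optional, Sequence

# Hypothetical: a downloader takes a file name and returns the local path it wrote.
Downloader = Callable[[str], str]


def download_with_fallback(
    file_name: str,
    downloaders: Dict[str, Downloader],
    validate: Callable[[str], bool],
    order: Sequence[str] = ("ftp", "aspera", "s3", "globus"),
) -> Optional[str]:
    """Try each protocol in order and return the first path that passes validation."""
    for protocol in order:
        downloader = downloaders.get(protocol)
        if downloader is None:
            continue  # protocol not available on this machine
        try:
            path = downloader(file_name)
        except Exception:
            continue  # transport error: fall back to the next protocol
        if validate(path):
            return path  # non-empty / checksum-valid file obtained
    return None  # every protocol failed or produced an invalid file
```

Keeping validation separate from transport means an empty or corrupt file is treated like a failed transfer, so the next protocol gets a chance.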
Requirements
- Python >=3.9
Installation
Option 1: Install from PyPI with uv (recommended)
Install as a CLI tool:
uv tool install pridepy
pridepy --help
Or run without installing globally:
uvx pridepy --help
Option 2: Install from PyPI with pip
pip install --upgrade pridepy
pridepy --help
Option 3: Install from source (development)
git clone https://github.com/PRIDE-Archive/pridepy
cd pridepy
uv sync --extra dev
uv run pridepy --help
Quick Start (New Users)
1) Download all raw files for a project (robust mode)
pridepy download-all-public-raw-files \
-a PXD008644 \
-o ./downloads/PXD008644 \
--checksum-check
What this does:
- the default protocol (ftp) starts with FTP and falls back (ftp -> aspera -> s3 -> globus)
- --checksum-check downloads the project checksums and validates each file against them
- empty or corrupt files are retried automatically
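To make the checksum step concrete, here is a rough sketch of validating local files against a checksum listing. It assumes the listing has one "<file name><whitespace><hex digest>" entry per line, with "#" comment lines, and that the digests are SHA-1; parse_checksum_file, sha1_of and is_valid are hypothetical helpers, not pridepy functions.

```python
import hashlib
from pathlib import Path
from typing import Dict


def parse_checksum_file(text: str) -> Dict[str, str]:
    """Map file name -> expected digest (assumed '<name> <hex digest>' lines)."""
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        parts = line.rsplit(None, 1)  # split on the last whitespace run
        if len(parts) != 2:
            continue  # malformed line: no digest column
        name, digest = parts
        expected[name.strip()] = digest.lower()
    return expected


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash in chunks so multi-GB RAW files are never loaded into memory at once."""
    h = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_valid(path: Path, expected: Dict[str, str]) -> bool:
    """Non-empty, and matching the expected digest when one is listed."""
    if not path.is_file() or path.stat().st_size == 0:
        return False
    digest = expected.get(path.name)
    return digest is None or sha1_of(path) == digest
```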
2) Continue interrupted downloads safely
pridepy download-all-public-raw-files \
-a PXD008644 \
-o ./downloads/PXD008644 \
--skip-if-downloaded-already \
--checksum-check
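Resuming is only safe if a partially written file is not mistaken for a finished one. The sketch below illustrates that decision under stated assumptions: files_still_needed is a hypothetical helper (not part of pridepy), and the expected sizes are assumed to come from the project's file metadata.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Optional


def files_still_needed(
    names: Iterable[str],
    out_dir: Path,
    expected_sizes: Optional[Dict[str, int]] = None,
) -> List[str]:
    """Return the names that are missing, empty, or (when sizes are known) truncated."""
    expected_sizes = expected_sizes or {}
    needed = []
    for name in names:
        local = out_dir / name
        if not local.is_file() or local.stat().st_size == 0:
            needed.append(name)  # never downloaded, or an empty placeholder
            continue
        size = expected_sizes.get(name)
        if size is not None and local.stat().st_size != size:
            needed.append(name)  # partial file left by an interrupted transfer
    return needed
```

Combining a size check like this with --checksum-check covers both truncated and corrupted files.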
3) Download only selected categories
pridepy download-all-public-category-files \
-a PXD022105 \
-o ./downloads/PXD022105 \
-c RAW,SEARCH
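The -c option takes a comma-separated list of categories. As an illustration of that selection applied to file metadata records, the sketch below assumes each record carries its category as {"fileCategory": {"value": "RAW"}}; that record shape and the filter_by_category helper are assumptions, not documented pridepy behaviour.

```python
from typing import Iterable, List


def filter_by_category(records: Iterable[dict], categories: str) -> List[dict]:
    """Keep records whose category appears in a list such as 'RAW,SEARCH'."""
    wanted = {c.strip().upper() for c in categories.split(",") if c.strip()}
    selected = []
    for record in records:
        # Assumed shape: {"fileName": ..., "fileCategory": {"value": "RAW"}}
        category = (record.get("fileCategory") or {}).get("value", "")
        if category.upper() in wanted:
            selected.append(record)
    return selected
```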
4) Download one file by name
pridepy download-file-by-name \
-a PXD022105 \
-f checksum.txt \
-o ./downloads/PXD022105 \
--checksum-check
5) Download raw files from ProteomeXchange
pridepy download-px-raw-files \
-a PXD039236 \
-o ./downloads/PXD039236
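This command works from the dataset's ProteomeXchange XML. As a rough illustration of what extracting raw-file links involves, the sketch below uses only the standard library. It assumes the links are cvParam elements named "Associated raw file URI" inside DatasetFile elements; raw_file_uris is a hypothetical helper, not pridepy's parser.

```python
import xml.etree.ElementTree as ET
from typing import List

RAW_URI_NAME = "Associated raw file URI"  # assumed cvParam name for raw-file links


def _local(tag: str) -> str:
    """Drop any '{namespace}' prefix so matching works with or without namespaces."""
    return tag.rsplit("}", 1)[-1]


def raw_file_uris(px_xml: str) -> List[str]:
    """Collect raw-file URIs from DatasetFile cvParams in a PX XML string."""
    root = ET.fromstring(px_xml)
    uris = []
    for dataset_file in root.iter():
        if _local(dataset_file.tag) != "DatasetFile":
            continue
        for param in dataset_file:  # direct children only
            if _local(param.tag) == "cvParam" and param.get("name") == RAW_URI_NAME:
                uris.append(param.get("value", ""))
    return uris
```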
CLI Command Overview
pridepy --help
Main commands:
- download-all-public-raw-files
- download-all-public-category-files
- download-file-by-name
- download-px-raw-files
- list-private-files
- stream-files-metadata
- stream-projects-metadata
- search-projects-by-keywords-and-filters
More CLI Examples
Search projects
pridepy search-projects-by-keywords-and-filters \
-k human \
-f projectTags==ProteomeTools,organismsPart==Pancreas \
-sd DESC \
-sf accession \
-sf submissionDate
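The -f value is a comma-separated list of field==value clauses. The sketch below only illustrates that syntax; parse_filters is hypothetical, and how pridepy and the PRIDE API interpret each clause (for example, whether values containing commas are supported) is not covered here.

```python
from typing import List, Tuple


def parse_filters(spec: str) -> List[Tuple[str, str]]:
    """Split 'field==value,field==value' into (field, value) pairs."""
    pairs = []
    for clause in spec.split(","):
        clause = clause.strip()
        if not clause:
            continue  # tolerate a trailing comma
        field, sep, value = clause.partition("==")
        if not sep or not field or not value:
            raise ValueError(f"expected field==value, got {clause!r}")
        pairs.append((field.strip(), value.strip()))
    return pairs
```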
Stream all project metadata to JSON
pridepy stream-projects-metadata -o all_pride_projects.json
Stream all file metadata for one accession
pridepy stream-files-metadata -a PXD005011 -o PXD005011_files.json
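The exact layout of the streamed file is not specified here, so the loader below accepts either a single JSON array or one JSON object per line. Covering both is an assumption made for this sketch, and load_records is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import List


def load_records(path: Path) -> List[dict]:
    """Read a metadata dump written as a JSON array, falling back to JSON Lines."""
    text = path.read_text(encoding="utf-8")
    try:
        data = json.loads(text)
        return data if isinstance(data, list) else [data]
    except json.JSONDecodeError:
        # One JSON object per line; blank lines are ignored.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
```

For example, len(load_records(Path("PXD005011_files.json"))) gives the number of file records written by the command above.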
Download private files
List files:
pridepy list-private-files -a PXD022105 -u YOUR_USER -p YOUR_PASSWORD
Download a private file:
pridepy download-file-by-name \
-a PXD022105 \
-f checksum.txt \
-o ./downloads/private \
--username YOUR_USER \
--password YOUR_PASSWORD
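Typing the password on the command line leaves it in shell history. One way to script the same command is to read credentials from environment variables and build the argument list in Python. PRIDE_USER and PRIDE_PASSWORD are hypothetical variable names chosen for this sketch; the flags are the ones shown above.

```python
import os
from typing import List, Mapping, Optional


def private_download_cmd(
    accession: str,
    file_name: str,
    out_dir: str,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Build the download-file-by-name argv, reading credentials from the environment."""
    env = os.environ if env is None else env
    # Hypothetical variable names, not something pridepy reads itself.
    user = env.get("PRIDE_USER")
    password = env.get("PRIDE_PASSWORD")
    if not user or not password:
        raise RuntimeError("set PRIDE_USER and PRIDE_PASSWORD first")
    return [
        "pridepy", "download-file-by-name",
        "-a", accession,
        "-f", file_name,
        "-o", out_dir,
        "--username", user,
        "--password", password,
    ]
```

Run it with subprocess.run(private_download_cmd("PXD022105", "checksum.txt", "./downloads/private"), check=True). The password stays out of shell history and scripts, although it is still visible in the process list while the command runs.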
Python API Examples
Example: get raw files for a project
from pridepy.files.files import Files
files = Files()
raw_files = files.get_all_raw_file_list("PXD008644")
print(f"RAW files: {len(raw_files)}")
print(raw_files[0]["fileName"])
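Before starting a large download it can help to estimate disk space from the file list. The sketch assumes each record returned by get_all_raw_file_list has a "fileSizeBytes" field (an assumption about the PRIDE metadata, treated as 0 when absent); summarize_sizes is a hypothetical helper.

```python
from typing import Iterable


def summarize_sizes(records: Iterable[dict]) -> str:
    """Report file count and total size in decimal gigabytes."""
    records = list(records)
    # Assumed field name; records without it count as 0 bytes.
    total = sum(int(r.get("fileSizeBytes") or 0) for r in records)
    return f"{len(records)} files, {total / 1e9:.2f} GB"
```

With the example above, print(summarize_sizes(raw_files)) prints the file count and total size.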
Example: search projects
from pridepy.project.project import Project
project = Project()
results = project.search_by_keywords_and_filters(
keyword="PXD009476",
query_filter="",
page_size=25,
page=0,
sort_direction="DESC",
sort_fields="accession",
)
print(f"Hits: {len(results)}")
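The search call returns one page at a time. A generic way to walk all pages is sketched below; it assumes each call returns a list and that an empty list marks the end of the results, which is not confirmed here. iterate_pages is a hypothetical helper.

```python
from typing import Callable, Iterator, List


def iterate_pages(
    fetch_page: Callable[[int], List[dict]],
    max_pages: int = 1000,
) -> Iterator[dict]:
    """Yield hits page by page until a page comes back empty."""
    for page in range(max_pages):  # hard cap guards against endless loops
        hits = fetch_page(page)
        if not hits:
            return  # assumed end-of-results signal
        yield from hits


# Possible wiring to the call shown above (untested assumption):
# fetch = lambda page: project.search_by_keywords_and_filters(
#     keyword="human", query_filter="", page_size=100, page=page,
#     sort_direction="DESC", sort_fields="accession")
# all_hits = list(iterate_pages(fetch))
```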
Development and Release (uv)
Run tests:
uv run pytest
Lint:
uv run flake8 .
Build distributions:
uv build
pridepy is published via GitHub Actions (.github/workflows/python-publish.yml) using uv build and a PyPI API token secret (PYPI_API_TOKEN).
White Paper
A white paper is available in paper/paper.md.
Build the PDF with the Open Journals inara Docker image (a pandoc-based toolchain):
docker run --rm --platform linux/amd64 \
-v "$(pwd)/paper:/data" \
-w /data openjournals/inara:latest paper.md -p -o pdf
Contributing
- Fork the repository
- Create a branch (git checkout -b feature/my-change)
- Install dev dependencies (uv sync --extra dev)
- Run tests and lint (uv run pytest, uv run flake8 .)
- Commit and push your branch
- Open a pull request
Citation
Kamatchinathan, S., Hewapathirana, S., Bandla, C., Insua, S., Vizcaíno, J. A., & Perez-Riverol, Y. (2025). pridepy: A Python package to download and search data from PRIDE database. Journal of Open Source Software, 10(107), 7563. doi:10.21105/joss.07563
Download files
File details
Details for the file pridepy-0.0.13.tar.gz.
File metadata
- Download URL: pridepy-0.0.13.tar.gz
- Upload date:
- Size: 38.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.25
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cf3edfba53bbbf14968b4fc300aa037e248eb55d50f7fe147eacb76e905b9bb6 |
| MD5 | b8038ef90521902a4aabbee35423333f |
| BLAKE2b-256 | 23a159c00343a9fa562d55a72908f2ac37d92898b84dc062a550cdd6bb7510c8 |
File details
Details for the file pridepy-0.0.13-py3-none-any.whl.
File metadata
- Download URL: pridepy-0.0.13-py3-none-any.whl
- Upload date:
- Size: 38.5 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.25
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d26626c0a2e1e0dfd735259ec8a88443307b8cd0700baa3715d311e9af655f55 |
| MD5 | b027446e583e27ffd670fcadaac5dc3b |
| BLAKE2b-256 | 184ba9acccb54c39cd7a891020846d7b2443b910fac9d2f49e79d30e67b4a892 |
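A downloaded distribution can be checked against the SHA256 digests listed in the file hash tables for the two 0.0.13 files. The sketch uses only hashlib; sha256_matches is a hypothetical helper, and the digests are copied from those tables.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file hash tables for pridepy 0.0.13.
EXPECTED_SHA256 = {
    "pridepy-0.0.13.tar.gz": "cf3edfba53bbbf14968b4fc300aa037e248eb55d50f7fe147eacb76e905b9bb6",
    "pridepy-0.0.13-py3-none-any.whl": "d26626c0a2e1e0dfd735259ec8a88443307b8cd0700baa3715d311e9af655f55",
}


def sha256_matches(path: Path) -> bool:
    """Hash the file in 1 MiB chunks and compare with the published digest."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    # Unknown file names have no expected digest and therefore never match.
    return h.hexdigest() == EXPECTED_SHA256.get(path.name)
```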