pxseek

Query, filter, and retrieve proteomics dataset metadata from ProteomeXchange.

Python 3.12-3.14 | v0.5.1 | Beta | CI | 302 tests passed | MIT

pxseek replaces the original Selenium-based web scraper with a clean, API-driven approach using the ProteomeCentral bulk TSV and per-dataset XML endpoints. No browser or ChromeDriver required.

pxseek has three core commands.

  • fetch downloads the clean summary table.
  • filter narrows that table by metadata.
  • lookup fetches richer XML-derived metadata for a shortlist.

Installation

Requires Python 3.12-3.14.

pip install pxseek

Or with uv:

uv tool install pxseek

For development setup and source checkout, see the Installation guide.

CLI Quickstart

The shortest useful workflow is:

uv run pxseek fetch -o px_datasets.tsv
uv run pxseek filter -i px_datasets.tsv -s "Homo sapiens" -k "cancer" -o shortlist.tsv
uv run pxseek lookup --input shortlist.tsv -o detailed.tsv

One rule matters most: filter expects the cleaned artifact written by pxseek fetch, not the raw ProteomeCentral export.
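If you need to drive the same three steps from a script, the chaining can be sketched as below. This is only an illustration: build_commands and run_pipeline are hypothetical helper names, not part of pxseek, and the flags are copied from the quickstart commands. Each step reads the file the previous step wrote, so filter always receives the artifact produced by fetch.

```python
import subprocess
from pathlib import Path


def build_commands(workdir, species="Homo sapiens", keyword="cancer",
                   prefix=("uv", "run", "pxseek")):
    """Return the three quickstart commands as argv lists, in run order.

    Hypothetical helper: only the flags shown in the quickstart are used.
    """
    workdir = Path(workdir)
    summary = str(workdir / "px_datasets.tsv")
    shortlist = str(workdir / "shortlist.tsv")
    detailed = str(workdir / "detailed.tsv")
    return [
        # Step 1: write the cleaned summary table.
        [*prefix, "fetch", "-o", summary],
        # Step 2: filter the file written by step 1, not a raw export.
        [*prefix, "filter", "-i", summary, "-s", species, "-k", keyword,
         "-o", shortlist],
        # Step 3: look up richer metadata for the shortlist from step 2.
        [*prefix, "lookup", "--input", shortlist, "-o", detailed],
    ]


def run_pipeline(workdir, **kwargs):
    """Run the steps in order; check=True stops at the first failing step."""
    for argv in build_commands(workdir, **kwargs):
        subprocess.run(argv, check=True)
```

Pass prefix=("pxseek",) if the tool was installed with pip or uv tool install rather than run through uv run.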

If you want machine-friendly outputs, use --format json or -o - and keep the rest of the workflow the same. The detailed format and pipeline behavior live in the docs.
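A script could consume that output roughly as follows. Treat this as a sketch under assumptions: it assumes that -o - writes to standard output, that --format json can be combined with it on the filter command, and that the result is a single JSON document. None of these is confirmed here, so check the docs before relying on it. filter_as_json is a hypothetical helper name.

```python
import json
import subprocess


def filter_as_json(summary_path, species, keyword,
                   prefix=("uv", "run", "pxseek"), run=subprocess.run):
    """Run `pxseek filter` and parse what it prints as JSON.

    Assumptions (not confirmed by the README): `-o -` means stdout and
    `--format json` may be used together with it.
    """
    argv = [*prefix, "filter", "-i", str(summary_path),
            "-s", species, "-k", keyword,
            "--format", "json", "-o", "-"]
    # capture_output + text=True gives stdout back as a str;
    # check=True raises CalledProcessError on a non-zero exit code.
    completed = run(argv, check=True, capture_output=True, text=True)
    return json.loads(completed.stdout)
```

The run parameter exists only so the subprocess call can be swapped for a stub in tests.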

Python API

pxseek is CLI-first, but it exposes a small stable workflow API for code that should not shell out to the CLI.

from pxseek import fetch_datasets, filter_datasets, lookup_datasets

summary = fetch_datasets().df
filtered, _ = filter_datasets(summary, species="Homo sapiens", keywords="cancer")
details = lookup_datasets(filtered["dataset_id"]).df

The supported root imports are fetch_datasets(), filter_datasets(), lookup_datasets(), read_artifact(), render_artifact(), and write_artifact().
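For reuse, the three calls can be wrapped in one function. The sketch below is an assumption-laden illustration, not part of pxseek: find_details is a hypothetical name, the api parameter exists only so the function can be exercised without network access, and skipping lookup when nothing matched is a design choice made here rather than documented behavior. The call signatures are the ones shown in the example above.

```python
def find_details(species, keywords, api=None):
    """Fetch, filter, then look up details for the matching datasets.

    Returns None when the filter leaves nothing to look up.
    """
    if api is None:
        import pxseek as api  # imported lazily; requires pxseek installed

    summary = api.fetch_datasets().df
    # filter_datasets returns a pair; only the filtered table is used here.
    filtered, _ = api.filter_datasets(summary, species=species,
                                      keywords=keywords)
    if len(filtered) == 0:
        # Nothing matched, so avoid issuing any per-dataset lookups.
        return None
    return api.lookup_datasets(filtered["dataset_id"]).df
```

Calling find_details("Homo sapiens", "cancer") would then run the same workflow as the three-line example.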

Documentation

More detailed documentation and examples live in the GitHub wiki.

Development

The local development workflow matches CI.

uv sync --extra dev
uv run --extra dev pytest
uv run --extra dev ruff check src/ tests/
uv run --extra dev ruff format --check src/ tests/
uv build

Legacy

The original single-file Selenium scraper is preserved in legacy/proteomeXchange_scraper.py for reference.

Citation

If you use pxseek in your work, please cite it:

@software{pxseek2026,
  title = {pxseek: Query, filter, and retrieve proteomics dataset metadata from ProteomeXchange},
  author = {Enes K. Ergin and Kimia Rostin and Philipp F. Lange},
  year = {2026},
  url = {https://github.com/LangeLab/pxseek},
  version = {0.5.1},
}

A CITATION.cff file is also available in the repository root.

License

MIT License. See LICENSE for details.
