PDBj data synchronization and database loading tool
pdb-mine-builder
Build a Mine-schema database from PDB data. Synchronizes structural biology data from wwPDB mirrors (PDBj by default) via rsync and loads it into PostgreSQL.
This project is inspired by PDBj's mine2updater. Thanks to the PDBj team for the original implementation and the Mine relational database design.
Documentation: https://n283t.github.io/pdb-mine-builder/
Features
- Multi-process parallel data loading with configurable workers
- Support for multiple data formats (CIF default, mmJSON optional)
- Configurable sync sources with regional wwPDB mirror support (PDBj, RCSB, PDBe)
- RDKit chemical search integration (substructure, similarity)
- SQL query interface with multi-format output (table, CSV, JSON, Parquet)
- Interactive SQL examples with 75+ queries across 10 categories
- 9 database schemas covering PDB structures, chemical components, validation reports, and more
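The parallel-loading feature can be pictured with a minimal sketch. This is not pdb-mine-builder's actual loader: `partition`, `load_file`, `load_chunk`, and `load_in_parallel` are hypothetical names, and the per-file job only reports a file's size where a real loader would parse the file and insert rows.

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def partition(items, n_chunks):
    """Split items into at most n_chunks round-robin chunks, dropping empty ones."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    chunks = [items[i::n_chunks] for i in range(n_chunks)]
    return [chunk for chunk in chunks if chunk]


def load_file(path):
    """Hypothetical per-file job; a real loader would parse and insert rows."""
    return path, Path(path).stat().st_size


def load_chunk(paths):
    """Process one chunk of files inside a single worker process."""
    return [load_file(p) for p in paths]


def load_in_parallel(paths, workers=4):
    """Fan chunks out to `workers` processes and flatten the results."""
    chunks = partition(list(paths), workers)
    if not chunks:
        return []
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = pool.map(load_chunk, chunks)
        return [row for chunk in results for row in chunk]
```

On platforms that spawn rather than fork worker processes, call `load_in_parallel` from under an `if __name__ == "__main__":` guard.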
Installation
Pixi (recommended)
Pixi manages all dependencies including Python, PostgreSQL, and RDKit in a single environment.
git clone https://github.com/N283T/pdb-mine-builder.git
cd pdb-mine-builder
pixi install
cp config.example.yml config.yml # Edit with your data paths
pixi run db-init # Initialize PostgreSQL
pixi run db-start # Start PostgreSQL
pixi run pmb sync # Sync data from wwPDB (PDBj by default)
pixi run pmb load pdbj --force # Load data
pixi run pmb stats # Check database statistics
pip (alternative)
Note: pip installs the Python package only. You must provide PostgreSQL (17+) and the RDKit PostgreSQL cartridge separately. Database management commands (pixi run db-*) are not available.
pip install pdbminebuilder
cp config.example.yml config.yml # Edit with your data paths and connection string
pmb --help
conda + pip (alternative)
Note: Database management commands (pixi run db-*) are not available. Use your own PostgreSQL instance.
conda create -n pmb python=3.12 rdkit-postgresql -c conda-forge
conda activate pmb
pip install pdbminebuilder
cp config.example.yml config.yml
pmb --help
Docker / Podman (alternative)
Note: Requires Docker or Podman. Data files must be mounted as volumes.
git clone https://github.com/N283T/pdb-mine-builder.git
cd pdb-mine-builder
cp config.example.yml config.yml # Edit data paths
docker compose -f docker/docker-compose.yml up -d
docker compose -f docker/docker-compose.yml run --rm pmb update pdbj --limit 10
See the Getting Started guide for detailed setup instructions.
Pipelines
| Pipeline | Description | Entries | Tables | Size | Format |
|---|---|---|---|---|---|
| pdbj | Main structure data | ~250k | 250 | 183 GB | CIF / mmJSON |
| vrpt | Validation reports | ~250k | 69 | 152 GB | CIF |
| contacts | Protein-protein contacts | ~250k | 2 | 13 GB | JSON |
| cc | Chemical components (with RDKit) | ~50k | 12 | 811 MB | CIF / mmJSON |
| ccmodel | Chemical component models | ~23k | 8 | 174 MB | CIF / mmJSON |
| prd | BIRD reference dictionary | ~1.2k | 17 | 50 MB | CIF / mmJSON |
Total: 368 tables, ~349 GB with all PDB entries loaded (as of 2026-03-08).
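When only some pipelines are needed, the sizes in the table can be summed to plan disk space. The sketch below hard-codes the figures from the table; `PIPELINE_SIZES_GB` and `estimate_gb` are illustrative names and are not part of the pmb CLI.

```python
# Sizes copied from the pipeline table (as of 2026-03-08), in GB.
PIPELINE_SIZES_GB = {
    "pdbj": 183.0,
    "vrpt": 152.0,
    "contacts": 13.0,
    "cc": 0.811,
    "ccmodel": 0.174,
    "prd": 0.050,
}


def estimate_gb(pipelines):
    """Sum the listed sizes for the selected pipelines (duplicates count once)."""
    selected = set(pipelines)
    unknown = sorted(selected - set(PIPELINE_SIZES_GB))
    if unknown:
        raise KeyError(f"unknown pipeline(s): {', '.join(unknown)}")
    return sum(PIPELINE_SIZES_GB[name] for name in selected)


print(f"all pipelines: ~{estimate_gb(PIPELINE_SIZES_GB):.0f} GB")
print(f"cc + prd only: ~{estimate_gb(['cc', 'prd']) * 1000:.0f} MB")
```

These are the loaded database sizes reported in the table; space for the synced source files is not included in this estimate.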
See the Database Reference for schema details and SQL examples.
Query
Execute SQL queries directly from the CLI with multiple output formats:
pmb query "SELECT * FROM cc.brief_summary LIMIT 5" # Rich table
pmb query "SELECT * FROM cc.brief_summary" -F csv > out.csv # CSV
pmb query "SELECT * FROM cc.brief_summary LIMIT 10" -F json # JSON
pmb query "SELECT * FROM cc.brief_summary" -F parquet -o out.parquet # Parquet
pmb query -f query.sql # SQL from file
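When calling the CLI from a script, it can help to assemble the argument list programmatically. The sketch below uses only the options shown above (`-F`, `-o`, `-f`). `build_query_argv` and `run_query` are hypothetical helpers, `pmb` is assumed to be on `PATH`, and whether every combination of these options is accepted is an assumption to check against `pmb query --help`.

```python
import subprocess


def build_query_argv(sql=None, sql_file=None, fmt=None, output=None):
    """Build a `pmb query` argument list from exactly one SQL source."""
    if (sql is None) == (sql_file is None):
        raise ValueError("pass exactly one of sql or sql_file")
    argv = ["pmb", "query"]
    if sql is not None:
        argv.append(sql)                 # inline SQL string
    else:
        argv += ["-f", str(sql_file)]    # SQL read from a file
    if fmt is not None:
        argv += ["-F", fmt]              # csv, json, or parquet as shown above
    if output is not None:
        argv += ["-o", str(output)]      # output file path
    return argv


def run_query(**kwargs):
    """Run the command and return stdout as text; raises on a non-zero exit."""
    result = subprocess.run(
        build_query_argv(**kwargs), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with SQL that contains quotes or wildcards.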
Development
pixi run lint # Ruff check
pixi run format # Ruff format
pixi run test # Run tests (pytest)
pixi run check # All checks
Requirements
- Python 3.12+
- PostgreSQL 17+ (managed by rdkit-postgresql via conda-forge)
- Pixi — manages all dependencies (conda + PyPI)
- rsync
Note: Most dependencies are installed from conda-forge. Only ccd2rdmol (PyPI only) and psycopg[binary,pool] (extras required) remain as PyPI dependencies. The PostgreSQL version is determined by rdkit-postgresql.
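For pip or conda installs, where Pixi is not managing the environment, a quick preflight check of the requirements above might look like the following sketch. `check_requirements` is a hypothetical helper and not part of pdbminebuilder. Looking for `psql` on `PATH` is an assumption used as a rough proxy for a PostgreSQL installation, and it does not verify the server version or the RDKit cartridge.

```python
import shutil
import sys

MIN_PYTHON = (3, 12)


def check_requirements(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all passed."""
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append("Python 3.12+ is required")
    # rsync is listed in the requirements; psql is an assumed proxy for PostgreSQL.
    for tool in ("rsync", "psql"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(f"warning: {problem}")
```

The `version_info` and `which` parameters exist only so the check can be exercised without changing the real environment.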
License
MIT - See LICENSE for details.
Relationship to mine2updater
This project is inspired by mine2updater (LGPLv3) by PDBj, which loads PDB data into PostgreSQL using Node.js. pdb-mine-builder is an independent rewrite in Python with a completely different tech stack (gemmi, SQLAlchemy, psycopg3, RDKit), architecture, and data model. No code was copied or translated from the original project. Shared concepts (pipeline names, schema structures, PDB ID encoding) derive from PDB data specifications, not from the original codebase.
References
- Kinjo AR, Yamashita R, Nakamura H. PDBj Mine: design and implementation of relational database interface for Protein Data Bank Japan. Database (Oxford). 2010;2010:baq021. doi: 10.1093/database/baq021