Secondary-to-primary identifier mapping
pySec2Pri
Create and use mapping files that map secondary (retired or withdrawn) biological database identifiers and symbols to their primary (current) equivalents.
Mappings are written in SSSOM format by default, with secondary identifiers as subjects and primary identifiers as objects.
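To make the direction of the mapping concrete, the sketch below models rows as subject/object pairs. It builds a forward (secondary to primary) lookup and a reverse index that groups retired identifiers under the current identifier they were merged into. The Mapping class and the EX: identifiers are hypothetical. Only the subject_id and object_id column names come from the SSSOM standard.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """One SSSOM-style row (hypothetical helper class, not part of pysec2pri)."""

    subject_id: str  # secondary (retired/withdrawn) identifier
    object_id: str   # primary (current) identifier


def forward_lookup(mappings: list[Mapping]) -> dict[str, str]:
    """Secondary -> primary; a repeated subject keeps its last object."""
    return {m.subject_id: m.object_id for m in mappings}


def reverse_index(mappings: list[Mapping]) -> dict[str, list[str]]:
    """Primary -> all secondary IDs that now resolve to it."""
    index: dict[str, list[str]] = defaultdict(list)
    for m in mappings:
        index[m.object_id].append(m.subject_id)
    return dict(index)


# Hypothetical example: two retired IDs merged into one current ID.
rows = [Mapping("EX:old1", "EX:new"), Mapping("EX:old2", "EX:new")]
print(forward_lookup(rows))  # {'EX:old1': 'EX:new', 'EX:old2': 'EX:new'}
print(reverse_index(rows))   # {'EX:new': ['EX:old1', 'EX:old2']}
```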
Installation
uv pip install pysec2pri
Or install from source:
uv pip install git+https://github.com/jmillanacosta/pysec2pri.git
Quick Start
Generating mapping sets
To generate the secondary-to-primary identifier mapping set for ChEBI in SSSOM format:
pysec2pri chebi
This will automatically download the latest ChEBI release and generate an SSSOM mapping file in your current directory.
To process a local file and choose the output path:
pysec2pri chebi ChEBI_complete_3star.sdf --output my_mappings.sssom.tsv
For more options and help on any command:
pysec2pri --help
pysec2pri chebi --help
The default output format is SSSOM (Simple Standard for Sharing Ontological Mappings) TSV.
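The following sketch loads an SSSOM TSV file into a secondary-to-primary dictionary using only the Python standard library, so the output can be consumed without extra dependencies. It assumes the usual SSSOM TSV layout: an embedded metadata block on lines starting with '#', followed by a tab-separated table with subject_id and object_id columns. The function name read_sssom_tsv is illustrative. The predicate and justification values in the sample are placeholders and may not match what pysec2pri writes.

```python
import csv
import io


def read_sssom_tsv(text: str) -> dict[str, str]:
    """Parse SSSOM TSV text into a secondary -> primary dictionary.

    Lines starting with '#' hold the embedded metadata block and are skipped;
    the remaining lines form a tab-separated table with a header row.
    """
    table_lines = [line for line in text.splitlines() if not line.startswith("#")]
    reader = csv.DictReader(io.StringIO("\n".join(table_lines)), delimiter="\t")
    # If a subject appears more than once, the last row wins.
    return {row["subject_id"]: row["object_id"] for row in reader}


# Placeholder sample: metadata and predicate/justification values are illustrative.
example = (
    "#curie_map:\n"
    "#  HGNC: http://identifiers.org/hgnc/\n"
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\n"
    "HGNC:131\tIAO:0100001\tHGNC:145\tsemapv:ManualMappingCuration\n"
)
print(read_sssom_tsv(example))  # {'HGNC:131': 'HGNC:145'}
```

To read a generated file, pass its contents, for example read_sssom_tsv(Path("my_mappings.sssom.tsv").read_text()) with pathlib.Path. Because the sketch keeps one object per subject, a secondary identifier that maps to several primaries would be collapsed. The sssom-py package offers a complete parser if metadata handling or validation is needed.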
Updating IDs and symbols
A generated mapping set can be used to update IDs and symbols in Python:
from pysec2pri import generate_chebi_synonyms, resolve_symbols
cs = generate_chebi_synonyms()
resolve_symbols(["Glucose", "ATP", "Guanine"], cs)
Or from the command line, given a TSV file gene_ex.tsv:
gene data
HGNC:131 3.5
Resolve the gene column to primary HGNC IDs. A new column with the _primary suffix (here gene_primary) is added:
pysec2pri update-ids gene_ex.tsv hgnc --at gene -o gene_ex_primary.tsv
# gene data gene_primary
# HGNC:131 3.5 HGNC:145
The same pattern works for symbols with update-symbols, and multiple columns can be resolved by repeating --at:
pysec2pri update-ids data.tsv hgnc --at gene_id --at related_gene_id
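The column-wise behaviour of update-ids (one new _primary column per --at column) can be approximated in plain Python for intuition. This is a hypothetical re-implementation using the standard library, not pysec2pri's own code. The name add_primary_columns is illustrative, and copying unmapped values through unchanged is an assumption about what the tool does.

```python
import csv
import io


def add_primary_columns(tsv_text: str, columns: list[str], lookup: dict[str, str]) -> str:
    """Append a '<column>_primary' column for each requested column.

    Values found in `lookup` are replaced by their primary ID; any other value
    is copied unchanged (an assumption, not documented pysec2pri behaviour).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fieldnames = list(reader.fieldnames or []) + [f"{c}_primary" for c in columns]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        for column in columns:
            row[f"{column}_primary"] = lookup.get(row[column], row[column])
        writer.writerow(row)
    return out.getvalue()


table = "gene\tdata\nHGNC:131\t3.5\n"
# Prints the table with an extra gene_primary column holding HGNC:145.
print(add_primary_columns(table, ["gene"], {"HGNC:131": "HGNC:145"}), end="")
```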
To skip regenerating the mapping set, pass a pre-built mapping file:
pysec2pri hgnc ids # outputs hgnc_{version}_sssom.tsv
pysec2pri update-ids gene_ex.tsv hgnc --at gene --mapping hgnc_{version}_sssom.tsv
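When the update step runs repeatedly from a script, the pre-built mapping can be reused as described above. The following is a minimal sketch under three assumptions: the generated file lands in the current directory, its name matches hgnc_*_sssom.tsv, and version strings sort chronologically by name. The helper names are hypothetical. Only the pysec2pri commands and flags shown above are called.

```python
import subprocess
from pathlib import Path
from typing import Optional


def find_prebuilt_mapping(directory: Path = Path(".")) -> Optional[Path]:
    """Return the last hgnc_*_sssom.tsv file by name, or None if absent.

    Assumes version strings in the file names sort chronologically.
    """
    candidates = sorted(directory.glob("hgnc_*_sssom.tsv"))
    return candidates[-1] if candidates else None


def update_ids_command(table: str, column: str, mapping: Path) -> list[str]:
    """Build the documented update-ids call that reuses a mapping file."""
    return ["pysec2pri", "update-ids", table, "hgnc",
            "--at", column, "--mapping", str(mapping)]


def update_ids(table: str, column: str) -> None:
    """Generate the HGNC mapping only if no pre-built file exists, then apply it."""
    mapping = find_prebuilt_mapping()
    if mapping is None:
        # Writes hgnc_{version}_sssom.tsv (assumed: into the current directory).
        subprocess.run(["pysec2pri", "hgnc", "ids"], check=True)
        mapping = find_prebuilt_mapping()
        if mapping is None:
            raise FileNotFoundError("no hgnc_*_sssom.tsv found after generation")
    subprocess.run(update_ids_command(table, column, mapping), check=True)
```

For example, update_ids("gene_ex.tsv", "gene") runs the generator at most once and reuses the cached mapping file on later calls.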
Documentation
Full documentation: https://pysec2pri.readthedocs.io/
Supported Databases
| Data source | License | Citation |
|---|---|---|
| ChEBI | CC BY 4.0 | Hastings J, Owen G, Dekker A, et al. ChEBI in 2016: Improved services and an expanding collection of metabolites. Nucleic Acids Res. 2016 Jan;44(D1):D1214-9. doi: 10.1093/nar/gkv1031. PMID: 26467479; PMCID: PMC4702775. |
| HMDB | CC0 | Wishart DS, Guo A, Oler E, Wang F, Anjum A, Peters H, Dizon R, Sayeeda Z, Tian S, Lee BL, Berjanskii M, Mah R, Yamamoto M, Jovel J, Torres-Calzada C, Hiebert-Giesbrecht M, Lui VW, Varshavi D, Varshavi D, Allen D, Arndt D, Khetarpal N, Sivakumaran A, Harford K, Sanford S, Yee K, Cao X, Budinski Z, Liigand J, Zhang L, Zheng J, Mandal R, Karu N, Dambrova M, Schiöth HB, Greiner R, Gautam V. HMDB 5.0: the Human Metabolome Database for 2022. Nucleic Acids Res. 2022 Jan 7;50(D1):D622-D631. doi: 10.1093/nar/gkab1062. PMID: 34986597; PMCID: PMC8728138. |
| HGNC | link | Seal RL, Braschi B, Gray K, Jones TEM, Tweedie S, Haim-Vilmovsky L, Bruford EA. Genenames.org: the HGNC resources in 2023. Nucleic Acids Res. 2023 Jan 6;51(D1):D1003-D1009. doi: 10.1093/nar/gkac888. PMID: 36243972; PMCID: PMC9825485. |
| NCBI | link | Sayers EW, Bolton EE, Brister JR, Canese K, Chan J, Comeau DC, Connor R, Funk K, Kelly C, Kim S, Madej T, Marchler-Bauer A, Lanczycki C, Lathrop S, Lu Z, Thibaud-Nissen F, Murphy T, Phan L, Skripchenko Y, Tse T, Wang J, Williams R, Trawick BW, Pruitt KD, Sherry ST. Database resources of the national center for biotechnology information. Nucleic Acids Res. 2022 Jan 7;50(D1):D20-D26. doi: 10.1093/nar/gkab1112. PMID: 34850941; PMCID: PMC8728269. |
| UniProt | CC BY 4.0 | UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 2021 Jan 8;49(D1):D480-D489. doi: 10.1093/nar/gkaa1100. PMID: 33237286; PMCID: PMC7778908. |
| Wikidata | | Vrandečić D, Krötzsch M. Wikidata: a free collaborative knowledgebase. Communications of the ACM. 2014. doi: 10.1145/2629489. |
License
MIT License. See LICENSE for details.
File details
pysec2pri-0.0.2.tar.gz (source distribution, 60.4 kB; uploaded via twine/6.1.0 CPython/3.13.12 using Trusted Publishing)
| Algorithm | Hash digest |
|---|---|
| SHA256 | 26ee8a96c80dbd3631990100125f2ad1ff1a1e8d1825bcacdc4e755ed0e7bf5f |
| MD5 | 2cf5ed42dee5d9fb1c2c458ad2c72385 |
| BLAKE2b-256 | 5448c2f5ef40acd45a3740a403d6a65e67f8c7129326dfb4d0f9c8bec5ceed8b |
pysec2pri-0.0.2-py3-none-any.whl (Python 3 wheel, 79.3 kB; uploaded via twine/6.1.0 CPython/3.13.12 using Trusted Publishing)
| Algorithm | Hash digest |
|---|---|
| SHA256 | f0c11abf24ea3d93a95e0c96b171c7a1b3e8ae2cc00f5360a9cd82ba4cee9e3a |
| MD5 | ecdcce1b9c48f7ea1a63cc3089b6fe05 |
| BLAKE2b-256 | 08c603788dda68b6f5e9bf2eae7384fb8fc7facb7814ec30411130d756f63939 |
Provenance
Both files carry attestation bundles (statement type https://in-toto.io/Statement/v1, predicate type https://docs.pypi.org/attestations/publish/v1). They were published by the create-release.yml workflow on jmillanacosta/pysec2pri (permalink jmillanacosta/pysec2pri@935561d2782d03df993057db0958b235fda6ef2f, tag refs/tags/v0.0.2, owner https://github.com/jmillanacosta), triggered by a push on a public, GitHub-hosted runner with token issuer https://token.actions.githubusercontent.com. Sigstore transparency entries: 1579886694 (tar.gz) and 1579886843 (wheel).