
Secondary to primary identifier mapping


pySec2Pri


Create and use mapping files for secondary (retired/withdrawn) biological database identifiers and symbols to primary (current) identifiers and symbols.

Mappings are written in SSSOM format by default. In each mapping, the subject is the secondary identifier and the object is the primary identifier.

Installation

uv pip install pysec2pri

Or install from source:

uv pip install git+https://github.com/jmillanacosta/pysec2pri.git

Quick Start

Generating mapping sets

To generate the secondary-to-primary SSSOM mapping set for ChEBI:

pysec2pri chebi

This will automatically download the latest ChEBI release and generate an SSSOM mapping file in your current directory.

To process a local ChEBI SDF file instead of downloading one, and to choose the output path:

pysec2pri chebi ChEBI_complete_3star.sdf --output my_mappings.sssom.tsv

For more options and help on any command:

pysec2pri --help
pysec2pri chebi --help

The default output is in SSSOM (Simple Standard for Sharing Ontology Mappings) TSV format.
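
As a rough illustration of how such a file can be consumed, the sketch below reads an SSSOM TSV stream into a secondary-to-primary dictionary using only the standard library. The sample rows, the `IAO:0100001` predicate, the justification value, and the helper name `load_sec2pri` are assumptions made for the example, not something pysec2pri is confirmed to emit or provide. The column names `subject_id` and `object_id` come from the SSSOM standard.

```python
import csv
import io

# Hypothetical SSSOM TSV content; a real file would come from e.g. `pysec2pri chebi`.
SAMPLE = """\
# curie_map:
#   HGNC: http://identifiers.org/hgnc/
subject_id\tpredicate_id\tobject_id\tmapping_justification
HGNC:131\tIAO:0100001\tHGNC:145\tsemapv:BackgroundKnowledgeBasedMatching
"""


def load_sec2pri(handle):
    """Return {secondary_id: primary_id} from an SSSOM TSV stream."""
    # SSSOM TSV files may start with a YAML metadata block whose lines begin with '#'.
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    # Subjects are secondary identifiers, objects are primary identifiers.
    return {row["subject_id"]: row["object_id"] for row in reader}


mapping = load_sec2pri(io.StringIO(SAMPLE))
print(mapping)
```

A plain dictionary keeps only the last object seen for each subject, so a secondary identifier that maps to several primary identifiers would need a dictionary of lists instead.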

Updating IDs and symbols

A generated mapping set can be used to update IDs and symbols in Python:

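from pysec2pri import generate_chebi_synonyms, resolve_symbols

# Build the ChEBI synonym mapping set once and reuse it for every lookup.
cs = generate_chebi_synonyms()
# Resolve each symbol against that mapping set.
resolve_symbols(["Glucose", "ATP", "Guanine"], cs)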

Or from the command line, given a TSV file gene_ex.tsv:

gene	data
HGNC:131	3.5

Resolve the gene column to primary HGNC IDs. The result is written to a new column named after the input column with a _primary suffix (here, gene_primary):

pysec2pri update-ids gene_ex.tsv hgnc --at gene -o gene_ex_primary.tsv
# gene        data    gene_primary
# HGNC:131    3.5     HGNC:145

The same pattern works for symbols with update-symbols, and multiple columns can be resolved by repeating --at:

pysec2pri update-ids data.tsv hgnc --at gene_id --at related_gene_id
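
Conceptually, resolving several columns amounts to a dictionary lookup per cell. The following is a minimal sketch of that idea, not the tool's actual implementation: the function name `add_primary_columns`, the placement of the new columns at the end of the table, and the choice to keep identifiers that are absent from the mapping unchanged are all assumptions.

```python
import csv
import io


def add_primary_columns(tsv_text, mapping, columns):
    """Append a '<column>_primary' column for each requested column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fieldnames = list(reader.fieldnames) + [f"{c}_primary" for c in columns]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        for c in columns:
            # Assumption: an ID missing from the mapping is treated as
            # already primary and copied through unchanged.
            row[f"{c}_primary"] = mapping.get(row[c], row[c])
        writer.writerow(row)
    return out.getvalue()


mapping = {"HGNC:131": "HGNC:145"}  # secondary -> primary, from the example above
table = "gene_id\trelated_gene_id\nHGNC:131\tHGNC:5\n"
print(add_primary_columns(table, mapping, ["gene_id", "related_gene_id"]))
```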

To skip regenerating the mapping set, pass a pre-built mapping file:

pysec2pri hgnc ids  # outputs hgnc_{version}_sssom.tsv
pysec2pri update-ids gene_ex.tsv hgnc --at gene --mapping hgnc_{version}_sssom.tsv
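
Because `{version}` in the file name is a placeholder, a script has to find the actual mapping file before calling `update-ids`. One possible approach, sketched below, picks the most recently modified file matching `hgnc_*_sssom.tsv` and assembles the command from the flags shown above. The helper name `build_update_command` is hypothetical, and selecting by modification time rather than by parsing the version is an assumption.

```python
from pathlib import Path


def build_update_command(table, column, directory="."):
    """Build the update-ids command using the newest matching mapping file."""
    candidates = sorted(
        Path(directory).glob("hgnc_*_sssom.tsv"),
        key=lambda p: p.stat().st_mtime,  # oldest first, newest last
    )
    if not candidates:
        raise FileNotFoundError(
            "no hgnc_*_sssom.tsv file found; run `pysec2pri hgnc ids` first"
        )
    return [
        "pysec2pri", "update-ids", table, "hgnc",
        "--at", column,
        "--mapping", str(candidates[-1]),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once a mapping file exists in the chosen directory.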

Documentation

Full documentation: https://pysec2pri.readthedocs.io/

Supported Databases

| Datasource | License | Citation |
|---|---|---|
| ChEBI | CC BY 4.0 | Hastings J, Owen G, Dekker A, et al. ChEBI in 2016: Improved services and an expanding collection of metabolites. Nucleic Acids Res. 2016 Jan;44(D1):D1214-9. doi: 10.1093/nar/gkv1031. PMID: 26467479; PMCID: PMC4702775. |
| HMDB | CC0 | Wishart DS, Guo A, Oler E, Wang F, Anjum A, Peters H, Dizon R, Sayeeda Z, Tian S, Lee BL, Berjanskii M, Mah R, Yamamoto M, Jovel J, Torres-Calzada C, Hiebert-Giesbrecht M, Lui VW, Varshavi D, Varshavi D, Allen D, Arndt D, Khetarpal N, Sivakumaran A, Harford K, Sanford S, Yee K, Cao X, Budinski Z, Liigand J, Zhang L, Zheng J, Mandal R, Karu N, Dambrova M, Schiöth HB, Greiner R, Gautam V. HMDB 5.0: the Human Metabolome Database for 2022. Nucleic Acids Res. 2022 Jan 7;50(D1):D622-D631. doi: 10.1093/nar/gkab1062. PMID: 34986597; PMCID: PMC8728138. |
| HGNC | link | Seal RL, Braschi B, Gray K, Jones TEM, Tweedie S, Haim-Vilmovsky L, Bruford EA. Genenames.org: the HGNC resources in 2023. Nucleic Acids Res. 2023 Jan 6;51(D1):D1003-D1009. doi: 10.1093/nar/gkac888. PMID: 36243972; PMCID: PMC9825485. |
| NCBI | link | Sayers EW, Bolton EE, Brister JR, Canese K, Chan J, Comeau DC, Connor R, Funk K, Kelly C, Kim S, Madej T, Marchler-Bauer A, Lanczycki C, Lathrop S, Lu Z, Thibaud-Nissen F, Murphy T, Phan L, Skripchenko Y, Tse T, Wang J, Williams R, Trawick BW, Pruitt KD, Sherry ST. Database resources of the national center for biotechnology information. Nucleic Acids Res. 2022 Jan 7;50(D1):D20-D26. doi: 10.1093/nar/gkab1112. PMID: 34850941; PMCID: PMC8728269. |
| UniProt | CC BY 4.0 | UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 2021 Jan 8;49(D1):D480-D489. doi: 10.1093/nar/gkaa1100. PMID: 33237286; PMCID: PMC7778908. |
| Wikidata | | Vrandecic, D., Krotzsch, M. Wikidata: a free collaborative knowledgebase. Communications of the ACM. 2014. doi: 10.1145/2629489. |

License

MIT License. See LICENSE for details.
