
# parseID: suck, parse identifiers or accession numbers

Introduction

parseID is a bioinformatics data structure library optimized for loading identifiers or accession numbers into memory and mapping those identifiers or accession numbers to one another.

Identifiers and accession numbers are defined and referenced by various biological databases. A single data set can contain millions or even billions of them, and operations such as querying or parsing them are very common.

parseID employs two data structures, the "trie" and the "ditrie". A trie can hold a very large number of identifiers in memory at once. A ditrie can hold a large number of mappings between identifiers. Through the trie and the ditrie, operations on large data sets, including insert, get, search, delete, and scan, can be carried out quickly.
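To make the trie idea concrete, here is a minimal sketch of a character trie that supports insert, search, delete, and prefix scan. It is not parseID's implementation: the class names `TrieNode` and `Trie` and all method names are hypothetical and chosen only for illustration.

```python
class TrieNode:
    """One node per character; is_end marks the end of a stored key."""
    __slots__ = ("children", "is_end")

    def __init__(self):
        self.children = {}
        self.is_end = False


class Trie:
    """Illustrative prefix tree for string identifiers (hypothetical API)."""

    def __init__(self):
        self.root = TrieNode()
        self.size = 0

    def insert(self, key):
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        if not node.is_end:
            node.is_end = True
            self.size += 1

    def _find(self, key):
        # Walk down the tree; return None if the path does not exist.
        node = self.root
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def search(self, key):
        node = self._find(key)
        return node is not None and node.is_end

    def scan(self, prefix=""):
        """Yield every stored key starting with prefix, in sorted order."""
        start = self._find(prefix)
        if start is None:
            return
        stack = [(start, prefix)]
        while stack:
            node, text = stack.pop()
            if node.is_end:
                yield text
            # Push in reverse order so the smallest character is popped first.
            for ch in sorted(node.children, reverse=True):
                stack.append((node.children[ch], text + ch))

    def delete(self, key):
        """Remove key and return True if it was present."""
        path = [self.root]
        for ch in key:
            nxt = path[-1].children.get(ch)
            if nxt is None:
                return False
            path.append(nxt)
        if not path[-1].is_end:
            return False
        path[-1].is_end = False
        self.size -= 1
        # Prune trailing nodes that no longer lead to any stored key.
        for i in range(len(key) - 1, -1, -1):
            child = path[i + 1]
            if child.is_end or child.children:
                break
            del path[i].children[key[i]]
        return True
```

Because identifiers from the same database tend to share long prefixes, keys that start alike reuse the same nodes, which is one reason a trie suits accession numbers.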

testing

pytest -s tests

quick start

The following example shows how a very large set of accession numbers is loaded into a trie. Download the mapping file from https://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_refseq_uniprotkb_collab.gz to local storage. The UniProt accession numbers (176,513,729 as of 03/25/2024) are retrieved from the file and fed into a trie. As shown in the example below, the accession numbers are stored in the object uniprotkb_acc_trie.

from parseid import ProcessID
infile = 'gene_refseq_uniprotkb_collab'
uniprotkb_acc_trie = ProcessID(infile).uniprotkb_protein_accession()
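For readers who want to inspect the mapping file before handing it to parseID, the sketch below reads accession pairs from it. It rests on assumptions that are not stated in this README: that the file is tab-separated, that its header line starts with `#`, and that the first two columns hold the NCBI protein accession and the UniProtKB accession. The helper names `open_text` and `iter_accession_pairs` are hypothetical, and the sample rows are invented placeholders rather than real records.

```python
import gzip
import io


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def iter_accession_pairs(lines):
    """Yield (ncbi_accession, uniprotkb_accession) from tab-separated lines.

    Assumes the first two columns carry the two accessions and that
    header or comment lines start with '#'.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip malformed rows instead of failing
        yield fields[0], fields[1]


# Invented placeholder rows that mimic the assumed layout.
sample = io.StringIO(
    "#NCBI_protein_accession\tUniProtKB_protein_accession\n"
    "NP_TEST1.1\tP00001\n"
    "NP_TEST2.1\tP00002\n"
)
pairs = list(iter_accession_pairs(sample))
```

With a real download, the same generator could be used as `with open_text(path) as handle: ...`, which streams the file line by line rather than reading it all at once.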

Retrieve pairs of NCBI protein accession numbers and UniProt accession numbers from the file and feed them into a ditrie. As shown in the example below, the mapping between the two kinds of accession numbers is stored in the object ncbi_uniprotkb_ditrie, which is ready for querying or parsing.

from parseid import ProcessID
infile = 'gene_refseq_uniprotkb_collab'
ncbi_uniprotkb_ditrie = ProcessID(infile).map_ncbi_uniprotkb()
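One way to picture a two-way mapping between identifiers is a pair of tries, one per identifier type, where each terminal node records the partner identifiers on the other side. The sketch below follows that idea under loud assumptions: it is not parseID's ditrie, the class `DiTrieSketch` and its methods are hypothetical, and the accession strings in the usage lines are invented placeholders.

```python
class DiTrieSketch:
    """Two nested-dict tries whose terminal nodes store partner keys."""

    # Reserved child key for the partner set. Real children are single
    # characters, so the empty string can never collide with them.
    _END = ""

    def __init__(self):
        self.left = {}   # trie over identifiers of the first type
        self.right = {}  # trie over identifiers of the second type

    @staticmethod
    def _add(root, key, partner):
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
        node.setdefault(DiTrieSketch._END, set()).add(partner)

    @staticmethod
    def _get(root, key):
        node = root
        for ch in key:
            node = node.get(ch)
            if node is None:
                return set()
        # Return a copy so callers cannot mutate the stored set.
        return set(node.get(DiTrieSketch._END, ()))

    def insert(self, a, b):
        """Record the pair in both directions."""
        self._add(self.left, a, b)
        self._add(self.right, b, a)

    def get_right(self, a):
        """Identifiers of the second type mapped to a."""
        return self._get(self.left, a)

    def get_left(self, b):
        """Identifiers of the first type mapped to b."""
        return self._get(self.right, b)


# Placeholder identifiers, for illustration only.
d = DiTrieSketch()
d.insert("NP_TEST1.1", "P00001")
d.insert("NP_TEST1.1", "P00002")
d.insert("NP_TEST2.1", "P00001")
```

Storing each pair in both tries doubles the memory spent on the mapping but lets a lookup in either direction cost time proportional to the length of the identifier.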


