hugo-unifier

This Python package unifies gene symbols based on the HUGO database.

Installation

The package can be installed via pip, or any other Python package manager.

pip install hugo-unifier

Usage

The package can be used both as a command line tool and as a library.

Command Line Tool

Currently, the command line tool only supports unifying the entries of a column in an AnnData object's var attribute. The input file and the column name must be passed as arguments. The tool updates the column in place and saves the AnnData object to a new file.

Check the help message for more information:

hugo-unifier --help

Library

The package can be used as a library to unify gene symbols, for example those stored in a pandas DataFrame. The unify function takes a list of gene symbols and returns a list of unified gene symbols. It can be used as follows:

from hugo_unifier import unify
gene_symbols = ["TP53", "BRCA1", "EGFR"]
unified_symbols = unify(gene_symbols)
print(unified_symbols)

How it works

Different datasets sometimes use different gene symbols for the same gene. Sometimes the same gene symbol occurs with slight modifications, such as dashes, underscores, or other characters. hugo-unifier iteratively applies manipulations to each gene symbol and checks the result against the HUGO database.

The manipulations are applied in this order:

  1. identity: Use the gene symbol as is.
  2. dot-to-dash: Replace dots with dashes.
  3. discard-after-dot: Discard everything after the first dot.

More conservative manipulations are applied first. The first manipulation that returns a valid gene symbol is used.
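
The ordering described above can be illustrated with a minimal sketch. This is not the package's actual implementation: the `APPROVED` set is a hypothetical stand-in for the HUGO database lookup, and the function names (`identity`, `dot_to_dash`, `discard_after_dot`, `first_valid`) are chosen here for illustration only.

```python
from typing import Callable, Optional

# Hypothetical stand-in for the approved symbols in the HUGO database.
APPROVED = {"TP53", "HLA-A", "BRCA1"}


def identity(symbol: str) -> str:
    """Use the gene symbol as is."""
    return symbol


def dot_to_dash(symbol: str) -> str:
    """Replace dots with dashes."""
    return symbol.replace(".", "-")


def discard_after_dot(symbol: str) -> str:
    """Discard everything after the first dot."""
    return symbol.split(".", 1)[0]


# Ordered from most to least conservative, as in the list above.
MANIPULATIONS: list[tuple[str, Callable[[str], str]]] = [
    ("identity", identity),
    ("dot-to-dash", dot_to_dash),
    ("discard-after-dot", discard_after_dot),
]


def first_valid(symbol: str, approved: set = APPROVED) -> Optional[tuple]:
    """Return (manipulation name, result) for the first manipulation
    whose result is an approved symbol, or None if none matches."""
    for name, manipulate in MANIPULATIONS:
        candidate = manipulate(symbol)
        if candidate in approved:
            return name, candidate
    return None


print(first_valid("TP53"))     # ('identity', 'TP53')
print(first_valid("HLA.A"))    # ('dot-to-dash', 'HLA-A')
print(first_valid("BRCA1.2"))  # ('discard-after-dot', 'BRCA1')
print(first_valid("FOO.1"))    # None
```

Because the loop stops at the first match, a symbol that is already valid is never altered by a more aggressive manipulation.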

Resolution of aliases

When resolving aliases, the following steps are applied:

  1. Remove Conflicting Aliases:
    Aliases that conflict with already approved symbols are removed. For example, if an alias maps to a symbol that is already approved, it is discarded to avoid conflicts.

  2. Correct Same Aliases:
    If an alias maps to the same symbol as its original symbol, it is corrected and marked as an approved symbol. This ensures that aliases that are effectively the same as the original symbol are treated as valid.

  3. Handle Duplicate Aliases:
    If multiple aliases map to the same original symbol:

    • By default, only one alias is retained, and the rest are discarded.
    • If the keep_gene_multiple_aliases option is enabled, all aliases are retained, and an identity mapping is created for the duplicates.

  4. Unaccepted Aliases:
    Any aliases that cannot be resolved or conflict with the above rules are marked as unaccepted and excluded from the final results.

These steps ensure that aliases are resolved in a consistent and conflict-free manner, prioritizing approved symbols and avoiding ambiguity in the mapping process.
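
The steps above leave some room for interpretation, so the following is only a sketch of one possible reading, not the package's code. It assumes that the alias lookup has already produced a mapping from each original symbol to the approved symbol it resolves to, and that "one alias is retained" means the first one encountered. The function `resolve_aliases`, its arguments `alias_hits` and `approved_in_data`, and the example data are all hypothetical; only the `keep_gene_multiple_aliases` option name comes from the description above.

```python
from collections import defaultdict


def resolve_aliases(alias_hits, approved_in_data, keep_gene_multiple_aliases=False):
    """Sketch of the four steps under the assumptions stated above.

    alias_hits: dict mapping original symbol -> approved symbol found via alias lookup.
    approved_in_data: set of approved symbols already present in the dataset.
    Returns (accepted mapping, list of unaccepted original symbols).
    """
    accepted = {}
    unaccepted = []
    by_target = defaultdict(list)

    for original, target in alias_hits.items():
        if target in approved_in_data:
            # Step 1: the target is already approved in the data, so discard the alias.
            unaccepted.append(original)
        elif original == target:
            # Step 2: the alias resolves to itself, so treat it as approved.
            accepted[original] = original
        else:
            by_target[target].append(original)

    for target, originals in by_target.items():
        # Step 3: several originals may point at the same target.
        first, *duplicates = originals
        accepted[first] = target  # assumption: keep the first one encountered
        for duplicate in duplicates:
            if keep_gene_multiple_aliases:
                accepted[duplicate] = duplicate  # identity mapping
            else:
                unaccepted.append(duplicate)  # Step 4: excluded from the result

    return accepted, unaccepted


# Illustrative input only; not output of the real package.
hits = {"P53": "TP53", "HER1": "EGFR", "ERBB1": "EGFR", "MYC": "MYC"}
print(resolve_aliases(hits, approved_in_data={"TP53"}))
print(resolve_aliases(hits, approved_in_data={"TP53"}, keep_gene_multiple_aliases=True))
```

In this reading, enabling keep_gene_multiple_aliases keeps every input symbol in the result while still mapping at most one of them to each approved symbol, so no two entries collide on the same target.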
