Compute genome signatures

Project description

bgsignature is a Python package for computing genome signatures.

The most basic computation is counting the different k-mers (e.g. k = 3 or 5). The counts can be computed for a set of mutations, for a set of regions, or for the set of mutations that fall within certain regions.
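
To make the idea of k-mer counting concrete, here is a minimal sketch in plain Python. It does not use bgsignature; the helper name count_kmers is hypothetical and the input is a toy sequence rather than a reference genome.

```python
from collections import Counter

def count_kmers(sequence, k=3):
    """Count every overlapping k-mer in a DNA sequence (hypothetical helper)."""
    sequence = sequence.upper()
    # Slide a window of width k over the sequence, one base at a time.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

# Toy input: the windows are ACG, CGT, GTA, TAC and ACG.
counts = count_kmers("ACGTACG", k=3)
```

In this toy case the triplet ACG is seen twice and every other triplet once.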

bgsignature consists of 3 tools:

  • count: count different k-mers

  • frequency: divide the counts by the total counts

  • normalize: divide the counts by counts obtained separately and normalize the results.
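
The arithmetic behind frequency and normalize can be sketched with plain dictionaries. This is only one reading of the descriptions above, not the package's implementation: the helper names to_frequency and normalize_by_background are hypothetical, and k-mers missing from the background counts are simply skipped.

```python
def to_frequency(counts):
    """Divide each count by the total so that the values sum to 1."""
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}

def normalize_by_background(counts, background):
    """Divide each count by a separately obtained count, then rescale to sum to 1.

    Assumption: k-mers with no (or zero) background count are dropped.
    """
    ratios = {kmer: n / background[kmer]
              for kmer, n in counts.items() if background.get(kmer)}
    return to_frequency(ratios)

# Toy data: counts seen in mutations and counts seen in the regions of interest.
mutation_counts = {"ACA": 6, "TCT": 2}
region_counts = {"ACA": 3, "TCT": 2}

frequencies = to_frequency(mutation_counts)
normalized = normalize_by_background(mutation_counts, region_counts)
```

With these toy numbers, ACA accounts for 0.75 of the raw counts but for 2/3 of the normalized signature, because ACA is also more frequent in the background.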

Advanced features include:

  • ability to group the counts (e.g. group mutations by sample)

  • normalize the counts by the context taken from a regions file

  • collapse (add together) reverse complementary sequences
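
As an illustration of what collapsing reverse complementary sequences means, the sketch below adds together the counts of each k-mer and its reverse complement and stores the sum under both keys. It is plain Python with hypothetical helper names (reverse_complement, collapse_keep_both), and it assumes that keys are bare k-mer strings; the keys of the objects returned by bgsignature may look different.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(kmer):
    """Return the reverse complement of a DNA k-mer."""
    return kmer.translate(_COMPLEMENT)[::-1]

def collapse_keep_both(counts):
    """Add together each k-mer and its reverse complement, keeping both keys."""
    collapsed = {}
    for kmer, n in counts.items():
        rc = reverse_complement(kmer)
        # A k-mer that equals its own reverse complement is counted once.
        total = n + (counts.get(rc, 0) if rc != kmer else 0)
        collapsed[kmer] = total
        collapsed[rc] = total
    return collapsed

# Toy data: ACA and TGT are reverse complements of each other.
collapsed = collapse_keep_both({"ACA": 3, "TGT": 2, "CCC": 1})
```

Here ACA and TGT both end up with a count of 5, and CCC and GGG both with a count of 1.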

Installation

This project is a Python package and can be installed with pip. Download the source code, change into the project directory and run:

pip install .

Usage

Command line interface

The 3 tools can be called using:

  • bgsignature count

  • bgsignature frequency

  • bgsignature normalize

Some examples:

  • getting help:

    bgsignature -h
    bgsignature frequency -h
  • count triplets in mutations that fall in certain regions using hg38:

    bgsignature count -m my/muts/file -r my/regions/file \
    -g hg38 -o my/output.json --cores 4

Python

Alternatively, the command line options have an equivalent in Python:

from bgsignature import count, relative_frequency, normalize

These functions accept similar parameters, except for the output. The returned object can be used as a dictionary.

If you already have your files loaded in Python, you can call the count function of the corresponding module directly. E.g.:

from bgsignature.count import mutation
mutation.count(mutations, 'hg38', 3)

In addition, you can use the “low-level” functions that do the counting (count_all and count_group), which are much simpler and do not perform any kind of parallelization. E.g.:

from bgsignature.count import mutation
mutation.count_all(mutations, 'hg38', 3)
# or to group mutations by sample
mutation.count_group(mutations, 'hg38', 3, 'SAMPLE')

The returned object can be normalized to sum to 1 using the sum1() method, or divided by some normalization counts using the normalize() method.

Important

There are some behavioural characteristics that must be taken into account:

  • bgsignature filters out mutations whose reference nucleotide (as provided in the file) does not match the corresponding one in the reference genome.

  • when using the collapse option (enabled by default), bgsignature does not remove one of the collapsed sequences but keeps both. This means that you need to manually remove the ones you are not interested in.

  • when using the bgsignature.count.mutation.count or bgsignature.count.region.count function with several cores for parallelization, the chunk parameter must be chosen carefully, as it can have a huge impact on performance.
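
Since both collapsed sequences are kept, one possible way to remove the redundant half is to keep only the k-mers whose central base is a pyrimidine (C or T). The sketch below is plain Python and rests on assumptions: the helper keep_pyrimidine_centered is hypothetical, k is odd, and the keys are bare k-mer strings. Adapt the key handling to whatever the returned object actually contains.

```python
def keep_pyrimidine_centered(counts):
    """Keep only the k-mers whose central base is C or T (assumes odd k)."""
    return {kmer: n for kmer, n in counts.items()
            if kmer[len(kmer) // 2] in "CT"}

# Toy collapsed counts: each reverse complementary pair carries the same value.
collapsed = {"ACA": 5, "TGT": 5, "CCC": 1, "GGG": 1}
unique = keep_pyrimidine_centered(collapsed)
```

In this toy example only ACA and CCC remain, so each reverse complementary pair is represented once.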

File formats

Mutations file

Tab-separated file (can be compressed into gz, bgz or xz formats) with a header and at least these columns: CHROMOSOME, POSITION, REF, ALT. In addition, SAMPLE, CANCER_TYPE and SIGNATURE are optional columns that can be used for grouping the signature.
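
As an illustration of this layout, the following sketch reads such a file with the Python standard library and checks the required columns. The helpers open_text and read_mutations are hypothetical and are not part of bgsignature; whether the package accepts rows in this shape is not stated here. The example data is made up.

```python
import csv
import gzip
import io
import lzma

REQUIRED_COLUMNS = ("CHROMOSOME", "POSITION", "REF", "ALT")

def open_text(path):
    """Open a plain, gz/bgz or xz file in text mode, chosen by file extension."""
    if path.endswith((".gz", ".bgz")):
        return gzip.open(path, "rt")
    if path.endswith(".xz"):
        return lzma.open(path, "rt")
    return open(path, "r")

def read_mutations(handle):
    """Return the rows of a tab-separated mutations file as a list of dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError("missing columns: " + ", ".join(missing))
    rows = []
    for row in reader:
        row["POSITION"] = int(row["POSITION"])  # positions are integers
        rows.append(row)
    return rows

# Made-up, in-memory example with the optional SAMPLE column.
example = io.StringIO(
    "CHROMOSOME\tPOSITION\tREF\tALT\tSAMPLE\n"
    "1\t12345\tC\tT\ts1\n"
    "1\t67890\tG\tA\ts2\n"
)
rows = read_mutations(example)
```

For a file on disk, the same function can be used as read_mutations(open_text(path)).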

Regions file

Tab-separated file (can be compressed into gz, bgz or xz formats) with a header and at least these columns: CHROMOSOME, START, END, ELEMENT. In addition, SYMBOL and SEGMENT are optional columns that can be used for grouping the signature.

Support

If you are having issues, please let us know. You can contact us at: bbglab@irbbarcelona.org
