
A toolkit for accelerating genomics using index files.


IndexTools

Common index formats, such as BAM Index (BAI) and Tabix (TBI), contain coarse-grained information on the density of NGS reads along the genome that may be leveraged for rapid approximation of read depth-based metrics. IndexTools is a toolkit for extremely fast NGS analysis based on index files.
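As a rough illustration of the coarse-grained density information a BAI file carries, the sketch below parses the linear index from the BAI layout described in the SAM/BAM specification. For each reference sequence, the linear index stores one 64-bit virtual file offset per 16 kbp window. Taking differences between the compressed components (upper 48 bits) of consecutive offsets gives a crude per-window byte estimate. This is a sketch under stated assumptions, not IndexTools' implementation: `read_linear_indexes` and `window_bytes` are hypothetical names, and the estimate uses compressed file offsets rather than the uncompressed volume IndexTools reports.

```python
import struct
from typing import List


def read_linear_indexes(data: bytes) -> List[List[int]]:
    """Parse an in-memory BAI file and return, for each reference sequence,
    its linear index: one 64-bit virtual file offset per 16 kbp window."""
    if data[:4] != b"BAI\x01":
        raise ValueError("not a BAI file (bad magic)")
    (n_ref,) = struct.unpack_from("<i", data, 4)
    pos = 8
    linear = []
    for _ in range(n_ref):
        (n_bin,) = struct.unpack_from("<i", data, pos)
        pos += 4
        for _ in range(n_bin):
            # Each bin: uint32 bin number, int32 chunk count, then n_chunk
            # (begin, end) pairs of uint64 virtual offsets, which we skip.
            _bin, n_chunk = struct.unpack_from("<Ii", data, pos)
            pos += 8 + 16 * n_chunk
        (n_intv,) = struct.unpack_from("<i", data, pos)
        pos += 4
        linear.append(list(struct.unpack_from("<%dQ" % n_intv, data, pos)))
        pos += 8 * n_intv
    return linear


def window_bytes(ioffsets: List[int]) -> List[int]:
    """Crude per-window size estimate: the difference between the compressed
    file offsets (upper 48 bits of each virtual offset) of consecutive windows."""
    coffsets = [v >> 16 for v in ioffsets]
    # Clamp to zero in case a window without reads is stored with offset 0.
    return [max(0, b - a) for a, b in zip(coffsets, coffsets[1:])]
```

For example, `read_linear_indexes(open("tests/data/small.bam.bai", "rb").read())` would return one list of offsets per contig. The last window of each contig has no successor, so this simple difference scheme yields no estimate for it.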

Installation

Pre-requisites

  • Python 3.6+

Pip

pip install indextools

From source

  • Clone the repository and change into it
    $ git clone https://github.com/dnanexus/IndexTools.git
    $ cd IndexTools
    
  • You'll need several additional tools (including make) to run the full install and release process
  • Then run:
    # Install locally
    $ make install
    # Release new version (if you are a maintainer)
    $ make release version=<new version> token=<GitHub API Token>
    

Commands

Partition

The partition command processes a BAM index file and generates a BED file containing intervals that are roughly equal in "volume." Because each interval represents a similar amount of data, this partition BED file can be used to parallelize secondary analysis tools more evenly than parallelizing by chromosome or by uniform windows.
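The exact partitioning algorithm used by IndexTools is not described here. As a minimal sketch of the idea, assuming the genome has already been divided into fixed windows that each carry an estimated volume, a single greedy pass can group consecutive windows into n partitions of roughly equal total volume. The names `partition` and `to_bed_lines`, the window tuple layout, and the use of the partition number as the BED name column are all hypothetical, not IndexTools' API or output format.

```python
from typing import List, Tuple

# Hypothetical window representation: (contig, start, end, volume),
# with 0-based, half-open coordinates as in BED.
Window = Tuple[str, int, int, float]
Interval = Tuple[str, int, int]


def partition(windows: List[Window], n: int) -> List[List[Interval]]:
    """Greedily group consecutive windows into at most n partitions of roughly
    equal total volume. A partition may span contigs, so it can contain
    several intervals."""
    if n < 1:
        raise ValueError("n must be >= 1")
    target = sum(w[3] for w in windows) / n
    parts: List[List[Interval]] = [[]]
    acc = 0.0  # cumulative volume of all windows assigned so far
    for contig, start, end, vol in windows:
        # Open the next partition once the cumulative volume has reached
        # k * target, where k is the number of partitions opened so far.
        if acc >= target * len(parts) and len(parts) < n and parts[-1]:
            parts.append([])
        current = parts[-1]
        if current and current[-1][0] == contig and current[-1][2] == start:
            # Extend the previous interval when this window is adjacent to it.
            current[-1] = (contig, current[-1][1], end)
        else:
            current.append((contig, start, end))
        acc += vol
    return parts


def to_bed_lines(parts: List[List[Interval]]) -> List[str]:
    """Format partitions as BED lines, using the partition number as the name column."""
    return [
        f"{contig}\t{start}\t{end}\t{i}"
        for i, part in enumerate(parts)
        for contig, start, end in part
    ]
```

Because a window is never split, partition totals are only approximately equal; smaller windows trade more bookkeeping for a tighter balance.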

# Generate a BED with 10 partitions
indextools partition -I tests/data/small.bam.bai \
  -z tests/data/contig_sizes.txt \
  -n 10 \
  -o small.partitions.bed
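To use a partition file such as small.partitions.bed for parallelization, each interval can be turned into a region string and handed to a separate worker. The sketch below assumes the first three columns are standard BED (0-based start, exclusive end) and converts them to the 1-based, inclusive `contig:start-end` form accepted by tools such as samtools. `bed_to_regions`, `run_in_parallel`, and the worker function are hypothetical helpers, not part of IndexTools.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def bed_to_regions(lines: Iterable[str]) -> List[str]:
    """Convert BED lines (0-based, half-open) to 1-based, inclusive region strings."""
    regions = []
    for line in lines:
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        contig, start, end = line.rstrip("\n").split("\t")[:3]
        regions.append(f"{contig}:{int(start) + 1}-{int(end)}")
    return regions


def run_in_parallel(
    regions: List[str], worker: Callable[[str], T], max_workers: int = 4
) -> List[T]:
    """Run worker once per region on a thread pool, returning results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, regions))
```

A thread pool suits workers that shell out to an external tool for each region; for CPU-bound pure-Python workers, `ProcessPoolExecutor` is the usual substitute.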

Limitations

IndexTools is under active development. Please see the issue tracker and road map to view upcoming features.

Some of the most commonly requested features that are not yet available are:

  • Support for CRAM files and CRAM indexes (.crai).
  • Support for non-local files via URIs.

Development

We welcome contributions from the community. Please see the developer README for details.

Contributors are required to abide by our Code of Conduct.

Technical details

Volume

In a bioinformatics context, the term “size” is overloaded: it can refer to the linear size of a genomic region (number of bp), disk size (number of bytes), or number of features (e.g. read count). IndexTools estimates the number of bytes required to store the uncompressed data of the features within a given genomic region. To avoid confusion or conflation with any of the meanings of “size,” we instead use the term “volume” for this approximate size (in bytes) of a genomic region. It is almost never important or useful to interpret a given volume in absolute terms, nor can volume be meaningfully translated to other units; volume is primarily useful as a relative measure. Thus, we use the made-up unit “V” when referring to any specific volume.
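IndexTools derives volume from index files rather than from individual records, so the following is only an illustration of the definition. This minimal sketch, with hypothetical function names, computes the uncompressed size of BAM alignment records from the fixed BAM record layout in the SAM/BAM specification. It shows that two regions with the same read count can have different volumes, and that the meaningful quantity is each region's share of the total.

```python
from typing import Dict


def bam_record_bytes(name_len: int, n_cigar_op: int, l_seq: int, aux_bytes: int = 0) -> int:
    """Uncompressed size in bytes of one BAM alignment record (per the SAM/BAM spec layout)."""
    return (
        4                    # block_size
        + 32                 # fixed-width fields (refID, pos, ..., tlen)
        + name_len + 1       # read name plus NUL terminator
        + 4 * n_cigar_op     # CIGAR operations, 4 bytes each
        + (l_seq + 1) // 2   # sequence, 2 bases per byte
        + l_seq              # base qualities, 1 byte per base
        + aux_bytes          # optional tags
    )


def relative_volumes(volumes: Dict[str, int]) -> Dict[str, float]:
    """Express each region's volume as a fraction of the total."""
    total = sum(volumes.values())
    return {region: v / total for region, v in volumes.items()}
```

For example, with 20-character read names, one CIGAR operation, and no tags, a 150 bp read takes 286 bytes and a 100 bp read takes 211 bytes. Two regions of 100 reads each therefore hold 28,600 and 21,100 bytes, roughly 58% and 42% of the total, even though their read counts are identical.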

License

IndexTools is Copyright (c) 2019 DNAnexus, Inc., and is made available under the MIT License.

IndexTools is not an officially supported DNAnexus product. All bug reports and feature requests should be handled via the issue tracker. Please do not contact DNAnexus support regarding this software.

Acknowledgements

  • The initial inspiration for IndexTools came from @brentp's indexcov.
  • IndexTools is built on several open-source libraries; see the pyproject.toml file for a full list.

Download files

Source Distribution

indextools-0.1.4.tar.gz (188.8 kB)

Built Distribution

indextools-0.1.4-cp36-cp36m-macosx_10_13_x86_64.whl (234.4 kB): CPython 3.6m, macOS 10.13+, x86-64

File details

Details for the file indextools-0.1.4.tar.gz.

File metadata

  • Download URL: indextools-0.1.4.tar.gz
  • Size: 188.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/0.12.17 CPython/3.6.6 Darwin/18.6.0

File hashes

Hashes for indextools-0.1.4.tar.gz
Algorithm    Hash digest
SHA256       bfba993dfb0ea2cf00b6f8abdcc5972e517ff9ae727e3da984db8e880a7240b3
MD5          39bab878a6d61ffbbbaace7d1824b7bd
BLAKE2b-256  9dc12f71471080764b76732bd22e959f7190cb65a7ab5964beb1665bfd497268


File details

Details for the file indextools-0.1.4-cp36-cp36m-macosx_10_13_x86_64.whl.

File hashes

Hashes for indextools-0.1.4-cp36-cp36m-macosx_10_13_x86_64.whl
Algorithm    Hash digest
SHA256       ef7e48d282050b9baa1d40de6fcd4fbb3ec574ca0ba05141c0d9b865b05e87a8
MD5          1c979caf171598248c90af988e65b923
BLAKE2b-256  5515c51f4b2305917e016dbface61a84a9ada20f8922e5ab1cace8eda1abe8f2
