
Creates self-contained html pages for visual variant review with IGV (igv.js).


igv-reports

igv-reports - A Python application to generate self-contained HTML reports for variant review and other genomic applications. Reports consist of a table of genomic sites and an embedded IGV genome browser for viewing data for each site. The tool extracts slices of data for each site and embeds the data as blobs in the HTML report file. The report can be opened in a web browser as a static page, with no dependency on the original input files.

Installation

Prerequisites

igv-reports requires Python 3.8 or greater.

Installing igv-reports

pip install igv-reports

igv-reports requires the package pysam version 0.22.0 or greater, which should be installed automatically. However, on OS X this sometimes fails due to missing dependent libraries. This can be fixed by following the procedure below, from the pysam docs:
"The recommended way to install pysam is through conda/bioconda. This will install pysam from the bioconda channel and automatically makes sure that dependencies are installed. Also, compilation flags will be set automatically, which will potentially save a lot of trouble on OS X."

conda config --add channels r
conda config --add channels bioconda
conda install pysam
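
To confirm that the installed pysam meets the 0.22.0 minimum stated above, a small check along the following lines may help. This is a minimal sketch, not part of igv-reports: the helper names parse_version and pysam_is_recent_enough are hypothetical, and the version parsing is deliberately simple (it ignores suffixes such as "rc1").

```python
from importlib import metadata

MINIMUM_PYSAM = (0, 22, 0)  # minimum version stated in the installation notes


def parse_version(text):
    """Turn '0.22.1' into (0, 22, 1); non-numeric suffixes are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def pysam_is_recent_enough(installed):
    """Hypothetical helper: compare a version string to the minimum."""
    return parse_version(installed) >= MINIMUM_PYSAM


if __name__ == "__main__":
    try:
        installed = metadata.version("pysam")
    except metadata.PackageNotFoundError:
        print("pysam is not installed")
    else:
        status = "ok" if pysam_is_recent_enough(installed) else "too old"
        print(installed, status)
```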

Creating a report

Reports are created with the command line script create_report, or alternatively python igv_reports/report.py. Command line arguments are described below. Although --tracks is optional, a typical report will include at least an alignment track (BAM or CRAM) file from which the variants were called.

Arguments:

  • Required

    • sites VCF, BED, MAF, BEDPE, or generic tab delimited file of genomic variant sites. Tabix indexed files are supported and strongly recommended for large files.
    • --fasta Reference fasta file; must be indexed. One of either --fasta, --twobit, or --genome is required.
    • --twobit Reference twobit sequence file.
    • --genome An igv.js genome identifier (e.g. hg38). If supplied, the sequence, ideogram, and default annotation track for the specified genome will be used.
  • The arguments --begin, --end, and --sequence are required for a generic tab delimited sites file.

    • --begin INT. Column of start chromosomal position for sites file. Used for generic tab delimited input.
    • --end INT. Column of end chromosomal position for sites. Used for generic tab delimited input.
    • --sequence INT. Column of sequence (chromosome) name.
  • Optional coordinate system flag for generic tab delimited sites file only

    • --zero_based Specify that the position in the sites file is 0-based (e.g. UCSC files) rather than 1-based. Default is false.
  • Optional

    • --ideogram FILE. Ideogram file in UCSC cytoIdeo format. Useful when fasta is used to specify the reference.
    • --tracks LIST. Space-delimited list of track files, see below for supported formats. If both --tracks and --track-config are specified, the tracks from --tracks will appear first by default.
    • --track-config FILE. File containing array of json configuration objects for igv.js tracks. See the igv.js documentation for more details. This option allows customization of track parameters. When using this option, the track url and indexURL properties should be set to the paths of the respective files.
    • --roi LIST. Space-delimited list of region-of-interest (ROI) files. See the igv.js documentation.
    • --sampleinfo LIST. Space delimited list of sample information files. See the igv.js documentation.
    • --template FILE. HTML template file.
    • --output FILE. Output file name; default="igvjs_viewer.html".
    • --info-columns LIST. Space delimited list of info field names to include in the variant table. If sites is a VCF file, these are the info ID values. If sites is a tab delimited format, these are column names.
    • --info-columns-prefixes LIST. For VCF based reports only. Space delimited list of prefixes of VCF info field IDs to include in the variant table. Any info field with an ID starting with one of the listed values will be included.
    • --samples LIST. Space delimited list of sample (i.e. genotype) names. Used in conjunction with --sample-columns.
    • --sample-columns LIST. Space delimited list of VCF sample FORMAT field names to include in the variant table. If --samples is specified, columns will be restricted to those samples; otherwise all samples will be included.
    • --flanking INT. Genomic region to include either side of variant; default=1000.
    • --standalone Embed all JavaScript referenced via <script> tags in the page.
    • --sort Applies to alignment tracks only. If specified, alignments are initially sorted by the specified option. Supported values include BASE, STRAND, INSERT_SIZE, MATE_CHR, and NONE. Default value is BASE for single nucleotide variants, NONE (no sorting) otherwise. See the igv.js documentation for more information.
    • --exclude-flags INT. Value is passed to samtools as "-F" flag. Used to filter alignments. Default value is 1536 which filters alignments marked "duplicate" or "vendor failed". To include all alignments use --exclude-flags 0. See samtools documentation for more details.
    • --idlink URL template for information link for VCF ID values. The token $$ will be substituted with the ID value. Example: --idlink 'https://www.ncbi.nlm.nih.gov/snp/?term=$$'
    • --no-embed Don't embed data. Fasta and track URLs are referenced unchanged. The resulting report is dependent on the original data files, which must be specified as URLs. Local files are not supported with this option.
    • --subsample FLOAT. Output only a portion of the input alignments (a fraction between 0.0 and 1.0). See the samtools view documentation for more details.
    • --maxlen INT. Maximum length of a variant (SV) to show in a single view. Variants exceeding this length will be shown in a split-screen (multilocus) view; default=10000.
    • --translate-sequence-track Show a three-frame translation in the sequence track.
    • --tabulator Use the tabulator template for the table
    • --filter-config YAML config file for column setup for tabulator.
    • --merge-overlaps Merge overlapping intervals in multi-locus files such as bedpe. If set bedpe features with overlapping regions will be presented in a single locus view. Default is false.
    • --title STRING. Title for the report. Default is "IGV Report".
    • --header STRING. Path to a HTML file to be included in the report. The header file content will be included directly after the <body> tag in the report HTML file.
    • --footer STRING. Path to a HTML file to be included in the report. The footer file content will be included directly before the </body> tag in the report HTML file.
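
The default for --exclude-flags, 1536, is the sum of two SAM flag bits: 512 (failed vendor quality checks) and 1024 (duplicate). The sketch below decodes a flag value into its component bits and mimics the "-F" rule of excluding any alignment that has at least one of the listed bits set. The function names and the short bit labels are illustrative assumptions, not part of igv-reports or samtools; the bit values themselves come from the SAM specification.

```python
# Bit values from the SAM specification; the labels are informal shorthand.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def describe_flags(value):
    """List the names of all bits set in a SAM flag value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if value & bit]


def is_excluded(read_flag, exclude_flags=1536):
    """Mimic samtools -F: drop a read if ANY excluded bit is set."""
    return bool(read_flag & exclude_flags)


if __name__ == "__main__":
    print(describe_flags(1536))
    print(is_excluded(99 + 1024))  # a paired read also marked as duplicate
    print(is_excluded(99))         # the same read without the duplicate bit
```

With --exclude-flags 0 no bit can match, so no alignment is filtered.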

Track file formats:

Currently supported track file formats are BAM, CRAM, VCF, BED, GFF3, GTF, WIG, and BEDGRAPH. FASTA, BAM, CRAM, and VCF files must be indexed. Tabix is supported, and it is recommended that all large files be indexed.
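
For the --track-config option described above, the input is a file holding a JSON array of igv.js track configuration objects, each with url (and, for indexed formats, indexURL) set to the file paths. A minimal sketch of generating such a file follows. The helper names, the output file name tracks.json, the track names, and the index paths (.tbi and .bai next to the data files) are assumptions for illustration; consult the igv.js documentation for the full set of track properties.

```python
import json


def make_track_config(tracks):
    """Build a list of igv.js-style track objects.

    tracks is a list of (name, url, index_url) tuples; index_url may be None
    for formats that do not need an index.
    """
    configs = []
    for name, url, index_url in tracks:
        entry = {"name": name, "url": url}
        if index_url is not None:
            entry["indexURL"] = index_url
        configs.append(entry)
    return configs


def write_track_config(path, tracks):
    """Write the track array as JSON, the shape --track-config expects."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(make_track_config(tracks), handle, indent=2)


if __name__ == "__main__":
    # Index file locations below are assumed, not taken from the project.
    write_track_config(
        "tracks.json",
        [
            ("Variants", "test/data/variants/variants.vcf.gz",
             "test/data/variants/variants.vcf.gz.tbi"),
            ("Alignments", "test/data/variants/recalibrated.bam",
             "test/data/variants/recalibrated.bam.bai"),
        ],
    )
```

The resulting file could then be passed as --track-config tracks.json.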

Example

The script below creates a variant report from a VCF file and an alignment (BAM) file. Five info fields from the VCF are specified for inclusion in the variant table. The report is created for the hg38 genome and given a custom title.

create_report test/data/variants/variants.vcf.gz \
--genome hg38 \
--info-columns GENE TISSUE TUMOR COSMIC_ID SOMATIC \
--tracks test/data/variants/variants.vcf.gz test/data/variants/recalibrated.bam \
--title "IGV Variant Inspector" \
--output example_vcf.html

See the examples page for more examples.
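
When reports are generated from a pipeline rather than by hand, the same command can be assembled as an argument list and handed to subprocess. The sketch below only builds and prints the command; build_create_report_command is a hypothetical helper, and it uses only the flags documented above.

```python
import shlex


def build_create_report_command(sites, genome, info_columns, tracks, title, output):
    """Assemble a create_report argument list from documented options."""
    return [
        "create_report", sites,
        "--genome", genome,
        "--info-columns", *info_columns,
        "--tracks", *tracks,
        "--title", title,
        "--output", output,
    ]


if __name__ == "__main__":
    command = build_create_report_command(
        sites="test/data/variants/variants.vcf.gz",
        genome="hg38",
        info_columns=["GENE", "TISSUE", "TUMOR", "COSMIC_ID", "SOMATIC"],
        tracks=[
            "test/data/variants/variants.vcf.gz",
            "test/data/variants/recalibrated.bam",
        ],
        title="IGV Variant Inspector",
        output="example_vcf.html",
    )
    # Print a shell-quoted form for inspection instead of running it.
    print(shlex.join(command))
```

With igv-reports installed, the list could be run with subprocess.run(command, check=True). Passing a list rather than a single string avoids shell quoting problems with values such as the title.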


Converting genomic files to data URIs for use in igv.js

The script create_datauri (python igv_reports/datauri.py) converts the contents of a file to a data URI for use in igv.js. The data URI will be printed to stdout. NOTE: It is not necessary to run this script explicitly to create a report; it is documented here for use with stand-alone igv.js.

Convert a gzipped vcf file to a datauri.

create_datauri test/data/variants/variants.vcf.gz

Convert a slice of a local bam file to a datauri.

create_datauri --region chr5:474,969-475,009 test/data/variants/recalibrated.bam 

Convert a remote bam file to a datauri.

create_datauri --region chr5:474,969-475,009 https://1000genomes.s3.amazonaws.com/phase3/data/NA12878/alignment/NA12878.mapped.ILLUMINA.bwa.CEU.low_coverage.20121211.bam
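
For readers unfamiliar with data URIs, the general idea is to base64-encode the file bytes and prefix them with a media type. The sketch below shows that generic technique only. It is not the create_datauri implementation: the media type application/gzip and the function names are assumptions, and the actual script may compress or label the payload differently.

```python
import base64


def to_data_uri(payload, media_type="application/gzip"):
    """Encode raw bytes as a base64 data URI (media type is an assumption)."""
    encoded = base64.b64encode(payload).decode("ascii")
    return "data:" + media_type + ";base64," + encoded


def from_data_uri(uri):
    """Recover the original bytes from a base64 data URI."""
    header, encoded = uri.split(",", 1)
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    return base64.b64decode(encoded)


if __name__ == "__main__":
    sample = b"chr5\t474969\t475009\n"  # stand-in for real file contents
    uri = to_data_uri(sample)
    print(uri[:40])
    print(from_data_uri(uri) == sample)
```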

Release Notes

