General purpose stuff to generate and handle Hi-C data in its simplest form.

These details have been verified by PyPI

Maintainers

AmauryB baudrly cmdoret koszullab TheoFR

These details have not been verified by PyPI

Project links

Homepage

Project description

hicstuff

PyPI - Python Version

A lightweight library that generates and handles Hi-C contact maps in either cooler-compatible 2Dbedgraph or instaGRAAL format. It is essentially a merge of the yahcp pipeline, the hicstuff library and extra features illustrated in the 3C tutorial and the DADE pipeline, all packaged together for extra convenience.

The goal is to make generation and manipulation of Hi-C matrices as simple as possible and work for any organism.

Installation
Usage
- Full pipeline
- Individual components
Library
Connecting the modules
File formats

Installation

To install a stable version:

   pip3 install -U hicstuff

or, for the latest development version:

    pip3 install -e git+https://github.com/koszullab/hicstuff.git@master#egg=hicstuff

Usage

Full pipeline

All components of the pipelines can be run at once using the hicstuff pipeline command. This allows to generate a contact matrix from reads in a single command. By default, the output sparse matrix is in GRAAL format, but it can be a 2D bedgraph file if required. More detailed documentation can be found on the readthedocs website: https://hicstuff.readthedocs.io/en/latest/index.html

usage:
    pipeline [--quality-min=INT] [--size=INT] [--no-cleanup] [--start-stage=STAGE]
             [--threads=INT] [--aligner=bowtie2] [--matfmt=FMT] [--prefix=PREFIX]
             [--tmpdir=DIR] [--iterative] [--outdir=DIR] [--filter] [--enzyme=ENZ]
             [--plot] [--circular] [--distance_law] [--duplicates]
             [--centromeres=FILE] --genome=FILE <input1> [<input2>]

arguments:
    input1:             Forward fastq file, if start_stage is "fastq", sam
                        file for aligned forward reads if start_stage is
                        "sam", or a .pairs file if start_stage is "pairs".
    input2:             Reverse fastq file, if start_stage is "fastq", sam
                        file for aligned reverse reads if start_stage is
                        "sam", or nothing if start_stage is "pairs".

For example, to run the pipeline with minimap2 using 8 threads and generate a matrix in instagraal format in the directory out:

hicstuff pipeline -t 8 -a minimap2 -e DpnII -o out/ -g genome.fa reads_for.fq reads_rev.fq

The pipeline can also be run from python, using the hicstuff.pipeline submodule. For example, this would run the pipeline with bowtie2 (default) using iterative alignment and keep all intermediate files.

from hicstuff import pipeline as hpi

hpi.full_pipeline(
    'genome.fa', 
    'end1.fq', 
    'end2.fq', 
    no_cleanup=True
    iterative=True
    out_dir='out', 
    enzyme="DpnII")

The general steps of the pipeline are as follows:

Pipeline flowchart

Individual components

For more advanced usage, different scripts can be used independently on the command line to perform individual parts of the pipeline. This readme contains quick descriptions and example usages. To obtain detailed instructions on any subcommand, one can use hicstuff <subcommand> --help.

Iterative alignment

Truncate reads from a fastq file to 20 basepairs and iteratively extend and re-align the unmapped reads to optimize the proportion of uniquely aligned reads in a 3C library.

usage:
    iteralign [--aligner=bowtie2] [--threads=1] [--min_len=20]
              [--tempdir DIR] --out_sam=FILE --genome=FILE <reads.fq>

Digestion of the genome

Digests a fasta file into fragments based on a restriction enzyme or a fixed chunk size. Generates two output files into the target directory named "info_contigs.txt" and "fragments_list.txt"

usage:
    digest [--plot] [--figdir=FILE] [--circular] [--size=INT]
           [--outdir=DIR] --enzyme=ENZ <fasta>

For example, to digest the yeast genome with MaeII and HinfI and show histogram of fragment lengths:

hicstuff digest --plot --outdir output_dir --enzyme MaeII,HinfI Sc_ref.fa

Filtering of 3C events

Filters spurious 3C events such as loops and uncuts from the library based on a minimum distance threshold automatically estimated from the library by default. Can also plot 3C library statistics. This module takes a pairs file with 9 columns as input (readID, chr1, pos1, chr2, pos2, strand1, strand2, frag1, frag2) and filters it.

usage:
    filter [--interactive | --thresholds INT-INT] [--plot]
           [--figdir FILE] <input.pairs> <output.pairs>

Viewing the contact map

Visualize a Hi-C matrix file as a heatmap of contact frequencies. Allows to tune visualisation by binning and normalizing the matrix, and to save the output image to disk. If no output is specified, the output is displayed interactively. If two contact maps are provided, the log ratio of the first divided by the second will be shown.

usage:
    view [--binning=1] [--despeckle] [--frags FILE] [--trim INT]
         [--normalize] [--max=99] [--output=IMG] [--cmap=CMAP]
         [--log] [--region=STR] <contact_map> [<contact_map2>]

For example, to view a 1Mb region of chromosome 1 from a full genome Hi-C matrix rebinned at 10kb:

    hicstuff view --normalize --binning 10kb --region chr1:10,000,000-11,000,000 --frags fragments_list.txt contact_map.tsv

Library

All components of the hicstuff program can be used as python modules. See the documentation on reathedocs. The expected contact map format for the library is a simple CSV file, and the objects handled by the library are simple numpy arrays. The various submodules of hicstuff contain various utilities.

import hicstuff.digest # Functions to work with restriction fragments
import hicstuff.iteralign # Functions related to iterative alignment
import hicstuff.hicstuff # Contains utilities to modify and operate on contact maps as numpy arrays
import hicstuff.filter # Functions for filtering 3C events by type (uncut, loop)
import hicstuff.view # Utilities to visualise contact maps
import hicstuff.io # Reading and writing hicstuff files
import hicstuff.pipeline # Generation and processing of files to generate matrices.

Connecting the modules

All the steps described here are handled automatically when running the hicstuff pipeline. But if you want to connect the different modules manually, the intermediate input and output files must be processed using light python scripting.

Aligning the reads

You can generate SAM files independently using your favorite read mapping software, use the command line utility hicstuff iteralign, or use the helper function align_reads in the submodule hicstuff.pipeline. For example, to perform iterative alignment using minimap2 (instead of bowtie2 by default):

Using the python function:

from hicstuff import pipeline as hpi

hpi.align_reads("end1.fastq", "genome.fasta", "end1.sam", iterative=True, minimap2=True)

Using the command line tool:

hicstuff iteralign --minimap2 --iterative -f genome.fasta -o end1.sam end1.fastq

Extracting contacts from the alignment

The output from hicstuff iteralign is a SAM file. In order to retrieve Hi-C pairs, you need to run iteralign separately on the two fastq files and process the resulting alignment files as follows using the pipeline submodules of hicstuff.

from hicstuff import pipeline as hpi
import pysam as ps
# Sort alignments by read names
ps.sort("-n", "-O", "SAM", "-o", "end1.sam.sorted", "end1.sam")
ps.sort("-n", "-O", "SAM", "-o", "end2.sam.sorted", "end2.sam")
# Combine SAM files
hpi.sam2pairs("end1.sorted.sam", "end2.sorted.sam", "output.pairs", "info_contigs.txt", min_qual=30)

This will generate a "pairs" file containing all read pairs where both reads have been aligned with a mapping quality of at least 30.

Attributing each read to a restriction fragment

To build a a contact matrix, we need to attribute each read to a fragment in the genome. This is done under the hood by performing a binary search for each read position against the list of restriction sites in the genome.

from hicstuff import digest as hcd
from Bio import SeqIO

# Build a list of restriction sites for each chromosome
restrict_table = {}
for record in SeqIO.parse("genome.fasta", "fasta"):
    # Get chromosome restriction table
    restrict_table[record.id] = hcd.get_restriction_table(
        record.seq, enzyme, circular=circular
    )

# Add fragment index to pairs (readID, chr1, pos1, chr2,
# pos2, strand1, strand2, frag1, frag2)
hcd.attribute_fragments("output.pairs", "output_indexed.pairs", restrict_table)

Filtering pairs

The resulting pairs file can then be filtered, either in the command line using the hicstuff filter command, or in python using the hicstuff.filter submodule. Otherwise, the matrix can be built directly from the unfiltered pairs.

Filtering on the command line:

hicstuff filter output_indexed.pairs output_filtered.pairs

Filtering in python:

from hicstuff import filter as hcf

uncut_thr, loop_thr = hcf.get_thresholds("output_indexed.pairs")
hcf.filter_events("output_indexed.pairs", "output_filtered.pairs", uncut_thr, loop_thr)

Note that both the command and the python function have various options to generate figure or tweak the filtering thresholds. These options can be displayed using hicstuff filter -h

Matrix generation

A Hi-C sparse contact matrix can then be generated using the python submodule hicstuff.pipeline. The matrix can be generated either in GRAAL format, or bedgraph2 (WIP) for cooler compatibility.

from hicstuff import pipeline as hpi

n_frags = sum(1 for line in open(fragments_list, "r")) - 1
hpi.pairs2matrix("output_filtered.pairs", "abs_fragments_contacts_weighted.txt", 'fragments_list.txt', mat_fmt="GRAAL")

File formats

pairs files: This format is used for all intermediate files in the pipeline and is also used by hicstuff filter. It is a space-separated format holding informations about Hi-C pairs. It has an official specification defined by the 4D Nucleome data coordination and integration center.
2D bedgraph: This is an optional output format of hicstuff pipeline for the sparse matrix. It has two fragment per line, and the number of times they are found together. It has the following fields: chr1, start1, end1, chr2, start2, end2, occurences
- Those files can be loaded by cooler using cooler load -f bg2 <chrom.sizes>:<binsize> in.bg2.gz out.cool where chrom.sizes is a tab delimited file with chromosome names and length on each line, and binsize is the size of bins in the matrix.
GRAAL sparse matrix: This is a simple tab-separated file with 3 columns: frag1, frag2, contacts. The id columns correspond to the absolute id of the restriction fragments (0-indexed). The first row is a header containing the number of rows, number of columns and number of nonzero entries in the matrix. Example:

564	564	6978
0	0	3
1	2	4
1	3	3

fragments_list.txt: This tab separated file provides information about restriction fragments positions, size and GC content. Note the coordinates are 0 point basepairs, unlike the pairs format, which has 1 point basepairs. Example:
- id: 1 based restriction fragment index within chromosome.
- chrom: Chromosome identifier. Order should be the same as in info_contigs.txt or pairs files.
- start_pos: 0-based start of fragment, in base pairs.
- end_pos: 0-based end of fragment, in base pairs.
- size: Size of fragment, in base pairs.
- gc_content: Proportion of G and C nucleotide in the fragment.

id	chrom	start_pos	end_pos	size	gc_content
1	seq1	0	21	21	0.5238095238095238
2	seq1	21	80	59	0.576271186440678
3	seq1	80	328	248	0.5201612903225806

info_contigs.txt: This tab separated file gives information on contigs, such as number of restriction fragments and size. Example:
- contig: Chromosome identified. Order should be the same in pairs files or fragments_list.txt.
- length: Chromosome length, in base pairs.
- n_frags: Number of restriction fragments in chromosome.
- cumul_length: Cumulative length of previous chromosome, in base pairs.

contig	length	n_frags	cumul_length
seq1	60000	409	0
seq2	20000	155	409

Project details

These details have been verified by PyPI

Maintainers

AmauryB baudrly cmdoret koszullab TheoFR

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

3.2.4

Nov 19, 2024

3.2.3

May 14, 2024

3.2.2

Feb 15, 2024

3.2.1

Sep 6, 2023

3.2.0

Aug 31, 2023

3.1.7

Aug 24, 2023

3.1.6

Jun 8, 2023

3.1.5

Oct 6, 2022

3.1.4

Jul 8, 2022

3.1.3

Apr 15, 2022

3.1.2

Jan 18, 2022

3.1.1

Oct 15, 2021

3.1.0

Aug 24, 2021

3.0.3

Jul 26, 2021

3.0.2

Jun 1, 2021

3.0.1

Apr 27, 2021

3.0.0

Apr 19, 2021

2.3.3

Jan 18, 2021

2.3.2

Oct 9, 2020

2.3.1

Oct 5, 2020

2.3.0

Mar 16, 2020

2.2.2

Jan 23, 2020

2.2.1

Jan 21, 2020

2.2.0

Jan 20, 2020

2.1.3

Jan 20, 2020

2.1.2

Jan 9, 2020

2.1.0

Dec 20, 2019

2.0.8

Dec 10, 2019

2.0.7

Dec 10, 2019

2.0.6

Dec 10, 2019

2.0.5

Dec 10, 2019

2.0.4

Dec 9, 2019

2.0.3

Dec 4, 2019

2.0.2

Nov 29, 2019

2.0.1

Nov 28, 2019

2.0.0

Nov 27, 2019

1.6.9

Nov 21, 2019

1.6.8

Nov 6, 2019

1.6.7

Oct 31, 2019

1.6.6

Oct 24, 2019

1.6.5

Oct 24, 2019

1.6.4

Oct 23, 2019

1.5.14

Oct 3, 2019

1.5.11

Oct 2, 2019

1.5.9

Sep 24, 2019

1.5.8

Sep 10, 2019

1.5.7

Sep 4, 2019

1.5.6

Aug 22, 2019

1.5.5

Aug 21, 2019

1.5.2

Aug 7, 2019

1.5.1

Aug 6, 2019

1.5.0

Aug 5, 2019

1.4.10

Jul 29, 2019

1.4.9

Jul 29, 2019

1.4.7

Jul 17, 2019

1.4.5

Jul 16, 2019

1.4.4

Jul 4, 2019

1.4.2

Jun 12, 2019

1.4.0

Jun 12, 2019

1.3.0

May 28, 2019

1.2.28

May 16, 2019

1.2.27

May 16, 2019

1.2.26

May 7, 2019

1.2.25

May 7, 2019

This version

1.2.24

May 3, 2019

1.2.23

May 3, 2019

1.2.22

May 2, 2019

1.2.21

May 2, 2019

1.2.20

May 2, 2019

1.2.19

Apr 30, 2019

1.2.18

Apr 29, 2019

1.2.17

Apr 24, 2019

1.2.16

Apr 18, 2019

1.2.15

Apr 18, 2019

1.2.14

Apr 16, 2019

1.2.13

Apr 5, 2019

1.2.12

Apr 5, 2019

1.2.11

Apr 1, 2019

1.2.10

Mar 31, 2019

1.2.9

Mar 30, 2019

1.2.8

Mar 29, 2019

1.2.7

Mar 29, 2019

1.2.6

Mar 29, 2019

1.1.6

Mar 29, 2019

1.0.6

Mar 11, 2019

1.0.5

Mar 6, 2019

1.0.4

Mar 5, 2019

1.0.3

Mar 4, 2019

1.0.2

Mar 3, 2019

1.0.1

Feb 28, 2019

1.0.0

Feb 20, 2019

0.2.18

Feb 14, 2019

0.2.17

Feb 13, 2019

0.2.16

Feb 13, 2019

0.2.15

Feb 12, 2019

0.2.14

Feb 6, 2019

0.2.13

Feb 6, 2019

0.2.12

Feb 6, 2019

0.2.11

Feb 6, 2019

0.2.10

Feb 6, 2019

0.2.9

Feb 6, 2019

0.2.8

Feb 4, 2019

0.2.7

Feb 4, 2019

0.2.6

Jan 31, 2019

0.2.5

Jan 30, 2019

0.2.4

Jan 29, 2019

0.2.3

Jan 28, 2019

0.2.2

Jan 28, 2019

0.2.1

Jan 25, 2019

0.1.21

Jan 24, 2019

0.1.20

Jan 24, 2019

0.1.19

Jan 24, 2019

0.1.18

Jan 23, 2019

0.1.17

Jan 18, 2019

0.1.16

Jan 18, 2019

0.1.15

Jan 17, 2019

0.1.14

Jan 16, 2019

0.1.13

Jan 16, 2019

0.1.12

Jan 16, 2019

0.1.11

Jan 11, 2019

0.1.10

Jan 11, 2019

0.1.9

Jan 8, 2019

0.1.8

Jan 8, 2019

0.1.7

Jan 8, 2019

0.1.6

Jan 8, 2019

0.1.5

Jan 3, 2019

0.1.4

Dec 20, 2018

0.1.3

Dec 11, 2018

0.1.2

Dec 6, 2018

0.1.1

Dec 6, 2018

0.1.0

Dec 6, 2018

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hicstuff-1.2.24.tar.gz (73.0 kB view details)

Uploaded May 3, 2019 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

hicstuff-1.2.24-py3-none-any.whl (98.5 kB view details)

Uploaded May 3, 2019 Python 3

File details

Details for the file hicstuff-1.2.24.tar.gz.

File metadata

Download URL: hicstuff-1.2.24.tar.gz
Upload date: May 3, 2019
Size: 73.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.18.4 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.30.0 CPython/3.6.8

File hashes

Hashes for hicstuff-1.2.24.tar.gz
Algorithm	Hash digest
SHA256	`53eafb270c22ff098c58877e18e81ae68b8480b9f1a4bd4f4cf841b93c2160aa`
MD5	`d6573afc4ff2216ffde84184ec281677`
BLAKE2b-256	`7d49c70a7bfb82da7b862669d8e705546daa168da5a8961e2b8fc1827502e8a5`

See more details on using hashes here.

File details

Details for the file hicstuff-1.2.24-py3-none-any.whl.

File metadata

Download URL: hicstuff-1.2.24-py3-none-any.whl
Upload date: May 3, 2019
Size: 98.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.18.4 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.30.0 CPython/3.6.8

File hashes

Hashes for hicstuff-1.2.24-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7663ba90282cc3eaf562cc3cbff69ceafd81b412dca6806d5e25b28d5d116265`
MD5	`9cfb9427348539c5c953630e3d1043a7`
BLAKE2b-256	`5a3a0267ca14e6b50eaec98c70717c78aa46d0522df57371e52fefa0ecb58d6b`

See more details on using hashes here.

hicstuff 1.2.24

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

hicstuff

Table of contents

Installation

Usage

Full pipeline

Individual components

Iterative alignment

Digestion of the genome

Filtering of 3C events

Viewing the contact map

Library

Connecting the modules

Aligning the reads

Extracting contacts from the alignment

Attributing each read to a restriction fragment

Filtering pairs

Matrix generation

File formats

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes