A package to analyze the data generated by Hi-C Capture for ssDNA
Hi-C ssDNA: a project that analyzes fastq files generated by the ssDNA Hi-C Capture protocol developed by the Piazza lab
Description
This project analyzes the sequencing data generated after the ssDNA HiC Capture protocol. This package contains two principal modules:
1. oligos_replacement
It generates a new genome from the original genome and the oligos designed for the ssDNA Hi-C Capture protocol. The new genome is a copy of the original except in the oligo regions, where the sequence is replaced by the oligo sequence. It then appends, at the end of the new genome, an artificial chromosome named chr_art, which is the concatenation of the original sequences of the oligos with their flanking regions.
The program also creates a .bed file that contains the coordinates of the oligos in the new genome and in the artificial chromosome, and indicates whether each sequence is a flanking region or the oligo itself.
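The replacement and the construction of chr_art described above can be sketched in a few lines of Python. This is only an illustration, not the package's implementation: the helper names `replace_oligo` and `flanked_original` are hypothetical, the genome is held as a plain dict, the modified oligo is assumed to have the same length as the original, and oligo orientation is ignored.

```python
def replace_oligo(chrom_seq, start, end, new_seq):
    """Replace the 1-based inclusive interval [start, end] with new_seq."""
    return chrom_seq[:start - 1] + new_seq + chrom_seq[end:]


def flanked_original(chrom_seq, start, end, flank):
    """Return the original oligo plus up to `flank` bases on each side."""
    left = max(0, start - 1 - flank)
    right = min(len(chrom_seq), end + flank)
    return chrom_seq[left:right]


# Toy data (hypothetical): one chromosome and one oligo.
genome = {"chr1": "AAAACCCCGGGGTTTT"}
oligos = [{"chr": "chr1", "start": 5, "end": 8, "sequence_modified": "CTCT"}]
flank = 2

new_genome = dict(genome)
art_parts = []
for oligo in oligos:
    # The artificial chromosome is built from the ORIGINAL sequence.
    art_parts.append(
        flanked_original(genome[oligo["chr"]], oligo["start"], oligo["end"], flank)
    )
    # The new genome carries the modified oligo at the same coordinates
    # (this assumes the modified sequence has the same length).
    new_genome[oligo["chr"]] = replace_oligo(
        new_genome[oligo["chr"]],
        oligo["start"],
        oligo["end"],
        oligo["sequence_modified"],
    )

# chr_art is appended after the regular chromosomes.
new_genome["chr_art"] = "".join(art_parts)
```

With this toy input, chr1 keeps its length and only positions 5 to 8 change, while chr_art holds the original four bases with two flanking bases on each side.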
With the new genome created, the user can run the hicstuff package to produce a fragments_list file and a contacts file, which are both tsv files (hicstuff writes them with a .txt extension, but they are tab-separated). Please check the hicstuff documentation for the structure of those files: https://github.com/koszullab/hicstuff#file-formats.
Both files, in the correct format, are required by the next module, contacts_filter.
2. contacts_filter
This module filters the contacts. It removes the contacts in which none of the fragments in the oligos
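One plausible reading of this filtering step is that a contact is kept when at least one of its two fragments overlaps an oligo. The sketch below illustrates that reading with pandas on small in-memory tables. It is not the package's code: the contacts column names (`frag_a`, `frag_b`, `contacts`), the use of the row position as fragment index, and the overlap rule are all assumptions made for the example.

```python
import pandas as pd

# Toy fragments table (hypothetical values); the row position is used
# as the fragment index referenced by the contacts table.
fragments = pd.DataFrame({
    "chrom": ["chr1", "chr1", "chr1", "chr2"],
    "start_pos": [0, 100, 200, 0],
    "end_pos": [100, 200, 300, 150],
})

# Toy oligos table using the headers described in this README.
oligos = pd.DataFrame({
    "chr": ["chr1"],
    "start": [120],
    "end": [160],
    "name": ["oligo_a"],
})

# Toy contacts table; column names are assumptions for this sketch.
contacts = pd.DataFrame({
    "frag_a": [0, 1, 2],
    "frag_b": [2, 3, 3],
    "contacts": [5, 7, 1],
})


def fragments_with_oligo(fragments, oligos):
    """Return the sorted indices of fragments overlapping at least one oligo."""
    hits = set()
    for oligo in oligos.itertuples(index=False):
        overlap = fragments[
            (fragments["chrom"] == oligo.chr)
            & (fragments["start_pos"] < oligo.end)
            & (fragments["end_pos"] >= oligo.start)
        ]
        hits.update(int(i) for i in overlap.index)
    return sorted(hits)


hits = fragments_with_oligo(fragments, oligos)

# Keep a contact when either of its fragments overlaps an oligo.
keep = contacts["frag_a"].isin(hits) | contacts["frag_b"].isin(hits)
filtered = contacts[keep].reset_index(drop=True)
```

Here only fragment 1 overlaps the oligo, so only the contact between fragments 1 and 3 is kept.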
Dependencies
Python3 dependencies:
pandas
sys
getopt
Installation
The easiest way to install hic-ssdna is with pip:
pip3 install hic-ssdna
Run the program
Once installed, you can run the first main script, oligos_replacement, like this:
hic-ssdna.oligos_replacement <arg1> <arg2> <arg3> <arg4> <arg5>
It takes five arguments:
- The original genome path
- The oligos file path
- An output path where the new genome will be created
- An output path where the .bed file will be created
- The length of the flanking region you want
The second main script, contacts_filter:
hic-ssdna.contacts_filter <arg1> <arg2> <arg3> <arg4>
It takes four arguments:
- The oligos file path: -o <oligos_input.csv>
- The fragments file path: -f <fragments_input.txt> (produced by hicstuff)
- The contacts file path: -c <contacts_input.txt> (produced by hicstuff)
- An output path where the filtered contacts file will be saved: -O <output_contacts_filtered.csv>
You can call the script with the -h
argument to see
Formats and conventions
The following formats and conventions must be respected for the project to work correctly.
File formats
- Genomes: fasta
- Oligos file: csv (with col sep = ',')
Oligo file structure
This file must contain at least 6 columns, with the exact headers below:
chr | start | end | orientation | type | name | sequence_original | sequence_modified |
---|---|---|---|---|---|---|---|
- In the chr column, the entire chromosome description line from the fasta file, without the leading chevron (>)
- In the start column, the position of the first nucleotide (included) of the oligo (the first nucleotide of the chromosome is number 1)
- In the end column, the position of the last nucleotide (included) of the oligo
- In the orientation column, C for Crick and W for Watson
- In the type column, ss (ssDNA HiC oligos captured), ss_neg for the ssDNA negative control (ssDNA HiC oligos not captured), ds (dsDNA HiC oligos captured), ds_neg for the dsDNA negative control (dsDNA HiC oligos not captured)
- In the name column, the name of the oligo; all names must be different
- In the sequence_original column, the original sequence of the oligo
- In the sequence_modified column, the modified sequence of the oligo
The first oligo is the number 0.
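Before running the scripts, it can help to check an oligos file against the conventions listed above. The following is a minimal sketch with pandas, not part of the package: the function name `check_oligos` is hypothetical, the example rows are invented, and the sketch assumes that all eight listed headers are present.

```python
import io

import pandas as pd

REQUIRED = [
    "chr", "start", "end", "orientation", "type", "name",
    "sequence_original", "sequence_modified",
]
ORIENTATIONS = {"W", "C"}
TYPES = {"ss", "ss_neg", "ds", "ds_neg"}


def check_oligos(df):
    """Return a list of human-readable problems found in an oligos table."""
    missing = [col for col in REQUIRED if col not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    if not df["orientation"].isin(ORIENTATIONS).all():
        problems.append("orientation must be W or C")
    if not df["type"].isin(TYPES).all():
        problems.append("type must be ss, ss_neg, ds or ds_neg")
    if df["name"].duplicated().any():
        problems.append("duplicated oligo names")
    # Positions are 1-based and both ends are included.
    if (df["start"] < 1).any() or (df["end"] < df["start"]).any():
        problems.append("invalid start/end positions")
    return problems


# Invented example: the second row reuses the name of the first one.
csv_text = """chr,start,end,orientation,type,name,sequence_original,sequence_modified
chr1,5,8,W,ss,oligo_a,CCCC,CTCT
chr1,20,23,C,ds_neg,oligo_a,GGGG,GAGA
"""
oligos = pd.read_csv(io.StringIO(csv_text), sep=",")
problems = check_oligos(oligos)
```

For a real file, `io.StringIO(csv_text)` would be replaced by the path of the oligos csv.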
.bed file structure
The generated .bed file is a bed4 file.
- The first column indicates the chromosome name (the classic genome or the artificial chromosome)
- The second column contains the position of the first nucleotide (included)
- The third column contains the position of the last nucleotide (included)
- The fourth column is oligo x, where x is the oligo number, followed by flank 5' or flank 3' when the sequence is a flanking region on the 5' or 3' side, and by nothing when the sequence is the oligo itself.
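The four columns described above can be read back with a short parser. This is a sketch under assumptions: the exact spelling of the fourth column (for instance `oligo 0 flank 5'`) is inferred from the description rather than from an actual output file, and `parse_bed_line` is a hypothetical helper. Because both positions are included, the length is computed as end - start + 1.

```python
import re

# Assumed layout of the fourth column: "oligo <x>" optionally followed
# by "flank 5'" or "flank 3'".
NAME_RE = re.compile(r"^oligo\s*(\d+)(?:\s*flank\s*([35])')?$")


def parse_bed_line(line):
    """Parse one tab-separated bed4 line into a dict."""
    chrom, start, end, name = line.rstrip("\n").split("\t")
    match = NAME_RE.match(name.strip())
    if match is None:
        raise ValueError(f"unexpected name field: {name!r}")
    side = match.group(2)
    return {
        "chrom": chrom,
        "start": int(start),
        "end": int(end),
        "oligo": int(match.group(1)),
        # "oligo" for the oligo itself, otherwise the flank side.
        "kind": "oligo" if side is None else f"flank {side}'",
        # Both positions are included, hence the +1.
        "length": int(end) - int(start) + 1,
    }


# Invented example lines.
flank_record = parse_bed_line("chr_art\t1\t20\toligo 0 flank 5'\n")
oligo_record = parse_bed_line("chr_art\t21\t100\toligo 0\n")
```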