A Fast Computational Tool for the Simulation and Analysis of Chromatin Loops
LoopSim
TODO:
- complete pipeline
- [?] improve performance (may already be as good as it gets in Python; probably just move to SLURM)
Installation
git clone https://github.com/CutaneousBioinf/hi-c
cd hi-c
python3 -m pip install .
You may receive a warning like "The script loopsim is not on PATH". You have two options to resolve this.
- Option one: add the directory containing the LoopSim entry point to your $PATH environment variable.
- Option two: invoke LoopSim with python3 -m loopsim instead of just loopsim.
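If you drive the pipeline from a Python script, the choice between the two options can be made at run time. The following is a minimal sketch using only the standard library; the helper name loopsim_command is hypothetical and not part of LoopSim.

```python
import shutil
import sys


def loopsim_command(which=shutil.which):
    """Return the argv prefix for invoking LoopSim.

    Uses the `loopsim` entry point when it is on PATH, otherwise falls
    back to `python -m loopsim` with the current interpreter.
    `which` is injectable so the PATH lookup can be replaced in tests.
    """
    if which("loopsim") is not None:
        return ["loopsim"]
    return [sys.executable, "-m", "loopsim"]


if __name__ == "__main__":
    # Show the command that would be used on this machine.
    print(" ".join(loopsim_command() + ["--help"]))
```

The returned list can be passed directly to subprocess.run together with the subcommand and its arguments.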
Using the pipeline
LoopSim is broken down into a number of commands that are designed to be chained.
The process should look something like this:
loopsim validate
- This step validates the input data and may remove erroneous entries.
loopsim simulate
- This produces a distribution of simulated loop files. Note that this can be a very intensive task, depending on the number of simulations you require. For more than 30 simulations, I recommend splitting the work into multiple batches, possibly as a collection of SLURM jobs.
loopsim analyze / loopsim bulk-analyze
- Use bulk-analyze to produce summary tables with overlaps for the simulated distribution of loop files. Use analyze to do the same for a single loop file, such as the original.
loopsim visualize
- Use this to produce visualizations and summary statistics, and to perform a statistical test comparing the original loop file against the simulated distribution.
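The batching advice for simulate can be sketched as follows. This is only an illustration: the helper names, the batch_N directory layout, and the default batch size of 30 are assumptions of this sketch. Only the --num-sims option and the three positional arguments come from the simulate help text. Each batch is given its own output directory because data in the simulation directory may be overwritten.

```python
import shlex


def plan_batches(total_sims, batch_size):
    """Split total_sims into batch sizes of at most batch_size each."""
    if total_sims < 0 or batch_size < 1:
        raise ValueError("total_sims must be >= 0 and batch_size >= 1")
    full, rest = divmod(total_sims, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


def simulate_commands(loop_file, region_file, out_root, total_sims, batch_size=30):
    """Build one `loopsim simulate` command line per batch.

    Every batch writes to its own directory (hypothetical layout:
    <out_root>/batch_<i>/) so that batches cannot overwrite each other.
    """
    commands = []
    for i, n in enumerate(plan_batches(total_sims, batch_size)):
        argv = ["loopsim", "simulate", "--num-sims", str(n),
                loop_file, region_file, f"{out_root}/batch_{i}/"]
        commands.append(shlex.join(argv))
    return commands


# Example: 70 simulations split into batches of at most 30.
for line in simulate_commands("loop_valid.loop", "data/chr_region_hg19", "sims", 70):
    print(line)
```

Each printed line could become the body of one SLURM job. How the per-batch directories are combined again before bulk-analyze is left open here.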
CLI
You can run loopsim --help for a broad overview of each of the commands.
$ loopsim --help
Usage: python -m loopsim [OPTIONS] COMMAND [ARGS]...
For more explanation of what every command does, please see the
documentation.
Options:
--delimiter TEXT delimiter for outputted files [default: tab]
--help Show this message and exit.
Commands:
analyze Perform analysis on a single loop file
bulk-analyze Perform analysis on a distribution of loop files
simulate Generate a distribution of simulations.
validate Validate input file and output a validated version.
visualize Get visualization and stats from distribution of ratios...
You can also run loopsim <COMMAND> --help for more detailed help messages on each of the commands.
For example, here is the help message for simulate:
$ loopsim simulate --help
Usage: python -m loopsim simulate [OPTIONS] LOOP_IN_FILE
CHROMOSOME_REGION_FILE
SIMULATION_DATA_DIRECTORY
Generate a distribution of simulations.
NOTE: any data in SIMULATION_DATA_DIRECTORY may be overwritten!!
Options:
--num-sims INTEGER number of simulations [default: 1]
--num-processes INTEGER number of threads to use
[default: round(multiprocessing.cpu_count() / 2)]
--help Show this message and exit.
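The default shown for --num-processes is a Python expression. Assuming it is evaluated literally (this sketch does not inspect LoopSim's source), the snippet below shows what it yields for a few CPU counts. The function name default_num_processes is hypothetical.

```python
import multiprocessing


def default_num_processes(cpu_count):
    """Evaluate round(cpu_count / 2) exactly as written in the help text."""
    return round(cpu_count / 2)


# Python's round() sends exact halves to the nearest even integer,
# so odd CPU counts do not always round up.
for cpus in (1, 3, 5, 10):
    print(cpus, "->", default_num_processes(cpus))

print("this machine:", default_num_processes(multiprocessing.cpu_count()))
```

Because round(0.5) is 0 in Python, a single-CPU machine would get 0 from the literal expression, so passing --num-processes explicitly may be the safer choice there.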
Demo
Validate
$ loopsim validate data/merged_5K_10K.loop loop_valid.loop data/chr_region_hg19
Input loop file: data/merged_5K_10K.loop
Output loop file: loop_valid.loop
Chromosome regions file: data/chr_region_hg19
Flagging loop ends that are >= 1.000000e+05
Delimiter for output: ' '
Validating loop data
Validation complete
Validated data outputted to file loop_valid.loop
Files after:
.
└── loop_valid.loop
Simulate
$ loopsim simulate --num-sims 2 loop_valid.loop data/chr_region_hg19 sims/
Input loop file: loop_valid.loop
Chromosome regions file: data/chr_region_hg19
Number of simulations: 2
Number of processes: 5
Outputting simulation files to directory: sims/
Delimiter for output: ' '
Simulation 0 simulation started
Simulation 1 simulation started
Simulation 0 simulation complete
Simulation 1 simulation complete
Simulation 0 data outputted to file: sims/sim_hi-c_0.loop
Simulation 1 data outputted to file: sims/sim_hi-c_1.loop
Files after:
.
└── sims
    ├── sim_hi-c_0.loop
    └── sim_hi-c_1.loop
Analyze
Bulk analysis:
$ loopsim bulk-analyze sims/ data/95_BCS_psor_loci ratios_out.txt --loop-out-directory loop_out_dir/
Input loop files directory: sims/
Intervals file: data/95_BCS_psor_loci
Ratio distribution file: ratios_out.txt
Delimiter for output: ' '
Output loop files directory: loop_out_dir/
Output directory does not exist.
Output directory created!
Finished outputting analyzed files to loop_out_dir/
Finished outputting ratio distribution to ratios_out.txt
Files after:
.
├── ratios_out.txt
└── loop_out_dir
    ├── summary_table_0.loop
    └── summary_table_1.loop
Analysis of the original file (validated):
$ loopsim analyze loop_valid.loop loop_analyzed.loop data/95_BCS_psor_loci
Input loop file: loop_valid.loop
Output loop file: loop_analyzed.loop
Intervals file: data/95_BCS_psor_loci
Delimiter for output: ' '
Outputted analyzed loop file to loop_analyzed.loop
Ratio of overlapping intervals out of the total number of loops was: 0.034299968818210166
Files after (though we don't use loop_analyzed.loop in the pipeline again):
.
└── loop_analyzed.loop
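To make the reported ratio more concrete, here is a rough sketch of one way such a number could be computed. It assumes a loop counts as overlapping when either of its two ends intersects an interval on the same chromosome, and it uses made-up tuple layouts for loops and intervals. LoopSim's actual file formats and overlap rule may differ.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if closed intervals [a_start, a_end] and [b_start, b_end] intersect."""
    return a_start <= b_end and b_start <= a_end


def overlap_ratio(loops, intervals):
    """Fraction of loops with at least one end overlapping an interval.

    loops:     iterable of (chrom, start1, end1, start2, end2), hypothetical layout
    intervals: iterable of (chrom, start, end), hypothetical layout
    """
    loops = list(loops)
    if not loops:
        return 0.0
    # Group intervals by chromosome so each loop only checks its own chromosome.
    by_chrom = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = 0
    for chrom, s1, e1, s2, e2 in loops:
        candidates = by_chrom.get(chrom, [])
        if any(overlaps(s1, e1, s, e) or overlaps(s2, e2, s, e) for s, e in candidates):
            hits += 1
    return hits / len(loops)


# Toy data, made up for illustration only.
toy_loops = [
    ("chr1", 100, 200, 900, 1000),   # first end overlaps the chr1 interval
    ("chr1", 300, 400, 1200, 1300),  # no overlap
    ("chr2", 50, 60, 500, 600),      # second end overlaps the chr2 interval
    ("chr3", 10, 20, 30, 40),        # no intervals on chr3
]
toy_intervals = [("chr1", 150, 250), ("chr2", 550, 700)]
print(overlap_ratio(toy_loops, toy_intervals))
```

On the toy data two of the four loops overlap an interval, so the ratio is 0.5.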
Visualize
$ loopsim visualize ratios_out.txt dist_plot.jpg --other 0.034299968818210166
Obtaining overlapping ratios from: ratios_out.txt.
Exported plot to dist_plot.jpg
Summary stats:
Distribution mean: 0.0178775595052489
Distribution std: 0.000808458018194828
Distribution min: 0.0173058933582787
Distribution median: 0.0178775595052489
Distribution max: 0.0184492256522191
Calculating p-value based on empirical distribution:
p-value: 0.0
Calculating p-value based on normal distribution:
p-value: 0.0
Note: $p = 0$ is probably an artifact of using only $N = 2$ simulations.
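The effect of such a small $N$ can be reproduced with a short sketch. It assumes one-sided, upper-tail p-values (the share of simulated ratios at least as large as the observed one, and the upper tail of a normal fit). LoopSim's own test may be defined differently, and the function names are hypothetical. With $N = 2$, the minimum and maximum in the summary stats are the two simulated ratios.

```python
import math
import statistics


def empirical_p(simulated, observed):
    """Share of simulated ratios that are >= the observed ratio."""
    return sum(r >= observed for r in simulated) / len(simulated)


def empirical_p_plus_one(simulated, observed):
    """Add-one variant, which can never return exactly 0."""
    return (sum(r >= observed for r in simulated) + 1) / (len(simulated) + 1)


def normal_p(simulated, observed):
    """Upper-tail p-value under a normal fit (sample mean, sample std)."""
    mu = statistics.mean(simulated)
    sigma = statistics.stdev(simulated)  # uses n - 1 in the denominator
    z = (observed - mu) / sigma
    # erfc keeps precision far out in the tail, where 1 - cdf would round to 0.
    return 0.5 * math.erfc(z / math.sqrt(2))


# Values taken from the demo output above.
simulated = [0.0173058933582787, 0.0184492256522191]
observed = 0.034299968818210166

print("empirical:", empirical_p(simulated, observed))
print("add-one:  ", empirical_p_plus_one(simulated, observed))
print("normal:   ", normal_p(simulated, observed))
```

With only two simulations, neither of which reaches the observed ratio, the empirical p-value is exactly 0, while the add-one variant gives 1/3. The normal-fit value is tiny but nonzero, and a standard deviation estimated from two points is not reliable. More simulations are needed before either number is meaningful.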