A package to detect IBS regions
IBSpy
Python library to identify Identical By State regions
To build the KMC k-mer database used by the tests, run this command:
kmc -k31 -r -ci1 -fm data/test4B.jagger.fa data/test4B.jagger.kmc_k31 tmp
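If you want to script this step, the sketch below rebuilds the same kmc call from Python. The helper names are hypothetical, not part of IBSpy. The flag meanings follow the KMC3 documentation: -k sets the k-mer length, -r turns on RAM-only mode, -ci drops k-mers seen fewer than the given number of times, and -fm declares multi-FASTA input.

```python
import os
import shutil
import subprocess


def kmc_command(fasta, db_name, tmp_dir, k=31, min_count=1, ram_only=True):
    """Return the argument list for the kmc call shown above (hypothetical helper)."""
    cmd = ["kmc", f"-k{k}"]          # -k: k-mer length
    if ram_only:
        cmd.append("-r")             # -r: RAM-only mode
    cmd += [f"-ci{min_count}",       # -ci: exclude k-mers seen fewer than min_count times
            "-fm",                   # -fm: input is multi-FASTA
            fasta, db_name, tmp_dir]
    return cmd


def build_kmc_database(fasta, db_name, tmp_dir="tmp", **options):
    """Run kmc, creating the working directory first if it does not exist."""
    if shutil.which("kmc") is None:
        raise FileNotFoundError("the kmc binary is not on PATH")
    os.makedirs(tmp_dir, exist_ok=True)
    subprocess.run(kmc_command(fasta, db_name, tmp_dir, **options), check=True)
```

For example, build_kmc_database("data/test4B.jagger.fa", "data/test4B.jagger.kmc_k31") reproduces the command above.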
Installing IBSpy
The easiest way to install IBSpy is to use pip3.
pip3 install IBSpy
If pip3 fails, you can clone the project and compile it with:
pip3 install cython biopython pyfaidx
python3 setup.py develop
Then you should have the IBSpy command available.
KMC3
If you want to use the KMC bindings, install KMC and compile its Python bindings (py_kmc_api) following the KMC instructions.
Then run the following commands to set up the path for them:
cd KMC/py_kmc_api
source set_path.sh
Preparing the databases
IBSpy requires a k-mer database built from the sequencing files. The accepted formats are listed under --database_format in the help output below; the two described here are:
- Jellyfish: follow the instructions on its website.
- kmerGWAS: uses an ad hoc file format that contains only the k-mers, sorted, in a binary representation. This option is faster than the Jellyfish version, but creating the k-mer table is less straightforward. The manual is here.
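The kmerGWAS on-disk layout is documented in its manual and is not reproduced here. The following is only a generic sketch of why a sorted binary k-mer table is fast to query: each k-mer is packed into an integer using 2 bits per base (so a 31-mer fits in 62 bits of a 64-bit word), and a membership test is a binary search. The base encoding (A=0, C=1, G=2, T=3) and the function names are assumptions for illustration.

```python
from bisect import bisect_left

# Assumed 2-bit code per base; not necessarily the kmerGWAS encoding.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmer(kmer):
    """Pack a k-mer into an integer, 2 bits per base, first base most significant."""
    value = 0
    for base in kmer.upper():
        value = (value << 2) | CODE[base]
    return value


def build_table(kmers):
    """Return the distinct encoded k-mers as a sorted list."""
    return sorted({encode_kmer(k) for k in kmers})


def contains(table, kmer):
    """Binary-search the sorted table for a k-mer: O(log n) per lookup."""
    value = encode_kmer(kmer)
    i = bisect_left(table, value)
    return i < len(table) and table[i] == value
```

For instance, build_table(["ACGT", "TTTT", "AAAA"]) gives [0, 27, 255], and contains(...) on it is True for "ACGT" and False for "CCCC".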
Run unit tests
To make sure that your changes haven't broken the core of IBSpy, run the unit tests:
python3 setup.py test
Running IBSpy
IBSpy has relatively few options; you can list them with the --help option:
IBSPy --help
usage: IBSPy [-h] [-w WINDOW_SIZE] [-k KMER_SIZE] [-d DATABASE] [-r REFERENCE]
             [-z] [-o OUTPUT] [-f {kmerGWAS,jellyfish}]
optional arguments:
  -h, --help            show this help message and exit
  -w WINDOW_SIZE, --window_size WINDOW_SIZE
                        window size to analyze
  -k KMER_SIZE, --kmer_size KMER_SIZE
                        Kmer size of the database
  -d DATABASE, --database DATABASE
                        Kmer database
  -r REFERENCE, --reference REFERENCE
                        The reference with the position of the kmers
  -z, --compress        When an ouput file is present, it is compressed as .gz
  -o OUTPUT, --output OUTPUT
                        Output file. If missing, the ouptut is sent to stdout
  -f {kmerGWAS,kmerGWAS_mmap,jellyfish,kmc3}, --database_format {kmerGWAS,kmerGWAS_mmap,jellyfish,kmc3}
                        Database format
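To give an intuition for the --window_size and --kmer_size options, here is a conceptual sketch of per-window k-mer counting, assuming the database is an in-memory set of k-mer strings. It is not IBSpy's implementation: it ignores reverse complements, and k-mers that span a window boundary are skipped. Windows in which many reference k-mers are missing from the sample's database hint at sequence differences between the reference and the sample.

```python
def kmer_window_counts(reference, kmer_db, k=31, window_size=50_000):
    """Return (window_start, total_kmers, observed_kmers) for each window.

    reference: the reference sequence as a string.
    kmer_db:   any container supporting `in` (here, a set of k-mer strings).
    """
    results = []
    for start in range(0, len(reference), window_size):
        window = reference[start:start + window_size]
        total = observed = 0
        # Only k-mers fully inside this window are counted; boundary-spanning ones are skipped.
        for i in range(len(window) - k + 1):
            total += 1
            if window[i:i + k] in kmer_db:
                observed += 1
        results.append((start, total, observed))
    return results
```

With a toy reference "ACGTACGTAA", k=3, windows of 5 bp and a database {"ACG", "CGT"}, the result is [(0, 3, 2), (5, 3, 1)].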
To generate the table with the number of observed k-mers and variants from a kmerGWAS k-mer database, run the following command:
IBSpy --output "kmer_windows_LineXXX.tsv.gz" --database kmers_with_strand --reference arinaLrFor.fa --window_size 50000 --compress --database_format kmerGWAS
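For KMC3, the --database argument is the name used when creating the database (for example, data/test4B.jagger.kmc_k31 in the kmc command above), not a filename such as the .kmc_pre or .kmc_suf files that KMC writes.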
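If you only have the path to one of the files KMC wrote, a small hypothetical helper (not part of IBSpy) can recover the database name to pass to --database by stripping the KMC suffix:

```python
# Suffixes of the two files KMC writes for each database.
KMC_SUFFIXES = (".kmc_pre", ".kmc_suf")


def kmc_database_name(path):
    """Strip a .kmc_pre/.kmc_suf suffix; return other paths unchanged."""
    text = str(path)
    for suffix in KMC_SUFFIXES:
        if text.endswith(suffix):
            return text[: -len(suffix)]
    return text
```

For example, kmc_database_name("data/test4B.jagger.kmc_k31.kmc_pre") returns "data/test4B.jagger.kmc_k31".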
Running IBSplot
You can list the IBSplot options with --help:
IBSplot --help
usage: IBSplot [-h] [-i IBSPY_COUNTS] [-w WINDOW_SIZE] [-f FILTER_COUNTS]
               [-n N_COMPONENTS] [-c COVARIANCE_TYPE] [-s STITCH_NUMBER]
               [-o OUTPUT] [-r REFERENCE] [-q QUERY] [-p PLOT_OUTPUT]
optional arguments:
  -h, --help            show this help message and exit
  -i IBSPY_COUNTS, --IBSpy_counts IBSPY_COUNTS
                        tvs file genetared by IBSpy output
  -w WINDOW_SIZE, --window_size WINDOW_SIZE
                        Windows size to count variations within
  -f FILTER_COUNTS, --filter_counts FILTER_COUNTS
                        Filter number of variaitons above this threshold to
                        compute GMM model, default=None
  -n N_COMPONENTS, --n_components N_COMPONENTS
                        Number of componenets for the GMM model, default=3
  -c COVARIANCE_TYPE, --covariance_type COVARIANCE_TYPE
                        type of covariance used for GMM model, default="full"
  -s STITCH_NUMBER, --stitch_number STITCH_NUMBER
                        Consecutive "outliers" in windows to stitch, default=3
  -o OUTPUT, --output OUTPUT
                        tsv file with variations count by windows and summary
                        statistics
  -r REFERENCE, --reference REFERENCE
                        genome reference name
  -q QUERY, --query QUERY
                        query sample
  -p PLOT_OUTPUT, --plot_output PLOT_OUTPUT
                        histograms and ascatter files in .PDF format
IBSplot takes as input the table generated by IBSpy as described above (e.g., "kmer_windows_LineXXX.tsv.gz"). It can recount variants in windows larger than those used by IBSpy. The example below uses 400,000 bp windows to fit a GMM and generate the plots.
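Recounting in larger windows amounts to summing the small-window counts that fall inside each large window. The sketch below assumes rows of (chromosome, window_start, variant_count); that layout is hypothetical, since the exact IBSpy output columns are not listed here. With 50,000 bp IBSpy windows and --window_size 400000, each large window sums 8 small ones.

```python
from collections import defaultdict


def aggregate_windows(rows, window_size=400_000):
    """Sum per-window variant counts into larger, fixed-size windows.

    rows: iterable of (chromosome, window_start, variant_count) tuples
    (hypothetical layout). Returns a sorted list of
    ((chromosome, large_window_start), summed_count).
    """
    totals = defaultdict(int)
    for chrom, start, count in rows:
        big_start = (start // window_size) * window_size  # large window containing `start`
        totals[(chrom, big_start)] += count
    return sorted(totals.items())
```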
The description of the GMM model is here. To generate the table of variant counts classified by the GMM as IBS or non-IBS, and to generate the plots, run the following command:
# minimal arguments
IBSplot --IBSpy_counts "kmeribs-Wheat_Jagger-Flame.tsv.gz" --window_size 400000 --output gmm_ibs.tsv.gz --reference Jagger --query Flame --plot_output gmm_plots.pdf
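The --n_components and --covariance_type options (default "full") resemble the parameters of scikit-learn's GaussianMixture, but this README does not say which implementation IBSplot uses. The sketch below is a dependency-free 1-D Gaussian mixture fitted by expectation-maximization (EM) on per-window variant counts; in one dimension, "full" covariance simply means each component has its own variance. Two behaviours are assumptions, not documented IBSplot logic: the component with the lowest mean is labelled IBS, and --filter_counts excludes windows above the threshold from the fit only (they are still classified).

```python
import math

MIN_VAR = 1e-6  # floor on variances so a component cannot collapse to zero width


def _log_density(x, weight, mean, var):
    """Log of weight * Normal(x | mean, var)."""
    return (math.log(weight) - 0.5 * math.log(2 * math.pi * var)
            - (x - mean) ** 2 / (2 * var))


def fit_gmm_1d(values, n_components=3, n_iter=200, tol=1e-8):
    """Fit a 1-D Gaussian mixture by EM; returns (weights, means, variances)."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("no values to fit")
    # Initialise: means at evenly spaced quantiles, then one hard-assignment step.
    means = [xs[int((i + 0.5) * n / n_components)] for i in range(n_components)]
    groups = [[] for _ in range(n_components)]
    for x in xs:
        nearest = min(range(n_components), key=lambda c: abs(x - means[c]))
        groups[nearest].append(x)
    weights, variances = [], []
    for j, group in enumerate(groups):
        group = group or xs  # fall back to all values if a component starts empty
        means[j] = sum(group) / len(group)
        variances.append(max(sum((x - means[j]) ** 2 for x in group) / len(group), MIN_VAR))
        weights.append(len(group))
    total_w = sum(weights)
    weights = [w / total_w for w in weights]

    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: responsibilities in log space (log-sum-exp) to avoid underflow.
        resp, ll = [], 0.0
        for x in xs:
            logs = [_log_density(x, w, m, v) for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            log_total = top + math.log(sum(math.exp(s - top) for s in logs))
            ll += log_total
            resp.append([math.exp(s - log_total) for s in logs])
        # M-step: re-estimate each component from its weighted points.
        for j in range(n_components):
            nj = sum(r[j] for r in resp)
            weights[j] = max(nj / n, 1e-12)
            if nj < 1e-10:
                continue  # (nearly) empty component: keep its mean and variance
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, MIN_VAR)
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return weights, means, variances


def classify_windows(counts, n_components=3, filter_counts=None):
    """Label each window 'IBS' (lowest-mean component) or 'non-IBS'."""
    fit_values = [c for c in counts if filter_counts is None or c <= filter_counts]
    weights, means, variances = fit_gmm_1d(fit_values, n_components)
    ibs_component = min(range(n_components), key=lambda j: means[j])
    labels = []
    for x in counts:
        scores = [_log_density(x, w, m, v) for w, m, v in zip(weights, means, variances)]
        best = max(range(n_components), key=lambda j: scores[j])
        labels.append("IBS" if best == ibs_component else "non-IBS")
    return labels
```

On toy counts such as [0, 1, 2, 1, 0, 2, 50, 52, 48, 51, 200, 205, 198, 202], the six low-count windows are labelled IBS and the rest non-IBS.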
In addition, you can append some or all of the following options to the command above to tune the GMM parameters and to define the IBS and non-IBS regions that best fit the reference and query samples used:
IBSplot --filter_counts 1000 --n_components 3 --covariance_type 'full' --stitch_number 3
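The help text describes --stitch_number only as "Consecutive outliers in windows to stitch". One plausible reading, sketched below as an assumption rather than IBSplot's documented behaviour, is that a run of at most stitch_number windows whose label differs from identical labels on both sides is relabelled to match its neighbours, so short interruptions do not split an IBS block. Runs are processed left to right, and runs at either end of the chromosome are left alone.

```python
def stitch_labels(labels, stitch_number=3):
    """Relabel short runs (<= stitch_number windows) flanked by the same label."""
    # Collapse the label sequence into [label, run_length] pairs.
    runs = []
    for label in labels:
        if runs and runs[-1][0] == label:
            runs[-1][1] += 1
        else:
            runs.append([label, 1])
    # Interior runs only: each needs a neighbour on both sides.
    for i in range(1, len(runs) - 1):
        if runs[i][1] <= stitch_number and runs[i - 1][0] == runs[i + 1][0]:
            runs[i][0] = runs[i - 1][0]
    # Expand the runs back into one label per window.
    return [label for label, length in runs for _ in range(length)]
```

For example, ["IBS", "IBS", "non-IBS", "IBS", "IBS"] becomes five IBS windows, while a run of four non-IBS windows is kept with the default of 3.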