A package to detect IBS regions

Project description

IBSpy

Python library to identify Identical By State regions

To build the kmer database with KMC for the tests, run this command:

kmc -k31 -r -ci1 -fm data/test4B.jagger.fa data/test4B.jagger.kmc_k31 tmp

Installing IBSpy

The easiest way to install IBSpy is to use pip3.

pip3 install IBSpy

If pip3 fails, you can clone the project and compile it with:

pip3 install cython biopython pyfaidx
python3 setup.py develop

Then you should have the IBSpy command available.

KMC3

If you want to use the KMC bindings, install KMC and build its Python API (py_kmc_api) following the KMC instructions.

Then run the following commands to set up the path for it.

cd KMC/py_kmc_api
source set_path.sh 

Preparing the databases

IBSpy requires a kmer database built from the sequencing files. Currently two formats are supported:

  1. Jellyfish: Follow the instructions on its website.
  2. kmerGWAS: Has an ad hoc file format that contains only the kmers in a sorted, binary representation. This option is faster than the Jellyfish version, but creating the kmer table is less straightforward. The manual is here.
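
To illustrate why a sorted, binary kmer table allows fast lookups, here is a minimal sketch. It is not the kmerGWAS file layout: the 2-bit base mapping and the function names (`encode`, `build_table`, `contains`) are assumptions made for this example only.

```python
from bisect import bisect_left

# Hypothetical 2-bit code per base (an assumption, not the kmerGWAS encoding).
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(kmer):
    """Pack a kmer into one integer, using two bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | BASE_CODE[base]
    return value


def build_table(kmers):
    """Return the distinct encoded kmers as a sorted list."""
    return sorted({encode(k) for k in kmers})


def contains(table, kmer):
    """Binary-search the sorted table for a kmer."""
    code = encode(kmer)
    i = bisect_left(table, code)
    return i < len(table) and table[i] == code


table = build_table(["GATTA", "ACGTA", "TTTTT"])
print(contains(table, "GATTA"), contains(table, "AAAAA"))
```

Because the table is sorted, each lookup needs only a logarithmic number of comparisons, and no per-kmer counts have to be stored.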

Running unit tests

To make sure that your changes haven't broken the core of IBSpy, run the unit tests:

python3 setup.py test

Running IBSpy

IBSpy has relatively few options; you can list them with the --help flag.

IBSpy --help
usage: IBSPy [-h] [-w WINDOW_SIZE] [-k KMER_SIZE] [-d DATABASE] [-r REFERENCE]
             [-z] [-o OUTPUT] [-f {kmerGWAS,jellyfish}]

optional arguments:
  -h, --help            show this help message and exit
  -w WINDOW_SIZE, --window_size WINDOW_SIZE
                        window size to analyze
  -k KMER_SIZE, --kmer_size KMER_SIZE
                        Kmer size of the database
  -d DATABASE, --database DATABASE
                        Kmer database
  -r REFERENCE, --reference REFERENCE
                        The reference with the position of the kmers
  -z, --compress        When an ouput file is present, it is compressed as .gz
  -o OUTPUT, --output OUTPUT
                        Output file. If missing, the ouptut is sent to stdout
  -f {kmerGWAS,kmerGWAS_mmap,jellyfish,kmc3}, --database_format {kmerGWAS,kmerGWAS_mmap,jellyfish,kmc3}
                        Database format 

To generate the table with the number of observed kmers and variants from a kmerGWAS kmer database, run the following command:

IBSpy --output "kmer_windows_LineXXX.tsv.gz" --database kmers_with_strand --reference arinaLrFor.fa --window_size 50000 --compress --database_format kmerGWAS
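
The kind of per-window table described above can be sketched as follows. This is an illustration of the general idea, not IBSpy's implementation: the function name `window_counts`, the tuple layout, and the definition of a "variation" as a maximal run of consecutive reference kmers missing from the database are all assumptions made for this example.

```python
def window_counts(sequence, kmer_db, k, window_size):
    """Yield (start, end, total_kmers, observed_kmers, variations) per window.

    A variation is counted once for each run of consecutive kmers that are
    absent from kmer_db (an assumed definition, for illustration only).
    """
    last_kmer_start = len(sequence) - k + 1
    for start in range(0, len(sequence), window_size):
        end = min(start + window_size, len(sequence))
        total = observed = variations = 0
        in_gap = False
        # Visit every kmer that starts inside this window.
        for i in range(start, min(end, last_kmer_start)):
            total += 1
            if sequence[i:i + k] in kmer_db:
                observed += 1
                in_gap = False
            elif not in_gap:
                variations += 1
                in_gap = True
        yield start, end, total, observed, variations


reference = "ACGTACGTAC"
database = {"ACG", "CGT", "TAC"}  # "GTA" is missing from the query sample
for row in window_counts(reference, database, k=3, window_size=5):
    print(*row, sep="\t")
```

Windows in which most reference kmers are observed and few variations occur are the candidates for identical-by-state regions.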

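For KMC3, pass to --database the name that was given when the database was created (the output prefix, such as data/test4B.jagger.kmc_k31 in the kmc command above), not the name of a file on disk.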
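
KMC writes a database as two files that share the given name, with the extensions .kmc_pre and .kmc_suf. The following sketch checks that both exist before the name is passed to IBSpy. The helper names `kmc_database_files` and `kmc_database_exists` are hypothetical and are not part of IBSpy or KMC.

```python
from pathlib import Path


def kmc_database_files(name):
    """Return the two files that KMC writes for a database name."""
    # Append the extensions by hand: the name itself may contain dots.
    return [Path(f"{name}.kmc_pre"), Path(f"{name}.kmc_suf")]


def kmc_database_exists(name):
    """True only if both KMC files are present."""
    return all(path.is_file() for path in kmc_database_files(name))


name = "data/test4B.jagger.kmc_k31"
if not kmc_database_exists(name):
    print(f"KMC database '{name}' not found; expected:",
          *kmc_database_files(name))
```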

Running IBSplot

Look at the IBSplot commands using --help.

IBSplot --help
usage: IBSplot [-h] [-i IBSPY_COUNTS] [-w WINDOW_SIZE] [-f FILTER_COUNTS]
               [-n N_COMPONENTS] [-c COVARIANCE_TYPE] [-s STITCH_NUMBER]
               [-o OUTPUT] [-r REFERENCE] [-q QUERY] [-p PLOT_OUTPUT]

optional arguments:
  -h, --help            show this help message and exit
  -i IBSPY_COUNTS, --IBSpy_counts IBSPY_COUNTS
                        tvs file genetared by IBSpy output
  -w WINDOW_SIZE, --window_size WINDOW_SIZE
                        Windows size to count variations within
  -f FILTER_COUNTS, --filter_counts FILTER_COUNTS
                        Filter number of variaitons above this threshold to
                        compute GMM model, default=None
  -n N_COMPONENTS, --n_components N_COMPONENTS
                        Number of componenets for the GMM model, default=3
  -c COVARIANCE_TYPE, --covariance_type COVARIANCE_TYPE
                        type of covariance used for GMM model, default="full"
  -s STITCH_NUMBER, --stitch_number STITCH_NUMBER
                        Consecutive "outliers" in windows to stitch, default=3
  -o OUTPUT, --output OUTPUT
                        tsv file with variations count by windows and summary
                        statistics
  -r REFERENCE, --reference REFERENCE
                        genome reference name
  -q QUERY, --query QUERY
                        query sample
  -p PLOT_OUTPUT, --plot_output PLOT_OUTPUT
                        histograms and ascatter files in .PDF format

IBSplot uses the output table generated by IBSpy as described above (e.g., "kmer_windows_LineXXX.tsv.gz"). It can aggregate the variant counts into larger windows. The example below uses 400,000 bp windows to fit a GMM and generate the plots.
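
Aggregating counts from small windows into larger ones can be sketched as below. This is a minimal illustration under assumptions, not IBSplot's code: the function name `aggregate_windows` and the row layout `(seqname, start, variations)` are hypothetical.

```python
from collections import defaultdict


def aggregate_windows(rows, window_size):
    """Sum the variation counts of small windows into larger windows.

    rows: iterable of (seqname, start, variations) for the small windows.
    Returns sorted (seqname, start, end, variations) for the large windows.
    """
    totals = defaultdict(int)
    for seqname, start, variations in rows:
        # Start coordinate of the large window that contains this small one.
        bin_start = (start // window_size) * window_size
        totals[(seqname, bin_start)] += variations
    return [(seq, s, s + window_size, n) for (seq, s), n in sorted(totals.items())]


small = [("chr1", 0, 2), ("chr1", 50000, 3), ("chr1", 400000, 1), ("chr2", 0, 5)]
for row in aggregate_windows(small, 400000):
    print(*row, sep="\t")
```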

To generate the table with variant counts categorized by the GMM as IBS or non-IBS, and to generate the plots, run the following command. The description of the GMM model is here.

# minimal arguments
IBSplot --IBSpy_counts "kmeribs-Wheat_Jagger-Flame.tsv.gz" --window_size 400000 --output gmm_ibs.tsv.gz --reference Jagger --query Flame --plot_output gmm_plots.pdf

In addition, you can include some or all of the following options to tune the GMM parameters and define the best IBS and non-IBS thresholds for the reference and query sample used:

IBSplot --filter_counts 1000 --n_components 3 --covariance_type 'full' --stitch_number 3
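
The tuning options are given together with the minimal arguments in a single call. As a sketch, the helper below assembles such a command line from Python. The function `ibsplot_command` is hypothetical and not part of IBSpy; it uses only flags listed in the --help output above and does not run the command.

```python
import shlex


def ibsplot_command(counts, window_size, output, reference, query, plot_output,
                    filter_counts=None, n_components=None,
                    covariance_type=None, stitch_number=None):
    """Build the IBSplot argument list; tuning options are added only if set."""
    args = [
        "IBSplot",
        "--IBSpy_counts", counts,
        "--window_size", str(window_size),
        "--output", output,
        "--reference", reference,
        "--query", query,
        "--plot_output", plot_output,
    ]
    optional = {
        "--filter_counts": filter_counts,
        "--n_components": n_components,
        "--covariance_type": covariance_type,
        "--stitch_number": stitch_number,
    }
    for flag, value in optional.items():
        if value is not None:
            args += [flag, str(value)]
    return args


cmd = ibsplot_command("kmeribs-Wheat_Jagger-Flame.tsv.gz", 400000,
                      "gmm_ibs.tsv.gz", "Jagger", "Flame", "gmm_plots.pdf",
                      filter_counts=1000, n_components=3,
                      covariance_type="full", stitch_number=3)
print(shlex.join(cmd))  # pass cmd to subprocess.run(cmd, check=True) to execute
```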

Download files

Source Distribution

IBSpy-0.3.1.tar.gz (6.2 MB)

File details

Details for the file IBSpy-0.3.1.tar.gz.

File metadata

  • Download URL: IBSpy-0.3.1.tar.gz
  • Size: 6.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.8.0 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.8

File hashes

Hashes for IBSpy-0.3.1.tar.gz
Algorithm Hash digest
SHA256 dc75ab955212567bd69f0efaba2f5b2eb4858f3237fd998c4dad6cff1d935933
MD5 e4134ee0d59a8410c936da69f87d655c
BLAKE2b-256 b0f636be12dca14277e1f4143b9de2ca5e92b6e2516d64697db0c89082cfb086
