TnSeeker

Tnseeker is a pipeline for transposon insertion sequencing (Tn-Seq) analysis. It trims and aligns reads, associates genomic locations with transposon insertions, and infers essential genes from transposon insertion densities. It can also extract barcodes from raw fastq files and link them to the corresponding transposon genomic locations for subsequent analysis. What distinguishes Tnseeker from other tools is that it automatically infers and adjusts its threshold/cutoff parameters from the data, so gene essentiality can be determined without intricate user input. It is compatible with any transposon disruption experiment and mitigates transposon-specific biases, including those seen with HIMAR, which makes it applicable to all Tn-Seq datasets.

Tnseeker is under active development and is available as is. Bugs can be expected; please report any weird or unintended behaviour. Contact me if you are interested in using the program or have any questions.

Requirements

The tnseeker pipeline requires both Python 3 and Bowtie2 to be callable from the terminal (that is, added to the PATH).
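A quick way to confirm the requirement above before starting a run is to look the executable up on the PATH. This is only a minimal sketch: the helper name `missing_tools` is hypothetical, and it assumes the Bowtie2 executable is called `bowtie2` on your system.

```python
import shutil
import sys


def missing_tools(tools=("bowtie2",)):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# The interpreter running this script is the Python that gets checked.
print("Python major version:", sys.version_info.major)

# Assumption: the Bowtie2 executable is named "bowtie2".
for tool in missing_tools():
    print(f"{tool} was not found on the PATH")
```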

Executing

tnseeker is executable from the command line by typing:

python -m tnseeker

An example use case is the following. The meaning of the input arguments is given below:

python -m tnseeker -s BW25113 -sd '/your/data/directory/folder_with_fastq.gz_files' -ad /your/annotations/directory/ -at gb -st SE --tn AGATGTGTATAAGAGACAG --ph 10 --mq 40
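If the run is launched from a script instead of the shell, the same single-end example can be assembled as an argument list. This is a sketch only: `build_command` and `run_tnseeker` are hypothetical helper names, the paths are the placeholders from the example above, and only flags listed in this README are used.

```python
import subprocess
import sys


def build_command(strain, seq_dir, annotation_dir, tn_border, phred=10, mapq=40):
    """Assemble the argument list for the single-end example above."""
    return [
        sys.executable, "-m", "tnseeker",
        "-s", strain,
        "-sd", seq_dir,
        "-ad", annotation_dir,
        "-at", "gb",
        "-st", "SE",
        "--tn", tn_border,
        "--ph", str(phred),
        "--mq", str(mapq),
    ]


def run_tnseeker(command):
    """Launch the pipeline; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(command, check=True)


command = build_command(
    strain="BW25113",
    seq_dir="/your/data/directory/folder_with_fastq.gz_files",
    annotation_dir="/your/annotations/directory/",
    tn_border="AGATGTGTATAAGAGACAG",
)
print(" ".join(command))
# run_tnseeker(command)  # uncomment once the placeholder paths are replaced
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.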

Optional Arguments:

-h, --help show this help message and exit

-s S Strain name. Must match the annotation (FASTA/GB) file names

-sd SD The full path to the sequencing files FOLDER

--sd_2 SD_2 The full path to the pair ended sequencing files FOLDER (needs to be different from the first folder)

-ad AD The full path to the directory with the .gb and .fasta files

-at AT Annotation Type (Genbank)

-st ST Sequencing type: Paired-ended (PE) or Single-ended (SE)

--tn [TN] Transposon border sequence (tn5: GATGTGTATAAGAGACAG). Required for trimming and proper mapping

--m [M] Mismatches in the transposon border sequence (default is 0)

--k [K] Remove intermediate files. Default is yes, remove.

--e [E] Run only the essential determining script. Requires the all_insertions_STRAIN.csv file to have been generated first.

--t [T] Trims to the indicated nucleotide length AFTER finding the transposon sequence. For example, 100 would mean to keep the 100 bp after the transposon (this trimmed read is then used for alignment)

--b [B] Run with barcode extraction

--b1 [B1] upstream barcode sequence (example: ATC)

--b2 [B2] downstream barcode sequence (example: CTA)

--b1m [B1M] upstream barcode sequence mismatches

--b2m [B2M] downstream barcode sequence mismatches

--b1p [B1P] upstream barcode sequence Phred-score filtering. Default is no filtering

--b2p [B2P] downstream barcode sequence Phred-score filtering. Default is no filtering

--rt [RT] Read threshold number

--ne [NE] Run without essential Finding

--ph [PH] Phred Score (removes reads where nucleotides have lower phred scores)

--mq [MQ] Bowtie2 MAPQ threshold

--ig [IG] The number of bp up and down stream of any gene to be considered an intergenic region

--pv [PV] Essential Finder pvalue threshold for essentiality determination

--sl5 [SL5] 5' gene trimming percent for essentiality determination (number between 0 and 1)

--sl3 [SL3] 3' gene trimming percent for essentiality determination (number between 0 and 1)
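To make the interplay of --tn, --m and --t concrete, the sketch below locates a transposon border in a read while tolerating a number of mismatches, then keeps a fixed number of bases downstream of it. It illustrates the idea only and is not tnseeker's actual implementation; the function names and the toy read are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def trim_after_border(read, border, max_mismatches=0, keep=None):
    """Return the sequence downstream of the first border match, or None.

    max_mismatches plays the role of --m; keep plays the role of --t
    (None keeps everything after the border).
    """
    width = len(border)
    for start in range(len(read) - width + 1):
        if hamming(read[start:start + width], border) <= max_mismatches:
            downstream = read[start + width:]
            return downstream if keep is None else downstream[:keep]
    return None  # border not found within the allowed mismatches


BORDER = "GATGTGTATAAGAGACAG"  # tn5 border quoted in the --tn help text
exact_read = "ACGT" + BORDER + "TTGACCGGTA"
# Same toy read with the last base of the border changed (G -> C).
mutated_read = "ACGT" + "GATGTGTATAAGAGACAC" + "TTGACCGGTA"

print(trim_after_border(exact_read, BORDER))                      # TTGACCGGTA
print(trim_after_border(exact_read, BORDER, keep=4))              # TTGA
print(trim_after_border(mutated_read, BORDER))                    # None
print(trim_after_border(mutated_read, BORDER, max_mismatches=1))  # TTGACCGGTA
```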

Dependencies

tnseeker requires several dependencies, all installable via pip commands. A notable exception is the poibin module, which is bundled in the current tnseeker folder (you as the user don't need to do anything else) and can originally be found here: https://github.com/tsakim/poibin

File requirements

tnseeker requires several input files:

  1. A '.fastq.gz' file (needs to be .gz)

  2. An annotation file in genbank format (.gb)

  3. A FASTA file with the genome under analysis (needs to be .fasta).
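The three inputs listed above can be checked before launching a run. The sketch below assumes the annotation files are named `<strain>.gb` and `<strain>.fasta`, which is an interpretation of the note that the strain name must match the annotation file names; `check_inputs` is a hypothetical helper, and the demo runs on empty placeholder files in a temporary directory.

```python
import tempfile
from pathlib import Path


def check_inputs(strain, seq_dir, annotation_dir):
    """Return a list of problems found; an empty list means all inputs exist."""
    problems = []
    if not any(Path(seq_dir).glob("*.fastq.gz")):
        problems.append(f"no .fastq.gz files in {seq_dir}")
    # Assumption: annotation files are named <strain>.gb and <strain>.fasta.
    for extension in (".gb", ".fasta"):
        candidate = Path(annotation_dir) / f"{strain}{extension}"
        if not candidate.is_file():
            problems.append(f"missing {candidate}")
    return problems


# Demo on a throwaway directory populated with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "reads").mkdir()
    (root / "reads" / "sample.fastq.gz").touch()
    (root / "annotations").mkdir()
    (root / "annotations" / "BW25113.gb").touch()
    (root / "annotations" / "BW25113.fasta").touch()
    complete = check_inputs("BW25113", root / "reads", root / "annotations")
    incomplete = check_inputs("MG1655", root / "reads", root / "annotations")

print(complete)         # no problems for the strain whose files exist
print(len(incomplete))  # both annotation files are missing for the other strain
```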

Working modes

tnseeker is composed of 2 submodules:

  1. The initial sequencing processing: Handles the read trimming and alignment, creating a compiled .csv with all found transposon insertions.

  2. The Essential_finder: Infers gene essentiality from the insertion information found in the previous .csv file. It can therefore be run in standalone mode if the appropriate .csv and annotation files are indicated.
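The --sl5 and --sl3 options trim a fraction off each end of a gene before essentiality is assessed, so that insertions near the gene ends are not counted. The sketch below shows one way such a window could be computed. It is an illustration under stated assumptions, not the Essential_finder's code: the rounding, the strand handling, and both function names are hypothetical.

```python
def trimmed_gene_window(start, end, strand, sl5=0.0, sl3=0.0):
    """Return (start, end) of a gene after trimming fractions off each end.

    start < end are genomic coordinates; strand is +1 or -1.
    sl5 and sl3 are fractions between 0 and 1, as in --sl5 and --sl3.
    """
    if not (0 <= sl5 <= 1 and 0 <= sl3 <= 1) or sl5 + sl3 >= 1:
        raise ValueError("sl5 and sl3 must each be in [0, 1] and sum to less than 1")
    length = end - start
    # Assumption: on the reverse strand the 5' end is the higher coordinate.
    left, right = (sl5, sl3) if strand >= 0 else (sl3, sl5)
    return start + round(length * left), end - round(length * right)


def insertions_in_window(positions, window):
    """Keep the insertion positions that fall inside the window (inclusive)."""
    low, high = window
    return [p for p in positions if low <= p <= high]


positions = [1050, 1150, 1500, 1850, 1950]
forward = trimmed_gene_window(1000, 2000, +1, sl5=0.1, sl3=0.2)
reverse = trimmed_gene_window(1000, 2000, -1, sl5=0.1, sl3=0.2)

print(forward, insertions_in_window(positions, forward))  # (1100, 1800) [1150, 1500]
print(reverse, insertions_in_window(positions, reverse))  # (1200, 1900) [1500, 1850]
```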
