TnSeeker

Tnseeker is an advanced pipeline tailored for transposon insertion sequencing (Tn-Seq) analysis.

It performs the following tasks:

  1. Read trimming based on the presence of transposon sequences, and extraction of the associated linked barcodes
  2. Alignment to the reference genome (bowtie2)
  3. Linking of transposon insertion locations to annotated genomic features (using .gff or .gb files as input)
  4. Inference of essential genes based on global and local (contig-wise) transposon insertion densities
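Step 1 locates the transposon border sequence in each read and keeps the genomic sequence that follows it. The Python sketch below illustrates that idea with a simple left-to-right Hamming-distance scan that tolerates a given number of substitutions in the border (compare the --m flag) and can optionally truncate the kept sequence (compare the --t flag). The function names are hypothetical and this is not tnseeker's actual implementation.

```python
from typing import Optional


def find_with_mismatches(read: str, motif: str, max_mismatches: int = 0) -> int:
    """Return the start of the first window of `read` that differs from `motif`
    by at most `max_mismatches` substitutions (Hamming distance), or -1."""
    k = len(motif)
    for start in range(len(read) - k + 1):
        mismatches = 0
        for base, expected in zip(read[start:start + k], motif):
            if base != expected:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # this window cannot match; try the next one
        if mismatches <= max_mismatches:
            return start
    return -1


def trim_after_transposon(read: str, tn: str, max_mismatches: int = 0,
                          keep: Optional[int] = None) -> Optional[str]:
    """Return the genomic sequence following the transposon border, or None
    if the border is not found. If `keep` is given, only the first `keep` bp
    after the border are returned (similar in spirit to --t)."""
    start = find_with_mismatches(read, tn, max_mismatches)
    if start == -1:
        return None  # reads without a recognisable border are discarded
    genomic = read[start + len(tn):]
    return genomic if keep is None else genomic[:keep]


if __name__ == "__main__":
    # Tn5 border sequence taken from the example command in this README.
    tn5_border = "AGATGTGTATAAGAGACAG"
    read = "NNNN" + tn5_border + "ACGTTGCAAGCT"
    print(trim_after_transposon(read, tn5_border, max_mismatches=1, keep=8))
```

A production trimmer would also handle quality strings and reverse complements; this sketch only shows how a mismatch-tolerant border search determines which part of the read is passed on to alignment.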

What distinguishes Tnseeker from other tools is its ability to automatically infer and adjust threshold/cutoff parameters from the data. This removes the need for intricate user input and allows a more precise determination of gene essentiality.

Tnseeker is compatible with any transposon disruption experiment, be it Tn5, HIMAR, or any other transposon, and is therefore versatile enough to handle any Tn-Seq dataset.

Tnseeker is under active development and is provided as is. Contact me if you are interested in using the program or have any questions. Bugs are to be expected; please report any unexpected or unintended behaviour.

Installation

There are three ways of installing tnseeker:

1. Recommended installation (using singularity)

  1. Install singularity on your system, for example in a dedicated conda environment:
conda create -n singularity -c conda-forge singularity -y
  2. Activate the conda environment:
conda activate singularity
  3. Download the docker image from Docker Hub. This writes a singularity container named tnseeker_latest.sif to your current working directory.
singularity pull docker://afombravo/tnseeker:latest
  4. Start an interactive session of the container. Importantly, you need to --bind the folder containing all the input files. It is easiest to put all input files into a single folder, as this is how tnseeker expects its input; the results will be written to the same folder. Note: the :rw at the end of the path is crucial, as it gives singularity read/write permission, without which tnseeker cannot run.
singularity shell --bind /path/to/folder/containing/all/input/files:/input_files:rw \
                  tnseeker_latest.sif
  5. Start a tnseeker run, for example:
cd /input_files
tnseeker \
  --cpu 4 \
  -s ORGANISM_FASTA/GB/GFF_NAME \
  -sd . \
  -ad . \
  -at gff \
  -st SE \
  --tn TN_SEQUENCE \
  --m 6 \
  --b \
  --b1 UPSTREAM_BARCODE_SEQUENCE \
  --b2 DOWNSTREAM_BARCODE_SEQUENCE \
  --ph 10 \
  --mq 20 \
  --b1m 3 \
  --b2m 3 \
  --ig 100

Replace the placeholders with your own values, for example TN_SEQUENCE = AGATTA, UPSTREAM_BARCODE_SEQUENCE = AGAGA and DOWNSTREAM_BARCODE_SEQUENCE = ATATAT.

When using HPC systems, it is advisable to include the --cpu flag and specify the number of threads.


2. Recommended installation #2 (using docker)

  1. Install docker on your system.
  2. Download the docker image from Docker Hub:
docker pull afombravo/tnseeker:latest
  3. Tag the image as tnseeker:
docker tag afombravo/tnseeker:latest tnseeker

Alternatively, download the Dockerfile from this repo and build the image yourself:

docker build --no-cache -t tnseeker .
  4. Start the tnseeker docker image with the command:
docker run -it -v "<local_path/to/all/your/data>:/data" tnseeker
  5. Start tnseeker with:
tnseeker -sd ./ -ad ./ <ALL OTHER TNSEEKER ARGUMENTS HERE>

NOTE: all files required by tnseeker, such as .fasta, .fastq.gz, .gb, or .gff files, need to be in the local folder mounted in step 4. You can then use the -sd and -ad flags as shown in step 5.


3. Alternative installation

The tnseeker pipeline requires Python 3, Bowtie2, and BLAST to be installed and callable from the terminal (i.e., available on your PATH).

For local BLAST

apt update
apt install ncbi-blast+

For bowtie2

apt install bowtie2

PyPI module

tnseeker can be installed as a PyPI module with the following command:

pip install tnseeker

tnseeker is executable from the command line by typing:

tnseeker

Running Tnseeker

Tnseeker also has a test mode that checks the BLAST and Bowtie2 installations and performs a small run on a test dataset:

tnseeker --tst

An example use case is the following (the input arguments are described below):

tnseeker -s BW25113 -sd ./ -ad ./ -at gb -st SE --tn AGATGTGTATAAGAGACAG --ph 10 --mq 40

File requirements

tnseeker requires several input files:

  1. A gzip-compressed FASTQ file (.fastq.gz); uncompressed .fastq files are not accepted.

  2. An annotation file in GenBank (.gb) or GFF (.gff) format (an example .gff file is provided in this repo).

  3. A FASTA file with the genome under analysis.
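Because the -s strain name must match the annotation file names, a quick pre-flight check can catch naming mistakes before a long run. The Python sketch below assumes the files are named STRAIN.fasta and STRAIN.gb (or STRAIN.gff) inside the -ad folder and that the reads sit in the -sd folder as .fastq.gz. These naming conventions and the check_inputs helper are assumptions for illustration only; they are not part of tnseeker.

```python
from pathlib import Path


def check_inputs(strain: str, seq_dir: str, ann_dir: str,
                 ann_type: str = "gb") -> list[str]:
    """Return a list of problems found; an empty list means the inputs look complete.

    Hypothetical layout: <ann_dir>/<strain>.fasta, <ann_dir>/<strain>.<ann_type>
    and at least one *.fastq.gz file in <seq_dir>.
    """
    seq_path, ann_path = Path(seq_dir), Path(ann_dir)
    problems = []
    # Reads must be gzip-compressed FASTQ.
    if not any(seq_path.glob("*.fastq.gz")):
        problems.append(f"no .fastq.gz file found in {seq_path}")
    # Genome sequence and annotation are expected to be named after the -s strain name.
    for suffix in ("fasta", ann_type):
        expected = ann_path / f"{strain}.{suffix}"
        if not expected.is_file():
            problems.append(f"missing {expected.name} in {ann_path}")
    return problems


if __name__ == "__main__":
    for problem in check_inputs("BW25113", ".", ".", "gb"):
        print("WARNING:", problem)
```

Run it from the folder you intend to pass as -sd and -ad; an empty output means all three file types were found under the assumed names.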


Optional Arguments:

-h, --help show this help message and exit

-s S Strain name. Must match the annotation (FASTA/GB) file names

-sd SD The full path to the sequencing files FOLDER

--sd_2 SD_2 The full path to the paired-end sequencing files FOLDER (must be different from the first folder)

-ad AD The full path to the directory with the annotation (.gb or .gff) and .fasta files

-at AT Annotation type (gb for GenBank, or gff)

-st ST Sequencing type: paired-end (PE) or single-end (SE)

--tst [TST] Test mode to confirm everything works as expected.

--tn [TN] Transposon border sequence (tn5: GATGTGTATAAGAGACAG). Required for trimming and proper mapping

--m [M] Number of mismatches allowed in the transposon border sequence (default is 0)

--k [K] Remove intermediate files. Default is yes (remove).

--e [E] Run only the essentiality determination script. Requires the all_insertions_STRAIN.csv file to have been generated first.

--t [T] Trims the read to the indicated length (in nucleotides) AFTER finding the transposon sequence. For example, 100 means keeping the 100 bp after the transposon (this trimmed read is then used for alignment)

--b [B] Run with barcode extraction

--b1 [B1] upstream barcode sequence (example: ATC)

--b2 [B2] downstream barcode sequence (example: CTA)

--b1m [B1M] Number of mismatches allowed in the upstream barcode sequence

--b2m [B2M] Number of mismatches allowed in the downstream barcode sequence

--b1p [B1P] Upstream barcode sequence Phred-score filtering. Default is no filtering

--b2p [B2P] Downstream barcode sequence Phred-score filtering. Default is no filtering

--rt [RT] Read threshold number

--ne [NE] Run without essential gene finding

--ph [PH] Minimum Phred score (removes reads containing nucleotides with lower Phred scores)

--mq [MQ] Bowtie2 MAPQ threshold

--ig [IG] The number of bp upstream and downstream of any gene to be considered an intergenic region

--pv [PV] Essential finder p-value threshold for essentiality determination

--dut [DUT] fraction of the minimal amount of 'too small domains' in a gene before the entire gene is deemed uncertain for essentiality inference

--sl5 [SL5] Fraction of the gene to trim at the 5' end for essentiality determination (a number between 0 and 1)

--sl3 [SL3] Fraction of the gene to trim at the 3' end for essentiality determination (a number between 0 and 1)

--cpu [CPU] Number of threads to use (must be an integer). Advisable when using HPC systems.
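The --ph filter and the barcode flags (--b1, --b2, --b1m, --b2m) can be illustrated with a short Python sketch. It assumes Phred+33 quality encoding (Sanger/Illumina 1.8+) and treats the barcode as the sequence lying between the upstream and downstream flanking sequences, each matched with a configurable number of substitutions. All function names are hypothetical; tnseeker's own filtering and extraction logic may differ.

```python
from typing import Optional


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(char) - offset for char in quality]


def passes_phred(quality: str, min_phred: int) -> bool:
    """True if every base meets the minimum Phred score (compare --ph)."""
    return all(score >= min_phred for score in phred_scores(quality))


def find_with_mismatches(read: str, motif: str, max_mismatches: int = 0) -> int:
    """Start of the first window within `max_mismatches` substitutions of `motif`, else -1."""
    k = len(motif)
    for start in range(len(read) - k + 1):
        mismatches = sum(a != b for a, b in zip(read[start:start + k], motif))
        if mismatches <= max_mismatches:
            return start
    return -1


def extract_barcode(read: str, b1: str, b2: str,
                    b1m: int = 0, b2m: int = 0) -> Optional[str]:
    """Return the sequence between the upstream (b1) and downstream (b2) flanks, or None."""
    up = find_with_mismatches(read, b1, b1m)
    if up == -1:
        return None
    barcode_start = up + len(b1)
    # Search for the downstream flank only after the upstream one.
    down = find_with_mismatches(read[barcode_start:], b2, b2m)
    if down == -1:
        return None
    return read[barcode_start:barcode_start + down]
```

For example, with the flanks from the singularity example (AGAGA and ATATAT), the read TTAGAGACCGGTTAAATATATGG yields the barcode CCGGTTAA.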


Python Dependencies

tnseeker requires several Python dependencies, all of which are installed automatically. A notable exception is the poibin module, which ships inside the tnseeker folder (no further action is required from the user) and was originally obtained from: https://github.com/tsakim/poibin


Working modes

tnseeker is composed of two submodules:

  1. Initial sequencing processing: handles read trimming and alignment, creating a compiled .csv file with all transposon insertions found. When barcodes are associated with individual transposon reads, these are also extracted.

  2. Essential_finder: infers gene essentiality from the insertion information in the .csv file produced by the previous step. tnseeker can therefore be run in standalone mode (see --e) if the appropriate .csv and annotation files are provided.
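As a rough illustration of what the Essential_finder works with, the Python sketch below counts insertions inside a gene after trimming a fraction of its 5' and 3' ends (compare --sl5 and --sl3; strand-aware trimming is an assumption here), and computes the probability that a region receives no insertions if every base pair were hit independently with a constant insertion density. This uniform-density model is a deliberate simplification for intuition only and is not tnseeker's statistical model; the Gene class and helper names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def trimmed_bounds(gene: Gene, sl5: float, sl3: float) -> tuple[int, int]:
    """Drop a fraction of the gene at its 5' and 3' ends, respecting strand."""
    length = gene.end - gene.start + 1
    cut5, cut3 = int(length * sl5), int(length * sl3)
    if gene.strand == "+":
        return gene.start + cut5, gene.end - cut3
    # On the minus strand the 5' end is the higher coordinate.
    return gene.start + cut3, gene.end - cut5


def count_insertions(gene: Gene, sorted_positions: list[int],
                     sl5: float = 0.0, sl3: float = 0.0) -> int:
    """Number of insertion positions inside the trimmed gene body."""
    lo, hi = trimmed_bounds(gene, sl5, sl3)
    if lo > hi:
        return 0
    return bisect_right(sorted_positions, hi) - bisect_left(sorted_positions, lo)


def prob_no_insertions(region_length: int, density: float) -> float:
    """P(no insertions) if each bp is hit independently with probability `density`."""
    return (1.0 - density) ** region_length
```

Under this toy model, a long gene with zero insertions in a library with high insertion density receives a very small probability, which is the intuition behind calling such genes candidate essentials; tnseeker's actual inference also accounts for local (contig-wise) densities and automatically inferred cutoffs.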

Download files

Source Distribution

tnseeker-1.0.7.3.tar.gz (21.4 MB)

Built Distribution

tnseeker-1.0.7.3-py3-none-any.whl (21.5 MB)

File details

Details for the file tnseeker-1.0.7.3.tar.gz.

File metadata

  • Download URL: tnseeker-1.0.7.3.tar.gz
  • Size: 21.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.13

File hashes

Hashes for tnseeker-1.0.7.3.tar.gz
Algorithm Hash digest
SHA256 f677c2769b2ba0b9faf346bc1b174011c513ac7a0f1c406713ed55b55e54c4c6
MD5 bc13b0d1123d68107e09894aec2f6f68
BLAKE2b-256 c8a814c03671cf330fcdfc79441e74e1ebea7939ebbeffc4ff330379f77f086e

File details

Details for the file tnseeker-1.0.7.3-py3-none-any.whl.

File metadata

  • Download URL: tnseeker-1.0.7.3-py3-none-any.whl
  • Size: 21.5 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.13

File hashes

Hashes for tnseeker-1.0.7.3-py3-none-any.whl
Algorithm Hash digest
SHA256 39206a5e166a968a5e3112fef2c3e48d47f663a20b0a5d271b3609b5640a7c89
MD5 8d98f1a57d87e522c207ffa994ee2fd5
BLAKE2b-256 e35d36751721c451c99bdc4cbfb8875190f5d5321f1930fce0a8d283dbd29772
