Skip to main content

ShiftScan CLI: shiftscan

Project description

ShiftSCAN: Shift detection and Source Confirmation by Alignment and Navigation

A computational pipeline for detecting alternative chimeric and non-chimeric sources of mass spectrometry-identified peptides.


Overview

ShiftSCAN enables systematic identification of alternative genomic sources for chimeric peptides. The tool addresses critical challenges in programmed ribosomal frameshifting (PRF) studies by:

  • Identifying potential alternative loci for chimeric peptides
  • Differentiating between unique and multi-source chimeric origins
  • Detecting reverse complement translation events
  • Analyzing repeat element contributions to chimeric translation

Key applications include functional genomics studies of non-canonical translation events and validation of putative PRF sites.


Features

  • Multi-frame translation analysis (forward and reverse frames 0, +1, and +2)
  • Repeat element integration for comprehensive source detection
  • CSV output with frameshift position annotations
  • Parallel processing support for large genomes

Available Commands

ShiftSCAN operates through a single command with multiple configurable parameters:

shiftscan

Identifies chimeric sequences in nucleotide databases through frameshift analysis.

Usage

shiftscan --nucleotide_file <genome.fasta> --peptide_file <peptides.fasta> [--max_gap 2] [--codon_table 1] [--num_threads 1] [--blast False] [--sensitivity 5.0] [--word_size 2] [--evalue 1000] [--output results] [--max_transcript_length 30000] [--max_flanking_seq 500] [--no_reverse_complement_check False]

Parameters

  • --nucleotide_file (required): Path to nucleotide FASTA file (genome/transcriptome/repeatome).
  • --peptide_file: Path to query peptide FASTA file.
  • --max_gap: Maximum allowed gap size in nucleotides (default: 2). This parameter corresponds to the frameshift value. Default value of 2 will permit consideration of frameshifts by zero (no frameshift), one, and two nucleotides in either direction. The maximal setting of this parameter is user-defined such as 10, which means unconventionally “long” frameshifting events can be considered as alternative sources of a given query peptide sequence.
  • --codon_table: ID of the NCBI codon table (default: The Standard Code). Users can import other codon tables by typing corresponding table IDs (NCBI).
  • --num_threads: Number of parallel processing threads (default: 1). This parameter can be user-defined based on the availability of processors in the system.
  • --blast: Enable BLAST pre-filtering for acceleration (not recommended). Omitting this flag will disable BLAST. BLAST pre-filtering can be of use only if the input file is very large and only if losing the information on some alternative sources is acceptable for a specific study. This option is available if BLAST was downloaded onto a personal computer. If one’s system did not add BLAST to the PATH environment variable, only one folder needs to be added to the system’s PATH, the one containing all the BLAST executables (blastn.exe, tblastn.exe, blastx.exe, etc.).
  • --threshold: Word inclusion threshold (higher = slower) for the BLAST acceleration (default: 5.0).
  • --word_size: BLAST word size parameter (default: 2).
  • --evalue: BLAST e-value threshold (default: 1000).
  • --output: Base name for the CSV output file (default: results).
  • --max_transcript_length: Max length of a transcript sequence (default: 30,000, in nucleotides). If a subject sequence exceeds this threshold, it will be trimmed according to the setting of [--max_flanking_seq] below.
  • --max_flanking_seq: Max length of flanking sequences to include (default: 500, in nucleotides). If a subject sequence is shorter than the value set in [--max_transcript_length] (see above), it will be recorded as a whole sequence in the output file. For subject sequences longer than [--max_transcript_length] (e.g., a chromosome), this setting will record the frameshift region (the complete matching part) plus [--max_flanking_seq] nucleotides upstream and downstream of the matching site.
  • --no_reverse_complement_check: Omitting this flag will enable reverse complement analysis. This is recommended because the software does not discriminate between transcript, repeat, and genomic sequences. In genomic and repeat sequences, both strands can encode peptides and proteins, even in the same region (oppositely overlapping genes).

Output

Contains tabular data on detected chimeric and non-chimeric sequences with these columns:

  • Type: Detection type. “Frameshift” stands for a chimeric alternative source. “Without frameshift” stands for a non-chimeric alternative source (direct match).
  • Frameshift Position: Position of the last amino acid before the frameshift (position 1 is the first amino acid in a query peptide).
  • Segment 1: Contains a portion of a query peptide sequence before the frameshift.
  • Segment 2: Contains a portion of a query peptide sequence after the frameshift.
  • Gap: Indicates the frameshifts value in the alternative source. If the maximal gap value is set to 2 (default), the following alternative sources will be reported: 0 (no frameshift), +1 and +2 (forward frameshifts), -1 and -2 (backward frameshifts).
  • Frameshift Direction: Reading frame transition (Frame n -> Frame m).
  • Nucleotide Title: Contains a header of a subject sequence from the input nucleotide FASTA file.
  • Nucleotide Sequence: Contains a subject sequence delimited by the setting in [--max_flanking_seq].
  • Protein Title: Contains a query header from the input peptide FASTA file.
  • Protein Sequence: Contains a query peptide sequence.
  • Frame Direction: Strand orientation. Forward frames alone (e.g., transcriptome input) or together with reverse frames (genomic DNA or repeatome input) can be considered.
  • Truncation for Nucleotide Sequence (True or False): If the subject nucleotide sequence exceeds [--max_transcript_length], it will be truncated in the given [--max_flanking_seq] threshold level (output True). Else, the output is False.

ShiftScan Installation Guide

Requirements

  • python 3.8+
  • biopython >= 1.81
  • pandas >= 2.0
  • NCBI BLAST+ (tblastn)

Installation

  • Install from PyPI

pip install shiftscan

  • For development/editable installation

git clone https://github.com/umutcakir/shiftscan cd shiftscan pip install -e .

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

shiftscan-0.1.6.tar.gz (12.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

shiftscan-0.1.6-py3-none-any.whl (11.3 kB view details)

Uploaded Python 3

File details

Details for the file shiftscan-0.1.6.tar.gz.

File metadata

  • Download URL: shiftscan-0.1.6.tar.gz
  • Upload date:
  • Size: 12.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for shiftscan-0.1.6.tar.gz
Algorithm Hash digest
SHA256 bbe9e634d1c90fc7b131a30437c2e1c908fd6dfec5b28f466d084a8ebe595043
MD5 d048d89451e93f6973928f3fefe7649f
BLAKE2b-256 63c11be2a1f800c80c0122598edee73e3dc41d9b763285c8b44ff3856db964cb

See more details on using hashes here.

Provenance

The following attestation bundles were made for shiftscan-0.1.6.tar.gz:

Publisher: publish.yml on umutcakir/shiftscan

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file shiftscan-0.1.6-py3-none-any.whl.

File metadata

  • Download URL: shiftscan-0.1.6-py3-none-any.whl
  • Upload date:
  • Size: 11.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for shiftscan-0.1.6-py3-none-any.whl
Algorithm Hash digest
SHA256 13bbdd99ea7a286de0e8cab0b9f414284a00ae92cba7f3b68506d77402e507dc
MD5 bef6c5605c22d2badde526347d8082e2
BLAKE2b-256 c3c471c3eac17ad4f88c677240f7f2318a4cc269f58e4024f87340cc4dfaad36

See more details on using hashes here.

Provenance

The following attestation bundles were made for shiftscan-0.1.6-py3-none-any.whl:

Publisher: publish.yml on umutcakir/shiftscan

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page