ShiftScan CLI: shiftscan

ShiftSCAN: Shift detection and Source Confirmation by Alignment and Navigation

A computational pipeline for detecting alternative chimeric and non-chimeric sources of mass spectrometry-identified peptides.


Overview

ShiftSCAN enables systematic identification of alternative genomic sources for chimeric peptides. The tool addresses critical challenges in programmed ribosomal frameshifting (PRF) studies by:

  • Identifying potential alternative loci for chimeric peptides
  • Differentiating between unique and multi-source chimeric origins
  • Detecting reverse complement translation events
  • Analyzing repeat element contributions to chimeric translation

Key applications include functional genomics studies of non-canonical translation events and validation of putative PRF sites.


Features

  • Multi-frame translation analysis (forward and reverse frames 0, +1, and +2)
  • Repeat element integration for comprehensive source detection
  • CSV output with frameshift position annotations
  • Parallel processing support for large genomes
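To make the multi-frame idea concrete, here is a minimal sketch of six-frame translation in plain Python. It is an illustration only, not ShiftSCAN's own code: the function names (translate, six_frames) are hypothetical, only the standard genetic code (NCBI table 1) is hard-coded, and codons containing characters other than A, C, G, T are rendered as X.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq, offset=0):
    """Translate seq starting at `offset` (0, 1, or 2); unknown codons become X."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(seq):
    """Map (strand, offset) to the translation in that reading frame."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[("forward", offset)] = translate(seq, offset)
        frames[("reverse", offset)] = translate(rc, offset)
    return frames


frames = six_frames("ATGGCCTAA")
print(frames[("forward", 0)])  # MA*
```

A peptide that matches one of these six strings directly would be a non-chimeric (direct) source; a chimeric source combines two of the frames on the same strand.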

Available Commands

ShiftSCAN operates through a single command with multiple configurable parameters:

shiftscan

Identifies chimeric sequences in nucleotide databases through frameshift analysis.
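As a toy illustration of what a chimeric source looks like, the sketch below translates a few codons, moves a given number of nucleotides, and then continues translating. It is not the tool's search algorithm. The function name chimeric_peptide is hypothetical, and the sign convention (positive gap skips nucleotides forward, negative gap steps back) is an assumption made for this example.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(
        product("TCAG", repeat=3),
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    )
}


def chimeric_peptide(seq, start, n_before, gap, n_after):
    """Translate n_before codons from `start`, shift by `gap` nucleotides
    (assumed: positive = forward, negative = backward), then translate
    n_after more codons. Returns (segment1, segment2)."""
    seq = seq.upper()

    def read(pos, n_codons):
        codons = [seq[pos + 3 * k: pos + 3 * k + 3] for k in range(n_codons)]
        if pos < 0 or any(len(codon) < 3 for codon in codons):
            raise ValueError("window runs outside the sequence")
        return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

    segment1 = read(start, n_before)
    shift_point = start + 3 * n_before  # first nucleotide after segment 1
    segment2 = read(shift_point + gap, n_after)
    return segment1, segment2


seq = "ATGGCCAAATTTGGG"
print(chimeric_peptide(seq, 0, 2, 0, 2))   # ('MA', 'KF'), no frameshift
print(chimeric_peptide(seq, 0, 2, 1, 2))   # ('MA', 'NL'), shift of +1
print(chimeric_peptide(seq, 0, 2, -1, 2))  # ('MA', 'QI'), shift of -1
```

In this toy setup, n_before plays the role of the frameshift position: the 1-based index of the last amino acid read before the shift.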

Usage

shiftscan --nucleotide_file <genome.fasta> --peptide_file <peptides.fasta> [--max_gap 2] [--codon_table 1] [--num_threads 1] [--blast False] [--sensitivity 5.0] [--word_size 2] [--evalue 1000] [--output results] [--max_transcript_length 30000] [--max_flanking_seq 500] [--no_reverse_complement_check False]
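For scripted runs, the command line above can be assembled from Python. The following is a minimal sketch: build_shiftscan_command is a hypothetical helper, it only handles options written as a flag followed by a value, and it leaves out the boolean options (--blast, --no_reverse_complement_check) because this page does not make clear whether they take a value.

```python
import shlex


def build_shiftscan_command(nucleotide_file, peptide_file, **options):
    """Return an argv list for shiftscan; options become '--name value' pairs."""
    cmd = [
        "shiftscan",
        "--nucleotide_file", str(nucleotide_file),
        "--peptide_file", str(peptide_file),
    ]
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    return cmd


cmd = build_shiftscan_command(
    "genome.fasta", "peptides.fasta",
    max_gap=2, num_threads=4, output="results",
)
print(shlex.join(cmd))
# To actually run it (requires shiftscan to be installed):
# import subprocess; subprocess.run(cmd, check=True)
```

Building the arguments as a list, rather than one shell string, avoids quoting problems with file paths that contain spaces.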

Parameters

  • --nucleotide_file (required): Path to nucleotide FASTA file (genome/transcriptome/repeatome).
  • --peptide_file (required): Path to query peptide FASTA file.
  • --max_gap: Maximum allowed gap size in nucleotides (default: 2). This parameter corresponds to the frameshift value. The default of 2 considers frameshifts of zero (no frameshift), one, and two nucleotides in either direction. Users can set a larger value, such as 10, so that unconventionally “long” frameshifting events are also considered as alternative sources of a given query peptide sequence.
  • --codon_table: ID of the NCBI codon table (default: 1, the Standard Code). Other codon tables can be selected by entering the corresponding NCBI table ID.
  • --num_threads: Number of parallel processing threads (default: 1). Set this according to the number of processors available on the system.
  • --blast: Enable BLAST pre-filtering for acceleration (not recommended). Omitting this flag disables BLAST. Pre-filtering is useful only if the input file is very large and losing information on some alternative sources is acceptable for the study. This option requires a local BLAST installation. If the BLAST executables are not already on the system PATH, add the single folder that contains them (blastn.exe, tblastn.exe, blastx.exe, etc.) to the PATH environment variable.
  • --threshold: Word inclusion threshold (higher = slower) for the BLAST acceleration (default: 5.0).
  • --word_size: BLAST word size parameter (default: 2).
  • --evalue: BLAST e-value threshold (default: 1000).
  • --output: Base name for the CSV output file (default: results).
  • --max_transcript_length: Maximum length of a transcript sequence (default: 30,000, in nucleotides). A subject sequence that exceeds this threshold is trimmed according to the [--max_flanking_seq] setting below.
  • --max_flanking_seq: Maximum length of flanking sequences to include (default: 500, in nucleotides). A subject sequence shorter than [--max_transcript_length] (see above) is recorded in full in the output file. For a subject sequence longer than [--max_transcript_length] (e.g., a chromosome), the output records the frameshift region (the complete matching part) plus [--max_flanking_seq] nucleotides upstream and downstream of the matching site.
  • --no_reverse_complement_check: Omitting this flag will enable reverse complement analysis. This is recommended because the software does not discriminate between transcript, repeat, and genomic sequences. In genomic and repeat sequences, both strands can encode peptides and proteins, even in the same region (oppositely overlapping genes).
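The interaction between --max_transcript_length and --max_flanking_seq described above can be sketched as follows. This is an interpretation of the documented behaviour, not the tool's source: trim_subject is a hypothetical name, coordinates are assumed to be 0-based with an exclusive end, and a sequence exactly as long as the threshold is assumed to be kept whole.

```python
def trim_subject(sequence, match_start, match_end,
                 max_transcript_length=30000, max_flanking_seq=500):
    """Return (reported_sequence, truncated).

    match_start and match_end delimit the matching region
    (assumed 0-based, end-exclusive).
    """
    if len(sequence) <= max_transcript_length:
        # Short subjects are reported as a whole sequence.
        return sequence, False
    # Long subjects: keep the match plus flanks, clipped at the sequence ends.
    left = max(0, match_start - max_flanking_seq)
    right = min(len(sequence), match_end + max_flanking_seq)
    return sequence[left:right], True


chromosome = "A" * 40000
reported, truncated = trim_subject(chromosome, 10000, 10030)
print(len(reported), truncated)  # 1030 True
```

The boolean in the returned pair corresponds to the idea behind the truncation column of the output table.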

Output

Contains tabular data on detected chimeric and non-chimeric sequences with these columns:

  • Type: Detection type. “Frameshift” stands for a chimeric alternative source. “Without frameshift” stands for a non-chimeric alternative source (direct match).
  • Frameshift Position: Position of the last amino acid before the frameshift (position 1 is the first amino acid in a query peptide).
  • Segment 1: Contains a portion of a query peptide sequence before the frameshift.
  • Segment 2: Contains a portion of a query peptide sequence after the frameshift.
  • Gap: Indicates the frameshift value in the alternative source. If the maximal gap value is set to 2 (default), the following alternative sources will be reported: 0 (no frameshift), +1 and +2 (forward frameshifts), -1 and -2 (backward frameshifts).
  • Frameshift Direction: Reading frame transition (Frame n -> Frame m).
  • Nucleotide Title: Contains a header of a subject sequence from the input nucleotide FASTA file.
  • Nucleotide Sequence: Contains a subject sequence delimited by the setting in [--max_flanking_seq].
  • Protein Title: Contains a query header from the input peptide FASTA file.
  • Protein Sequence: Contains a query peptide sequence.
  • Frame Direction: Strand orientation. Forward frames alone (e.g., transcriptome input) or together with reverse frames (genomic DNA or repeatome input) can be considered.
  • Truncation for Nucleotide Sequence (True or False): True if the subject nucleotide sequence exceeds [--max_transcript_length] and was therefore truncated to the matching region plus [--max_flanking_seq] flanking nucleotides on each side. Otherwise, False.
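A result table with these columns can be summarized with the Python standard library. The sketch below counts alternative sources per query peptide and detection type. The sample rows are made up for illustration, only a subset of the columns is shown, and the exact header spelling and delimiter of the real output file are assumptions; count_sources_by_type is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def count_sources_by_type(handle):
    """Count rows per (Protein Title, Type) pair in a ShiftSCAN-style CSV."""
    counts = Counter()
    for row in csv.DictReader(handle):
        counts[(row["Protein Title"], row["Type"])] += 1
    return counts


# Made-up rows standing in for a real output file.
SAMPLE = """Type,Gap,Protein Title
Frameshift,1,pep1
Without frameshift,0,pep1
Frameshift,-2,pep2
"""

counts = count_sources_by_type(io.StringIO(SAMPLE))
for (peptide, kind), n in sorted(counts.items()):
    print(peptide, kind, n)
```

For a real run, replace the StringIO object with an opened file, for example open("results.csv", newline=""), assuming the default output base name produces a file with that name.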

ShiftScan Installation Guide

Requirements

  • Python 3.8+
  • biopython >= 1.81
  • pandas >= 2.0
  • NCBI BLAST+ (tblastn), needed only when BLAST pre-filtering (--blast) is used
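Before enabling BLAST pre-filtering, it may help to confirm that the BLAST executables are visible on the PATH. This is a small sketch using only the Python standard library; blast_available is a hypothetical helper and is not part of ShiftSCAN.

```python
import shutil


def blast_available(executable="tblastn"):
    """Return True if the given executable can be found on the PATH."""
    return shutil.which(executable) is not None


if blast_available():
    print("tblastn found on PATH")
else:
    print("tblastn not found; add the BLAST folder to PATH before using --blast")
```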

Installation

  • Install from PyPI

pip install shiftscan

  • For development/editable installation

git clone https://github.com/umutcakir/shiftscan
cd shiftscan
pip install -e .

If you use ShiftSCAN in your research, please cite the following article:

Umut Çakır, Noujoud Gabed, Ali Yurtseven, Igor Kryvoruchko (2025). ShiftSCAN, a program that predicts potential alternative sources of mass spectrometry-derived peptides, improves the accuracy of studies on novel amino acid sequences. bioRxiv (Cold Spring Harbor Laboratory). https://doi.org/10.1101/2025.05.30.656965

To report bugs, ask questions, or suggest features, feel free to open an issue on GitHub. Your feedback and citations help us improve and sustain this tool.
