RNA FISH oligos/probes design tool.

Maintainers

bbquercus

eFISHent

A command-line based tool to facilitate the creation of eFISHent single-molecule RNA fluorescence in-situ hybridization (RNA smFISH) oligonucleotide probes.

Description
Installation
Getting Genomes, Annotations, Count Tables
Workflow
Usage
Output
Full Examples
FAQ

Description

eFISHent is a tool to facilitate the creation of eFISHent RNA smFISH oligonucleotide probes. Some of the key features of eFISHent are:

One-command installation — no sudo, Docker, or conda required
Automatic gene sequence download from NCBI when providing a gene and species name (or pass a FASTA file)
Parameter presets for common FISH protocols (--preset smfish, merfish, dna-fish, etc.)
Filtering steps to remove low-quality probes including off-targets, frequently occurring short-mers, secondary structures, etc.
Mathematical or greedy optimization to ensure highest coverage

Installation

eFISHent is tested on macOS and Linux with Python 3.9+. It also works on shared HPC/cluster servers over SSH; no sudo, Docker, or conda is needed. For Windows users, we recommend WSL.

Quick Install (Recommended)

A single command installs eFISHent and all dependencies:

curl -LsSf https://raw.githubusercontent.com/BBQuercus/eFISHent/main/install.sh | sh

This will:

Download pre-compiled binaries for bowtie, jellyfish, GLPK, and Entrez Direct
Install Python and the eFISHent package in an isolated environment
Create an efishent command that works without any activation

After installation, restart your shell (or source your shell's startup file, e.g. source ~/.zshrc) and you're ready:

efishent --check    # Verify all dependencies
efishent --help     # Show usage

Custom Installation Path

curl -LsSf https://raw.githubusercontent.com/BBQuercus/eFISHent/main/install.sh | sh -s -- --prefix /path/to/install

Updating

efishent --update

This updates both the Python package and verifies all external dependencies are still present and compatible.

Checking Dependencies

efishent --check

Shows the status of all required and optional dependencies with version info.

Using Conda

Alternatively, use conda to manage dependencies:

conda env create bbquercus/efishent
conda activate efishent
pip install efishent

Development Installation

git clone https://github.com/BBQuercus/eFISHent.git
cd eFISHent/

# Install dependencies
./install.sh --deps-only

# Install development version
uv venv && source .venv/bin/activate
uv pip install -e .

Uninstalling

curl -LsSf https://raw.githubusercontent.com/BBQuercus/eFISHent/main/install.sh | sh -s -- --uninstall

Or simply remove the install directory: rm -rf ~/.local/efishent

Getting Genomes, Annotations, Count Tables

For some of the steps described below, you'll need to provide a few key resources unique to your model organism/cell line:

Genome Sequence and Annotations

Go to the UCSC genome browser
Find your organism and select the "Genome sequence files and select annotations" option
For the actual genome, download the file ending in .fa.gz (which will have to be unzipped e.g. using gunzip)
For the annotation, it's typically in the genes/ directory ending in .gtf.gz (will also have to be unzipped)
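
If you prefer to script the decompression step (for example on a cluster without gunzip on the PATH), it can be done with the Python standard library. This is a minimal sketch; the helper name gunzip and any file names are placeholders and not part of eFISHent:

```python
import gzip
import shutil
from pathlib import Path


def gunzip(path):
    """Decompress `path` (ending in .gz) next to the original and return the new path."""
    source = Path(path)
    target = source.with_suffix("")  # "genome.fa.gz" -> "genome.fa"
    # Stream the data so that large genomes are never fully held in memory.
    with gzip.open(source, "rb") as fin, open(target, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return target
```

Unlike the gunzip command-line tool, this sketch leaves the compressed file in place.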

Count Table

Go to the GEO dataset search
Search for your organism/cell line followed by RNA-seq
Select a sample that you think will represent your data the best (make sure it's RNA-seq - sometimes the search isn't the best...)
Scroll down and, if available, under Supplementary file select a file whose name ends in FPKM.txt.gz, TPM.txt.gz, RPKM.txt.gz, or similar. Do not download raw counts!
Alternatively you can also visit the Expression Atlas - count tables there might need some minor editing beforehand to ensure the required format of Ensembl ID and count values in columns 1 and 2 respectively
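
The "minor editing" mentioned above usually means cutting a wider table down to two columns. A minimal sketch, assuming a tab-separated input with a header row; the column names "Gene ID" and "TPM" and the expression values in the example are made up for illustration, and whether eFISHent expects a header row is not specified here:

```python
import csv
import io


def to_two_column_table(text, id_column, value_column, delimiter="\t"):
    """Return TSV text holding only the gene ID (column 1) and value (column 2)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([id_column, value_column])
    for row in reader:
        gene_id = (row.get(id_column) or "").strip()
        value = (row.get(value_column) or "").strip()
        try:
            float(value)
        except ValueError:
            continue  # skip rows without a numeric expression value
        if gene_id:
            writer.writerow([gene_id, value])
    return out.getvalue()


# Hypothetical three-column export with one incomplete row.
example = (
    "Gene ID\tGene Name\tTPM\n"
    "ENSG00000128272\tgeneA\t153.0\n"
    "ENSG00000000003\tgeneB\t12.5\n"
    "ENSG00000000005\tgeneC\t\n"
)
print(to_two_column_table(example, "Gene ID", "TPM"))
```

Rows without a numeric value are dropped rather than written as zero, so missing measurements are not mistaken for unexpressed genes.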

Workflow

eFISHent works by iteratively selecting probes passing various filtering steps as outlined below:

A list of all candidate probes is generated from an input FASTA file containing the gene sequence. This sequence file can be passed manually or downloaded automatically from NCBI when providing a gene and species name.
The first round of filtering removes any probes that fail basic sequence-specific criteria: melting temperature (given the formamide and salt concentrations), GC content, and G-quadruplets.
Probes are aligned to the reference genome using bowtie and candidates with off-targets are removed. For shorter genes, or if off-targets are unavoidable, off-targets can be weighted using a count table (--encode-count-table) so that candidates hitting highly expressed genes are removed.
Each candidate is split into short k-mers, which are counted in the reference genome using Jellyfish. Candidates containing k-mers that occur more often than a set threshold are discarded.
The secondary structure of each candidate is predicted using a nearest-neighbor thermodynamic model. Candidates whose predicted free energy exceeds the --max-deltag threshold are filtered out, since stable structural motifs can hinder hybridization.
This gives the set of all viable candidates, which may still overlap. The final step uses mathematical or greedy optimization to maximize non-overlapping probe coverage across the gene sequence.
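
To make the first two steps concrete, here is a minimal sketch of candidate enumeration followed by a GC-content and G-run check. It is an illustration, not eFISHent's implementation: all function names are hypothetical, melting temperature is left out, and treating a run of four or more G's as a "G-quadruplet" is an assumption made for this example.

```python
def candidate_probes(sequence, min_length, max_length):
    """Yield (start, end, subsequence) for every window of an allowed length."""
    sequence = sequence.upper()
    for length in range(min_length, max_length + 1):
        for start in range(len(sequence) - length + 1):
            yield start, start + length, sequence[start:start + length]


def gc_percent(seq):
    """GC content of `seq` in percent."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def passes_basic_filters(seq, min_gc, max_gc, max_g_run=3):
    """Keep a candidate if its GC content is in range and it has no long G run."""
    if not min_gc <= gc_percent(seq) <= max_gc:
        return False
    # Assumption: more than `max_g_run` consecutive G's counts as a G-quadruplet.
    return "G" * (max_g_run + 1) not in seq
```

A call such as `[c for c in candidate_probes(gene, 45, 50) if passes_basic_filters(c[2], 20, 80)]` (thresholds chosen arbitrarily) would return the surviving windows, which still overlap heavily at this point.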

Usage

Quick Start

eFISHent --reference-genome <reference-genome> --gene-name <gene> --organism-name <organism>

Index Building

While there is only one main workflow, the slightly more time-intensive index creation step can be run ahead of time. Indexes are unique to each reference genome and can be created using:

eFISHent \
    --reference-genome <path to genome fasta file> \
    --build-indices True

Passing Gene Sequence

The actual probe-creating workflow then requires not only the reference genome but also the sequence against which probes should be designed. The sequence can be passed in one of three ways:

--sequence-file - Path to a fasta file containing the gene sequence
--ensembl-id (& --organism-name) - Ensembl ID of the gene of interest. The sequence will be downloaded from Entrez. The organism name is optional, but passing it helps avoid picking up similarly named genes from other organisms.
--gene-name & --organism-name - Instead of ensembl ID, both gene and organism name can be provided. The sequence will also be downloaded from Entrez.

Optimization Options

There are two ways in which the final set of probe candidates that passed filtering can be assigned/selected using --optimization-method:

greedy - Uses the next best possibility in line starting with the first probe. This has a time complexity of O(n) with n being the number of candidates. Therefore, even with very loosely set parameters and a lot of candidates, this will still be very fast. This is the default option.
optimal - Uses a mathematical optimization model to yield the highest coverage (number of nucleotides bound to a gene). This has a time complexity of O(n**2), meaning the runtime grows quadratically with the number of candidates. Despite breaking the problem into chunks, this might be restrictively slow. However, you can set a time limit (--optimization-time-limit) to stop the optimization process after a given number of seconds. The resulting probes will be the best ones found so far.
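
The difference between the two strategies is easiest to see on a toy example. The sketch below is not eFISHent's code: the greedy function follows the "first fitting probe in line" description, while the optimal function swaps in a plain dynamic program (weighted interval scheduling) in place of the tool's optimization model. Candidates are hypothetical (start, end) pairs with an exclusive end.

```python
from bisect import bisect_right


def greedy_select(candidates, spacing=0):
    """Walk candidates in start order and take every probe that still fits."""
    chosen, next_free = [], None
    for start, end in sorted(candidates):
        if next_free is None or start >= next_free:
            chosen.append((start, end))
            next_free = end + spacing
    return chosen


def optimal_select(candidates, spacing=0):
    """Maximize covered nucleotides with a DP over candidates sorted by end."""
    items = sorted(candidates, key=lambda c: c[1])
    ends = [end for _, end in items]
    best = [(0, [])]  # best[i] = (coverage, probes) using the first i items
    for i, (start, end) in enumerate(items):
        # Number of earlier probes that end early enough to precede this one.
        j = bisect_right(ends, start - spacing, 0, i)
        take = best[j][0] + (end - start)
        if take > best[i][0]:
            best.append((take, best[j][1] + [(start, end)]))
        else:
            best.append(best[i])
    return best[-1][1]


def coverage(probes):
    return sum(end - start for start, end in probes)


toy = [(0, 20), (5, 50), (25, 45), (50, 70)]
print(greedy_select(toy), coverage(greedy_select(toy)))
print(optimal_select(toy), coverage(optimal_select(toy)))
```

On this toy input the greedy pass keeps three short probes covering 60 nucleotides, whereas the dynamic program keeps two probes covering 65. The greedy scan itself is linear once the candidates are sorted.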

Off-Target Handling

To minimize the effect of off-targets, you can employ one of two strategies:

Off-target minimization - Using the --max-off-targets flag, you can specify the maximum number of off-target binding sites in the genome allowed per probe. By default this is set to zero, meaning that probes with any known off-target are discarded. However, for shorter or more repetitive genes, this might leave too few probes, which is why you can also use...

Off-target weighting - If off-targets are unavoidable, you can provide three parameters to limit how highly expressed the off-target genes are allowed to be:

--reference-annotation - A GTF genome annotation file to know which genes correspond to which genomic loci
--encode-count-table - A csv or tsv file containing a normalized RNA-seq count table (FPKM, RPKM, TPM, etc.) whose gene IDs (Ensembl IDs, see Count Table above) match the entries in the GTF file
--max-expression-percentage - The percentage of genes to exclude, ranked by expression level (using the provided count table)

If you don't have your own RNA-seq dataset, you can download available datasets (make sure you're not using raw, but only normalized counts!) at Gene Expression Omnibus. Search for RNA-seq and the name of your organism/cell line.
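
One plausible reading of the percentage-based weighting is sketched below. It is an interpretation for illustration only, not eFISHent's implementation: genes are ranked by normalized expression, the top percentage is treated as "too highly expressed", and a probe is dropped if any of its off-targets falls into that set. All names and expression values are hypothetical.

```python
import math


def highly_expressed_genes(counts, max_expression_percentage):
    """Return the IDs of the top N percent of genes ranked by expression."""
    ranked = sorted(counts, key=counts.get, reverse=True)
    n_excluded = math.ceil(len(ranked) * max_expression_percentage / 100)
    return set(ranked[:n_excluded])


def keep_probe(off_target_genes, excluded):
    """A probe survives if none of its off-target genes is in the excluded set."""
    return not (set(off_target_genes) & excluded)


# Made-up normalized counts for five genes.
counts = {"geneA": 500.0, "geneB": 50.0, "geneC": 5.0, "geneD": 0.5, "geneE": 0.0}
excluded = highly_expressed_genes(counts, 20)
```

With these made-up values and a 20 percent cutoff, only geneA is excluded, so a probe with an off-target in geneC would be kept while one hitting geneA would not.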

General Filtering Parameters

There are a bunch of parameters that can be set to adjust filtering steps:

Parameter	Description
`--min-length`, `--max-length`	Probe lengths in nucleotides
`--spacing`	Minimum distance between probes
`--min-tm`, `--max-tm`	Minimum and maximum possible melting temperature (affected by length, GC content, and formamide/Na concentration)
`--min-gc`, `--max-gc`	GC content in percentage
`--formamide-concentration`	Percentage of formamide in buffer
`--na-concentration`	Sodium ion concentration in mM
`--kmer-length`, `--max-kmers`	Jellyfish-based short-mer filtering. If a candidate probe contains a k-mer of length `--kmer-length` that is found more than `--max-kmers` times in the reference genome, the candidate is discarded
`--max-deltag`	Predicted secondary structure threshold
`--sequence-similarity`	Removes probes that might bind to each other (set the similarity to the highest allowed binding percentage). Because this check would otherwise have exceptionally high runtimes, it is run after optimization and therefore reduces the number of probes in the final set
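
The interaction between `--kmer-length` and `--max-kmers` can be illustrated with an in-memory counter standing in for Jellyfish. This is a sketch under simplifying assumptions: the "genome" is a short toy string, strand is ignored, and the function names are hypothetical.

```python
from collections import Counter


def count_kmers(genome, k):
    """Count every k-mer of length `k` in `genome` (stand-in for a Jellyfish index)."""
    genome = genome.upper()
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def passes_kmer_filter(probe, genome_counts, k, max_kmers):
    """Discard a probe as soon as one of its k-mers is too frequent in the genome."""
    probe = probe.upper()
    kmers = (probe[i:i + k] for i in range(len(probe) - k + 1))
    # A Counter returns 0 for k-mers that never occur in the genome.
    return all(genome_counts[kmer] <= max_kmers for kmer in kmers)


toy_counts = count_kmers("ACGTACGTACGTTTTT", 4)
print(passes_kmer_filter("ACGTT", toy_counts, 4, 2))  # ACGT occurs 3 times -> False
print(passes_kmer_filter("CGTTT", toy_counts, 4, 2))  # all k-mers occur once -> True
```

A single over-represented k-mer is enough to reject a candidate, which is why lowering `--max-kmers` can remove many probes in repetitive regions.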

Remaining Options

There are a few additional options:

Parameter	Description
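`--is-plus-strand`	Set to true if the gene of interest is on the plus strand, false if it is on the minus strand
`--is-endogenous`	Set to true if the gene of interest is endogenous to the reference genome, false otherwise (e.g. for a reporter sequence)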
`--threads`	Wherever multiprocessing is available, spawn that many threads. Set this to as many cores as you have available
`--save-intermediates`	Save all intermediary files. Can be used to gauge which filtering steps are set too aggressively
`--verbose`	Set to get some more information on progress

Probe Set Analysis

After generating a probe set, you can analyze it in more detail using the --analyze-probeset option. This creates a PDF report with visualizations of various probe characteristics:

eFISHent \
    --reference-genome <path to genome fasta file> \
    --sequence-file <path to gene fasta file> \
    --analyze-probeset <path to probe set fasta file>

The analysis includes:

Plot	Description
Lengths	Distribution of probe lengths
Melting temperatures	Boxplot of calculated Tm values
GC Content	Boxplot of GC percentages
G quadruplet	Count of G-quadruplet motifs per probe
K-mer count	Maximum k-mer frequency in genome
Free energy	Predicted secondary structure stability (ΔG)
Off target count	Number of off-target binding sites per probe
Binding affinity	Probe-to-probe similarity matrix (potential cross-hybridization)
Gene coverage	Visual map of probe positions along the target sequence

The output is saved as <probeset_name>_analysis.pdf in the current directory.
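
Two of the simpler report metrics, probe length and GC content, are easy to recompute when sanity-checking a probe set FASTA by hand. A minimal sketch with hypothetical function names and made-up probe sequences; it does not reproduce the PDF report:

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping record name to sequence."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()  # sequences may span several lines
    return records


def summarize(records):
    """Return length and GC percentage for every non-empty probe."""
    summary = {}
    for name, seq in records.items():
        if not seq:
            continue
        gc = 100.0 * sum(base in "GC" for base in seq) / len(seq)
        summary[name] = {"length": len(seq), "gc_percent": round(gc, 1)}
    return summary


probes = read_fasta(">probe-1\nACGTACGT\n>probe-2\nGGGCCCAT\nAT\n")
print(summarize(probes))
```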

Output

By default, eFISHent outputs three files:

GENE_HASH.fasta - All probes in FASTA format for subsequent usage
GENE_HASH.csv - A table containing all probes as well as basic parameters (such as melting temperature)
GENE_HASH.txt - A configuration file to check which parameters were used during the run as well as the command to start it

GENE is derived from the gene name or input file, depending on the options passed, so it should be immediately clear where it comes from. HASH is a unique string of characters that identifies the parameters passed for the run. This way, if and only if the same parameters are passed again, eFISHent doesn't have to rerun anything. All intermediate files created during the run are saved in the same format but are deleted at the end unless --save-intermediates is set to true.
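
The idea of a parameter-derived HASH can be sketched as follows. How eFISHent actually computes it is not described here, so the hashing scheme (SHA-256 over sorted JSON, truncated to ten characters), the function names, and the parameter keys are all assumptions for illustration:

```python
import hashlib
import json


def run_hash(params, length=10):
    """Derive a short, order-independent identifier from a parameter dict."""
    canonical = json.dumps(params, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


def output_names(gene, params):
    """Build the three default output file names in the GENE_HASH pattern."""
    stem = f"{gene}_{run_hash(params)}"
    return [f"{stem}.fasta", f"{stem}.csv", f"{stem}.txt"]
```

Because the keys are sorted before hashing, the same settings passed in a different order map to the same file names, while changing any single value yields a new HASH and therefore a fresh run.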

Full Examples

First, the indexes for the respective genome have to be built:

eFISHent \
    --reference-genome ./hg-38.fa \
    --build-indices True

An example to create 45 to 50-mers for a gene of interest downloaded from Entrez:

eFISHent \
    --reference-genome ./hg-38.fa \
    --gene-name "norad" \
    --organism-name "homo sapiens" \
    --is-plus-strand True \
    --optimization-method optimal \
    --min-length 45 \
    --max-length 50 \
    --formamide-concentration 45 \
    --threads 8

Another example using a custom sequence:

eFISHent \
    --reference-genome ./dm6.fa \
    --sequence-file "./renilla.fasta" \
    --is-endogenous False \
    --threads 8

An example with off-target weighting:

eFISHent \
    --reference-genome ./hg-38.fa \
    --reference-annotation ./hg-38.gtf \
    --ensembl-id ENSG00000128272 \
    --organism-name "homo sapiens" \
    --is-plus-strand False \
    --max-off-targets 5 \
    --encode-count-table ./count_table.tsv \
    --max-expression-percentage 20 \
    --threads 8

Lastly, an example to analyze an existing probe set:

eFISHent \
    --reference-genome ./hg-38.fa \
    --sequence-file ./my_gene.fasta \
    --analyze-probeset ./my_gene_probes.fasta

FAQ

Have questions? Open an issue on GitHub.

Project details

Release history

Version	Release date
0.0.17	Apr 9, 2026
0.0.16	Apr 9, 2026
0.0.15	Apr 9, 2026
0.0.14	Apr 9, 2026
0.0.13	Apr 7, 2026
0.0.12	Apr 5, 2026
0.0.11	Mar 30, 2026
0.0.9	Mar 12, 2026
0.0.8 (this version)	Mar 8, 2026
0.0.7	Dec 9, 2025
0.0.5	Jun 19, 2023
0.0.4	Mar 21, 2023
0.0.3	Mar 16, 2023
0.0.2	Sep 29, 2022
0.0.1	Jun 23, 2022

Download files

Source Distribution

efishent-0.0.8.tar.gz (1.4 MB)

Uploaded Mar 8, 2026 Source

Built Distribution

efishent-0.0.8-py3-none-any.whl (1.4 MB)

Uploaded Mar 8, 2026 Python 3

File details

Details for the file efishent-0.0.8.tar.gz.

File metadata

Download URL: efishent-0.0.8.tar.gz
Upload date: Mar 8, 2026
Size: 1.4 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for efishent-0.0.8.tar.gz
Algorithm	Hash digest
SHA256	`d79e39e02ae7d020e219b39ef9b8606a647a1e3f51bf87ffd5742a586ec07022`
MD5	`f7f00b45e288c4dc32fe36a9a0018513`
BLAKE2b-256	`22ea75207a248bd229ff01b371c35c9297bcac191b1de8f4f8b608db78ace84b`

Provenance

The following attestation bundles were made for efishent-0.0.8.tar.gz:

Publisher: pypi.yml on BBQuercus/eFISHent

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: efishent-0.0.8.tar.gz
- Subject digest: d79e39e02ae7d020e219b39ef9b8606a647a1e3f51bf87ffd5742a586ec07022
- Sigstore transparency entry: 1059506566
- Sigstore integration time: Mar 8, 2026
Source repository:
- Permalink: BBQuercus/eFISHent@84edc2b3aa09db7e0eeefc1efd5b149a4b9ce003
- Branch / Tag: refs/tags/v0.0.8
- Owner: https://github.com/BBQuercus
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yml@84edc2b3aa09db7e0eeefc1efd5b149a4b9ce003
- Trigger Event: release

File details

Details for the file efishent-0.0.8-py3-none-any.whl.

File metadata

Download URL: efishent-0.0.8-py3-none-any.whl
Upload date: Mar 8, 2026
Size: 1.4 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for efishent-0.0.8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a0a6d034f75375b9491001282c863578a1f2a6cc52882eb75660faa031c82414`
MD5	`43aebc55713b3935ace6c4dec1990135`
BLAKE2b-256	`6072232f3d80c8abec6d9e49afe1f459e54862650633ce6920ba48a2aeb8461c`

Provenance

The following attestation bundles were made for efishent-0.0.8-py3-none-any.whl:

Publisher: pypi.yml on BBQuercus/eFISHent

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: efishent-0.0.8-py3-none-any.whl
- Subject digest: a0a6d034f75375b9491001282c863578a1f2a6cc52882eb75660faa031c82414
- Sigstore transparency entry: 1059506575
- Sigstore integration time: Mar 8, 2026
Source repository:
- Permalink: BBQuercus/eFISHent@84edc2b3aa09db7e0eeefc1efd5b149a4b9ce003
- Branch / Tag: refs/tags/v0.0.8
- Owner: https://github.com/BBQuercus
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi.yml@84edc2b3aa09db7e0eeefc1efd5b149a4b9ce003
- Trigger Event: release
