
Cis-regulatory Element Genome Scanner - whole-genome cis-element discovery, expression coupling, and KEGG enrichment in one PyQt5 + CLI pipeline.



Cis-GS  ·  Cis-regulatory Element Genome Scanner

A whole-genome pipeline for discovering cis-regulatory elements, coupling them to expression, and finishing with KEGG enrichment — in one PyQt5 desktop app and one interactive CLI.





What Cis-GS Does

Cis-GS automates the full promoter → motif → expression → function journey that plant- and animal-genomics labs run by hand today:

  1. Fetch a reference genome + annotation directly from NCBI (live Assembly search).
  2. Extract promoter sequences (configurable length, strand-aware, intergenic-clipped) from any GFF3.
  3. Scan those promoters for transcription-factor binding motifs imported from PlantTFDB, AnimalTFDB, JASPAR 2024, or HOCOMOCO v11 — or any user-supplied IUPAC consensus.
  4. Render publication-ready sequence logos and per-gene hit tables with hypergeometric p-values and BH-FDR.
  5. Couple the hits to your expression table (RNA-seq, microarray, qPCR) to flag motifs whose presence tracks expression direction.
  6. Build a co-expression network (Pearson / Spearman / WGCNA-style soft-thresholding), detect modules via Louvain or hierarchical clustering, and visualise eigengenes.
  7. Enrich the top module / cluster against KEGG (live REST queries, 11 700+ organisms) with one-sided hypergeometric ORA + Benjamini-Hochberg FDR.
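Item 2 (strand-aware, intergenic-clipped promoter extraction) is easiest to picture in code. The sketch below is an illustration only, assuming GFF3-style 1-based inclusive coordinates, genes given as plain (seqid, start, end, strand) tuples, and a clipping rule that stops the promoter at the nearest gene lying entirely upstream on the same sequence. The function names are hypothetical, and Cis-GS's own extractor may treat overlapping genes and edge cases differently.

```python
from typing import List, Optional, Tuple

# (seqid, start, end, strand), 1-based inclusive as in GFF3 columns 1, 4, 5 and 7.
Gene = Tuple[str, int, int, str]

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def promoter_interval(gene: Gene, genes: List[Gene], length: int = 2000,
                      chrom_len: Optional[int] = None) -> Optional[Tuple[int, int]]:
    """1-based inclusive promoter interval upstream of `gene`, clipped at the
    nearest upstream neighbour on the same seqid (hypothetical clipping rule)."""
    seqid, start, end, strand = gene
    if strand == "+":
        # Upstream of a plus-strand gene lies to the left of its start.
        lo, hi = max(1, start - length), start - 1
        for s, _o_start, o_end, _ in genes:
            if s == seqid and o_end < start:      # neighbour entirely upstream
                lo = max(lo, o_end + 1)
    else:
        # Upstream of a minus-strand gene lies to the right of its end.
        lo, hi = end + 1, end + length
        if chrom_len is not None:
            hi = min(hi, chrom_len)
        for s, o_start, _o_end, _ in genes:
            if s == seqid and o_start > end:
                hi = min(hi, o_start - 1)
    return (lo, hi) if lo <= hi else None


def extract_promoter(genome: str, gene: Gene, genes: List[Gene], length: int = 2000) -> str:
    """Promoter sequence written 5' -> 3' on the gene's own strand."""
    interval = promoter_interval(gene, genes, length, chrom_len=len(genome))
    if interval is None:
        return ""
    lo, hi = interval
    seq = genome[lo - 1:hi]                       # 1-based inclusive -> Python slice
    return seq.translate(_COMPLEMENT)[::-1] if gene[3] == "-" else seq
```

On the minus strand the promoter sits to the right of the gene end and is reverse-complemented, so every returned sequence reads 5' to 3' towards the transcription start site.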

Everything runs locally, offline-friendly after the first network fetch, and exports CSV / SVG / PDF at every step.


Highlights of v1.1

  • Live KEGG dropdown — every one of the 11 700+ organisms KEGG knows about, fetched on demand. No more stale species tables.
  • Live NCBI Taxonomy search — type any common or Latin name; results stream back as you type.
  • 60× faster ID conversion — MyGene.info batched POST + progress bar (previously 60+ minutes for 10 k genes; now ~60 s).
  • Interactive CLI wizards: cis-gs wizard walks you through every step with arrow-key menus, and every subcommand also accepts -i / --interactive.
  • Fuzzy "did you mean...?" suggestions for mistyped CLI subcommands.
  • Contact tab with official brand icons (LinkedIn, GitHub, KEGG, NCBI, PlantTFDB, AnimalTFDB, MyGene).
  • Modern single-color theme (teal #16A085) with instant light / dark toggle — no more 1-2 s freeze.
  • First-run NCBI email prompt — required by the Entrez API, stored only on your machine.
  • Three Gene-ID-Mapping methods for the common mismatch between NCBI LOC identifiers and species-database IDs (column swap, mapping CSV, GFF3 Dbxref expansion).
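The fuzzy "did you mean...?" behaviour can be approximated with difflib from the Python standard library. A minimal sketch, assuming suggestions are drawn from the subcommand names in the CLI reference below and using difflib's default similarity cutoff of 0.6; SUBCOMMANDS, suggest and did_you_mean are hypothetical names, not part of Cis-GS's API.

```python
import difflib
from typing import List, Sequence

# Subcommand names as listed in the CLI reference usage line.
SUBCOMMANDS = ["wizard", "fetch", "extract", "search", "feed",
               "coexpr", "kmeans", "enrich-kegg", "id-convert"]


def suggest(cmd: str, choices: Sequence[str] = SUBCOMMANDS, cutoff: float = 0.6) -> List[str]:
    """Up to three known subcommands most similar to `cmd`, best match first."""
    return difflib.get_close_matches(cmd, choices, n=3, cutoff=cutoff)


def did_you_mean(cmd: str) -> str:
    """Error message for an unknown subcommand, with a suggestion if one is close enough."""
    matches = suggest(cmd)
    if not matches:
        return f"unknown command '{cmd}'"
    return f"unknown command '{cmd}'; did you mean '{matches[0]}'?"
```

Raising the cutoff makes suggestions stricter; lowering it risks proposing unrelated commands for short typos.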

See the full release notes for the v1.0 → v1.1 diff.


Installation

Option 1 — PyPI (Linux / macOS / Windows)

pip install cis-gs
cis-gs --help          # CLI
cis-gs-gui             # GUI

Python 3.9+ required. The first GUI launch will pop up a one-time NCBI email prompt.

Option 2 — Standalone Windows executable

Download Cis-GS.exe (roughly 120 MB) from the latest release page and double-click it; no Python installation is needed.

Option 3 — From source

git clone https://github.com/Ayushmania2002/Cis-GS.git
cd Cis-GS
pip install -e ".[dev,docs]"
python app_v4_open.py        # GUI
python -m cis_gs --help      # CLI

Full build details (PyInstaller spec, build scripts for all 3 OSes, PyPI release workflow): see BUILD.md.


Quick Start

GUI (one minute)

cis-gs-gui
  1. Step 1 — Promoters: drop a FASTA + GFF3, set promoter length (default 2 kb), click Extract.
  2. Step 2 — Motif Search: click Import from PlantTFDB (or AnimalTFDB), pick your species, tick the motifs you want, Import Selected.
  3. Step 7 — KEGG Enrichment: pick a KEGG organism from the live dropdown, paste your gene list, run.

Done. CSVs and SVGs land in ~/CisGS-Workspace/.

CLI (interactive wizard)

cis-gs wizard

The wizard auto-detects what you've already produced and offers the next sensible step.
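One way such auto-detection could work is to check the workspace for each step's output in workflow order and offer the first step whose output is missing. This is a hypothetical sketch: the output names are taken from the workflow table further down this page, while PIPELINE and next_step are invented for illustration (they also skip the fetch and logo steps). The real wizard may use different rules.

```python
from pathlib import Path
from typing import Optional

# (subcommand, expected output) in workflow order. Output names come from the
# workflow table; the mapping itself is a hypothetical illustration.
PIPELINE = [
    ("extract", "promoters.fa"),
    ("search", "hits.csv"),
    ("feed", "expression_matched.csv"),
    ("coexpr", "network.gexf"),
    ("kmeans", "clusters"),               # a directory of per-cluster gene lists
    ("enrich-kegg", "kegg_enrichment.csv"),
]


def next_step(workspace: Path) -> Optional[str]:
    """Return the first subcommand whose output is not yet present in `workspace`."""
    for command, output in PIPELINE:
        if not (workspace / output).exists():
            return command
    return None  # every stage has already produced its output
```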

CLI (one-liners)

# Extract 2 kb promoters from a GFF3 + FASTA
cis-gs extract --fasta genome.fa --gff annot.gff3 --upstream 2000 --out promoters.fa

# Scan promoters with a MEME motif file
cis-gs search --promoters promoters.fa --motifs motifs.meme --out hits.csv

# KEGG enrichment
cis-gs enrich-kegg --organism ath --genes top_module.txt --out kegg.csv

Every command supports -i / --interactive if you want to be walked through it.


The 7-Step Workflow

| Step | What it does | Output |
|---|---|---|
| 1. Promoters | Strand-aware promoter extraction from any FASTA + GFF3 | promoters.fa |
| 2. Motif Search | IUPAC / MEME / PlantTFDB / AnimalTFDB scanning with hypergeometric p-values + BH-FDR | hits.csv, significance summary |
| 3. Motif Logos | logomaker sequence logos with information-content shading | Per-motif SVG / PNG |
| 4. Expression Feeding | Joins hits with an expression CSV via three Gene-ID-Mapping methods (LOC swap, mapping CSV, GFF3 Dbxref expansion) | expression_matched.csv |
| 5. Coexpression | Pearson / Spearman / WGCNA-style soft-thresholding, Louvain / hierarchical module detection | network.gexf, eigengene plot |
| 6. K-means | Elbow + silhouette, deterministic seeding, exportable per-cluster gene lists | clusters/*.txt |
| 7. KEGG Enrichment | Live REST query against any of 11 700+ KEGG organisms, hypergeometric ORA, BH-FDR, fold-enrichment | kegg_enrichment.csv |
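The GFF3 Dbxref expansion method in Step 4 relies on the cross-references that NCBI annotations store in the Dbxref attribute of column 9 (for example GeneID:... plus species-database accessions). A minimal sketch, assuming a standard tab-separated 9-column GFF3 and an expression table keyed by each gene's Name attribute (such as an NCBI LOC symbol); parse_attributes and dbxref_map are hypothetical names, and Cis-GS's own implementation may differ.

```python
from typing import Dict, Iterable, List
from urllib.parse import unquote


def parse_attributes(col9: str) -> Dict[str, List[str]]:
    """Split a GFF3 column-9 string into {key: [values]}. Values are
    comma-separated and may be percent-encoded, per the GFF3 specification."""
    attrs: Dict[str, List[str]] = {}
    for field in col9.strip().split(";"):
        if "=" not in field:
            continue                              # skip empty or malformed fields
        key, value = field.split("=", 1)
        attrs[key] = [unquote(v) for v in value.split(",")]
    return attrs


def dbxref_map(gff_lines: Iterable[str], feature_type: str = "gene",
               key: str = "Name") -> Dict[str, Dict[str, str]]:
    """Map each feature's `key` attribute to its Dbxref cross-references,
    e.g. {"LOC123": {"GeneID": "123", "Araport": "AT1G01010"}}."""
    mapping: Dict[str, Dict[str, str]] = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue                              # directives, comments, blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        attrs = parse_attributes(cols[8])
        if key not in attrs:
            continue
        xrefs = {}
        for xref in attrs.get("Dbxref", []):
            db, _, accession = xref.partition(":")  # split on the first colon only
            xrefs[db] = accession
        mapping[attrs[key][0]] = xrefs
    return mapping
```

Splitting only on the first colon keeps accessions that themselves contain a colon (such as HGNC:HGNC:11998) intact.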

A full description of each step's algorithm and parameters lives in the online documentation.
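Steps 2 and 7 both report one-sided hypergeometric p-values corrected with Benjamini-Hochberg. The sketch below spells out that calculation using only the standard library; it is an illustration of the statistics, not Cis-GS's code (which may well use SciPy). Notation: N background genes, K of them annotated to a pathway (or carrying a motif), n genes in the query list, and k of those annotated. The p-value is P(X ≥ k) = Σ from i = k to min(K, n) of C(K, i)·C(N − K, n − i) / C(N, n).

```python
from math import comb
from typing import List


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """One-sided over-representation p-value P(X >= k), X ~ Hypergeometric(N, K, n)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0                       # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def fold_enrichment(k: int, N: int, K: int, n: int) -> float:
    """Observed fraction annotated in the query over the background fraction."""
    return (k / n) / (K / N)
```

For example, 3 of 4 query genes falling in a 3-gene pathway out of a 10-gene background gives p = 7/210 ≈ 0.033 and a fold-enrichment of 2.5.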


Supported Motif Databases

| Database | Coverage | Access |
|---|---|---|
| PlantTFDB v5 | 157 plant species, ~6 000 motifs | Built-in importer with live species list |
| AnimalTFDB v4 | Human, mouse, zebrafish, insects | Built-in importer |
| JASPAR 2024 (non-redundant) | 575 vertebrate + 99 insect motifs | Direct REST download |
| HOCOMOCO v11 | ~700 human + ~400 mouse ChIP-Seq motifs | Direct REST download |
| Custom IUPAC / MEME | Anything you can write down | Paste into Step 2 |
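For the Custom IUPAC row, a consensus can be scanned by expanding each degenerate code into a regular-expression character class and searching both strands. This sketch works under simple assumptions (exact consensus matching, no position weight matrix scoring, 0-based match starts on the forward sequence); the helper names are hypothetical, and it is not how Cis-GS scores MEME or database motifs.

```python
import re
from typing import List, Tuple

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement of each code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def iupac_to_regex(consensus: str) -> str:
    """Turn e.g. 'RCCGAC' into the regex '[AG]CCGAC'."""
    return "".join(IUPAC[c] if len(IUPAC[c]) == 1 else f"[{IUPAC[c]}]"
                   for c in consensus.upper())


def reverse_complement_iupac(consensus: str) -> str:
    return consensus.upper().translate(COMPLEMENT)[::-1]


def scan(seq: str, consensus: str) -> List[Tuple[int, str]]:
    """(0-based start, strand) of every match on either strand, overlaps included."""
    seq = seq.upper()
    hits = []
    for strand, motif in (("+", consensus), ("-", reverse_complement_iupac(consensus))):
        # A zero-width lookahead lets finditer report overlapping occurrences.
        pattern = re.compile(f"(?={iupac_to_regex(motif)})")
        hits.extend((m.start(), strand) for m in pattern.finditer(seq))
    return sorted(hits)
```

A palindromic site such as the G-box CACGTG matches identically on both strands, so this sketch reports it once per strand; a production scanner would typically collapse such duplicates.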

CLI Reference

cis-gs --help

usage: cis-gs [-h] {wizard,fetch,extract,search,feed,coexpr,kmeans,enrich-kegg,id-convert} ...

  wizard         Step-by-step wizard (recommended for new users)
  fetch          Download a genome + annotation from NCBI
  extract        Extract promoter sequences from FASTA + GFF3
  search         Scan promoters for motif occurrences
  feed           Couple motif hits with an expression table
  coexpr         Build a co-expression network
  kmeans         K-means clustering with elbow / silhouette
  enrich-kegg    KEGG over-representation analysis
  id-convert     Convert gene IDs across namespaces (MyGene.info, batched)

Every subcommand accepts -i / --interactive for a guided run, and --help for full flags.


Programmatic API

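from cis_gs.enrichment import KEGGEnricher
from cis_gs.enrichment.idmap import IDConverter

# KEGG over-representation analysis for Arabidopsis thaliana ("ath" is its KEGG organism code)
e = KEGGEnricher(organism="ath")
result = e.enrich(["AT1G01010", "AT2G18790", "AT3G09600"])
print(result.table.head())    # first rows of the enrichment table

# Convert human gene symbols to Entrez Gene IDs
idc = IDConverter(species="human")
mapping = idc.convert(["TP53", "BRCA1", "MYC"], target="entrez")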

Full API reference: Ayushmania2002.github.io/Cis-GS/api.


Screenshots

[Screenshots: Step 1, promoter extraction · Step 2, motif search · Step 5, coexpression network · Step 7, KEGG enrichment]

Screenshots are placeholder paths; replace with actual PNGs in docs/source/_static/ before publishing.


Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| cis-gs-gui: command not found after pip install | Scripts directory not on PATH | Use python -m cis_gs meanwhile, or add pip's --user scripts directory to PATH |
| First NCBI Fetch returns 0 results | NCBI email not set | Settings → Set NCBI Email, then retry |
| KEGG REST unreachable | Firewall or VPN | Set the HTTPS_PROXY environment variable, or use the Browse & Import tab with a manually downloaded MEME file |
| Motif hits CSV has an empty gene_symbol column | Annotation GFF3 not loaded in Step 2 | Re-run Step 2, loading the same GFF3 from Step 1 under Gene ID Resolution |
| Coexpression freezes on > 30k genes | All-vs-all correlation is O(n²) | Pre-filter to expressed genes (TPM > 1) before Step 5 |
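The O(n²) row above follows from the pair count: n genes give n(n − 1)/2 correlations, so 30 000 genes mean roughly 450 million pairs. The sketch below shows the suggested pre-filter followed by a WGCNA-style unsigned soft-threshold adjacency, a_ij = |cor(x_i, x_j)|^β. It is plain Python for illustration only; reading "expressed" as TPM > 1 in at least one sample and the default β = 6 are assumptions, and Cis-GS's coexpression step (Spearman option, module detection) is not reproduced here.

```python
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0                      # a constant profile has no defined correlation
    return sxy / sqrt(sxx * syy)


def filter_expressed(tpm: Dict[str, List[float]], min_tpm: float = 1.0) -> Dict[str, List[float]]:
    """Keep genes with TPM above `min_tpm` in at least one sample (assumed rule)."""
    return {g: v for g, v in tpm.items() if max(v) > min_tpm}


def soft_threshold_adjacency(expr: Dict[str, List[float]], beta: int = 6) -> Dict[Tuple[str, str], float]:
    """Unsigned WGCNA-style adjacency |cor|**beta for every unordered gene pair."""
    genes = sorted(expr)
    adjacency = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:         # n(n-1)/2 pairs: the O(n^2) cost
            adjacency[(g, h)] = abs(pearson(expr[g], expr[h])) ** beta
    return adjacency
```

Raising the correlation to a power β shrinks weak correlations far more than strong ones (0.8 becomes about 0.26 at β = 6, while 0.3 becomes below 0.001), which is why soft-thresholding suppresses noise without a hard cutoff.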

Open an issue with the log file from ~/CisGS-Workspace/cisgs.log if you hit anything else.


Contributing

Bug reports, feature requests, and pull requests are welcome. For substantial contributions please open an issue first to discuss the change.

git clone https://github.com/Ayushmania2002/Cis-GS.git
cd Cis-GS
pip install -e ".[dev]"
pytest                       # run the test suite

Citation

If Cis-GS contributes to a publication, please cite:

Mallick A. Cis-GS: a unified pipeline for whole-genome cis-regulatory element discovery, expression coupling, and KEGG enrichment. (manuscript in preparation, Plant Signaling Lab, IISER Tirupati, 2026).

BibTeX:

@software{mallick_cisgs_2026,
  author  = {Mallick, Ayushman},
  title   = {{Cis-GS}: Cis-regulatory Element Genome Scanner},
  year    = {2026},
  url     = {https://github.com/Ayushmania2002/Cis-GS},
  version = {1.1.0}
}

A CITATION.cff is included for GitHub's automatic citation widget.


License

Released under the MIT License. Free for academic and commercial use.


Contact

Ayushman Mallick · ayushmania2002@gmail.com · Plant Signaling Lab, IISER Tirupati

© 2026 Ayushman Mallick · Plant Signaling Lab · Cis-GS



Download files


Source Distribution

cis_gs-1.1.0.tar.gz (6.9 MB)

Uploaded Source

Built Distribution


cis_gs-1.1.0-py3-none-any.whl (6.9 MB)

Uploaded Python 3

File details

Details for the file cis_gs-1.1.0.tar.gz.

File metadata

  • Download URL: cis_gs-1.1.0.tar.gz
  • Size: 6.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cis_gs-1.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 06252808d8aab3e6b2e79e6cabd20fe58e8c29ab49b3066095c57c7373d18b73 |
| MD5 | 6815bc72ad4b330042aa5c7a09528217 |
| BLAKE2b-256 | 1e487553e4cf9ba12920d6286e71a605b70b7b36c784069a08b2d87fb8a807db |
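To check a downloaded file against the digests above, hash it locally and compare. A minimal sketch using Python's hashlib: the expected value is the SHA256 digest listed for cis_gs-1.1.0.tar.gz, while sha256_of and verify are hypothetical helper names.

```python
import hashlib

# SHA256 digest published for cis_gs-1.1.0.tar.gz.
EXPECTED_SDIST_SHA256 = "06252808d8aab3e6b2e79e6cabd20fe58e8c29ab49b3066095c57c7373d18b73"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 of a file, read in 1 MiB chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    """True if the file's SHA256 matches `expected` (case-insensitive hex)."""
    return sha256_of(path) == expected.lower()
```

pip can enforce the same check at install time in hash-checking mode: pin cis-gs==1.1.0 in a requirements file with a --hash=sha256:... option for each distribution file you may install, then run pip install --require-hashes -r requirements.txt.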


Provenance

The following attestation bundles were made for cis_gs-1.1.0.tar.gz:

Publisher: publish.yml on Ayushmania2002/Cis-GS

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cis_gs-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: cis_gs-1.1.0-py3-none-any.whl
  • Size: 6.9 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cis_gs-1.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | fda80f8991d8cd6a4867ec5d7143cc18b59b8b97b4dabafb741f6e98d11586e4 |
| MD5 | 28197d71b97ffddb4bdeacbb0e0d5929 |
| BLAKE2b-256 | adc5ba6add0caa6f45299fd3ea9d6f1387a4e14411d21a007e638e3b7f9af42d |


Provenance

The following attestation bundles were made for cis_gs-1.1.0-py3-none-any.whl:

Publisher: publish.yml on Ayushmania2002/Cis-GS

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
