An integrative pipeline for family-level super-pangenome analysis across coding and noncoding sequences.

panCG pipeline

Fig. 1. Overview of the panCG pipeline including panCNS, pangene, and CG modules.

(a) Workflow of the panCNS module.
(1) Multiple genome alignment: The input genomes are aligned without a reference using Progressive Cactus.
(2) CNS identification: Each genome is designated in turn as the reference genome to generate whole-genome alignments. PhastCons is then used to identify conserved sequences; conserved sequences overlapping CDSs are filtered out, yielding the CNS regions of each genome.
(3) Homologous group identification: Homologous CNS groups are identified from the multiple genome alignments above and from pairwise CNS comparisons.
(4) Synteny cluster construction: Undirected, connected CNS networks are constructed from the syntenic relationships of CNSs between species. Rectangular nodes represent CNSs, with different colors indicating different species. Edges represent homologous relationships: green for CMR, gray for synteny, and red for best-hit relationships.
(5) Index assignment: The members of each synteny network are assigned a unique index.
(6) CNS retrieval: For CNSs missing from the index, their CMR PhastCons scores are evaluated. Those scoring above the threshold and not overlapping CDSs are added to the CNS index and labeled “recall-CNS”; those scoring above the threshold but overlapping CDSs are labeled “recall-CDS”; those scoring below the threshold are labeled “recall-nonCE”.
(7) Index retrieval and reassignment: CNSs retrieved in the previous step are incorporated into the index, and best-hit information is used to reassign indices to singleton CNSs. Finally, each CNS has a unique index and a reference-free panCNS is obtained.
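
The labeling rule in step (6) can be sketched as a small decision function. This is only an illustration of the rule as described, not panCG's actual code: the function name, the example threshold of 0.5, and the handling of a score exactly equal to the threshold (treated here as not exceeding it) are assumptions.

```python
def classify_recalled_region(score: float, threshold: float, overlaps_cds: bool) -> str:
    """Label a region that is missing from the CNS index (hypothetical helper)."""
    # Assumption: a score equal to the threshold does not "exceed" it.
    if score <= threshold:
        return "recall-nonCE"
    # Above the threshold: the label depends on whether the region overlaps a CDS.
    return "recall-CDS" if overlaps_cds else "recall-CNS"


if __name__ == "__main__":
    # The threshold of 0.5 is an illustrative value, not a panCG default.
    for score, in_cds in [(0.9, False), (0.9, True), (0.2, False)]:
        print(score, in_cds, classify_recalled_region(score, 0.5, in_cds))
```

Only regions labeled "recall-CNS" would be added back to the CNS index; the other two labels record why a region was not recalled.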

(b) Workflow of the pangene module.
(1) Ortholog group identification: OrthoFinder is used to identify homolog groups. Circles represent genes, with colors distinguishing homologous genes from different species. Genes in gray ovals belong to the same gene group.
(2) CPM clustering: Synteny networks are constructed for the genes in each group, and the genes are further clustered using the clique percolation method (CPM). Nodes represent genes, and gray edges represent syntenic relationships between genes; the set of genes enclosed by the dashed line denotes a gene cluster identified by CPM.
(3) Network expansion: For genes lacking synteny, best-hit information is used to extend the gene synteny network. Red edges indicate homologous best-hit gene pairs.
(4) Index assignment: A unique gene index is assigned to the genes in each cluster.
(5) Tree-based reassignment: For gene indices containing paralogous genes, the phylogenetic relationships between genes are used to further refine the index assignments. Finally, each gene has a unique index, and a reference-free pangene is obtained.
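
To make step (2) concrete, here is a minimal pure-Python sketch of clique percolation on a toy synteny network. It is not panCG's implementation: the function name, the gene names, and the choice of k = 3 are assumptions, and cliques are enumerated by brute force, which is only practical for small networks.

```python
from itertools import combinations


def k_clique_communities(edges, k=3):
    """Clique percolation: merge k-cliques that share at least k-1 nodes."""
    # Build an adjacency map from the undirected edge list.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Enumerate every k-clique by checking all k-node subsets.
    cliques = [
        frozenset(nodes)
        for nodes in combinations(sorted(adj), k)
        if all(b in adj[a] for a, b in combinations(nodes, 2))
    ]

    # Union-find over cliques; adjacent cliques end up in the same set.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)

    # A community is the union of all nodes in one set of merged cliques.
    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return sorted(communities.values(), key=sorted)


if __name__ == "__main__":
    # Hypothetical genes: two triangles sharing an edge, plus a dangling gene E1.
    toy_edges = [("A1", "B1"), ("A1", "C1"), ("B1", "C1"),
                 ("B1", "D1"), ("C1", "D1"), ("D1", "E1")]
    print(k_clique_communities(toy_edges, k=3))
```

In this toy network E1 has only one syntenic partner, so it belongs to no 3-clique and stays outside the cluster; in the pipeline described above, such genes are the ones step (3) tries to attach using best-hit information.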

(c) Workflow of the CG module.
(1) CNS-gene colocalization analysis: CNSs located in the upstream and downstream regions of each gene are extracted to form a CNS set. Based on CNS-gene colocalization patterns across species, we define Conserved Gene and Noncoding sequence Modules (CGNMs) as sets of co-localized CNSs and genes within the same index that are conserved in at least two species. Closely spaced CGNMs are further grouped into Conserved Gene and Noncoding Blocks (CGNBs).
(2) Synteny network construction: CNS and gene sets corresponding to each gene index are independently derived from the panCNS and pangene modules, and used to construct CNS synteny networks and gene synteny networks, respectively.
(3) Gene-CNS network construction: A unified network for panCNSs and pangenes is generated by merging CNS and gene synteny networks, which captures both collinearity and potential regulatory relationships among all genes and CNSs.
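
The first part of step (1), collecting the CNSs that flank a gene, can be sketched as follows. This is an illustration under stated assumptions rather than panCG's code: the `Interval` type, the `flanking_cns` function, and the 2000 bp flank size are hypothetical, coordinates are taken to be BED-style half-open intervals, and strand is ignored.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive
    name: str


def flanking_cns(gene, cns_list, flank=2000):
    """Return names of CNSs overlapping the upstream or downstream flank of a gene."""
    hits = []
    for cns in cns_list:
        if cns.chrom != gene.chrom:
            continue
        # Flank to the left of the gene: [gene.start - flank, gene.start)
        in_left = cns.start < gene.start and cns.end > gene.start - flank
        # Flank to the right of the gene: [gene.end, gene.end + flank)
        in_right = cns.start < gene.end + flank and cns.end > gene.end
        if in_left or in_right:
            hits.append(cns.name)
    return hits


if __name__ == "__main__":
    gene = Interval("chr1", 10000, 12000, "geneA")
    cns = [
        Interval("chr1", 8500, 8600, "cns1"),    # inside the left flank
        Interval("chr1", 5000, 5100, "cns2"),    # too far away
        Interval("chr1", 13900, 14100, "cns4"),  # overlaps the right flank
        Interval("chr1", 10500, 10600, "cns5"),  # inside the gene body
    ]
    print(flanking_cns(gene, cns))
```

A CNS lying entirely within the gene body is deliberately excluded here, following the wording "upstream and downstream regions"; whether panCG treats intronic CNSs the same way is not stated in this description.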

Dependencies

halLiftover (included with Cactus)
PHAST
JCVI
UCSC tools: mafFilter, mafSplit, wigToBigWig

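# Download the prebuilt UCSC binaries (Linux x86_64) and make them executable
wget https://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/mafFilter
wget https://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/mafSplit
wget https://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/wigToBigWig
chmod +x mafFilter mafSplit wigToBigWig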

Install

Make sure the above dependencies are installed and added to PATH.

pip install panCG
panCG -h
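
Before running the pipeline, it can help to confirm that the external tools are actually visible on PATH. The snippet below is a minimal sketch of such a check. The executable names are assumptions: `phastCons` is used as a stand-in for the PHAST package (panCG may call other PHAST programs as well), and `jcvi` is assumed to be the import name of the JCVI Python package.

```python
import importlib.util
import shutil

# Executables expected on PATH (assumed names, see the note above).
REQUIRED_TOOLS = ["halLiftover", "phastCons", "mafFilter", "mafSplit", "wigToBigWig"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def has_python_package(name="jcvi"):
    """Return True if the named Python package can be imported."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    if not has_python_package("jcvi"):
        print("Python package 'jcvi' is not installed")
    if not missing and has_python_package("jcvi"):
        print("All checked dependencies were found")
```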

Usage

usage: panCG [-h] [--version]  ...

    an integrative pipeline for family-level super-pangenome analysis across coding and noncoding sequences.

optional arguments:
  -h, --help     show this help message and exit
  --version      show program's version number and exit

Commands:

    callCns      Identification of CNS
    pangene      build gene index
    pancns       build CNS index
    GenePavAsso  Associating gene-PAVs with phenotypes between species
    GLSS         Identification of Gene lineage-specific Synteny networks
    CLSS         Identification of CNS lineage-specific Synteny networks
    CnsGeneLink  According to the relative position relationship between CNS and gene and the maximum number of species supported by CNS index and gene index, CNS index and gene index are linked.
    CnsSyntenyNet
                 Used to construct SyntenyNet for filtered pan-CNS

Input file format requirements

Chromosome IDs in the genome may contain only letters, digits, and "_"; special characters such as ":", "-", and "," are not allowed.
The GFF annotation file should ideally contain only gene, mRNA, exon, CDS, and UTR features. Each gene feature must have an ID attribute, and every other feature must have a Parent attribute.
The gene BED file must be a standard 6-column BED file: <chrID> <start> <end> <geneID> <score/0> <strand>.
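
These requirements are easy to check before a long run. The following is a minimal sketch of such a pre-flight check, not part of panCG: the function names are hypothetical, columns are assumed to be whitespace-separated, and the last column is assumed to hold the strand as "+" or "-".

```python
import re

# Chromosome IDs may contain only letters, digits, and "_".
CHROM_ID_RE = re.compile(r"[A-Za-z0-9_]+")


def is_valid_chrom_id(chrom_id: str) -> bool:
    """Return True if the ID uses only the allowed characters."""
    return CHROM_ID_RE.fullmatch(chrom_id) is not None


def check_bed6_line(line: str) -> list:
    """Return a list of problems found in one gene BED line (empty if none)."""
    fields = line.split()
    if len(fields) != 6:
        return [f"expected 6 columns, found {len(fields)}"]
    chrom, start, end, gene_id, score, strand = fields
    problems = []
    if not is_valid_chrom_id(chrom):
        problems.append(f"chromosome ID {chrom!r} contains disallowed characters")
    if not (start.isdigit() and end.isdigit() and int(start) < int(end)):
        problems.append("start and end must be non-negative integers with start < end")
    if strand not in {"+", "-"}:
        problems.append(f"unexpected strand {strand!r}")
    return problems


if __name__ == "__main__":
    for example in ["Chr1\t100\t500\tgene1\t0\t+", "chr-1\t100\t500\tgene1\t0\t+"]:
        print(repr(example), "->", check_bed6_line(example) or "OK")
```

The same `is_valid_chrom_id` check can be applied to the sequence names in each genome FASTA and to the first column of the GFF file.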

Output

CNS calling

Directory	File suffix	Description
{Workdir}/03-phastCons/	{species}.all.bw	PhastCons conservation scores
{Workdir}/03-phastCons/	{species}.CNSs.bed	CNS file of {species}

panCNS

Directory	File suffix	Description
{Workdir}/Ref_{ref}_	.panGene.final.csv	The output panCNS file; each line represents one index

pangene

Directory	File suffix	Description
{Workdir}/Ref_{ref}_IndexDir	.panGene.csv	The resulting pangene file

The Group column gives the homology group identified by OrthoFinder.

Group column	Description
OGxxxxxxx.x	A gene index obtained by subdividing the homology group
OGxxxxxxx.x.Un	The .Un suffix marks a set of genes that remain confined to a single species after CPM clustering
OGxxxxxxx.x.tree_x	A gene index further subdivided according to the phylogenetic relationships between genes
OGxxxxxxx.x.tree_Un	The .tree_Un suffix marks a gene set that could not be classified using phylogenetic relationships
UnMapOGXXXXXXX.x	The UnMap prefix marks genes that OrthoFinder did not assign to any cluster
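
When post-processing the pangene table, the Group labels can be split into their parts with a regular expression. The sketch below assumes that every `x` in the patterns above stands for a run of digits; the function name and the category names it returns are hypothetical and are not produced by panCG itself.

```python
import re

# Assumption: each "x" in the documented patterns is one or more digits.
GROUP_RE = re.compile(
    r"^(?P<unmap>UnMap)?(?P<og>OG\d+)\.(?P<sub>\d+)"
    r"(?:\.(?P<suffix>Un|tree_Un|tree_\d+))?$"
)


def describe_group(label: str) -> dict:
    """Split a Group label into orthogroup, sub-index, and a category."""
    match = GROUP_RE.match(label)
    if match is None:
        raise ValueError(f"unrecognized Group label: {label!r}")
    suffix = match["suffix"]
    if match["unmap"]:
        kind = "unclustered_by_orthofinder"
    elif suffix is None:
        kind = "cpm_index"
    elif suffix == "Un":
        kind = "single_species_after_cpm"
    elif suffix == "tree_Un":
        kind = "tree_unclassified"
    else:
        kind = "tree_refined"
    return {"orthogroup": match["og"], "sub_index": int(match["sub"]), "kind": kind}


if __name__ == "__main__":
    for label in ["OG0000123.1", "OG0000123.2.Un", "OG0000123.1.tree_2",
                  "OG0000123.1.tree_Un", "UnMapOG0000456.1"]:
        print(label, describe_group(label))
```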

Quick start

Example data for testing can be downloaded from figshare.

cactus

nohup /usr/bin/time -v cactus jobstore species.22way.info.txt Citrus.7ways.test_data.hal \
   --realTimeLogging True \
   --workDir /home/xxx/cactus_dir \
   --maxCores 16 --maxMemory 100G --maxDisk 200G > Citrus.7ways.cactus.log 2>&1 &
   
nohup /usr/bin/time -v cactus-hal2maf jobstore Citrus.7ways.test_data.hal C_sinensis.7ways.maf \
    --refGenome C_sinensis \
    --chunkSize 10000000 \
    --noAncestors \
    --dupeMode single \
    --workDir /home/xxx/cactus_dir > C_sinensis.hal2maf.single.log 2>&1 &

call CNS

for i in C_sinensis C_limon ponkan C_australasica C_glauca F_hindsii A_buxifolia
do
    /usr/bin/time -v panCG callCns \
        -c /home/ltan/Tmp/01-PanCNSGene_test_data/panCG/Example/CNScalling.config.yaml \
        -w /home/ltan/Tmp/01-PanCNSGene_test_data/01-callcns/${i} \
        -r ${i} > ${i}.callCns.log 2>&1
done

pangene

nohup /usr/bin/time -v panCG pangene \
    -c /home/ltan/Tmp/01-PanCNSGene_test_data/panCG/Example/panCG.config.yaml \
    -w /home/ltan/Tmp/01-PanCNSGene_test_data/02-pangene \
    -r C_sinensis > pangene.log 2>&1 &

panCNS

nohup /usr/bin/time -v panCG pancns \
    -c /home/ltan/Tmp/01-PanCNSGene_test_data/panCG/Example/panCG.config.yaml \
    -w /home/ltan/Tmp/01-PanCNSGene_test_data/03-pancns \
    -r C_sinensis \
    -W /home/ltan/Tmp/01-PanCNSGene_test_data/02-pangene \
    > pancns.log 2>&1 &

Citation

Version 1.0.2, released Sep 16, 2025
