Skip to main content

No project description provided

Project description

Pango-collapse

tests

CLI to collapse Pango linages for reporting

Install

Install from pypi with pip.

pip install pango-collapse

Usage

pango-collapse takes a CSV file of SARS-CoV-2 samples (input.csv) with a column (default Lineage) indicating the pango lineage of the samples (e.g. output from pangoLEARN, nextclade, USHER, etc).

# input.csv
Lineage
BA.5.2.1
BA.4.6
BE.1

pango-collapse will collapse lineages up to the first user defined parent lineage (specified in a text file with --collapse-file). If the sample lineage has no parent lineage in the user defined collapse file the compressed lineage will be returned. Collapse up to either A or B by adding A and B to the collapse file. By default (i.e. if no collapse file is specified) pango-collapse uses the collapse file found here. This file is dependant on the version of pango-collapse, use --latest to load the latest version of the collapse file from github at run time.

# collapse.txt
BA.5
BE.1

pango-collapse will produce an output file which is a copy of the input file plus Lineage_full (the uncompressed lineage) and Lineage_family (the lineage compressed up to) columns.

pango-collapse input.csv --collapse-file collapse.txt -o output.csv 
# output.csv 
Lineage,Lineage_full,Lineage_family
BA.5.2.1,B.1.1.529.5.2.1,BA.5
BA.4.6,B.1.1.529.4.6,BA.4.6
BE.1,B.1.1.529.5.3.1.1,BE.1

Nextclade example

This example shows how to use some of the pango-collapse features by collapsing the Pango Lineages in the output from Nextclade.

Produce a nextclade.tsv file from a nextclade analysis (there is an example file in tests/data).

We are only interested in the major sub-lineages of omicron i.e. BA.1-BA.5. We can therefor make a collapse file with the following:

# omicron
BA.1
BA.2
BA.3
BA.4
BA.5

Note: BA is an alias of B.1.1.529, however, as we have not included B.1.1.529 in our collapse file any samples designated B.1.1.529 will not be compressed.

Run the following command to collapse the omicron sub-lineages:

pango-collapse nextclade.tsv -o nextclade_collapsed_omicron.tsv -l Nextclade_pango --strict 

The -l (--lineage-column) flag tells pango-collapse to look for the compressed linage in the Nextclade_pango column in the nextclade.tsv file.

The --strict tells pango-collapse to use strict mode i.e. only report lineages in the collapse file. If the lineage cannot be collapsed then no value is returned in the collapse column.

We can visualise the results in pandas:

import pandas as pd
df = pd.read_csv("nextclade_output.tsv", sep="\t")
df.Lineage_family.fillna('Other', inplace=True)
df.Lineage_family.value_counts().plot(kind='bar')

--help

                                                                       
 Usage: pango-collapse [OPTIONS] INPUT                                  
                                                                        
 Collapse Pango sublineages up to user defined parent lineages.         
                                                                        
╭─ Arguments ──────────────────────────────────────────────────────────╮
│ *    input      FILE  Path to input CSV/TSV with Lineage column.     │
│                       [default: None]                                │
│                       [required]                                     │
╰──────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────╮
│ *  --output              -o      FILE  Path to output CSV/TSV with   │
│                                        Lineage column.               │
│                                        [default: None]               │
│                                        [required]                    │
│    --collapse-file       -c      PATH  Path to collapse file with    │
│                                        lineages (one per line) to    │
│                                        collapse up to. Defaults to   │
│                                        collapse file shipped with    │
│                                        this version of               │
│                                        pango-collapse.               │
│                                        [default:                     │
│                                        /Users/wwirth/Library/CloudS… │
│    --lineage-column      -l      TEXT  Column to extract from input  │
│                                        file for lineage.             │
│                                        [default: Lineage]            │
│    --full-column         -f      TEXT  Column to use for the         │
│                                        uncompressed output.          │
│                                        [default: Lineage_full]       │
│    --collapse-column     -k      TEXT  Column to use for the         │
│                                        collapsed output.             │
│                                        [default: Lineage_family]     │
│    --alias-file          -a      PATH  Path to Pango Alias file for  │
│                                        pango_aliasor. Will download  │
│                                        latest file if not supplied.  │
│                                        [default: None]               │
│    --strict              -s            If a lineage is not in the    │
│                                        collapse file return None     │
│                                        instead of the uncompressed   │
│                                        lineage.                      │
│    --latest              -u            Use the latest collapse file  │
│                                        from github.                  │
│    --version             -v            Print the current version     │
│                                        number and exit.              │
│    --install-completion                Install completion for the    │
│                                        current shell.                │
│    --show-completion                   Show completion for the       │
│                                        current shell, to copy it or  │
│                                        customize the installation.   │
│    --help                -h            Show this message and exit.   │
╰──────────────────────────────────────────────────────────────────────╯

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pango-collapse-0.4.6.tar.gz (5.8 kB view hashes)

Uploaded Source

Built Distribution

pango_collapse-0.4.6-py3-none-any.whl (6.1 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page