pav3 · PyPI

Assembly-based variant discovery

These details have not been verified by PyPI

Project description

Phased Assembly Variant Caller

Variant caller for assembled genomes.

Development release (Please read)

PAV 3 is currently a development release and may contain bugs. Please report problems you encounter.

PAV now uses Polars for fast data manipulation. If your job fails with strange Polars errors, such as "capacity overflow" or "PanicException", this is likely from Polars running out of memory.

Notes for early adopters

If you have been using development versions, please read about these changes.

Call subcommand is deprecated. If you have been running pav3 with the "call" subcommand, switch it to "batch" (i.e. ("pav3 batch ...")). A future version will use "call" for single assemblies (with multiple haplotypes) defined on the command-line and "batch" to run all assemblies from an assembly table.

Moving configuration to pav.json. The configuration file "config.json" is being moved to "pav.json". A future version will ignore "config.json".

Install and run

Install

pip install pav3

Run

To run PAV, use the pav3 batch command after setting up configuration files (see below). A future version will add pav call for single-assemblies without requiring configuration files.

Some Python environments may require you to run pav3 through the python command:

python -m pav3 batch

Dependencies

Currently, PAV needs minimap2 in the environment where it is run. This may change in future releases. All other dependencies are handled by the installer.

Output

PAV will output a VCF file for each sample called NAME.vcf.gz.

results/NAME/call_hap: Unmerged variant calls.
- Includes tables of callable regions in reference ("callable_ref") and query ("callable_qry") coordinates.
results/NAME/call: Variant calls merged across haplotypes.

All tables are in parquet format.

Configuring PAV for batch runs

To run assemblies in batches ("pav3 batch" command), PAV reads two configuration files:

pav.json: Points to the reference genome and can be used to set optional parameters.
assemblies.tsv: A table of input assemblies.

Base config: pav.json

A JSON configuration file, pav.json, configures PAV. Default options are built-in, and the only required option is reference pointing to a reference FASTA file.

Example:

{
  "reference": "/path/to/hg38.no_alt.fa.gz"
}

Assembly table

The assembly table points PAV to input assemblies. It may be in TSV, CSV, Excel, or parquet formats (TSV and CSV may optionally be gzipped). Each assembly has one row in the table.

Columns:

NAME: Assembly or sample name.
HAP_*: One column for each assembled haplotype.

Name column

The NAME column contains the assembly name (or sample name). This column must be present and each row must have a unique value.

Haplotype columns

PAV accepts one or more assembled haplotypes per assembly, each with a column in the table starting with "HAP_". Each is a path to an input file for one assembly haplotype.

Common column names are "HAP_h1" for haplotype "h1" and "HAP_h2" for haplotype "h2". For some assemblies with known parental origins, "HAP_mat" and "HAP_pat" are commonly used.

There must be at least one haplotype per assembly, and PAV has no limits on the number of haplotypes (i.e. 3 or more are acceptable).

Not all assemblies need to have the same haplotypes. PAV will ignore empty the "HAP_" values for each assembly. For example, if some assemblies have an "unphased" haplotype and other do not, include "HAP_unphased" and leave it blank for assemblies that do not have it.

Note that genotypes in the VCF file will have one allele for each haplotype defined for the assembly. For an assembly with haplotypes "h1", "h2", and "unphased", three genotype alleles will be possible (e.g. "1|0|." for a heterozygous variant present in "h1", not present in "h2", and uncallable in "unphased"). The order of genotypes is determined by the order of haplotype columns in the assembly table.

Each "HAP_" column contains paths to input files in FASTA, FASTQ, GFA, or FOFN format. FOFN may contain paths to these same file types including other FOFNs (recursive FOFNs are not recommended, but PAV will detect cycles). Multiple files can be input by separating them by semi-colons (i.e. "path/to/file1.fasta.gz;path/to/file2.fasta.gz") and a mix of types is possible. PAV will be fastest if the input is a single bgzipped and indexed FASTA file, it will build its own FASTA file for all other cases.

Configuration column for global overrides

An optional "CONFIG" column can override global configuration parameters per assembly. Global configuration parameters are defined in pav.json or are PAV default values if not defined. Values in this column are semicolon-separated lists of key-value pairs (i.e. "key1=val1;key2=val2"). The "reference" parameter cannot be overridden per assembly.

A note on references

Do not use references with ALT, PATCH, or DECOY scaffolds for PAV, or generally, any assembly-based or long-read variant calling tool. Reference redundancy may increase callset errors.

The GRCh38 HGSVC no-ALT reference for long reads can be found here: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGSVC2/technical/reference/20200513_hg38_NoALT/

The T2T-CHM13v2.0 (hs1 on UCSC) is suitable without alteration. Custom per-sample assemblies containing a single-haplotype or an unphased ("squashed") assembly typically also make a suitable reference as long as they are free of large structural misassemblies and especially large false duplications.

PAV versions

PAV uses Python package versioning with three fields:

Major: Major changes or new features.
Minor: Small changes, but may affect PAV's API or command-line interfaces.
Patch: Small changes and minor new features. Patch versions do break API or command-line compatibility, but may add minor features or options to the API that were not previously supported.

PAV follows Python's packaging versioning scheme (https://packaging.python.org/en/latest/discussions/versioning/).

PAV may use pre-release versions with a suffix for development releases (".devN"), alpha ("aN"), beta ("bN"), or release-candidate ("rcN") where "N" is an integer greater than 0. For example, "3.0.0.dev1" is a development version, and "3.0.0a1" is an early alpha version, and "3.0.0rc1" is a release candidate, all of which precede the "3.0.0" release and should not be considered production-ready.

Cite PAV

PAV 3 does not yet have a citation. For now, use the citation for previous PAV versions, but check back for updates.

Ebert et al., “Haplotype-Resolved Diverse Human Genomes and Integrated Analysis of Structural Variation,” Science, February 25, 2021, eabf7117, https://doi.org/10.1126/science.abf7117 (PMID: 33632895).

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

3.0.0.dev22 pre-release

May 6, 2026

3.0.0.dev21 pre-release

Mar 30, 2026

3.0.0.dev20 pre-release

Mar 23, 2026

3.0.0.dev19 pre-release

Mar 19, 2026

3.0.0.dev18 pre-release

Feb 18, 2026

3.0.0.dev17 pre-release

Feb 10, 2026

This version

3.0.0.dev16 pre-release yanked

Feb 9, 2026

3.0.0.dev15 pre-release yanked

Feb 8, 2026

3.0.0.dev14 pre-release

Dec 8, 2025

3.0.0.dev13 pre-release

Dec 5, 2025

3.0.0.dev12 pre-release

Nov 27, 2025

3.0.0.dev11 pre-release

Nov 26, 2025

3.0.0.dev9 pre-release

Nov 20, 2025

3.0.0.dev8 pre-release

Nov 6, 2025

3.0.0.dev6 pre-release

Oct 21, 2025

3.0.0.dev5 pre-release

Oct 21, 2025

3.0.0.dev4 pre-release

Oct 8, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pav3-3.0.0.dev16.tar.gz (3.0 MB view details)

Uploaded Feb 9, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

pav3-3.0.0.dev16-py3-none-any.whl (3.0 MB view details)

Uploaded Feb 9, 2026 Python 3

File details

Details for the file pav3-3.0.0.dev16.tar.gz.

File metadata

Download URL: pav3-3.0.0.dev16.tar.gz
Upload date: Feb 9, 2026
Size: 3.0 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for pav3-3.0.0.dev16.tar.gz
Algorithm	Hash digest
SHA256	`8cb2575247918e833295ed11216bf0f4fb9dd9677dc3e0f94fa4c6c33064532f`
MD5	`4bb62b4cf5a4553c979ff6861acb3146`
BLAKE2b-256	`b1f148e6be03f6d88c2fda87bf8dab42b529a0709837c679dab86bdf6ea96233`

See more details on using hashes here.

File details

Details for the file pav3-3.0.0.dev16-py3-none-any.whl.

File metadata

Download URL: pav3-3.0.0.dev16-py3-none-any.whl
Upload date: Feb 9, 2026
Size: 3.0 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for pav3-3.0.0.dev16-py3-none-any.whl
Algorithm	Hash digest
SHA256	`88ce0774188b7f89d46e8ac4c7824d905044b2610437a17ffa051eb6df1e6bdd`
MD5	`a0dcc1b6be5d5cb65473413852551711`
BLAKE2b-256	`6118a554da08c265bbb3e84115c8abdee9649484ac7b23e6094e3013b7a8d0c1`

See more details on using hashes here.

pav3 3.0.0.dev16

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

Development release (Please read)

Notes for early adopters

Install and run

Install

Run

Dependencies

Output

Configuring PAV for batch runs

Base config: pav.json

Assembly table

Name column

Haplotype columns

Configuration column for global overrides

A note on references

PAV versions

Cite PAV

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes