Simulate genotype likelihoods from haplotype genotype matrix
simGL
simGL simulates genotype likelihoods (GLs) from haplotypic genotype matrices, given per-sample coverage and sequencing error rates. It is designed to work seamlessly with msprime and tskit pipelines, but accepts any NumPy haplotype matrix.
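To make the input format concrete, here is a NumPy-only sketch (not simGL's internal code) of how a haplotype matrix of shape (n_sites, n_haplotypes) relates to per-individual genotypes. It assumes that each consecutive group of `ploidy` columns belongs to one individual, which is how msprime orders its samples as far as we know; the helper name `haplotypes_to_genotypes` is hypothetical.

```python
import numpy as np


def haplotypes_to_genotypes(gm, ploidy=2):
    """Sum alternative alleles over each group of `ploidy` haplotype columns."""
    gm = np.asarray(gm)
    n_sites, n_haps = gm.shape
    if n_haps % ploidy != 0:
        raise ValueError("number of haplotypes must be a multiple of ploidy")
    # Assumption: columns 0..ploidy-1 are individual 0, the next group is individual 1, etc.
    return gm.reshape(n_sites, n_haps // ploidy, ploidy).sum(axis=2)


# Toy matrix: 3 sites x 4 haplotypes (2 diploid individuals), 0 = ref, 1 = alt
gm_toy = np.array([
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
])
genotypes = haplotypes_to_genotypes(gm_toy, ploidy=2)
print(genotypes.tolist())  # [[1, 2], [0, 1], [2, 0]]
```

Each entry counts the alternative alleles an individual carries at a site, which is the quantity that read sampling and sequencing error then obscure.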
Installation
pip install simGL
Or from source:
git clone https://github.com/RacimoLab/simGL.git
cd simGL
pip install -e .
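After either install route, one way to confirm which version ended up in the environment is to query the package metadata from the standard library. This is a small sketch; the helper name `installed_version` is hypothetical and not part of simGL.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("simGL")
print(found if found is not None else "simGL is not installed in this environment")
```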
Quick example
import msprime
import numpy as np
import simGL
# 1. Simulate a tree sequence and extract the biallelic genotype matrix
ts = msprime.sim_ancestry(
    samples=10, ploidy=2, sequence_length=100_000,
    recombination_rate=1e-8, population_size=10_000, random_seed=1,
)
ts = msprime.sim_mutations(ts, rate=1e-4, random_seed=1)
gm_full = ts.genotype_matrix()
biallelic = gm_full.max(axis=1) == 1
gm = gm_full[biallelic] # shape (n_sites, n_haplotypes)
# 2. Get reference and alternative alleles
variants = list(ts.variants())
ref = np.array([v.alleles[0] for v in variants])[biallelic]
alt = np.array([v.alleles[1] for v in variants])[biallelic]
# 3. Simulate allele read counts
arc = simGL.sim_allelereadcounts(
    gm, mean_depth=10., std_depth=2., e=0.01,
    ploidy=2, seed=42, ref=ref, alt=alt,
)
# arc shape: (n_sites, n_individuals, 4) — A, C, G, T read counts
# 4. Compute genotype likelihoods
GL = simGL.allelereadcounts_to_GL(arc, e=0.01, ploidy=2)
# GL shape: (n_sites, n_individuals, 10) — all diploid ACGT genotypes
# 5. Subset to biallelic genotypes and write a VCF
Ra = simGL.ref_alt_to_index(ref, alt)
GL_sub = simGL.subset_GL(GL, Ra, ploidy=2)
pos = np.array([int(v.site.position) for v in variants])[biallelic] + 1
names = [f"ind{i}" for i in range(ts.num_individuals)]
simGL.GL_to_vcf(GL_sub, arc, ref, alt, pos, names, "output.vcf")
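For readers who want intuition for step 4, the sketch below computes genotype likelihoods from A, C, G, T read counts under a common textbook model: each read samples one of the two alleles with probability 1/2, and a sequencing error (rate `e`) replaces the true base with one of the other three bases uniformly. This is an illustrative assumption, not a description of simGL's implementation; the function name `toy_genotype_likelihoods`, the log10 scale, and the genotype ordering are all choices made here, so consult the documentation for what `allelereadcounts_to_GL` actually returns.

```python
from itertools import combinations_with_replacement

import numpy as np


def toy_genotype_likelihoods(counts, e=0.01):
    """Log10 likelihoods of the 10 unordered diploid ACGT genotypes.

    counts: array-like of shape (..., 4) holding A, C, G, T read counts.
    Returns (log10 likelihoods of shape (..., 10), list of genotype index pairs).
    """
    if not 0 < e < 1:
        raise ValueError("e must lie strictly between 0 and 1")
    counts = np.asarray(counts, dtype=float)
    genotypes = list(combinations_with_replacement(range(4), 2))
    # p_base[a, b] = P(read shows base b | true allele is a)
    p_base = np.full((4, 4), e / 3)
    np.fill_diagonal(p_base, 1 - e)
    # Average over the two alleles of each genotype: shape (10, 4)
    p_geno = np.array([(p_base[a1] + p_base[a2]) / 2 for a1, a2 in genotypes])
    # Reads are treated as independent, so log-likelihoods add up per base count
    return counts @ np.log10(p_geno).T, genotypes


bases = "ACGT"
toy_counts = [[10, 0, 0, 0], [5, 5, 0, 0]]  # two individuals at one site
log_gl, genos = toy_genotype_likelihoods(toy_counts, e=0.01)
for row in log_gl:
    a1, a2 = genos[int(row.argmax())]
    print(bases[a1] + bases[a2])  # prints AA, then AC
```

Ten reads of a single base favour the homozygote, while an even split between two bases favours the heterozygote. At low depth the likelihoods of neighbouring genotypes move closer together, which is the uncertainty that simulated GLs are meant to capture.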
Documentation
Full documentation — installation, user guide, API reference, and theory — is available at https://simgl.readthedocs.io.
Citation
If you use simGL in your work, please cite the relevant methodological papers listed in the Citation page of the documentation.
License
Download files
File details
Details for the file simgl-0.2.0.tar.gz.
File metadata
- Download URL: simgl-0.2.0.tar.gz
- Upload date:
- Size: 611.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7079acd4bc1380a8a0c5e6e53556be9dc323201471d7f5a9717287bb5d447324 |
| MD5 | 2b408d1266453dc01274186e8a841d71 |
| BLAKE2b-256 | 50d9c51b0249de6766e3e59f485c2d53423a1cde571b2bb34731bc75c1c677ac |
File details
Details for the file simgl-0.2.0-py3-none-any.whl.
File metadata
- Download URL: simgl-0.2.0-py3-none-any.whl
- Upload date:
- Size: 14.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8ba0dd4b534765730298c0de6f27b959f3d6a8979182a41c97f0bbaa489e23c2 |
| MD5 | bf1db9f2dfb5930530fc84d8011c248c |
| BLAKE2b-256 | ca73854d7570854e58dc3de97b8f96c228396125c52038c2c2a197063e538e0e |
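If you download a file manually, its SHA256 digest can be compared against the value listed above. The following is a minimal sketch using only the standard library; the local path is an assumption (adjust it to wherever the file was saved), and `sha256_of_file` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Assumed location of the downloaded wheel; the digest is the SHA256 listed for it above.
wheel = Path("simgl-0.2.0-py3-none-any.whl")
expected = "8ba0dd4b534765730298c0de6f27b959f3d6a8979182a41c97f0bbaa489e23c2"

if wheel.exists():
    print("digest matches" if sha256_of_file(wheel) == expected else "digest MISMATCH")
else:
    print(f"{wheel} not found; nothing to verify")
```

pip can also enforce digests at install time through hash-checking mode with a requirements file, which may be more convenient for automated setups.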