Simulate genotype likelihoods from a haplotype genotype matrix

simGL

simGL simulates genotype likelihoods (GLs) from haplotypic genotype matrices, given per-sample coverage and sequencing error rates. It is designed for msprime and tskit pipelines, but accepts any NumPy haplotype matrix.
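
To make the idea concrete, here is a minimal NumPy sketch of what simulating GLs from haplotypes can look like for a single biallelic site in diploid individuals. It is an illustration under simple assumptions (Poisson read depth, a symmetric per-read error rate, two consecutive haplotypes per individual) and is not simGL's implementation; the function name `toy_genotype_likelihoods` and its parameters are hypothetical.

```python
import numpy as np

def toy_genotype_likelihoods(haplotypes, mean_depth=10.0, e=0.01, seed=0):
    """Toy GL simulation for one biallelic site (NOT simGL's implementation).

    haplotypes: 1-D sequence of 0/1 alleles, two consecutive entries per
    diploid individual (an assumption of this sketch).
    Returns (counts, gl): ref/alt read counts with shape (n_ind, 2) and
    likelihoods for genotypes 0/0, 0/1, 1/1 scaled to sum to one per
    individual, with shape (n_ind, 3). Requires 0 < e < 1.
    """
    rng = np.random.default_rng(seed)
    haps = np.asarray(haplotypes).reshape(-1, 2)   # (n_ind, 2)
    n_ind = haps.shape[0]

    # Reads per individual, drawn from a Poisson distribution.
    depth = rng.poisson(mean_depth, size=n_ind)

    # A read comes from either haplotype with equal probability and is
    # flipped to the other allele with probability e.
    alt_frac = haps.mean(axis=1)                   # 0, 0.5 or 1
    p_alt = alt_frac * (1 - e) + (1 - alt_frac) * e
    n_alt = rng.binomial(depth, p_alt)
    n_ref = depth - n_alt

    # Binomial log-likelihood of the counts under each candidate genotype.
    g_frac = np.array([0.0, 0.5, 1.0])             # alt fraction for 0/0, 0/1, 1/1
    p_alt_g = g_frac * (1 - e) + (1 - g_frac) * e
    log_l = (n_alt[:, None] * np.log(p_alt_g)
             + n_ref[:, None] * np.log(1 - p_alt_g))

    # Rescale in log space before exponentiating to avoid underflow.
    gl = np.exp(log_l - log_l.max(axis=1, keepdims=True))
    gl /= gl.sum(axis=1, keepdims=True)
    return np.stack([n_ref, n_alt], axis=1), gl

# Three individuals with true genotypes 0/0, 0/1 and 1/1.
counts, gl = toy_genotype_likelihoods([0, 0, 0, 1, 1, 1], mean_depth=20.0)
print(counts)
print(gl.round(3))
```

Lower `mean_depth` gives flatter likelihoods, which is the uncertainty that low-coverage analyses need to account for.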

Installation

pip install simGL

Or from source:

git clone https://github.com/RacimoLab/simGL.git
cd simGL
pip install -e .

Quick example

import msprime
import numpy as np
import simGL

# 1. Simulate a tree sequence and extract the biallelic genotype matrix
ts = msprime.sim_ancestry(
    samples=10, ploidy=2, sequence_length=100_000,
    recombination_rate=1e-8, population_size=10_000, random_seed=1,
)
ts = msprime.sim_mutations(ts, rate=1e-4, random_seed=1)

gm_full  = ts.genotype_matrix()
biallelic = gm_full.max(axis=1) == 1
gm       = gm_full[biallelic]           # shape (n_sites, n_haplotypes)

# 2. Get reference and alternative alleles
variants = list(ts.variants())
ref = np.array([v.alleles[0] for v in variants])[biallelic]
alt = np.array([v.alleles[1] for v in variants])[biallelic]

# 3. Simulate allele read counts
arc = simGL.sim_allelereadcounts(
    gm, mean_depth=10., std_depth=2., e=0.01,
    ploidy=2, seed=42, ref=ref, alt=alt,
)
# arc shape: (n_sites, n_individuals, 4)  —  A, C, G, T read counts

# 4. Compute genotype likelihoods
GL = simGL.allelereadcounts_to_GL(arc, e=0.01, ploidy=2)
# GL shape: (n_sites, n_individuals, 10)  —  all diploid ACGT genotypes

# 5. Subset to biallelic genotypes and write a VCF
Ra     = simGL.ref_alt_to_index(ref, alt)
GL_sub = simGL.subset_GL(GL, Ra, ploidy=2)

# tskit site positions are 0-based; VCF positions are 1-based, hence the + 1
pos   = np.array([int(v.site.position) for v in variants])[biallelic] + 1
names = [f"ind{i}" for i in range(ts.num_individuals)]
simGL.GL_to_vcf(GL_sub, arc, ref, alt, pos, names, "output.vcf")
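
The last axis of GL has length 10 because there are ten unordered diploid genotypes over the four bases A, C, G and T. The standard-library sketch below only illustrates that count. The helper `unordered_genotypes` is hypothetical, and the order in which simGL stores genotypes along that axis is not stated here, so the order printed below should not be assumed to match it.

```python
from itertools import combinations_with_replacement
from math import comb

def unordered_genotypes(alleles, ploidy):
    """All unordered genotypes of the given ploidy over the allele set."""
    return ["".join(g) for g in combinations_with_replacement(alleles, ploidy)]

acgt = unordered_genotypes("ACGT", 2)
print(len(acgt), acgt)
# 10 ['AA', 'AC', 'AG', 'AT', 'CC', 'CG', 'CT', 'GG', 'GT', 'TT']

# With only two alleles (ref = "0", alt = "1") three genotypes remain.
biallelic_genotypes = unordered_genotypes("01", 2)
print(biallelic_genotypes)
# ['00', '01', '11']

# The count is a multiset coefficient: C(n_alleles + ploidy - 1, ploidy).
assert len(acgt) == comb(4 + 2 - 1, 2)
assert len(biallelic_genotypes) == comb(2 + 2 - 1, 2)
```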

Documentation

Full documentation (installation, user guide, API reference, and theory) is available at https://simgl.readthedocs.io.

Citation

If you use simGL in your work, please cite the relevant methodological papers listed in the Citation page of the documentation.

License

ISC
