Python wrapper for SauersML/convert_genome (DTC → VCF/BCF/PLINK conversion).

convert_genome (Python)

Python wrapper for the SauersML/convert_genome CLI. Convert direct-to-consumer dumps (23andMe, AncestryDNA, MyHeritage, deCODEme) and standard VCF/BCF into compliant VCF, BCF, or PLINK 1.9 binary — with build detection, sex inference, liftover, and panel harmonisation, all controllable from kwargs.

from convert_genome import convert, OutputFormat

result = convert(
    input="23andme.txt",
    output="out.vcf",
    format=OutputFormat.VCF,
    assembly="hg38",
    standardize=True,
)

result.statistics.emitted_records       # int
result.sample.sex_inferred              # bool
result.build_detection.detected_build   # 'GRCh37' / 'GRCh38' / ...
result.report_path                      # path to <stem>_report.json
result.output_paths                     # files that actually exist on disk
result.yield_rate                       # emitted / total

The wrapper runs the Rust binary, parses the sidecar <stem>_report.json into typed frozen dataclasses, and returns a single ConversionResult.
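As a rough illustration of the report-parsing step, the sketch below loads a JSON sidecar into frozen dataclasses using only the standard library. It is not the wrapper's actual code: the class names `StatisticsSketch` and `ReportSketch`, the helper `load_report`, and the JSON keys `statistics`, `total_records` and `emitted_records` are all assumptions made for the example.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class StatisticsSketch:
    # Hypothetical subset of the statistics section of the report.
    total_records: int
    emitted_records: int


@dataclass(frozen=True)
class ReportSketch:
    statistics: StatisticsSketch
    report_path: Path

    @property
    def yield_rate(self) -> float:
        # emitted / total, guarding against an empty input.
        total = self.statistics.total_records
        return self.statistics.emitted_records / total if total else 0.0


def load_report(report_path: Path) -> ReportSketch:
    """Parse a JSON sidecar into immutable, typed objects (key names assumed)."""
    raw = json.loads(report_path.read_text(encoding="utf-8"))
    stats = raw["statistics"]
    return ReportSketch(
        statistics=StatisticsSketch(
            total_records=int(stats["total_records"]),
            emitted_records=int(stats["emitted_records"]),
        ),
        report_path=report_path,
    )
```

Because the dataclasses are frozen, assigning to a field after construction raises `dataclasses.FrozenInstanceError`, so a parsed result can be shared without defensive copies.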

Install

pip install convert_genome
# the Rust binary:
cargo install convert_genome

The wrapper locates the Rust binary through the binary= argument or PATH. There is no environment-variable indirection: if the binary isn't on PATH, pass binary= explicitly. A missing binary raises ConvertGenomeBinaryNotFound with the suggested install command.
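The lookup described above might be sketched roughly as follows. This is an illustration, not the wrapper's implementation: `locate_binary` and `BinaryNotFoundSketch` are hypothetical names, the executable name `convert_genome` is assumed from the `cargo install` line, and giving an explicit `binary=` priority over PATH is also an assumption.

```python
import shutil
from pathlib import Path
from typing import Optional

INSTALL_HINT = "cargo install convert_genome"


class BinaryNotFoundSketch(RuntimeError):
    """Stand-in for ConvertGenomeBinaryNotFound (hypothetical)."""


def locate_binary(binary: Optional[str] = None, name: str = "convert_genome") -> Path:
    """Resolve the CLI from an explicit path or from PATH, with no env-var lookup."""
    if binary is not None:
        candidate = Path(binary)
        if candidate.is_file():
            return candidate
        raise BinaryNotFoundSketch(f"{binary!r} is not a file; try: {INSTALL_HINT}")
    found = shutil.which(name)  # searches the directories on PATH
    if found is None:
        raise BinaryNotFoundSketch(f"{name!r} not found on PATH; try: {INSTALL_HINT}")
    return Path(found)
```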

Shortcuts: skip every auto-discovery step

Left to itself, the CLI downloads or auto-detects whatever it was not given. If you already have a value, pass it in directly and the matching step is skipped:

convert(
    input="raw.txt",
    output="out.vcf",
    reference="/cache/hg38.fa",         # skip FASTA download
    reference_fai="/cache/hg38.fa.fai", # skip .fai indexing
    input_build="hg19",                  # skip build detection
    assembly="GRCh38",                   # target build (still does liftover)
    panel="/cache/1kg_panel.vcf",        # supply harmonisation panel
    sex="female",                        # skip sex inference
    standardize=True,
)

sex is lenient: passing "unknown" or "indeterminate" (e.g. when chaining out of infer_sex) silently omits the --sex flag and lets the CLI run its own inference.
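A minimal sketch of that leniency, assuming the wrapper maps the kwarg to CLI arguments in a helper like the hypothetical `sex_args` below. The `--sex` flag comes from the text above; the case-insensitive matching, the `male`/`female` value spelling on the command line, and the `ValueError` for other inputs are assumptions.

```python
from typing import List, Optional

# Values that mean "no usable answer": the flag is left out entirely.
_SKIP_VALUES = {"unknown", "indeterminate"}


def sex_args(sex: Optional[str]) -> List[str]:
    """Return the argv fragment for --sex, or an empty list to let the CLI infer it."""
    if sex is None:
        return []
    normalized = sex.strip().lower()
    if normalized in _SKIP_VALUES:
        return []
    if normalized not in {"male", "female"}:
        raise ValueError(f"unsupported sex value: {sex!r}")
    return ["--sex", normalized]
```

Returning an empty list rather than raising is what allows the output of an inference step to be passed straight back in without a conditional at the call site.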

Builder

Converter is a frozen dataclass; every with_* returns a new instance, so branching is safe.

from convert_genome import Converter, Sex, OutputFormat

plan = (
    Converter(input="raw.txt", output_dir="out/", format=OutputFormat.PLINK)
        .with_assembly("GRCh38")
        .with_reference("/cache/hg38.fa", "/cache/hg38.fa.fai")
        .with_panel("/data/1kg_panel.vcf.gz")
        .with_standardize()
        .with_sex(Sex.MALE)
)

print(plan.argv())   # exact argv that would be passed to the CLI
result = plan.run()
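
To show why branching is safe with a frozen builder, here is a small stand-alone sketch of the pattern using `dataclasses.replace`. `PlanSketch` is a hypothetical stand-in with a made-up subset of fields, not the real Converter.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class PlanSketch:
    input: str
    assembly: Optional[str] = None
    standardize: bool = False

    def with_assembly(self, assembly: str) -> "PlanSketch":
        # replace() copies the instance with one field changed.
        return replace(self, assembly=assembly)

    def with_standardize(self, enabled: bool = True) -> "PlanSketch":
        return replace(self, standardize=enabled)


base = PlanSketch(input="raw.txt")
hg38 = base.with_assembly("GRCh38")
hg19 = base.with_assembly("GRCh37").with_standardize()

# Neither branch affected the other, and the base plan is unchanged.
print(base.assembly, hg38.assembly, hg19.assembly)
```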

Enums

InputFormat.AUTO / .DTC / .VCF / .BCF
OutputFormat.VCF / .BCF / .PLINK
Sex.MALE / .FEMALE
Assembly.GRCH37 / .GRCH38     # plus a `.parse()` classmethod that
                              # accepts 'hg19' / 'hg38' / 'build38' / ...
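
An alias-tolerant `parse()` classmethod of the kind mentioned above could look roughly like this. `AssemblySketch` is a hypothetical stand-in; the aliases `hg19`, `hg38` and `build38` come from the comment above, while `build37`, the case-insensitive matching and the `ValueError` are assumptions.

```python
from enum import Enum


class AssemblySketch(Enum):
    GRCH37 = "GRCh37"
    GRCH38 = "GRCh38"

    @classmethod
    def parse(cls, text: str) -> "AssemblySketch":
        """Map a build name or common alias to an enum member."""
        key = text.strip().lower()
        try:
            return _ALIASES[key]
        except KeyError:
            raise ValueError(f"unrecognised assembly: {text!r}") from None


# Lookup table built after the class so it can reference the members.
_ALIASES = {
    "grch37": AssemblySketch.GRCH37,
    "hg19": AssemblySketch.GRCH37,
    "build37": AssemblySketch.GRCH37,  # assumed by symmetry with build38
    "grch38": AssemblySketch.GRCH38,
    "hg38": AssemblySketch.GRCH38,
    "build38": AssemblySketch.GRCH38,
}
```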

Output

The Rust tool writes <stem>_report.json alongside the main output. The wrapper loads it into ConversionResult, with sub-dataclasses for each section:

result.input         # InputInfo (path, format, origin)
result.output        # OutputInfo (path, format)
result.reference     # ReferenceInfo (path, origin, assembly)
result.panel         # PanelInfo | None
result.sample        # SampleInfo (id, sex, sex_inferred)
result.build_detection  # BuildDetection | None (detected_build, match rates)
result.statistics    # Statistics (total / emitted / variant / ... records)
result.report_path   # path to the JSON sidecar
result.output_paths  # tuple[Path, ...], files that actually exist on disk

For PLINK output, output_paths includes the .bed/.bim/.fam trio. For output_dir with a panel, it includes panel.vcf. Non-existent paths are filtered out automatically.
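
The filtering can be pictured with a short sketch: build the candidate paths for a PLINK stem, then keep only those present on disk. The helper names `plink_trio` and `existing_only` are hypothetical, and deriving the trio by appending extensions to a shared stem is an assumption about how the outputs are named.

```python
from pathlib import Path
from typing import Iterable, Tuple


def plink_trio(stem: Path) -> Tuple[Path, ...]:
    """Candidate .bed/.bim/.fam paths that share one stem."""
    # Append to the name instead of with_suffix() so stems containing dots survive.
    return tuple(stem.parent / (stem.name + ext) for ext in (".bed", ".bim", ".fam"))


def existing_only(candidates: Iterable[Path]) -> Tuple[Path, ...]:
    """Drop every candidate that is not actually on disk."""
    return tuple(p for p in candidates if p.exists())
```

Filtering after the run, rather than assuming every expected file was written, means a caller can iterate the resulting tuple without checking each path again.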

Errors

  • ConvertGenomeBinaryNotFound — CLI not installed / not on PATH.
  • InvalidConfig — argument combination rejected before launching (e.g. missing input file, conflicting output/output_dir).
  • ConvertGenomeFailed — CLI exited non-zero. The exception carries stdout, stderr, returncode.
  • ReportNotFound — CLI ran clean but didn't write a JSON sidecar.

All subclass ConvertGenomeError.
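
A shared base class lets callers catch everything with one `except` and still inspect the details. The sketch below mimics that shape with stand-in classes (`ConvertErrorSketch`, `CliFailedSketch`) and a hypothetical `run_cli` helper; it runs the Python interpreter as a deliberately failing command rather than the real CLI.

```python
import subprocess
import sys
from typing import Sequence


class ConvertErrorSketch(Exception):
    """Stand-in for the ConvertGenomeError base class."""


class CliFailedSketch(ConvertErrorSketch):
    """Stand-in for ConvertGenomeFailed: carries the process output."""

    def __init__(self, returncode: int, stdout: str, stderr: str) -> None:
        super().__init__(f"CLI exited with status {returncode}")
        self.returncode = returncode
        self.stdout = stdout
        self.stderr = stderr


def run_cli(argv: Sequence[str]) -> str:
    """Run a command and raise if it exits non-zero."""
    proc = subprocess.run(list(argv), capture_output=True, text=True)
    if proc.returncode != 0:
        raise CliFailedSketch(proc.returncode, proc.stdout, proc.stderr)
    return proc.stdout


# A command that always fails, used here in place of the real binary.
failing = [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(3)"]
caught = None
try:
    run_cli(failing)
except ConvertErrorSketch as exc:  # the base class catches every subclass
    caught = exc
```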
