Python wrapper for SauersML/convert_genome (DTC → VCF/BCF/PLINK conversion).

convert_genome (Python)

Python wrapper for the SauersML/convert_genome CLI. Convert direct-to-consumer dumps (23andMe, AncestryDNA, MyHeritage, deCODEme) and standard VCF/BCF into compliant VCF, BCF, or PLINK 1.9 binary — with build detection, sex inference, liftover, and panel harmonisation, all controllable from kwargs.

from convert_genome import convert, OutputFormat

result = convert(
    input="23andme.txt",
    output="out.vcf",
    format=OutputFormat.VCF,
    assembly="hg38",
    standardize=True,
)

result.statistics.emitted_records       # int
result.sample.sex_inferred              # bool
result.build_detection.detected_build   # 'GRCh37' / 'GRCh38' / ...
result.report_path                      # path to <stem>_report.json
result.output_paths                     # files that actually exist on disk
result.yield_rate                       # emitted / total

The wrapper runs the Rust binary, parses the sidecar <stem>_report.json into typed frozen dataclasses, and returns a single ConversionResult.
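As a rough illustration of the "parse the sidecar into typed frozen dataclasses" step, the sketch below loads a small JSON document into immutable objects. It is not the wrapper's actual code: the class names ReportSketch and StatisticsSketch, the function parse_report, and the JSON key total_records are assumptions made for the example; only emitted_records and the emitted / total definition of yield_rate come from the text above.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class StatisticsSketch:
    total_records: int      # assumed key name
    emitted_records: int


@dataclass(frozen=True)
class ReportSketch:
    statistics: StatisticsSketch
    report_path: Path

    @property
    def yield_rate(self) -> float:
        # emitted / total, returning 0.0 for an empty input
        total = self.statistics.total_records
        return self.statistics.emitted_records / total if total else 0.0


def parse_report(text: str, path: Path) -> ReportSketch:
    """Turn sidecar JSON text into frozen dataclasses (hypothetical schema)."""
    raw = json.loads(text)
    stats = raw["statistics"]
    return ReportSketch(
        statistics=StatisticsSketch(
            total_records=int(stats["total_records"]),
            emitted_records=int(stats["emitted_records"]),
        ),
        report_path=path,
    )
```

Because the dataclasses are frozen, assigning to a field after construction raises an error, which keeps a returned result safe to share between callers.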

Install

pip install convert_genome
# the Rust binary:
cargo install convert_genome

The binary is located via the binary= argument or PATH. There is no environment-variable indirection: if the binary isn't on PATH, pass binary= explicitly. A missing binary raises ConvertGenomeBinaryNotFound, whose message includes the suggested install command.
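The lookup order described here (explicit binary= first, then PATH) might look roughly like the sketch below. The function locate_binary, the stand-in exception BinaryNotFound, and the executable name "convert_genome" are assumptions for illustration; the real wrapper may differ in all three.

```python
import shutil
from pathlib import Path
from typing import Optional


class BinaryNotFound(RuntimeError):
    """Stand-in for the wrapper's ConvertGenomeBinaryNotFound."""


def locate_binary(binary: Optional[str] = None, name: str = "convert_genome") -> Path:
    """Resolve the CLI from an explicit path, falling back to a PATH search."""
    hint = f"try: cargo install {name}"
    if binary is not None:
        # An explicit path is used as-is; it is never looked up on PATH.
        candidate = Path(binary)
        if candidate.is_file():
            return candidate
        raise BinaryNotFound(f"{binary!r} is not a file; {hint}")
    found = shutil.which(name)
    if found is None:
        raise BinaryNotFound(f"{name!r} is not on PATH; {hint}")
    return Path(found)
```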

Shortcuts: skip every auto-discovery step

Left to itself, the CLI downloads or auto-detects whatever it is not given. To skip those steps, pass the values in directly:

convert(
    input="raw.txt",
    output="out.vcf",
    reference="/cache/hg38.fa",         # skip FASTA download
    reference_fai="/cache/hg38.fa.fai", # skip .fai indexing
    input_build="hg19",                  # skip build detection
    assembly="GRCh38",                   # target build (still does liftover)
    panel="/cache/1kg_panel.vcf",        # supply harmonisation panel
    sex="female",                        # skip sex inference
    standardize=True,
)

sex is lenient: passing "unknown" or "indeterminate" (e.g. when chaining out of infer_sex) silently omits the --sex flag and lets the CLI run its own inference.
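One way to picture that leniency is a small helper that maps the sex kwarg to CLI arguments. This is a sketch only: the helper name sex_flag, the lowercase "male"/"female" flag values, and the choice to raise ValueError on other strings are assumptions, not the wrapper's documented behaviour.

```python
from typing import List, Optional

# Values that mean "let the CLI infer sex itself".
_DEFER_TO_CLI = {"unknown", "indeterminate"}


def sex_flag(sex: Optional[str]) -> List[str]:
    """Return the argv fragment for --sex, or an empty list to omit it."""
    if sex is None:
        return []
    value = sex.strip().lower()
    if value in _DEFER_TO_CLI:
        return []  # silently omit the flag
    if value not in {"male", "female"}:
        raise ValueError(f"unsupported sex value: {sex!r}")
    return ["--sex", value]
```

With this shape, the result of an upstream inference step can be passed straight through without the caller checking for "unknown" first.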

Builder

Converter is a frozen dataclass; every with_* returns a new instance, so branching is safe.

from convert_genome import Converter, Sex, OutputFormat

plan = (
    Converter(input="raw.txt", output_dir="out/", format=OutputFormat.PLINK)
        .with_assembly("GRCh38")
        .with_reference("/cache/hg38.fa", "/cache/hg38.fa.fai")
        .with_panel("/data/1kg_panel.vcf.gz")
        .with_standardize()
        .with_sex(Sex.MALE)
)

print(plan.argv())   # exact argv that would be passed to the CLI
result = plan.run()
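The "branching is safe" property follows from the frozen-dataclass pattern itself. The sketch below reproduces that pattern with a hypothetical PlanSketch class (not the real Converter) to show that each with_* call leaves the original untouched.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class PlanSketch:
    input: str
    assembly: Optional[str] = None
    standardize: bool = False

    def with_assembly(self, assembly: str) -> "PlanSketch":
        # replace() builds a new instance; self is never mutated.
        return replace(self, assembly=assembly)

    def with_standardize(self, enabled: bool = True) -> "PlanSketch":
        return replace(self, standardize=enabled)


base = PlanSketch(input="raw.txt")
hg38 = base.with_assembly("GRCh38")
hg19 = base.with_assembly("GRCh37").with_standardize()
```

Here base, hg38, and hg19 are three independent values, so a shared base plan can be specialised per target build without one branch leaking into another.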

Enums

InputFormat.AUTO / .DTC / .VCF / .BCF
OutputFormat.VCF / .BCF / .PLINK
Sex.MALE / .FEMALE
Assembly.GRCH37 / .GRCH38     # plus a `.parse()` classmethod that
                              # accepts 'hg19' / 'hg38' / 'build38' / ...
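A parse() classmethod of this kind is usually a case-insensitive alias lookup. The sketch below uses a stand-in enum named AssemblySketch; the aliases 'hg19', 'hg38', and 'build38' are taken from the comment above, while 'build37', the enum values, and the ValueError on unknown input are assumptions.

```python
from enum import Enum


class AssemblySketch(Enum):
    GRCH37 = "GRCh37"
    GRCH38 = "GRCh38"

    @classmethod
    def parse(cls, text: str) -> "AssemblySketch":
        """Map a build name or common alias to an enum member."""
        aliases = {
            "grch37": cls.GRCH37,
            "hg19": cls.GRCH37,
            "build37": cls.GRCH37,  # assumed alias
            "grch38": cls.GRCH38,
            "hg38": cls.GRCH38,
            "build38": cls.GRCH38,
        }
        try:
            return aliases[text.strip().lower()]
        except KeyError:
            raise ValueError(f"unrecognised assembly: {text!r}") from None
```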

Output

The Rust tool writes <stem>_report.json alongside the main output. The wrapper loads it into ConversionResult, with sub-dataclasses for each section:

result.input         # InputInfo (path, format, origin)
result.output        # OutputInfo (path, format)
result.reference     # ReferenceInfo (path, origin, assembly)
result.panel         # PanelInfo | None
result.sample        # SampleInfo (id, sex, sex_inferred)
result.build_detection  # BuildDetection | None (detected_build, match rates)
result.statistics    # Statistics (total / emitted / variant / ... records)
result.report_path   # path to the JSON sidecar
result.output_paths  # tuple[Path] — files that actually exist on disk

For PLINK output, output_paths includes the .bed/.bim/.fam trio. When output_dir is used with a panel, it includes panel.vcf. Paths that do not exist on disk are filtered out automatically.
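The filtering rule can be sketched as "build the candidate list, keep what exists". The helper existing_outputs below is hypothetical, and deriving the PLINK file names by swapping the suffix on the output stem is an assumption about naming, not something the wrapper documents.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def existing_outputs(output: Path, plink: bool = False) -> Tuple[Path, ...]:
    """List candidate output files and drop any that are not on disk."""
    if plink:
        candidates = [output.with_suffix(ext) for ext in (".bed", ".bim", ".fam")]
    else:
        candidates = [output]
    return tuple(p for p in candidates if p.exists())


# Demo: only two of the three PLINK files are present, so only two are reported.
with tempfile.TemporaryDirectory() as tmp:
    stem = Path(tmp) / "sample"
    for ext in (".bed", ".bim"):
        stem.with_suffix(ext).touch()
    names = [p.name for p in existing_outputs(stem, plink=True)]
```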

Errors

  • ConvertGenomeBinaryNotFound — CLI not installed / not on PATH.
  • InvalidConfig — argument combination rejected before launching (e.g. missing input file, conflicting output/output_dir).
  • ConvertGenomeFailed — CLI exited non-zero. The exception carries stdout, stderr, returncode.
  • ReportNotFound — CLI ran clean but didn't write a JSON sidecar.

All four subclass ConvertGenomeError, so a single except clause can catch any of them.
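To show how a shared base class is typically used, the sketch below defines local stand-ins with the same names as the exceptions listed above and catches them through the base. The constructor signature of ConvertGenomeFailed and the helper summarize are assumptions; only the stdout, stderr, and returncode attributes come from the list above.

```python
class ConvertGenomeError(Exception):
    """Stand-in base class for this sketch."""


class ConvertGenomeBinaryNotFound(ConvertGenomeError):
    pass


class InvalidConfig(ConvertGenomeError):
    pass


class ReportNotFound(ConvertGenomeError):
    pass


class ConvertGenomeFailed(ConvertGenomeError):
    def __init__(self, returncode: int, stdout: str, stderr: str) -> None:
        super().__init__(f"CLI exited with status {returncode}")
        self.returncode = returncode
        self.stdout = stdout
        self.stderr = stderr


def summarize(exc: ConvertGenomeError) -> str:
    """Produce a one-line message, using stderr when the CLI itself failed."""
    if isinstance(exc, ConvertGenomeFailed):
        return f"CLI failed ({exc.returncode}): {exc.stderr.strip()}"
    return f"{type(exc).__name__}: {exc}"


try:
    raise ConvertGenomeFailed(2, "", "bad input\n")
except ConvertGenomeError as exc:  # the base class catches every subclass
    message = summarize(exc)
```

With the real package, the same try/except shape would wrap a convert() or plan.run() call instead of the explicit raise.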
