Python bindings for SauersML/convert_genome (DTC → VCF/BCF/PLINK conversion).

convert_genome (Python)

Python wrapper for the SauersML/convert_genome CLI. Convert direct-to-consumer dumps (23andMe, AncestryDNA, MyHeritage, deCODEme) and standard VCF/BCF into compliant VCF, BCF, or PLINK 1.9 binary — with build detection, sex inference, liftover, and panel harmonisation, all controllable from kwargs.

from convert_genome import convert, OutputFormat

result = convert(
    input="23andme.txt",
    output="out.vcf",
    format=OutputFormat.VCF,
    assembly="hg38",
    standardize=True,
)

result.statistics.emitted_records       # int
result.sample.sex_inferred              # bool
result.build_detection.detected_build   # 'GRCh37' / 'GRCh38' / ...
result.report_path                      # path to <stem>_report.json
result.output_paths                     # files that actually exist on disk
result.yield_rate                       # emitted / total

The wrapper runs the Rust binary, parses the sidecar <stem>_report.json into typed frozen dataclasses, and returns a single ConversionResult.
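The parse step can be pictured roughly as below. This is a minimal sketch, not the wrapper's actual code: the `Stats` class, the JSON key names (`statistics`, `total_records`, `emitted_records`) and the placement of `yield_rate` on this class are all assumptions chosen to mirror the attribute names shown above.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Stats:
    """Hypothetical stand-in for the wrapper's statistics dataclass."""
    total_records: int
    emitted_records: int

    @property
    def yield_rate(self) -> float:
        # emitted / total, guarding against an empty input
        if not self.total_records:
            return 0.0
        return self.emitted_records / self.total_records


def parse_report(text: str) -> Stats:
    """Load a report JSON string into a typed, immutable object."""
    raw = json.loads(text)["statistics"]  # assumed key layout
    return Stats(
        total_records=int(raw["total_records"]),
        emitted_records=int(raw["emitted_records"]),
    )


stats = parse_report('{"statistics": {"total_records": 200, "emitted_records": 150}}')
print(stats.yield_rate)  # 0.75
```

Freezing the dataclass means a parsed report cannot be mutated by accident after the run.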

Install

pip install convert_genome
# the Rust binary:
cargo install convert_genome

The wrapper locates the binary through the binary= argument or PATH. There is no environment-variable indirection: if the binary isn't on PATH, pass binary= explicitly. A missing binary raises ConvertGenomeBinaryNotFound, which includes the suggested install command.
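A lookup of this kind might be written as follows. It is a sketch under assumptions: `resolve_binary` and `BinaryNotFound` are hypothetical names (the latter standing in for ConvertGenomeBinaryNotFound), and the executable name `convert_genome` is assumed from the cargo command above.

```python
import shutil
from typing import Optional


class BinaryNotFound(RuntimeError):
    """Stand-in for the wrapper's ConvertGenomeBinaryNotFound."""


def resolve_binary(binary: Optional[str] = None, name: str = "convert_genome") -> str:
    """Return an executable path from an explicit argument or from PATH."""
    wanted = binary if binary is not None else name
    # shutil.which accepts both bare names (PATH lookup) and explicit paths.
    found = shutil.which(wanted)
    if found is None:
        raise BinaryNotFound(
            f"{wanted!r} not found; try: cargo install convert_genome"
        )
    return found
```

Only the explicit argument and PATH are consulted, so the result does not depend on hidden configuration.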

Shortcuts: skip every auto-discovery step

Left to itself, the CLI downloads or auto-detects whatever it is not given, even when you already have it locally. To skip those steps, pass the values directly:

convert(
    input="raw.txt",
    output="out.vcf",
    reference="/cache/hg38.fa",         # skip FASTA download
    reference_fai="/cache/hg38.fa.fai", # skip .fai indexing
    input_build="hg19",                  # skip build detection
    assembly="GRCh38",                   # target build (still does liftover)
    panel="/cache/1kg_panel.vcf",        # supply harmonisation panel
    sex="female",                        # skip sex inference
    standardize=True,
)

sex is lenient: passing "unknown" or "indeterminate" (e.g. when chaining out of infer_sex) silently omits the --sex flag and lets the CLI run its own inference.
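The lenient handling amounts to something like the sketch below. `sex_args` is a hypothetical helper, not part of the package; the case-insensitive matching and the `--sex <value>` argument shape are assumptions, while the two sentinel strings come from the paragraph above.

```python
from typing import List, Optional

# Values treated as "no usable answer", per the text above.
_UNSET = {"unknown", "indeterminate"}


def sex_args(sex: Optional[str]) -> List[str]:
    """Build the --sex portion of the argv, or nothing at all."""
    if sex is None or sex.strip().lower() in _UNSET:
        return []  # omit the flag so the CLI infers sex itself
    return ["--sex", sex.strip().lower()]


print(sex_args("female"))         # ['--sex', 'female']
print(sex_args("indeterminate"))  # []
```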

Builder

Converter is a frozen dataclass; every with_* returns a new instance, so branching is safe.

from convert_genome import Converter, Sex, OutputFormat

plan = (
    Converter(input="raw.txt", output_dir="out/", format=OutputFormat.PLINK)
        .with_assembly("GRCh38")
        .with_reference("/cache/hg38.fa", "/cache/hg38.fa.fai")
        .with_panel("/data/1kg_panel.vcf.gz")
        .with_standardize()
        .with_sex(Sex.MALE)
)

print(plan.argv())   # exact argv that would be passed to the CLI
result = plan.run()
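Why branching is safe can be seen in a toy version of the same pattern. `Plan` below is a hypothetical stand-in that models only three fields; it uses `dataclasses.replace`, which may or may not be how Converter is implemented.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Plan:
    """Toy stand-in for Converter; only three fields are modelled."""
    input: str
    assembly: Optional[str] = None
    sex: Optional[str] = None

    def with_assembly(self, assembly: str) -> "Plan":
        # replace() copies the instance with one field changed
        return replace(self, assembly=assembly)

    def with_sex(self, sex: str) -> "Plan":
        return replace(self, sex=sex)


base = Plan(input="raw.txt").with_assembly("GRCh38")
male = base.with_sex("male")
female = base.with_sex("female")

# The shared base is untouched by either branch.
print(base.sex, male.sex, female.sex)  # None male female
```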

Enums

InputFormat.AUTO / .DTC / .VCF / .BCF
OutputFormat.VCF / .BCF / .PLINK
Sex.MALE / .FEMALE
Assembly.GRCH37 / .GRCH38     # plus a `.parse()` classmethod that
                              # accepts 'hg19' / 'hg38' / 'build38' / ...
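An alias-accepting `parse()` could look like the sketch below. This is a toy re-creation, not the package's enum: the member values, the exact alias table (in particular `build37`) and the `ValueError` on unknown input are assumptions. The pairings hg19 = GRCh37 and hg38 = GRCh38 are the standard ones.

```python
from enum import Enum


class Assembly(Enum):
    """Toy re-creation of the enum; the real alias list may differ."""
    GRCH37 = "GRCh37"
    GRCH38 = "GRCh38"

    @classmethod
    def parse(cls, text: str) -> "Assembly":
        key = text.strip().lower()
        aliases = {
            "grch37": cls.GRCH37, "hg19": cls.GRCH37, "build37": cls.GRCH37,
            "grch38": cls.GRCH38, "hg38": cls.GRCH38, "build38": cls.GRCH38,
        }
        try:
            return aliases[key]
        except KeyError:
            raise ValueError(f"unrecognised assembly: {text!r}") from None


print(Assembly.parse("hg19"))     # Assembly.GRCH37
print(Assembly.parse("build38"))  # Assembly.GRCH38
```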

Output

The Rust tool writes <stem>_report.json alongside the main output. The wrapper loads it into ConversionResult, with sub-dataclasses for each section:

result.input         # InputInfo (path, format, origin)
result.output        # OutputInfo (path, format)
result.reference     # ReferenceInfo (path, origin, assembly)
result.panel         # PanelInfo | None
result.sample        # SampleInfo (id, sex, sex_inferred)
result.build_detection  # BuildDetection | None (detected_build, match rates)
result.statistics    # Statistics (total / emitted / variant / ... records)
result.report_path   # path to the JSON sidecar
result.output_paths  # tuple[Path] — files that actually exist on disk

For PLINK output, output_paths includes the .bed/.bim/.fam trio. For output_dir with a panel, it includes panel.vcf. Non-existent paths are filtered out automatically.
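The filtering can be illustrated with a small sketch. `existing_plink_outputs` is a hypothetical helper, and deriving the trio with `Path.with_suffix` is an assumption that only holds when the stem itself has no extension.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def existing_plink_outputs(stem: Path) -> Tuple[Path, ...]:
    """Return whichever of <stem>.bed/.bim/.fam are present on disk."""
    candidates = [stem.with_suffix(ext) for ext in (".bed", ".bim", ".fam")]
    return tuple(p for p in candidates if p.exists())


with tempfile.TemporaryDirectory() as tmp:
    stem = Path(tmp) / "sample"
    for ext in (".bed", ".bim"):  # deliberately skip .fam
        stem.with_suffix(ext).touch()
    found = existing_plink_outputs(stem)
    print([p.name for p in found])  # ['sample.bed', 'sample.bim']
```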

Errors

  • ConvertGenomeBinaryNotFound — CLI not installed / not on PATH.
  • InvalidConfig — argument combination rejected before launching (e.g. missing input file, conflicting output/output_dir).
  • ConvertGenomeFailed — CLI exited non-zero. The exception carries stdout, stderr, returncode.
  • ReportNotFound — CLI ran clean but didn't write a JSON sidecar.

All subclass ConvertGenomeError.
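Because every error shares one base class, a single handler can cover them all. The sketch below defines local stand-in classes so that it runs without the package installed; the constructor signature of `ConvertGenomeFailed` and the `describe` helper are assumptions, and with the package installed you would catch the real classes instead.

```python
class ConvertGenomeError(Exception):
    """Local stand-in for the package's base exception."""


class ConvertGenomeBinaryNotFound(ConvertGenomeError): ...
class InvalidConfig(ConvertGenomeError): ...
class ReportNotFound(ConvertGenomeError): ...


class ConvertGenomeFailed(ConvertGenomeError):
    def __init__(self, returncode: int, stdout: str = "", stderr: str = ""):
        super().__init__(f"CLI exited with status {returncode}")
        self.returncode, self.stdout, self.stderr = returncode, stdout, stderr


def describe(exc: ConvertGenomeError) -> str:
    """Turn any wrapper error into a one-line message for logs."""
    if isinstance(exc, ConvertGenomeFailed):
        # Only this subclass carries process output.
        last = exc.stderr.strip().splitlines()[-1:] or ["<no stderr>"]
        return f"exit {exc.returncode}: {last[0]}"
    return f"{type(exc).__name__}: {exc}"


try:
    raise ConvertGenomeFailed(2, stderr="reading input\nerror: bad header\n")
except ConvertGenomeError as exc:  # one handler covers every subclass
    message = describe(exc)
print(message)  # exit 2: error: bad header
```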

Download files

Source Distribution

convert_genome-0.3.1.tar.gz (249.6 kB view details)

Uploaded Source

Built Distributions

convert_genome-0.3.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.7 MB view details)

Uploaded CPython 3.9+manylinux: glibc 2.17+ x86-64

convert_genome-0.3.1-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (5.0 MB view details)

Uploaded CPython 3.9+manylinux: glibc 2.17+ ARM64

convert_genome-0.3.1-cp39-abi3-macosx_11_0_arm64.whl (4.5 MB view details)

Uploaded CPython 3.9+macOS 11.0+ ARM64

convert_genome-0.3.1-cp39-abi3-macosx_10_12_x86_64.whl (4.3 MB view details)

Uploaded CPython 3.9+macOS 10.12+ x86-64
