Python bindings for SauersML/convert_genome (DTC → VCF/BCF/PLINK conversion).

convert_genome (Python)

Python wrapper for the SauersML/convert_genome CLI. It converts direct-to-consumer dumps (23andMe, AncestryDNA, MyHeritage, deCODEme) and standard VCF/BCF into compliant VCF, BCF, or PLINK 1.9 binary files. Build detection, sex inference, liftover, and panel harmonisation are all controllable from keyword arguments.

from convert_genome import convert, OutputFormat

result = convert(
    input="23andme.txt",
    output="out.vcf",
    format=OutputFormat.VCF,
    assembly="hg38",
    standardize=True,
)

result.statistics.emitted_records       # int
result.sample.sex_inferred              # bool
result.build_detection.detected_build   # 'GRCh37' / 'GRCh38' / ...
result.report_path                      # path to <stem>_report.json
result.output_paths                     # files that actually exist on disk
result.yield_rate                       # emitted / total

The wrapper runs the Rust binary, parses the sidecar <stem>_report.json into typed frozen dataclasses, and returns a single ConversionResult.
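
To make the report-parsing step concrete, here is a minimal sketch of loading a JSON sidecar into frozen dataclasses. It is not the wrapper's actual code: the class names, the JSON keys (statistics, total_records, emitted_records), and the zero-total guard in yield_rate are all assumptions chosen for illustration.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class StatisticsSketch:
    # Hypothetical subset of the statistics section.
    total_records: int
    emitted_records: int


@dataclass(frozen=True)
class ResultSketch:
    statistics: StatisticsSketch
    report_path: Path

    @property
    def yield_rate(self) -> float:
        # emitted / total; returning 0.0 for an empty input is an assumption.
        total = self.statistics.total_records
        return self.statistics.emitted_records / total if total else 0.0


def load_report(report_path: Path) -> ResultSketch:
    """Parse a JSON sidecar into frozen dataclasses (key names are assumed)."""
    raw = json.loads(report_path.read_text(encoding="utf-8"))
    stats = raw["statistics"]
    return ResultSketch(
        statistics=StatisticsSketch(
            total_records=int(stats["total_records"]),
            emitted_records=int(stats["emitted_records"]),
        ),
        report_path=report_path,
    )


# Demo against a hand-written report; a real report has more sections.
with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / "out_report.json"
    payload = {"statistics": {"total_records": 200, "emitted_records": 150}}
    report.write_text(json.dumps(payload), encoding="utf-8")
    demo = load_report(report)
```

Because the dataclasses are frozen, assigning to demo.report_path or demo.statistics after loading raises an error, which keeps a parsed result safe to pass around.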

Install

pip install convert_genome
# the Rust binary:
cargo install convert_genome

The wrapper locates the binary through the binary= argument or PATH. There is no environment-variable indirection: if the binary isn't on PATH, pass binary= explicitly. A missing binary raises ConvertGenomeBinaryNotFound with the suggested install command.
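
The lookup order above can be sketched with the standard library. This is an illustration under assumptions, not the wrapper's implementation: locate_binary and BinaryNotFoundSketch are hypothetical names (the latter stands in for ConvertGenomeBinaryNotFound), and the executable name convert_genome is assumed from the cargo install command.

```python
import shutil
import sys
from pathlib import Path
from typing import Optional


class BinaryNotFoundSketch(FileNotFoundError):
    """Stand-in for ConvertGenomeBinaryNotFound (hypothetical)."""


def locate_binary(binary: Optional[str] = None, name: str = "convert_genome") -> Path:
    # 1. An explicit binary= wins and must point at an existing file.
    if binary is not None:
        path = Path(binary)
        if path.is_file():
            return path
        raise BinaryNotFoundSketch(
            f"{binary} does not exist; try: cargo install convert_genome"
        )
    # 2. Otherwise search PATH. No environment variable is consulted.
    found = shutil.which(name)
    if found is None:
        raise BinaryNotFoundSketch(
            f"{name} is not on PATH; pass binary= or run: cargo install convert_genome"
        )
    return Path(found)


# Any existing file works for the demo; the Python interpreter is a handy one.
explicit = locate_binary(binary=sys.executable)
```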

Shortcuts: skip every auto-discovery step

By default the CLI downloads or auto-detects anything it isn't given. To skip those steps, pass the values in directly:

convert(
    input="raw.txt",
    output="out.vcf",
    reference="/cache/hg38.fa",         # skip FASTA download
    reference_fai="/cache/hg38.fa.fai", # skip .fai indexing
    input_build="hg19",                  # skip build detection
    assembly="GRCh38",                   # target build (still does liftover)
    panel="/cache/1kg_panel.vcf",        # supply harmonisation panel
    sex="female",                        # skip sex inference
    standardize=True,
)

sex is lenient: passing "unknown" or "indeterminate" (e.g. when chaining out of infer_sex) silently omits the --sex flag and lets the CLI run its own inference.
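
A minimal sketch of that lenient handling, assuming the flag takes a lowercase value and that matching is case-insensitive (neither is confirmed by the text above). sex_args is a hypothetical helper, not part of the package.

```python
from typing import List, Optional

# Values that mean "no usable answer": omit the flag and let the CLI infer.
_SKIP = {"unknown", "indeterminate"}
_KNOWN = {"male", "female"}


def sex_args(sex: Optional[str]) -> List[str]:
    """Return the argv fragment for --sex, or [] when the flag is omitted."""
    if sex is None:
        return []
    normalized = sex.strip().lower()  # case-insensitivity is an assumption
    if normalized in _SKIP:
        return []
    if normalized not in _KNOWN:
        raise ValueError(f"unrecognised sex value: {sex!r}")
    return ["--sex", normalized]
```

With this shape, the output of a sex-inference step can be fed straight back in without the caller filtering out inconclusive results first.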

Builder

Converter is a frozen dataclass; every with_* returns a new instance, so branching is safe.

from convert_genome import Converter, Sex, OutputFormat

plan = (
    Converter(input="raw.txt", output_dir="out/", format=OutputFormat.PLINK)
        .with_assembly("GRCh38")
        .with_reference("/cache/hg38.fa", "/cache/hg38.fa.fai")
        .with_panel("/data/1kg_panel.vcf.gz")
        .with_standardize()
        .with_sex(Sex.MALE)
)

print(plan.argv())   # exact argv that would be passed to the CLI
result = plan.run()
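
Why branching is safe can be shown with a stripped-down frozen builder. PlanSketch below is a hypothetical stand-in for Converter with only two options; it assumes the with_* methods are built on dataclasses.replace, which is one common way to get this behaviour.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class PlanSketch:
    input: str
    assembly: Optional[str] = None
    standardize: bool = False

    def with_assembly(self, assembly: str) -> "PlanSketch":
        # replace() copies every field and overrides only the ones named.
        return replace(self, assembly=assembly)

    def with_standardize(self, enabled: bool = True) -> "PlanSketch":
        return replace(self, standardize=enabled)


base = PlanSketch(input="raw.txt")
hg19 = base.with_assembly("GRCh37")                     # branch 1
hg38 = base.with_assembly("GRCh38").with_standardize()  # branch 2
```

Neither branch modifies base or the other branch, so one partially configured plan can be reused across several target builds.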

Enums

InputFormat.AUTO / .DTC / .VCF / .BCF
OutputFormat.VCF / .BCF / .PLINK
Sex.MALE / .FEMALE
Assembly.GRCH37 / .GRCH38     # plus a `.parse()` classmethod that
                              # accepts 'hg19' / 'hg38' / 'build38' / ...
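
An alias-tolerant parse classmethod could look like the sketch below. AssemblySketch is a hypothetical stand-in for Assembly; only 'hg19', 'hg38', and 'build38' come from the list above, and the remaining aliases, the member values, and the case-insensitive matching are assumptions.

```python
from enum import Enum


class AssemblySketch(Enum):
    GRCH37 = "GRCh37"
    GRCH38 = "GRCh38"

    @classmethod
    def parse(cls, text: str) -> "AssemblySketch":
        key = text.strip().lower()
        try:
            return _ALIASES[key]
        except KeyError:
            raise ValueError(f"unrecognised assembly: {text!r}") from None


# Alias table; defined after the class so it can refer to the members.
_ALIASES = {
    "grch37": AssemblySketch.GRCH37,
    "hg19": AssemblySketch.GRCH37,
    "build37": AssemblySketch.GRCH37,
    "grch38": AssemblySketch.GRCH38,
    "hg38": AssemblySketch.GRCH38,
    "build38": AssemblySketch.GRCH38,
}
```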

Output

The Rust tool writes <stem>_report.json alongside the main output. The wrapper loads it into ConversionResult, with sub-dataclasses for each section:

result.input         # InputInfo (path, format, origin)
result.output        # OutputInfo (path, format)
result.reference     # ReferenceInfo (path, origin, assembly)
result.panel         # PanelInfo | None
result.sample        # SampleInfo (id, sex, sex_inferred)
result.build_detection  # BuildDetection | None (detected_build, match rates)
result.statistics    # Statistics (total / emitted / variant / ... records)
result.report_path   # path to the JSON sidecar
result.output_paths  # tuple[Path] — files that actually exist on disk

For PLINK output, output_paths includes the .bed/.bim/.fam trio. For output_dir with a panel, it includes panel.vcf. Non-existent paths are filtered out automatically.
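
The filtering rule can be sketched as follows. plink_trio and existing_paths are hypothetical helpers, not the wrapper's API; the sketch assumes the PLINK files share one prefix and differ only by extension.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Tuple


def plink_trio(prefix: Path) -> Tuple[Path, ...]:
    # Append rather than replace the suffix, so "sample.v1" stays intact.
    return tuple(Path(f"{prefix}{ext}") for ext in (".bed", ".bim", ".fam"))


def existing_paths(candidates: Iterable[Path]) -> Tuple[Path, ...]:
    # Keep only the files that are actually on disk, in their original order.
    return tuple(p for p in candidates if p.exists())


with tempfile.TemporaryDirectory() as tmp:
    prefix = Path(tmp) / "sample"
    for ext in (".bed", ".bim"):  # deliberately leave .fam missing
        Path(f"{prefix}{ext}").touch()
    found_names = [p.name for p in existing_paths(plink_trio(prefix))]
```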

Errors

  • ConvertGenomeBinaryNotFound — CLI not installed / not on PATH.
  • InvalidConfig — argument combination rejected before launching (e.g. missing input file, conflicting output/output_dir).
  • ConvertGenomeFailed — CLI exited non-zero. The exception carries stdout, stderr, returncode.
  • ReportNotFound — CLI ran clean but didn't write a JSON sidecar.

All subclass ConvertGenomeError.
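
A shared base class lets callers handle every failure in one except clause and still inspect the details when the CLI itself failed. The sketch below uses stand-in classes (the Sketch suffix marks them as hypothetical) so it runs without the package installed; the constructor signature is an assumption.

```python
class ConvertGenomeErrorSketch(Exception):
    """Stand-in for the ConvertGenomeError base class."""


class FailedSketch(ConvertGenomeErrorSketch):
    """Stand-in for ConvertGenomeFailed: carries the process output."""

    def __init__(self, returncode: int, stdout: str, stderr: str) -> None:
        super().__init__(f"CLI exited with status {returncode}")
        self.returncode = returncode
        self.stdout = stdout
        self.stderr = stderr


def describe(exc: ConvertGenomeErrorSketch) -> str:
    # Only the "failed" case has process output worth surfacing.
    if isinstance(exc, FailedSketch):
        return f"exit {exc.returncode}: {exc.stderr.strip()}"
    return str(exc)


try:
    raise FailedSketch(2, "", "bad input\n")
except ConvertGenomeErrorSketch as exc:  # one clause covers every subclass
    message = describe(exc)
```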
