Tools for programmatically annotating VCFs with the Seshat TP53 database.

tp53

Installation

The package can be installed with pip:

pip install tp53

Upload a VCF to the Seshat TP53 Annotation Server

Upload a VCF to the Seshat TP53 annotation server using a headless browser.

❯ python -m tp53.seshat.upload_vcf \
    --input "input.vcf" \
    --email "example@gmail.com"
INFO:tp53.seshat.upload_vcf:Uploading 0 %...
INFO:tp53.seshat.upload_vcf:Uploading 53%...
INFO:tp53.seshat.upload_vcf:Uploading 53%...
INFO:tp53.seshat.upload_vcf:Uploading 60%...
INFO:tp53.seshat.upload_vcf:Uploading 60%...
INFO:tp53.seshat.upload_vcf:Uploading 66%...
INFO:tp53.seshat.upload_vcf:Uploading 66%...
INFO:tp53.seshat.upload_vcf:Uploading 80%...
INFO:tp53.seshat.upload_vcf:Uploading 80%...
INFO:tp53.seshat.upload_vcf:Upload complete!

This tool programmatically configures and uploads batches of variants in VCF format to the Seshat annotation server. It works by launching a headless Chrome browser instance and interacting with the Seshat website directly through simulated key presses and mouse clicks. Unfortunately, Seshat does not provide a native programmatic API, and one could not be reverse engineered. Seshat also uses custom JavaScript in its form processing, so a lightweight approach of simply interacting with the HTML form elements was not possible either.

VCF Input Requirements

Seshat does not report why a VCF fails to annotate, but it has been observed that Seshat can fail to parse some of VarDictJava's structural variants (SVs) as valid variant records. One workaround that has worked in the past is to remove SVs before uploading. The following command excludes all variants with a non-empty SVTYPE INFO key:

❯ bcftools view in.vcf --exclude 'SVTYPE!="."' > out.noSV.vcf
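
Where bcftools is not available, the same idea can be approximated in plain Python. The sketch below is only an illustration and is not part of the tp53 package: the function names `has_svtype` and `drop_structural_variants` are hypothetical, it assumes an uncompressed, tab-delimited VCF, and it performs none of the validation that bcftools does.

```python
from typing import Iterable, Iterator


def has_svtype(info: str) -> bool:
    """Return True if the INFO column carries a non-missing SVTYPE key."""
    for field in info.split(";"):
        key, _, value = field.partition("=")
        if key == "SVTYPE" and value not in ("", "."):
            return True
    return False


def drop_structural_variants(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines and every record without a non-missing SVTYPE."""
    for line in lines:
        if line.startswith("#"):
            # Header and column-name lines pass through untouched.
            yield line
            continue
        columns = line.rstrip("\n").split("\t")
        # INFO is the eighth column of a VCF record.
        if len(columns) >= 8 and has_svtype(columns[7]):
            continue
        yield line
```

A caller could open `in.vcf` for reading, pass the file object to `drop_structural_variants`, and write the yielded lines to `out.noSV.vcf`.
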

Automation

There are no terms and conditions posted on the Seshat annotation server's website, and there is no server-side robots.txt rule set. In lieu of usage terms, we strongly encourage all users of this script to respect the Seshat resource by adhering to the following best practices:

  • Minimize Load: Limit the rate of requests to the server
  • Minimize Connections: Limit the number of concurrent requests
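
One way to follow both practices is to upload callsets strictly one at a time with a pause between them. The sketch below wraps the command line shown earlier; the helper names `build_upload_command` and `upload_sequentially` are hypothetical, and the 60 second default delay is an arbitrary assumption rather than a limit published by Seshat.

```python
import subprocess
import sys
import time
from pathlib import Path
from typing import Callable, Sequence


def build_upload_command(vcf: Path, email: str) -> list[str]:
    """Build the documented command line for a single upload."""
    return [
        sys.executable, "-m", "tp53.seshat.upload_vcf",
        "--input", str(vcf),
        "--email", email,
    ]


def upload_sequentially(
    vcfs: Sequence[Path],
    email: str,
    delay_seconds: float = 60.0,  # assumed value, tune conservatively
    run: Callable[[list[str]], object] = lambda cmd: subprocess.run(cmd, check=True),
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Upload VCFs one at a time, pausing between consecutive uploads."""
    for index, vcf in enumerate(vcfs):
        if index > 0:
            # Rate limit: wait before every upload except the first.
            sleep(delay_seconds)
        # One blocking call at a time means one connection at a time.
        run(build_upload_command(vcf, email))
```

The `run` and `sleep` parameters exist only so the loop can be exercised without contacting the server; the defaults invoke the real command and really wait.
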

If you need to batch process dozens or hundreds of VCF callsets, you may consider improving the underlying Python script to randomize the user agent and IP address of your headless browser session to avoid being labelled as a bot.

Environment Setup

This script relies on Google Chrome:

❯ brew install --cask google-chrome

Some versions of macOS may require you to authenticate the Chrome driver.
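
Before starting an upload, it can help to confirm that a Chrome binary is actually present. The following is a minimal sketch and not how tp53 itself locates the browser: `find_chrome` is a hypothetical helper, and the macOS application path and Linux executable names are assumptions about common default installs.

```python
import os
import shutil
from typing import Callable, Optional

# Assumed default install location of the Homebrew cask / standard macOS app.
MACOS_CHROME = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
# Assumed executable names on Linux distributions.
LINUX_CHROME_NAMES = ("google-chrome", "google-chrome-stable")


def find_chrome(
    which: Callable[[str], Optional[str]] = shutil.which,
    is_file: Callable[[str], bool] = os.path.isfile,
) -> Optional[str]:
    """Return a path to a Chrome binary, or None if none was found."""
    if is_file(MACOS_CHROME):
        return MACOS_CHROME
    for name in LINUX_CHROME_NAMES:
        path = which(name)
        if path is not None:
            return path
    return None
```

A return value of `None` suggests that Chrome still needs to be installed, or that it lives somewhere this sketch does not look.
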

Development and Testing

See the contributing guide for more information.

Download files

Source Distribution

tp53-0.3.0.tar.gz (13.2 kB view details)

Uploaded Source

Built Distribution

tp53-0.3.0-py3-none-any.whl (13.7 kB view details)

Uploaded Python 3

File details

Details for the file tp53-0.3.0.tar.gz.

File metadata

  • Download URL: tp53-0.3.0.tar.gz
  • Size: 13.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for tp53-0.3.0.tar.gz
Algorithm Hash digest
SHA256 0a651e78bafc738caa82b738ba68be10cfbf070f6c11d25e69084303bcdb3774
MD5 a149bc4bd4663f40bc0c885fbdbf9244
BLAKE2b-256 dc94e4f0558b539b68236282c1161abbc4fbbb61da2f7f1130f20fe8f04edbdf

Provenance

The following attestation bundles were made for tp53-0.3.0.tar.gz:

Publisher: publish_tp53.yml on clintval/tp53

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tp53-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: tp53-0.3.0-py3-none-any.whl
  • Size: 13.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for tp53-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4d4ec23f4b389b9a5e0c63d3aab13b0167cb8f32d5cc3bae40580844ffe333a7
MD5 6dbf17ecd7015791301afc3544cdd5cd
BLAKE2b-256 c14883eb590ff200d04e71c0f064505212c47c63fd85e7346e9dba9a048414be

Provenance

The following attestation bundles were made for tp53-0.3.0-py3-none-any.whl:

Publisher: publish_tp53.yml on clintval/tp53
