
Tools for extracting variants from Leiden Open Variation Database Installations.

Project Description

Tools for extracting, remapping, and validating variants from Leiden Open Variation Database Installations.

* Free software: BSD license
* Documentation: (Not Complete)

# Modules

This project contains a number of modules within leiden_sc/:

## lovd

### macarthur_core/lovd/

These classes allow a user to extract tables of data (mutations listed for a specific gene in the database) and other useful information from any Leiden Open Variation Database installation. Unfortunately, it has been necessary to do this by downloading the HTML for relevant pages on the database and parsing out the necessary data, because the installations do not offer an easier way to access the data. Therefore, I have added an external dependency (beautifulsoup4) for HTML parsing.
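To illustrate the general idea of pulling a table out of downloaded HTML, here is a minimal sketch. It deliberately uses the standard library's `html.parser` instead of beautifulsoup4 so that it runs without extra dependencies, and it is not the parsing code used in this project. The class name `TableParser` and the sample HTML are made up for the example.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect header cells (<th>) and data rows (<td>) from table markup.

    Hypothetical helper for illustration only; not part of this project.
    """

    def __init__(self):
        super().__init__()
        self.headers = []         # text of every <th> cell, in order
        self.rows = []            # one list of <td> texts per <tr>
        self._cell_tag = None     # "th" or "td" while inside a cell
        self._cell_text = []      # text fragments of the current cell
        self._current_row = None  # <td> texts of the row being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._current_row = []
        elif tag in ("th", "td"):
            self._cell_tag = tag
            self._cell_text = []

    def handle_data(self, data):
        if self._cell_tag is not None:
            self._cell_text.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell_tag == tag:
            text = "".join(self._cell_text).strip()
            if tag == "th":
                self.headers.append(text)
            elif self._current_row is not None:
                self._current_row.append(text)
            self._cell_tag = None
        elif tag == "tr":
            # Rows made only of <th> cells are empty here and are skipped.
            if self._current_row:
                self.rows.append(self._current_row)
            self._current_row = None


# Made-up sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<table>
  <tr><th>DNA change</th><th>Protein change</th></tr>
  <tr><td>c.1A&gt;G</td><td>p.Met1Val</td></tr>
  <tr><td>c.5C&gt;T</td><td>p.Ala2Val</td></tr>
</table>
"""

parser = TableParser()
parser.feed(SAMPLE_HTML)
print(parser.headers)
print(parser.rows)
```

Real LOVD pages are considerably messier than this sample, which is one reason a dedicated HTML parsing library is a reasonable choice.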

Basic usage of these classes looks like this:

    leiden_url = ''
    gene_id = 'ACTA1'

    leiden_database = make_leiden_database(leiden_url)
    column_labels = leiden_database.get_table_headers()
    table_entries = leiden_database.get_table_data()

Note that ```make_leiden_database``` acts as a factory function that generates the right subclass of ```LeidenDatabase``` for the detected version number.
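The factory pattern mentioned here can be sketched roughly as follows. This is an illustration under assumptions, not the project's actual code: the subclass names `LeidenDatabaseV2` and `LeidenDatabaseV3`, the function `make_database_for_version`, and the example URL are hypothetical, and the version number is passed in explicitly instead of being detected from the installation.

```python
class LeidenDatabase:
    """Base class for behaviour shared by all LOVD versions (sketch only)."""

    def __init__(self, leiden_url):
        self.leiden_url = leiden_url


class LeidenDatabaseV2(LeidenDatabase):
    """Hypothetical subclass for LOVD 2.x page layouts."""


class LeidenDatabaseV3(LeidenDatabase):
    """Hypothetical subclass for LOVD 3.x page layouts."""


# Map each supported major version to the class that handles it.
_SUBCLASS_BY_VERSION = {2: LeidenDatabaseV2, 3: LeidenDatabaseV3}


def make_database_for_version(leiden_url, major_version):
    """Return an instance of the subclass registered for major_version."""
    try:
        subclass = _SUBCLASS_BY_VERSION[major_version]
    except KeyError:
        raise ValueError("Unsupported LOVD major version: %r" % (major_version,))
    return subclass(leiden_url)


database = make_database_for_version("http://lovd.example.org/", 3)
print(type(database).__name__)  # LeidenDatabaseV3
```

The benefit of this design is that calling code only ever deals with the shared ```LeidenDatabase``` interface and does not need to know which version it is talking to.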

### macarthur_core/lovd/

These are general utility functions, some of which are used in

## remapping

### macarthur_core/remapping/

Genetic mutations are often listed in one of two formats, HGVS and VCF. HGVS is compact and has its own (relatively complex) syntax for describing mutations. However, for large-scale analysis projects, HGVS is extremely difficult to use effectively for a number of reasons. We are interested in converting data from the Leiden Open Variation Databases from one format to the other. This is a non-trivial conversion.
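To give a feel for the difference between the formats, the sketch below parses only the simplest kind of HGVS description, a single-base substitution in coding-sequence coordinates. It is not how this project or the hgvs module does it. The names `HgvsSubstitution` and `parse_hgvs_substitution` and the transcript ID `NM_000000.0` are made up for the example.

```python
import re
from typing import NamedTuple


class HgvsSubstitution(NamedTuple):
    transcript: str  # transcript accession the position refers to
    position: int    # position relative to the coding sequence (c.)
    ref: str         # reference base
    alt: str         # alternate base


# Matches only simple substitutions such as "NM_000000.0:c.123A>G".
_SUBSTITUTION = re.compile(
    r"^(?P<transcript>[^:]+):c\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_hgvs_substitution(text):
    """Split a simple HGVS coding substitution into its parts."""
    match = _SUBSTITUTION.match(text)
    if match is None:
        raise ValueError("Not a simple coding substitution: %r" % (text,))
    return HgvsSubstitution(
        match.group("transcript"),
        int(match.group("pos")),
        match.group("ref"),
        match.group("alt"),
    )


print(parse_hgvs_substitution("NM_000000.0:c.123A>G"))
```

Even after parsing, the position is relative to a transcript, whereas a VCF record needs a chromosome and a genomic coordinate. Bridging that gap requires transcript definitions and a reference genome, and that is before handling insertions, deletions, or intronic offsets.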

The class ```VariantRemapper``` in ```macarthur_core/remapping/``` wraps a third-party module (hgvs) to make it easier to use within this project. The third-party module's documentation, including a description of HGVS vs. VCF notation, is available here:

Unfortunately, the third-party tool depends on two relatively large files that I cannot easily host on GitHub. These are normally housed in a folder called resources within the module. One is a human genome reference sequence (```macarthur_core/remapping/resources/hg19.fa```) and the other is a file containing definitions of transcript sequences that are needed to facilitate conversion between HGVS and VCF (```macarthur_core/remapping/resources/genes.refSeq```). These two files are hosted at: Note that the files will need to be decompressed using gunzip and placed in ```macarthur_core/remapping/resources/```. The first time these functions are used, two additional files will be generated, which takes some time. Subsequent runs will not require this process to be repeated.
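For anyone who prefers to do the decompression step from Python instead of the gunzip command, a minimal sketch using only the standard library might look like this. The function name `gunzip_into` is hypothetical, and the archive names in the usage note are assumptions based on the file names mentioned above.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_into(archive_path, target_dir):
    """Decompress a .gz archive into target_dir and return the new path.

    The output keeps the archive's name minus the trailing ".gz".
    """
    archive_path = Path(archive_path)
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    # Path.stem drops only the last suffix: "hg19.fa.gz" -> "hg19.fa"
    target_path = target_dir / archive_path.stem

    # Stream the data so that large files are never fully held in memory.
    with gzip.open(archive_path, "rb") as source, open(target_path, "wb") as target:
        shutil.copyfileobj(source, target)
    return target_path
```

Assuming the downloads are named ```hg19.fa.gz``` and ```genes.refSeq.gz```, calling `gunzip_into("hg19.fa.gz", "macarthur_core/remapping/resources")` would write ```hg19.fa``` into the resources folder.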

## io

### macarthur_core/io/
This module has functions for reading and writing delimited files to and from 2D lists, where the first dimension is rows and
the second is columns. It also contains a function for formatting output text for a file format called VCF from a 2D
list of data.
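The 2D-list convention described above can be sketched with the standard `csv` module. The function names `read_delimited`, `write_delimited`, and `format_vcf_text` are hypothetical and are not the functions defined in this module. The VCF sketch writes only the file-format line and the eight fixed columns.

```python
import csv


def read_delimited(path, delimiter="\t"):
    """Read a delimited file into a 2D list: rows first, then columns."""
    with open(path, newline="") as handle:
        return [row for row in csv.reader(handle, delimiter=delimiter)]


def write_delimited(path, rows, delimiter="\t"):
    """Write a 2D list (rows of columns) to a delimited file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
        writer.writerows(rows)


# The eight fixed columns that every VCF data line starts with.
VCF_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def format_vcf_text(rows):
    """Format a 2D list of variant fields as minimal VCF text."""
    lines = ["##fileformat=VCFv4.1", "\t".join(VCF_COLUMNS)]
    lines.extend("\t".join(str(value) for value in row) for row in rows)
    return "\n".join(lines) + "\n"
```

Note that values read back from a file are always strings, so numeric columns such as positions need to be converted by the caller.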

### macarthur_core/io/
This module has functions for reading HTML data from URLs.
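As a rough idea of what reading HTML from a URL involves, here is a minimal sketch using the standard library. The function name `fetch_html` and the default timeout are assumptions for the example, not this module's actual interface.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=30):
    """Download a page and return its contents as text."""
    with urlopen(url, timeout=timeout) as response:
        # Use the charset declared by the server, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

The returned string can then be handed to an HTML parser to extract the tables of interest.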

# Scripts

I have included several scripts that I use to extract, remap, and validate data from LOVD databases.

This is a script that I use in my overall project. It makes use of macarthur_core/lovd and macarthur_core/web_io to extract all data from a given LOVD URL.

The script uses argparse to provide a command-line interface, and its help text should explain how it is used. It should save a tab-delimited file for each gene's table data, where each output file is named according to the gene name.

From the command line, it is generally used in the following way:

    python --all --leiden_url --output_directory results

Users can also print a list of all available genes using:

    python --genes_available --leiden_url
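The flags shown above could be declared with argparse along the following lines. This is a sketch under assumptions and not the script's actual parser: the help strings, the default output directory, the example URL, and the choice to make ```--all``` and ```--genes_available``` mutually exclusive are guesses made for illustration.

```python
import argparse


def build_parser():
    """Build a parser for the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(
        description="Extract variant tables from an LOVD installation."
    )
    parser.add_argument(
        "--leiden_url", required=True,
        help="Base URL of the LOVD installation to read from.",
    )
    # Assumption: exactly one of the two modes is chosen per run.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument(
        "--all", action="store_true",
        help="Extract table data for every gene.",
    )
    mode.add_argument(
        "--genes_available", action="store_true",
        help="Print the list of available genes and exit.",
    )
    parser.add_argument(
        "--output_directory", default=".",
        help="Directory in which per-gene output files are saved.",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["--all", "--leiden_url", "http://lovd.example.org/",
     "--output_directory", "results"]
)
print(args.all, args.leiden_url, args.output_directory)
```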


0.1.0 (2014-3-12)

* First release on PyPI.
Download Files

lovd-0.1.0.tar.gz (11.9 kB), source distribution, uploaded Apr 28, 2014
