===============================
leiden_sc
===============================

.. image:: https://badge.fury.io/py/lovd.png
   :target: http://badge.fury.io/py/lovd

.. image:: https://travis-ci.org/andrewhill157/leiden_sc.png?branch=master
   :target: https://travis-ci.org/andrewhill157/lovd

.. image:: https://pypip.in/d/lovd/badge.png
   :target: https://crate.io/packages/lovd?version=latest


Tools for extracting, remapping, and validating variants from Leiden Open Variation Database Installations.

* Free software: BSD license
* Documentation: http://lovd.rtfd.org (not complete)

# Modules

This project contains a number of modules within leiden_sc/:

## lovd

### macarthur_core/lovd/leiden_database.py:

These classes let a user extract tables of data (the mutations listed for a specific gene in the database) and other useful information from any Leiden Open Variation Database installation, such as http://www.dmd.nl/nmdb2/. Unfortunately, these installations do not offer an easier way to access the data, so this is done by downloading the HTML of the relevant pages and parsing out the necessary data. HTML parsing therefore relies on an external dependency, beautifulsoup4.
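
To illustrate the kind of parsing involved, here is a minimal sketch that pulls column labels and rows out of an HTML table. It uses the standard library's `html.parser` as a dependency-free stand-in for beautifulsoup4. It is not the project's actual parsing code, and the class name, function name, and sample table contents are all made up for this example.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect header labels and row cells from simple table markup."""

    def __init__(self):
        super().__init__()
        self.headers = []   # text of each <th> cell
        self.rows = []      # one list of <td> texts per <tr>
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            text = "".join(self._cell).strip()
            if tag == "th":
                self.headers.append(text)
            elif self._row is not None:
                self._row.append(text)
            self._cell = None
        elif tag == "tr":
            # Header-only rows leave self._row empty and are skipped.
            if self._row:
                self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return (headers, rows) for the table markup in html."""
    parser = TableParser()
    parser.feed(html)
    return parser.headers, parser.rows


# Made-up sample markup, not copied from any real LOVD page.
SAMPLE_HTML = """
<table>
  <tr><th>DNA change</th><th>Protein change</th></tr>
  <tr><td>c.1A&gt;G</td><td>p.(Met1?)</td></tr>
</table>
"""

headers, rows = parse_table(SAMPLE_HTML)
print(headers)
print(rows)
```

Real LOVD pages are considerably messier than this sample, which is one reason a more forgiving parser such as beautifulsoup4 is a sensible choice.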

Basically, the usage for these classes goes like this:
```
# make_leiden_database is defined in macarthur_core/lovd/leiden_database.py
leiden_url = 'http://www.dmd.nl/nmdb2/'
gene_id = 'ACTA1'

# Build the database object, then select the gene whose table should be read
database = make_leiden_database(leiden_url)
database.set_gene_id(gene_id)
column_labels = database.get_table_headers()
table_entries = database.get_table_data()
...
```
Note that make_leiden_database acts as a factory method that returns the right subclass of LeidenDatabase for the detected version number.
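
The factory idea can be sketched roughly as follows. This is only an illustration of the pattern, not the code in leiden_database.py: the subclass names, the version-to-class mapping, and the `make_database` and `detect_major_version` helpers are hypothetical. The version string is passed in explicitly here so the example runs without network access, whereas the real `make_leiden_database` takes only the URL.

```python
class LeidenDatabase:
    """Hypothetical stand-in for the real base class."""

    def __init__(self, leiden_url):
        self.leiden_url = leiden_url


class LOVD2Database(LeidenDatabase):
    """Hypothetical subclass for version 2 installations."""


class LOVD3Database(LeidenDatabase):
    """Hypothetical subclass for version 3 installations."""


# Assumed mapping from major version number to the class that handles it.
_SUBCLASS_BY_MAJOR_VERSION = {2: LOVD2Database, 3: LOVD3Database}


def detect_major_version(version_string):
    """Return the leading integer of a version string such as '3.0-10'."""
    return int(version_string.split(".", 1)[0])


def make_database(leiden_url, version_string):
    """Pick the subclass for the given version and instantiate it."""
    major = detect_major_version(version_string)
    try:
        database_class = _SUBCLASS_BY_MAJOR_VERSION[major]
    except KeyError:
        raise ValueError("Unsupported LOVD version: " + version_string)
    return database_class(leiden_url)


database = make_database("http://www.dmd.nl/nmdb2/", "2.0-36")
print(type(database).__name__)
```

Keeping the version check in one place means calling code only ever deals with the common LeidenDatabase interface.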

### macarthur_core/lovd/utilities.py:

These are general utility functions, some of which are used in leiden_database.py.

## remapping

### macarthur_core/remapping/remapping.py

Genetic mutations are often listed in one of two formats: HGVS or VCF. HGVS is compact and has its own (relatively complex) syntax for describing mutations. However, for large-scale analysis projects, HGVS is extremely difficult to use effectively. We are interested in converting data from the Leiden Open Variation Databases from one format to the other, which is a non-trivial conversion.
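
To give a feel for the HGVS side, here is a minimal sketch that parses only the simplest case, a single-base coding substitution. The function name, the named tuple, and the transcript ID in the example are invented for illustration. This is not how the hgvs module works internally, and it ignores the many other HGVS forms (deletions, insertions, intronic offsets, and so on).

```python
import re
from collections import namedtuple

# Hypothetical container for the parsed pieces of a simple HGVS name.
CodingSubstitution = namedtuple(
    "CodingSubstitution", "transcript position ref alt"
)

# Matches names of the form '<transcript>:c.<position><ref>><alt>' only.
_SUBSTITUTION = re.compile(
    r"^(?P<transcript>[^:]+):c\.(?P<position>\d+)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_coding_substitution(hgvs_name):
    """Split a simple coding substitution into its components."""
    match = _SUBSTITUTION.match(hgvs_name)
    if match is None:
        raise ValueError("Not a simple coding substitution: " + hgvs_name)
    return CodingSubstitution(
        transcript=match.group("transcript"),
        position=int(match.group("position")),
        ref=match.group("ref"),
        alt=match.group("alt"),
    )


# 'NM_000000.0' is a placeholder transcript ID, not a real accession.
variant = parse_coding_substitution("NM_000000.0:c.100A>G")
print(variant)
```

Even after parsing, the position is relative to the transcript's coding sequence, while a VCF record needs a chromosome and a genomic coordinate. Bridging that gap requires transcript definitions and a reference genome, which is what makes the conversion non-trivial.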

The class ```VariantRemapper``` in ```macarthur_core/remapping/remapping.py``` wraps a third-party module (hgvs) to make it easier to use within this project. The third-party module's documentation, along with a description of HGVS vs. VCF notation, is available here: https://github.com/counsyl/hgvs

Unfortunately, the third-party tool depends on two relatively large files that I cannot easily host on GitHub. These are normally housed in a folder called resources within the module. One is a human genome reference sequence (```macarthur_core/remapping/resources/hg19.fa```) and the other is a file containing definitions of transcript sequences that are needed to facilitate conversion between HGVS and VCF (```macarthur_core/remapping/resources/genes.refSeq```). These two files are hosted at: http://www.broadinstitute.org/~ahill. Note that the files will need to be decompressed using gunzip and placed in ```macarthur_core/remapping/resources/```. The first time these functions are used, two additional files will be generated, which takes some time. Subsequent runs will not require this process to be repeated.

## io

### macarthur_core/io/file_io.py
This module has functions for reading and writing delimited files to and from 2D lists, where the first dimension is rows and the second is columns. It also contains a function for formatting output text, from a 2D list of data, for a file format called VCF (http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41).
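
As a rough sketch of what such helpers might look like, the block below round-trips a 2D list through tab-delimited text with the standard `csv` module and formats a minimal VCF 4.1 body. The function names and the sample rows are hypothetical and do not reflect the actual contents of file_io.py.

```python
import csv
import io

# The eight fixed columns of a VCF data line.
VCF_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def write_table(handle, rows, delimiter="\t"):
    """Write a 2D list (rows of columns) to an open text handle."""
    writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)


def read_table(handle, delimiter="\t"):
    """Read delimited text from an open handle into a 2D list of strings."""
    return [row for row in csv.reader(handle, delimiter=delimiter)]


def format_vcf_text(records):
    """Format a 2D list of eight-column records as minimal VCF 4.1 text."""
    lines = ["##fileformat=VCFv4.1", "\t".join(VCF_COLUMNS)]
    for record in records:
        lines.append("\t".join(str(field) for field in record))
    return "\n".join(lines) + "\n"


# Made-up sample data for illustration only.
table = [["gene", "dna_change"], ["ACTA1", "c.1A>G"]]
buffer = io.StringIO()
write_table(buffer, table)
buffer.seek(0)
print(read_table(buffer))

print(format_vcf_text([["1", 100, ".", "A", "G", ".", ".", "."]]))
```

Note that values read back from a delimited file are always strings, so numeric columns need converting by the caller.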

### macarthur_core/io/web_io.py
This module has functions for reading HTML data from URLs.
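
A minimal sketch of such a function, using only the standard library, might look like this. The name `get_page_html`, the timeout default, and the fallback encoding are assumptions made for the example, not the actual interface of web_io.py.

```python
import urllib.request


def get_page_html(url, timeout=30):
    """Fetch a URL and return the response body decoded as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Use the charset declared by the server, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Example call (requires network access):
# html = get_page_html("http://www.dmd.nl/nmdb2/")
```

Decoding with `errors="replace"` keeps one badly encoded character from aborting a long extraction run, at the cost of possibly hiding encoding problems.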

# Scripts

I have included several scripts that I use to extract, remap, and validate data from LOVD databases.

## extract_data.py
This is a script that I use in my overall project. It uses macarthur_core/lovd and macarthur_core/io/web_io.py to extract all data from a given LOVD URL.

The script uses argparse to provide a command-line interface, and its help text should explain how it is used. It saves a tab-delimited file for each gene's table data, where each output file is named according to the gene name.

From the command line, it is generally used in the following way:
```
python extract_data.py --all --leiden_url http://www.dmd.nl/nmdb2/ --output_directory results
```
Users can also print a list of all available genes using:
```
python extract_data.py --genes_available --leiden_url http://www.dmd.nl/nmdb2/
```
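
For readers unfamiliar with argparse, an interface accepting the flags shown above could be declared roughly like this. The flag names come from the commands above; the help strings, defaults, the `output_path` helper, and the `.txt` extension are assumptions, and the real script's parser may be organised differently.

```python
import argparse
import os


def build_parser():
    """Build a parser accepting the flags used in the commands above."""
    parser = argparse.ArgumentParser(
        description="Extract table data from an LOVD installation."
    )
    parser.add_argument(
        "--leiden_url", required=True,
        help="Base URL of the LOVD installation."
    )
    parser.add_argument(
        "--all", action="store_true",
        help="Extract data for every gene in the database."
    )
    parser.add_argument(
        "--genes_available", action="store_true",
        help="Print the list of available genes."
    )
    parser.add_argument(
        "--output_directory", default=".",
        help="Directory in which to save one file per gene."
    )
    return parser


def output_path(output_directory, gene_id):
    """Name the output file after the gene (the extension is assumed)."""
    return os.path.join(output_directory, gene_id + ".txt")


args = build_parser().parse_args(
    ["--all", "--leiden_url", "http://www.dmd.nl/nmdb2/",
     "--output_directory", "results"]
)
print(args.all, args.leiden_url, output_path(args.output_directory, "ACTA1"))
```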





History
-------

0.1.0 (2014-3-12)
++++++++++++++++++

* First release on PyPI.

Source distribution: lovd-0.1.0.tar.gz (11.9 kB), uploaded Apr 28, 2014.
