A simple Django app to represent genes.
Genes is a Django app to represent genes.
Download and Install
This package is registered as django-genes in PyPI and is pip installable:
pip install django-genes
If any of the following dependency packages are not found on your system, pip will install them too:
django 1.8 or later (Django web framework)
django-organisms (Organisms model, which is required by Genes model)
django-haystack (see Search Indexes and Data Template section.)
django-fixtureless (for unittest, see tests.py)
Quick Start
Add ‘genes’ and ‘organisms’ to your INSTALLED_APPS setting like this:
INSTALLED_APPS = (
    ...
    'organisms',
    'genes',
)
Run the python manage.py migrate command to create the genes and organisms models.
Search Indexes and Data Template
The module search_indexes.py can be used by django-haystack (https://github.com/django-haystack/django-haystack) to search genes. It specifies which Gene fields are included in the search index and how they are weighted. The text field refers to a document that is built for the search engine to index. The data template for this document is located at: genes/templates/search/indexes/gene_text.txt.
For more information, see: http://django-haystack.readthedocs.org/en/latest/tutorial.html#handling-data
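For the search index to be usable, django-haystack itself has to be configured in the project settings. The fragment below is a minimal sketch, not taken from django-genes: it assumes the "simple" backend that ships with django-haystack, which is intended mainly for local experimentation. A real deployment would normally point ENGINE at a dedicated search backend instead.

```python
# settings.py fragment (a sketch under stated assumptions, not part of
# django-genes). 'haystack' must be listed alongside the two apps.
INSTALLED_APPS = (
    # ... your other apps ...
    'haystack',
    'organisms',
    'genes',
)

HAYSTACK_CONNECTIONS = {
    'default': {
        # Assumption: the "simple" backend bundled with django-haystack.
        # It needs no external search server, which makes it convenient
        # for trying things out locally.
        'ENGINE': 'haystack.backends.simple_backend.SimpleEngine',
    },
}
```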
Usage of Management Commands
This app includes five management commands in the management/commands/ sub-directory:
1. genes_add_xrdb
This command adds cross-reference databases for genes. It must be called for every new cross-reference database to populate the gene and cross-reference objects in the database. It requires 2 arguments:
name: the name of the database
URL: the URL for that database, with the string ‘_REPL_’ added at the end of the URL
For example, this command adds Ensembl as a cross-reference database:
python manage.py genes_add_xrdb --name=Ensembl --URL=http://www.ensembl.org/Gene/Summary?g=_REPL_
And this command adds MIM as a cross-reference database:
python manage.py genes_add_xrdb --name=MIM --URL=http://www.ncbi.nlm.nih.gov/omim/_REPL_
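The ‘_REPL_’ string presumably serves as a placeholder that is later swapped for a gene's identifier in that database. The sketch below illustrates the idea in plain Python. The helper name build_xref_url and the identifier passed to it are hypothetical and used only for illustration; the app's own substitution logic may differ.

```python
# Hypothetical helper, not part of django-genes: shows how a URL template
# ending in the _REPL_ placeholder could be turned into a gene-specific link.
PLACEHOLDER = "_REPL_"


def build_xref_url(url_template, xrid):
    """Return url_template with the placeholder replaced by xrid."""
    if PLACEHOLDER not in url_template:
        raise ValueError("URL template must contain " + PLACEHOLDER)
    return url_template.replace(PLACEHOLDER, str(xrid))


# The template is the one from the Ensembl example above; the identifier
# is an arbitrary illustrative value.
ensembl_template = "http://www.ensembl.org/Gene/Summary?g=_REPL_"
print(build_xref_url(ensembl_template, "ENSG00000139618"))
```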
2. genes_load_geneinfo
This command parses gene info file(s) and saves the corresponding gene objects into the database. It takes 2 required arguments and 5 optional arguments:
(Required) geneinfo_file: location of gene info file;
(Required) taxonomy_id: taxonomy ID for organism for which genes are being populated;
(Optional) systematic_col: systematic column in gene info file. Default is 3;
(Optional) symbol_col: symbol column in gene info file. Default is 2;
(Optional) gi_tax_id: alternative taxonomy ID for some organisms (such as S. cerevisiae);
(Optional) alias_col: the column containing gene aliases. If a hyphen ‘-’ or blank space ‘ ’ is passed, symbol_col will be used. Default is 4;
(Optional) put_systematic_in_xrdb: name of the cross-reference database for which you want to use organism systematic IDs as CrossReference IDs. This is useful for Pseudomonas, for example, as systematic IDs are saved into the PseudoCAP cross-reference database.
The following example shows how to download a gzipped human gene info file from the NIH FTP server and populate the database from it.
# Create a temporary data directory:
mkdir data

# Download a gzipped human gene info file into data directory:
wget -P data/ -N ftp://ftp.ncbi.nih.gov/gene/DATA/GENE_INFO/Mammalia/Homo_sapiens.gene_info.gz

# Unzip downloaded file:
gunzip -c data/Homo_sapiens.gene_info.gz > data/Homo_sapiens.gene_info

# Call genes_load_geneinfo to populate the database:
python manage.py genes_load_geneinfo --geneinfo_file=data/Homo_sapiens.gene_info --taxonomy_id=9606 --systematic_col=2 --symbol_col=2
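To make the column options more concrete, here is a rough Python sketch of how one line of a gene info file could be split into the fields the command cares about. It is not the command's actual implementation. It assumes a tab-delimited line, zero-based column indexes (taxonomy ID, Entrez ID, symbol, systematic name, aliases), and pipe-separated aliases. The function name parse_gene_info_line and the sample line are made up for illustration.

```python
# Hypothetical sketch, not the genes_load_geneinfo implementation.
# Assumes tab-delimited input and zero-based column indexes whose
# defaults mirror the documented ones (symbol 2, systematic 3, alias 4).


def parse_gene_info_line(line, symbol_col=2, systematic_col=3, alias_col=4):
    """Pick the symbol, systematic name and aliases out of one line."""
    fields = line.rstrip("\n").split("\t")
    raw_aliases = fields[alias_col].strip()
    # In this sketch a lone hyphen or an empty value means "no aliases".
    aliases = [] if raw_aliases in ("", "-") else raw_aliases.split("|")
    return {
        "taxonomy_id": fields[0],
        "entrez_id": fields[1],
        "symbol": fields[symbol_col],
        "systematic_name": fields[systematic_col],
        "aliases": aliases,
    }


# Made-up line for illustration only; not real gene data.
sample = "9606\t12345\tGENE1\tLOCUS_1\tALIAS1|ALIAS2\n"
print(parse_gene_info_line(sample))
```

Passing systematic_col=2, as in the human example above, would make the systematic name the same as the symbol in this sketch.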
3. genes_load_uniprot
This command can be used to populate the database with UniProtKB identifiers. It takes one argument:
uniprot_file: location of a file mapping UniProtKB IDs to Entrez and Ensembl IDs
Important: Before calling this command, please make sure that both Ensembl and Entrez identifiers have been loaded into the database.
After downloading the gzipped file, use the zgrep command to extract the lines we need (the original file is quite large), then run this command:
wget -P data/ -N ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/idmapping.dat.gz
zgrep -e "GeneID" -e "Ensembl" data/idmapping.dat.gz > data/uniprot_entrez_ensembl.txt
python manage.py genes_load_uniprot --uniprot_file=data/uniprot_entrez_ensembl.txt
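Where zgrep is not available, the filtering step can be approximated in Python with the standard gzip module. This is a minimal sketch, not part of django-genes. The function names keep_line and filter_idmapping are hypothetical, and the match is a plain substring test, which mirrors what the zgrep call above does with its two patterns.

```python
import gzip

# Hypothetical stand-in for the zgrep step; not part of django-genes.
PATTERNS = ("GeneID", "Ensembl")


def keep_line(line, patterns=PATTERNS):
    """Return True if the line contains any of the patterns."""
    return any(pattern in line for pattern in patterns)


def filter_idmapping(src_path, dest_path, patterns=PATTERNS):
    """Copy matching lines from a gzipped file to a plain-text file.

    Lines are streamed one at a time, so the large input file is never
    held in memory. Returns the number of lines written.
    """
    kept = 0
    with gzip.open(src_path, "rt") as src, open(dest_path, "w") as dest:
        for line in src:
            if keep_line(line, patterns):
                dest.write(line)
                kept += 1
    return kept
```

With the paths from the example above, the call would be filter_idmapping("data/idmapping.dat.gz", "data/uniprot_entrez_ensembl.txt").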
4. genes_load_wb
This command can be used to populate the database with WormBase identifiers. It takes 3 arguments:
(Required) wb_url: URL of the WormBase xrefs file;
(Required) taxonomy_id: taxonomy ID assigned to this organism by NCBI;
(Optional) db_name: the name of the cross-reference database, default is ‘WormBase’.
Before running this command, the WormBase cross-reference database must already have been added with the genes_add_xrdb command (see command #1). Here is an example:
# Find latest version of WormBase here:
# http://www.wormbase.org/about/release_schedule#102--10-1
python manage.py genes_load_wb --wb_url=ftp://ftp.wormbase.org/pub/wormbase/releases/WS243/species/c_elegans/PRJNA13758/c_elegans.PRJNA13758.WS243.xrefs.txt.gz --taxonomy_id=6239