Probabilistic Phage Protein Functions: Phage genomes and their annotations
PPPF
Probabilistic Phage Protein Families
Author
Synopsis
We are exploring different ways of annotating phage proteins (because it never gets old), and this is a database of complete phage genomes and their annotations.
It also includes some phage protein clustering and tools associated with those clusters.
At the moment this is very much a pre-alpha project. We are defining the tables and relations, building the code base to access those tables, and trying to explore what we should do next.
However, we have made all our data, and the code to recreate it, available in case it is of use to anyone.
Installation
PIP installation
pip install pppf
Getting started
The download_databases script (python scripts/download_databases.py) will download the two databases, phages.sql [2.6 GB] and clusters.sql [1.8 GB], to the default location (currently PPPF/data/databases/) or to a location of your choosing.
Most of the code in scripts requires that you provide a phage or clusters database as a command line option, but we are implementing code in pppfdb that will use the default location. If you use a different location, you may need to change the location in that code.
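The fallback described above (use the path given on the command line, otherwise the default location) can be sketched as follows. This is a minimal illustration, not the code in pppfdb: the option names --phagedb and --clusterdb and the helper resolve_database are hypothetical, and the default directory is simply the one quoted in the text.

```python
import argparse
from pathlib import Path

# Assumption: default directory copied from the text above; the default
# actually used by pppfdb may differ.
DEFAULT_DB_DIR = Path("PPPF/data/databases")


def resolve_database(cli_value, filename, default_dir=DEFAULT_DB_DIR):
    """Return the command line path if one was given, else the default location."""
    if cli_value:
        return Path(cli_value)
    return Path(default_dir) / filename


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Locate the PPPF databases")
    # Hypothetical option names, chosen for this example only.
    parser.add_argument("--phagedb", help="path to phages.sql")
    parser.add_argument("--clusterdb", help="path to clusters.sql")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    databases = (
        resolve_database(args.phagedb, "phages.sql"),
        resolve_database(args.clusterdb, "clusters.sql"),
    )
    # Report whether each database file is present at the resolved path.
    for path in databases:
        status = "found" if path.is_file() else "missing"
        print(f"{path}: {status}")


if __name__ == "__main__":
    main()
```

Run without options, the sketch reports on the default location; passing either option overrides only that database.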
Building from scratch
If you want to build the databases from scratch, you can do so using snakemake and the snakefiles that we provide.
Then, you can use snakemake to start making it better. Probably.
You will need a process_phages.json file, and then you can update the databases with the latest phage genomes like this:
snakemake -s PPPF/snakefiles/download_phages.snakefile --configfile process_phages.json
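If you prefer to launch the update from Python, the command can be assembled and run with subprocess. This is only a sketch: build_snakemake_command and run_download are hypothetical helpers that are not part of PPPF, and they use only the snakemake flags that appear in this README. The config file is checked for valid JSON first so that a missing or malformed file fails before snakemake starts.

```python
import json
import subprocess


def build_snakemake_command(snakefile, configfile, cluster=None, jobs=None, latency_wait=None):
    """Assemble the snakemake command line as a list of arguments."""
    cmd = ["snakemake", "-s", str(snakefile), "--configfile", str(configfile)]
    if cluster:
        cmd += ["--cluster", cluster]
    if jobs is not None:
        cmd += ["-j", str(jobs)]
    if latency_wait is not None:
        cmd += ["--latency-wait", str(latency_wait)]
    return cmd


def run_download(snakefile, configfile, **options):
    """Check that the config file parses as JSON, then run snakemake."""
    with open(configfile) as handle:
        json.load(handle)  # raises early if the file is missing or malformed
    cmd = build_snakemake_command(snakefile, configfile, **options)
    return subprocess.run(cmd, check=True)
```

For example, run_download("PPPF/snakefiles/download_phages.snakefile", "process_phages.json") would run the same command as above.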
If you are running on Edwards' local compute resources, you can use this command to run the download on the cluster:
snakemake -s ~/GitHubs/PPPF/snakefiles/download_phages.snakefile --cluster 'qsub -cwd -o sge_download.out -e sge_download.err -V' -j 200 --latency-wait 60
It will download a new set of accessions, and then check the database to see what needs to be added. Note that currently we do not delete anything from the database.
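The add-only update described above amounts to a set difference between the downloaded accessions and those already stored. The sketch below illustrates the idea with an in-memory SQLite database; the use of SQLite, the table name genome, the column name accession, and the placeholder accessions are all assumptions made for this example and are not taken from the PPPF schema.

```python
import sqlite3


def accessions_to_add(conn, downloaded):
    """Return the downloaded accessions that are not yet in the database."""
    rows = conn.execute("SELECT accession FROM genome")
    known = {row[0] for row in rows}
    return sorted(set(downloaded) - known)


def add_genomes(conn, downloaded):
    """Insert only the missing accessions; existing rows are never deleted."""
    missing = accessions_to_add(conn, downloaded)
    conn.executemany(
        "INSERT INTO genome (accession) VALUES (?)",
        [(accession,) for accession in missing],
    )
    conn.commit()
    return missing


# Hypothetical table and placeholder accessions, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genome (accession TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO genome (accession) VALUES (?)",
    [("ACC_A",), ("ACC_B",)],
)

# ACC_B is already stored, so only ACC_C is added; ACC_A stays in place
# even though it is absent from the new download.
added = add_genomes(conn, ["ACC_B", "ACC_C"])
print(added)
```

Because nothing is deleted, an accession that disappears from a later download remains in the database.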
Using PPPF
The basic structure is that each of the directories is a library, and the scripts directory contains scripts that use those libraries.
Take a look at the database schema for a more detailed discussion of the schema we designed.
Information
License
PPPF is released under the MIT License.
Issues
Please use the issue tracker for any bugs, enhancements, suggestions, or comments.