Feature-aware orthology prediction tool

HaMStR-OneSeq

How to install

HaMStR-oneSeq is distributed as a Python package called hamstr1s. It is compatible with Python ≥ 3.7.

Install the hamstr1s package

You can install hamstr1s using pip:

python3 -m pip install hamstr1s

or, if you do not have admin rights and do not use an environment manager such as Anaconda, use the --user option:

python3 -m pip install --user hamstr1s

and then add the following line to the end of your ~/.bashrc or ~/.bash_profile file and restart the terminal (or run source ~/.bashrc) to apply the change:

export PATH=$HOME/.local/bin:$PATH
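This PATH change matters because pip install --user places the package's commands (such as setup1s and oneSeq) in a per-user script directory. The sketch below is a minimal way to check whether that directory is already on PATH. The helper names are hypothetical, and it assumes the per-user script directory is site.getuserbase()/bin. On a typical Linux system that is ~/.local/bin, but it can differ, for example with macOS framework builds of Python.

```python
import os
import site


def user_script_dir() -> str:
    # Assumption: pip --user installs console scripts into <user base>/bin,
    # which is ~/.local/bin on a typical Linux system.
    return os.path.join(site.getuserbase(), "bin")


def is_on_path(directory: str, path_value: str) -> bool:
    # Normalize both sides so that ".../.local/bin/" matches ".../.local/bin".
    target = os.path.normpath(os.path.expanduser(directory))
    entries = (
        os.path.normpath(os.path.expanduser(entry))
        for entry in path_value.split(os.pathsep)
        if entry
    )
    return target in entries


if __name__ == "__main__":
    script_dir = user_script_dir()
    if is_on_path(script_dir, os.environ.get("PATH", "")):
        print(f"{script_dir} is on PATH")
    else:
        print(f"{script_dir} is NOT on PATH; add the export line to your shell profile")
```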

Set up HaMStR-oneSeq

After installing hamstr1s, you need to set up HaMStR-oneSeq to get its dependencies and pre-calculated data.

To do so, run:

setup1s

or, if you are using Anaconda:

setup1s --conda

Have the sudo password ready; otherwise some missing dependencies cannot be installed. See the dependency list for more information. If you do not have root privileges, ask your administrator to install those dependencies using the setup1s --lib command.

Once the setup has run successfully, you can start using HaMStR.

To debug the setup, create a log file by running it as, e.g., setup1s | tee log.txt on Linux/macOS or setup1s --conda | tee log.txt with Anaconda, and send us the log file so that we can troubleshoot the issue. Most problems can be solved by simply re-running the setup.
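The setup1s | tee log.txt pipeline copies only standard output into the log. Messages written to standard error still reach the terminal but not the file. In bash, setup1s 2>&1 | tee log.txt captures both. The sketch below is a rough Python equivalent that merges both streams into one log. run_and_log is a hypothetical helper, not part of hamstr1s, and interactive prompts such as the sudo password request may behave differently when output is piped.

```python
import subprocess
import sys
from typing import List


def run_and_log(cmd: List[str], log_path: str) -> int:
    """Run `cmd`, echo its output to the terminal and copy it to `log_path`.

    Standard error is merged into standard output so that both end up in
    the log. Returns the command's exit code.
    """
    with open(log_path, "w", encoding="utf-8") as log, subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same pipe
        encoding="utf-8",
        errors="replace",  # do not crash on undecodable bytes
    ) as proc:
        for line in proc.stdout:  # stream line by line, like tee
            sys.stdout.write(line)
            log.write(line)
    # Leaving the Popen context waits for the process, so returncode is set.
    return proc.returncode
```

For example, run_and_log(["setup1s", "--conda"], "log.txt") would write a log that can be sent along with a bug report.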

Usage

If everything is set up correctly, HaMStR-oneSeq will run smoothly on the provided sample input file infile.fa:

oneSeq --seqFile infile.fa --seqName test --refspec HUMAN@9606@3

The output files, prefixed with test, will be saved in your current working directory. For an overview of all available options, run:

oneSeq -h

See our wiki for more information about the input and output files of HaMStR-oneSeq.
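When several queries are run from a script, it can help to collect each run's output afterwards. The sketch below is minimal and relies only on what is stated above: outputs are written to the current working directory, and their names start with the --seqName value. run_oneseq and find_outputs are hypothetical helpers. Any unrelated file that happens to share the prefix would also be picked up.

```python
import subprocess
from pathlib import Path
from typing import List


def run_oneseq(seq_file: str, seq_name: str, refspec: str) -> None:
    # Uses only the three options from the example command above;
    # check=True raises CalledProcessError if oneSeq exits with an error.
    subprocess.run(
        ["oneSeq", "--seqFile", seq_file, "--seqName", seq_name, "--refspec", refspec],
        check=True,
    )


def find_outputs(seq_name: str, directory: str = ".") -> List[Path]:
    # Every entry (file or directory) whose name starts with the --seqName prefix.
    return sorted(p for p in Path(directory).iterdir() if p.name.startswith(seq_name))
```

For the sample data this would be run_oneseq("infile.fa", "test", "HUMAN@9606@3") followed by find_outputs("test").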

HaMStR-oneSeq data set

The data package (https://fasta.bioch.virginia.edu/fasta_www2/fasta_list2.shtml) provides a set of 78 reference taxa, which can be downloaded automatically during setup. These data come ready to use with the HaMStR-oneSeq framework. Species data must be present in the three directories listed below:

  • genome_dir (contains sub-directories with the proteome FASTA files of each species)
  • blast_dir (contains sub-directories with BLAST databases built from your proteomes with makeblastdb)
  • weight_dir (contains feature annotation files for each proteome)
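A quick check that the three directories exist can save a failed run when you point HaMStR-oneSeq at your own data location. The sketch below assumes the three directories sit side by side under one data root, which is the layout the -o option of addTaxon1s refers to. missing_data_dirs and list_taxa are hypothetical helpers.

```python
from pathlib import Path
from typing import List

DATA_DIRS = ("genome_dir", "blast_dir", "weight_dir")


def missing_data_dirs(data_root: str) -> List[str]:
    # Names of the required directories that do not exist under data_root.
    root = Path(data_root)
    return [name for name in DATA_DIRS if not (root / name).is_dir()]


def list_taxa(data_root: str) -> List[str]:
    # Taxon sub-directories found in genome_dir, one per species.
    genome_dir = Path(data_root) / "genome_dir"
    return sorted(p.name for p in genome_dir.iterdir() if p.is_dir())
```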

For each species/taxon there is a sub-directory named according to the schema [Species acronym]@[NCBI ID]@[Proteome version], e.g. HUMAN@9606@3.
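The naming schema can also be handled in code, for example when listing taxa or preparing a --refspec value such as HUMAN@9606@3. The sketch below is minimal, and its helper names are hypothetical. It keeps the proteome version as a string because the schema does not say that versions must be integers.

```python
from typing import NamedTuple, Union


class TaxonName(NamedTuple):
    acronym: str  # species acronym, e.g. HUMAN
    ncbi_id: int  # NCBI taxonomy ID, e.g. 9606
    version: str  # proteome version, e.g. "3"


def build_taxon_name(acronym: str, ncbi_id: int, version: Union[int, str]) -> str:
    # [Species acronym]@[NCBI ID]@[Proteome version]
    return f"{acronym}@{ncbi_id}@{version}"


def parse_taxon_name(name: str) -> TaxonName:
    parts = name.split("@")
    if len(parts) != 3:
        raise ValueError(f"expected ACRONYM@NCBI_ID@VERSION, got {name!r}")
    acronym, ncbi_id, version = parts
    if not acronym or not ncbi_id.isdigit() or not version:
        raise ValueError(f"malformed taxon name: {name!r}")
    return TaxonName(acronym, int(ncbi_id), version)
```

With these helpers, parse_taxon_name("HUMAN@9606@3") returns TaxonName(acronym='HUMAN', ncbi_id=9606, version='3'), and build_taxon_name("HUMAN", 9606, 3) returns "HUMAN@9606@3".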

HaMStR-oneSeq is not limited to those 78 taxa. If needed, you can add further gene sets (in multi-FASTA format) using the provided Python scripts.

Adding a new gene set into HaMStR-oneSeq

To add a single gene set, use the addTaxon1s command:

addTaxon1s -f newTaxon.fa -i tax_id [-o /output/directory] [-n abbr_tax_name] [-c] [-v protein_version] [-a]

where the first two arguments are required and the others are optional:

  • -f newTaxon.fa: the gene set to be added
  • -i tax_id: its NCBI taxonomy ID
  • -o /output/directory: the directory that contains the sub-directories genome_dir, blast_dir and weight_dir. If not given, the new taxon is added to the same directory as the pre-calculated data.
  • -n abbr_tax_name: your own taxon name. If not given, an abbreviated name is suggested based on the NCBI taxon name of the input tax_id.
  • -c: calculate the BLAST DB (only needed if you want to include the new taxon in the list of taxa used to compile the core set)
  • -v protein_version: the genome/proteome version (default: 1)
  • -a: turn off the annotation step (not recommended)
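When taxa are added from a script, the documented options map directly onto an argument list. In the sketch below, addtaxon_args is a hypothetical helper that emits only the flags from the addTaxon1s synopsis, in the form shown there.

```python
from typing import List, Optional, Union


def addtaxon_args(
    fasta: str,
    tax_id: int,
    out_dir: Optional[str] = None,
    name: Optional[str] = None,
    make_blast_db: bool = False,
    version: Optional[Union[int, str]] = None,
    skip_annotation: bool = False,
) -> List[str]:
    # Required arguments: the gene set (-f) and its NCBI taxonomy ID (-i).
    args = ["addTaxon1s", "-f", fasta, "-i", str(tax_id)]
    # Optional arguments are appended only when requested.
    if out_dir is not None:
        args += ["-o", out_dir]
    if name is not None:
        args += ["-n", name]
    if make_blast_db:
        args.append("-c")  # build the BLAST DB (needed for core-set taxa)
    if version is not None:
        args += ["-v", str(version)]
    if skip_annotation:
        args.append("-a")  # turn off annotation (not recommended)
    return args
```

The resulting list could be passed to subprocess.run(addtaxon_args("newTaxon.fa", 4932, name="my_fungi"), check=True). The file name and taxon values here are placeholders.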

Adding a list of gene sets into HaMStR-oneSeq

To add more than one gene set, use the addTaxa1s script:

addTaxa1s -i /path/to/newtaxa/fasta -m mapping_file [-o /output/directory] [-c]

where /path/to/newtaxa/fasta is the folder containing the FASTA files of all new taxa, and mapping_file is a tab-delimited text file that assigns a taxonomy ID to each FASTA file:

#filename	tax_id	abbr_tax_name	version
filename1	12345678
filename2	9606
filename3	4932	my_fungi
...

The header line (starting with #) is mandatory. The last two columns (abbreviated taxon name and genome version) are optional. However, if you want to specify a genome version, you must also give the abbreviated taxon name, so that the genome version is always in the fourth column of the mapping file.
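These mapping-file rules can be checked before calling addTaxa1s. The sketch below shows one way to do that. parse_mapping is a hypothetical helper, not part of hamstr1s. It assumes the header is the first line and that blank lines may be ignored.

```python
from typing import Dict, List, Optional


def parse_mapping(text: str) -> List[Dict[str, Optional[str]]]:
    """Parse and validate an addTaxa1s-style mapping file.

    Columns: filename, tax_id, optional abbr_tax_name, optional version.
    """
    lines = text.splitlines()
    if not lines or not lines[0].startswith("#"):
        raise ValueError("the first line must be a header starting with '#'")

    rows = []
    for lineno, line in enumerate(lines[1:], start=2):
        if not line.strip():
            continue  # tolerate blank lines (assumption)
        fields = [field.strip() for field in line.split("\t")]
        if not 2 <= len(fields) <= 4:
            raise ValueError(f"line {lineno}: expected 2 to 4 tab-separated columns")
        # Pad the optional columns so there are always exactly four values.
        filename, tax_id, name, version = (fields + ["", ""])[:4]
        if not filename or not tax_id.isdigit():
            raise ValueError(f"line {lineno}: need a filename and a numeric tax_id")
        if version and not name:
            # The version must stay in column 4, so column 3 cannot be empty.
            raise ValueError(f"line {lineno}: a version requires an abbr_tax_name")
        rows.append({
            "filename": filename,
            "tax_id": tax_id,
            "abbr_tax_name": name or None,
            "version": version or None,
        })
    return rows
```

A file on disk can be checked with parse_mapping(Path("mapping_file").read_text()) after importing pathlib.Path.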

NOTE: After adding new taxa to HaMStR-oneSeq, check the validity of the new data before running HaMStR.

Bugs

Bug reports, comments and suggestions are highly appreciated. Please open an issue on GitHub or get in touch via email.

How to cite

Ebersberger, I., Strauss, S. & von Haeseler, A. HaMStR: Profile hidden markov model based search for orthologs in ESTs. BMC Evol Biol 9, 157 (2009), doi:10.1186/1471-2148-9-157

Contact

For further support or bug reports please contact: ebersberger@bio.uni-frankfurt.de

Download files

Source Distribution

hamstr1s-2.0.4.tar.gz (88.6 kB)

Built Distribution

hamstr1s-2.0.4-py3-none-any.whl (112.4 kB)

File details

Details for the file hamstr1s-2.0.4.tar.gz.

File metadata

  • Download URL: hamstr1s-2.0.4.tar.gz
  • Size: 88.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for hamstr1s-2.0.4.tar.gz
Algorithm Hash digest
SHA256 f1db8f43fb0876dc83f71a25bbc55b8328a8e899e7890fd09a46ac719b135426
MD5 8cc41270d8fb4c3fa528c6ea08a4e07a
BLAKE2b-256 d2a436fc13d9027f3e2c2c9e38908bc3bf0d8226303c04089cbe7d908dbd3314

File details

Details for the file hamstr1s-2.0.4-py3-none-any.whl.

File metadata

  • Download URL: hamstr1s-2.0.4-py3-none-any.whl
  • Size: 112.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for hamstr1s-2.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 421b848b9b4ffb3a26882bff1a2b354d145c98f6845b3244ff2e65b8381525a1
MD5 4c59e2886fd4c34b3024762b35b08e88
BLAKE2b-256 455510274c135bb2b0f7d47a207f0c704afe87d47628be6ae86a35c56ea2b4f7
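To check that a downloaded distribution file matches the digests listed above, compute its SHA256 locally and compare the two values. The sketch below does this with Python's hashlib. The helper names are hypothetical.

```python
import hashlib
from pathlib import Path

# SHA256 digests as listed above for release 2.0.4.
EXPECTED_SHA256 = {
    "hamstr1s-2.0.4.tar.gz": "f1db8f43fb0876dc83f71a25bbc55b8328a8e899e7890fd09a46ac719b135426",
    "hamstr1s-2.0.4-py3-none-any.whl": "421b848b9b4ffb3a26882bff1a2b354d145c98f6845b3244ff2e65b8381525a1",
}


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    # Hash the file in chunks so that large files need not fit in memory.
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str) -> bool:
    # Look up the expected digest by file name; unknown names raise KeyError.
    expected = EXPECTED_SHA256[Path(path).name]
    return sha256_of(path) == expected
```

pip can also enforce this itself. Put a requirements-file line such as hamstr1s==2.0.4 --hash=sha256:<digest> in requirements.txt and run pip install --require-hashes -r requirements.txt. In that mode, every dependency must then be pinned with a hash as well.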
