GenBank data miner for fungal taxonomists
GenMine
A GenBank data mining program for (mostly fungal) taxonomists
GenMine downloads GenBank nucleotide records and filters the downloaded data by genes frequently used in taxonomy.
Citation: Chang Wan Seo, Sung Hyun Kim, Young Woon Lim & Myung Soo Park (2022) Re-Identification on Korean Penicillium Sequences in GenBank Collected by Software GenMine, Mycobiology, DOI: 10.1080/12298093.2022.2116816
https://www.tandfonline.com/doi/full/10.1080/12298093.2022.2116816
Install
- pip
pip install GenMine
- conda
conda install -c cwseo GenMine
Usage
Basic usage
- Download all Penicillium records
GenMine -e wan101010@snu.ac.kr -g Penicillium
- Download all Penicillium records, then keep only the records matching the term "Korea"
GenMine -e wan101010@snu.ac.kr -g Penicillium -a Korea
- Download records by accession number
GenMine -e wan101010@snu.ac.kr -c ON417149.1 ON417150.1
Advanced usage
- Download records of multiple genera
GenMine -e wan101010@snu.ac.kr -g Penicillium Trichoderma Alternaria
- Download records of multiple genera listed in a file
GenMine -e wan101010@snu.ac.kr -g genera.txt
"genera.txt" should look like this (one genus per line)
Penicillium
Trichoderma
Alternaria
- Download records of multiple accessions listed in a file
GenMine -e wan101010@snu.ac.kr -c accessions.txt
"accessions.txt" should look like this (one accession per line)
ON417149.1
ON417150.1
MW554209.1
OK643788.1
- Resume an interrupted run (needed only for accession downloads; genus downloads resume automatically if you launch GenMine again in the same location)
GenMine -e wan101010@snu.ac.kr -c accessions.txt -o "2022-11-02-00-12-08"
# Caution 1: -o must be the name of the previous run's result directory
# Caution 2: this does not work for a run that has already finished
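The input files described above (genera.txt, accessions.txt) are plain text with one entry per line. A minimal sketch of how such a file could be produced and checked from Python is given here. The helper names `write_list_file` and `read_list_file` are hypothetical and are not part of GenMine; stripping blanks and duplicates is a convenience of this sketch, not a documented requirement.

```python
from pathlib import Path


def write_list_file(path, items):
    """Write one item per line, dropping blanks and duplicates (order is kept)."""
    seen = set()
    cleaned = []
    for item in items:
        name = item.strip()
        if name and name not in seen:
            seen.add(name)
            cleaned.append(name)
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned


def read_list_file(path):
    """Read a one-item-per-line file back into a list, ignoring blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```

For example, `write_list_file("accessions.txt", ["ON417149.1", "ON417150.1"])` would create a file in the layout shown above, which can then be passed to `-c`.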
Arguments
- Basic Parameters
--genus, -g : List of genera to find | File with one genus per line
--accession, -c : List of accessions to get | File with one accession per line
--email, -e : Your email for NCBI access
- Optional Parameters
--additional, -a : Additional terms (e.g. country name) to filter by
--max, -m : Maximum length of the sequence to parse (default: 5000)
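When GenMine is driven from a script, the options listed above can be assembled into a command line programmatically. The following is a sketch only: `build_genmine_command` is a hypothetical helper, not part of GenMine, and it uses only the short flags documented here. Two assumptions are made: at least one genus or accession is required, and `-a` receives a single term.

```python
import shlex


def build_genmine_command(email, genera=None, accessions=None,
                          additional=None, max_length=None):
    """Assemble a GenMine argument list from the documented options."""
    # Assumption of this sketch: a run needs at least one genus or accession.
    if not genera and not accessions:
        raise ValueError("give at least one genus or accession")
    cmd = ["GenMine", "-e", email]
    if genera:
        cmd += ["-g", *genera]          # genus names, or a file such as genera.txt
    if accessions:
        cmd += ["-c", *accessions]      # accessions, or a file such as accessions.txt
    if additional:
        cmd += ["-a", additional]       # extra filter term, e.g. a country name
    if max_length is not None:
        cmd += ["-m", str(max_length)]  # maximum sequence length to parse
    return cmd


# Placeholder email address; replace it with your own for NCBI access.
cmd = build_genmine_command("you@example.org", genera=["Penicillium"], additional="Korea")
print(shlex.join(cmd))  # GenMine -e you@example.org -g Penicillium -a Korea
```

Once GenMine is installed, the returned list can be passed to `subprocess.run(cmd, check=True)`. Building a list rather than a single string avoids shell quoting problems with unusual search terms.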
Output explanations
Main output
WIP
Features
GenMine is a Python program, based on the Entrez library, that parses records from GenBank and sorts them by gene name. Compared to Entrez, GenMine has some advantages and disadvantages.
Advantages
- GenMine does not miss records, especially when searching with multiple terms
- GenMine can resume interrupted downloads, which is especially useful on a slow or unstable internet connection
- GenMine classifies downloaded records by gene type (ITS, LSU, SSU, BenA, etc.)
- If you want more gene types, open an issue!
- We are currently working on better gene annotations
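To illustrate what classifying records by gene type can look like, here is a simplified keyword-based sketch. It is not GenMine's actual annotation logic, which is not described in this document. The patterns, the function names `classify_gene` and `group_by_gene`, and the example descriptions are all assumptions made for illustration.

```python
import re

# Hypothetical keyword patterns, checked in this order; not GenMine's real rules.
GENE_PATTERNS = {
    "ITS": re.compile(r"(?i:internal transcribed spacer)|\bITS[12]?\b"),
    "LSU": re.compile(r"large subunit|\b(?:LSU|28S|25S)\b", re.IGNORECASE),
    "SSU": re.compile(r"small subunit|\b(?:SSU|18S)\b", re.IGNORECASE),
    "BenA": re.compile(r"beta-tubulin|\b(?:benA|tub2)\b", re.IGNORECASE),
}


def classify_gene(description):
    """Return the first gene label whose pattern matches, else 'unclassified'."""
    for gene, pattern in GENE_PATTERNS.items():
        if pattern.search(description):
            return gene
    return "unclassified"


def group_by_gene(records):
    """Group (accession, description) pairs by the gene label of the description."""
    groups = {}
    for accession, description in records:
        groups.setdefault(classify_gene(description), []).append(accession)
    return groups
```

A record description that mentions several regions (for example both ITS and LSU) is assigned to the first matching label in this sketch; a real classifier would need a more careful rule for such records.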
Limitations
- Slower than Entrez (sometimes by a lot), because GenMine prioritizes completeness and stability
Bug reports and Suggestions
- Send bug reports and suggestions through GitHub Issues or directly to wan101010@snu.ac.kr
- However, we want GenMine to remain a small tool. Suggestions that go beyond the purpose of GenMine may be accepted in our upcoming software