GenBank data miner for fungal taxonomists

Project description

GenMine

A GenBank data mining program for (mostly fungal) taxonomists

GenMine downloads GenBank nucleotide records and filters the downloaded data by the genes most frequently used in taxonomy.

Citation: Chang Wan Seo, Sung Hyun Kim, Young Woon Lim & Myung Soo Park (2022) Re-Identification on Korean Penicillium Sequences in GenBank Collected by Software GenMine, Mycobiology, DOI: 10.1080/12298093.2022.2116816

https://www.tandfonline.com/doi/full/10.1080/12298093.2022.2116816

Install

  • pip
pip install GenMine
  • conda
conda install -c cwseo GenMine

Usage

Basic usage

  • Download all Penicillium records
GenMine -e wan101010@snu.ac.kr -g Penicillium
  • Download all Penicillium records, then filter them by the term "Korea"
GenMine -e wan101010@snu.ac.kr -g Penicillium -a Korea
  • Download records by accession number
GenMine -e wan101010@snu.ac.kr -c ON417149.1 ON417150.1

Advanced usage

  • Download records of multiple genera
GenMine -e wan101010@snu.ac.kr -g Penicillium Trichoderma Alternaria
  • Download records of multiple genera given by file
GenMine -e wan101010@snu.ac.kr -g genera.txt

"genera.txt" should contain one genus per line, like this:

Penicillium
Trichoderma
Alternaria
  • Download records of multiple accessions given by file
GenMine -e wan101010@snu.ac.kr -c accessions.txt

"accessions.txt" should contain one accession per line, like this:

ON417149.1
ON417150.1
MW554209.1
OK643788.1
  • Continue an interrupted download (needed only for accession runs; genus runs resume automatically when GenMine is launched again from the same location)
GenMine -e wan101010@snu.ac.kr -c accessions.txt -o "2022-11-02-00-12-08"
# Caution 1: -o must be the name of the previous run's result directory
# Caution 2: this does not work for a run that has already finished
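
When the genus or accession list comes from a spreadsheet or another script, the input file can be generated rather than typed by hand. The following is a minimal sketch that assumes only the one-entry-per-line format shown above; the helper name `write_list_file` is hypothetical and not part of GenMine, and whether GenMine tolerates blank lines or duplicates is not documented, so the sketch removes them to be safe.

```python
from pathlib import Path
import tempfile


def write_list_file(path, items):
    """Write one entry per line, dropping blanks and duplicates (order kept)."""
    cleaned = []
    for item in items:
        item = item.strip()
        if item and item not in cleaned:
            cleaned.append(item)
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned


# Example: write to a temporary directory so the working directory is untouched.
workdir = Path(tempfile.mkdtemp())
accession_file = workdir / "accessions.txt"
written = write_list_file(
    accession_file,
    ["ON417149.1", "ON417150.1", " MW554209.1 ", "ON417149.1", ""],
)
print(accession_file.read_text(encoding="utf-8"))
```

The resulting file can then be passed to GenMine with -c (or with -g for a list of genera).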

Arguments

  • Basic Parameters
--genus, -g : list of genera to search for, or a file with one genus per line
--accession, -c : list of accessions to download, or a file with one accession per line
--email, -e : your email address for NCBI access
  • Optional Parameters
--additional, -a : additional terms (e.g. a country name) to filter by
--max, -m : maximum length of the sequences to parse (default: 5000)
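
For batch runs it can be convenient to assemble the command line from a script. The sketch below uses only the flags listed above; the function `build_genmine_command` and the address `you@example.org` are hypothetical placeholders, and passing a single `--additional` term is an assumption taken from the usage example, not a documented limit.

```python
import shlex


def build_genmine_command(email, genus=None, accession=None,
                          additional=None, max_length=None):
    """Return an argv list for GenMine using only the documented flags."""
    if not genus and not accession:
        raise ValueError("provide at least one genus or accession")
    cmd = ["GenMine", "--email", email]
    if genus:
        cmd += ["--genus", *genus]
    if accession:
        cmd += ["--accession", *accession]
    if additional:
        cmd += ["--additional", additional]
    if max_length is not None:
        cmd += ["--max", str(max_length)]
    return cmd


cmd = build_genmine_command(
    "you@example.org",
    genus=["Penicillium", "Trichoderma"],
    additional="Korea",
    max_length=3000,
)
print(shlex.join(cmd))
# To actually run it (requires GenMine to be installed and network access):
# import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when a term contains spaces.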

Output explanations

Main output

WIP

Features

GenMine is a Python program, built on the Entrez library, that parses records from GenBank and sorts them by gene name. Compared to Entrez, GenMine has some advantages and disadvantages.

Advantages

  • GenMine does not miss records, especially when searching with multiple terms
  • GenMine can resume interrupted downloads, which is especially useful on an unstable internet connection
  • GenMine classifies downloaded records by gene type (ITS, LSU, SSU, BenA, etc.)
  • If you want more gene types, open an issue!
  • We are currently working on better gene annotations
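
To illustrate what classifying records by gene type can look like, here is a rough sketch based on keyword matching against a record's description line. It is not GenMine's actual annotation logic: the pattern table `GENE_PATTERNS`, the function `classify`, and the example descriptions are all hypothetical, chosen only to show the idea.

```python
import re

# Hypothetical keyword table; GenMine's real rules may differ.
# Note: with IGNORECASE, the abbreviation "ITS" also matches the English word "its".
GENE_PATTERNS = {
    "ITS": re.compile(r"internal transcribed spacer|\bITS\d?\b", re.IGNORECASE),
    "LSU": re.compile(r"large subunit ribosomal|\b(?:LSU|28S)\b", re.IGNORECASE),
    "SSU": re.compile(r"small subunit ribosomal|\b(?:SSU|18S)\b", re.IGNORECASE),
    "BenA": re.compile(r"beta-tubulin|\b(?:benA|tub2)\b", re.IGNORECASE),
}


def classify(description):
    """Return every gene type whose pattern occurs in the description."""
    hits = [gene for gene, pattern in GENE_PATTERNS.items()
            if pattern.search(description)]
    return hits or ["unclassified"]


examples = [
    "Penicillium sp. beta-tubulin (benA) gene, partial cds",
    "Penicillium sp. internal transcribed spacer 1, partial sequence; "
    "5.8S ribosomal RNA gene, complete sequence; and large subunit "
    "ribosomal RNA gene, partial sequence",
    "Penicillium sp. calmodulin gene, partial cds",
]
for text in examples:
    print(classify(text), "<-", text[:50])
```

A record can span several regions (for example ITS together with part of LSU), which is why the sketch returns a list rather than a single label.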

Limitations

  • Slower than Entrez (sometimes much slower), because GenMine prioritizes completeness and stability

Bug reports and Suggestions

  • Bug reports and suggestions are welcome via GitHub Issues or by email to wan101010@snu.ac.kr
  • However, we want GenMine to remain a small tool. Suggestions that go beyond the purpose of GenMine may be adopted in our upcoming software
