
GenBank data miner for fungal taxonomists


GenMine

A GenBank data mining program for (mostly fungal) taxonomists

GenMine downloads GenBank nucleotide records and filters the downloaded data by genes frequently used in taxonomy.

Citation: Chang Wan Seo, Sung Hyun Kim, Young Woon Lim & Myung Soo Park (2022) Re-Identification on Korean Penicillium Sequences in GenBank Collected by Software GenMine, Mycobiology, DOI: 10.1080/12298093.2022.2116816

https://www.tandfonline.com/doi/full/10.1080/12298093.2022.2116816

Install

  • pip
pip install GenMine
  • conda
conda install -c cwseo GenMine

Usage

Basic usage

  • Download all Penicillium records
GenMine -e wan101010@snu.ac.kr -g Penicillium
  • Download all Penicillium records, then filter the records by the term "Korea"
GenMine -e wan101010@snu.ac.kr -g Penicillium -a Korea
  • Download records by accession number
GenMine -e wan101010@snu.ac.kr -c ON417149.1 ON417150.1

Advanced usage

  • Download records of multiple genera
GenMine -e wan101010@snu.ac.kr -g Penicillium Trichoderma Alternaria
  • Download records of multiple genera listed in a file
GenMine -e wan101010@snu.ac.kr -g genera.txt

"genera.txt" should look like this, with one genus per line:

Penicillium
Trichoderma
Alternaria
  • Download records of multiple accessions listed in a file
GenMine -e wan101010@snu.ac.kr -c accessions.txt

"accessions.txt" should look like this, with one accession per line:

ON417149.1
ON417150.1
MW554209.1
OK643788.1
  • Continue an interrupted download (needed only for accession runs; a genus run resumes automatically if you launch GenMine in the same location)
GenMine -e wan101010@snu.ac.kr -c accessions.txt -o "2022-11-02-00-12-08"
# Caution 1: -o must be the name of the previous run's result directory
# Caution 2: this will not work for a run that has already finished
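
The one-entry-per-line input files described above can be prepared with a short script. The following is a minimal sketch, not part of GenMine: the helper names `write_list_file` and `split_accessions` are hypothetical, and the accession pattern is an assumption covering the common GenBank nucleotide formats (1 letter + 5 digits, 2 letters + 6 digits, or 2 letters + 8 digits, with an optional version suffix).

```python
import re
from pathlib import Path

# Assumed pattern for common GenBank nucleotide accessions:
# 1 letter + 5 digits, 2 letters + 6 digits, or 2 letters + 8 digits,
# optionally followed by ".version" (for example ".1").
ACCESSION_RE = re.compile(
    r"^(?:[A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{2}\d{8})(?:\.\d+)?$"
)


def write_list_file(entries, path):
    """Write unique, non-empty entries to `path`, one per line, keeping order.

    Hypothetical helper; returns the cleaned list that was written.
    """
    seen = set()
    cleaned = []
    for entry in entries:
        entry = entry.strip()
        if entry and entry not in seen:
            seen.add(entry)
            cleaned.append(entry)
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned


def split_accessions(entries):
    """Separate entries that look like accessions from those that do not."""
    valid = [e for e in entries if ACCESSION_RE.match(e)]
    invalid = [e for e in entries if not ACCESSION_RE.match(e)]
    return valid, invalid
```

For example, `write_list_file(["Penicillium", "Trichoderma", "Alternaria"], "genera.txt")` produces a file in the layout shown for "genera.txt", and `split_accessions` can flag typos before a long download is started.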

Arguments

  • Basic Parameters
--genus, -g : list of genera to find, or a file with one genus per line
--accession, -c : list of accessions to get, or a file with one accession per line
--email, -e : your email address for NCBI access
  • Optional Parameters
--additional, -a : additional terms (e.g. a country name) to filter by
--max, -m : maximum length of the sequence to parse (default: 5000)
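
When GenMine is called from a script, the arguments listed above can be assembled programmatically. The following is a minimal sketch that uses only the flags documented here. The helpers `build_genmine_command` and `run_genmine` are hypothetical, and the sketch assumes that a run uses either -g or -c (not both) and that -a takes a single term as in the examples.

```python
import shutil
import subprocess


def build_genmine_command(email, genera=None, accessions=None,
                          additional=None, max_length=None):
    """Assemble a GenMine argument list from the documented options.

    Assumption: a run uses either genera (-g) or accessions (-c), not both.
    """
    if bool(genera) == bool(accessions):
        raise ValueError("give either genera or accessions, not both or neither")
    cmd = ["GenMine", "-e", email]
    if genera:
        cmd += ["-g", *genera]
    else:
        cmd += ["-c", *accessions]
    if additional:
        cmd += ["-a", additional]
    if max_length is not None:
        cmd += ["-m", str(max_length)]
    return cmd


def run_genmine(cmd):
    """Run the assembled command if the GenMine executable is on PATH."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("GenMine executable not found on PATH")
    return subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a genus file or search term contains spaces.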

Output explanations

Main output

WIP

Features

GenMine is a Python program, based on the Entrez library, that parses records from GenBank and sorts them by gene name. Compared to Entrez, GenMine has some advantages and disadvantages.

Advantages

  • GenMine doesn't miss records, especially when searching with multiple terms
  • GenMine can download in separate sessions (an interrupted run can be continued), which is especially useful on a poor internet connection
  • GenMine classifies downloaded records by gene type (ITS, LSU, SSU, BenA, etc.)
  • If you want more gene types, open an issue!
  • We are currently working on better gene annotations
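
To illustrate the idea of sorting records by gene type, the following is a minimal sketch of keyword-based classification. It is not GenMine's actual annotation logic: the keyword table, the helper names, and the `(record_id, description)` input shape are all assumptions made for illustration, and plain substring matching is deliberately naive.

```python
# Hypothetical keyword table; GenMine's real rules may differ.
GENE_KEYWORDS = {
    "ITS": ("internal transcribed spacer",),
    "LSU": ("large subunit ribosomal rna", "28s ribosomal rna"),
    "SSU": ("small subunit ribosomal rna", "18s ribosomal rna"),
    "BenA": ("beta-tubulin", "bena"),
}


def classify_description(description):
    """Return every gene label whose keywords occur in the description."""
    text = description.lower()
    return [
        gene
        for gene, keywords in GENE_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]


def group_by_gene(records):
    """Group (record_id, description) pairs by gene label.

    Records that match no keyword are collected under "unclassified".
    A record spanning several regions appears under each matching label.
    """
    groups = {}
    for record_id, description in records:
        labels = classify_description(description) or ["unclassified"]
        for label in labels:
            groups.setdefault(label, []).append(record_id)
    return groups
```

A record whose description mentions both an internal transcribed spacer and the large subunit ribosomal RNA gene is placed in both the ITS and LSU groups, which is one simple way to handle sequences that span several regions.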

Limitations

  • Slower than Entrez (sometimes much slower), because GenMine prioritizes completeness and stability

Bug reports and Suggestions

  • Bug reports and suggestions can be submitted through GitHub Issues or sent directly to wan101010@snu.ac.kr
  • However, we want GenMine to remain a small tool. Suggestions that go beyond the purpose of GenMine may be taken up in our upcoming software
