GenBank data miner for fungal taxonomists

GenMine

A GenBank data mining program for (mostly fungal) taxonomists

GenMine downloads GenBank nucleotide records and filters the downloaded data by genes frequently used in taxonomy.

Citation: Chang Wan Seo, Sung Hyun Kim, Young Woon Lim & Myung Soo Park (2022) Re-Identification on Korean Penicillium Sequences in GenBank Collected by Software GenMine, Mycobiology, DOI: 10.1080/12298093.2022.2116816

https://www.tandfonline.com/doi/full/10.1080/12298093.2022.2116816

Install

  • pip
pip install GenMine
  • conda
conda install -c cwseo GenMine

Usage

Basic usage

  • Download all Penicillium records
GenMine -e wan101010@snu.ac.kr -g Penicillium
  • Download all Penicillium records, then filter them by the term "Korea"
GenMine -e wan101010@snu.ac.kr -g Penicillium -a Korea
  • Download records by accession number
GenMine -e wan101010@snu.ac.kr -c ON417149.1 ON417150.1

Advanced usage

  • Download records of multiple genera
GenMine -e wan101010@snu.ac.kr -g Penicillium Trichoderma Alternaria
  • Download records of multiple genera listed in a file
GenMine -e wan101010@snu.ac.kr -g genera.txt

"genera.txt" should look like this, with one genus per line:

Penicillium
Trichoderma
Alternaria
  • Download records of multiple accessions listed in a file
GenMine -e wan101010@snu.ac.kr -c accessions.txt

"accessions.txt" should look like this, with one accession per line:

ON417149.1
ON417150.1
MW554209.1
OK643788.1
  • Resume an interrupted download (needed only for accession runs; genus runs resume automatically if you launch GenMine from the same location)
GenMine -e wan101010@snu.ac.kr -c accessions.txt -o "2022-11-02-00-12-08"
# Caution 1: -o must be the name of the previous run's result directory
# Caution 2: this does not work for a run that has already finished
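
The input files and the resume step above are easy to script. The sketch below is only an illustration and is not part of GenMine: `write_list_file` and `latest_run_dir` are hypothetical helper names, and the directory-name format is an assumption inferred from the example "2022-11-02-00-12-08" (year-month-day-hour-minute-second).

```python
from datetime import datetime
from pathlib import Path

# Assumption: run directories are named like "2022-11-02-00-12-08".
RUN_DIR_FORMAT = "%Y-%m-%d-%H-%M-%S"


def write_list_file(path, names):
    """Write one genus or accession per line, skipping blanks and duplicates."""
    seen = []
    for name in names:
        name = name.strip()
        if name and name not in seen:
            seen.append(name)
    Path(path).write_text("\n".join(seen) + "\n", encoding="utf-8")
    return seen


def latest_run_dir(location="."):
    """Return the name of the newest timestamp-named directory, or None."""
    runs = []
    for entry in Path(location).iterdir():
        if not entry.is_dir():
            continue
        try:
            stamp = datetime.strptime(entry.name, RUN_DIR_FORMAT)
        except ValueError:
            continue  # not a timestamp-named directory
        runs.append((stamp, entry.name))
    return max(runs)[1] if runs else None
```

The name returned by `latest_run_dir` could then be passed to -o. The helper does not check whether that directory belongs to an unfinished run, so Caution 2 still applies.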

Arguments

  • Basic Parameters
--genus, -g : list of genera to find, or a file with one genus per line
--accession, -c : list of accessions to get, or a file with one accession per line
--email, -e : your email address for NCBI access
  • Optional Parameters
--additional, -a : additional terms (e.g. a country name) to filter by
--max, -m : maximum length of the sequence to parse (default: 5000)
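
When GenMine is driven from another script, the options above can be assembled programmatically. The following is a minimal sketch that uses only the flags documented here; `build_genmine_command` is a hypothetical helper, not a GenMine API, and "you@example.org" is a placeholder address. The source does not say whether -g and -c may be combined in one run or whether -a accepts several terms, so the sketch passes a single -a term and leaves the first question open.

```python
import shlex


def build_genmine_command(email, genera=None, accessions=None,
                          additional=None, max_length=None):
    """Assemble a GenMine argument list from the documented options."""
    if not genera and not accessions:
        raise ValueError("give at least one genus or accession (or a file)")
    cmd = ["GenMine", "-e", email]
    if genera:
        cmd += ["-g", *genera]          # genus names, or a single file name
    if accessions:
        cmd += ["-c", *accessions]      # accessions, or a single file name
    if additional:
        cmd += ["-a", additional]       # one filter term, e.g. a country name
    if max_length is not None:
        cmd += ["-m", str(max_length)]  # omitted: GenMine's default applies
    return cmd


cmd = build_genmine_command("you@example.org", genera=["Penicillium"],
                            additional="Korea")
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` on a machine where GenMine is installed.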

Output explanations

Main output

WIP

Features

GenMine is a Python program, built on the Entrez library, that parses records from GenBank and sorts them by gene name. Compared with Entrez, GenMine has some advantages and disadvantages.

Advantages

  • GenMine does not miss records, especially when searching with multiple terms
  • GenMine can resume interrupted downloads, which is especially useful on an unreliable internet connection
  • GenMine classifies downloaded records by gene type (ITS, LSU, SSU, BenA, etc.)
  • If you want more gene types, open an issue!
  • We are currently working on better gene annotations
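
To make the idea of classifying records by gene type concrete, here is a toy sketch. It is not GenMine's classifier, whose rules are not described in this document: the keyword table and the `group_by_gene` function are assumptions made purely for illustration, and they match on plain record descriptions rather than on parsed GenBank features.

```python
from collections import defaultdict

# Hypothetical keyword table; GenMine's real rules are not documented here.
GENE_KEYWORDS = {
    "ITS": ("internal transcribed spacer",),
    "LSU": ("large subunit ribosomal rna", "28s"),
    "SSU": ("small subunit ribosomal rna", "18s"),
    "BenA": ("beta-tubulin",),
}


def group_by_gene(descriptions):
    """Map each gene type to the record descriptions that mention it."""
    groups = defaultdict(list)
    for text in descriptions:
        lowered = text.lower()
        matched = [gene for gene, words in GENE_KEYWORDS.items()
                   if any(word in lowered for word in words)]
        # A description can mention several regions; unmatched ones are kept too.
        for gene in matched or ["unclassified"]:
            groups[gene].append(text)
    return dict(groups)
```

A description that names several regions lands in every matching group, and anything unmatched is collected under "unclassified" so that no record is silently dropped.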

Limitations

  • Slower than Entrez (sometimes much slower), a trade-off for completeness and stability

Bug reports and Suggestions

  • Bug reports and suggestions can be submitted through GitHub Issues or sent directly to wan101010@snu.ac.kr
  • However, we want GenMine to remain a small tool. Suggestions that go beyond the purpose of GenMine may be adopted in our upcoming software instead
