Skip to main content

Interface to WormBase (www.wormbase.org) curation data, including literature management and NLP functions

Project description

WBtools

Interface to WormBase curation database and Text Mining functions

Access WormBase paper corpus information by loading pdf files (converted to txt) and curation info from the WormBase database. The package also exposes text mining functions on papers' fulltext.

Installation

pip install wbtools

Usage example

Get sentences from a WormBase paper

from wbtools.literature.corpus import CorpusManager

paper_id = "00050564"
cm = CorpusManager()
cm.load_from_wb_database(db_name="wb_dbname", db_user="wb_dbuser", db_password="wb_dbpasswd", db_host="wb_dbhost",
                         paper_ids=[paper_id], file_server_host="file_server_base_url", file_server_user="username", 
                         file_server_passwd="password")
sentences = cm.get_paper(paper_id).get_text_docs(split_sentences=True)

Get the latest papers (up to 50) added to WormBase or modified in the last 30 days

from wbtools.literature.corpus import CorpusManager
import datetime

one_month_ago = (datetime.datetime.now() - datetime.timedelta(days=30)).strftime("%M/%D/%Y")

cm = CorpusManager()
cm.load_from_wb_database(db_name="wb_dbname", db_user="wb_dbuser", db_password="wb_dbpasswd", db_host="wb_dbhost",
                         from_date=one_month_ago, max_num_papers=50, 
                         file_server_host="file_server_base_url", file_server_user="username", 
                         file_server_passwd="password")
paper_ids = [paper.paper_id for paper in cm.get_all_papers()]

Get the latest 50 papers added to WormBase or modified that have a final pdf version and have been flagged by WB paper classification pipeline, excluding reviews and papers with temp files only (proofs)

from wbtools.literature.corpus import CorpusManager
import datetime

cm = CorpusManager()
cm.load_from_wb_database(db_name="wb_dbname", db_user="wb_dbuser", db_password="wb_dbpasswd", db_host="wb_dbhost",
                         max_num_papers=50, must_be_autclass_flagged=True, exclude_pap_types=['Review'], 
                         exclude_temp_pdf=True, file_server_host="file_server_base_url", 
                         file_server_user="username", file_server_passwd="password")
paper_ids = [paper.paper_id for paper in cm.get_all_papers()]

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

wbtools-3.3.0.tar.gz (37.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

wbtools-3.3.0-py3-none-any.whl (45.1 kB view details)

Uploaded Python 3

File details

Details for the file wbtools-3.3.0.tar.gz.

File metadata

  • Download URL: wbtools-3.3.0.tar.gz
  • Upload date:
  • Size: 37.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.16

File hashes

Hashes for wbtools-3.3.0.tar.gz
Algorithm Hash digest
SHA256 b04a88a3104d66c1fefc852a82c126a521b4a4a8bfe26cbab6a1ef0ae15f6042
MD5 b93a99aab3ab9ca9384600de3c5fa64e
BLAKE2b-256 c9bf35c84909c3869fccb4e3b92b31ff56911cc517d46cfddc30969d8366cd81

See more details on using hashes here.

File details

Details for the file wbtools-3.3.0-py3-none-any.whl.

File metadata

  • Download URL: wbtools-3.3.0-py3-none-any.whl
  • Upload date:
  • Size: 45.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.16

File hashes

Hashes for wbtools-3.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2db9a1bbe26d13f6a8cad32fbb5d09614da3af815c4cb873bf98bdc6640ad1a2
MD5 3d6ae6d14a0c7b9e2ec38432a447e832
BLAKE2b-256 3281f415dfef95fd9785f8e579741403660f49fc9c6bfc5035296eb6fcf5735f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page