Interface to WormBase (www.wormbase.org) curation data, including literature management and NLP functions
WBtools
Interface to WormBase curation database and Text Mining functions
Access WormBase paper corpus information by loading PDF files (converted to text) and curation data from the WormBase database. The package also exposes text mining functions that operate on the papers' full text.
Installation
pip install wbtools
Usage example
Get sentences from a WormBase paper
from wbtools.literature.corpus import CorpusManager
paper_id = "00050564"
cm = CorpusManager()
cm.load_from_wb_database(db_name="wb_dbname", db_user="wb_dbuser", db_password="wb_dbpasswd", db_host="wb_dbhost",
                         paper_ids=[paper_id], file_server_host="file_server_base_url", file_server_user="username",
                         file_server_passwd="password")
sentences = cm.get_paper(paper_id).get_text_docs(split_sentences=True)
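Once the sentences are loaded, a common next step is to keep only those that mention a term of interest. The following is a minimal sketch, assuming `get_text_docs(split_sentences=True)` returns a list of plain strings; the `filter_sentences` helper and the sample sentences are hypothetical and not part of wbtools.

```python
import re
from typing import Iterable, List


def filter_sentences(sentences: Iterable[str], keyword: str) -> List[str]:
    """Return the sentences that contain `keyword` as a whole word (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [sentence for sentence in sentences if pattern.search(sentence)]


# Stand-in for the list assumed to be returned by get_text_docs(split_sentences=True)
sample_sentences = [
    "The daf-16 mutant shows extended lifespan.",
    "Worms were grown at 20 degrees.",
    "Expression of DAF-16 was increased.",
]
matches = filter_sentences(sample_sentences, "daf-16")
```

With real data, `sample_sentences` would be replaced by the `sentences` variable from the example above.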
Get the latest papers (up to 50) added to WormBase or modified in the last 30 days
from wbtools.literature.corpus import CorpusManager
import datetime
# %m and %d are month and day; %M would be minutes and %D is not a portable directive
one_month_ago = (datetime.datetime.now() - datetime.timedelta(days=30)).strftime("%m/%d/%Y")
cm = CorpusManager()
cm.load_from_wb_database(db_name="wb_dbname", db_user="wb_dbuser", db_password="wb_dbpasswd", db_host="wb_dbhost",
                         from_date=one_month_ago, max_num_papers=50,
                         file_server_host="file_server_base_url", file_server_user="username",
                         file_server_passwd="password")
paper_ids = [paper.paper_id for paper in cm.get_all_papers()]
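The `from_date` cutoff in this example is meant to be a month/day/year string. As a small illustration of the date arithmetic, the sketch below builds such a string from a fixed reference date so the result is reproducible. `cutoff_date` is a hypothetical helper, and the `MM/DD/YYYY` layout is an assumption about what `load_from_wb_database` expects. Note that in `strftime`, `%m` and `%d` are the zero-padded month and day, whereas `%M` is minutes.

```python
import datetime


def cutoff_date(days_back: int, today: datetime.date) -> str:
    """Format the date `days_back` days before `today` as MM/DD/YYYY (assumed layout)."""
    cutoff = today - datetime.timedelta(days=days_back)
    return cutoff.strftime("%m/%d/%Y")


# Fixed reference date instead of datetime.date.today(), so the output is deterministic
example = cutoff_date(30, datetime.date(2023, 3, 15))
print(example)  # 02/13/2023
```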
Get the latest 50 papers added to or modified in WormBase that have a final PDF version and have been flagged by the WB paper classification pipeline, excluding reviews and papers with temporary files only (proofs)
from wbtools.literature.corpus import CorpusManager
cm = CorpusManager()
cm.load_from_wb_database(db_name="wb_dbname", db_user="wb_dbuser", db_password="wb_dbpasswd", db_host="wb_dbhost",
                         max_num_papers=50, must_be_autclass_flagged=True, exclude_pap_types=['Review'],
                         exclude_temp_pdf=True, file_server_host="file_server_base_url",
                         file_server_user="username", file_server_passwd="password")
paper_ids = [paper.paper_id for paper in cm.get_all_papers()]
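All of the usage examples pass the database and file server credentials as literal strings. One way to avoid hard-coding them is to read them from environment variables and unpack the result into `load_from_wb_database`. This is only a sketch: the `WB_*` variable names and the `connection_kwargs` helper are hypothetical, and only the keyword argument names are taken from the calls in the examples.

```python
import os
from typing import Dict, Mapping

# Keyword argument name (as used in the usage examples) -> hypothetical environment variable
ENV_VARS = {
    "db_name": "WB_DB_NAME",
    "db_user": "WB_DB_USER",
    "db_password": "WB_DB_PASSWORD",
    "db_host": "WB_DB_HOST",
    "file_server_host": "WB_FILE_SERVER_HOST",
    "file_server_user": "WB_FILE_SERVER_USER",
    "file_server_passwd": "WB_FILE_SERVER_PASSWD",
}


def connection_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect connection settings from `env`, failing early if any variable is unset."""
    missing = [var for var in ENV_VARS.values() if var not in env]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {kwarg: env[var] for kwarg, var in ENV_VARS.items()}


# Demonstration with a fake environment instead of real credentials
fake_env = {var: "value_for_" + var for var in ENV_VARS.values()}
kwargs = connection_kwargs(fake_env)
```

Under these assumptions, a call would look like `cm.load_from_wb_database(max_num_papers=50, **connection_kwargs())`.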