scibite-toolkit - Python library for calling SciBite applications: TERMite, TExpress, SciBite Search, CENtree and Workbench. The library also enables post-processing of the JSON results returned from such requests.
Install
$ pip3 install scibite_toolkit
All released versions are listed on PyPI.
Example call to TERMite
In this example call to TERMite, we will annotate one zip file from MEDLINE and then process the output into a dataframe with the toolkit's built-in functions.
We will use the first zip file from PubMed's Annual Baseline files.
Two example scripts are shown: one that authenticates with a SciBite-hosted instance of TERMite and one that authenticates with a local instance of TERMite (hosted by the customer).
Please note: you can test with any file. If you would like to test with plain text instead of a file, use t.set_text('your text') and omit the t.set_binary_content call.
Example 1 - SciBite Hosted instance of TERMite
import pandas as pd
from scibite_toolkit import termite
# Initialize your TERMite Request
t = termite.TermiteRequestBuilder()
# Specify your TERMite API Endpoint and login URL
t.set_url('url_endpoint')
t.set_saas_login_url('login_url')
# Authenticate with the instance
username = 'username'
password = 'password'
t.set_auth_saas(username, password)
# Set your runtime options
t.set_entities('INDICATION') # comma separated list of VOCabs you want to run over your data
t.set_input_format('medline.xml') # the input format of the data sent to TERMite
t.set_output_format('json') # the output format of the response from TERMite
t.set_binary_content('path/to/file') # the file path of the file you want to annotate
t.set_subsume(True) # set subsume run time option (RTO) to true
# Execute the request and convert response to dataframe for easy analysis
termite_response = t.execute()
resp_df = termite.get_termite_dataframe(termite_response)
print(resp_df.head(3))
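Once the response is in a dataframe, ordinary pandas operations apply. As a sketch, here is how you might rank entity hits; the toy dataframe below stands in for the output of get_termite_dataframe, and the column names (name, hitCount) are assumptions for illustration, not a guaranteed schema:

```python
import pandas as pd

# Toy stand-in for the dataframe returned by get_termite_dataframe;
# real column names may differ.
resp_df = pd.DataFrame({
    "entityType": ["INDICATION", "INDICATION", "INDICATION"],
    "name": ["asthma", "psoriasis", "asthma"],
    "hitCount": [3, 1, 2],
})

# Total hits per entity name, most frequent first
top_hits = (resp_df.groupby("name")["hitCount"]
            .sum()
            .sort_values(ascending=False))
print(top_hits.head())
```

Inspect `resp_df.columns` on a real response first, then substitute the actual annotation and count columns into the groupby.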
Example 2 - Local Instance of TERMite (Hosted by Customer)
import pandas as pd
from scibite_toolkit import termite
# Initialize your TERMite Request
t = termite.TermiteRequestBuilder()
# Specify your TERMite API Endpoint and login URL
t.set_url('url_endpoint')
# Authenticate with the instance
username = 'username'
password = 'password'
t.set_basic_auth(username, password)
# Set your runtime options
t.set_entities('INDICATION') # comma separated list of VOCabs you want to run over your data
t.set_input_format('medline.xml') # the input format of the data sent to TERMite
t.set_output_format('json') # the output format of the response from TERMite
t.set_binary_content('path/to/file') # the file path of the file you want to annotate
t.set_subsume(True) # set subsume run time option (RTO) to true
# Execute the request and convert response to dataframe for easy analysis
termite_response = t.execute()
resp_df = termite.get_termite_dataframe(termite_response)
print(resp_df.head(3))
Example call to TExpress
In this example call to TExpress, we will annotate one zip file from MEDLINE and then process the output into a dataframe with the toolkit's built-in functions.
We will use the first zip file from PubMed's Annual Baseline files.
Two example scripts are shown: one that authenticates with a SciBite-hosted instance of TExpress and one that authenticates with a local instance of TExpress (hosted by the customer).
Please note: you can test with any file. If you would like to test with plain text instead of a file, use t.set_text('your text') and omit the t.set_binary_content call.
Example 1 - SciBite Hosted Instance of TExpress
import pandas as pd
from scibite_toolkit import texpress
# Initialize your TExpress Request
t = texpress.TexpressRequestBuilder()
# Specify your TExpress API Endpoint and login URL
t.set_url('url_endpoint')
t.set_saas_login_url('login_url')
# Authenticate with the instance
username = 'username'
password = 'password'
t.set_auth_saas(username, password)
# Set your runtime options
t.set_entities('INDICATION') # comma separated list of VOCabs you want to run over your data
t.set_input_format('medline.xml') # the input format of the data sent to TExpress
t.set_output_format('json') # the output format of the response from TExpress
t.set_binary_content('path/to/file') # the file path of the file you want to annotate
t.set_subsume(True) # set subsume run time option (RTO) to true
t.set_pattern(':(INDICATION):{0,5}:(INDICATION)') # pattern to tell TExpress what to look for within data
# Execute the request and convert response to dataframe for easy analysis
texpress_resp = t.execute()
resp_df = texpress.get_texpress_dataframe(texpress_resp)
print(resp_df.head(3))
Example 2 - Local Instance of TExpress (Hosted by Customer)
import pandas as pd
from scibite_toolkit import texpress
# Initialize your TExpress Request
t = texpress.TexpressRequestBuilder()
# Specify your TExpress API Endpoint
t.set_url('url_endpoint')
# Authenticate with the instance
username = 'username'
password = 'password'
t.set_basic_auth(username, password)
# Set your runtime options
t.set_entities('INDICATION') # comma separated list of VOCabs you want to run over your data
t.set_input_format('medline.xml') # the input format of the data sent to TExpress
t.set_output_format('json') # the output format of the response from TExpress
t.set_binary_content('/path/to/file') # the file path of the file you want to annotate
t.set_subsume(True) # set subsume run time option (RTO) to true
t.set_pattern(':(INDICATION):{0,5}:(INDICATION)') # pattern to tell TExpress what to look for within data
# Execute the request and convert response to dataframe for easy analysis
texpress_resp = t.execute()
resp_df = texpress.get_texpress_dataframe(texpress_resp)
print(resp_df.head(3))
Example call to SciBite Search
from scibite_toolkit import scibite_search
# First authenticate - The examples provided are assuming our SaaS-hosted instances, adapt accordingly
ss_home = 'https://yourdomain-search.saas.scibite.com/'
sbs_auth_url = "https://yourdomain.saas.scibite.com/"
client_id = "yourclientid"
client_secret ="yourclientsecret"
s = scibite_search.SBSRequestBuilder()
s.set_url(ss_home)
s.set_auth_url(sbs_auth_url)
s.set_oauth2(client_id, client_secret) # Authentication will last according to what was set up when generating the client
# Now you can use the request object
# Search over documents
sample_query = 'schema_id="clinical_trial" AND (title~INDICATION$D011565 AND DRUG$*)'
# Note that this endpoint is capped at 100 results, but you can paginate using the offset parameter
response = s.get_docs(query=sample_query, markup=True, limit=100)
# Co-occurrence search across sentences
# Get the top 50 co-occurrence sentence aggregates for the psoriasis indication and any gene
response = s.get_aggregates(query='INDICATION$D011565', vocabs=['HGNCGENE'], limit=50)
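Since the document endpoint is capped at 100 results per call and paginated via an offset parameter, a small generic helper can walk through all pages. This is a sketch, not part of the toolkit: the `fetch` callable is a stand-in you would wrap around `s.get_docs(query=..., limit=..., offset=...)`, assuming `get_docs` accepts an `offset` keyword as noted above.

```python
def paginate(fetch, page_size=100):
    """Yield every result from a paged endpoint capped at `page_size`
    results per call. `fetch` is any callable taking limit/offset and
    returning a list of results (empty when exhausted)."""
    offset = 0
    while True:
        page = fetch(limit=page_size, offset=offset)
        if not page:
            break
        yield from page
        if len(page) < page_size:
            break  # short page means we reached the end
        offset += page_size

# Demo with a stub standing in for the search endpoint
data = list(range(250))
fetched = list(paginate(lambda limit, offset: data[offset:offset + limit],
                        page_size=100))
print(len(fetched))  # 250
```

The same helper works for any of the capped endpoints, since it only depends on the limit/offset convention.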
Example call to Workbench
from scibite_toolkit import workbench
# First authenticate with the instance
username = 'username'
password = 'password'
client_id = 'client_id'
wb = workbench.WorkbenchRequestBuilder()
url = 'https://workbench-url.com'
wb.set_url(url)
wb.set_oauth2(client_id, username, password)
# Then set up your call - here we will create a WB dataset, upload a file to it and annotate it
wb.set_dataset_name('My Test Dataset')
wb.set_dataset_desc('My Test Description')
wb.create_dataset()
wb.set_file_input('path/to/file.xlsx')
wb.upload_file_to_dataset()
# In this example, we will only annotate two columns with pre-selected VOCabs.
# If you would like WB to annotate the dataset without setting a TERMite config, just call auto_annotate_dataset
vocabs = [[5,6],[8,9]]
attrs = [200,201]
wb.set_termite_config('',vocabs,attrs)
wb.auto_annotate_dataset()
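The `vocabs` and `attrs` arguments above are plain Python lists: one list of VOCab ids per annotated column, plus a flat list of attribute ids. If you prefer to keep the column-to-VOCab mapping explicit, a small helper like the hypothetical one below (not part of the toolkit; the name and mapping shape are illustrative assumptions) can build both arguments:

```python
def build_termite_config(column_vocabs, attrs):
    """Build the vocabs/attrs arguments for set_termite_config.

    column_vocabs: dict mapping a column index to the list of VOCab ids
    to annotate that column with. Returns one list of VOCab ids per
    column (in ascending column order) and the attribute id list.
    """
    vocabs = [ids for _, ids in sorted(column_vocabs.items())]
    return vocabs, list(attrs)

vocabs, attrs = build_termite_config({2: [5, 6], 4: [8, 9]}, [200, 201])
print(vocabs)  # [[5, 6], [8, 9]]
print(attrs)   # [200, 201]
```

Keeping the mapping as a dict makes it easier to see which column each VOCab list belongs to before flattening it into the positional form the call expects.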
License
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.