scibite-toolkit - python library for calling TERMite, TExpress and other tools, and processing results
Project Description
scibite-toolkit - python library for making calls to SciBite's TERMite, TExpress and SciBite Search. The library also enables post-processing of the JSON returned from such requests.
Install
$ pip3 install termite_toolkit
Available versions are listed on PyPI.
Example call to TERMite
In this example call to TERMite, we will annotate one zip file from MEDLINE and then process the output into a dataframe with the toolkit's built-in functions.
We will use the first zip file from PubMed's Annual Baseline files.
Two example scripts are shown: one that authenticates with a SciBite-hosted instance of TERMite and one that authenticates with a local instance of TERMite (hosted by the customer).
Please note the following:
You can test with any file. If you would like to test with plain text instead of a file, use t.set_text('your text') and do not call t.set_binary_content.
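Because text input and file input should not be mixed on one request, the choice can be wrapped in a small helper. This is a sketch only: `set_input` is a hypothetical helper, not part of termite_toolkit. It calls nothing but the `set_text` and `set_binary_content` methods shown in this README, and assumes each takes a single string argument.

```python
def set_input(builder, text=None, path=None):
    """Hypothetical helper: give a request builder exactly one kind of input.

    Calls builder.set_text(text) for plain text, or
    builder.set_binary_content(path) for a file, never both.
    """
    if (text is None) == (path is None):
        # Either both or neither were supplied.
        raise ValueError("provide exactly one of text or path")
    if text is not None:
        builder.set_text(text)
    else:
        builder.set_binary_content(path)
    return builder
```

For example, `set_input(t, text='Psoriasis is a chronic skin disease.')` would replace the `t.set_binary_content(...)` line in the scripts below.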
Example 1 - SciBite-Hosted Instance of TERMite
import pandas as pd
from termite_toolkit import termite
# Initialize your TERMite Request
t = termite.TermiteRequestBuilder()
# Specify your TERMite API Endpoint and login URL
t.set_url('url_endpoint')
t.set_saas_login_url('login_url')
# Authenticate with the instance
username = 'username'
password = 'password'
t.set_auth_saas(username, password)
# Set your runtime options
t.set_entities('INDICATION') # comma separated list of VOCabs you want to run over your data
t.set_input_format('medline.xml') # the input format of the data sent to TERMite
t.set_output_format('json') # the output format of the response from TERMite
t.set_binary_content('path/to/file') # the file path of the file you want to annotate
t.set_subsume(True) # set subsume run time option (RTO) to true
# Execute the request and convert response to dataframe for easy analysis
termite_response = t.execute()
resp_df = termite.get_termite_dataframe(termite_response)
print(resp_df.head(3))
Example 2 - Local Instance of TERMite (Hosted by Customer)
import pandas as pd
from termite_toolkit import termite
# Initialize your TERMite Request
t = termite.TermiteRequestBuilder()
# Specify your TERMite API Endpoint and login URL
t.set_url('url_endpoint')
# Authenticate with the instance
username = 'username'
password = 'password'
t.set_basic_auth(username, password)
# Set your runtime options
t.set_entities('INDICATION') # comma separated list of VOCabs you want to run over your data
t.set_input_format('medline.xml') # the input format of the data sent to TERMite
t.set_output_format('json') # the output format of the response from TERMite
t.set_binary_content('path/to/file') # the file path of the file you want to annotate
t.set_subsume(True) # set subsume run time option (RTO) to true
# Execute the request and convert response to dataframe for easy analysis
termite_response = t.execute()
resp_df = termite.get_termite_dataframe(termite_response)
print(resp_df.head(3))
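The two TERMite examples differ only in how they authenticate: `set_saas_login_url` plus `set_auth_saas` for a SciBite-hosted instance, `set_basic_auth` for a customer-hosted one. A minimal sketch that picks between them and keeps credentials out of the script, assuming they are stored in environment variables; `configure_auth` and the variable names `TERMITE_USERNAME` / `TERMITE_PASSWORD` are hypothetical, not part of the toolkit.

```python
import os


def configure_auth(builder, saas_login_url=None, env=os.environ):
    """Hypothetical helper: read credentials from the environment and authenticate.

    TERMITE_USERNAME and TERMITE_PASSWORD are illustrative variable names.
    With saas_login_url, use the SciBite-hosted (SaaS) flow; otherwise use
    basic auth for a customer-hosted instance.
    """
    try:
        username = env["TERMITE_USERNAME"]
        password = env["TERMITE_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"missing environment variable {missing}") from None
    if saas_login_url:
        builder.set_saas_login_url(saas_login_url)
        builder.set_auth_saas(username, password)
    else:
        builder.set_basic_auth(username, password)
    return builder
```

With this in place, `configure_auth(t, saas_login_url='login_url')` would stand in for the authentication lines of Example 1, and `configure_auth(t)` for those of Example 2.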
Example call to TExpress
In this example call to TExpress, we will annotate one zip file from MEDLINE and then process the output into a dataframe with the toolkit's built-in functions.
We will use the first zip file from PubMed's Annual Baseline files.
Two example scripts are shown: one that authenticates with a SciBite-hosted instance of TExpress and one that authenticates with a local instance of TExpress (hosted by the customer).
Please note the following:
You can test with any file. If you would like to test with plain text instead of a file, use t.set_text('your text') and do not call t.set_binary_content.
Example 1 - SciBite Hosted Instance of TExpress
import pandas as pd
from termite_toolkit import texpress
# Initialize your TExpress Request
t = texpress.TexpressRequestBuilder()
# Specify your TExpress API Endpoint and login URL
t.set_url('url_endpoint')
t.set_saas_login_url('login_url')
# Authenticate with the instance
username = 'username'
password = 'password'
t.set_auth_saas(username, password)
# Set your runtime options
t.set_entities('INDICATION') # comma separated list of VOCabs you want to run over your data
t.set_input_format('medline.xml') # the input format of the data sent to TExpress
t.set_output_format('json') # the output format of the response from TExpress
t.set_binary_content('path/to/file') # the file path of the file you want to annotate
t.set_subsume(True) # set subsume run time option (RTO) to true
t.set_pattern(':(INDICATION):{0,5}:(INDICATION)') # pattern to tell TExpress what to look for within data
# Execute the request and convert response to dataframe for easy analysis
texpress_resp = t.execute()
resp_df = texpress.get_texpress_dataframe(texpress_resp)
print(resp_df.head(3))
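The pattern above appears to follow the shape `:(VOCAB):{min,max}:(VOCAB)`, which, as we read it, matches two entities from the given VOCabs separated by between `min` and `max` intervening words. The hypothetical helper below only composes strings of that shape; consult the TExpress documentation for the full pattern syntax before relying on it.

```python
def cooccurrence_pattern(left_vocab, right_vocab, max_gap=5, min_gap=0):
    """Hypothetical helper: build ':(LEFT):{min,max}:(RIGHT)', mirroring the
    example TExpress pattern in this README."""
    if not 0 <= min_gap <= max_gap:
        raise ValueError("require 0 <= min_gap <= max_gap")
    # Doubled braces in the f-string emit literal '{' and '}'.
    return f":({left_vocab}):{{{min_gap},{max_gap}}}:({right_vocab})"
```

`cooccurrence_pattern('INDICATION', 'INDICATION')` reproduces the pattern used above. To look for a gene near an indication instead, both VOCabs would presumably need to be requested, e.g. `t.set_entities('INDICATION,HGNCGENE')` followed by `t.set_pattern(cooccurrence_pattern('INDICATION', 'HGNCGENE'))`.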
Example 2 - Local Instance of TExpress (Hosted by Customer)
import pandas as pd
from termite_toolkit import texpress
# Initialize your TExpress Request
t = texpress.TexpressRequestBuilder()
# Specify your TExpress API Endpoint
t.set_url('url_endpoint')
# Authenticate with the instance
username = 'username'
password = 'password'
t.set_basic_auth(username, password)
# Set your runtime options
t.set_entities('INDICATION') # comma separated list of VOCabs you want to run over your data
t.set_input_format('medline.xml') # the input format of the data sent to TExpress
t.set_output_format('json') # the output format of the response from TExpress; JSON is what get_texpress_dataframe processes
t.set_binary_content('path/to/file') # the file path of the file you want to annotate
t.set_subsume(True) # set subsume run time option (RTO) to true
t.set_pattern(':(INDICATION):{0,5}:(INDICATION)') # pattern to tell TExpress what to look for within data
# Execute the request and convert response to dataframe for easy analysis
texpress_resp = t.execute()
resp_df = texpress.get_texpress_dataframe(texpress_resp)
print(resp_df.head(3))
Example call to SciBite Search
from termite_toolkit import scibite_search
# First authenticate. These examples assume our SaaS-hosted instances; adapt them to your deployment
ss_home = 'https://yourdomain-search.saas.scibite.com/'
sbs_auth_url = "https://yourdomain.saas.scibite.com/"
client_id = "yourclientid"
client_secret = "yourclientsecret"
s = scibite_search.SBSRequestBuilder()
s.set_url(ss_home)
s.set_auth_url(sbs_auth_url)
s.set_oauth2(client_id, client_secret) # how long the authentication lasts depends on how the client was configured
# Now you can use the request object
# Search over documents
sample_query = 'schema_id="clinical_trial" AND (title~INDICATION$D011565 AND DRUG$*)'
# Note that the endpoint is capped at 100 results per request, but you can paginate using the offset parameter
response = s.get_docs(query=sample_query,markup=True,limit=100)
# Co-occurrence search across sentences
# Get the top 50 co-occurrence sentence aggregates for the psoriasis indication and any gene
response = s.get_aggregates(query='INDICATION$D011565',vocabs=['HGNCGENE'],limit=50)
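Because `get_docs` returns at most 100 results per call, larger result sets have to be collected page by page. A minimal sketch, assuming `get_docs` accepts an `offset` keyword as the comment above indicates. The shape of its response is not documented here, so `fetch_all` (a hypothetical helper) takes a callable that returns a plain list of results for a given offset and limit.

```python
def fetch_all(fetch_page, page_size=100, max_results=None):
    """Collect results page by page until a short or empty page comes back.

    fetch_page(offset, limit) must return a list of results. page_size should
    not exceed the endpoint's cap (100 for get_docs, per this README).
    """
    results = []
    offset = 0
    while max_results is None or len(results) < max_results:
        remaining = None if max_results is None else max_results - len(results)
        limit = page_size if remaining is None else min(page_size, remaining)
        page = fetch_page(offset, limit)
        results.extend(page)
        if len(page) < limit:
            break  # a short page means there is nothing left to fetch
        offset += len(page)
    return results


# Hypothetical usage: extract_docs stands in for whatever pulls the list of
# documents out of a get_docs response.
# docs = fetch_all(lambda offset, limit: extract_docs(
#     s.get_docs(query=sample_query, markup=True, limit=limit, offset=offset)))
```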
License
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Built Distribution
Hashes for termite_toolkit-0.6.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 1c87fbabdf65b8f56c32341d6c204346b811302c5e95802d87754b3a8b10c620
MD5 | b163f36993d6976554fee11c25dee1de
BLAKE2b-256 | 979d23b998fc432e28b7780fe10de391f3a5a4df0dab9327d7b088b937b63416
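To check a manually downloaded wheel against the SHA256 digest listed above, the standard-library `hashlib` module is enough. A minimal sketch; the path you pass in is wherever you saved the file, and `verify_wheel` is a hypothetical name.

```python
import hashlib

# SHA256 digest published for termite_toolkit-0.6.0-py3-none-any.whl
EXPECTED_SHA256 = "1c87fbabdf65b8f56c32341d6c204346b811302c5e95802d87754b3a8b10c620"


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path, expected=EXPECTED_SHA256):
    """True if the file at `path` matches the expected SHA256 digest."""
    return sha256_of(path) == expected.lower()
```

When installing with pip, the same check can be enforced automatically by pinning the package in a requirements file with a `--hash=sha256:...` option and installing with `--require-hashes`.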