Skip to main content

scibite-toolkit - python library for calling TERMite, TExpress and other tools, and processing results

Project description

Project Description

scibite-toolkit - python library for making calls to SciBite's TERMite, TExpress and SciBite Search. The library also enables post-processing of the JSON returned from such requests.

Install

$ pip3 install termite_toolkit

Versions listed on PyPi!

Example call to TERMite

In this example call to TERMite, we will annotate one zip file from MEDLINE and then process the output to a dataframe with the built in functions of the toolkit.

We will use the first zip file from PubMed's Annual Baseline files.

Two example scripts will be shown - one that authenticates with a SciBite hosted instance of TERMite and one that hosts with a local instance of TERMite (hosted by customer).

*Please note the following:

you can test with any file. If you would like to test with just text (and not a file), please use "t.set_text('your text') and don't use the t.set_binary_content command.

Example 1 - SciBite Hosted instance of TERMite

import pandas as pd
from termite_toolkit import termite

# Initialize your TERMite Request
t = termite.TermiteRequestBuilder()

# Specify your TERMite API Endpoint and login URL
t.set_url('url_endpoint')
t.set_saas_login_url('login_url')

# Authenticate with the instance
username = 'username
password = 'password'
t.set_auth_saas(username, password)

# Set your runtime options
t.set_entities('INDICATION')  # comma separated list of VOCabs you want to run over your data
t.set_input_format('medline.xml')  # the input format of the data sent to TERMite
t.set_output_format('json')  # the output format of the response from TERMite
t.set_binary_content('path/to/file')  # the file path of the file you want to annotate
t.set_subsume(True)  # set subsume run time option (RTO) to true

# Execute the request and convert response to dataframe for easy analysis
termite_response = t.execute()
resp_df = termite.get_termite_dataframe(termite_response)
print(resp_df.head(3))

Example 2 - Local Instance of TERMite (Hosted by Customer)

import pandas as pd
from termite_toolkit import termite

# Initialize your TERMite Request
t = termite.TermiteRequestBuilder()

# Specify your TERMite API Endpoint and login URL
t.set_url('url_endpoint')

# Authenticate with the instance
username = 'username'
password = 'password^'
t.set_basic_auth(username, password)

# Set your runtime options
t.set_entities('INDICATION')  # comma separated list of VOCabs you want to run over your data
t.set_input_format('medline.xml')  # the input format of the data sent to TERMite
t.set_output_format('json')  # the output format of the response from TERMite
t.set_binary_content('path/to/file')  # the file path of the file you want to annotate
t.set_subsume(True)  # set subsume run time option (RTO) to true

# Execute the request and convert response to dataframe for easy analysis
termite_response = t.execute()
resp_df = termite.get_termite_dataframe(termite_response)
print(resp_df.head(3))

Example call to TExpress

In this example call to TExpress, we will annotate one zip file from Medline and then process the output to a dataframe with the built in functions of the toolkit.

We will use the first zip file from PubMed's Annual Baseline files.

Two example scripts will be shown - one that authenticates with a SciBite hosted instance of TExpress and one that authenticates with a local instance of TExpress (hosted by the customer).

Please note the following:

you can test with any file. If you would like to test with just text (and not a file), please use "t.set_text('your text') and don't use the t.set_binary_content command.

Example 1 - SciBite Hosted Instance of TExpress

import pandas as pd
from termite_toolkit import texpress

# Initialize your TERMite Request
t = texpress.TexpressRequestBuilder()

# Specify your TERMite API Endpoint and login URL
t.set_url('url_endpoint')
t.set_saas_login_url('login_url')

# Authenticate with the instance
username = 'username'
password = 'password'
t.set_auth_saas(username, password)

# Set your runtime options
t.set_entities('INDICATION')  # comma separated list of VOCabs you want to run over your data
t.set_input_format('medline.xml')  # the input format of the data sent to TERMite
t.set_output_format('json')  # the output format of the response from TERMite
t.set_binary_content('path/to/file')  # the file path of the file you want to annotate
t.set_subsume(True)  # set subsume run time option (RTO) to true
t.set_pattern(':(INDICATION):{0,5}:(INDICATION)')  # pattern to tell TExpress what to look for within data

# Execute the request and convert response to dataframe for easy analysis
texpress_resp = t.execute()
resp_df = texpress.get_texpress_dataframe(texpress_resp)
print(resp_df.head(3))

Example 2 - Local Instance of TExpress (Hosted by Customer)

import pandas as pd
from termite_toolkit import texpress

# Initialize your TERMite Request
t = texpress.TexpressRequestBuilder()

# Specify your TERMite API Endpoint
t.set_url('url_endpoint')

# Authenticate with the instance
username = 'username'
password = 'password'
t.set_basic_auth(username, password)

# Set your runtime options
t.set_entities('INDICATION')  # comma separated list of VOCabs you want to run over your data
t.set_input_format('pdf')  # the input format of the data sent to TERMite
t.set_output_format('medline.xml')  # the output format of the response from TERMite
t.set_binary_content('/path/to/file')  # the file path of the file you want to annotate
t.set_subsume(True)  # set subsume run time option (RTO) to true
t.set_pattern(':(INDICATION):{0,5}:(INDICATION)')  # pattern to tell TExpress what to look for within data

# Execute the request and convert response to dataframe for easy analysis
texpress_resp = t.execute()
resp_df = texpress.get_texpress_dataframe(texpress_resp)
print(resp_df.head(3))

Example call to SciBite Search

from termite_toolkit import scibite_search

# First authenticate - The examples provided are assuming our SaaS-hosted instances, adapt accordingly
ss_home = 'https://yourdomain-search.saas.scibite.com/'
sbs_auth_url = "https://yourdomain.saas.scibite.com/"
client_id = "yourclientid"
client_secret ="yourclientsecret"
s = scibite_search.SBSRequestBuilder()
s.set_url(ss_home)
s.set_auth_url(sbs_auth_url)
s.set_oauth2(client_id,client_secret) #Authentication will last according to what was setup at the client

# Now you can use the request object

# Search over documents
sample_query = 'schema_id="clinical_trial" AND (title~INDICATION$D011565 AND DRUG$*)'

# Note that endpoint is capped at 100 results, but you can paginage using the offset parameter
response = s.get_docs(query=sample_query,markup=True,limit=100)

# Co-ocurrence search across sentences
# Get the top 50 co-ocurrence sentence aggregates for psoriasis indication and any gene
response = s.get_aggregates(query='INDICATION$D011565',vocabs=['HGNCGENE'],limit=50)

Example call to Workbench

from termite_toolkit import workbench
#first authenticate with the instance
username = 'username'
password = 'password'
client_id = 'client_id'
wb = WorkbenchRequestBuilder()
url = 'https://workbench-url.com'
wb.set_oauth2(client_id, username, password)
#then set up your call - here we will be creating a WB dataset, uploading a file to it and annotating it
wb.set_dataset_name = 'My Test Dataset'
wb.set_dataset_desc = 'My Test Description'
wb.create_dataset()
wb.set_file_input('path/to/file.xlsx')
wb.upload_file_to_dataset()
#In this example, we will only annotate two columns with pre-selected VOCabs.
#If you would like to tell WB to annotate the dataset without setting a termite config, just call auto_annotate_dataset
vocabs = [[5,6],[8,9]]
attrs = [200,201]
wb.set_termite_config('',vocabs,attrs)
wb.auto_annotate_dataset()

License

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

termite_toolkit-0.6.1.tar.gz (32.1 kB view hashes)

Uploaded Source

Built Distribution

termite_toolkit-0.6.1-py3-none-any.whl (34.7 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page