
scibite-toolkit - Python library for calling SciBite applications: TERMite, TExpress, SciBite Search, CENtree and Workbench. The library also enables processing of the JSON results from such requests.

Project Description

scibite-toolkit - Python library for making calls to SciBite's TERMite, TExpress, CENtree, Workbench and SciBite Search. The library also enables post-processing of the JSON returned from such requests.

Install

$ pip3 install scibite_toolkit

Released versions are listed on PyPI.

Example call to TERMite

In this example call to TERMite, we will annotate one zip file from MEDLINE and then process the output into a dataframe with the built-in functions of the toolkit.

We will use the first zip file from PubMed's Annual Baseline files.

Two example scripts are shown - one that authenticates with a SciBite-hosted instance of TERMite and one that authenticates with a local instance of TERMite (hosted by the customer).

Please note the following:

You can test with any file. If you would like to test with plain text (not a file), use t.set_text('your text') and do not call t.set_binary_content.

Example 1 - SciBite Hosted instance of TERMite

import pandas as pd
from scibite_toolkit import termite

# Initialize your TERMite Request
t = termite.TermiteRequestBuilder()

# Specify your TERMite API Endpoint and login URL
t.set_url('url_endpoint')
t.set_saas_login_url('login_url')

# Authenticate with the instance
username = 'username'
password = 'password'
t.set_auth_saas(username, password)

# Set your runtime options
t.set_entities('INDICATION')  # comma-separated list of VOCabs to run over your data
t.set_input_format('medline.xml')  # the input format of the data sent to TERMite
t.set_output_format('json')  # the output format of the response from TERMite
t.set_binary_content('path/to/file')  # the file path of the file you want to annotate
t.set_subsume(True)  # set subsume run time option (RTO) to true

# Execute the request and convert response to dataframe for easy analysis
termite_response = t.execute()
resp_df = termite.get_termite_dataframe(termite_response)
print(resp_df.head(3))
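Under the hood, get_termite_dataframe flattens the hit objects in the TERMite JSON into one row per hit. The sketch below shows the general idea on a mock payload; the field names here (RESP_PAYLOAD, hitCount, etc.) are illustrative stand-ins, not the exact TERMite schema.

```python
import pandas as pd

# Mock payload with an illustrative shape only -- the real TERMite JSON may differ.
mock_response = {
    "RESP_PAYLOAD": {
        "INDICATION": [
            {"name": "Psoriasis", "id": "D011565", "hitCount": 3},
            {"name": "Asthma", "id": "D001249", "hitCount": 1},
        ]
    }
}

# Flatten vocab -> hit list into one row per hit, as the toolkit helper does.
rows = []
for vocab, hits in mock_response["RESP_PAYLOAD"].items():
    for hit in hits:
        rows.append(
            {"vocab": vocab, "name": hit["name"], "id": hit["id"], "hits": hit["hitCount"]}
        )

df = pd.DataFrame(rows)
print(df)
```

In practice you would pass the real execute() response to get_termite_dataframe rather than flattening by hand.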

Example 2 - Local Instance of TERMite (Hosted by Customer)

import pandas as pd
from scibite_toolkit import termite

# Initialize your TERMite Request
t = termite.TermiteRequestBuilder()

# Specify your TERMite API Endpoint and login URL
t.set_url('url_endpoint')

# Authenticate with the instance
username = 'username'
password = 'password'
t.set_basic_auth(username, password)

# Set your runtime options
t.set_entities('INDICATION')  # comma-separated list of VOCabs to run over your data
t.set_input_format('medline.xml')  # the input format of the data sent to TERMite
t.set_output_format('json')  # the output format of the response from TERMite
t.set_binary_content('path/to/file')  # the file path of the file you want to annotate
t.set_subsume(True)  # set subsume run time option (RTO) to true

# Execute the request and convert response to dataframe for easy analysis
termite_response = t.execute()
resp_df = termite.get_termite_dataframe(termite_response)
print(resp_df.head(3))

Example call to TExpress

In this example call to TExpress, we will annotate one zip file from MEDLINE and then process the output into a dataframe with the built-in functions of the toolkit.

We will use the first zip file from PubMed's Annual Baseline files.

Two example scripts will be shown - one that authenticates with a SciBite hosted instance of TExpress and one that authenticates with a local instance of TExpress (hosted by the customer).

Please note the following:

You can test with any file. If you would like to test with plain text (not a file), use t.set_text('your text') and do not call t.set_binary_content.

Example 1 - SciBite Hosted Instance of TExpress

import pandas as pd
from scibite_toolkit import texpress

# Initialize your TExpress Request
t = texpress.TexpressRequestBuilder()

# Specify your TExpress API Endpoint and login URL
t.set_url('url_endpoint')
t.set_saas_login_url('login_url')

# Authenticate with the instance
username = 'username'
password = 'password'
t.set_auth_saas(username, password)

# Set your runtime options
t.set_entities('INDICATION')  # comma-separated list of VOCabs to run over your data
t.set_input_format('medline.xml')  # the input format of the data sent to TExpress
t.set_output_format('json')  # the output format of the response from TExpress
t.set_binary_content('path/to/file')  # the file path of the file you want to annotate
t.set_subsume(True)  # set subsume run time option (RTO) to true
t.set_pattern(':(INDICATION):{0,5}:(INDICATION)')  # pattern to tell TExpress what to look for within data

# Execute the request and convert response to dataframe for easy analysis
texpress_resp = t.execute()
resp_df = texpress.get_texpress_dataframe(texpress_resp)
print(resp_df.head(3))

Example 2 - Local Instance of TExpress (Hosted by Customer)

import pandas as pd
from scibite_toolkit import texpress

# Initialize your TExpress Request
t = texpress.TexpressRequestBuilder()

# Specify your TExpress API Endpoint
t.set_url('url_endpoint')

# Authenticate with the instance
username = 'username'
password = 'password'
t.set_basic_auth(username, password)

# Set your runtime options
t.set_entities('INDICATION')  # comma-separated list of VOCabs to run over your data
t.set_input_format('medline.xml')  # the input format of the data sent to TExpress
t.set_output_format('json')  # the output format of the response from TExpress
t.set_binary_content('/path/to/file')  # the file path of the file you want to annotate
t.set_subsume(True)  # set subsume run time option (RTO) to true
t.set_pattern(':(INDICATION):{0,5}:(INDICATION)')  # pattern to tell TExpress what to look for within data

# Execute the request and convert response to dataframe for easy analysis
texpress_resp = t.execute()
resp_df = texpress.get_texpress_dataframe(texpress_resp)
print(resp_df.head(3))
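The pattern passed to set_pattern above asks TExpress for two INDICATION hits separated by at most five tokens. As a rough illustration of that proximity idea (not TExpress's actual matcher or data model), a pure-Python sketch:

```python
# Illustration of the proximity constraint in ':(INDICATION):{0,5}:(INDICATION)':
# find pairs of INDICATION annotations with at most five tokens between them.
# The (text, entity_type) token tuples here are an assumed toy representation.

def find_proximity_pairs(tokens, entity_type, max_gap=5):
    """Return (i, j) index pairs of two entity_type hits within max_gap tokens."""
    hit_positions = [i for i, (text, etype) in enumerate(tokens) if etype == entity_type]
    pairs = []
    for a in range(len(hit_positions)):
        for b in range(a + 1, len(hit_positions)):
            gap = hit_positions[b] - hit_positions[a] - 1  # tokens between the two hits
            if gap <= max_gap:
                pairs.append((hit_positions[a], hit_positions[b]))
    return pairs

# Each token is (surface text, entity type or None)
sentence = [
    ("psoriasis", "INDICATION"),
    ("is", None),
    ("often", None),
    ("comorbid", None),
    ("with", None),
    ("depression", "INDICATION"),
]
print(find_proximity_pairs(sentence, "INDICATION"))  # one pair, four tokens apart
```

TExpress itself handles tokenisation and entity recognition server-side; the pattern string is all you supply.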

Example call to SciBite Search

from scibite_toolkit import scibite_search

# First authenticate. These examples assume our SaaS-hosted instances; adapt accordingly
ss_home = 'https://yourdomain-search.saas.scibite.com/'
sbs_auth_url = "https://yourdomain.saas.scibite.com/"
client_id = "yourclientid"
client_secret = "yourclientsecret"
s = scibite_search.SBSRequestBuilder()
s.set_url(ss_home)
s.set_auth_url(sbs_auth_url)
s.set_oauth2(client_id, client_secret)  # Authentication will last according to what was set up when generating the client

# Now you can use the request object

# Search over documents
sample_query = 'schema_id="clinical_trial" AND (title~INDICATION$D011565 AND DRUG$*)'

# Note that this endpoint is capped at 100 results, but you can paginate using the offset parameter
response = s.get_docs(query=sample_query, markup=True, limit=100)

# Co-occurrence search across sentences
# Get the top 50 co-occurrence sentence aggregates for psoriasis indication and any gene
response = s.get_aggregates(query='INDICATION$D011565', vocabs=['HGNCGENE'], limit=50)
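Since get_docs caps each call at 100 results, fetching a larger result set means looping on the offset parameter. The sketch below stubs out the server side with a plain function standing in for s.get_docs; treat the exact offset keyword as an assumption to verify against your instance's API docs.

```python
def fetch_page(query, limit, offset, _store=list(range(250))):
    # Stub standing in for s.get_docs: pretend the server holds 250 matching documents.
    return _store[offset:offset + limit]

def fetch_all(query, page_size=100):
    """Accumulate all results by advancing the offset until a short page arrives."""
    results, offset = [], 0
    while True:
        page = fetch_page(query, limit=page_size, offset=offset)
        results.extend(page)
        if len(page) < page_size:  # a short page means we've reached the end
            break
        offset += page_size
    return results

docs = fetch_all('INDICATION$D011565')
print(len(docs))  # 250
```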

Example call to Workbench

from scibite_toolkit import workbench

# First authenticate with the instance
username = 'username'
password = 'password'
client_id = 'client_id'
url = 'https://workbench-url.com'
wb = workbench.WorkbenchRequestBuilder()
wb.set_url(url)
wb.set_oauth2(client_id, username, password)

# Then set up your call - here we create a WB dataset, upload a file to it and annotate it
wb.set_dataset_name('My Test Dataset')
wb.set_dataset_desc('My Test Description')
wb.create_dataset()
wb.set_file_input('path/to/file.xlsx')
wb.upload_file_to_dataset()

# In this example, we will only annotate two columns with pre-selected VOCabs.
# To have WB annotate the dataset without setting a TERMite config, just call auto_annotate_dataset
vocabs = [[5, 6], [8, 9]]
attrs = [200, 201]
wb.set_termite_config('', vocabs, attrs)
wb.auto_annotate_dataset()

License

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
