scibite-toolkit: Python library for calling SciBite applications (TERMite, TExpress, SciBite Search, CENtree and Workbench) and for processing the JSON results of those requests.

SciBite Toolkit

Python library for making API calls to SciBite's suite of products and processing the JSON responses.

Supported Products

TERMite - Entity recognition and semantic enrichment (version 6.x)
TERMite 7 - Next-generation entity recognition with modern OAuth2 authentication
TExpress - Pattern-based entity relationship extraction
SciBite Search - Semantic search, document and entity analytics
CENtree - Ontology management, navigation, and integration
CENtree VectorDB Uploader - Upload ontology embedding CSVs from S3 or local files to Qdrant
CENtree Vector Generator - End-to-end ontology→embedding CSV pipeline
CENtree Ontology ML - OWL→sentence corpus, embedding generation, and Qdrant indexing
Workbench - Dataset annotation and management

Installation

pip install scibite-toolkit

See versions on PyPI

Quick Start Examples

TERMite 7 - Modern client with OAuth2
TERMite 6 - Legacy client
TExpress - Pattern matching
SciBite Search
CENtree - Ontology navigation
CENtree VectorDB Uploader - S3/local→Qdrant upload
CENtree Vector Generator - Ontology→embedding CSV
CENtree Ontology ML - OWL→embeddings pipeline
Workbench

TERMite 7 Examples

TERMite 7 is the current version, with OAuth2 authentication and an improved API.

OAuth2 Client Credentials (SaaS - Recommended)

For modern SaaS deployments using a separate authentication server:

from scibite_toolkit import termite7

# Initialize with context manager for automatic cleanup
with termite7.Termite7RequestBuilder() as t:
    # Set URLs
    t.set_url('https://termite.saas.scibite.com')
    t.set_token_url('https://auth.saas.scibite.com')

    # Authenticate with OAuth2 client credentials
    if not t.set_oauth2('your_client_id', 'your_client_secret'):
        print("Authentication failed!")
        raise SystemExit(1)  # exit() is meant for the interactive shell

    # Annotate text
    t.set_entities('DRUG,INDICATION')
    t.set_subsume(True)
    t.set_text('Aspirin is used to treat headaches and reduce inflammation.')

    response = t.annotate_text()

    # Process the response
    df = termite7.process_annotation_output(response)
    print(df.head())
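
For readers new to the client-credentials grant, the sketch below builds the form-encoded body of a token request as described in RFC 6749 (section 4.4), using the variant that sends the client credentials in the request body. It is illustrative only: it does not show how set_oauth2 is implemented, and build_client_credentials_body is a hypothetical helper, not part of scibite_toolkit.

```python
from urllib.parse import urlencode, parse_qs


def build_client_credentials_body(client_id: str, client_secret: str) -> str:
    """Return the form-encoded body of an OAuth2 client-credentials request.

    Hypothetical helper for illustration; not part of scibite_toolkit.
    """
    return urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        }
    )


body = build_client_credentials_body("your_client_id", "your_client_secret")

# Decode the body again to inspect the individual form fields.
form_fields = parse_qs(body)
print(form_fields["grant_type"])
```

No user name or password appears in the request, which is why this grant suits service-to-service calls.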

OAuth2 Password Grant (Legacy)

For on-premise deployments using username/password authentication:

from scibite_toolkit import termite7

t = termite7.Termite7RequestBuilder()

# Set main TERMite URL and token URL (same server for legacy)
t.set_url('https://termite.example.com')
t.set_token_url('https://termite.example.com')

# Authenticate with username and password
if not t.set_oauth2_legacy('client_id', 'username', 'password'):
    print("Authentication failed!")
    raise SystemExit(1)  # exit() is meant for the interactive shell

# Annotate a document
t.set_entities('INDICATION,DRUG')
t.set_parser_id('generic')
t.set_file('path/to/document.pdf')

response = t.annotate_document()

# Process the response
df = termite7.process_annotation_output(response)
print(df)

# Clean up file handles
t.close()

Get System Status

from scibite_toolkit import termite7

t = termite7.Termite7RequestBuilder()
t.set_url('https://termite.example.com')
t.set_token_url('https://auth.example.com')
t.set_oauth2('client_id', 'client_secret')

# Get system status
status = termite7.get_system_status(t.url, t.headers)
print(f"Server Version: {status['data']['serverVersion']}")

# Get available vocabularies
vocabs = termite7.get_vocabs(t.url, t.headers)
print(f"Available vocabularies: {len(vocabs['data'])}")

# Get runtime options
rtos = termite7.get_runtime_options(t.url, t.headers)
print(rtos)

TERMite 6 Examples

For legacy TERMite 6.x deployments.

SciBite Hosted (SaaS)

from scibite_toolkit import termite

# Initialize
t = termite.TermiteRequestBuilder()

# Configure
t.set_url('https://termite.saas.scibite.com')
t.set_saas_login_url('https://login.saas.scibite.com')

# Authenticate
t.set_auth_saas('username', 'password')

# Set runtime options
t.set_entities('INDICATION')
t.set_input_format('medline.xml')
t.set_output_format('json')
t.set_binary_content('path/to/file.xml')
t.set_subsume(True)

# Execute and process
response = t.execute()
df = termite.get_termite_dataframe(response)
print(df.head(3))

Local Instance (Customer Hosted)

from scibite_toolkit import termite

t = termite.TermiteRequestBuilder()
t.set_url('https://termite.local.example.com')

# Basic authentication for local instances
t.set_basic_auth('username', 'password')

# Configure and execute
t.set_entities('INDICATION')
t.set_input_format('medline.xml')
t.set_output_format('json')
t.set_binary_content('path/to/file.xml')
t.set_subsume(True)

response = t.execute()
df = termite.get_termite_dataframe(response)
print(df.head(3))

TExpress Examples

Pattern-based entity relationship extraction.

SciBite Hosted

from scibite_toolkit import texpress

t = texpress.TexpressRequestBuilder()

t.set_url('https://texpress.saas.scibite.com')
t.set_saas_login_url('https://login.saas.scibite.com')
t.set_auth_saas('username', 'password')

# Set pattern to find relationships
t.set_entities('INDICATION,DRUG')
t.set_pattern(':(DRUG):{0,5}:(INDICATION)')  # Find DRUG within 5 words of INDICATION
t.set_input_format('medline.xml')
t.set_output_format('json')
t.set_binary_content('path/to/file.xml')

response = t.execute()
df = texpress.get_texpress_dataframe(response)
print(df.head())
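
To make the "within 5 words" idea concrete, here is a rough pure-Python approximation of what a pattern such as :(DRUG):{0,5}:(INDICATION) expresses. It is not the TExpress engine: the token list, the entity tags, and the find_pairs helper are all made up for illustration, and it only looks forward from the first entity.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, Optional[str]]  # (word, entity type or None)


def find_pairs(tokens: List[Token], first: str, second: str,
               max_gap: int) -> List[Tuple[str, str]]:
    """Return (first, second) entity pairs separated by at most max_gap tokens."""
    pairs = []
    for i, (word_i, type_i) in enumerate(tokens):
        if type_i != first:
            continue
        # Scan the next max_gap intervening tokens plus one candidate position.
        for j in range(i + 1, min(i + max_gap + 2, len(tokens))):
            word_j, type_j = tokens[j]
            if type_j == second:
                pairs.append((word_i, word_j))
    return pairs


# Hand-tagged tokens; a real run would get entity hits from the server.
tokens = [
    ("Aspirin", "DRUG"),
    ("is", None),
    ("used", None),
    ("to", None),
    ("treat", None),
    ("headaches", "INDICATION"),
]

pairs = find_pairs(tokens, "DRUG", "INDICATION", max_gap=5)
print(pairs)
```

Four words separate the two entities here, so the pair is found with a gap limit of 5 and would be missed with a limit of 3.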

Local Instance

from scibite_toolkit import texpress

t = texpress.TexpressRequestBuilder()
t.set_url('https://texpress.local.example.com')
t.set_basic_auth('username', 'password')

t.set_entities('INDICATION,DRUG')
t.set_pattern(':(INDICATION):{0,5}:(INDICATION)')
t.set_input_format('pdf')
t.set_output_format('json')
t.set_binary_content('/path/to/file.pdf')

response = t.execute()
df = texpress.get_texpress_dataframe(response)
print(df.head())

SciBite Search Example

Semantic search with entity-based queries and aggregations.

from scibite_toolkit import scibite_search

# Configure
s = scibite_search.SBSRequestBuilder()
s.set_url('https://yourdomain-search.saas.scibite.com/')
s.set_auth_url('https://yourdomain.saas.scibite.com/')

# Authenticate with OAuth2
s.set_oauth2('your_client_id', 'your_client_secret')

# Search documents
query = 'schema_id="clinical_trial" AND (title~INDICATION$D011565 AND DRUG$*)'
# Preferred: request specific fields with the 'fields' parameter (legacy name: 'additional_fields')
docs = s.get_docs(query=query, markup=True, limit=100, fields=['*'])

# Get co-occurrence aggregations
# Find the top 50 genes co-occurring with psoriasis (INDICATION$D011565)
aggregates = s.get_aggregates(
    query='INDICATION$D011565',
    vocabs=['HGNCGENE'],
    limit=50
)

Note: Preferred parameter name is fields. The legacy additional_fields is still supported for backward compatibility. When both are provided, fields takes precedence.
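
The precedence rule in the note can be pictured with a small sketch. resolve_fields is a hypothetical helper written for this README, not the toolkit's own code, and the real client may treat edge cases such as an empty list differently.

```python
from typing import List, Optional


def resolve_fields(fields: Optional[List[str]] = None,
                   additional_fields: Optional[List[str]] = None
                   ) -> Optional[List[str]]:
    """Pick the field list to send: `fields` wins when both are given."""
    if fields is not None:
        return fields
    # Fall back to the legacy parameter (which may itself be None).
    return additional_fields


both = resolve_fields(fields=["title"], additional_fields=["abstract"])
legacy_only = resolve_fields(additional_fields=["abstract"])
print(both, legacy_only)
```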

CENtree Examples

Ontology navigation and search.

Modern Client (Recommended)

The modern centree_clients module provides better error handling, retries, and context manager support.

from scibite_toolkit.centree_clients import CENtreeReaderClient

# Use context manager for automatic cleanup
with CENtreeReaderClient(
    base_url="https://centree.example.com",
    bearer_token="your_token",
    timeout=(3.0, None)  # Quick connect, unlimited read
) as reader:

    # Search by exact label
    hits = reader.get_classes_by_exact_label("efo", "neuron")
    print(f"Found {len(hits)} matches")

    # Get ontology roots
    roots = reader.get_root_entities("efo", "classes", size=10)

    # Get paths from root to target (great for LLM grounding)
    paths = reader.get_paths_from_root("efo", "MONDO_0007739", as_="labels")
    for path in paths:
        print(" → ".join(path))

# Or authenticate with OAuth2
from scibite_toolkit.centree_clients import CENtreeReaderClient

reader = CENtreeReaderClient(base_url="https://centree.example.com")
if reader.set_oauth2(client_id="...", client_secret="..."):
    hits = reader.get_classes_by_exact_label("efo", "lung")
    print(hits)
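
The get_paths_from_root call above is joined as if each path were a sequence of labels running from a root down to the target. The sketch below shows that idea on a toy ontology held in a dictionary. The labels, the PARENTS map, and the paths_from_root function are invented for illustration and assume an acyclic hierarchy; the client fetches real paths from the CENtree server.

```python
from typing import Dict, List

# Toy child -> parents map; labels are made up for illustration.
PARENTS: Dict[str, List[str]] = {
    "disease": [],
    "lung disease": ["disease"],
    "infectious disease": ["disease"],
    "pneumonia": ["lung disease", "infectious disease"],
}


def paths_from_root(term: str, parents: Dict[str, List[str]]) -> List[List[str]]:
    """Return every root-to-term path as a list of labels."""
    if not parents[term]:
        return [[term]]  # a root is its own single-element path
    paths = []
    for parent in parents[term]:
        # Extend each path that reaches the parent with the current term.
        for path in paths_from_root(parent, parents):
            paths.append(path + [term])
    return paths


paths = paths_from_root("pneumonia", PARENTS)
for path in paths:
    print(" → ".join(path))
```

A term with several parents yields several paths, which is why the result is a list of paths rather than a single one.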

CENtree VectorDB Uploader Examples

Upload ontology embedding CSVs from S3 or local files to Qdrant for vector search.

Qdrant version compatibility: The qdrant-client Python package must match your Qdrant server version within one minor version (e.g. client 1.7.x for server 1.7.x or 1.8.x). A mismatch may cause silent data corruption or connection errors. Pin the client version to match your server: pip install qdrant-client==1.7.0
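
A pre-flight check along these lines can catch a mismatch before any data is uploaded. The sketch reads "within one minor version" as "same major version, minor versions differing by at most one", which is an interpretation of the rule above rather than an official definition. within_one_minor is a hypothetical helper.

```python
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '1.7.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def within_one_minor(client_version: str, server_version: str) -> bool:
    """True when majors match and minors differ by at most one."""
    c_major, c_minor = parse_major_minor(client_version)
    s_major, s_minor = parse_major_minor(server_version)
    return c_major == s_major and abs(c_minor - s_minor) <= 1


print(within_one_minor("1.7.0", "1.8.2"))
print(within_one_minor("1.7.0", "1.9.0"))
```

The installed client version can be read with importlib.metadata.version("qdrant-client"); how the server version is obtained depends on your deployment.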

CLI Usage

# Upload all datasets under the configured S3 prefix
centree2vec-upload --config config.yaml

# Upload only specific ontologies
centree2vec-upload --config config.yaml --ontology efo mondo

# Upload local embedding files directly (no S3 required)
centree2vec-upload --config config.yaml --local efo_embeddings.csv.gz

# Replace existing vectors for each ontology before uploading
centree2vec-upload --config config.yaml --replace

# Combine --local and --replace to re-upload a single ontology
centree2vec-upload --config config.yaml --local efo_embeddings.csv.gz --replace

# Dry-run to preview which files would be processed
centree2vec-upload --config config.yaml --dry-run

# Public S3 bucket with anonymous access
centree2vec-upload --config config.yaml --anonymous

Python API

from scibite_toolkit.centree_vectordb_uploader import run, load_config

# Load YAML configuration
cfg = load_config("config.yaml")

# Run the upload pipeline
results = run(cfg)
for r in results:
    print(f"{r['ontology']}: {r['total_rows']} vectors uploaded")

# Replace existing vectors for each ontology before uploading
results = run(cfg, replace=True)

# Dry-run to inspect what would be uploaded
results = run(cfg, dry_run=True)

Generate a Starter Config

# Write the bundled example config to the current directory
centree2vec-upload --init

# Or specify a custom path
centree2vec-upload --init my-config.yaml

Configuration Reference

Key	Type	Default	Description
`qdrant.url`	str	—	Required. Qdrant server URL
`qdrant.collection_name`	str	—	Required. Target collection name
`qdrant.distance`	str	`cosine`	Distance metric: `cosine`, `euclid`, `dot`, `manhattan`
`qdrant.api_key_env`	str	—	Env var name holding the Qdrant API key
`qdrant.hnsw_config.m`	int	`32`	HNSW graph connectivity
`qdrant.hnsw_config.ef_construct`	int	`256`	HNSW index build search depth
`qdrant.hnsw_config.full_scan_threshold`	int	`10000`	Point count below which brute-force is used
`s3.bucket`	str	—	Required (S3 mode). S3 bucket name
`s3.prefix`	str	—	Required (S3 mode). S3 key prefix for embedding files
`s3.anonymous`	bool	`false`	Use unsigned requests for public buckets
`s3.endpoint_url`	str	—	Custom S3-compatible endpoint URL
`s3.region`	str	`eu-west-2`	AWS region
`ingest.vector_size`	int	`384`	Embedding dimension
`ingest.batch_size`	int	`1024`	Points per Qdrant upload batch
`ingest.chunk_size`	int	`500000`	Rows per pandas read chunk
`ingest.parallel_uploads`	int	`4`	Parallel upload threads
`ingest.build_indices_after_upload`	bool	`true`	Build payload indexes after upload
`ingest.payload_index_fields`	list	`[metadata.iri, metadata.id, metadata.ontology]`	Fields to index
`selection.ontologies`	list	—	Ontology names to ingest (all if omitted)
`selection.include_files`	list	—	S3 keys to force-include
`selection.exclude_files`	list	—	S3 keys to always skip (highest priority)
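
The three selection keys interact, so a sketch of one plausible reading may help: exclude_files always wins, include_files forces a key in, and selection.ontologies filters whatever remains. This is an assumption about the behaviour described in the table, not the uploader's actual code. select_keys and ontology_of are hypothetical, and the '<name>_embeddings.csv.gz' naming is borrowed from the Vector Generator's default output name.

```python
from typing import Iterable, List, Optional


def ontology_of(key: str) -> str:
    """Guess the ontology name from a key like 'emb/efo_embeddings.csv.gz'.

    Assumption for illustration: files follow '<name>_embeddings.csv.gz'.
    """
    filename = key.rsplit("/", 1)[-1]
    return filename.split("_embeddings", 1)[0]


def select_keys(keys: Iterable[str],
                ontologies: Optional[List[str]] = None,
                include_files: Optional[List[str]] = None,
                exclude_files: Optional[List[str]] = None) -> List[str]:
    """Apply exclude > include > ontology filter, in that priority order."""
    include = set(include_files or [])
    exclude = set(exclude_files or [])
    selected = []
    for key in keys:
        if key in exclude:
            continue  # exclusion has the highest priority
        if key in include:
            selected.append(key)  # forced in, regardless of the ontology filter
        elif ontologies is None or ontology_of(key) in ontologies:
            selected.append(key)  # no filter configured, or the ontology matches
    return selected


keys = [
    "emb/efo_embeddings.csv.gz",
    "emb/mondo_embeddings.csv.gz",
    "emb/go_embeddings.csv.gz",
]
chosen = select_keys(
    keys,
    ontologies=["efo"],
    include_files=["emb/go_embeddings.csv.gz"],
    exclude_files=["emb/efo_embeddings.csv.gz"],
)
print(chosen)
```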

CENtree Vector Generator Examples

End-to-end pipeline that takes a local ontology file, generates a sentence corpus via Owl2Sentence, encodes embeddings with sentence-transformers, and writes a gzipped CSV ready for Qdrant upload. Requires the oml extras:

pip install "scibite-toolkit[oml]"

CLI Usage

# Generate embeddings from an OWL file (outputs <name>_embeddings.csv.gz)
centree2vec-generate ontology.owl

# Custom output path and model
centree2vec-generate ontology.owl -o output.csv.gz --model all-MiniLM-L6-v2

# With debug logging and custom batch size
centree2vec-generate ontology.owl --debug --batch-size 64

Python API

import argparse
from scibite_toolkit.centree_vector_generator import (
    validate_format,
    derive_ontology_name,
    generate_corpus,
    generate_embeddings,
    write_output,
    run,
)

# Use the full pipeline via run()
args = argparse.Namespace(
    input_file="ontology.owl",
    output="embeddings.csv.gz",
    model="sentence-transformers/all-MiniLM-L6-v2",
    batch_size=128,
    debug=False,
    include_sentences=False,
)
run(args)

# Or use individual stages
fmt = validate_format("ontology.owl")       # "xml"
name = derive_ontology_name("ontology.owl")  # "ontology"
df = generate_corpus("ontology.owl", name)
df = generate_embeddings(df, "sentence-transformers/all-MiniLM-L6-v2", batch_size=128)
write_output(df, "ontology_embeddings.csv.gz")

Arguments

Argument	Default	Description
`input_file`	(required)	Path to the ontology file
`--output`, `-o`	`<name>_embeddings.csv.gz`	Output file path
`--model`	`sentence-transformers/all-MiniLM-L6-v2`	Sentence-transformers model
`--batch-size`	`128`	Encoding batch size
`--debug`	`false`	Enable verbose Owl2Sentence logging
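
The default output name follows the pattern <name>_embeddings.csv.gz, where the name comes from the input file. A minimal sketch of that naming rule, assuming the name is simply the file stem, is shown below. default_output_path is a hypothetical helper, and the library's own derive_ontology_name may handle directories or double extensions differently.

```python
from pathlib import Path


def default_output_path(input_file: str) -> str:
    """Build '<name>_embeddings.csv.gz' from the input file's stem."""
    name = Path(input_file).stem  # 'ontology.owl' -> 'ontology'
    return f"{name}_embeddings.csv.gz"


print(default_output_path("ontology.owl"))
```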

Output Format

Gzipped CSV with columns:

Column	Description
`id`	Unique identifier for the sentence
`iri`	IRI of the ontology class
`label`	Human-readable class label
`ontology`	Ontology name (derived from filename)
`content`	Generated sentence text
`embeddings`	JSON-encoded 384-dimensional float array
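
The sketch below round-trips one row in this layout using only the standard library, to show how the JSON-encoded embeddings column is decoded. The row values are made up, and the three-element vector stands in for the real 384-dimensional one.

```python
import csv
import gzip
import io
import json

COLUMNS = ["id", "iri", "label", "ontology", "content", "embeddings"]

# Made-up row for illustration; real rows hold 384 floats per vector.
row = {
    "id": "example-1",
    "iri": "http://example.org/onto#Neuron",
    "label": "neuron",
    "ontology": "example",
    "content": "A neuron is a cell.",
    "embeddings": json.dumps([0.1, 0.2, 0.3]),
}

# Write the header and one row to an in-memory gzip stream.
buffer = io.BytesIO()
with gzip.open(buffer, "wt", newline="", encoding="utf-8") as fh:
    writer = csv.DictWriter(fh, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerow(row)

# Read it back and decode the embeddings column into a list of floats.
buffer.seek(0)
with gzip.open(buffer, "rt", newline="", encoding="utf-8") as fh:
    records = list(csv.DictReader(fh))

vector = json.loads(records[0]["embeddings"])
print(len(vector))
```

For a file on disk, pass its path to gzip.open in place of the in-memory buffer.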

Pipeline

ontology.owl ──▶ Owl2Sentence ──▶ corpus (DataFrame) ──▶ SentenceTransformer ──▶ embeddings.csv.gz
                 (parse & generate     (id, iri, label,     (encode content         (ready for
                  sentences)            ontology, content)    column)                 Qdrant upload)

The output is directly compatible with the CENtree VectorDB Uploader (centree2vec_qdrant_uploader.py), for example via centree2vec-upload --local.

CENtree Ontology ML Examples

Convert OWL ontologies to natural-language corpora, generate sentence embeddings, and index them in Qdrant. Requires the oml extras:

pip install "scibite-toolkit[oml]"

Python API

from scibite_toolkit.centree_ontology_ml import Owl2Sentence, generate_embeddings

# Load ontology and generate sentence corpus
o2s = Owl2Sentence(owl_file="ontology.owl")
documents = o2s.run()

# Generate embeddings
texts = [doc.content for doc in documents]
embeddings = generate_embeddings(texts, model_name="sentence-transformers/all-MiniLM-L6-v2")

CLI Usage

The owl2sentence command exposes three pipeline stages:

# 1. Convert OWL to sentence corpus
owl2sentence corpus -i ontology.owl -o corpus.csv

# 2. Generate embeddings
owl2sentence embed -i corpus.csv -o embeddings.csv -m sentence-transformers/all-MiniLM-L6-v2

# 3. Index in Qdrant
owl2sentence index -i embeddings.csv --url http://localhost:6333 --collection my_ontology

# Pipeline chaining via stdout/stdin
owl2sentence corpus -i ontology.owl -o - | owl2sentence embed -i - -o - | owl2sentence index -i - --url http://localhost:6333 --collection my_ontology

Workbench Example

Dataset management and annotation.

from scibite_toolkit import workbench

# Initialize
wb = workbench.WorkbenchRequestBuilder()
wb.set_url('https://workbench.example.com')

# Authenticate
wb.set_oauth2('client_id', 'username', 'password')

# Create dataset
wb.set_dataset_name('My Analysis Dataset')
wb.set_dataset_desc('Dataset for clinical trial analysis')
wb.create_dataset()

# Upload file
wb.set_file_input('path/to/data.xlsx')
wb.upload_file_to_dataset()

# Configure and run annotation
vocabs = [[5, 6], [8, 9]]  # Vocabulary IDs
attrs = [200, 201]  # Attribute IDs
wb.set_termite_config('', vocabs, attrs)
wb.auto_annotate_dataset()

Key Features

Context Manager Support (TERMite 7, CENtree Clients)

Modern clients support context managers for automatic resource cleanup:

with termite7.Termite7RequestBuilder() as t:
    t.set_url('...')
    # ... work with client ...
# File handles automatically closed
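
For readers who have not written a context manager, the sketch below shows the general mechanism: __exit__ calls close(), so handles are released even when the body raises. ManagedClient is a hypothetical stand-in written for this README and says nothing about how Termite7RequestBuilder is implemented internally.

```python
import io


class ManagedClient:
    """Hypothetical client that tracks open file handles (illustration only)."""

    def __init__(self):
        self._handles = []

    def open_file(self, fileobj):
        # Remember the handle so close() can release it later.
        self._handles.append(fileobj)
        return fileobj

    def close(self):
        for handle in self._handles:
            handle.close()
        self._handles.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not suppress exceptions raised in the with-body


with ManagedClient() as client:
    handle = client.open_file(io.StringIO("payload"))

print(handle.closed)
```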

Error Handling

All OAuth2 methods return boolean status for easy error handling:

if not t.set_oauth2(client_id, client_secret):
    print("Authentication failed - check credentials")
    raise SystemExit(1)  # exit() is meant for the interactive shell

Logging

Enable detailed logging for debugging:

import logging

logging.basicConfig(level=logging.DEBUG)

# Or set per-client
t = termite7.Termite7RequestBuilder(log_level='DEBUG')
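
Setting the root logger to DEBUG also turns on output from every other library. A narrower setup using the standard logging module might look like the sketch below. The logger name "scibite_toolkit" is an assumption, since the names the toolkit actually logs under are not documented here.

```python
import logging

# Keep the root logger at WARNING so third-party libraries stay quiet.
logging.basicConfig(level=logging.WARNING)

# Hypothetical logger name; check which names the toolkit actually uses.
toolkit_logger = logging.getLogger("scibite_toolkit")
toolkit_logger.setLevel(logging.DEBUG)

# Child loggers inherit the level set on their parent.
child_logger = logging.getLogger("scibite_toolkit.termite7")
print(child_logger.getEffectiveLevel() == logging.DEBUG)

# Explicitly silence the HTTP stack below WARNING.
logging.getLogger("urllib3").setLevel(logging.WARNING)
```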

Session Management

All clients use requests.Session() for efficient connection pooling and automatic retry handling.

License

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Release history

Version	Release date	Type
1.5.0a2	May 18, 2026	Pre-release (this version)
1.5.0a1	Apr 29, 2026	Pre-release
1.4.0	Apr 22, 2026	Release
1.4.0rc1	Apr 13, 2026	Pre-release
1.3.1	Apr 13, 2026	Release
1.3.1rc1	Apr 7, 2026	Pre-release
1.3.0	Feb 19, 2026	Release
1.3.0rc1	Jan 30, 2026	Pre-release
1.2.0	Nov 17, 2025	Release
1.1.0	Oct 22, 2025	Release
1.0.0	Aug 29, 2024	Release

Download files

Source Distribution

scibite_toolkit-1.5.0a2.tar.gz (169.5 kB)

Upload date: May 18, 2026
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.11

Hashes for scibite_toolkit-1.5.0a2.tar.gz
Algorithm	Hash digest
SHA256	`af422edc7bb9c2c9ee3c8b635af452925ce6f3c05bcf786dee3fb47d41d9378d`
MD5	`f85501785162a73a8c3bb099fdd199b7`
BLAKE2b-256	`38828cc735df162f17c9f9d71861d72385b5ae8175ca9af233eae896e2b00f3c`

Built Distribution

scibite_toolkit-1.5.0a2-py3-none-any.whl (181.8 kB)

Upload date: May 18, 2026
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.11

Hashes for scibite_toolkit-1.5.0a2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a07b76d10b30208dea49b3e3714f67b8a2cb0512d43fcbc808dfd4eac9975398`
MD5	`c9ff1278599d243db7dec8f90295857f`
BLAKE2b-256	`e4cc1cb3aede037697f2200055d1b0c80a401c5481b1020b4f691cc818be2e34`
