qortex-online
Online session indexing for qortex: chunking, concept extraction, and real-time graph wiring.
Install
pip install qortex-online # core (chunking + extraction protocol)
pip install 'qortex-online[nlp]' # + spaCy NER extraction
pip install 'qortex-online[all]' # everything
Quick Start
from qortex.online import default_chunker, SpaCyExtractor
# Chunk conversation text
chunks = default_chunker("User said JWT tokens expire after 30 minutes. The auth module validates them.")
# Extract concepts and relations
extractor = SpaCyExtractor()
for chunk in chunks:
    result = extractor(chunk.text, domain="auth")
    for concept in result.concepts:
        print(f" {concept.name} ({concept.confidence:.1f})")
    for rel in result.relations:
        print(f" {rel.source_name} --{rel.relation_type}--> {rel.target_name}")
What It Does
qortex-online covers the real-time path from conversation text to knowledge graph nodes and edges. Where qortex-ingest does batch document ingestion with LLM extraction, qortex-online works on the live session: it chunks messages as they arrive, extracts named concepts locally, and wires them into the graph with typed relationships.
Phase 1: Chunking
SentenceBoundaryChunker splits text on sentence boundaries (regex [.!?\n]) and sizes chunks with the approximation 1 token = 4 characters. Each chunk gets a deterministic SHA256-based ID, which is used for deduplication across sessions.
from qortex.online import default_chunker, Chunk
chunks: list[Chunk] = default_chunker(
    text="Long conversation...",
    max_tokens=256,      # ~1024 chars per chunk
    overlap_tokens=32,   # ~128-char overlap for context
    source_id="session-1",
)
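To make the chunking behaviour concrete, here is a minimal sketch of sentence-boundary chunking with a character budget, an overlap tail, and deterministic IDs. It is not the library's implementation: `sketch_chunker`, `SketchChunk`, and the choice to hash `source_id` together with the chunk text are assumptions made for illustration only.

```python
import hashlib
import re
from dataclasses import dataclass

CHARS_PER_TOKEN = 4  # the 1 token = 4 chars approximation


@dataclass(frozen=True)
class SketchChunk:  # hypothetical stand-in for qortex.online.Chunk
    id: str
    text: str
    index: int


def sketch_chunker(text, max_tokens=256, overlap_tokens=32, source_id=""):
    max_chars = max_tokens * CHARS_PER_TOKEN
    overlap_chars = overlap_tokens * CHARS_PER_TOKEN
    # Split right after ., !, ? or a newline, keeping the terminator.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?\n])", text) if s.strip()]

    pieces, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            pieces.append(current)
            # Carry the tail of the finished chunk forward as context.
            # The tail is cut by characters, so it may start mid-word.
            tail = current[-overlap_chars:] if overlap_chars else ""
            candidate = f"{tail} {sentence}".strip()
        current = candidate
    if current:
        pieces.append(current)

    chunks = []
    for index, piece in enumerate(pieces):
        # Assumption: the ID is derived from source_id plus chunk text.
        digest = hashlib.sha256(f"{source_id}:{piece}".encode("utf-8")).hexdigest()
        chunks.append(SketchChunk(id=digest[:16], text=piece, index=index))
    return chunks
```

Because the ID depends only on the inputs, chunking the same text with the same `source_id` twice yields identical IDs. A single sentence longer than the budget is kept whole in this sketch rather than being split further.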
Phase 2: Concept Extraction
Three pluggable strategies, selected via QORTEX_EXTRACTION env var:
| Strategy | Env Value | Speed | Cost | Features |
|---|---|---|---|---|
| SpaCyExtractor | spacy (default) | Fast | Free | NER entities + noun chunks + dep-parse relations |
| LLMExtractor | llm | Slow | API cost | Full Anthropic/Ollama extraction via qortex-ingest |
| NullExtractor | none | Instant | Free | No-op, pipeline uses raw text only |
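As an illustration of env-var-based selection, the sketch below maps the three documented values to strategy callables. The stub functions, the `STRATEGIES` registry, and `select_strategy` are hypothetical names, and both the case-insensitive matching and the error on unknown values are assumptions, not documented behaviour.

```python
import os
from typing import Callable, Mapping, Optional


# Hypothetical stand-ins; the real extractors live in qortex.online.
def spacy_stub(text: str, domain: str = "") -> str:
    return "spacy"


def llm_stub(text: str, domain: str = "") -> str:
    return "llm"


def null_stub(text: str, domain: str = "") -> str:
    return "none"


STRATEGIES: dict[str, Callable[..., str]] = {
    "spacy": spacy_stub,
    "llm": llm_stub,
    "none": null_stub,
}


def select_strategy(environ: Optional[Mapping[str, str]] = None) -> Callable[..., str]:
    env = os.environ if environ is None else environ
    # Fall back to the documented default when the variable is unset.
    name = env.get("QORTEX_EXTRACTION", "spacy").strip().lower()
    try:
        return STRATEGIES[name]
    except KeyError:
        valid = ", ".join(sorted(STRATEGIES))
        raise ValueError(f"Unknown QORTEX_EXTRACTION={name!r}; expected one of: {valid}") from None
```

Passing the environment as a parameter keeps the selection logic easy to test without mutating `os.environ`.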
SpaCy Extraction Pipeline
The default SpaCyExtractor runs five sub-steps, each with its own OpenTelemetry span:
- NLP Processing (extraction.spacy.nlp_process) -- Run the spaCy en_core_web_sm pipeline
- Entity Extraction (extraction.spacy.extract_entities) -- Pull NER entities (PERSON, ORG, PRODUCT, GPE, WORK_OF_ART, EVENT, FAC, LAW, LANGUAGE, NORP)
- Noun Chunk Extraction (extraction.spacy.extract_noun_chunks) -- Collect noun phrases, filtering pronouns and determiners
- Deduplication (extraction.spacy.deduplicate) -- Merge entities and noun chunks, preferring NER on span overlap
- Relation Inference (extraction.spacy.infer_relations) -- Dependency-parse verb patterns and coordination
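The deduplication step is the least obvious of these, so here is a minimal sketch of "prefer NER on span overlap" using plain character offsets. The `Span` type and the `deduplicate` helper are hypothetical and do not use spaCy; the actual extractor may merge spans differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:  # hypothetical: half-open character offsets plus a source label
    start: int
    end: int
    text: str
    source: str  # "ner" or "noun_chunk"


def overlaps(a: Span, b: Span) -> bool:
    # Two half-open ranges overlap when each starts before the other ends.
    return a.start < b.end and b.start < a.end


def deduplicate(entities: list[Span], noun_chunks: list[Span]) -> list[Span]:
    kept = list(entities)  # NER spans always survive
    for chunk in noun_chunks:
        # Drop a noun chunk if any NER span covers part of the same text.
        if not any(overlaps(chunk, entity) for entity in entities):
            kept.append(chunk)
    return sorted(kept, key=lambda span: span.start)
```

For the sentence "The auth module at Acme Corp validates JWT tokens.", an ORG entity for "Acme Corp" would replace the noun chunk covering the same characters, while "The auth module" and "JWT tokens" remain as noun-chunk concepts.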
Phase 3: Relation Inference
Relations are inferred from dependency parse patterns:
| Verb Pattern | Relation Type |
|---|---|
| use, utilize, call, invoke | USES |
| require, need, depend, import | REQUIRES |
| contain, include, have, hold | CONTAINS |
| implement, extend, inherit | IMPLEMENTS |
| refine, specialize, customize | REFINES |
| "X and Y" coordination | SIMILAR_TO |
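The verb rows of the table above can be read as a lookup from verb to relation type. The sketch below assumes matching happens on the lowercased verb lemma; `VERB_RELATIONS` and `relation_for_verb` are hypothetical names, and the coordination rule (SIMILAR_TO) is not covered because it depends on the parse tree rather than on a verb.

```python
from typing import Optional

# Relation type -> verb lemmas, copied from the table above.
VERB_RELATIONS: dict[str, tuple[str, ...]] = {
    "USES": ("use", "utilize", "call", "invoke"),
    "REQUIRES": ("require", "need", "depend", "import"),
    "CONTAINS": ("contain", "include", "have", "hold"),
    "IMPLEMENTS": ("implement", "extend", "inherit"),
    "REFINES": ("refine", "specialize", "customize"),
}

# Invert the table once so each lookup is a single dict access.
LEMMA_TO_RELATION: dict[str, str] = {
    lemma: relation
    for relation, lemmas in VERB_RELATIONS.items()
    for lemma in lemmas
}


def relation_for_verb(lemma: str) -> Optional[str]:
    """Return the relation type for a verb lemma, or None if unmapped."""
    return LEMMA_TO_RELATION.get(lemma.lower())
```

A verb that is not in the table, such as "validate", returns `None` here, which a caller could treat as "no relation inferred".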
Pluggable Strategies
Both chunking and extraction follow the protocol pattern. Any callable matching the signature works:
from qortex.online import ChunkingStrategy, ExtractionStrategy, Chunk, ExtractionResult
# Custom chunker (e.g. tiktoken-based)
class TiktokenChunker:
    def __call__(
        self, text: str, max_tokens: int = 256,
        overlap_tokens: int = 32, source_id: str = "",
    ) -> list[Chunk]:
        ...

# Custom extractor (e.g. OpenAI function calling)
class OpenAIExtractor:
    def __call__(self, text: str, domain: str = "") -> ExtractionResult:
        ...
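To show why any matching callable works, the sketch below uses `typing.Protocol` for structural typing. It assumes the package's strategies are Protocol-style interfaces; `SketchResult`, `SketchExtractionStrategy`, `KeywordExtractor`, and `run` are hypothetical stand-ins rather than names from qortex-online.

```python
from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class SketchResult:  # hypothetical stand-in for ExtractionResult
    concepts: list = field(default_factory=list)
    relations: list = field(default_factory=list)


@runtime_checkable
class SketchExtractionStrategy(Protocol):
    def __call__(self, text: str, domain: str = "") -> SketchResult: ...


class KeywordExtractor:
    """Toy extractor: treats all-caps words as concepts. No base class needed."""

    def __call__(self, text: str, domain: str = "") -> SketchResult:
        words = [word.strip(".,") for word in text.split()]
        concepts = [word for word in words if word.isupper() and len(word) > 1]
        return SketchResult(concepts=concepts)


def run(strategy: SketchExtractionStrategy, text: str) -> SketchResult:
    # Only the call signature matters, not the class hierarchy.
    return strategy(text, domain="auth")
```

Note that `isinstance` against a runtime-checkable protocol only verifies that `__call__` exists; the parameter and return types are checked by a static type checker, not at runtime.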
Observability
Every extraction step emits OpenTelemetry spans visible in Jaeger:
extraction.spacy                          [total time]
  extraction.spacy.nlp_process            [spaCy pipeline]
  extraction.spacy.extract_entities       [NER pass]
  extraction.spacy.extract_noun_chunks    [noun chunks]
  extraction.spacy.deduplicate            [span merging]
  extraction.spacy.infer_relations        [dep-parse]
When QORTEX_OTEL_ENABLED=true, these spans are exported alongside the parent online_index_pipeline span from the MCP server.
Configuration
| Env Var | Default | Purpose |
|---|---|---|
| QORTEX_EXTRACTION | spacy | Extraction strategy: spacy, llm, none |
| QORTEX_OTEL_ENABLED | false | Enable OpenTelemetry span export |
Data Types
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    id: str     # SHA256[:16] deterministic hash
    text: str   # Chunk content
    index: int  # Position in sequence

@dataclass(frozen=True)
class ExtractedConcept:
    name: str          # e.g. "JWT Tokens"
    description: str   # One-sentence context
    confidence: float  # 0.9 (NER), 0.7 (noun chunk)

@dataclass(frozen=True)
class ExtractedRelation:
    source_name: str    # Source concept name
    target_name: str    # Target concept name
    relation_type: str  # Maps to RelationType enum
    confidence: float   # 0.5-0.8 depending on signal

@dataclass(frozen=True)
class ExtractionResult:
    concepts: list[ExtractedConcept]
    relations: list[ExtractedRelation]
Requirements
- Python 3.11+
- spaCy 3.7+ with en_core_web_sm (optional, for SpaCy extraction)
- qortex-observe (optional, for OpenTelemetry span tracing)
- qortex-ingest (optional, for LLM extraction backend)
License
MIT