
qortex-online

Online session indexing for qortex: chunking, concept extraction, and real-time graph wiring.


Install

pip install qortex-online                # core (chunking + extraction protocol)
pip install 'qortex-online[nlp]'         # + spaCy NER extraction
pip install 'qortex-online[all]'         # everything

Quick Start

from qortex.online import default_chunker, SpaCyExtractor

# Chunk conversation text
chunks = default_chunker("User said JWT tokens expire after 30 minutes. The auth module validates them.")

# Extract concepts and relations
extractor = SpaCyExtractor()
for chunk in chunks:
    result = extractor(chunk.text, domain="auth")
    for concept in result.concepts:
        print(f"  {concept.name} ({concept.confidence:.1f})")
    for rel in result.relations:
        print(f"  {rel.source_name} --{rel.relation_type}--> {rel.target_name}")

What It Does

qortex-online covers the real-time path from conversation text to knowledge graph nodes and edges. Where qortex-ingest does batch document ingestion with LLM extraction, qortex-online handles the live session path: chunking messages as they arrive, extracting named concepts locally, and wiring them into the graph with typed relationships.

Phase 1: Chunking

SentenceBoundaryChunker splits text on sentence boundaries (regex [.!?\n]), using a 1 token = 4 chars approximation. Each chunk gets a deterministic SHA256 ID for deduplication across sessions.

from qortex.online import default_chunker, Chunk

chunks: list[Chunk] = default_chunker(
    text="Long conversation...",
    max_tokens=256,       # ~1024 chars per chunk
    overlap_tokens=32,    # 128-char overlap for context
    source_id="session-1",
)
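The deterministic IDs make re-indexing idempotent: the same chunk hashes to the same node. The docs only state that IDs are SHA256 hashes truncated to 16 characters, so the exact fields hashed below (source_id, index, text) are an assumption for illustration:

```python
import hashlib

def chunk_id(source_id: str, index: int, text: str) -> str:
    """Hypothetical sketch of a deterministic chunk ID.

    qortex-online documents only "SHA256[:16]"; which fields feed
    the hash is an assumption here."""
    payload = f"{source_id}:{index}:{text}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]

# Identical input always yields the same ID, so re-indexing a session
# deduplicates instead of creating duplicate graph nodes.
a = chunk_id("session-1", 0, "JWT tokens expire after 30 minutes.")
b = chunk_id("session-1", 0, "JWT tokens expire after 30 minutes.")
assert a == b and len(a) == 16
```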

Phase 2: Concept Extraction

Three pluggable strategies, selected via QORTEX_EXTRACTION env var:

  • SpaCyExtractor (spacy, default) -- fast, free; NER entities + noun chunks + dep-parse relations
  • LLMExtractor (llm) -- slow, API cost; full Anthropic/Ollama extraction via qortex-ingest
  • NullExtractor (none) -- instant, free; no-op, pipeline uses raw text only
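The strategy selection above can be sketched as a small env-var dispatch. This is an illustration, not the actual factory in qortex-online (whose name and return values are not documented here); it returns class names as strings to stay dependency-free:

```python
def select_extractor(env: dict[str, str]) -> str:
    """Hypothetical sketch: pick an extraction strategy from
    QORTEX_EXTRACTION, defaulting to spaCy as the table describes."""
    choice = env.get("QORTEX_EXTRACTION", "spacy")
    strategies = {
        "spacy": "SpaCyExtractor",
        "llm": "LLMExtractor",
        "none": "NullExtractor",
    }
    if choice not in strategies:
        raise ValueError(f"unknown extraction strategy: {choice!r}")
    return strategies[choice]

assert select_extractor({}) == "SpaCyExtractor"  # default
assert select_extractor({"QORTEX_EXTRACTION": "none"}) == "NullExtractor"
```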

SpaCy Extraction Pipeline

The default SpaCyExtractor runs five sub-steps, each with its own OpenTelemetry span:

  1. NLP Processing (extraction.spacy.nlp_process) -- Run the spaCy en_core_web_sm pipeline
  2. Entity Extraction (extraction.spacy.extract_entities) -- Pull NER entities (PERSON, ORG, PRODUCT, GPE, WORK_OF_ART, EVENT, FAC, LAW, LANGUAGE, NORP)
  3. Noun Chunk Extraction (extraction.spacy.extract_noun_chunks) -- Collect noun phrases, filtering pronouns and determiners
  4. Deduplication (extraction.spacy.deduplicate) -- Merge entities and noun chunks, preferring NER on span overlap
  5. Relation Inference (extraction.spacy.infer_relations) -- Dependency-parse verb patterns and coordination
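The deduplication step (4) can be sketched independently of spaCy: keep every NER entity, and keep a noun chunk only if its character span does not overlap an entity span. The real implementation works on spaCy Span objects, so the (start, end, text) tuples below are a simplification:

```python
def overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if the two half-open character spans [start, end) intersect."""
    return a[0] < b[1] and b[0] < a[1]

def deduplicate(entities: list, noun_chunks: list) -> list:
    """Sketch of step 4: prefer NER entities on span overlap.
    Spans are (start, end, text) tuples; an approximation of the
    actual spaCy-based merge in qortex-online."""
    merged = list(entities)
    for chunk in noun_chunks:
        if not any(overlaps(chunk[:2], ent[:2]) for ent in entities):
            merged.append(chunk)
    return merged

entities = [(0, 10, "JWT tokens")]
noun_chunks = [(0, 10, "JWT tokens"), (26, 36, "30 minutes")]
# The overlapping noun chunk is dropped in favor of the NER entity.
result = deduplicate(entities, noun_chunks)
assert result == [(0, 10, "JWT tokens"), (26, 36, "30 minutes")]
```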

Phase 3: Relation Inference

Relations are inferred from dependency parse patterns:

  • use, utilize, call, invoke -- USES
  • require, need, depend, import -- REQUIRES
  • contain, include, have, hold -- CONTAINS
  • implement, extend, inherit -- IMPLEMENTS
  • refine, specialize, customize -- REFINES
  • "X and Y" coordination -- SIMILAR_TO

Pluggable Strategies

Both chunking and extraction follow the protocol pattern. Any callable matching the signature works:

from qortex.online import ChunkingStrategy, ExtractionStrategy, Chunk, ExtractionResult

# Custom chunker (e.g. tiktoken-based)
class TiktokenChunker:
    def __call__(
        self, text: str, max_tokens: int = 256,
        overlap_tokens: int = 32, source_id: str = "",
    ) -> list[Chunk]:
        ...

# Custom extractor (e.g. OpenAI function calling)
class OpenAIExtractor:
    def __call__(self, text: str, domain: str = "") -> ExtractionResult:
        ...
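To make the protocol concrete, here is a toy extractor that actually runs: it tags capitalized phrases as low-confidence concepts. The stand-in result dataclasses mirror the Data Types section below so the sketch works without qortex installed; it is an illustration of the callable signature, not a real NER replacement:

```python
import re
from dataclasses import dataclass, field

# Stand-in types mirroring qortex's documented Data Types, so this
# sketch is self-contained.
@dataclass(frozen=True)
class ExtractedConcept:
    name: str
    description: str = ""
    confidence: float = 0.5

@dataclass(frozen=True)
class ExtractionResult:
    concepts: list = field(default_factory=list)
    relations: list = field(default_factory=list)

class RegexExtractor:
    """Hypothetical extractor satisfying the ExtractionStrategy
    signature: capitalized multi-word runs become concepts."""
    def __call__(self, text: str, domain: str = "") -> ExtractionResult:
        names = re.findall(r"\b[A-Z][a-zA-Z]+(?:\s+[A-Z][a-zA-Z]+)*\b", text)
        concepts = [ExtractedConcept(name=n, confidence=0.4) for n in names]
        return ExtractionResult(concepts=concepts, relations=[])

result = RegexExtractor()("The Auth Module validates JWT tokens.")
assert [c.name for c in result.concepts] == ["The Auth Module", "JWT"]
```

Because the pipeline only calls the strategy, swapping this in requires no other changes.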

Observability

Every extraction step emits OpenTelemetry spans visible in Jaeger:

extraction.spacy                    [total time]
  extraction.spacy.nlp_process      [spaCy pipeline]
  extraction.spacy.extract_entities [NER pass]
  extraction.spacy.extract_noun_chunks [noun chunks]
  extraction.spacy.deduplicate      [span merging]
  extraction.spacy.infer_relations  [dep-parse]

When QORTEX_OTEL_ENABLED=true, these spans are exported alongside the parent online_index_pipeline span from the MCP server.

Configuration

  • QORTEX_EXTRACTION (default: spacy) -- extraction strategy: spacy, llm, none
  • QORTEX_OTEL_ENABLED (default: false) -- enable OpenTelemetry span export

Data Types

@dataclass(frozen=True)
class Chunk:
    id: str       # SHA256[:16] deterministic hash
    text: str     # Chunk content
    index: int    # Position in sequence

@dataclass(frozen=True)
class ExtractedConcept:
    name: str           # e.g. "JWT Tokens"
    description: str    # One-sentence context
    confidence: float   # 0.9 (NER), 0.7 (noun chunk)

@dataclass(frozen=True)
class ExtractedRelation:
    source_name: str     # Source concept name
    target_name: str     # Target concept name
    relation_type: str   # Maps to RelationType enum
    confidence: float    # 0.5-0.8 depending on signal

@dataclass(frozen=True)
class ExtractionResult:
    concepts: list[ExtractedConcept]
    relations: list[ExtractedRelation]

Requirements

  • Python 3.11+
  • spaCy 3.7+ with en_core_web_sm (optional, for SpaCy extraction)
  • qortex-observe (optional, for OpenTelemetry span tracing)
  • qortex-ingest (optional, for LLM extraction backend)

License

MIT

Project details

Release 0.1.0 artifacts, published via Trusted Publishing (publish-online.yml on Peleke/qortex, twine/6.1.0, CPython/3.13.7):

  • qortex_online-0.1.0.tar.gz (source, 9.3 kB) -- SHA256 83a0df3b2198ce7d72c17da6f7edbc40198c1b1fe739ec0506c659ca7155f5d3
  • qortex_online-0.1.0-py3-none-any.whl (Python 3 wheel, 10.5 kB) -- SHA256 3ccf47c539a001418b7bfce1b64b3af7b82f633da9fc9c388ffe7bd1d05cf219
