Pluggable ingestors for qortex (PDF, Markdown, text).

qortex-ingest

Pluggable document ingestion for qortex: extract concepts, relations, and rules from any source into a knowledge graph.

Install

pip install qortex-ingest

With extraction backends:

pip install "qortex-ingest[anthropic]"   # Claude API extraction
pip install "qortex-ingest[pdf]"         # PDF support (pymupdf + pdfplumber)
pip install "qortex-ingest[all]"         # everything

Quick Start

from qortex.ingest import IngestionManifest
from qortex.ingest.text import TextIngestor
from qortex.ingest.backends import get_extraction_backend

# Auto-detect best available backend (Anthropic > Ollama > Stub)
backend = get_extraction_backend()

ingestor = TextIngestor(backend=backend)
manifest: IngestionManifest = ingestor.ingest(
    source_path="notes.txt",
    domain="my-project",
)

print(f"Extracted {len(manifest.concepts)} concepts, {len(manifest.edges)} relations")

What It Does

qortex-ingest converts documents into structured knowledge graph components:

  1. Chunk — Split source by format (paragraphs, headings, sentences)
  2. Extract — Two-pass LLM extraction: generalizable concepts, then illustrative examples reconciled onto parents
  3. Relate — 10 relation types: REQUIRES, USES, REFINES, IMPLEMENTS, PART_OF, SIMILAR_TO, ALTERNATIVE_TO, SUPPORTS, CHALLENGES, CONTRADICTS
  4. Assemble — Output a single IngestionManifest (the universal contract)
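
The ten relation types listed in step 3 can be pictured as a closed vocabulary. The sketch below is only illustrative: `RelationType` is a hypothetical name, and the package may represent relations differently (for example as plain strings).

```python
from enum import Enum


class RelationType(str, Enum):
    """Hypothetical enum; member names are the ten relation types listed above."""

    REQUIRES = "REQUIRES"
    USES = "USES"
    REFINES = "REFINES"
    IMPLEMENTS = "IMPLEMENTS"
    PART_OF = "PART_OF"
    SIMILAR_TO = "SIMILAR_TO"
    ALTERNATIVE_TO = "ALTERNATIVE_TO"
    SUPPORTS = "SUPPORTS"
    CHALLENGES = "CHALLENGES"
    CONTRADICTS = "CONTRADICTS"


def parse_relation(label: str) -> RelationType:
    """Normalize a label such as ' part_of ' into a RelationType.

    Raises ValueError for labels outside the ten listed types.
    """
    return RelationType(label.strip().upper())
```

A closed vocabulary like this makes it easy to reject relation labels that an extraction model invents on its own.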

Ingestors

Ingestor | Format | Chunking Strategy
TextIngestor | Plain text | Fixed-size with configurable overlap
MarkdownIngestor | Markdown | By heading hierarchy, preserves structure
SentenceBoundaryChunker | Online/streaming | Regex sentence boundaries, SHA256 IDs
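
To make the last row concrete, here is a minimal sketch of regex sentence splitting with SHA256-derived chunk IDs. It is not the package's `SentenceBoundaryChunker`: the `SketchChunk` type, the regex, and the choice to hash `source_id` together with the sentence text are all assumptions made for illustration.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchChunk:
    """Hypothetical stand-in for a chunk: a stable ID plus the sentence text."""

    id: str
    text: str


# Split on whitespace that follows ., ! or ? (a deliberately simple heuristic).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentence_chunks(text: str, source_id: str = "") -> list[SketchChunk]:
    """Return one chunk per sentence, with a deterministic SHA256 hex ID."""
    chunks: list[SketchChunk] = []
    for sentence in _SENTENCE_END.split(text.strip()):
        if not sentence:
            continue  # empty input yields a single empty string; skip it
        digest = hashlib.sha256(f"{source_id}:{sentence}".encode("utf-8")).hexdigest()
        chunks.append(SketchChunk(id=digest, text=sentence))
    return chunks
```

Because the ID depends only on the source and the sentence text, re-ingesting the same stream produces the same IDs, which is the property that makes content hashes attractive for online or streaming input.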

Pluggable Chunkers

Any callable matching ChunkingStrategy can replace the default:

from qortex.online.chunker import Chunk

def my_chunker(
    text: str,
    max_tokens: int = 256,
    overlap_tokens: int = 32,
    source_id: str = "",
) -> list[Chunk]:
    # Your custom chunking logic (tiktoken, semantic, etc.)
    ...
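
As one possible way to fill in that body, the sketch below implements fixed-size windows with overlap over whitespace-separated words. It follows the signature shown above, but `StandInChunk` is a hypothetical replacement for `qortex.online.chunker.Chunk` (whose real fields are not shown here), and counting words instead of model tokens is a simplification.

```python
from dataclasses import dataclass


@dataclass
class StandInChunk:
    """Hypothetical stand-in; the real Chunk class may have different fields."""

    id: str
    text: str


def whitespace_chunker(
    text: str,
    max_tokens: int = 256,
    overlap_tokens: int = 32,
    source_id: str = "",
) -> list[StandInChunk]:
    """Split text into windows of max_tokens words that overlap by overlap_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap_tokens < max_tokens:
        raise ValueError("overlap_tokens must be in [0, max_tokens)")

    words = text.split()
    step = max_tokens - overlap_tokens  # how far each window advances
    chunks: list[StandInChunk] = []
    for index, start in enumerate(range(0, len(words), step)):
        window = words[start:start + max_tokens]
        chunks.append(StandInChunk(id=f"{source_id}:{index}", text=" ".join(window)))
        if start + max_tokens >= len(words):
            break  # this window reached the end; avoid a redundant tail chunk
    return chunks
```

Swapping the word split for a real tokenizer would change only the `words` line; the windowing logic stays the same.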

Extraction Backends

Backend | Cost | Features
AnthropicExtractionBackend | ~$0.60/57KB | Full extraction: concepts, relations, rules, code examples
OllamaExtractionBackend | Free (local) | Concepts, relations, rules (no code examples)
StubLLMBackend | Free | Testing only; returns configured fixtures

Auto-detection priority: Anthropic (if ANTHROPIC_API_KEY set) > Ollama (if reachable) > Stub.

Output: IngestionManifest

The manifest is the universal contract between ingestion and the knowledge graph:

@dataclass
class IngestionManifest:
    source: SourceMetadata        # origin info + stats
    domain: str                   # knowledge domain name
    concepts: list[ConceptNode]   # extracted concepts with embeddings
    edges: list[ConceptEdge]      # typed relations between concepts
    rules: list[ExplicitRule]     # best practices, warnings, principles
    code_examples: list[CodeExample]  # linked to concepts and rules

Requirements

  • Python 3.11+
  • qortex (for core models — IngestionManifest, ConceptNode, etc.)
  • anthropic (optional, for Claude extraction)
  • pymupdf + pdfplumber (optional, for PDF support)

License

MIT
