Pluggable ingestors for qortex (PDF, Markdown, text).
qortex-ingest
Pluggable document ingestion for qortex: extract concepts, relations, and rules from any source into a knowledge graph.
Install
pip install qortex-ingest
With extraction backends:
pip install "qortex-ingest[anthropic]" # Claude API extraction
pip install "qortex-ingest[pdf]" # PDF support (pymupdf + pdfplumber)
pip install "qortex-ingest[all]" # everything
Quick Start
from qortex.ingest import IngestionManifest
from qortex.ingest.text import TextIngestor
from qortex.ingest.backends import get_extraction_backend
# Auto-detect best available backend (Anthropic > Ollama > Stub)
backend = get_extraction_backend()
ingestor = TextIngestor(backend=backend)
manifest: IngestionManifest = ingestor.ingest(
    source_path="notes.txt",
    domain="my-project",
)
print(f"Extracted {len(manifest.concepts)} concepts, {len(manifest.edges)} relations")
What It Does
qortex-ingest converts documents into structured knowledge graph components:
- Chunk: Split the source by format (paragraphs, headings, sentences)
- Extract: Two-pass LLM extraction that first pulls generalizable concepts, then reconciles illustrative examples onto their parents
- Relate: 10 relation types: REQUIRES, USES, REFINES, IMPLEMENTS, PART_OF, SIMILAR_TO, ALTERNATIVE_TO, SUPPORTS, CHALLENGES, CONTRADICTS
- Assemble: Output a single IngestionManifest (the universal contract)
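To make the relation vocabulary concrete, here is a minimal sketch of the ten relation types as a Python enum. The names `RelationType` and `parse_relation` are hypothetical and chosen only for illustration; qortex's own model may be named or structured differently.

```python
from enum import Enum


class RelationType(str, Enum):
    """Hypothetical enum mirroring the ten relation types listed above."""

    REQUIRES = "REQUIRES"
    USES = "USES"
    REFINES = "REFINES"
    IMPLEMENTS = "IMPLEMENTS"
    PART_OF = "PART_OF"
    SIMILAR_TO = "SIMILAR_TO"
    ALTERNATIVE_TO = "ALTERNATIVE_TO"
    SUPPORTS = "SUPPORTS"
    CHALLENGES = "CHALLENGES"
    CONTRADICTS = "CONTRADICTS"


def parse_relation(label: str) -> RelationType:
    """Normalize a free-form label (case, spaces, hyphens) to a RelationType.

    Raises ValueError when the label is outside the ten-type vocabulary.
    """
    key = label.strip().upper().replace("-", "_").replace(" ", "_")
    return RelationType(key)
```

A closed vocabulary like this lets a caller reject labels an LLM invents instead of silently storing them as new edge types.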
Ingestors
| Ingestor | Format | Chunking Strategy |
|---|---|---|
| TextIngestor | Plain text | Fixed-size with configurable overlap |
| MarkdownIngestor | Markdown | By heading hierarchy, preserves structure |
| SentenceBoundaryChunker | Online/streaming | Regex sentence boundaries, SHA256 IDs |
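The "fixed-size with configurable overlap" strategy can be sketched as a sliding window. This is an illustration only, not TextIngestor's actual code: the function name `fixed_size_chunks` is hypothetical, and whitespace-separated words stand in for real tokens.

```python
def fixed_size_chunks(
    text: str,
    max_tokens: int = 256,
    overlap_tokens: int = 32,
) -> list[str]:
    """Split text into windows of at most max_tokens words.

    Each window repeats the last overlap_tokens words of the previous one.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap_tokens < max_tokens:
        raise ValueError("overlap_tokens must be in [0, max_tokens)")

    words = text.split()  # assumption: words approximate tokens
    step = max_tokens - overlap_tokens
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

The overlap trades some duplicated text for context: a sentence cut at a window boundary still appears whole in one of the two neighboring chunks when it is shorter than the overlap.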
Pluggable Chunkers
Any callable matching ChunkingStrategy can replace the default:
from qortex.online.chunker import Chunk


def my_chunker(
    text: str,
    max_tokens: int = 256,
    overlap_tokens: int = 32,
    source_id: str = "",
) -> list[Chunk]:
    # Your custom chunking logic (tiktoken, semantic, etc.)
    ...
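As one possible body for such a callable, the sketch below packs whole sentences into chunks and derives SHA256 IDs. It is self-contained, so it defines its own stand-in `Chunk` dataclass; the fields `id`, `text`, and `source_id` are assumptions and may not match `qortex.online.chunker.Chunk`. Words again stand in for tokens.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Stand-in for qortex.online.chunker.Chunk; these fields are assumptions."""

    id: str
    text: str
    source_id: str


# Split on whitespace that follows sentence-ending punctuation.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentence_chunker(
    text: str,
    max_tokens: int = 256,
    overlap_tokens: int = 32,  # accepted for signature compatibility, unused here
    source_id: str = "",
) -> list[Chunk]:
    """Pack whole sentences into chunks of at most max_tokens words.

    A single sentence longer than max_tokens becomes its own oversized chunk.
    """
    sentences = [s for s in _SENTENCE_END.split(text.strip()) if s]
    groups: list[list[str]] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_tokens:
            groups.append(current)
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        groups.append(current)

    chunks: list[Chunk] = []
    for group in groups:
        body = " ".join(group)
        # Hashing source and content makes IDs stable across re-ingestion.
        digest = hashlib.sha256(f"{source_id}:{body}".encode("utf-8")).hexdigest()
        chunks.append(Chunk(id=digest, text=body, source_id=source_id))
    return chunks
```

Content-derived IDs mean that ingesting the same text from the same source twice yields the same chunk IDs, which makes deduplication straightforward.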
Extraction Backends
| Backend | Cost | Features |
|---|---|---|
| AnthropicExtractionBackend | ~$0.60/57KB | Full extraction: concepts, relations, rules, code examples |
| OllamaExtractionBackend | Free (local) | Concepts, relations, rules (no code examples) |
| StubLLMBackend | Free | Testing only; returns configured fixtures |
Auto-detection priority: Anthropic (if ANTHROPIC_API_KEY is set) > Ollama (if reachable) > Stub.
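The priority order can be sketched as a small selection function. This is not the body of `get_extraction_backend`; `pick_backend_name` and its parameters are hypothetical, the reachability check is injected so the logic stays testable, and treating an empty key as unset is an assumption.

```python
import os
from collections.abc import Callable, Mapping


def pick_backend_name(
    env: Mapping[str, str] = os.environ,
    ollama_reachable: Callable[[], bool] = lambda: False,
) -> str:
    """Return "anthropic", "ollama", or "stub" following the stated priority."""
    # Assumption: an empty ANTHROPIC_API_KEY counts as unset.
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    # Only probe Ollama when no API key is available.
    if ollama_reachable():
        return "ollama"
    return "stub"
```

Because the stub is the last fallback, a run with no API key and no local Ollama server still completes, but returns fixtures and not real extractions, so it is worth logging which backend was chosen.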
Output: IngestionManifest
The manifest is the universal contract between ingestion and the knowledge graph:
from dataclasses import dataclass


@dataclass
class IngestionManifest:
    source: SourceMetadata            # origin info + stats
    domain: str                       # knowledge domain name
    concepts: list[ConceptNode]       # extracted concepts with embeddings
    edges: list[ConceptEdge]          # typed relations between concepts
    rules: list[ExplicitRule]         # best practices, warnings, principles
    code_examples: list[CodeExample]  # linked to concepts and rules
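A consumer of the manifest might start by summarizing and sanity-checking it. The sketch below uses reduced stand-in dataclasses so it runs on its own; the field names on `ConceptNode` and `ConceptEdge` (`id`, `name`, `source_id`, `target_id`, `relation_type`) are assumptions, not the qortex definitions, and `Manifest` omits `source`, `rules`, and `code_examples`.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ConceptNode:
    id: str    # assumed field
    name: str  # assumed field


@dataclass
class ConceptEdge:
    source_id: str      # assumed field
    target_id: str      # assumed field
    relation_type: str  # assumed field, e.g. "REQUIRES"


@dataclass
class Manifest:
    """Reduced stand-in for IngestionManifest."""

    domain: str
    concepts: list[ConceptNode] = field(default_factory=list)
    edges: list[ConceptEdge] = field(default_factory=list)


def relation_counts(manifest: Manifest) -> Counter:
    """Count edges per relation type."""
    return Counter(edge.relation_type for edge in manifest.edges)


def dangling_edges(manifest: Manifest) -> list[ConceptEdge]:
    """Return edges with an endpoint that is not among the manifest's concepts."""
    known = {concept.id for concept in manifest.concepts}
    return [
        edge
        for edge in manifest.edges
        if edge.source_id not in known or edge.target_id not in known
    ]
```

Checking for dangling edges before loading a manifest into a graph store catches extraction output that references concepts the first pass never produced.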
Requirements
- Python 3.11+
- qortex (for core models: IngestionManifest, ConceptNode, etc.)
- anthropic (optional, for Claude extraction)
- pymupdf + pdfplumber (optional, for PDF support)
License
MIT