svo-client
Async Python client for SVO Semantic Chunker microservice.
Installation
pip install svo-client
Quick Start
Text Chunking
import asyncio

from svo_client import ChunkerClient


async def main():
    async with ChunkerClient(
        host="localhost", port=8009,
        cert="client.crt", key="client.key", ca="ca.crt",
    ) as client:
        chunks = await client.chunk(["Your text here."])
        for text_chunks in chunks:
            for chunk in text_chunks:
                print(chunk.text)


asyncio.run(main())
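The nested return shape (one list of chunks per input text) is easy to misread. The sketch below flattens it into plain strings, assuming only that each chunk exposes a `.text` attribute as in the example above. `texts_per_input` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for `SemanticChunk`, so the snippet runs without a server.

```python
from types import SimpleNamespace
from typing import Any, List, Sequence


def texts_per_input(chunks: Sequence[Sequence[Any]]) -> List[List[str]]:
    """Return chunk texts grouped by input text (hypothetical helper).

    Assumes only that every chunk object has a `.text` attribute.
    """
    return [[chunk.text for chunk in text_chunks] for text_chunks in chunks]


# Stand-in data shaped like the documented result; not real SemanticChunk objects.
fake_result = [
    [SimpleNamespace(text="First sentence."), SimpleNamespace(text="Second sentence.")],
    [SimpleNamespace(text="Another document.")],
]
grouped = texts_per_input(fake_result)
print(grouped)
```

The outer index identifies the input text and the inner index the chunk within it, so `grouped[0]` holds every chunk text produced from the first input.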
File Chunking
async with ChunkerClient(
    host="localhost", port=8009,
    cert=cert, key=key, ca=ca,
) as client:
    # Any file, any format: the server handles processing.
    # Default: no timeout limit (server can work up to 1 hour).
    chunks = await client.chunk_file(
        filepath="/path/to/document.pdf",
        filter_name="plain_text",
    )

    # Explicit timeout limit (optional):
    chunks = await client.chunk_file(
        filepath="/path/to/document.pdf",
        timeout=1800,  # 30 min max
    )
API Reference
ChunkerClient
ChunkerClient(
    *,
    config: Optional[Dict[str, Any]] = None,
    host: str = "localhost",
    port: int = 8009,
    cert: Optional[str] = None,
    key: Optional[str] = None,
    ca: Optional[str] = None,
    token: Optional[str] = None,
    token_header: str = "X-API-Key",
    check_hostname: bool = False,
    timeout: Optional[float] = None,
)
| Parameter | Type | Default | Description |
|---|---|---|---|
| config | Optional[Dict] | None | Pre-built config; if set, other args ignored |
| host | str | "localhost" | Server host |
| port | int | 8009 | Server port |
| cert | Optional[str] | None | Client certificate path (mTLS) |
| key | Optional[str] | None | Client key path (mTLS) |
| ca | Optional[str] | None | CA certificate path (mTLS) |
| token | Optional[str] | None | API key for authentication |
| token_header | str | "X-API-Key" | HTTP header for API key |
| check_hostname | bool | False | Verify SSL hostname |
| timeout | Optional[float] | None | Default request timeout (seconds); None = no timeout |
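Because every constructor argument is keyword-only, connection settings can be collected in a dict and unpacked into the call. The sketch below builds such a dict from environment variables. The `SVO_*` variable names and the `client_kwargs_from_env` helper are hypothetical; only the keyword names come from the signature above. The rule that a certificate and key must be supplied together is an assumption of this sketch, not documented client behavior.

```python
import os
from typing import Any, Dict, Mapping, Optional


def client_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build ChunkerClient keyword arguments from environment variables.

    The SVO_* names are hypothetical; adapt them to your deployment.
    """
    source = os.environ if env is None else env
    kwargs: Dict[str, Any] = {
        "host": source.get("SVO_HOST", "localhost"),
        "port": int(source.get("SVO_PORT", "8009")),
    }
    cert, key, ca = (source.get(f"SVO_{name}") for name in ("CERT", "KEY", "CA"))
    # Assumption: a client certificate without its key (or the reverse) is a mistake.
    if bool(cert) != bool(key):
        raise ValueError("SVO_CERT and SVO_KEY must be set together")
    optional = (("cert", cert), ("key", key), ("ca", ca), ("token", source.get("SVO_TOKEN")))
    for name, value in optional:
        if value:
            kwargs[name] = value
    return kwargs


example = client_kwargs_from_env({"SVO_HOST": "chunker.internal", "SVO_TOKEN": "secret"})
print(example)
```

The result would then be passed as `ChunkerClient(**client_kwargs_from_env())`; unset options are simply left out so the constructor defaults apply.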
Methods
chunk(texts, use_sv=False, timeout=0.0, verify_integrity=False, **params)
Chunk a list of texts via WebSocket. Returns List[List[SemanticChunk]] —
one list of chunks per input text.
| Parameter | Type | Default | Description |
|---|---|---|---|
| texts | List[str] | required | Texts to chunk |
| use_sv | bool | False | Use semantic verification |
| timeout | float | 0.0 | Timeout in seconds; 0 = no limit |
| verify_integrity | bool | False | Check text integrity after chunking |
| **params | Any | - | Additional chunk parameters |
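Since chunk() returns one list of chunks per input text, large inputs can be sent in slices and the results concatenated without losing the input order. The following is a minimal sketch of that idea: `chunk_in_batches` and its `batch_size` parameter are hypothetical, and `FakeClient` is a stand-in that mimics only the documented return shape so the example runs without a server.

```python
import asyncio
from typing import Any, List, Sequence


async def chunk_in_batches(
    client: Any, texts: Sequence[str], batch_size: int = 16, **chunk_kwargs: Any
) -> List[List[Any]]:
    """Call client.chunk() on slices of `texts` and concatenate the results.

    Relies on the documented contract: one list of chunks per input text.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    results: List[List[Any]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        results.extend(await client.chunk(batch, **chunk_kwargs))
    return results


class FakeClient:
    """Stand-in for ChunkerClient; records calls and returns one fake chunk per text."""

    def __init__(self) -> None:
        self.calls: List[List[str]] = []

    async def chunk(self, texts: List[str], **params: Any) -> List[List[str]]:
        self.calls.append(texts)
        return [[text.upper()] for text in texts]


fake = FakeClient()
out = asyncio.run(chunk_in_batches(fake, ["a", "b", "c"], batch_size=2, use_sv=False))
print(out, fake.calls)
```

Batches are awaited one after another here, which keeps the load on the server predictable; whether concurrent requests are safe on a single client is not stated in this document.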
file(*, filepath=None, filename=None, file_content=None, filter_name="plain_text", timeout=None)
Send a file to the server for text extraction. Returns FileResponse.
Two input modes:
- CLI channel: provide filepath (server reads the file).
- API channel: provide filename + file_content (bytes; base64-encoded internally).
| Parameter | Type | Default | Description |
|---|---|---|---|
| filepath | Optional[str] | None | Local file path (CLI channel) |
| filename | Optional[str] | None | Filename (API channel) |
| file_content | Optional[bytes] | None | Raw file bytes (API channel) |
| filter_name | str | "plain_text" | Server-side filter name |
| timeout | Optional[float] | None | Per-call timeout in seconds |
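The two input modes differ only in which keyword arguments are passed. As an illustration, the hypothetical helper below builds the arguments for either mode from a local path: in the API-channel case it reads the bytes itself and passes them raw, since the client is described as handling base64 encoding internally.

```python
import tempfile
from pathlib import Path
from typing import Any, Dict


def file_call_kwargs(path: str, *, upload: bool, filter_name: str = "plain_text") -> Dict[str, Any]:
    """Build keyword arguments for file() in one of the two documented modes.

    upload=False: CLI channel, pass only the path.
    upload=True:  API channel, read the bytes locally and pass filename + file_content.
    """
    if upload:
        source = Path(path)
        return {
            "filename": source.name,
            "file_content": source.read_bytes(),  # raw bytes, not base64
            "filter_name": filter_name,
        }
    return {"filepath": path, "filter_name": filter_name}


# Demonstrate both modes against a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "note.txt"
    sample.write_bytes(b"hello")
    by_path = file_call_kwargs(str(sample), upload=False)
    by_bytes = file_call_kwargs(str(sample), upload=True)

print(sorted(by_path), sorted(by_bytes))
```

Either dict would be unpacked into the call, for example `await client.file(**by_bytes)`.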
chunk_file(*, filepath=None, filename=None, file_content=None, filter_name="plain_text", use_sv=False, timeout=0, verify_integrity=False, **chunk_params)
Convenience method: file() + chunk() in one call. Returns
List[List[SemanticChunk]].
Any file format supported (PDF, DOCX, images, text, markdown, etc.) — server handles all processing.
| Parameter | Type | Default | Description |
|---|---|---|---|
| filepath | Optional[str] | None | Local file path |
| filename | Optional[str] | None | Filename for API channel |
| file_content | Optional[bytes] | None | Raw file bytes |
| filter_name | str | "plain_text" | Server-side filter name |
| use_sv | bool | False | Use semantic verification |
| timeout | Optional[float] | 0 | Timeout in seconds; 0 = no limit |
| verify_integrity | bool | False | Check text integrity |
| **chunk_params | Any | - | Additional chunk parameters |
config(timeout=None)
Retrieve server configuration. Returns Dict[str, Any].
help_cmd(command=None, timeout=None)
Retrieve server help information. Returns Dict[str, Any].
If command is given, returns help for that specific command.
health()
Health check — verifies the server is up. Returns Dict[str, Any].
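A common use of health() is to wait for the server to come up before sending work. The sketch below polls it with a fixed delay. `wait_until_healthy` is a hypothetical helper; it catches `Exception` broadly because it makes no assumption about which error health() raises while the server is unavailable. `FlakyClient` and its `{"status": "ok"}` payload are invented stand-ins, not the real response format.

```python
import asyncio
from typing import Any, Dict, Optional


async def wait_until_healthy(client: Any, attempts: int = 5, delay: float = 1.0) -> Dict[str, Any]:
    """Poll client.health() until it succeeds or the attempts run out."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return await client.health()
        except Exception as exc:  # narrow this to the client's error types in real code
            last_error = exc
            if attempt < attempts - 1:
                await asyncio.sleep(delay)
    raise RuntimeError(f"server not healthy after {attempts} attempts") from last_error


class FlakyClient:
    """Stand-in client whose health() fails a fixed number of times, then succeeds."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    async def health(self) -> Dict[str, Any]:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("server not ready")
        return {"status": "ok"}  # made-up payload for the demo


flaky = FlakyClient(failures=2)
result = asyncio.run(wait_until_healthy(flaky, attempts=5, delay=0.01))
print(result, flaky.calls)
```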
open_ws_channel(receive_timeout=60.0, heartbeat=30.0)
Open a bidirectional WebSocket channel for multiple requests.
Returns BidirectionalWsChannel.
| Parameter | Type | Default | Description |
|---|---|---|---|
| receive_timeout | float | 60.0 | Per-message receive timeout (seconds) |
| heartbeat | float | 30.0 | WebSocket keepalive interval (seconds) |
close()
Close the underlying client connection. Also available via async context
manager (async with).
Long-Running File Operations
File processing is entirely server-side. The client sends the file as-is (any format: PDF, DOCX, images, text, markdown, etc.).
- chunk_file() defaults to no timeout limit: the client waits as long as the server needs (up to 1 hour for large files).
- Results arrive via WebSocket; the adapter heartbeat (30s) keeps the connection alive.
- To set an explicit limit, pass timeout=N (seconds):
# Wait up to 30 minutes:
chunks = await client.chunk_file(
    filepath="/path/to/large.pdf",
    timeout=1800,
)
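The convention used here (0 means no limit) differs from asyncio's, where `asyncio.wait_for` with a timeout of 0 expires immediately and `None` means no limit. If an outer deadline is wrapped around a call, the value needs translating first. A minimal sketch, with hypothetical helper names and a sleeping coroutine standing in for a long chunk_file() call:

```python
import asyncio
from typing import Any, Awaitable, Optional


def to_wait_for_timeout(timeout: Optional[float]) -> Optional[float]:
    """Map "0 or None = no limit" onto asyncio's "None = no limit"."""
    if timeout is None or timeout == 0:
        return None
    if timeout < 0:
        raise ValueError("timeout must be non-negative")
    return float(timeout)


async def run_with_deadline(awaitable: Awaitable[Any], timeout: Optional[float]) -> Any:
    """Await `awaitable`, enforcing the deadline on the caller's side."""
    return await asyncio.wait_for(awaitable, to_wait_for_timeout(timeout))


async def slow_job() -> str:
    # Stand-in for a long-running request.
    await asyncio.sleep(0.05)
    return "done"


async def demo() -> tuple:
    unlimited = await run_with_deadline(slow_job(), 0)  # 0 means: wait indefinitely
    try:
        await run_with_deadline(slow_job(), 0.01)
        timed_out = False
    except asyncio.TimeoutError:
        timed_out = True
    return unlimited, timed_out


unlimited, timed_out = asyncio.run(demo())
print(unlimited, timed_out)
```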
CLI Reference
# Chunk text
svo-chunker chunk --text "Your text here" [--use-sv] [--type Draft]
# Chunk multiple texts (batch)
svo-chunker chunk-batch --text "Text one" --text "Text two"
# Process file (extraction + optional chunking)
svo-chunker file --filepath /path/to/file [--filter plain_text] [--chunk]
# Server configuration
svo-chunker config
# Health check
svo-chunker health
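When the CLI is driven from a script, building the argument list programmatically avoids shell-quoting mistakes. The sketch below assembles the `chunk` command using only the flags shown above. `chunk_argv` is a hypothetical helper, and the command's output format is not described here, so the example stops at constructing the arguments.

```python
import shlex
from typing import List, Optional


def chunk_argv(text: str, use_sv: bool = False, chunk_type: Optional[str] = None) -> List[str]:
    """Build the argv list for `svo-chunker chunk` from the documented flags."""
    argv = ["svo-chunker", "chunk", "--text", text]
    if use_sv:
        argv.append("--use-sv")
    if chunk_type is not None:
        argv += ["--type", chunk_type]
    return argv


argv = chunk_argv("Your text here", use_sv=True, chunk_type="Draft")
# shlex.join shows the equivalent shell command with safe quoting.
print(shlex.join(argv))
```

The list can be passed directly to `subprocess.run(argv, check=True)`, which bypasses the shell entirely.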
Error Handling
All exceptions are importable from svo_client:
from svo_client import SVOServerError, SVOTimeoutError, SVOFileError
| Exception | When Raised |
|---|---|
| SVOServerError | Server application-level error |
| SVOChunkingIntegrityError | Text integrity check fails (subclass of SVOServerError) |
| SVOJSONRPCError | JSON-RPC error response |
| SVOHTTPError | HTTP error or invalid response |
| SVOWebSocketRequiredError | WebSocket required but unavailable |
| SVOConnectionError | Network/connection issues |
| SVOTimeoutError | Request timeout exceeded |
| SVOEmbeddingError | Embedding service error |
| SVOFileError | Base file-command error |
| SVOFilePayloadError | Invalid file payload (subclass of SVOFileError) |
| SVOFileTypeError | Unknown filter_name (subclass of SVOFileError) |
| SVOFileNotFoundError | File not found (subclass of SVOFileError) |
| SVOFilePermissionError | Permission denied (subclass of SVOFileError) |
| SVOFileReadError | OS-level file read error (subclass of SVOFileError) |
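Because some of these exceptions are subclasses of others, the order of `except` clauses matters: the most specific type has to come first, or the base class will catch it. The sketch below shows that ordering with locally defined stand-in classes that mirror only the subclass relationships listed in the table. Deriving the base classes directly from `Exception` is an assumption; in real code the classes would be imported from svo_client instead.

```python
# Stand-ins mirroring the documented hierarchy; import the real ones from svo_client.
class SVOServerError(Exception): ...
class SVOChunkingIntegrityError(SVOServerError): ...
class SVOFileError(Exception): ...
class SVOFileNotFoundError(SVOFileError): ...
class SVOTimeoutError(Exception): ...


def describe(exc: Exception) -> str:
    """Classify an error, checking subclasses before their base classes."""
    try:
        raise exc
    except SVOChunkingIntegrityError:
        return "integrity check failed"
    except SVOServerError:
        return "server error"
    except SVOFileNotFoundError:
        return "file not found"
    except SVOFileError:
        return "file error"
    except SVOTimeoutError:
        return "timed out"
    # Anything else propagates to the caller unchanged.


labels = [
    describe(SVOChunkingIntegrityError()),
    describe(SVOServerError()),
    describe(SVOFileNotFoundError()),
    describe(SVOFileError()),
    describe(SVOTimeoutError()),
]
print(labels)
```

Swapping the first two clauses would make the integrity branch unreachable, since `except SVOServerError` also matches its subclass.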
Filter Names
Supported server-side filter names for file processing:
- plain_text: extract plain text content (default)
- markdown: extract and process markdown
- txt: raw text extraction