svo-client

Async Python client for SVO Semantic Chunker microservice.

Installation

pip install svo-client

Quick Start

Text Chunking

from svo_client import ChunkerClient
import asyncio

async def main():
    async with ChunkerClient(
        host="localhost", port=8009,
        cert="client.crt", key="client.key", ca="ca.crt",
    ) as client:
        chunks = await client.chunk(["Your text here."])
        for text_chunks in chunks:
            for chunk in text_chunks:
                print(chunk.text)

asyncio.run(main())

File Chunking

import asyncio
from svo_client import ChunkerClient

async def main():
    # mTLS credential paths, same files as in the Text Chunking example.
    async with ChunkerClient(
        host="localhost", port=8009,
        cert="client.crt", key="client.key", ca="ca.crt",
    ) as client:
        # Any file, any format: the server handles processing.
        # Default: no timeout limit (server can work up to 1 hour).
        chunks = await client.chunk_file(
            filepath="/path/to/document.pdf",
            filter_name="plain_text",
        )

        # Explicit timeout limit (optional):
        chunks = await client.chunk_file(
            filepath="/path/to/document.pdf",
            timeout=1800,  # 30 min max
        )

asyncio.run(main())

API Reference

ChunkerClient

ChunkerClient(
    *,
    config: Optional[Dict[str, Any]] = None,
    host: str = "localhost",
    port: int = 8009,
    cert: Optional[str] = None,
    key: Optional[str] = None,
    ca: Optional[str] = None,
    token: Optional[str] = None,
    token_header: str = "X-API-Key",
    check_hostname: bool = False,
    timeout: Optional[float] = None,
)

| Parameter | Type | Default | Description |
|---|---|---|---|
| config | Optional[Dict] | None | Pre-built config; if set, all other arguments are ignored |
| host | str | "localhost" | Server host |
| port | int | 8009 | Server port |
| cert | Optional[str] | None | Client certificate path (mTLS) |
| key | Optional[str] | None | Client key path (mTLS) |
| ca | Optional[str] | None | CA certificate path (mTLS) |
| token | Optional[str] | None | API key for authentication |
| token_header | str | "X-API-Key" | HTTP header for API key |
| check_hostname | bool | False | Verify SSL hostname |
| timeout | Optional[float] | None | Default request timeout (seconds); None = no timeout |

Methods

chunk(texts, use_sv=False, timeout=0.0, verify_integrity=False, **params)

Chunk a list of texts via WebSocket. Returns List[List[SemanticChunk]] — one list of chunks per input text.

| Parameter | Type | Default | Description |
|---|---|---|---|
| texts | List[str] | required | Texts to chunk |
| use_sv | bool | False | Use semantic verification |
| timeout | float | 0.0 | Timeout in seconds; 0 = no limit |
| verify_integrity | bool | False | Check text integrity after chunking |
| **params | Any | | Additional chunk parameters |
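
Because chunk() returns one list of chunks per input text, it can help to flatten the result while keeping track of which text each chunk came from. The helper below is a minimal sketch: `flatten_chunks` is a hypothetical name, and it relies only on the `.text` attribute used in the Quick Start. Stand-in objects replace real SemanticChunk instances so the snippet runs without a server.

```python
from types import SimpleNamespace
from typing import Any, List, Sequence, Tuple


def flatten_chunks(results: Sequence[Sequence[Any]]) -> List[Tuple[int, int, str]]:
    """Return (text_index, chunk_index, chunk_text) for every chunk."""
    rows = []
    for text_index, text_chunks in enumerate(results):
        for chunk_index, chunk in enumerate(text_chunks):
            rows.append((text_index, chunk_index, chunk.text))
    return rows


# Stand-ins with a .text attribute, shaped like the nested result of chunk().
fake_results = [
    [SimpleNamespace(text="First sentence."), SimpleNamespace(text="Second sentence.")],
    [SimpleNamespace(text="Another document.")],
]
rows = flatten_chunks(fake_results)
```

With a live client, the same helper would be applied to the value returned by `await client.chunk([...])`.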

file(*, filepath=None, filename=None, file_content=None, filter_name="plain_text", timeout=None)

Send a file to the server for text extraction. Returns FileResponse.

Two input modes:

  • CLI channel: provide filepath (server reads the file).
  • API channel: provide filename + file_content (bytes; base64-encoded internally).

| Parameter | Type | Default | Description |
|---|---|---|---|
| filepath | Optional[str] | None | Local file path (CLI channel) |
| filename | Optional[str] | None | Filename (API channel) |
| file_content | Optional[bytes] | None | Raw file bytes (API channel) |
| filter_name | str | "plain_text" | Server-side filter name |
| timeout | Optional[float] | None | Per-call timeout in seconds |
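
The two input modes differ only in which keyword arguments are passed. As a rough sketch, a small helper can build the arguments for either mode from a local path; `file_call_kwargs` and its `send_bytes` flag are hypothetical names introduced here, while the keyword names come from the signature above.

```python
import tempfile
from pathlib import Path
from typing import Any, Dict


def file_call_kwargs(path: str, *, send_bytes: bool,
                     filter_name: str = "plain_text") -> Dict[str, Any]:
    """Build keyword arguments for file() in one of the two input modes."""
    if send_bytes:
        p = Path(path)
        # API channel: pass the file name plus its raw bytes.
        return {"filename": p.name, "file_content": p.read_bytes(),
                "filter_name": filter_name}
    # CLI channel: pass only the path.
    return {"filepath": path, "filter_name": filter_name}


# Demonstrate both modes against a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "notes.txt"
    sample.write_bytes(b"hello")
    by_path = file_call_kwargs(str(sample), send_bytes=False)
    by_bytes = file_call_kwargs(str(sample), send_bytes=True)
```

The resulting dictionary would then be unpacked into the call, for example `await client.file(**by_bytes)`.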

chunk_file(*, filepath=None, filename=None, file_content=None, filter_name="plain_text", use_sv=False, timeout=0, verify_integrity=False, **chunk_params)

Convenience method: file() + chunk() in one call. Returns List[List[SemanticChunk]].

Any file format is supported (PDF, DOCX, images, text, markdown, etc.); the server handles all processing.

| Parameter | Type | Default | Description |
|---|---|---|---|
| filepath | Optional[str] | None | Local file path |
| filename | Optional[str] | None | Filename for API channel |
| file_content | Optional[bytes] | None | Raw file bytes |
| filter_name | str | "plain_text" | Server-side filter name |
| use_sv | bool | False | Use semantic verification |
| timeout | Optional[float] | 0 | Timeout in seconds; 0 = no limit |
| verify_integrity | bool | False | Check text integrity |
| **chunk_params | Any | | Additional chunk parameters |
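
Note that chunk() and chunk_file() use 0 to mean "no limit", whereas the constructor uses None for "no timeout". Application code that stores its limit as an optional number may want to normalize it before the call. The function below is one possible sketch; `to_call_timeout` is a hypothetical helper, not part of svo_client.

```python
import math
from typing import Optional


def to_call_timeout(limit: Optional[float]) -> float:
    """Map an optional limit in seconds to the 0-means-no-limit convention."""
    if limit is None:
        return 0.0  # no limit
    if not math.isfinite(limit) or limit < 0:
        raise ValueError(f"timeout must be finite and non-negative, got {limit!r}")
    return float(limit)


unlimited = to_call_timeout(None)
half_hour = to_call_timeout(1800)
```

A call would then look like `await client.chunk_file(filepath=path, timeout=to_call_timeout(limit))`.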

config(timeout=None)

Retrieve server configuration. Returns Dict[str, Any].

help_cmd(command=None, timeout=None)

Retrieve server help information. Returns Dict[str, Any]. If command is given, returns help for that specific command.

health()

Health check — verifies the server is up. Returns Dict[str, Any].

open_ws_channel(receive_timeout=60.0, heartbeat=30.0)

Open a bidirectional WebSocket channel for multiple requests. Returns BidirectionalWsChannel.

| Parameter | Type | Default | Description |
|---|---|---|---|
| receive_timeout | float | 60.0 | Per-message receive timeout (seconds) |
| heartbeat | float | 30.0 | WebSocket keepalive interval (seconds) |

close()

Close the underlying client connection. Also available via async context manager (async with).

Long-Running File Operations

File processing is entirely server-side. The client sends the file as-is (any format: PDF, DOCX, images, text, markdown, etc.).

  • chunk_file() defaults to no timeout limit — the client waits as long as the server needs (up to 1 hour for large files).
  • Results arrive via WebSocket; the adapter heartbeat (30s) keeps the connection alive.
  • To set an explicit limit, pass timeout=N (seconds):
# Wait up to 30 minutes:
chunks = await client.chunk_file(
    filepath="/path/to/large.pdf",
    timeout=1800,
)

CLI Reference

# Chunk text
svo-chunker chunk --text "Your text here" [--use-sv] [--type Draft]

# Chunk multiple texts (batch)
svo-chunker chunk-batch --text "Text one" --text "Text two"

# Process file (extraction + optional chunking)
svo-chunker file --filepath /path/to/file [--filter plain_text] [--chunk]

# Server configuration
svo-chunker config

# Health check
svo-chunker health
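
When the CLI is driven from a script, building the argument list programmatically avoids shell-quoting mistakes. The sketch below assembles a `chunk-batch` invocation using only the flags shown above; `chunk_batch_argv` is a hypothetical helper name.

```python
import shlex
from typing import List, Sequence


def chunk_batch_argv(texts: Sequence[str]) -> List[str]:
    """Build the argument list for `svo-chunker chunk-batch`."""
    if not texts:
        raise ValueError("at least one text is required")
    argv = ["svo-chunker", "chunk-batch"]
    for text in texts:
        argv += ["--text", text]  # one --text flag per input
    return argv


argv = chunk_batch_argv(["Text one", "Text two"])
# Shell-safe rendering of the same command, useful for logging.
command_line = shlex.join(argv)
```

The list form can be passed directly to `subprocess.run(argv, check=True)` without going through a shell.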

Error Handling

All exceptions are importable from svo_client:

from svo_client import SVOServerError, SVOTimeoutError, SVOFileError

| Exception | When Raised |
|---|---|
| SVOServerError | Server application-level error |
| SVOChunkingIntegrityError | Text integrity check fails (subclass of SVOServerError) |
| SVOJSONRPCError | JSON-RPC error response |
| SVOHTTPError | HTTP error or invalid response |
| SVOWebSocketRequiredError | WebSocket required but unavailable |
| SVOConnectionError | Network/connection issues |
| SVOTimeoutError | Request timeout exceeded |
| SVOEmbeddingError | Embedding service error |
| SVOFileError | Base file-command error |
| SVOFilePayloadError | Invalid file payload (subclass of SVOFileError) |
| SVOFileTypeError | Unknown filter_name (subclass of SVOFileError) |
| SVOFileNotFoundError | File not found (subclass of SVOFileError) |
| SVOFilePermissionError | Permission denied (subclass of SVOFileError) |
| SVOFileReadError | OS-level file read error (subclass of SVOFileError) |
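
Since several exceptions are subclasses of others, the order of `except` clauses matters: a subclass has to be caught before its base class, or the base clause will swallow it. The sketch below illustrates this with local stand-in classes that mirror part of the hierarchy, so it runs without the package; real code would import the classes from svo_client. The base class of SVOFileError is an assumption here, and `describe` is a hypothetical helper.

```python
# Stand-ins mirroring part of the documented hierarchy (not the real classes).
class SVOServerError(Exception):
    pass


class SVOChunkingIntegrityError(SVOServerError):
    pass


class SVOFileError(Exception):  # base class assumed for this sketch
    pass


class SVOFileNotFoundError(SVOFileError):
    pass


def describe(exc: Exception) -> str:
    """Classify an exception, checking subclasses before their base classes."""
    try:
        raise exc
    except SVOChunkingIntegrityError:
        return "integrity check failed"
    except SVOServerError:
        return "server error"
    except SVOFileNotFoundError:
        return "file not found"
    except SVOFileError:
        return "file error"
    except Exception:
        return "unexpected error"
```

In application code the same clause ordering would wrap the `await client.chunk_file(...)` call directly.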

Filter Names

Supported server-side filter names for file processing:

  • plain_text — extract plain text content (default)
  • markdown — extract and process markdown
  • txt — raw text extraction
