svo-client

Async Python client for SVO Semantic Chunker microservice.

Installation

pip install svo-client

Quick Start

Text Chunking

from svo_client import ChunkerClient
import asyncio

async def main():
    async with ChunkerClient(
        host="localhost", port=8009,
        cert="client.crt", key="client.key", ca="ca.crt",
    ) as client:
        chunks = await client.chunk(["Your text here."])
        for text_chunks in chunks:
            for chunk in text_chunks:
                print(chunk.text)

asyncio.run(main())

File Chunking

import asyncio
from svo_client import ChunkerClient

async def main():
    async with ChunkerClient(
        host="localhost", port=8009,
        cert="client.crt", key="client.key", ca="ca.crt",
    ) as client:
        # Any file, any format; the server handles processing.
        # Default: no timeout limit (the server can work for up to 1 hour).
        chunks = await client.chunk_file(
            filepath="/path/to/document.pdf",
            filter_name="plain_text",
        )

        # Explicit timeout limit (optional):
        chunks = await client.chunk_file(
            filepath="/path/to/document.pdf",
            timeout=1800,  # 30 min max
        )

asyncio.run(main())

API Reference

ChunkerClient

ChunkerClient(
    *,
    config: Optional[Dict[str, Any]] = None,
    host: str = "localhost",
    port: int = 8009,
    cert: Optional[str] = None,
    key: Optional[str] = None,
    ca: Optional[str] = None,
    token: Optional[str] = None,
    token_header: str = "X-API-Key",
    check_hostname: bool = False,
    timeout: Optional[float] = None,
)

| Parameter | Type | Default | Description |
|---|---|---|---|
| config | Optional[Dict] | None | Pre-built config; if set, other args ignored |
| host | str | "localhost" | Server host |
| port | int | 8009 | Server port |
| cert | Optional[str] | None | Client certificate path (mTLS) |
| key | Optional[str] | None | Client key path (mTLS) |
| ca | Optional[str] | None | CA certificate path (mTLS) |
| token | Optional[str] | None | API key for authentication |
| token_header | str | "X-API-Key" | HTTP header for API key |
| check_hostname | bool | False | Verify SSL hostname |
| timeout | Optional[float] | None | Default request timeout (seconds); None = no timeout |
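
As a rough illustration of how the constructor arguments fit together, the sketch below assembles keyword arguments for ChunkerClient from environment-style settings. The helper client_kwargs_from_env and the SVO_* variable names are hypothetical and not part of svo-client; only the keyword names (host, port, token, cert, key, ca) come from the signature above.

```python
import os
from typing import Any, Dict, Mapping, Optional


def client_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build ChunkerClient keyword arguments from environment-style settings.

    The SVO_* variable names are invented for this example.
    """
    env = os.environ if env is None else env
    kwargs: Dict[str, Any] = {
        "host": env.get("SVO_HOST", "localhost"),
        "port": int(env.get("SVO_PORT", "8009")),
    }
    # Pass only the credentials that are actually configured.
    for name in ("token", "cert", "key", "ca"):
        value = env.get(f"SVO_{name.upper()}")
        if value:
            kwargs[name] = value
    return kwargs


# Usage (requires svo-client to be installed):
#     from svo_client import ChunkerClient
#     async with ChunkerClient(**client_kwargs_from_env()) as client:
#         ...
```

Leaving unset credentials out of the dictionary lets the constructor fall back to its documented defaults (None).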

Methods

chunk(texts, use_sv=False, timeout=0.0, verify_integrity=False, **params)

Chunk a list of texts via WebSocket. Returns List[List[SemanticChunk]] — one list of chunks per input text.

| Parameter | Type | Default | Description |
|---|---|---|---|
| texts | List[str] | required | Texts to chunk |
| use_sv | bool | False | Use semantic verification |
| timeout | float | 0.0 | Timeout in seconds; 0 = no limit |
| verify_integrity | bool | False | Check text integrity after chunking |
| **params | Any | | Additional chunk parameters |
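
The nested return shape is easy to misread, so here is a minimal sketch of walking it. The helper chunk_to_strings and the FakeClient stand-in are hypothetical; FakeClient exists only so the example runs without a server, and it assumes nothing about chunks beyond the .text attribute used in the Quick Start.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, List


async def chunk_to_strings(client: Any, texts: List[str]) -> List[List[str]]:
    """Return the chunk texts grouped per input text."""
    nested = await client.chunk(texts)
    # nested[i] holds the chunks produced for texts[i].
    return [[chunk.text for chunk in text_chunks] for text_chunks in nested]


class FakeClient:
    """Stand-in used only to run this example without a server."""

    async def chunk(self, texts: List[str], **params: Any) -> List[List[Any]]:
        # Pretend each sentence becomes one chunk.
        return [
            [SimpleNamespace(text=part.strip() + ".") for part in text.split(".") if part.strip()]
            for text in texts
        ]


result = asyncio.run(chunk_to_strings(FakeClient(), ["One. Two.", "Three."]))
print(result)  # [['One.', 'Two.'], ['Three.']]
```

With a real ChunkerClient in place of FakeClient, the outer list should still line up index by index with the input texts.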

file(*, filepath=None, filename=None, file_content=None, filter_name="plain_text", timeout=None)

Send a file to the server for text extraction. Returns FileResponse.

Two input modes:

  • CLI channel: provide filepath (server reads the file).
  • API channel: provide filename + file_content (bytes; base64-encoded internally).

| Parameter | Type | Default | Description |
|---|---|---|---|
| filepath | Optional[str] | None | Local file path (CLI channel) |
| filename | Optional[str] | None | Filename (API channel) |
| file_content | Optional[bytes] | None | Raw file bytes (API channel) |
| filter_name | str | "plain_text" | Server-side filter name |
| timeout | Optional[float] | None | Per-call timeout in seconds |
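
To make the two input modes concrete, the sketch below builds the keyword arguments for file() in either mode. The helper file_request_kwargs is hypothetical; the keyword names are taken from the table above. Sending bytes is presumably the right choice when the server cannot read the path itself, but that is an inference from the channel descriptions rather than documented behavior.

```python
from pathlib import Path
from typing import Any, Dict


def file_request_kwargs(path: str, send_bytes: bool, filter_name: str = "plain_text") -> Dict[str, Any]:
    """Build keyword arguments for client.file() in one of the two input modes."""
    if send_bytes:
        # API channel: read the file locally and send its name plus raw bytes.
        p = Path(path)
        return {"filename": p.name, "file_content": p.read_bytes(), "filter_name": filter_name}
    # CLI channel: only the path is sent.
    return {"filepath": path, "filter_name": filter_name}


# Usage (inside an async function, with an open ChunkerClient):
#     response = await client.file(**file_request_kwargs("notes.txt", send_bytes=True))
```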

chunk_file(*, filepath=None, filename=None, file_content=None, filter_name="plain_text", use_sv=False, timeout=0, verify_integrity=False, **chunk_params)

Convenience method: file() + chunk() in one call. Returns List[List[SemanticChunk]].

Any file format is supported (PDF, DOCX, images, text, markdown, etc.); the server handles all processing.

| Parameter | Type | Default | Description |
|---|---|---|---|
| filepath | Optional[str] | None | Local file path |
| filename | Optional[str] | None | Filename for API channel |
| file_content | Optional[bytes] | None | Raw file bytes |
| filter_name | str | "plain_text" | Server-side filter name |
| use_sv | bool | False | Use semantic verification |
| timeout | Optional[float] | 0 | Timeout in seconds; 0 = no limit |
| verify_integrity | bool | False | Check text integrity |
| **chunk_params | Any | | Additional chunk parameters |
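
When several files need chunking, one possible pattern is to cap the number of requests in flight. The sketch below is an illustration only: chunk_files is a hypothetical helper, and it assumes a single client instance tolerates concurrent chunk_file() calls, which this README does not state. If that assumption does not hold, set max_concurrent=1 or use one client per task.

```python
import asyncio
from typing import Any, Dict, Iterable, List


async def chunk_files(client: Any, paths: Iterable[str], max_concurrent: int = 2) -> Dict[str, List[Any]]:
    """Run chunk_file() over several paths with a cap on parallel requests."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def one(path: str) -> List[Any]:
        # At most max_concurrent coroutines get past this point at once.
        async with semaphore:
            return await client.chunk_file(filepath=path)

    path_list = list(paths)
    results = await asyncio.gather(*(one(p) for p in path_list))
    # Map each path to its result; duplicate paths collapse to one entry.
    return dict(zip(path_list, results))


# Usage (inside an async function, with an open ChunkerClient):
#     by_path = await chunk_files(client, ["a.pdf", "b.docx"], max_concurrent=2)
```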

config(timeout=None)

Retrieve server configuration. Returns Dict[str, Any].

help_cmd(command=None, timeout=None)

Retrieve server help information. Returns Dict[str, Any]. If command is given, returns help for that specific command.

health()

Health check — verifies the server is up. Returns Dict[str, Any].
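
A health check is often used to wait for the server before sending work. The following is a minimal sketch of that idea; wait_until_healthy is a hypothetical helper, and it catches Exception broadly only to stay self-contained. In real code it would likely be narrowed to the connection-related exceptions listed under Error Handling.

```python
import asyncio
from typing import Any, Dict


async def wait_until_healthy(client: Any, attempts: int = 5, delay: float = 1.0) -> Dict[str, Any]:
    """Call client.health() until it succeeds or the attempts run out."""
    last_error: Exception = RuntimeError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await client.health()
        except Exception as exc:  # narrow this in real code
            last_error = exc
            # Sleep between attempts, but not after the final one.
            if attempt < attempts - 1:
                await asyncio.sleep(delay)
    raise last_error


# Usage (inside an async function, with an open ChunkerClient):
#     info = await wait_until_healthy(client, attempts=10, delay=2.0)
```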

open_ws_channel(receive_timeout=60.0, heartbeat=30.0)

Open a bidirectional WebSocket channel for multiple requests. Returns BidirectionalWsChannel.

| Parameter | Type | Default | Description |
|---|---|---|---|
| receive_timeout | float | 60.0 | Per-message receive timeout (seconds) |
| heartbeat | float | 30.0 | WebSocket keepalive interval (seconds) |

close()

Close the underlying client connection. Also available via async context manager (async with).

Long-Running File Operations

File processing is entirely server-side. The client sends the file as-is (any format: PDF, DOCX, images, text, markdown, etc.).

  • chunk_file() defaults to no timeout limit — the client waits as long as the server needs (up to 1 hour for large files).
  • Results arrive via WebSocket; the adapter heartbeat (30s) keeps the connection alive.
  • To set an explicit limit, pass timeout=N (seconds):

# Wait up to 30 minutes:
chunks = await client.chunk_file(
    filepath="/path/to/large.pdf",
    timeout=1800,
)
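
If a fixed limit turns out to be too short, one option is to retry with a longer one. The sketch below illustrates that under stated assumptions: chunk_file_escalating is a hypothetical helper, the timeout values are arbitrary, and it assumes that retrying after a timeout is safe. Whether the server keeps working on the abandoned request is not documented here, so a retry may duplicate work.

```python
from typing import Any, List, Sequence, Tuple, Type


async def chunk_file_escalating(
    client: Any,
    filepath: str,
    timeouts: Sequence[float] = (300, 1800),
    timeout_errors: Tuple[Type[BaseException], ...] = (TimeoutError,),
) -> List[Any]:
    """Try chunk_file() with each timeout in turn, re-raising after the last one."""
    if not timeouts:
        raise ValueError("timeouts must not be empty")
    for i, timeout in enumerate(timeouts):
        try:
            return await client.chunk_file(filepath=filepath, timeout=timeout)
        except timeout_errors:
            # Give up only when the longest timeout has also failed.
            if i == len(timeouts) - 1:
                raise
    raise AssertionError("unreachable")


# Usage (inside an async function; SVOTimeoutError comes from svo_client):
#     chunks = await chunk_file_escalating(
#         client, "/path/to/large.pdf", timeout_errors=(SVOTimeoutError,)
#     )
```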

CLI Reference

# Chunk text
svo-chunker chunk --text "Your text here" [--use-sv] [--type Draft]

# Chunk multiple texts (batch)
svo-chunker chunk-batch --text "Text one" --text "Text two"

# Process file (extraction + optional chunking)
svo-chunker file --filepath /path/to/file [--filter plain_text] [--chunk]

# Server configuration
svo-chunker config

# Health check
svo-chunker health
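
The CLI can also be driven from a script. The sketch below builds the argument list for the file command using only the flags shown above; file_command and run_file_command are hypothetical helpers, and the example assumes svo-chunker is on PATH and needs no further connection flags, which this reference does not confirm. The output format is not documented here, so the text is returned unparsed.

```python
import subprocess
from typing import List


def file_command(filepath: str, filter_name: str = "plain_text", chunk: bool = False) -> List[str]:
    """Build the argument list for `svo-chunker file`."""
    argv = ["svo-chunker", "file", "--filepath", filepath, "--filter", filter_name]
    if chunk:
        argv.append("--chunk")
    return argv


def run_file_command(filepath: str, filter_name: str = "plain_text", chunk: bool = False) -> str:
    """Run the command and return its standard output as text."""
    completed = subprocess.run(
        file_command(filepath, filter_name, chunk),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.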

Error Handling

All exceptions are importable from svo_client:

from svo_client import SVOServerError, SVOTimeoutError, SVOFileError

| Exception | When Raised |
|---|---|
| SVOServerError | Server application-level error |
| SVOChunkingIntegrityError | Text integrity check fails (subclass of SVOServerError) |
| SVOJSONRPCError | JSON-RPC error response |
| SVOHTTPError | HTTP error or invalid response |
| SVOWebSocketRequiredError | WebSocket required but unavailable |
| SVOConnectionError | Network/connection issues |
| SVOTimeoutError | Request timeout exceeded |
| SVOEmbeddingError | Embedding service error |
| SVOFileError | Base file-command error |
| SVOFilePayloadError | Invalid file payload (subclass of SVOFileError) |
| SVOFileTypeError | Unknown filter_name (subclass of SVOFileError) |
| SVOFileNotFoundError | File not found (subclass of SVOFileError) |
| SVOFilePermissionError | Permission denied (subclass of SVOFileError) |
| SVOFileReadError | OS-level file read error (subclass of SVOFileError) |
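
Because several exceptions are subclasses of SVOFileError, the order of except clauses matters: the specific class has to come before its base. The sketch below demonstrates this with local stand-in classes so that it runs without the package; in real code the classes would be imported from svo_client. The base class of SVOTimeoutError is not stated in this README, so deriving it from Exception is an assumption.

```python
# Stand-ins mirroring part of the documented hierarchy (illustration only).
class SVOFileError(Exception):
    pass


class SVOFileNotFoundError(SVOFileError):
    pass


class SVOTimeoutError(Exception):  # base class assumed
    pass


def describe(exc: Exception) -> str:
    """Map an exception to a short user-facing message."""
    try:
        raise exc
    except SVOFileNotFoundError:
        # Must precede SVOFileError, which would also match.
        return "file not found"
    except SVOFileError:
        return "file could not be processed"
    except SVOTimeoutError:
        return "timed out"
    except Exception:
        return "unexpected error"


print(describe(SVOFileNotFoundError()))  # file not found
print(describe(SVOFileError()))          # file could not be processed
```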

Filter Names

Supported server-side filter names for file processing:

  • plain_text — extract plain text content (default)
  • markdown — extract and process markdown
  • txt — raw text extraction
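
One way to pick a filter is by file extension. The mapping below is purely an assumption made for illustration: this README does not say which filter suits which file type, and guess_filter_name is a hypothetical helper. Anything unrecognized falls back to the documented default, plain_text.

```python
from pathlib import Path

# Assumed mapping from file suffix to filter name (not from the package).
_FILTER_BY_SUFFIX = {".md": "markdown", ".markdown": "markdown", ".txt": "txt"}


def guess_filter_name(path: str) -> str:
    """Return a filter name for the path, defaulting to plain_text."""
    return _FILTER_BY_SUFFIX.get(Path(path).suffix.lower(), "plain_text")


print(guess_filter_name("README.md"))   # markdown
print(guess_filter_name("report.pdf"))  # plain_text
```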
