vLLM OpenAI-compatible model adapter plugin for Spakky Agent

Project description

spakky-vllm

spakky-vllm은 spakky-agent를 위한 첫 공식 IAgentModel 어댑터 패키지입니다. 코어에 모델 SDK 의존성을 넣지 않고, Spakky Agent workflow를 로컬 vLLM OpenAI-compatible HTTP 엔드포인트에 연결합니다.

언제 필요한가

애플리케이션이 로컬 vLLM 서버를 대상으로 @Agent workflow를 실행하고, 코어 IAgentModel 포트를 통해 모델 구현체를 주입해야 할 때 사용합니다.

설치

pip install spakky-vllm

durable Agent 실행에는 spakky-agent와 spakky-sqlalchemy[agent] 같은 persistence provider도 필요합니다. spakky-vllm은 모델 어댑터만 제공합니다. state, signal, evidence repository, inbound HTTP/CLI 어댑터, 운영용 in-memory persistence fallback은 제공하지 않습니다.

설정

설정은 SPAKKY_VLLM__ 환경변수 접두사를 사용하는 VllmConfig로 읽습니다.

설정	기본값	목적
`SPAKKY_VLLM__ENDPOINT_URL`	`http://127.0.0.1:8000/v1`	OpenAI-compatible API 기본 URL
`SPAKKY_VLLM__MODEL`	`default`	chat completion 요청에 전달할 model id
`SPAKKY_VLLM__REQUEST_TIMEOUT_SECONDS`	`30.0`	비스트리밍 요청 timeout
`SPAKKY_VLLM__STREAM_TIMEOUT_SECONDS`	`300.0`	스트리밍 요청 timeout
`SPAKKY_VLLM__STREAM_ENABLED`	`true`	public streaming surface 활성화 여부
`SPAKKY_VLLM__CHAT_TEMPLATE_KWARGS__ENABLE_THINKING`	미설정	vLLM chat template에 전달할 모델별 옵션 예시

chat_template_kwargs는 vLLM의 모델별 chat template 옵션을 요청 payload에 그대로 전달합니다. 예를 들어 일부 reasoning/thinking 계열 모델은 enable_thinking=false 같은 template switch를 지원하며, 짧은 검증 요청에서는 이런 옵션을 통해 응답 토큰 예산을 더 예측 가능하게 만들 수 있습니다.

플러그인 표면

플러그인을 로드하면 다음 항목을 등록합니다.

VllmConfig
HttpxVllmChatClient
VllmAgentModel
명시적 IAgentModel -> VllmAgentModel binding

VllmAgentModel.complete()는 OpenAI-compatible chat completion 요청을 보내고, provider 응답을 ModelResponse로 변환합니다. structured output 요청에는 OpenAI-compatible response_format과 vLLM structured_outputs.json 제약을 함께 실어 보낸 뒤, 반환된 JSON을 파싱하고 검증해 ModelResponse.structured_output으로 노출합니다.

tool calling은 @agent_tool descriptor에서 생성된 ModelToolSpec.parameters.schema 객체를 vLLM function parameter schema로 사용합니다. required tool choice는 constrained decoding으로 취급합니다. auto와 strict tool schema를 함께 쓰는 요청은 capability error로 실패합니다. vLLM이 auto tool arguments의 schema-constrained decoding을 보장하지 않기 때문입니다. 반환된 tool arguments는 같은 schema로 파싱·검증한 뒤에만 ModelToolCall로 노출됩니다.

VllmAgentModel.stream()은 같은 요청에 stream=true를 붙여 전송하고, server-sent event chunk를 디코딩해 provider-neutral ModelStreamEvent 값을 내보냅니다.

token delta는 ModelStreamEventKind.TOKEN_DELTA
streamed function-call fragment는 tool_calls finish boundary에서 TOOL_CALL_CANDIDATE
structured JSON content는 terminal validation 이후 STRUCTURED_OUTPUT
StreamingOptions.include_usage가 켜진 경우 usage chunk는 마지막 DONE event에 첨부
timeout, transport, invalid chunk, invalid structured output, provider error, refusal, unsupported constrained decoding mode, non-success finish reason은 typed ERROR event 뒤 DONE으로 종료

Agent 구현체는 token event를 AgentYieldKind.TOKEN payload로 전달할 수 있습니다. 취소 lifecycle이 모델 호출 중단을 요구하면 async stream을 닫으면 됩니다. HTTP stream은 async generator가 소유하므로 aclose()가 underlying client stream을 해제하고, background request를 남기지 않습니다.

검증 전략

이 패키지의 테스트는 실제 vLLM 서버나 로컬 모델을 호출하지 않습니다. CI와 로컬 커밋 시간을 예측 가능하게 유지하기 위해 IVllmChatClient fake를 사용해 adapter 계약을 검증합니다.

주요 검증 범위:

OpenAI-compatible request payload 변환
structured output 요청과 JSON schema 검증
required tool calling payload와 반환 argument 검증
server-sent event chunk를 ModelStreamEvent로 변환하는 streaming adapter 동작
timeout, transport, provider error, refusal, unsupported constrained decoding mode, non-success finish reason의 typed error mapping

Project details

Release history Release notifications | RSS feed

6.8.0

Jun 15, 2026

6.7.0

Jun 14, 2026

6.6.1

Jun 14, 2026

This version

6.6.0

Jun 14, 2026

6.5.0

Jun 14, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

spakky_vllm-6.6.0.tar.gz (10.9 kB view details)

Uploaded Jun 14, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

spakky_vllm-6.6.0-py3-none-any.whl (13.7 kB view details)

Uploaded Jun 14, 2026 Python 3

File details

Details for the file spakky_vllm-6.6.0.tar.gz.

File metadata

Download URL: spakky_vllm-6.6.0.tar.gz
Upload date: Jun 14, 2026
Size: 10.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for spakky_vllm-6.6.0.tar.gz
Algorithm	Hash digest
SHA256	`ee50c74932024cdd464aeccb4e37a0b6277abce1c20410ee4eb7a0a36352ba35`
MD5	`b61a207f12093f6e991f35686fb9b8ad`
BLAKE2b-256	`1daf49e199bc75ba05536f119e1f96dd41150d542c514ec8dd18a4ad1e783af3`

See more details on using hashes here.

Provenance

The following attestation bundles were made for spakky_vllm-6.6.0.tar.gz:

Publisher: publish-package.yml on E5presso/spakky-framework

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: spakky_vllm-6.6.0.tar.gz
- Subject digest: ee50c74932024cdd464aeccb4e37a0b6277abce1c20410ee4eb7a0a36352ba35
- Sigstore transparency entry: 1817554253
- Sigstore integration time: Jun 14, 2026
Source repository:
- Permalink: E5presso/spakky-framework@a750eb66a4ec78130f7782372b5e92a25c2b9839
- Branch / Tag: refs/heads/main
- Owner: https://github.com/E5presso
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-package.yml@a750eb66a4ec78130f7782372b5e92a25c2b9839
- Trigger Event: workflow_dispatch

File details

Details for the file spakky_vllm-6.6.0-py3-none-any.whl.

File metadata

Download URL: spakky_vllm-6.6.0-py3-none-any.whl
Upload date: Jun 14, 2026
Size: 13.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for spakky_vllm-6.6.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a8d03d433240d1cb14707d5a0a71d85dea93e3b313f745e0e7cc8e5e9e32935f`
MD5	`fe198061e7e940fce47e130ba64a8eb4`
BLAKE2b-256	`69ab1555f1f4fa4f57b05d20487dab29dfdc7f735816131bf024fef4a601a34a`

See more details on using hashes here.

Provenance

The following attestation bundles were made for spakky_vllm-6.6.0-py3-none-any.whl:

Publisher: publish-package.yml on E5presso/spakky-framework

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: spakky_vllm-6.6.0-py3-none-any.whl
- Subject digest: a8d03d433240d1cb14707d5a0a71d85dea93e3b313f745e0e7cc8e5e9e32935f
- Sigstore transparency entry: 1817554416
- Sigstore integration time: Jun 14, 2026
Source repository:
- Permalink: E5presso/spakky-framework@a750eb66a4ec78130f7782372b5e92a25c2b9839
- Branch / Tag: refs/heads/main
- Owner: https://github.com/E5presso
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-package.yml@a750eb66a4ec78130f7782372b5e92a25c2b9839
- Trigger Event: workflow_dispatch

spakky-vllm 6.6.0

Navigation

Verified details

Maintainers

Meta

Unverified details

Meta

Project description

spakky-vllm

언제 필요한가

설치

설정

플러그인 표면

검증 전략

Project details

Verified details

Maintainers

Meta

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance