Quarantine your imports — configurable content classification pipeline

Maintainers

Xof

Project description

Poveglia

Quarantine your imports.

A Python library that provides a configurable pipeline of content classifiers for scanning uploaded files. Virus scanning, explicit content detection, CSAM reporting, zip bomb detection, AI-generated image detection, and more — all through a single async API.

For how the system is built internally, see ARCHITECTURE.md; for the design rationale and tradeoffs, see THEORY.md.

Quick Start

pip install poveglia

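import asyncio
from poveglia import classify, Status

# my_vision_api, my_csam_api, my_csam_reporter, reject_upload, and
# queue_for_human_review are placeholders for your own callables.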

result = asyncio.run(classify({
    "url": "s3://my-bucket/uploads/photo.jpg",
    "classifiers": ["virus", "explicit", "csam", "policy"],
    "classifier_config": {
        "explicit": {"api_callable": my_vision_api, "threshold": 0.7},
        "csam": {"api_callable": my_csam_api, "callback": my_csam_reporter},
        "policy": {"max_size_bytes": 50_000_000, "forbidden_mimetypes": ["video/*"]},
    },
    "metadata": {"user_id": "u_123", "upload_id": "up_456"},
}))

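# MANDATORY_ACTION also means the content failed, so reject it too.
if result.status in (Status.FORBID, Status.MANDATORY_ACTION):
    reject_upload(result)
# Classifier errors do not change the status, so check them explicitly.
elif result.status == Status.REVIEW or result.errors:
    queue_for_human_review(result)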

Or use the sync wrapper:

from poveglia import classify_sync

result = classify_sync({...})

How It Works

Poveglia runs classifiers in series, in the order you specify. Each classifier returns one of four statuses:

Status	Meaning	Pipeline behavior
`allow`	Content passes	Continue to next classifier
`review`	Uncertain — flag for human review	Continue to next classifier
`forbid`	Content fails	Stop pipeline
`mandatory_action`	Content fails, action required	Execute callback, then stop pipeline

The result includes a top-level status (the worst across all classifiers), per-classifier details, any actions taken, and your metadata passed through untouched.

Scoring Mode

If you want all classifiers to run regardless of failures (for ranking rather than gating):

# control is the same dict shown in Quick Start; call this from async code
result = await classify({**control, "scoring_mode": True})
# result.status is still the worst, but nothing was short-circuited
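
The series execution and scoring mode described above can be modelled in a few lines. This is a minimal sketch and not Poveglia's implementation: the Status enum here is a local stand-in, the severity ordering (allow < review < forbid < mandatory_action) is an assumption inferred from "the worst across all classifiers", and run_pipeline and the three toy classifiers are hypothetical names.

```python
import asyncio
from enum import IntEnum


class Status(IntEnum):
    # Local stand-in, ordered by assumed severity so max() picks the worst.
    ALLOW = 0
    REVIEW = 1
    FORBID = 2
    MANDATORY_ACTION = 3


async def run_pipeline(classifiers, scoring_mode=False):
    """Run (name, async_fn) pairs in order; return (worst_status, per_classifier)."""
    results = {}
    for name, fn in classifiers:
        status = await fn()
        results[name] = status
        # forbid and mandatory_action stop the pipeline unless scoring mode is on.
        if status >= Status.FORBID and not scoring_mode:
            break
    worst = max(results.values(), default=Status.ALLOW)
    return worst, results


async def allow():
    return Status.ALLOW


async def review():
    return Status.REVIEW


async def forbid():
    return Status.FORBID


steps = [("virus", allow), ("explicit", forbid), ("policy", review)]
gated = asyncio.run(run_pipeline(steps))                      # stops after "explicit"
scored = asyncio.run(run_pipeline(steps, scoring_mode=True))  # runs all three
```

In both runs the top-level status is FORBID; the difference is that the gated run never reaches "policy", while the scoring run records a status for every classifier.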

Bundled Classifiers

Detection

Name	What it detects	Optional deps
`virus`	Malware via ClamAV	`poveglia[clamav]`
`zip_bomb`	Zip bombs (compression ratio, nesting depth)	none
`explicit`	Nudity, gore, violence, suggestive content	`poveglia[vision]`
`csam`	CSAM — returns `mandatory_action` on high-confidence hits	`poveglia[vision]`
`generated`	AI-generated imagery	`poveglia[vision]`
`identifiable`	Identifiable people (faces)	`poveglia[vision]`
`policy`	File size, MIME type (extension-based)	none
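
To make "extension-based" concrete, here is a rough sketch of what a policy check of this kind might look like using only the standard library. The function name check_policy, the string statuses, and the precedence between forbidden_mimetypes and allowed_mimetypes are assumptions for illustration; the bundled policy classifier may behave differently.

```python
import fnmatch
import mimetypes


def check_policy(filename, size_bytes, config):
    """Return "allow" or "forbid" from the file name and size alone."""
    max_size = config.get("max_size_bytes")
    if max_size is not None and size_bytes > max_size:
        return "forbid"

    # Extension-based guess: the file contents are never inspected.
    mime, _ = mimetypes.guess_type(filename)
    mime = mime or "application/octet-stream"

    forbidden = config.get("forbidden_mimetypes", [])
    if any(fnmatch.fnmatch(mime, pattern) for pattern in forbidden):
        return "forbid"

    allowed = config.get("allowed_mimetypes")
    if allowed and not any(fnmatch.fnmatch(mime, pattern) for pattern in allowed):
        return "forbid"
    return "allow"
```

Because the MIME type is derived from the extension, a renamed file passes this check unchanged, which is one reason to run it alongside content-based classifiers rather than instead of them.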

Actions

These run in the pipeline like any classifier, but are also available as standalone API calls:

Name	What it does	Standalone API
`reporting`	Submits reports when classifier scores exceed thresholds	`poveglia.reporting.submit()`
`legal_hold`	Places objects on legal hold in storage	`poveglia.legal_hold.apply()`
`metadata`	Writes classification metadata to object store	`poveglia.metadata.upload()`

The Input Control Structure

{
    # Required
    "url": "s3://bucket/uploads/file.jpg",
    "classifiers": ["virus", "zip_bomb", "explicit", "csam",
                     "identifiable", "reporting", "metadata"],

    # Per-classifier configuration
    "classifier_config": {
        "explicit": {
            "api_callable": my_vision_api,  # async callable
            "threshold": 0.7,               # forbid above this
            "review_threshold": 0.4,        # review above this
        },
        "csam": {
            "api_callable": my_csam_api,
            "callback": my_csam_handler,    # fires on mandatory_action
            "threshold": 0.8,
        },
        "reporting": {
            "triggers": {"csam": 0.8, "explicit": 0.95},
            "handler": my_report_handler,
        },
        "policy": {
            "max_size_bytes": 52428800,
            "forbidden_mimetypes": ["video/*"],
            "allowed_mimetypes": ["image/*"],
        },
        "metadata": {
            "backend": my_metadata_writer,
        },
    },

    # Skip downloading — use a local copy instead
    "local_path": "/tmp/staged/file.jpg",

    # Cap bytes pulled from a remote URL (DoS guard); omit or None for no cap.
    # Exceeding it raises ContentTooLargeError, recorded in result.errors.
    "max_download_bytes": 52428800,

    # Run all classifiers, never short-circuit
    "scoring_mode": False,

    # Where transformation classifiers write output. Exposed to classifiers as
    # content.output_url; a transforming classifier writes there and returns it
    # as ClassifierResult.transformed_url (surfaced on result.transformed_url).
    "output_url": "s3://bucket/transformed/file.jpg",

    # Passed through untouched to the result
    "metadata": {"user_id": "u_123", "upload_id": "up_456"},
}

The classifiers list controls both which classifiers run and in what order. Order matters — classifiers can share results through the blackboard (see below).
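
The max_download_bytes entry in the control structure describes a cap on bytes pulled from a remote URL. One way such a guard could work is sketched below, assuming a chunked read that aborts as soon as the running total passes the cap. The ContentTooLargeError class here is a local stand-in with the same name as the documented error, and read_capped is a hypothetical helper, not part of Poveglia's API.

```python
import io


class ContentTooLargeError(Exception):
    """Local stand-in for the error named in the control structure comments."""


def read_capped(stream, max_bytes=None, chunk_size=64 * 1024):
    """Read a binary stream fully, aborting once more than max_bytes arrive."""
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        # Check before buffering so an oversized body is never held in full.
        if max_bytes is not None and total > max_bytes:
            raise ContentTooLargeError(f"exceeded {max_bytes} bytes")
        chunks.append(chunk)
    return b"".join(chunks)


payload = read_capped(io.BytesIO(b"x" * 100), max_bytes=100)
```

Checking the total per chunk, rather than after the download completes, is what makes the cap useful as a DoS guard: at most one chunk beyond the limit is ever read.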

The Result Object

result.status             # Status.ALLOW / REVIEW / FORBID / MANDATORY_ACTION
result.is_clean           # True only if status == ALLOW AND errors is empty
result.classifiers        # {"virus": ClassifierResult(...), "explicit": ClassifierResult(...)}
result.actions_taken      # [ActionRecord(classifier="reporting", action="callback", result={...})]
result.errors             # [ErrorRecord(classifier="generated", error="ServiceUnavailable", ...)]
result.transformed_url    # "s3://..." if a transformation classifier produced output
result.metadata           # {"user_id": "u_123"} — your passthrough data

Important: result.status alone is not a "safe to ship" signal. Classifier exceptions are recorded in result.errors and do not escalate the aggregate status, so a run in which every classifier raised yields Status.ALLOW with a populated errors list. Use result.is_clean as the binary pass/fail predicate, or check result.errors explicitly alongside result.status.
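
The distinction between status and is_clean can be illustrated with a small stand-in. FakeResult below is hypothetical and models only the two fields the predicate depends on, with plain strings in place of the Status enum; decide is one possible way to turn a result into an upload decision, not a prescribed one.

```python
from dataclasses import dataclass, field


@dataclass
class FakeResult:
    """Hypothetical stand-in with only the fields the predicate needs."""
    status: str
    errors: list = field(default_factory=list)

    @property
    def is_clean(self):
        # Mirrors the documented rule: ALLOW and no recorded errors.
        return self.status == "allow" and not self.errors


def decide(result):
    """Map a result to a decision, treating errors as 'not verified'."""
    if result.status in ("forbid", "mandatory_action"):
        return "reject"
    if result.status == "review" or result.errors:
        return "hold"
    return "accept"


clean = FakeResult("allow")
unverified = FakeResult("allow", errors=["ServiceUnavailable"])
```

Here decide(clean) gives "accept" while decide(unverified) gives "hold", even though both carry the same status.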

Content Access

Poveglia accesses files through a lazy content resolver. Some classifiers need only the URL (to pass to external APIs); others need the raw bytes or a local file path.

The resolver downloads only when needed, and caches the result — so if three classifiers call .bytes(), the file is downloaded once.

To avoid the download entirely, provide a local_path in the control structure pointing to a locally-staged copy.
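
The download-once behavior can be sketched as a small async cache. LazyContent is a hypothetical class written for this illustration (the real resolver's internals are not shown in this README); the downloads counter exists only to make the caching visible.

```python
import asyncio


class LazyContent:
    """Hypothetical resolver: fetches on the first bytes() call, then caches."""

    def __init__(self, fetch):
        self._fetch = fetch          # async callable returning bytes
        self._data = None
        self._lock = asyncio.Lock()
        self.downloads = 0

    async def bytes(self):
        async with self._lock:       # serialize concurrent first calls
            if self._data is None:
                self._data = await self._fetch()
                self.downloads += 1
        return self._data


async def demo():
    async def fake_fetch():
        await asyncio.sleep(0)       # stands in for network latency
        return b"file contents"

    content = LazyContent(fake_fetch)
    # Three callers ask for the bytes at the same time.
    await asyncio.gather(content.bytes(), content.bytes(), content.bytes())
    return content.downloads


download_count = asyncio.run(demo())
```

The lock matters only when several coroutines request the content concurrently; without it, each could start its own fetch before the first one finishes.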

Memory footprint

ContentResolver.bytes() holds the full content in memory for the resolver's lifetime. For small uploads (images, documents) this is fine and avoids redundant I/O. For large files (video, archives, disk images) prefer local_path() in your classifier — it materializes a temp file once and hands out paths instead of keeping bytes resident. Classifiers that shell out to external binaries (ClamAV, ffmpeg, etc.) should always use local_path() regardless of size.
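
As a rough illustration of "materializes a temp file once and hands out paths", the sketch below streams chunks to disk on first use and returns the same path afterwards. StagedFile and its cleanup method are hypothetical names; how ContentResolver manages its temp file and when it deletes it are not specified here.

```python
import os
import tempfile


class StagedFile:
    """Hypothetical helper: writes content to a temp file once, reuses the path."""

    def __init__(self, chunks):
        self._chunks = chunks        # iterable of bytes chunks
        self._path = None

    def local_path(self):
        if self._path is None:
            # delete=False keeps the file on disk after the handle is closed.
            with tempfile.NamedTemporaryFile(delete=False) as tmp:
                for chunk in self._chunks:
                    tmp.write(chunk)
                self._path = tmp.name
        return self._path

    def cleanup(self):
        if self._path is not None:
            os.unlink(self._path)
            self._path = None
```

Only one chunk is resident at a time while staging, and an external binary can then be pointed at the returned path.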

The Blackboard

Classifiers can share intermediate results through a shared context dict, avoiding redundant API calls.

For example, if explicit calls a vision API that also returns face detection data, identifiable can reuse it instead of making a second call:

# explicit classifier writes to the blackboard:
context["explicit.faces"] = [{"confidence": 0.85}, ...]

# identifiable classifier checks the blackboard first:
faces = context.get("explicit.faces")
if faces is not None:
    ...  # reuse: no API call needed

Keys follow the convention <classifier_name>.<key>. Classifiers must always work standalone if the blackboard is empty — the optimization is never a hard dependency.
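
A runnable toy version of this pattern is shown below. Everything in it is a stand-in: fake_vision_api, the response shape, and the choice to return "review" when faces are present are assumptions made for the example, and the classifiers are plain functions rather than Classifier subclasses. The api_calls counter shows when the blackboard saves a call.

```python
import asyncio

api_calls = 0


async def fake_vision_api(url):
    """Hypothetical vision API returning both explicit and face data."""
    global api_calls
    api_calls += 1
    return {"explicit_score": 0.1, "faces": [{"confidence": 0.85}]}


async def explicit(url, context):
    response = await fake_vision_api(url)
    context["explicit.faces"] = response["faces"]   # share with later classifiers
    return "forbid" if response["explicit_score"] >= 0.7 else "allow"


async def identifiable(url, context):
    faces = context.get("explicit.faces")
    if faces is None:                               # standalone path: call the API
        faces = (await fake_vision_api(url))["faces"]
    return "review" if faces else "allow"


async def demo():
    context = {}
    url = "s3://bucket/uploads/file.jpg"
    first = await explicit(url, context)
    second = await identifiable(url, context)       # reuses the blackboard entry
    alone = await identifiable(url, {})             # empty blackboard still works
    return first, second, alone


statuses = asyncio.run(demo())
```

The API is called twice in total: once by explicit and once by the standalone identifiable run. The identifiable run that shares a context makes no call of its own.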

Writing Custom Classifiers

from poveglia import Classifier, ClassifierResult, Status

class MyClassifier(Classifier):
    name = "my_check"

    async def classify(self, content, config, context):
        data = await content.bytes()

        if looks_bad(data):
            return ClassifierResult(
                status=Status.FORBID,
                detail={"reason": "failed my_check"},
            )

        return ClassifierResult(
            status=Status.ALLOW,
            detail={"clean": True},
        )

Register it as an entry point in your package's pyproject.toml:

[project.entry-points."poveglia.classifiers"]
my_check = "my_package.classifiers:MyClassifier"

Then reference it by name: "classifiers": ["virus", "my_check", "policy"].

CSAM Handling

The CSAM classifier returns mandatory_action on high-confidence hits. This means:

- The pipeline short-circuits (no further classifiers run)
- The callback you provided in classifier_config.csam.callback fires automatically
- The callback result is recorded in result.actions_taken

If no callback is configured, the classifier falls back to forbid — the content is still rejected, but no automatic reporting occurs. A warning is emitted on the poveglia.classifiers.csam logger whenever this fallback fires; route that logger at WARNING or above to your alerting channel.

For deployments where a missing callback is a compliance violation (not merely a dev-mode inconvenience), set require_callback: True in the csam config. With that flag on, a high-confidence detection without a callback raises an exception, so the misconfiguration is recorded in result.errors instead of being hidden behind a silent rejection.

Poveglia ships a reporting utility (poveglia.reporting.submit()) and a legal hold utility (poveglia.legal_hold.apply()) that you can wire up as callbacks. You are responsible for configuring and using these — Poveglia provides the tools, not the compliance.
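
The callback, fallback, and require_callback rules above amount to a small decision table, sketched here under stated assumptions. resolve_csam_status is a hypothetical function, statuses are plain strings, the 0.8 default threshold is borrowed from the example config rather than a documented default, and RuntimeError is a placeholder for whatever exception the library actually raises.

```python
import logging

logger = logging.getLogger("poveglia.classifiers.csam")


def resolve_csam_status(score, config):
    """Decide the status for a CSAM score under the fallback rules above."""
    if score < config.get("threshold", 0.8):
        return "allow"
    if config.get("callback") is not None:
        return "mandatory_action"
    if config.get("require_callback", False):
        # Surfaces as an entry in result.errors rather than a quiet forbid.
        raise RuntimeError("csam callback is required but not configured")
    logger.warning("csam hit without callback; falling back to forbid")
    return "forbid"
```

With a callback configured, a high score yields "mandatory_action"; without one it yields "forbid" plus a warning, unless require_callback is set, in which case the call raises.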

Error Handling

If a classifier raises an exception, the pipeline catches it and continues. The error is recorded in result.errors, but it doesn't stop other classifiers from running and doesn't affect the top-level status.

A failed mandatory callback (e.g., a CSAM report that couldn't be submitted) is recorded in result.actions_taken with error detail — surface this loudly so you can retry.

Principle: fail open in the pipeline, fail loud in the results.

The one exception is configuration errors. An unknown classifier name in classifiers is not caught — classify() / classify_sync() raises KeyError before any classifier runs (and before any download), so a typo'd name fails fast rather than silently producing an incomplete result. This is deliberate: a missing classifier is a programming error, not a content verdict.
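
The two behaviors described in this section, failing fast on unknown names and failing open on classifier exceptions, can be sketched together. run_all, the dict-based registry, and the error record shape are hypothetical; the real pipeline records ErrorRecord objects and also tracks statuses as described earlier.

```python
import asyncio


async def run_all(names, registry):
    """Resolve every name first, then run each classifier, collecting errors."""
    # Lookup happens up front, so an unknown name raises KeyError immediately.
    resolved = [(name, registry[name]) for name in names]

    statuses, errors = {}, []
    for name, fn in resolved:
        try:
            statuses[name] = await fn()
        except Exception as exc:
            # Fail open: record the error and keep going.
            errors.append({"classifier": name, "error": type(exc).__name__})
    return statuses, errors


async def ok():
    return "allow"


async def broken():
    raise ConnectionError("service unavailable")


registry = {"virus": ok, "generated": broken}
statuses, errors = asyncio.run(run_all(["generated", "virus"], registry))
```

The failing "generated" classifier ends up in errors while "virus" still runs and reports "allow". Passing a misspelled name such as "virsu" raises KeyError before either classifier is called.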

Installation

# Core + all classifiers (light deps only)
pip install poveglia

# With vision classifier dependencies
pip install 'poveglia[vision]'

# With ClamAV support
pip install 'poveglia[clamav]'

# With object storage support (metadata, legal_hold)
pip install 'poveglia[storage]'

# Everything
pip install 'poveglia[all]'

Requirements

- Python 3.11+
- A running ClamAV daemon (for the virus classifier)
- Vision/CSAM API credentials (for explicit, csam, generated, identifiable)

Development

# Editable install with the dev toolchain
pip install -e '.[dev]'

# Run the test suite (the "integration" marker is reserved for real-service
# tests; none exist yet, so this currently runs everything)
pytest -m "not integration"

# Lint and type-check — the same gates CI enforces
ruff check poveglia tests
mypy poveglia

CI runs lint, type-check, and tests on Python 3.11, 3.12, and 3.13 for every push and pull request; a pip-audit dependency scan runs report-only.

Releasing

Releases publish to PyPI via GitHub Actions OIDC trusted publishing — no API token is stored anywhere. Publishing a GitHub Release triggers .github/workflows/publish.yml, which builds the sdist + wheel and uploads them with attestations.

One-time setup (PyPI side): add a Trusted Publisher for project poveglia → owner Xof, repo poveglia, workflow publish.yml, environment pypi.

License

MIT

Release history

This version

1.0.0

Jun 22, 2026

Download files

Source Distribution

poveglia-1.0.0.tar.gz (41.5 kB)

Uploaded Jun 22, 2026 Source

Built Distribution

poveglia-1.0.0-py3-none-any.whl (41.8 kB)

Uploaded Jun 22, 2026 Python 3

File details

Details for the file poveglia-1.0.0.tar.gz.

File metadata

Download URL: poveglia-1.0.0.tar.gz
Upload date: Jun 22, 2026
Size: 41.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for poveglia-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`29c5b71e403f081f3f6c90ba8d496871f16a135f10b69546ed2857ce81edd951`
MD5	`94dc5e0968cf34fe99f3a19b38dcb38d`
BLAKE2b-256	`21e9a613966a3c0e1758b55f7ca5c3ee4487ae1ed5c6dacc56a09703a6a4dffe`

Provenance

The following attestation bundles were made for poveglia-1.0.0.tar.gz:

Publisher: publish.yml on Xof/poveglia

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: poveglia-1.0.0.tar.gz
- Subject digest: 29c5b71e403f081f3f6c90ba8d496871f16a135f10b69546ed2857ce81edd951
- Sigstore transparency entry: 1906107294
- Sigstore integration time: Jun 22, 2026
Source repository:
- Permalink: Xof/poveglia@6ef3c64ff343536f92269b032bbba7957ab13bf7
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/Xof
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6ef3c64ff343536f92269b032bbba7957ab13bf7
- Trigger Event: release

File details

Details for the file poveglia-1.0.0-py3-none-any.whl.

File metadata

Download URL: poveglia-1.0.0-py3-none-any.whl
Upload date: Jun 22, 2026
Size: 41.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for poveglia-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`651fd8917ffcc66a99464695ad7117366289e7babcc28eafbb5cc74269849733`
MD5	`89e5605857eb9bd26bcd632cdd000f7d`
BLAKE2b-256	`764990dde5c599445055e89513c7e61e40fd21609882e71570260fad80d100b9`

Provenance

The following attestation bundles were made for poveglia-1.0.0-py3-none-any.whl:

Publisher: publish.yml on Xof/poveglia

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: poveglia-1.0.0-py3-none-any.whl
- Subject digest: 651fd8917ffcc66a99464695ad7117366289e7babcc28eafbb5cc74269849733
- Sigstore transparency entry: 1906107566
- Sigstore integration time: Jun 22, 2026
Source repository:
- Permalink: Xof/poveglia@6ef3c64ff343536f92269b032bbba7957ab13bf7
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/Xof
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6ef3c64ff343536f92269b032bbba7957ab13bf7
- Trigger Event: release
