Unified DLP scanner for SaaS sources — secret detection (trufflehog, gitleaks, native regex) plus PII detection (pleno-anonymize). API-driven content collection from GitHub, GitLab, Bitbucket, Slack, Notion, Confluence, Jira.

pleno-dlp (Python)

Unified DLP scanner for SaaS content — secrets (trufflehog / gitleaks / native regex) and PII (delegating to pleno-anonymize).

A connector models a SaaS provider — github, gitlab, bitbucket, slack, notion, confluence, jira — and owns the full lifecycle: walks content through the provider's API, detects leaks in that content, and (optionally) verifies / revokes credentials. Detection happens inside the connector; the engine choice (native, trufflehog, gitleaks, pii) is a per-connector option, not a separate plugin. Every connector self-describes via ConnectorSpec.capabilities (SOURCE + DETECT baseline, optional VERIFY / REVOKE).

pip install pleno-dlp pulls one wheel exposing one console script (pleno-dlp). The Go binary in this repo (cmd/pleno-dlp) remains for filesystem-only scans; the Python package is the path forward for SaaS.

Install

uv tool install pleno-dlp
# or
pipx install pleno-dlp

# Add the PII backend (pulls pleno-anonymize):
uv tool install 'pleno-dlp[pii]'

Usage

The CLI is connector-agnostic: knobs flow through the generic --option key=value flag, and the detection engine is picked with --engine. Run pleno-dlp describe <connector> for the accepted keys, types, defaults, and which ones are secrets.
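As a rough illustration of how repeated --option key=value flags can collapse into one dictionary of connector options, here is a minimal sketch built on argparse. The names parse_options and build_parser are hypothetical, and the real parser may coerce types or validate keys against the connector's spec; this sketch keeps every value as a raw string.

```python
import argparse


def parse_options(pairs):
    """Turn ["owner=plenoai", "repo=pleno-dlp"] into a dict (hypothetical helper)."""
    options = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        options[key] = value  # a later flag overrides an earlier one
    return options


def build_parser():
    parser = argparse.ArgumentParser(prog="scan-sketch")
    parser.add_argument("connector")
    # action="append" collects every occurrence of --option into a list.
    parser.add_argument("--option", action="append", default=[], metavar="KEY=VALUE")
    parser.add_argument("--engine", default="native")
    return parser


args = build_parser().parse_args(
    ["github", "--option", "owner=plenoai", "--option", "resources=issues,prs", "--engine", "pii"]
)
options = parse_options(args.option)
```

Keeping the values as strings leaves interpretation (for example, splitting resources on commas) to whichever connector receives them, which is one way a CLI can stay connector-agnostic.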

# Discover what's registered
pleno-dlp list                              # connectors + engines
pleno-dlp list --capability verify          # connectors with VERIFY
pleno-dlp describe github

# Secret scan over an entire GitHub org with the default native engine
GITHUB_TOKEN=ghp_... pleno-dlp scan github --option owner=plenoai

# Scan a single repo, only code, with trufflehog verification
pleno-dlp scan github \
    --option owner=plenoai --option repo=pleno-dlp \
    --option resources=code --engine trufflehog

# Issue + PR conversations only, PII detection (requires pleno-anonymize)
pleno-dlp scan github --option owner=plenoai \
    --option resources=issues,prs --engine pii

# SARIF output for GitHub code-scanning ingestion
pleno-dlp scan github --option owner=plenoai \
    --format sarif > findings.sarif

# Slack workspace — same shape, different source connector
pleno-dlp scan slack --token xoxb-... --option include_threads=false

# Confirm a leaked github PAT is still live
pleno-dlp verify github --token ghp_…
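The findings.sarif file written by the SARIF example above can be post-processed with a few lines of Python. This is a minimal sketch that assumes the standard SARIF 2.1.0 layout (runs, then results, then locations); which fields pleno-dlp actually populates is an assumption, and the sample document below is a hand-written placeholder rather than real tool output.

```python
import json


def summarize_sarif(sarif):
    """Yield (rule_id, uri, start_line) for each result in a SARIF document."""
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            locations = result.get("locations") or [{}]
            physical = locations[0].get("physicalLocation", {})
            uri = physical.get("artifactLocation", {}).get("uri")
            line = physical.get("region", {}).get("startLine")
            yield result.get("ruleId"), uri, line


# Hand-written placeholder document, not real pleno-dlp output.
SAMPLE = json.loads("""
{
  "version": "2.1.0",
  "runs": [{
    "tool": {"driver": {"name": "example-tool"}},
    "results": [{
      "ruleId": "example-rule",
      "message": {"text": "placeholder finding"},
      "locations": [{
        "physicalLocation": {
          "artifactLocation": {"uri": "src/example.py"},
          "region": {"startLine": 12}
        }
      }]
    }]
  }]
}
""")

rows = list(summarize_sarif(SAMPLE))
```

To run it against a real scan, load the file with json.load instead of using SAMPLE. Missing fields come back as None rather than raising, since SARIF producers differ in how much location detail they emit.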

Auth resolution for github: --token → GITHUB_TOKEN env var → gh auth token. Anonymous access works for public content but is rate-limited to 60 req/h. Other source connectors take their token via --token (shorthand for --option token=…) or via --option api_token=… / --option access_token=… depending on the auth mode (see describe).
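The github fallback chain described above can be sketched as a small function. This is an illustration under stated assumptions, not the package's code: resolve_github_token is a hypothetical name, and the env and run parameters exist only so the lookup can be exercised without touching the real environment or the gh CLI.

```python
import os
import subprocess


def resolve_github_token(cli_token=None, env=None, run=subprocess.run):
    """Return the first available token, or None for anonymous access."""
    env = os.environ if env is None else env
    if cli_token:                      # 1. explicit --token
        return cli_token
    if env.get("GITHUB_TOKEN"):        # 2. environment variable
        return env["GITHUB_TOKEN"]
    try:                               # 3. GitHub CLI, if installed and logged in
        proc = run(["gh", "auth", "token"], capture_output=True, text=True, check=False)
    except OSError:                    # gh is not on PATH
        return None
    token = proc.stdout.strip()
    return token if proc.returncode == 0 and token else None
```

Returning None instead of raising lets the caller fall back to anonymous, rate-limited access for public content.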

Detection engines

Engines are the internal scanners that connectors compose with. They are stateless utilities that turn a Document.text into Findings. Operators do not address them directly; instead, pick one with --engine (or --option engine=…) and the connector hands its own Documents to the chosen engine. The default for every connector is native.

Engine      Class   Verifies            System dep
trufflehog  secret  yes (per-detector)  trufflehog CLI on PATH
gitleaks    secret  no                  gitleaks CLI on PATH
native      secret  no                  none; bundled regex (AWS, GitHub PAT, Slack bot, OpenAI, Anthropic)
pii         PII     n/a                 pleno-anonymize HTTP API (installed via pleno-dlp[pii] extra)
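To make "stateless utility that turns Document.text into Findings" concrete, here is a minimal sketch of a regex engine in the spirit of native. Document and Finding are stand-in dataclasses, the two patterns are widely published token shapes rather than the rules bundled with the package, and the function is synchronous for brevity even though the real Detector Protocol returns an AsyncIterator.

```python
import re
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Document:          # stand-in; the real Document likely carries more fields
    id: str
    text: str


@dataclass(frozen=True)
class Finding:           # stand-in for the package's Finding type
    rule: str
    document_id: str
    start: int
    end: int


# Widely published token shapes; the rules bundled with the package may differ.
RULES = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github-pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def detect(doc: Document) -> Iterator[Finding]:
    """Stateless scan: the same input always yields the same findings."""
    for rule, pattern in RULES.items():
        for match in pattern.finditer(doc.text):
            yield Finding(rule, doc.id, match.start(), match.end())


# Placeholder input built from filler characters, not a real credential.
doc = Document("demo", "key = AKIA" + "A" * 16 + " # placeholder")
findings = list(detect(doc))
```

Because detect holds no state between calls, a connector can hand it Documents in any order or in parallel.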

Source connectors

Each connector self-describes via a ConnectorSpec (auth modes, resources, options, runtime capabilities). Today: github, gitlab, bitbucket (cloud + server), slack (xoxb / xoxp), notion, confluence (cloud + datacenter), jira (cloud + datacenter). Run pleno-dlp list for the live list and pleno-dlp describe <name> for the option sheet.

Capabilities

A connector advertises one or more capabilities:

  • Capability.SOURCE — implements the Connector Protocol (discover / fetch / capabilities). Every shipped connector has this.
  • Capability.DETECT — implements the Detector Protocol (detect(doc) -> AsyncIterator[Finding]). Every shipped connector has this; the engine choice is configured via --option engine=….
  • Capability.VERIFY — implements the Verifier Protocol (verify(secret) -> VerifyResult). Today: github (probes GET /user).
  • Capability.REVOKE — implements the Revoker Protocol (revoke(secret) -> RevokeResult). Reserved; no built-in connector has this yet — providers without a programmatic revoke endpoint should leave it unset and document the manual rotation flow.
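The capability list above can be sketched with an enum and a structural Protocol. Everything here is a stand-in for illustration: the real Capability, Verifier, and VerifyResult types live in the package and may look different (for example, verify may be async and return a richer result than a string).

```python
from enum import Enum
from typing import Protocol, runtime_checkable


class Capability(Enum):
    SOURCE = "source"
    DETECT = "detect"
    VERIFY = "verify"
    REVOKE = "revoke"


@runtime_checkable
class Verifier(Protocol):
    def verify(self, secret: str) -> str: ...


class DemoConnector:
    capabilities = frozenset({Capability.SOURCE, Capability.DETECT, Capability.VERIFY})

    def verify(self, secret: str) -> str:
        # A real connector would probe the provider here; this stub does no I/O.
        return "UNKNOWN"


class SourceOnlyConnector:
    capabilities = frozenset({Capability.SOURCE, Capability.DETECT})


def connectors_with(capability, connectors):
    """Filter connector classes by an advertised capability."""
    return [c for c in connectors if capability in c.capabilities]


verifiers = connectors_with(Capability.VERIFY, [DemoConnector, SourceOnlyConnector])
```

A filter like connectors_with is roughly what a command such as pleno-dlp list --capability verify needs: it reads the advertised set and never has to call into the connector.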

pleno-dlp verify <connector> --token … exercises VERIFY. Exit codes: 0 = LIVE, 1 = REVOKED, 2 = UNKNOWN/unsupported.
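For scripting around verify, the documented exit codes can be mapped back to status labels. This is a minimal sketch: the helper names are hypothetical, the run parameter is there only to allow testing without the CLI installed, and treating undocumented exit codes as UNKNOWN is an assumption.

```python
import subprocess

EXIT_MEANINGS = {0: "LIVE", 1: "REVOKED", 2: "UNKNOWN"}


def interpret_exit_code(code: int) -> str:
    """Map a `pleno-dlp verify` exit code to a status label."""
    # Anything outside the documented set is treated as UNKNOWN (an assumption).
    return EXIT_MEANINGS.get(code, "UNKNOWN")


def verify_token(connector: str, token: str, run=subprocess.run) -> str:
    """Run the CLI and translate its exit code; `run` is injectable for tests."""
    proc = run(["pleno-dlp", "verify", connector, "--token", token], check=False)
    return interpret_exit_code(proc.returncode)
```

Passing check=False matters here: exit code 1 is a meaningful answer (REVOKED), not a failure to be raised as an exception.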

Adding a new connector

  1. Create python/src/pleno_dlp/connectors/<name>.py. Subclass DetectViaEngineMixin from pleno_dlp.connectors._detect so detect() and the engine kwarg come for free.
  2. Implement the Connector Protocol (discover, fetch, discover_and_fetch, capabilities, close). Keep one httpx.AsyncClient per instance. Call self._init_engine(engine) from your __init__ and await self._close_engine() from your close(). Optionally add verify(secret) / revoke(secret) for lifecycle support.
  3. Declare a spec: ClassVar[ConnectorSpec] = ConnectorSpec(...) with name, kind, summary, capabilities (defaults to {SOURCE, DETECT} — extend with VERIFY / REVOKE as you implement them), auth_modes, resources, options (every __init__ kwarg, including DETECT_ENGINE_OPTION from _detect), and runtime (a Capabilities describing incremental / streaming / concurrency).
  4. End the module with registry.register("<name>", <Class>).
  5. Wire the import in pleno_dlp/connectors/__init__.py.
  6. Add fixtures + tests under python/tests/connectors/test_<name>.py using httpx.MockTransport.

Once the spec lands, pleno-dlp scan <name> --engine <engine>, pleno-dlp verify <name>, pleno-dlp list, and pleno-dlp describe all work without touching the CLI.
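The spec-plus-registry pattern behind steps 3 and 4 can be sketched without the package installed. The ConnectorSpec, Registry, and ExampleConnector below are simplified stand-ins written for this illustration; the real classes almost certainly have richer signatures (auth_modes, kind, runtime, typed options), so treat the field names as assumptions.

```python
from dataclasses import dataclass
from typing import ClassVar, Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class ConnectorSpec:  # stand-in, not the package's ConnectorSpec
    name: str
    summary: str
    capabilities: FrozenSet[str] = frozenset({"SOURCE", "DETECT"})
    resources: Tuple[str, ...] = ()
    options: Tuple[str, ...] = ()


class Registry:  # stand-in for the package's registry
    def __init__(self) -> None:
        self._connectors: Dict[str, type] = {}

    def register(self, name: str, cls: type) -> None:
        if name in self._connectors:
            raise ValueError(f"connector {name!r} already registered")
        self._connectors[name] = cls

    def get(self, name: str) -> type:
        return self._connectors[name]

    def names(self) -> List[str]:
        return sorted(self._connectors)


registry = Registry()


class ExampleConnector:
    spec: ClassVar[ConnectorSpec] = ConnectorSpec(
        name="example",
        summary="Hypothetical connector used only for this sketch",
        resources=("pages",),
        options=("token", "engine"),
    )

    def __init__(self, token: str = "", engine: str = "native") -> None:
        self.token = token
        self.engine = engine


# Mirrors step 4: registration happens when the module is imported.
registry.register("example", ExampleConnector)
```

Because the CLI only needs registry.names() and each class's spec, commands like list and describe can be generated from this data alone, which is why a new connector does not require CLI changes.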

Release

A py-vX.Y.Z tag triggers PyPI trusted publishing via GitHub Actions.

Download files

Source Distribution

pleno_dlp-0.12.0.tar.gz (83.2 kB)

Uploaded Source

Built Distribution

pleno_dlp-0.12.0-py3-none-any.whl (104.7 kB)

Uploaded Python 3

File details

Details for the file pleno_dlp-0.12.0.tar.gz.

File metadata

  • Download URL: pleno_dlp-0.12.0.tar.gz
  • Size: 83.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for pleno_dlp-0.12.0.tar.gz
Algorithm Hash digest
SHA256 fc9181d0a63a3aac3f010ca5e624c8f1bc22495a5d0bbf232daa3f49dcd55a0e
MD5 dd791b1654b423931253e929cc8e1114
BLAKE2b-256 14107d67f5ce0e05b2daf53a41c7639a72a5d2e703d033341efef113f0da105e

Provenance

The following attestation bundles were made for pleno_dlp-0.12.0.tar.gz:

Publisher: release-py.yml on plenoai/pleno-dlp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pleno_dlp-0.12.0-py3-none-any.whl.

File metadata

  • Download URL: pleno_dlp-0.12.0-py3-none-any.whl
  • Size: 104.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for pleno_dlp-0.12.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d511f4a87ba2ee0a602aef715a27f49c5b453a81e2a37ceae9c4d8a5d6e668c4
MD5 3922528c834a5d99d99849b8be3092fe
BLAKE2b-256 664c7af66b36879c54e89d81e7e2ec5eb50f3684f4f3807e66c9cf6150bb93cb

Provenance

The following attestation bundles were made for pleno_dlp-0.12.0-py3-none-any.whl:

Publisher: release-py.yml on plenoai/pleno-dlp
