Classifyre CLI — scan and classify unstructured data sources

CLI Application

Python CLI for source extraction, detector execution, and batched output delivery.

Setup

cd /unstructured/apps/cli
uv sync
# Optional if you want an activated shell instead of `uv run ...`:
source .venv/bin/activate

Optional detector groups:

uv sync --group detectors
# or specific groups: --group secrets --group pii --group threat ...

Command Syntax

Use the thin wrapper:

uv run main.py <command> <recipe.json> [options]

Or call the module entry point directly:

uv run python -m src.main <command> <recipe.json> [options]

Commands:

  • test - test source connection.
  • discover - discover source resources.
  • extract - run extraction and emit batched output.
  • sandbox - run sandbox parsing/detectors for a local file.
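
When the CLI is driven from another Python script, the documented syntax can be assembled as an argument list. This is a minimal sketch: `build_argv` and `COMMANDS` are hypothetical names, not part of the CLI, and it assumes every command takes a recipe path as shown in the syntax line above.

```python
from typing import List, Sequence

# The four commands listed in this README.
COMMANDS = ("test", "discover", "extract", "sandbox")


def build_argv(command: str, recipe_path: str, options: Sequence[str] = ()) -> List[str]:
    """Build the argument list for `uv run main.py <command> <recipe.json> [options]`."""
    if command not in COMMANDS:
        raise ValueError(f"unknown command {command!r}; expected one of {COMMANDS}")
    return ["uv", "run", "main.py", command, recipe_path, *options]


argv = build_argv("extract", "./wordpress-recipe.json", ["--output-type", "console"])
print(argv)
```

The resulting list can be passed to `subprocess.run(argv, check=True)` from the CLI directory.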

Extract Output Model

Extraction always emits in batches. Recipes do not contain output configuration; output is controlled by CLI flags and environment variables.
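
To illustrate what batched emission means, here is a small sketch that groups a stream of items into fixed-size batches. It is not the CLI's implementation: `batched` is a hypothetical helper, and the default of 20 only mirrors the default batch size documented in this section.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int = 20) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # source exhausted
        yield batch


# 45 items with the default size produce two full batches and one partial batch.
print([len(batch) for batch in batched(range(45))])  # [20, 20, 5]
```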

Output types:

  • console - emits NDJSON envelopes to stdout.
  • file - appends NDJSON envelopes to a file.
  • rest - pushes batches to API endpoints and finalizes the run.

Default behavior:

  • If a source ID is present (--source-id or the SOURCE_ID environment variable), the default output is rest.
  • Otherwise, the default output is console.
  • The default batch size is 20.
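
The default rules above can be sketched as a small resolver. This is an illustration, not the CLI's code: `resolve_output_type` is a hypothetical name, and the sketch assumes that an explicit --output-type wins over CLASSIFYRE_OUTPUT_TYPE, which in turn wins over the source-ID default.

```python
from typing import Mapping, Optional

VALID_OUTPUT_TYPES = ("rest", "file", "console")


def resolve_output_type(
    cli_output_type: Optional[str],
    cli_source_id: Optional[str],
    env: Mapping[str, str],
) -> str:
    """Pick the output type: explicit setting first, then the documented defaults."""
    # Assumption: the CLI flag takes precedence over the environment variable.
    explicit = cli_output_type or env.get("CLASSIFYRE_OUTPUT_TYPE")
    if explicit:
        if explicit not in VALID_OUTPUT_TYPES:
            raise ValueError(f"unsupported output type: {explicit!r}")
        return explicit
    # Documented default: rest when a source ID is present, console otherwise.
    source_id = cli_source_id or env.get("SOURCE_ID")
    return "rest" if source_id else "console"


print(resolve_output_type(None, None, {}))                    # console
print(resolve_output_type(None, None, {"SOURCE_ID": "abc"}))  # rest
```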

CLI Options

Global/common:

  • --debug - enable debug logging.
  • --detectors-file <path> - sandbox only.

Extract output options:

  • --output-type rest|file|console
  • --output-batch-size <int>
  • --output-rest-url <url>
  • --output-file-path <path>
  • --source-id <uuid>
  • --runner-id <uuid>
  • --managed-runner (REST only; runner lifecycle managed by API orchestrator)

Environment fallbacks:

  • SOURCE_ID, RUNNER_ID
  • CLASSIFYRE_OUTPUT_TYPE, CLASSIFYRE_OUTPUT_BATCH_SIZE
  • CLASSIFYRE_OUTPUT_REST_URL, CLASSIFYRE_OUTPUT_REST_TIMEOUT_SEC
  • CLASSIFYRE_OUTPUT_FILE_PATH
  • API_URL (fallback base URL for REST output)

Practical Examples

1) Console output (quick local test)

uv run main.py extract ./wordpress-recipe.json --output-type console --output-batch-size 1

You will see NDJSON lines like:

  • {"event":"batch", ...}
  • {"event":"finish", ...}
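
A consumer of this stream can parse it one line at a time. The sketch below counts envelopes per event type; it relies only on the "event" field shown above, since the other envelope fields are not documented here. `summarize_events` is a hypothetical helper, and it assumes every non-empty line is a JSON object.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_events(lines: Iterable[str]) -> Counter:
    """Count NDJSON envelopes by their "event" field."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between envelopes
        envelope = json.loads(line)
        counts[envelope.get("event", "unknown")] += 1
    return counts


sample = ['{"event":"batch"}', '{"event":"batch"}', '{"event":"finish"}']
summary = summarize_events(sample)
print(summary["batch"], summary["finish"])  # 2 1
```

Because file objects iterate line by line, passing `sys.stdin` or an open NDJSON file to `summarize_events` works the same way as the list above.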

2) File output

uv run main.py extract ./wordpress-recipe.json \
  --output-type file \
  --output-file-path /tmp/classifyre-assets.ndjson \
  --output-batch-size 20

3) REST output (manual CLI to backend)

uv run main.py extract ./wordpress-recipe.json \
  --output-type rest \
  --source-id <source_uuid>

Notes:

  • --runner-id is optional for manual runs. If it is omitted, the CLI creates an external runner automatically.
  • --output-rest-url is optional. If it is omitted, the CLI uses CLASSIFYRE_OUTPUT_REST_URL, then API_URL, then http://localhost:8000.
  • Use --managed-runner only for API-orchestrated runs where the runner already exists.
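
The fallback order for the REST base URL described in these notes can be expressed as a short lookup chain. This is a sketch of the documented order, not the CLI's source: `resolve_rest_url` is a hypothetical name, and treating empty strings as unset is an assumption.

```python
from typing import Mapping, Optional

DEFAULT_REST_URL = "http://localhost:8000"


def resolve_rest_url(cli_url: Optional[str], env: Mapping[str, str]) -> str:
    """Return the first configured URL: flag, CLASSIFYRE_OUTPUT_REST_URL, API_URL, default."""
    candidates = (
        cli_url,
        env.get("CLASSIFYRE_OUTPUT_REST_URL"),
        env.get("API_URL"),
    )
    for candidate in candidates:
        if candidate:  # assumption: empty strings count as unset
            return candidate
    return DEFAULT_REST_URL


print(resolve_rest_url(None, {"API_URL": "http://api.internal:9000"}))
print(resolve_rest_url(None, {}))
```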

4) REST output with explicit runner (managed/orchestrated style)

uv run main.py extract ./wordpress-recipe.json \
  --output-type rest \
  --source-id <source_uuid> \
  --runner-id <runner_uuid> \
  --managed-runner

5) Full extract command with all output flags

uv run main.py extract ./wordpress-recipe.json \
  --output-type rest \
  --output-batch-size 20 \
  --output-rest-url http://localhost:8000 \
  --output-file-path /tmp/classifyre-assets.ndjson \
  --source-id <source_uuid> \
  --runner-id <runner_uuid> \
  --managed-runner

Use --output-file-path only when --output-type is file; the command above includes it only to show every output flag.

Dev Scripts

  • bun run dev - run CLI quickly.
  • bun run lint - ruff format/check.
  • bun run check-types - mypy.
  • bun run test - pytest suite.
