Classifyre CLI — scan and classify unstructured data sources
CLI Application
Python CLI for source extraction, detector execution, and batched output delivery.
Setup
cd /unstructured/apps/cli
uv sync
# Optional if you want an activated shell instead of `uv run ...`:
source .venv/bin/activate
Optional detector groups:
uv sync --group detectors
# or specific groups: --group secrets --group pii --group threat ...
Command Syntax
Use the thin wrapper:
uv run main.py <command> <recipe.json> [options]
Or direct module entrypoint:
uv run python -m src.main <command> <recipe.json> [options]
Commands:
- test - test source connection.
- discover - discover source resources.
- extract - run extraction and emit batched output.
- sandbox - run sandbox parsing/detectors for a local file.
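For scripted runs, the command line can be assembled in Python before it is handed to a process runner. The following is a minimal sketch, assuming the thin wrapper is invoked from the CLI directory. `build_command` and `KNOWN_COMMANDS` are hypothetical names used for illustration and are not part of Classifyre.

```python
from typing import Sequence

# The four commands listed in this README.
KNOWN_COMMANDS = ("test", "discover", "extract", "sandbox")


def build_command(command: str, recipe_path: str,
                  options: Sequence[str] = ()) -> list[str]:
    """Return an argv list for `uv run main.py <command> <recipe.json> [options]`.

    Hypothetical helper: it only assembles arguments and runs nothing.
    """
    if command not in KNOWN_COMMANDS:
        raise ValueError(f"unknown command: {command!r}")
    return ["uv", "run", "main.py", command, recipe_path, *options]
```

The returned list can be passed to `subprocess.run(...)` as-is. Rejecting unknown commands early avoids starting `uv` just to get a usage error.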
Extract Output Model
Extraction always emits in batches.
Recipes do not contain output configuration; output is controlled by CLI flags and environment variables.
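To make "emits in batches" concrete, here is a minimal sketch of how a stream of extracted items could be grouped into fixed-size batches. It is an illustration only and not the CLI's actual implementation. The helper name `batched` is an assumption, and the default of 20 mirrors the default batch size stated in this README.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int = 20) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `batch_size` items (illustrative)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        # Pull up to batch_size items; an empty list means the source is drained.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Only the last batch may be smaller than `batch_size`, and an empty source yields no batches at all.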
Output types:
- console - emits NDJSON envelopes to stdout.
- file - appends NDJSON envelopes to a file.
- rest - pushes batches to API endpoints and finalizes run.
Default behavior:
- If source_id is present (--source-id or SOURCE_ID env), default output is rest.
- Otherwise default output is console.
- Default batch size is 20.
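The default rules above can be sketched as a small function. This is an illustration under two assumptions: the flag value is checked before the SOURCE_ID environment variable, and an empty string counts as absent. `default_output_type` is a hypothetical name. An explicitly chosen output type would override this default and is not modeled here.

```python
import os
from typing import Mapping, Optional

DEFAULT_BATCH_SIZE = 20  # default batch size stated in this README


def default_output_type(source_id_flag: Optional[str] = None,
                        env: Optional[Mapping[str, str]] = None) -> str:
    """Return "rest" when a source id is available, otherwise "console"."""
    env = os.environ if env is None else env
    # Assumed precedence: --source-id first, then the SOURCE_ID variable.
    source_id = source_id_flag or env.get("SOURCE_ID")
    return "rest" if source_id else "console"
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.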
CLI Options
Global/common:
- --debug - enable debug logging.
- --detectors-file <path> - sandbox only.
Extract output options:
- --output-type rest|file|console
- --output-batch-size <int>
- --output-rest-url <url>
- --output-file-path <path>
- --source-id <uuid>
- --runner-id <uuid>
- --managed-runner (REST only; runner lifecycle managed by API orchestrator)
Environment fallbacks:
- SOURCE_ID, RUNNER_ID
- CLASSIFYRE_OUTPUT_TYPE, CLASSIFYRE_OUTPUT_BATCH_SIZE
- CLASSIFYRE_OUTPUT_REST_URL, CLASSIFYRE_OUTPUT_REST_TIMEOUT_SEC
- CLASSIFYRE_OUTPUT_FILE_PATH
- API_URL (fallback base URL for REST output)
Practical Examples
1) Console output (quick local test)
uv run main.py extract ./wordpress-recipe.json --output-type console --output-batch-size 1
You will see NDJSON lines like:
{"event":"batch", ...}
{"event":"finish", ...}
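Because each output line is a standalone JSON object, the stream can be consumed line by line. The sketch below counts envelopes by their "event" field, which is the only field shown in this README. Everything else inside an envelope is left untouched because its shape is not documented here. `count_events` is a hypothetical helper name.

```python
import json
from collections import Counter
from typing import Iterable


def count_events(lines: Iterable[str]) -> Counter:
    """Count NDJSON envelopes by their "event" value (illustrative)."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between envelopes
        envelope = json.loads(line)
        # Envelopes without an "event" key are grouped under a placeholder.
        counts[envelope.get("event", "<missing>")] += 1
    return counts
```

To use it on live output, pipe the extract command into a script that calls `count_events(sys.stdin)`. The same function works on an open file handle for the file output type.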
2) File output
uv run main.py extract ./wordpress-recipe.json \
--output-type file \
--output-file-path /tmp/classifyre-assets.ndjson \
--output-batch-size 20
3) REST output (manual CLI to backend)
uv run main.py extract ./wordpress-recipe.json \
--output-type rest \
--source-id <source_uuid>
Notes:
- --runner-id is optional for manual runs. If omitted, the CLI creates an external runner automatically.
- --output-rest-url is optional. If omitted, the CLI uses CLASSIFYRE_OUTPUT_REST_URL, then API_URL, then http://localhost:8000.
- --managed-runner should be used only for API-orchestrated runs where the runner already exists.
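The REST URL fallback order in the notes above can be written as a short lookup chain. This is a sketch of the documented order and not the CLI's own code. `resolve_rest_url` is a hypothetical name, and treating empty strings as unset is an assumption.

```python
import os
from typing import Mapping, Optional

LOCAL_DEFAULT = "http://localhost:8000"


def resolve_rest_url(flag_value: Optional[str] = None,
                     env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the REST base URL: flag, then two env variables, then localhost."""
    env = os.environ if env is None else env
    candidates = (
        flag_value,                             # --output-rest-url
        env.get("CLASSIFYRE_OUTPUT_REST_URL"),  # first environment fallback
        env.get("API_URL"),                     # second environment fallback
    )
    for candidate in candidates:
        if candidate:  # assumption: empty strings count as unset
            return candidate
    return LOCAL_DEFAULT
```

Listing the candidates in one tuple keeps the precedence visible at a glance and makes it easy to extend.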
4) REST output with explicit runner (managed/orchestrated style)
uv run main.py extract ./wordpress-recipe.json \
--output-type rest \
--source-id <source_uuid> \
--runner-id <runner_uuid> \
--managed-runner
5) Full extract command with all output flags
uv run main.py extract ./wordpress-recipe.json \
--output-type rest \
--output-batch-size 20 \
--output-rest-url http://localhost:8000 \
--output-file-path /tmp/classifyre-assets.ndjson \
--source-id <source_uuid> \
--runner-id <runner_uuid> \
--managed-runner
Use --output-file-path only with --output-type file.
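A wrapper script can check flag combinations before launching a run. The sketch below encodes two constraints stated in this README: the file path belongs to the file output type, and --managed-runner is REST only. `check_output_flags` is a hypothetical helper. Whether the CLI itself rejects or silently ignores a mismatched flag is not documented here, so the function only reports problems.

```python
from typing import Optional

OUTPUT_TYPES = ("rest", "file", "console")


def check_output_flags(output_type: str,
                       output_file_path: Optional[str] = None,
                       managed_runner: bool = False) -> list[str]:
    """Return human-readable problems for a set of extract output flags."""
    problems: list[str] = []
    if output_type not in OUTPUT_TYPES:
        problems.append(f"unknown output type: {output_type!r}")
    if output_file_path is not None and output_type != "file":
        problems.append("--output-file-path applies only to --output-type file")
    if managed_runner and output_type != "rest":
        problems.append("--managed-runner applies only to --output-type rest")
    return problems
```

An empty list means the combination is consistent with the rules above. Returning messages instead of raising lets the caller decide whether a mismatch is fatal.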
Dev Scripts
- bun run dev - run CLI quickly.
- bun run lint - ruff format/check.
- bun run check-types - mypy.
- bun run test - pytest suite.
File details
Details for the file classifyre_cli-0.4.3.tar.gz.
File metadata
- Download URL: classifyre_cli-0.4.3.tar.gz
- Upload date:
- Size: 553.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c69b495a5aa46e64b862289ee91c76f114a8974593faa5533e7e4832dab16414 |
| MD5 | 00694c360e5df70891514ea4a1652705 |
| BLAKE2b-256 | 758c03fb593d01c26c2935917b05d6bfecbfc2c3f4c79c8a0aeb33f8ec50d6aa |
File details
Details for the file classifyre_cli-0.4.3-py3-none-any.whl.
File metadata
- Download URL: classifyre_cli-0.4.3-py3-none-any.whl
- Upload date:
- Size: 242.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9c19afec458bf04c650d9e0658ef5549819564d5047452363dff16dcb6cc2ba7 |
| MD5 | db902aa4212365c1c0524cce1cfd2406 |
| BLAKE2b-256 | bb1a7d09067866fcd82411a18380cdc18c19b35c9b523da71d864d63e16bcfaf |