Document-native DAG runner for preprocessing PDFs, Office files, and email messages into structured evidence artifacts.

docflow

docflow is a document-native DAG runner for preprocessing files into structured evidence artifacts, plain text exports, and downstream indexes.

It supports these input types:

  • pdf
  • docx
  • doc
  • xlsx
  • xls
  • msg
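
As a rough illustration of how the extension list above might be used to pre-filter a source directory, here is a minimal sketch. The helper name `find_supported_files` and the constant `SUPPORTED_EXTENSIONS` are hypothetical and are not part of docflow; the package may detect file types differently.

```python
from pathlib import Path

# Extensions copied from the list above. The set and the helper below are
# illustrative assumptions, not docflow's own API.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".doc", ".xlsx", ".xls", ".msg"}


def find_supported_files(source_dir):
    """Return supported files under source_dir, sorted for a stable order."""
    root = Path(source_dir)
    return sorted(
        path
        for path in root.rglob("*")
        # Compare suffixes case-insensitively so REPORT.PDF is also matched.
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Matching on the file suffix is only a first-pass filter; it says nothing about whether a given file is actually parseable.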

Core capabilities:

  • flow execution from *.flow.dag.yaml
  • document parsing into evidence atoms
  • metadata enrichment
  • evidence-graph construction
  • semantic, structural, and spatial indexes
  • plain-text and structured output artifacts
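
To make the idea of a DAG runner concrete, the sketch below orders a handful of steps so that each one runs after its dependencies. The step names are hypothetical, loosely modelled on the capabilities listed above, and the ordering uses Python's standard `graphlib` module rather than anything from docflow itself; the real flow format and scheduler may work differently.

```python
from graphlib import TopologicalSorter

# Hypothetical step names. Each key maps to the set of steps that must
# finish before it can start.
STEPS = {
    "parse": set(),
    "enrich_metadata": {"parse"},
    "build_evidence_graph": {"enrich_metadata"},
    "build_indexes": {"build_evidence_graph"},
    "export_text": {"parse"},
}


def execution_order(steps):
    """Return one valid order in which every step follows its dependencies."""
    return list(TopologicalSorter(steps).static_order())


order = execution_order(STEPS)
```

`TopologicalSorter` raises `graphlib.CycleError` if the dependencies contain a cycle, which is the usual reason a flow definition cannot be scheduled.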

Example install for local development:

cd packages/docflow
python3 -m pip install -e '.[dev]'

Example CLI usage:

docflow run ../../docflow/examples/document_preprocess.flow.dag.yaml --source-dir /path/to/documents --output-dir /tmp/docflow_run --trace
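
When the run is launched from a script, the same invocation can be assembled as an argument list. This is a minimal sketch that uses only the subcommand and flags shown in the example above; the helper `build_run_command` is hypothetical and not part of docflow.

```python
import shlex


def build_run_command(flow_file, source_dir, output_dir, trace=False):
    """Assemble the argv list for the `docflow run` example shown above."""
    cmd = [
        "docflow", "run", str(flow_file),
        "--source-dir", str(source_dir),
        "--output-dir", str(output_dir),
    ]
    if trace:
        cmd.append("--trace")
    return cmd


cmd = build_run_command(
    "../../docflow/examples/document_preprocess.flow.dag.yaml",
    "/path/to/documents",
    "/tmp/docflow_run",
    trace=True,
)
# Shell-quoted form, useful for logging the exact command.
command_line = shlex.join(cmd)
```

The list can be passed to `subprocess.run(cmd, check=True)` once docflow is installed; passing a list instead of a single string avoids shell-quoting problems with paths that contain spaces.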

Download files

Source Distribution

docflow_sager-0.1.0.tar.gz (14.4 kB)

Uploaded Source

Built Distribution

docflow_sager-0.1.0-py3-none-any.whl (16.6 kB)

Uploaded Python 3

File details

Details for the file docflow_sager-0.1.0.tar.gz.

File metadata

  • Download URL: docflow_sager-0.1.0.tar.gz
  • Size: 14.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.8

File hashes

Hashes for docflow_sager-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       b2e9101c5382f88248229507efe3ff86899b43c2c7f8c921548ff198447b30ac
MD5          effbbb71aadca2fceb1af018110454bb
BLAKE2b-256  d8dc5844cb38d10c9bc26d3cdf2b462a0d46a2de6bfcb6be567fb3486b1a4612
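
A downloaded archive can be checked against a published SHA256 digest with the standard library. This is a minimal sketch; the helper names `sha256_of` and `matches_published_hash` are illustrative, and the file path is whatever location the archive was saved to.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 equals the published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("docflow_sager-0.1.0.tar.gz", "<digest from this page>")` should return True for an unmodified download of the matching file.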


File details

Details for the file docflow_sager-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: docflow_sager-0.1.0-py3-none-any.whl
  • Size: 16.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.8

File hashes

Hashes for docflow_sager-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       582cce5ccb592bcb955e37aae6e35150edee7846c2b70550d53843c801534879
MD5          660b450942615d07454d4bf37f49cd59
BLAKE2b-256  5745679c556fef56b83b0f12dfedabe32f1eca0022d6328be614182877bdf5f6
