
AI-assisted CLI for organizing files.


Dorgy

dorgy is an AI-assisted command line toolkit that keeps growing collections of files tidy. The project already ships ingestion, classification, organization, watch, search, and undo workflows while we continue to flesh out the roadmap captured in SPEC.md. For a deeper dive into how the components fit together, see the architecture overview.

Why Dorgy?

  • Hands-off organization – classify, rename, and relocate files using DSPy-backed language models plus fast heuristic fallbacks.
  • Continuous monitoring – watch directories, batch changes, and export machine-readable summaries for downstream automation.
  • Rich undo and audit history – track every operation in .dorgy/ so reorganizations remain reversible.
  • Per-collection search stores – Chromadb indexes live under <collection>/.dorgy/chroma, keeping semantic search portable without touching global config.
  • Extensible foundation – configuration is declarative, tests are automated via uv, and the roadmap is public.

Installation

PyPI (recommended)

# Using pip
pip install dorgy

# Using uv
uv pip install dorgy

From source

Clone the repository when you plan to contribute or work off the bleeding edge:

# Clone the repository
git clone https://github.com/bryaneburr/dorgy.git
cd dorgy

# Sync dependencies (includes dev extras)
uv sync

# Optional: install an editable build
uv pip install -e .

Quickstart

# Inspect available commands
uv run dorgy --help

# Organize a directory in place (dry run first)
uv run dorgy org ./documents --dry-run
uv run dorgy org ./documents

# Monitor a directory and emit JSON batches
uv run dorgy watch ./inbox --json --once

# Preview an undo of the latest plan, then inspect collection status
uv run dorgy undo ./documents --dry-run
uv run dorgy status ./documents --json

CLI Highlights

  • dorgy org – batch ingest files, classify them, and apply structured moves with progress bars, summary/quiet toggles, and JSON payloads.
  • dorgy watch – reuse the same pipeline in a long-running service; guard destructive deletions behind --allow-deletions.
  • dorgy mv – move or rename tracked files while preserving state history.
  • dorgy status / dorgy undo – inspect prior plans, audit history, and restore collections when needed.
  • Per-run search toggles – dorgy org and dorgy watch support --with-search/--without-search to control Chromadb indexing so collections stay portable and automation stays in sync.
  • dorgy search – query collection metadata and Chromadb-backed document content. Use --search for semantic similarity (requires the Chromadb index), --contains for substring matches, --init-store to rebuild .dorgy/chroma without re-running org, and --drop-store to disable indexing while keeping state lookups available. JSON output now includes document_id, optional scores, and snippets for automation consumers.
  • Search-aware moves – dorgy mv keeps Chromadb metadata aligned with renamed files, so semantic search results stay accurate after refactors.
  • Configuration commands – dorgy config view|set|edit expose the full settings model.

All commands accept --json for machine-readable output and share standardized error payloads so automation can script around them.
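As an illustration of scripting around --json, here is a minimal sketch of a wrapper that runs a command and parses its standard output. The helper name run_json is hypothetical and not part of dorgy, and the sketch assumes only that the command prints a single JSON document on stdout; it makes no claim about which fields the payload contains.

```python
import json
import subprocess


def run_json(argv):
    """Run a command and parse its stdout as one JSON document.

    Hypothetical helper, not part of dorgy. Assumes the command prints a
    single JSON document on stdout when invoked with --json.
    """
    proc = subprocess.run(argv, capture_output=True, text=True)
    try:
        payload = json.loads(proc.stdout)
    except json.JSONDecodeError:
        # No parseable payload: surface the raw streams instead of guessing.
        detail = proc.stderr.strip() or proc.stdout.strip()
        raise RuntimeError(
            f"{argv[0]} exited {proc.returncode} without JSON output: {detail}"
        )
    # Return the exit code alongside the payload so callers can tell a
    # successful result from a structured error payload.
    return proc.returncode, payload
```

A caller might pass ["uv", "run", "dorgy", "org", "./documents", "--dry-run", "--json"] and branch on the returned exit code before reading the payload.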


Search Workflow

Chromadb indexes live beside collection state under <collection>/.dorgy/chroma with a manifest at <collection>/.dorgy/search.json. Dorgy now builds search indexes automatically during dorgy org and watch batches; pass --without-search if you need to skip indexing for a run. Query results with:

  • dorgy search <path> --contains "phrase" – performs substring filtering against stored document text while still honouring tag/category/date filters.
  • dorgy search <path> --search "phrase" – issues a semantic similarity lookup via Chromadb embeddings (requires the local index to be initialised).
  • dorgy search <path> --init-store [--contains ...] – regenerates the Chromadb store from existing files/descriptors without re-running org, emitting notes for missing previews.
  • dorgy search <path> --reindex – drops any existing Chromadb data and rebuilds the store in-place using current collection files.
  • dorgy search <path> --drop-store – removes .dorgy/chroma, disables search in state metadata, and falls back to state-only filtering.

Both human-readable and JSON modes surface persistent document_ids, optional similarity scores, and snippets sourced from the Chromadb payload. When the index is unavailable, the CLI emits actionable errors guiding operators to initialize or rebuild the store.
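Because the store and manifest live at fixed paths inside the collection, a script can check for them before issuing a semantic query. The sketch below is illustrative only: both function names are hypothetical, and finding the files on disk does not prove the index is healthy or enabled in state metadata.

```python
from pathlib import Path


def search_store_paths(collection):
    """Return the Chromadb store directory and manifest path for a collection.

    The locations follow the layout described above; the function itself is
    a hypothetical helper and is not part of dorgy.
    """
    state_dir = Path(collection) / ".dorgy"
    return state_dir / "chroma", state_dir / "search.json"


def has_search_store(collection):
    """Report whether both the store directory and its manifest exist."""
    store_dir, manifest = search_store_paths(collection)
    return store_dir.is_dir() and manifest.is_file()
```

When has_search_store returns False, an automation could fall back to --contains or run dorgy search <path> --init-store first.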


Configuration Essentials

  • The primary config file lives at ~/.dorgy/config.yaml; environment variables follow DORGY__SECTION__KEY.
  • processing governs ingestion behaviour (batch sizes, captioning, concurrency, size limits). processing.process_images is enabled by default to capture multimodal captions stored in .dorgy/vision.json.
  • organization controls renaming and conflict strategies (append number, timestamp, skip) and timestamp preservation. Automatic renaming is disabled by default (organization.rename_files: false) so classification runs remain non-destructive unless you opt in.
  • cli toggles defaults for quiet/summary modes, Rich progress indicators, and move conflict handling (legacy configs may still surface cli.search_default_limit, but new installs should use the search block instead).
  • search governs Chromadb-backed indexing (default result limits, whether org/watch auto-maintain the store, and optional embedding function overrides). Stores live beside state.json under <collection>/.dorgy/chroma; indexing is enabled by default, but you can pass --without-search (or set config values) to skip runs, use dorgy search --init-store to rebuild indexes, and dorgy search --drop-store to remove them safely.
  • Chromadb telemetry is disabled by default because the client is instantiated with anonymized_telemetry=False. If you intentionally want to opt into telemetry, override the setting (for example by exporting CHROMADB_TELEMETRY_ENABLED=1 before running commands).
  • Watch services share the organization pipeline and respect processing.watch.allow_deletions unless --allow-deletions is passed.
  • LLM models are configured through the llm block. The default target is openai/gpt-5; provide any LiteLLM-compatible identifier (for example openai/gpt-4o-mini or openrouter/gpt-4o-mini:free) via llm.model, supply llm.api_key/llm.api_base_url when required, and set DORGY_USE_FALLBACKS=1 only when explicitly exercising heuristic classifiers in development.
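To make the DORGY__SECTION__KEY naming concrete, the following sketch collects matching environment variables into a nested dictionary that mirrors the YAML sections. It is not dorgy's own loader: the function name env_overrides is hypothetical, and lowercasing the segments and keeping values as raw strings are assumptions of this example.

```python
import os


def env_overrides(environ=None, prefix="DORGY__"):
    """Collect DORGY__SECTION__KEY variables into a nested dict.

    Illustrative only: lowercasing the segments and leaving values as raw
    strings are assumptions of this sketch, not documented dorgy behaviour.
    """
    environ = os.environ if environ is None else environ
    overrides = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue
        # Split the remainder on the double-underscore separator.
        parts = [p.lower() for p in name[len(prefix):].split("__") if p]
        if not parts:
            continue
        node = overrides
        for part in parts[:-1]:
            child = node.get(part)
            if not isinstance(child, dict):
                child = node[part] = {}
            node = child
        node[parts[-1]] = value
    return overrides
```

With DORGY__LLM__MODEL=openai/gpt-4o exported, the function returns {"llm": {"model": "openai/gpt-4o"}}. A single-underscore variable such as DORGY_USE_FALLBACKS does not match the prefix and is ignored.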

LLM Model Configuration

Configure language models through the llm block using uv run dorgy config set llm.<field> <value> or by editing ~/.dorgy/config.yaml. Supply the exact LiteLLM/DSPy model string (<provider>/<model>[:variant]) via llm.model. The CLI also respects environment variables such as DORGY__LLM__MODEL, DORGY__LLM__API_KEY, and DORGY__LLM__API_BASE_URL.

Common configurations (substitute your own model identifiers and credentials as needed):

  • OpenAI

    uv run dorgy config set llm.model openai/gpt-4o
    uv run dorgy config set llm.api_key "$OPENAI_API_KEY"
    

    YAML equivalent:

    llm:
      model: openai/gpt-4o
      api_key: sk-...
    
  • Anthropic

    uv run dorgy config set llm.model anthropic/claude-3-5-sonnet-20240620
    uv run dorgy config set llm.api_key "$ANTHROPIC_API_KEY"
    
  • xAI (Grok) via OpenRouter

    uv run dorgy config set llm.model openrouter/grok-1
    uv run dorgy config set llm.api_key "$OPENROUTER_API_KEY"
    
  • Google Gemini

    uv run dorgy config set llm.model google/gemini-1.5-pro
    uv run dorgy config set llm.api_key "$GOOGLE_API_KEY"
    
  • Local / Custom Gateway

    uv run dorgy config set llm.model ollama/llama3
    uv run dorgy config set llm.api_base_url http://localhost:11434/v1
    

    When llm.api_base_url is set (e.g., Ollama, LM Studio, vLLM, or self-hosted gateways), dorgy sends requests directly to that endpoint and skips API-key enforcement.
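The <provider>/<model>[:variant] shape and the base-URL rule can be checked before writing a value into the config. This is a minimal sketch under stated assumptions: ModelRef, parse_model, and needs_api_key are hypothetical names, and the parsing shown here is not how dorgy, DSPy, or LiteLLM resolve model strings internally.

```python
from typing import NamedTuple, Optional


class ModelRef(NamedTuple):
    provider: str
    model: str
    variant: Optional[str]


def parse_model(identifier):
    """Split '<provider>/<model>[:variant]' into its parts.

    Hypothetical helper for illustration; not dorgy's own parser.
    """
    # The provider is everything before the first slash.
    provider, sep, rest = identifier.partition("/")
    if not sep or not provider or not rest:
        raise ValueError(
            f"expected <provider>/<model>[:variant], got {identifier!r}"
        )
    # An optional variant follows the first colon in the remainder.
    model, _, variant = rest.partition(":")
    if not model:
        raise ValueError(f"missing model name in {identifier!r}")
    return ModelRef(provider, model, variant or None)


def needs_api_key(api_base_url):
    """Mirror the rule above: a custom base URL skips API-key enforcement."""
    return not api_base_url
```

For example, parse_model("openrouter/gpt-4o-mini:free") yields provider "openrouter", model "gpt-4o-mini", and variant "free", while needs_api_key("http://localhost:11434/v1") is False.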


Automation & Release Tasks

We ship an Invoke task collection that wraps the uv toolchain so day-to-day automation stays consistent:

  • uv run invoke sync – install dependencies (dev extras by default).
  • uv run invoke tests / uv run invoke lint / uv run invoke ci – mirror the CI workflow locally.
  • uv run invoke release – bump the version, commit pyproject.toml/uv.lock, rebuild artifacts, publish, and tag.
  • uv run invoke release --dry-run --push-tag – preview the full release plan without modifying anything.
  • uv run invoke tag-version – create (and optionally push) an annotated git tag.

Release Workflow

  1. Ensure the working tree is clean and CI passes locally:
    uv run invoke ci
    
  2. Perform a dry run when validating credentials or reviewing the plan:
    uv run invoke release --dry-run --push-tag --token "$TEST_PYPI_TOKEN" \
        --index-url https://test.pypi.org/legacy/ --skip-existing
    
  3. Publish to PyPI (commits the version bump, pushes the tag when requested):
    export PYPI_TOKEN="pypi-AgEN..."
    uv run invoke release --push-tag --token "$PYPI_TOKEN"
    
    Use --index-url/--skip-existing for TestPyPI dry runs, or --tag-prefix "" if you prefer unprefixed tags.
  4. Update SPEC.md and notes/STATUS.md with release notes, open a PR from feature/release-prep, and merge once GitHub Actions succeeds.

Roadmap

  • SPEC.md tracks implementation phases and current status (Phase 9 – Distribution & Release Prep is underway; Phase 5.8 – Vision-Enriched Classification recently wrapped).
  • notes/STATUS.md logs day-to-day progress, blockers, and next actions.
  • Module-specific coordination details live in src/dorgy/**/AGENTS.md.

Upcoming milestones include the image-only OCR follow-up for vision pipelines, CLI ergonomics work in Phase 6, and continued distribution/release automation.


Contributing

We welcome issues and pull requests while the project matures. A few guidelines keep things predictable:

  • Environment – install dependencies with uv sync and run commands via uv run ....
  • Pre-commit – install hooks (uv run pre-commit install) and run uv run pre-commit run --all-files before pushing.
  • Branching – create feature branches named feature/<scope> and keep them rebased until ready for review.
  • Testing – the default pre-commit stack runs Ruff (lint/format/imports), MyPy, and uv run pytest.
  • Documentation – follow Google-style docstrings and update relevant AGENTS.md files when adding automation-facing behaviours or integrations.
  • Coordination – flag changes that impact the CLI contract, watch automation, or external integrations directly in the associated module AGENTS.md.

For release-specific work, use the branch/review workflow documented above and ensure TestPyPI validation is complete before tagging.


Community & Support

  • File issues and feature requests at github.com/bryaneburr/dorgy/issues.
  • Join the discussion via GitHub Discussions (coming soon) or reach out through issues for contributor onboarding.
  • If you build automations on top of dorgy, let us know; roadmap priorities are community-driven.

Authors

  • Codex (ChatGPT-5 based agent) – primary implementation and tactical design across ingestion, classification, organization, and tooling.
  • Bryan E. Burr (@bryaneburr) – supervisor, editor, and maintainer steering project direction and release planning.

License

Released under the MIT License. See LICENSE for details.
