Skip to main content

A documentation processing tool for identifying and correcting errors in markdown, reStructuredText, and plain text files

Project description

Dr. Doc

Dr. Doc is currently a toy but useful project to improve documentation files by identifying and correcting grammar, formatting errors, and broken links using large language models. Currently, this project uses the Argo API, which provides access to OpenAI models for Argonne researchers. Future updates will include support for other models, structured output to simplify prompts, as well as GitHub and GitLab actions for continuous integration.

Please note that the current version of gpt4o used by Argo is limited to 4096 output tokens. Therefore, the largest files you can process with Argo/gpt4o are around 15 KB.

Features

  • Fixes grammar, formatting, and link issues in documentation files.
  • Supports Markdown (.md), reStructuredText (.rst), and plain text formats.
  • Provides a detailed explanation of changes made to the documentation.
  • Optional Git integration to commit changes directly.

Requirements

  • Argo API credentials (ARGO_URL and ARGO_USER must be defined in the environment)
  • Python 3.8 or higher
  • requests>=2.25.0

Setup

  1. Clone the repository and navigate to the project directory:

    git clone <repository-url>
    cd drdoc
    
  2. Define the required environment variables for the Argo API:

    export ARGO_URL=<your-argo-url>
    export ARGO_USER=<your-argo-user>
    
  3. (Optional) Install the package:

    pip install -e .
    

Usage

If you have installed Dr. Doc with pip as described above, you can run it with drdoc (drdoc -h for help menu). If not, you need to run the Python script with python <path_to_drdoc>/drdoc.py.

drdoc <doc_path> [options]

or without installation:

python <path_to_drdoc>/drdoc.py <doc_path> [options]

Command Line Options

  • doc_path: (Required) Path to the documentation file or directory containing files to process.
  • --argo_url: (Optional) Argo API endpoint URL (default: value of ARGO_URL environment variable).
  • --argo_user: (Optional) Argo API user (default: value of ARGO_USER environment variable).
  • --model: (Optional) Model to use (e.g., gpt4o, gpt35; default: gpt4o).
  • --temperature: (Optional) Sampling temperature for the model (default: 0.1).
  • --top_p: (Optional) Top-p sampling for the model (default: 0.9).
  • --max_tokens: (Optional) Max tokens for the prompt (default: 4096).
  • --max_completion_tokens: (Optional) Max tokens for the completion (default: 16000).
  • --inplace: (Optional) Modify the original file in place instead of creating a new one.
  • --commit: (Optional) Commit changes to Git with the explanation as the commit message.
  • --format: (Optional) Format of the documentation file (md, rst, or txt; default: md).

Example Commands

Process a Markdown file:

drdoc doc/sample.md

This would create doc/sample_fixed.md.

Process all ReStructuredText documentation files (*.rst files) in the doc directory:

drdoc doc/ --format rst

Process a file and modify it in-place:

drdoc doc/sample.md --inplace

Process a file in place and commit changes (you need to run it inside the git project):

cd <your_git_repo>
drdoc README.md --inplace --commit

TODO

  • Add support for LangChain to use other models.
  • Optionally ask for confirmation for each change.
  • Enable using ALCF inference endpoints.
  • Add GitHub and GitLab actions to process documentation files for CI.
  • Improve the prompts and user experience with feedback.

Contributing

We welcome contributions to improve Dr. Doc! Please open an issue or submit a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

drdoc-0.1.0.tar.gz (7.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

drdoc-0.1.0-py3-none-any.whl (9.0 kB view details)

Uploaded Python 3

File details

Details for the file drdoc-0.1.0.tar.gz.

File metadata

  • Download URL: drdoc-0.1.0.tar.gz
  • Upload date:
  • Size: 7.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for drdoc-0.1.0.tar.gz
Algorithm Hash digest
SHA256 cff6b566fcb221c1e9e74599b8658f5e70e043dfee808ba5d6320a3a795d0e62
MD5 58c9299c3deabae4eed717185eb17365
BLAKE2b-256 07270a71378607668bb7308d7afe4ee40ca9ead72335aafb100d606d8e7e18de

See more details on using hashes here.

File details

Details for the file drdoc-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: drdoc-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 9.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for drdoc-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 46d4cd75d3849f0ef582a4fcb822d483df224408fe18809f0abe4c2f1bc2bb47
MD5 a7220480ef6a09552db956bffb561128
BLAKE2b-256 e52cd2dcc116632c1aa2c80f91896eaa9161e993970daed418979447eb471297

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page