
A documentation processing tool for identifying and correcting errors in markdown, reStructuredText, and plain text files


Dr. Doc

Dr. Doc is currently a toy, but useful, project that improves documentation files by using large language models to identify and correct grammar and formatting errors and broken links. It currently uses the Argo API, which gives Argonne researchers access to OpenAI models. Future updates will add support for other models, structured output to simplify prompts, and GitHub and GitLab actions for continuous integration.

Note that the gpt4o model currently served by Argo is limited to 4096 output tokens. Because the corrected file has to fit in the model's output, the largest files you can process with Argo/gpt4o are around 15 KB.
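The ~15 KB figure can be sanity-checked with a common rule of thumb for English text of roughly 4 characters per token: 4096 tokens × 4 ≈ 16,000 characters, which is consistent with a ~15 KB limit once some margin is allowed. The sketch below is a hypothetical pre-flight check, not part of Dr. Doc; the 4-characters-per-token ratio is an assumption, and real token counts depend on the model's tokenizer.

```python
from pathlib import Path

# Assumption: about 4 characters of English text per token (rule of thumb only).
CHARS_PER_TOKEN = 4
# gpt4o output-token limit on Argo, as stated in this README.
OUTPUT_TOKEN_LIMIT = 4096


def estimated_tokens(text: str, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Estimate the token count of text from a fixed characters-per-token ratio."""
    # Ceiling division, so any non-empty text counts as at least one token.
    return -(-len(text) // chars_per_token)


def fits_output_budget(text: str, limit: int = OUTPUT_TOKEN_LIMIT) -> bool:
    """Return True if a corrected copy of text is likely to fit in the model output."""
    return estimated_tokens(text) <= limit


if __name__ == "__main__":
    import sys

    # Usage: python check_size.py doc/sample.md README.md
    for name in sys.argv[1:]:
        content = Path(name).read_text(encoding="utf-8")
        status = "ok" if fits_output_budget(content) else "likely too large"
        print(f"{name}: ~{estimated_tokens(content)} tokens ({status})")
```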

Features

  • Fixes grammar, formatting, and link issues in documentation files.
  • Supports Markdown (.md), reStructuredText (.rst), and plain text formats.
  • Provides a detailed explanation of changes made to the documentation.
  • Optional Git integration to commit changes directly.

Requirements

  • Argo API credentials (ARGO_URL and ARGO_USER must be defined in the environment)
  • Python 3.8 or higher
  • requests>=2.25.0

Setup

  1. Clone the repository and navigate to the project directory:

    git clone <repository-url>
    cd drdoc
    
  2. Define the required environment variables for the Argo API:

    export ARGO_URL=<your-argo-url>
    export ARGO_USER=<your-argo-user>
    
  3. (Optional) Install the package:

    pip install -e .
    

Usage

If you installed Dr. Doc with pip as described above, run it with drdoc (use drdoc -h to show the help menu). Otherwise, run the Python script directly with python <path_to_drdoc>/drdoc.py.

    drdoc <doc_path> [options]

or without installation:

    python <path_to_drdoc>/drdoc.py <doc_path> [options]

Command Line Options

  • doc_path: (Required) Path to the documentation file or directory containing files to process.
  • --argo_url: (Optional) Argo API endpoint URL (default: value of ARGO_URL environment variable).
  • --argo_user: (Optional) Argo API user (default: value of ARGO_USER environment variable).
  • --model: (Optional) Model to use (e.g., gpt4o, gpt35; default: gpt4o).
  • --temperature: (Optional) Sampling temperature for the model (default: 0.1).
  • --top_p: (Optional) Top-p sampling for the model (default: 0.9).
  • --max_tokens: (Optional) Max tokens for the prompt (default: 4096).
  • --max_completion_tokens: (Optional) Max tokens for the completion (default: 16000).
  • --inplace: (Optional) Modify the original file in place instead of creating a new one.
  • --commit: (Optional) Commit changes to Git with the explanation as the commit message.
  • --format: (Optional) Format of the documentation file (md, rst, or txt; default: md).
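To make the option list concrete, here is a hypothetical argparse sketch that mirrors the documented flags and defaults, including the fallback to the ARGO_URL and ARGO_USER environment variables. It is not Dr. Doc's actual source; the helper names `build_parser` and `parse_args` and the error message are assumptions for illustration.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the drdoc options documented above."""
    parser = argparse.ArgumentParser(prog="drdoc", description="Fix documentation files with an LLM.")
    parser.add_argument("doc_path", help="File or directory to process")
    # Defaults fall back to the environment, as described for --argo_url/--argo_user.
    parser.add_argument("--argo_url", default=os.environ.get("ARGO_URL"))
    parser.add_argument("--argo_user", default=os.environ.get("ARGO_USER"))
    parser.add_argument("--model", default="gpt4o")
    parser.add_argument("--temperature", type=float, default=0.1)
    parser.add_argument("--top_p", type=float, default=0.9)
    parser.add_argument("--max_tokens", type=int, default=4096)
    parser.add_argument("--max_completion_tokens", type=int, default=16000)
    parser.add_argument("--inplace", action="store_true")
    parser.add_argument("--commit", action="store_true")
    parser.add_argument("--format", choices=["md", "rst", "txt"], default="md")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse arguments and fail early if Argo credentials are missing (assumed behavior)."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.argo_url or not args.argo_user:
        parser.error("set ARGO_URL and ARGO_USER, or pass --argo_url and --argo_user")
    return args
```

Reading the environment in `default=` means an explicit flag always overrides the variable, which matches the "(default: value of ARGO_URL environment variable)" wording.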

Example Commands

Process a Markdown file:

    drdoc doc/sample.md

This creates doc/sample_fixed.md and leaves the original file unchanged.
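The output name in this example suggests a simple rule: insert `_fixed` before the file extension, unless --inplace is given. The helper below is a hypothetical sketch of that rule inferred from the single example above; Dr. Doc's real naming logic may differ.

```python
from pathlib import Path


def fixed_output_path(doc_path: str, inplace: bool = False) -> Path:
    """Return where the corrected file would be written (assumed naming rule)."""
    path = Path(doc_path)
    if inplace:
        # --inplace overwrites the original file.
        return path
    # Assumption inferred from the example: sample.md -> sample_fixed.md.
    return path.with_name(f"{path.stem}_fixed{path.suffix}")
```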

Process all reStructuredText documentation files (*.rst) in the doc directory:

    drdoc doc/ --format rst
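When doc_path is a directory, --format decides which files are picked up. The sketch below shows one plausible way to do this with pathlib; the extension mapping follows the documented md/rst/txt values, but whether Dr. Doc scans subdirectories is not stated, so this version assumes only the top level is scanned (swap `glob` for `rglob` to recurse). `collect_docs` is a hypothetical name.

```python
from pathlib import Path
from typing import List

# Map each --format value to the file extension it selects.
EXTENSIONS = {"md": ".md", "rst": ".rst", "txt": ".txt"}


def collect_docs(doc_path: str, fmt: str = "md") -> List[Path]:
    """Return the file itself, or the matching files in a directory, in sorted order."""
    path = Path(doc_path)
    if path.is_file():
        return [path]
    # Assumption: non-recursive scan of the directory's top level.
    return sorted(p for p in path.glob(f"*{EXTENSIONS[fmt]}") if p.is_file())
```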

Process a file and modify it in place:

    drdoc doc/sample.md --inplace

Process a file in place and commit the changes (run this from inside the Git repository):

    cd <your_git_repo>
    drdoc README.md --inplace --commit
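With --commit, the explanation of changes becomes the commit message. A minimal way to do that from Python is to call git through subprocess, as sketched below; this is an assumed illustration, not Dr. Doc's implementation, and `commit_command` and `commit_changes` are hypothetical helpers. It must run inside the Git repository, which is why the example above starts with cd.

```python
import subprocess
from typing import List


def commit_command(path: str, message: str) -> List[str]:
    """Build the git command that commits a single file with the given message."""
    # "--" separates options from the pathspec, so odd filenames are not parsed as flags.
    return ["git", "commit", "-m", message, "--", path]


def commit_changes(path: str, message: str) -> None:
    """Stage and commit one file; raises CalledProcessError if git fails."""
    subprocess.run(["git", "add", "--", path], check=True)
    subprocess.run(commit_command(path, message), check=True)
```

Passing the message as a single argument keeps a multi-line explanation intact without any shell quoting.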

TODO

  • Add support for LangChain to use other models.
  • Optionally ask for confirmation for each change.
  • Enable using ALCF inference endpoints.
  • Add GitHub and GitLab actions to process documentation files for CI.
  • Improve the prompts and user experience with feedback.

Contributing

We welcome contributions to improve Dr. Doc! Please open an issue or submit a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.
