A documentation processing tool for identifying and correcting errors in markdown, reStructuredText, and plain text files
Dr. Doc
Dr. Doc is a small, experimental tool that improves documentation files by identifying and correcting grammar errors, formatting errors, and broken links using large language models. It currently uses the Argo API, which gives Argonne researchers access to OpenAI models. Planned updates include support for other models, structured output to simplify prompts, and GitHub and GitLab actions for continuous integration.
Please note that the current version of gpt4o used by Argo is limited to 4096 output tokens. Therefore, the largest files you can process with Argo/gpt4o are around 15 KB.
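The size limit above can be checked before a file is submitted. The following is a minimal sketch and not part of Dr. Doc: the function name `fits_argo_limit` and the 15,000-byte threshold are assumptions based on the approximate figure quoted above, not an exact limit enforced by Argo.

```python
# Assumed threshold: the text above says "around 15 KB", so this is approximate.
APPROX_MAX_BYTES = 15_000


def fits_argo_limit(text, max_bytes=APPROX_MAX_BYTES):
    """Return True if `text`, encoded as UTF-8, is at most `max_bytes` bytes."""
    return len(text.encode("utf-8")) <= max_bytes
```

Measuring the UTF-8 byte length rather than the character count keeps the check meaningful for files that contain non-ASCII text.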
Features
- Fixes grammar, formatting, and link issues in documentation files.
- Supports Markdown (.md), reStructuredText (.rst), and plain text formats.
- Provides a detailed explanation of changes made to the documentation.
- Optional Git integration to commit changes directly.
Requirements
- Argo API credentials (ARGO_URL and ARGO_USER must be defined in the environment)
- Python 3.8 or higher
- requests>=2.25.0
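A script that wraps Dr. Doc may want to confirm that the credentials are present before it starts. This is a minimal sketch under that assumption: the helper `missing_argo_vars` is hypothetical and does not describe how Dr. Doc itself validates its environment.

```python
import os

# The two variables named in the requirements above.
REQUIRED_VARS = ("ARGO_URL", "ARGO_USER")


def missing_argo_vars(env=None):
    """Return the required variable names that are unset or empty in `env`.

    `env` defaults to the process environment; passing a dict makes the
    check easy to exercise without touching real credentials.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

An empty list means both variables are defined; otherwise the list names what still needs to be exported.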
Setup
- Clone the repository and navigate to the project directory:
  git clone <repository-url>
  cd drdoc
- Define the required environment variables for the Argo API:
  export ARGO_URL=<your-argo-url>
  export ARGO_USER=<your-argo-user>
- (Optional) Install the package:
  pip install -e .
Usage
If you installed Dr. Doc with pip as described above, run it with the drdoc command (drdoc -h shows the help menu). Otherwise, run the Python script directly with python <path_to_drdoc>/drdoc.py.
drdoc <doc_path> [options]
or without installation:
python <path_to_drdoc>/drdoc.py <doc_path> [options]
Command Line Options
- doc_path: (Required) Path to the documentation file or directory containing files to process.
- --argo_url: (Optional) Argo API endpoint URL (default: value of the ARGO_URL environment variable).
- --argo_user: (Optional) Argo API user (default: value of the ARGO_USER environment variable).
- --model: (Optional) Model to use (e.g., gpt4o, gpt35; default: gpt4o).
- --temperature: (Optional) Sampling temperature for the model (default: 0.1).
- --top_p: (Optional) Top-p sampling for the model (default: 0.9).
- --max_tokens: (Optional) Max tokens for the prompt (default: 4096).
- --max_completion_tokens: (Optional) Max tokens for the completion (default: 16000).
- --inplace: (Optional) Modify the original file in place instead of creating a new one.
- --commit: (Optional) Commit changes to Git with the explanation as the commit message.
- --format: (Optional) Format of the documentation file (md, rst, or txt; default: md).
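The options above map naturally onto an `argparse` parser. The sketch below mirrors the documented names and defaults, but it is an illustration only: the function name `build_parser` and the help strings are assumptions, and the actual parser in drdoc.py may differ.

```python
import argparse
import os


def build_parser():
    """Build a parser that mirrors the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="drdoc")
    parser.add_argument("doc_path", help="file or directory to process")
    # Credentials fall back to the environment variables named above.
    parser.add_argument("--argo_url", default=os.environ.get("ARGO_URL"))
    parser.add_argument("--argo_user", default=os.environ.get("ARGO_USER"))
    parser.add_argument("--model", default="gpt4o")
    parser.add_argument("--temperature", type=float, default=0.1)
    parser.add_argument("--top_p", type=float, default=0.9)
    parser.add_argument("--max_tokens", type=int, default=4096)
    parser.add_argument("--max_completion_tokens", type=int, default=16000)
    # Boolean switches: absent means False.
    parser.add_argument("--inplace", action="store_true")
    parser.add_argument("--commit", action="store_true")
    parser.add_argument("--format", choices=("md", "rst", "txt"), default="md")
    return parser
```

For example, `build_parser().parse_args(["doc/", "--format", "rst"])` yields a namespace whose `format` is `"rst"` and whose other fields keep the defaults listed above.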
Example Commands
Process a Markdown file:
drdoc doc/sample.md
This would create doc/sample_fixed.md.
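The naming pattern in this example (sample.md becomes sample_fixed.md) can be reproduced with `pathlib`. This is a sketch inferred from the single example above; the helper name `fixed_output_path` is hypothetical, and Dr. Doc's own naming logic may handle edge cases differently.

```python
from pathlib import Path


def fixed_output_path(doc_path):
    """Insert "_fixed" between the file stem and its extension."""
    path = Path(doc_path)
    return path.with_name(f"{path.stem}_fixed{path.suffix}")
```

Knowing the output path in advance is useful for scripts that want to diff the original against the corrected copy.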
Process all reStructuredText documentation files (*.rst files) in the doc directory:
drdoc doc/ --format rst
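To preview which files a directory run with --format would pick up, the extension filter can be sketched as below. The helper `select_docs` is hypothetical, and whether Dr. Doc also descends into subdirectories is not stated here, so this sketch only filters a list of paths it is given.

```python
from pathlib import PurePath


def select_docs(paths, fmt="rst"):
    """Keep only the paths whose extension matches the chosen format."""
    suffix = f".{fmt}"
    return [p for p in paths if PurePath(p).suffix == suffix]
```

The input list could come from, for example, `[str(p) for p in Path("doc").iterdir()]`.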
Process a file and modify it in place:
drdoc doc/sample.md --inplace
Process a file in place and commit the changes (run this inside your Git repository):
cd <your_git_repo>
drdoc README.md --inplace --commit
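The effect of --commit, as described in the options, is a commit whose message is the explanation of the changes. A rough equivalent using the standard `git` CLI is sketched below; the function names are hypothetical, and this is not necessarily how Dr. Doc invokes Git internally.

```python
import subprocess


def git_commit_commands(path, explanation):
    """Return the git commands that stage and commit a single file."""
    return [
        ["git", "add", "--", path],
        ["git", "commit", "-m", explanation, "--", path],
    ]


def commit_file(path, explanation):
    """Run the commands; must be called from inside a Git repository."""
    for cmd in git_commit_commands(path, explanation):
        # check=True raises CalledProcessError if git exits with an error.
        subprocess.run(cmd, check=True)
```

Building the argument lists separately from running them keeps the commands inspectable and avoids shell quoting problems in multi-line commit messages.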
TODO
- Add support for LangChain to use other models.
- Optionally ask for confirmation for each change.
- Enable using ALCF inference endpoints.
- Add GitHub and GitLab actions to process documentation files for CI.
- Improve the prompts and user experience with feedback.
Contributing
We welcome contributions to improve Dr. Doc! Please open an issue or submit a pull request.
License
This project is licensed under the MIT License. See the LICENSE file for details.