
Universal MCP server for cross-format document CRUD (text, JSON, YAML, CSV, XML, DOCX, XLSX, PPTX, PDF) with versioning, a sandboxed multi-root workspace, search, batch operations, and optional semantic search.


Dokumen-Pintar

Universal MCP server for cross-format document CRUD, lint, and template authoring

Read, write, search, lint, and author text, Office, and PDF files from any AI agent that supports the Model Context Protocol.

PyPI · Python 3.10+ · License: MIT · Tests: 1403 passed · Coverage: 100%


Features

  • Multi-root Sandbox — Define multiple workspace roots with per-root writable control. All paths outside the sandbox are rejected.
  • 10 Formats — Plain text, Markdown, LaTeX, JSON / YAML, CSV / TSV, XML / SVG, DOCX, XLSX, PPTX, PDF, plus image EXIF.
  • 62 MCP Tools — File & content CRUD, structured access, batch operations, search, versioning, metadata, authoring, image extraction, sections, templates, TOC, bibliography, document compare, lint — all exposed as callable tools.
  • Automatic Versioning — Copy-on-write snapshots on every write operation. Undo, diff, restore, and purge anytime.
  • Structured Access — JSONPath for JSON / YAML (incl. list indices), XPath for XML, cell / range / sheet for XLSX, paragraph / paragraph_runs / table cells for DOCX, slide for PPTX, page for PDF.
  • Authoring — Generate DOCX or PDF from a JSON spec or Markdown, render Jinja2-style DOCX templates, convert DOCX → Markdown.
  • Document Lint — Pluggable rule registry with built-in presets (default, academic_id, academic_id_kp, academic_id_skripsi) for Indonesian academic documents.
  • Indonesian Stemming (optional) — Sastrawi-based morphological matching so mengatakan, berkata, perkataan collapse during search.
  • Semantic Search (optional) — Vector search powered by sentence-transformers; enable via config.
  • Audit Trail — Every mutation logged to JSONL with timestamp and operation details.
  • 2 Transports — stdio (Claude Desktop, Cursor, VS Code, Windsurf) and HTTP / SSE.
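To make the multi-root sandbox idea concrete, here is a minimal sketch of how a path could be checked against named roots with per-root write control. It is not taken from the project's source: the `Root` class and `resolve_in_sandbox` function are hypothetical names, and the real server may enforce the rule differently.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Root:
    """One workspace root (hypothetical shape mirroring a config `roots` entry)."""
    name: str
    path: Path
    writable: bool


def resolve_in_sandbox(
    roots: list[Root], root_name: str, relative: str, *, for_write: bool = False
) -> Path:
    """Resolve `relative` inside the named root, or raise PermissionError."""
    root = next((r for r in roots if r.name == root_name), None)
    if root is None:
        raise PermissionError(f"unknown root: {root_name}")
    if for_write and not root.writable:
        raise PermissionError(f"root is read-only: {root_name}")
    base = root.path.expanduser().resolve()
    candidate = (base / relative).resolve()
    # resolve() collapses ".." and follows symlinks, so an escape attempt
    # ends up outside `base` and fails this containment check.
    if not candidate.is_relative_to(base):
        raise PermissionError(f"path escapes sandbox: {relative}")
    return candidate
```

Resolving both the root and the candidate before comparing them is the important design choice: a plain string-prefix test would accept inputs such as `../other-dir/file.txt`.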

Supported Formats

| Format | Read | Write | Structured Query | Search |
| --- | --- | --- | --- | --- |
| Plain text / Markdown / LaTeX | Y | Y | - | Y |
| JSON / JSONC / JSON5 | Y | Y | JSONPath $.key, $.array[N] | Y |
| YAML | Y | Y | JSONPath $.key | Y |
| CSV / TSV | Y | Y | row:N · col:NAME · cell:row:N,col:NAME | Y |
| XML / SVG | Y | Y | XPath //node | Y |
| DOCX | Y | Y | paragraph:N · paragraph_runs:N · table:N!A1 | Y |
| XLSX | Y | Y | cell:Sheet!A1 · range: · sheet: | Y |
| PPTX | Y | Y | slide:N · slide_title:N | Y |
| PDF | Y | - | page:N · outline · metadata | Y |
| Images (JPG / PNG / TIFF / WEBP) | Y | Y | meta EXIF tags | - |
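As an illustration of the JSONPath forms listed for JSON and YAML (`$.key` and `$.array[N]`), the sketch below resolves just those two forms against already-parsed data. It is a deliberately small subset written for this README, not the server's own resolver; `get_by_path` is a hypothetical helper name.

```python
import re
from typing import Any

# One step of the path: either ".name" or "[index]".
_TOKEN = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def get_by_path(data: Any, path: str) -> Any:
    """Resolve a `$.key` / `$.array[N]` style path against parsed JSON or YAML."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    pos = 1
    current = data
    while pos < len(path):
        match = _TOKEN.match(path, pos)
        if match is None:
            raise ValueError(f"unsupported syntax at offset {pos}: {path!r}")
        key, index = match.groups()
        # Dict lookup for ".name", list lookup for "[index]".
        current = current[key] if key is not None else current[int(index)]
        pos = match.end()
    return current


doc = {"title": "Laporan KP", "authors": [{"name": "Ani"}, {"name": "Budi"}]}
print(get_by_path(doc, "$.title"))            # Laporan KP
print(get_by_path(doc, "$.authors[1].name"))  # Budi
```

Wildcards, filters, and quoted keys are intentionally left out; a missing key or index surfaces as the usual `KeyError` or `IndexError`.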

Quick Start

1. Install

pip install dokumen-pintar

With Indonesian stemming:

pip install "dokumen-pintar[indonesian]"

With semantic search:

pip install "dokumen-pintar[semantic]"

2. Create a Config

dokumen-pintar-init

Or create one manually:

{
  "roots": [
    { "name": "documents", "path": "~/Documents", "writable": true },
    { "name": "projects",  "path": "~/Projects",  "writable": true }
  ]
}
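If the config is generated by a script rather than written by hand, a sketch along these lines produces the same shape. Only the `roots`, `name`, `path`, and `writable` fields come from the example above; the `write_config` helper and the read-only `archive` root in the usage note are hypothetical, and CONFIG.md remains the reference for every other field.

```python
import json
from pathlib import Path


def write_config(target: Path, roots: dict[str, tuple[str, bool]]) -> dict:
    """Write a config with one `roots` entry per name -> (path, writable) pair."""
    config = {
        "roots": [
            {"name": name, "path": path, "writable": writable}
            for name, (path, writable) in roots.items()
        ]
    }
    # indent=2 keeps the file readable for later manual edits.
    target.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, `write_config(Path("dokumen-pintar.config.json"), {"documents": ("~/Documents", True), "archive": ("~/Archive", False)})` would write one writable and one read-only root.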

Six pre-tuned profiles ship in docs/profiles/ — copy personal.json for daily desktop use, research.json for thesis libraries, or team-server.json for HTTP deployment.

3. Run

dokumen-pintar --config dokumen-pintar.config.json

4. Connect to an AI Client

Claude Desktop — Add to claude_desktop_config.json:

{
  "mcpServers": {
    "dokumen-pintar": {
      "command": "dokumen-pintar",
      "args": ["--config", "/path/to/dokumen-pintar.config.json"]
    }
  }
}
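When claude_desktop_config.json already lists other MCP servers, the entry above has to be merged in rather than pasted over the file. The sketch below shows one way to do that with the standard library; `register_server` is a hypothetical helper, and the location of the client config file is left to the caller because it differs between platforms.

```python
import json
from pathlib import Path


def register_server(client_config: Path, server_config: str) -> dict:
    """Add or update the dokumen-pintar entry, leaving other servers untouched."""
    if client_config.exists():
        data = json.loads(client_config.read_text(encoding="utf-8"))
    else:
        data = {}
    servers = data.setdefault("mcpServers", {})
    servers["dokumen-pintar"] = {
        "command": "dokumen-pintar",
        "args": ["--config", server_config],
    }
    client_config.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
    return data
```

Running it twice is harmless, since the second call simply rewrites the same entry.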

Cursor / VS Code / Windsurf — Use the same stdio transport. Point your IDE's MCP settings to the dokumen-pintar command and config path.


Tools Overview

62 MCP tools organised by category:

| Category | Tools |
| --- | --- |
| Workspace | workspace_list_roots · workspace_stat · workspace_tree · workspace_diagnose |
| File CRUD | file_create · file_delete · file_rename · file_copy · file_move |
| Content | content_read · content_write · content_append · content_insert · content_replace · content_delete_range · content_patch · content_diff |
| Structured | struct_get · struct_set · struct_delete · struct_meta |
| Metadata | metadata_read · metadata_write · metadata_delete · metadata_strip · metadata_read_batch |
| Authoring | validate_spec · compose_docx · compose_pdf · compose_from_markdown · compose_to_markdown |
| Sections | section_extract · section_merge |
| Images | image_list · image_extract · image_extract_all · image_replace |
| Templates | template_list · template_install · template_render · template_render_named |
| TOC & Bibliography | toc_generate · bibliography_check · bibliography_format |
| Compare & Lint | document_compare · document_lint · document_lint_fix |
| Batch | batch_rename · batch_replace_content · batch_replace_structured · batch_delete |
| Search | search_filename · search_content · search_in_format |
| Versioning | version_list · version_diff · version_restore · version_undo · version_purge |
| Semantic* | search_semantic · semantic_index_path · semantic_stats |

*Only registered when semantic_search.enabled = true and [semantic] extras are installed.
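MCP clients invoke tools like these with JSON-RPC 2.0 `tools/call` messages. The sketch below builds such a message so the wire format is visible; `tool_call_request` is a hypothetical helper, and `workspace_list_roots` is assumed here to take no arguments. The actual argument schema for each tool is documented in TOOLS.md, not here.

```python
import json
from itertools import count
from typing import Optional

# JSON-RPC requests need an id that is unique within the session.
_ids = count(1)


def tool_call_request(tool: str, arguments: Optional[dict] = None) -> str:
    """Serialize an MCP `tools/call` request as one JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments or {}},
    }
    return json.dumps(message)


# Assumption: this tool needs no arguments.
print(tool_call_request("workspace_list_roots"))
```

In practice an MCP client library sends these messages for you, after the protocol's initialization handshake; building one by hand is mainly useful for debugging a transport.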

Bundled templates

  • academic_id/kp_basic — generic Indonesian Kerja Praktik report skeleton (cover, lembar pengesahan, kata pengantar, BAB I/II, log book, daftar pustaka).

Documentation

Full docs on GitHub: github.com/firdausmntp/Dokumen-Pintar

  • USAGE.md — Workspace URIs, every tool with JSON examples, recipes
  • CONFIG.md — All config fields with types, defaults, and notes
  • TOOLS.md — Full reference for all 62 tools
  • ARCHITECTURE.md — Module map, request flow, versioning, safety
  • BENCHMARK.md — Performance baselines and methodology
  • profiles/ — Six pre-tuned config presets
  • AGENTS.md — Contributor guide

License

MIT — 2026 firdausmntp

