deco-assaying

MCP server that performs tree-sitter-based source code analysis. Designed to feed structural information about a repo (symbols, imports, references, chunks, metrics) into a downstream consumer that maintains a knowledge base over many codebases.

Run

Pick the deployment that matches your situation:

| Mode | Command | When to use |
|---|---|---|
| Daemon — pinned install | `uv tool install` | You'll run it across many sessions; want it on `$PATH`. |
| Daemon — ephemeral | `uvx` | One-off run; don't want anything left on disk. |
| Container | `docker run` from GHCR | Ops deployment, compose stack, or want filesystem isolation. |
| From source | `uv run` | Hacking on the server itself. |

Prereqs

  • uv-based modes need uv and git. uv ships a portable Python 3.13, so no system Python install required.

    curl -LsSf https://astral.sh/uv/install.sh | sh
    
  • Docker mode needs docker (or compatible). The image bundles Python 3.13 and git; nothing else on the host.

1. Daemon — uv tool install (PyPI)

Installs the deco-assaying command on your $PATH, isolated in its own venv that uv manages.

uv tool install deco-assaying
deco-assaying                     # starts the server

Update later with uv tool upgrade deco-assaying; remove with uv tool uninstall deco-assaying.

2. Daemon — uvx (no install)

uvx resolves the package into a temporary venv and runs the entry point in one shot. Nothing persists between runs.

uvx deco-assaying                       # latest release
uvx deco-assaying@0.1.0                 # pin a specific version

Good for kicking the tires or running on a CI box where you don't want to touch ~/.local/share/uv.

3. Docker / GHCR

Pull and run the published multi-arch image (linux/amd64 + linux/arm64):

docker pull ghcr.io/garycoding/deco-assaying:latest
docker run --rm \
  -p 35832:35832 \
  -v deco-assaying-data:/data \
  ghcr.io/garycoding/deco-assaying:latest

Pin a specific version with a tag such as :0.1.0 or :0.1; :latest tracks the newest release. See the package's GHCR page for the available tags.

Or with compose (see docker-compose.yml — pulls the image, mounts a named volume at /data, restarts on failure):

docker compose up -d

The named volume deco-assaying-data persists job outputs across container restarts. To pass auth tokens for private repos:

docker run --rm \
  -e GITHUB_TOKEN=ghp_... \
  -e GITLAB_TOKEN=glpat-... \
  -p 35832:35832 \
  -v deco-assaying-data:/data \
  ghcr.io/garycoding/deco-assaying:latest

4. From source

git clone https://github.com/garycoding/deco-assaying.git
cd deco-assaying
uv sync
uv run python -m deco_assaying

Endpoints

In every mode the server listens on PORT (default 35832) with:

  • POST /sse — MCP Streamable HTTP transport.
  • GET /health — liveness probe.
  • GET /admin/* — read-only JSON ops endpoints.
  • GET /outputs/{job_id}/... — read-only download API for job artifacts.
  • GET /docs — OpenAPI / Swagger UI for the HTTP API.

Sanity-check it's up:

curl http://127.0.0.1:35832/health

MCP tools

  • analyze_file(content, filename?, language?, options?) — parse a single file passed inline; returns structural JSON.
  • index_repo(source, options?) — start a job that indexes a whole repo and writes per-file artifacts plus a manifest. The server allocates a fresh output dir under OUTPUT_ROOT and returns { job_id, output_path }. source can be a local directory, a GitHub URL (https://github.com/owner/repo), or a GitLab URL (https://gitlab.com/owner/repo, including nested groups https://gitlab.com/group/sub/repo). Pass git_ref to pick a specific branch / tag / sha.
  • get_job_status(job_id) — poll a running or completed job.
  • cancel_job(job_id) — cooperative cancel.
  • list_supported_languages() — capability discovery.
  • detect_language(path) — extension/shebang detection helper.
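
For example, here is a minimal client sketch using the official MCP Python SDK (the `mcp` package, installed separately) that starts an indexing job and polls it. The tool and parameter names come from the list above; the JSON shape of tool results, the terminal status names, and git_ref living inside options are assumptions:

```python
# Sketch: start an index_repo job and poll it to completion over Streamable HTTP.
# Assumes the official MCP Python SDK (`pip install mcp`) and a local server.
import asyncio
import json

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main() -> None:
    async with streamablehttp_client("http://127.0.0.1:35832/sse") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Kick off the job; the server returns { job_id, output_path }.
            started = await session.call_tool(
                "index_repo",
                {"source": "https://github.com/owner/repo",
                 "options": {"git_ref": "main"}},  # git_ref-in-options is an assumption
            )
            job = json.loads(started.content[0].text)  # payload shape assumed

            # Poll until the job reaches a terminal state (status names assumed).
            while True:
                polled = await session.call_tool("get_job_status",
                                                 {"job_id": job["job_id"]})
                status = json.loads(polled.content[0].text)
                if status.get("status") in ("completed", "failed", "cancelled"):
                    print(status)
                    break
                await asyncio.sleep(2)

asyncio.run(main())
```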

Output download API

Every job's artifacts land under OUTPUT_ROOT/{job_id}/. A consumer sharing the volume can read them off disk; one without a shared volume can pull them over HTTP:

| Endpoint | Returns |
|---|---|
| GET /outputs/{job_id} | manifest.json (convenience). |
| GET /outputs/{job_id}/manifest.json | Repo-level rollup. |
| GET /outputs/{job_id}/tree.json | Full path inventory (analyzed + skipped). |
| GET /outputs/{job_id}/symbols.json | Global qualified-name index. |
| GET /outputs/{job_id}/languages.json | Per-language counts. |
| GET /outputs/{job_id}/errors.json | Parse errors + skipped files. |
| GET /outputs/{job_id}/log.jsonl?from_offset=N | Tail the job's log. |
| GET /outputs/{job_id}/ls?path=&recursive= | Directory listing. |
| GET /outputs/{job_id}/file/{path} | Single file, or a streaming ZIP if any path segment contains `*?[`, e.g. /file/files/**/*.py.json. |
| GET /outputs/{job_id}/zip?path=&match= | Explicit bulk-zip alias. Default = whole job dir. |
| DELETE /outputs/{job_id} | Remove the dir + drop the table entry. 409 if still running. |
| GET /admin/outputs | List every job_id present on disk under OUTPUT_ROOT. |

Path traversal (.., absolute paths, escape via symlink) is rejected.
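
For example, a short consumer sketch that fetches a finished job's rollup and bulk-downloads the per-file artifacts over the endpoints above (it uses the requests library, an assumption here; any HTTP client works, and BASE / JOB_ID are placeholders):

```python
# Sketch: pull a finished job's artifacts over the download API.
import requests

BASE = "http://127.0.0.1:35832"
JOB_ID = "your-job-id"  # returned by index_repo

# Repo-level rollup.
manifest = requests.get(f"{BASE}/outputs/{JOB_ID}/manifest.json", timeout=30)
manifest.raise_for_status()
print(manifest.json())

# A glob in a path segment switches /file/ to a streaming ZIP response.
with requests.get(f"{BASE}/outputs/{JOB_ID}/file/files/**/*.py.json",
                  stream=True, timeout=300) as resp:
    resp.raise_for_status()
    with open("artifacts.zip", "wb") as out:
        for chunk in resp.iter_content(chunk_size=1 << 16):
            out.write(chunk)
```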

Resource requirements

When index_repo runs against a GitHub URL, the server uses a partial clone with bin-packed batched fetching. That gives a small, predictable disk footprint regardless of how large the source repo is:

  • Source-side scratch space: ~100 MB peak in output_path/.source/ during analysis. The server fetches each batch of source files (totaling ≤ max_partial_clone_bytes, default 100 MB), analyzes them, deletes them from the working tree, then fetches the next batch. Even on a multi-GB monorepo, peak local disk used for source content stays at ~100 MB. Tunable via the max_partial_clone_bytes option on index_repo; the batching idea is sketched below this list.

  • Output artifacts: roughly 1-2× the analyzed-source size. Each analyzed file produces a JSON artifact under output_path/files/ containing symbols, imports, references, chunks, etc. These persist past the job — the consumer reads them incrementally — and are the largest durable footprint. The retention sweeper auto-purges job dirs older than OUTPUT_EXPIRY_DAYS.

  • Memory: modest. A ProcessPoolExecutor runs roughly 2 × CPU count workers, each holding one file's bytes plus its tree-sitter parse tree in memory. Source files are capped at max_file_bytes (default 2 MB), so worst case is ~16-32 MB of resident source + parse trees on a typical 8-core box.

  • Network: one provider-API pre-flight to plan the batches (GitHub Trees REST or GitLab REST tree + GraphQL; free for public repos, set GITHUB_TOKEN / GITLAB_TOKEN for higher quotas and private-repo access), plus one git fetch-pack round-trip per batch. For a typical sub-100 MB repo that's two HTTP hits total.

For local-path sources nothing is fetched and nothing is cloned — the only on-disk cost is the output artifacts.
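
To make the batching concrete, here is a minimal sketch of the planning step (illustrative only, not the server's actual code): it greedily packs blob sizes from the provider pre-flight into batches of at most max_partial_clone_bytes.

```python
# Illustrative sketch of the bin-packed batch planning described above.
# `files` stands in for the provider pre-flight result (GitHub Trees /
# GitLab tree listing): (path, blob_size_in_bytes) pairs.
from typing import Iterator

MAX_BATCH_BYTES = 100 * 1024 * 1024  # default max_partial_clone_bytes

def plan_batches(files: list[tuple[str, int]],
                 cap: int = MAX_BATCH_BYTES) -> Iterator[list[str]]:
    """Greedy: accumulate paths until the next blob would exceed the cap."""
    batch: list[str] = []
    used = 0
    for path, size in files:
        if batch and used + size > cap:
            yield batch          # batch is full: fetch, analyze, delete
            batch, used = [], 0
        batch.append(path)
        used += size
    if batch:
        yield batch
```

Each yielded batch would map to one fetch-pack round-trip: materialize the paths, analyze them, remove them from the working tree, move on, so peak source-side disk stays near the cap.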

Configuration

| Env var | Default (daemon) | Default (container) | Purpose |
|---|---|---|---|
| PORT | 35832 | 35832 | HTTP listen port. |
| HOST | 0.0.0.0 | 0.0.0.0 | HTTP bind address. |
| OUTPUT_ROOT | ./output | /data | Where the server writes job dirs. |
| OUTPUT_EXPIRY_DAYS | 7 | 7 | Auto-purge job dirs older than this. 0 disables. |
| JOB_HISTORY_MAX | 100 | 100 | In-memory job-table cap. |
| DEFAULT_MAX_FILE_BYTES | 2097152 | 2097152 | Default per-file size cap (2 MiB). |
| DEFAULT_CHUNK_MAX_TOKENS | 800 | 800 | Default chunk size for cAST chunking. |
| GITHUB_TOKEN | unset | unset | Optional; raises GitHub Trees API quota from 60 to 5000 req/hr and unlocks private repos. |
| GITLAB_TOKEN | unset | unset | Optional; used for GitLab API auth and private-repo access. |
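
As an illustration of how these resolve (a sketch, not the server's actual startup code), a daemon-mode process would see something like:

```python
# Illustrative resolution of the documented env vars with daemon defaults.
import os

PORT = int(os.environ.get("PORT", "35832"))
HOST = os.environ.get("HOST", "0.0.0.0")
OUTPUT_ROOT = os.environ.get("OUTPUT_ROOT", "./output")  # "/data" in the container
OUTPUT_EXPIRY_DAYS = int(os.environ.get("OUTPUT_EXPIRY_DAYS", "7"))  # 0 disables purging
DEFAULT_MAX_FILE_BYTES = int(os.environ.get("DEFAULT_MAX_FILE_BYTES", str(2 * 1024 * 1024)))
GITHUB_TOKEN = os.environ.get("GITHUB_TOKEN")  # optional: quota + private repos
```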

Releasing

Tag-driven. Bump the version in pyproject.toml, then:

git tag vX.Y.Z && git push --tags

The Release workflow builds a multi-arch image (linux/amd64 + linux/arm64) and pushes it to GHCR with vX.Y.Z, vX.Y, and latest tags, in parallel with publishing wheel + sdist to PyPI via trusted publishing. ~3-5 minutes end-to-end.

