
Task-to-code retrieval — MCP server and web UI


simargl

Semantic Index: Map Artifacts, Retrieve from Git Log

Task-to-code retrieval. Given a description of a change, finds which files and modules are likely affected — using semantic similarity over historical tasks or commits.

Exposes an MCP server (stdio transport) compatible with any MCP-aware agent system.


Install

pip install simargl

The default embedding model (bge-small, ~130MB) is downloaded automatically during install. If that fails or you installed offline, download it manually:

simargl download

Step 1 — Index your project

You need two indexes: one for code files, one for tasks (or commits if no tracker).

# Index code files (walks the repo, chunks text files, stores vectors)
simargl index files C:/repos/sonar

# Index tasks from SQLite (auto-detects tasks vs commits)
simargl index units C:/data/sonar.db

# Check what was indexed
simargl status

Both indexes land in .simargl/default/ relative to your working directory.

If you want to index only part of the repo, or keep two sub-projects separate, use --project:

simargl index files C:/repos/sonar/server --project server
simargl index units C:/data/sonar.db --project server
simargl status --project server

Each project stores its vectors in .simargl/{project}/ independently.

--last N — index only recent history

For large repositories where development is continuous, the most recent tasks or commits describe the current codebase better than the full history. Use --last to limit the units index to the N most recent tasks (or commits), ordered by commit date:

simargl index units C:/data/sonar.db --last 1000
simargl index units C:/data/kafka.db --last 500 --mode commits

This is especially useful for aggr search mode, which averages unit vectors per file — recent units produce a more focused signal than a full multi-year history.
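The aggr idea can be pictured as follows — a minimal sketch in plain Python with made-up data (simargl's real storage layout and vectorized math differ):

```python
# Sketch of aggr-style scoring: average the vectors of the units (tasks or
# commits) that touched a file, then rank files by similarity to the query.
# Data below is hypothetical; simargl's actual index layout differs.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (dot(a, a) ** 0.5 * dot(b, b) ** 0.5)

def mean_vector(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

# one embedding per unit; which files each unit touched
unit_vectors = {"T-1": [1.0, 0.0], "T-2": [0.8, 0.2], "T-3": [0.0, 1.0]}
unit_files = {"T-1": ["auth.py"], "T-2": ["auth.py"], "T-3": ["views.py"]}

# build a per-file averaged vector
per_file = {}
for unit, files in unit_files.items():
    for f in files:
        per_file.setdefault(f, []).append(unit_vectors[unit])
file_vecs = {f: mean_vector(vs) for f, vs in per_file.items()}

query = [1.0, 0.1]  # pretend this is the embedded query text
ranked = sorted(file_vecs, key=lambda f: cosine(query, file_vecs[f]), reverse=True)
print(ranked[0])  # auth.py ranks first: its recent units point the same way
```

With --last trimming the unit set, the averages are dominated by recent work, which is exactly the "more focused signal" described above.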

According to our research, --last 1000 produces better retrieval quality than either --last 100 (too narrow) or --last 10000 (too much noise from distant history). 1000 recent tasks is a good default for actively developed projects.

(experiment data and article links will be added here after publication)

Use --model bge-large if you need higher accuracy (uses more RAM and disk).

Available model keys:

# sentence-transformers — runs locally, CPU or GPU, downloads model on first use
--model bge-small                                  # default, 384 dims
--model bge-large                                  # better quality, 1024 dims

# Ollama — no model download, uses whatever is already pulled in Ollama
--model ollama://nomic-embed-text                  # localhost:11434
--model ollama://nomic-embed-text@192.168.1.10     # remote machine

# OpenAI-compatible local server — LM Studio, llama.cpp, LiteLLM, Jan, Koboldcpp
--model openai://localhost:1234/nomic-embed-text   # LM Studio
--model openai://localhost:8080/all-minilm         # llama.cpp server
--model openai://localhost:4000/nomic-embed-text   # LiteLLM

openai:// means OpenAI-compatible API — no cloud, no API key, runs entirely locally.
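Conceptually, a model key resolves to a provider, a model name, and a host. A rough parser illustrating the grammar above (this is not simargl's actual code; the default Ollama port is the standard 11434):

```python
# Illustrative parser for the --model key grammar shown above.
# Not simargl's real implementation; the dict shape is an assumption.

def parse_model_key(key):
    if key.startswith("ollama://"):
        rest = key[len("ollama://"):]               # e.g. nomic-embed-text@192.168.1.10
        model, _, host = rest.partition("@")
        return {"provider": "ollama", "model": model, "host": host or "localhost:11434"}
    if key.startswith("openai://"):
        rest = key[len("openai://"):]               # e.g. localhost:1234/nomic-embed-text
        host, _, model = rest.partition("/")
        return {"provider": "openai", "model": model, "host": host}
    # bare keys map to local sentence-transformers models
    return {"provider": "sentence-transformers", "model": key, "host": None}

print(parse_model_key("ollama://nomic-embed-text@192.168.1.10"))
print(parse_model_key("openai://localhost:1234/nomic-embed-text"))
```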


Step 2 — Connect to 1bcoder

Launch 1bcoder from your project directory — the MCP subprocess inherits that working directory, so .simargl resolves correctly with no extra flags.

cd C:/Project/my-app
1bcoder

If you indexed with the default project_id (no --project flag):

/mcp connect simargl simargl-mcp

If you indexed with a custom project_id:

/mcp connect simargl simargl-mcp --project-id bookcrossing

The first connect takes 30–60 s while the embedding model loads — this is normal. Tool calls are instant after that.

To connect to a project in a different directory without restarting 1bcoder, use --cwd:

/mcp connect simargl simargl-mcp --cwd C:/Project/other-app --project-id myproject

Check it connected:

/mcp tools simargl

You should see: find, index_files, index_units, status, vacuum, embedding, distance.

With --project-id set at server startup, you never need to pass project_id in tool calls.


Step 3 — Index

1bcoder MCP call syntax is /mcp call server/tool {json_args}.

/mcp call simargl/index_files {"path": "C:/Project/my-app"}

If you have a task SQLite (Jira/GitHub export):

/mcp call simargl/index_units {"db_path": "C:/data/myproject.db"}

Check what was indexed:

/mcp call simargl/status {}

Step 4 — Search

The call syntax is always /mcp call simargl/tool {json}.

Find files related to a description

/mcp call simargl/find {"query": "make author field longer in the book class"}

The defaults are mode=task and sort=rank. If you only indexed files (no task SQLite), use mode=file:

/mcp call simargl/find {"query": "make author field longer in the book class", "mode": "file"}

All parameters

/mcp call simargl/find {
  "query": "make author field longer in the book class",
  "mode": "file",
  "top_n": 10
}
param         values             default
mode          task, file, aggr   task
sort          rank, freq         rank
top_n         integer            10
top_k         integer            10
include_diff  true / false       false
project_id    string             default
store_dir     path               .simargl

If you used a custom project_id at index time

/mcp call simargl/find {"query": "add author field", "project_id": "bookcrossing"}

To avoid passing project_id every time, re-index without it (uses default):

/mcp call simargl/index_files {"path": "C:/Project/my-app"}

Typical 1bcoder workflow

# 1. Find files
/mcp call simargl/find {"query": "make author field longer in the book class", "mode": "file"} -> find_result
/var set find_files matches

# 2. Read the most relevant files
/read {{find_files}}

# 3. Ask the model
make the author field longer in the Book class

# 4. Apply
/patch models.py code

Other tools

Check index status

/mcp call simargl/status {}
/mcp call simargl/status {"project_id": "bookcrossing"}

Compute embedding for any text

/mcp call simargl/embedding {"text": "add user authentication to login flow"} -> vector1

Stores the vector as {{vector1}}. Use later with distance.

Measure semantic distance between two things

/mcp call simargl/distance {"source1": "auth.py", "source2": "views.py"}
/mcp call simargl/distance {"source1": "add user auth", "source2": "auth.py"}

Returns cosine similarity (0–1).
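For reference, the score is plain cosine similarity over the stored embeddings — a toy version:

```python
# Cosine similarity: dot product of the vectors divided by the product of
# their lengths. Identical directions score 1.0, orthogonal ones 0.0.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

print(round(cosine_similarity([1.0, 0.0], [1.0, 0.0]), 3))  # 1.0 — identical directions
print(round(cosine_similarity([1.0, 0.0], [0.0, 1.0]), 3))  # 0.0 — orthogonal vectors
```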

Vacuum (reclaim disk after many incremental re-indexes)

/mcp call simargl/vacuum {}

Re-index after code changes

# Incremental (default) — only processes files modified since last run
simargl index files C:/repos/sonar --project sonar

# Full reindex — re-embeds everything regardless of mtime
simargl index files C:/repos/sonar --project sonar --full

Incremental index uses mtime comparison against the previous indexed_at timestamp:

  • unchanged files → skipped
  • modified files → old chunks soft-deleted, new chunks appended
  • deleted files → chunks soft-deleted
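The decision rule above can be sketched as follows (illustrative logic only, with integer timestamps standing in for real mtimes — not simargl's code):

```python
# Sketch of the incremental decision: compare each file's mtime against the
# index's previous indexed_at timestamp. Illustrative only.

def plan_reindex(current_files, indexed_files, indexed_at):
    """current_files: {path: mtime} on disk; indexed_files: paths already in the index."""
    actions = {}
    for path, mtime in current_files.items():
        if path not in indexed_files:
            actions[path] = "embed"                   # new file
        elif mtime > indexed_at:
            actions[path] = "soft-delete + re-embed"  # modified since last run
        else:
            actions[path] = "skip"                    # unchanged
    for path in indexed_files - set(current_files):
        actions[path] = "soft-delete"                 # file removed from repo
    return actions

plan = plan_reindex(
    current_files={"a.py": 50, "b.py": 200, "c.py": 10},
    indexed_files={"a.py", "b.py", "gone.py"},
    indexed_at=100,
)
print(plan["b.py"], "|", plan["gone.py"])
```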

Soft-deleted vectors stay in the int8 file until you vacuum. Run vacuum periodically (e.g. after a big refactor) to reclaim disk space:

simargl vacuum --project sonar
# or from 1bcoder:
/mcp call simargl/vacuum {}
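Conceptually, vacuum rewrites the vector storage keeping only live rows — a sketch with plain lists standing in for the int8 file:

```python
# Vacuum sketch: rewrite storage, dropping rows marked as soft-deleted.
# Plain dicts stand in for the int8 vector file; layout is an assumption.
rows = [
    {"chunk": "auth.py#0", "vec": [1, 2], "deleted": False},
    {"chunk": "auth.py#1", "vec": [3, 4], "deleted": True},   # soft-deleted
    {"chunk": "views.py#0", "vec": [5, 6], "deleted": False},
]
compacted = [r for r in rows if not r["deleted"]]
print(len(rows), "->", len(compacted))  # disk shrinks by the dead rows
```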

The units index is separate — re-run index units only when the SQLite database is updated.


Parameters reference

Tool         Key params                                                                   Defaults
find         mode (task|file|aggr), sort (rank|freq), top_n, top_k, top_m, include_diff   task, rank, 10, 10, 5, false
index_files  path, model_key, project_id, chunk_size                                      —, bge-small, default, 400
index_units  db_path, model_key, project_id, mode                                         —, bge-small, default, auto
embedding    text or file, project_id                                                     —
distance     source1, source2, project_id                                                 —
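The chunk_size parameter controls how files are split before embedding. A fixed-size chunker with overlap is a common approach and could look like this (an assumption — simargl's real chunker may split on lines or tokens instead, and the overlap parameter is invented for illustration):

```python
# Fixed-size text chunking sketch for the chunk_size parameter.
# The overlap parameter is hypothetical; simargl's actual strategy may differ.

def chunk_text(text, chunk_size=400, overlap=40):
    chunks = []
    step = chunk_size - overlap  # each chunk repeats the tail of the previous one
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece:
            chunks.append(piece)
        if start + chunk_size >= len(text):
            break
    return chunks

text = "x" * 1000
chunks = chunk_text(text)
print(len(chunks), len(chunks[0]))
```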

Multiple projects

simargl index units kafka.db --project kafka
simargl index files C:/repos/kafka --project kafka
/mcp call simargl/find {"query": "add partition rebalance", "project_id": "kafka"}
/mcp call simargl/status {"project_id": "kafka"}

Each project stores its vectors in .simargl/{project_id}/ independently.


Running on a dedicated machine or phone — LAN access

simargl can run on any machine on your LAN: a dedicated server, a Raspberry Pi, or an Android phone via Termux. The index, embedding model, MCP server, and web UI all run there — the laptop just opens a browser or connects an agent.

Web UI over LAN

The web UI binds to 0.0.0.0 by default, so it is immediately visible on the local network:

# on the server / phone
simargl ui --port 7860

# on any browser in the same LAN
http://192.168.1.42:7860

Install the UI extra if not already present:

pip install "simargl[ui]"

Keep it running in the background with tmux or nohup (see below). The UI lets you search all indexed projects, switch mode/sort, and browse results — no agent or CLI needed on the client side.

The UI includes a Download index section (collapsed accordion at the bottom). Select a project and click Prepare ZIP to download the full index (all six files: files.int8, files.db, units.int8, units.db, unit_files.db, meta.json) as a single ZIP. Extract it as .simargl/ on your local machine and run simargl search or simargl status without re-indexing.

unzip default.zip -d .simargl/
simargl status
simargl search "add author field to Book class"

Android (Termux) — full setup

With 8 GB+ of RAM, Ollama, nomic-embed-text, and simargl all fit comfortably on the phone. The laptop connects over LAN — no cloud, no GPU, everything local.

Phone setup (Termux)

# base tools
pkg update && pkg install python git

# Ollama for Android (ARM64)
pkg install ollama
ollama serve &
ollama pull nomic-embed-text      # 274MB embedding model
# optional: ollama pull nemotron-mini  (if you want LLM on phone too)

# simargl
pip install simargl
pip install "simargl[http]"       # adds starlette + uvicorn for LAN transport

# index your project (copy SQLite and repo to phone storage first)
simargl index units /sdcard/data/sonar.db \
    --project sonar \
    --model ollama://nomic-embed-text

simargl index files /sdcard/repos/sonar \
    --project sonar \
    --model ollama://nomic-embed-text

# start MCP server on LAN
simargl-mcp --http --port 8765
# → simargl MCP server — http://0.0.0.0:8765/sse

Laptop — connect to phone

Find phone IP: ip addr in Termux or check Wi-Fi settings.

1bcoder:

/mcp connect simargl http://192.168.1.42:8765/sse
/mcp tools simargl

Claude Desktop (claude_desktop_config.json):

{
  "mcpServers": {
    "simargl": {
      "url": "http://192.168.1.42:8765/sse"
    }
  }
}

Claude Code / OpenCode / Cursor — same URL pattern; see the agent-specific configs below.

Keep server running in Termux background

# run in background, log to file
nohup simargl-mcp --http --port 8765 > ~/.simargl-mcp.log 2>&1 &

# or use tmux (pkg install tmux)
tmux new -s simargl
simargl-mcp --http --port 8765
# Ctrl+B D  to detach

What runs where

Component                      Server / phone        Laptop / client
Vector index (.simargl/)       yes                   —
Embedding model                yes (Ollama)          —
MCP server (simargl-mcp)       yes                   —
Web UI (simargl ui)            yes                   browser only
Agent / LLM (1bcoder, Claude)  —                     yes
Repo source files              yes (for indexing)    yes (for editing)

The phone stores the index and computes embeddings. The laptop runs the agent and edits code. Both use the same .simargl/ directory — if you prefer, mount phone storage via sshfs so the laptop can also run simargl index directly against it.


Connecting to agent systems

simargl-mcp uses stdio transport — the universal MCP default. Pass --store-dir with the absolute path to your project's .simargl/ directory (or set cwd) so the subprocess finds the index regardless of which working directory the agent system uses.

Claude Desktop

Config file:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json
{
  "mcpServers": {
    "simargl": {
      "command": "simargl-mcp",
      "args": ["--store-dir", "C:/repos/sonar/.simargl", "--project-id", "sonar"]
    }
  }
}

Restart Claude Desktop after editing. Tools appear automatically in the UI.

Claude Code (CLI)

Option A — add to global settings ~/.claude/settings.json:

{
  "mcpServers": {
    "simargl": {
      "command": "simargl-mcp",
      "args": ["--store-dir", "C:/repos/sonar/.simargl", "--project-id", "sonar"]
    }
  }
}

Option B — connect interactively from any session (no restart needed):

/mcp add simargl simargl-mcp --store-dir C:/repos/sonar/.simargl --project-id sonar

Then call tools directly in your prompt:

use simargl find to locate files related to "add buildString to project analysis"

OpenCode

Config file: ~/.config/opencode/config.json

{
  "mcp": {
    "simargl": {
      "command": ["simargl-mcp"],
      "cwd": "C:/repos/sonar"
    }
  }
}

OpenAI Codex CLI

Config file: ~/.codex/config.yaml

mcp_servers:
  simargl:
    command: simargl-mcp
    cwd: C:/repos/sonar

Cursor

Config file: .cursor/mcp.json in your project root (or global ~/.cursor/mcp.json):

{
  "mcpServers": {
    "simargl": {
      "command": "simargl-mcp",
      "cwd": "${workspaceFolder}"
    }
  }
}

${workspaceFolder} resolves to the open project directory — simargl will look for .simargl/ there.

Windsurf (Codeium)

Config file: ~/.codeium/windsurf/mcp_settings.json

{
  "mcpServers": {
    "simargl": {
      "command": "simargl-mcp",
      "cwd": "C:/repos/sonar"
    }
  }
}

Any other MCP-compatible system

The pattern is always the same:

{
  "command": "simargl-mcp",
  "args": [],
  "cwd": "<directory where .simargl/ lives>"
}

If the agent system does not support cwd, pass it as an env variable instead and adjust the server startup — or simply cd to the right directory before launching.


Tip: multiple projects across agents

If you work on several repos, use project_id to keep their indexes separate under the same .simargl/ directory:

find files related to "add partition rebalance"  project_id=kafka
find files related to "add buildString to API"   project_id=sonar

PostgreSQL + pgvector backend

For larger codebases, or when you want sub-linear search via an HNSW index.

pip install "simargl[postgres]"

Requires PostgreSQL with pgvector extension:

CREATE EXTENSION IF NOT EXISTS vector;

Index with postgres backend

simargl index units sonar.db --project sonar \
    --backend postgres \
    --db-url postgresql://postgres:postgres@localhost/simargl

simargl index files C:/repos/sonar --project sonar \
    --backend postgres \
    --db-url postgresql://postgres:postgres@localhost/simargl

MCP server with postgres

simargl-mcp --backend postgres \
    --db-url postgresql://postgres:postgres@localhost/simargl

numpy vs postgres — when to choose which

                   numpy              postgres
Install            zero extra deps    psycopg2 + pgvector
Search speed       linear scan        sub-linear (HNSW)
Scales to          ~500k chunks       millions of chunks
Vacuum             file rebuild       DELETE + VACUUM ANALYZE
Concurrent writes  no                 yes
Termux / Android   yes                harder
Laptop / server    yes                yes

For most projects (sonar.db = ~100k chunks) numpy is fast enough. Switch to postgres when search latency becomes noticeable or you index multiple large repos.
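A linear scan is just "score the query against every stored vector and take the best k" — a toy version showing the shape of the work (the real backend does this vectorized in numpy over int8-quantized vectors):

```python
# Toy linear-scan top-k over unit-length embeddings: O(N * dims) per query,
# which is why it stays fast up to roughly hundreds of thousands of chunks.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_k(query, vectors, k=2):
    # vectors: {chunk_id: embedding}; every entry is scored — no index structure
    scored = sorted(vectors.items(), key=lambda kv: dot(query, kv[1]), reverse=True)
    return [chunk_id for chunk_id, _ in scored[:k]]

vectors = {"auth.py#0": [1.0, 0.0], "views.py#0": [0.0, 1.0], "auth.py#1": [0.9, 0.4]}
print(top_k([1.0, 0.2], vectors))
```

An HNSW index (the postgres backend) avoids touching every vector per query, which is what makes it sub-linear at the cost of extra infrastructure.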


About

(c) 2026 Stanislav Zholobetskyi
Institute for Information Recording, National Academy of Sciences of Ukraine, Kyiv
PhD research: «Intelligent Technology for Software Development and Maintenance Support»


