Token-efficient CLI for indexing and searching code symbols (Python-first, designed for minimal LLM/agent context size)
Sampler
Token-efficient CLI for indexing and searching code symbols across multiple projects.
Current version: 0.4.3
Designed for humans and agents: compact default output, short paths, and low-noise symbol views.
Requirements
- Python 3.11+
Installation
pip install sampler-cli
Development setup:
pip install -e '.[dev]'
Semantic stack (TF-IDF + local hash fallback):
pip install -e '.[semantic]'
Quick Start
sampler init
sampler project add myproj /absolute/path/to/project --language auto
sampler index myproj
sampler search retry --project myproj
sampler symbols myproj
sampler overview src/main.py
Command Overview
Core:
- sampler version [--plain]
- sampler init
- sampler index <project>
- sampler search <query> [--project <name>] [--type <t>] [--limit <n>] [--semantic] [--style plain|bars]
- sampler search-all <query> [--type <t>] [--limit <n>]
- sampler symbols <project> [--type <t>] [--limit <n>]
- sampler overview <filepath> [--style plain|bars]
Relationships:
- sampler callers <symbol> [--project <name>] [--file <path-or-suffix>]
- sampler usages <symbol> [--project <name>] [--file <path-or-suffix>]
- sampler related <symbol> [--project <name>] [--file <path-or-suffix>] [--style plain|bars]
- Alternative selector: <path>:<symbol> (e.g. app/utils/helpers.py:format_kda)
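The selector forms above can be illustrated with a small sketch. It is not Sampler's actual parser: `parse_selector` and `matches_file` are hypothetical names, and the suffix rule (whole path components only) is an assumption about how `--file <path-or-suffix>` might behave.

```python
from typing import NamedTuple, Optional


class Selector(NamedTuple):
    path: Optional[str]  # file path or suffix; None when only a symbol was given
    symbol: str


def parse_selector(text: str) -> Selector:
    """Split '<path>:<symbol>' on the last colon; a bare symbol has no path."""
    path, sep, symbol = text.rpartition(":")
    if not sep:
        return Selector(None, text)
    return Selector(path or None, symbol)


def matches_file(candidate: str, path_or_suffix: Optional[str]) -> bool:
    """Assumed rule: the filter must equal the trailing path components."""
    if path_or_suffix is None:
        return True
    cand = candidate.replace("\\", "/").split("/")
    want = path_or_suffix.replace("\\", "/").strip("/").split("/")
    return cand[-len(want):] == want


sel = parse_selector("app/utils/helpers.py:format_kda")
print(sel.path, sel.symbol)
print(matches_file("app/utils/helpers.py", "utils/helpers.py"))
```

Splitting on the last colon keeps paths that themselves contain a colon (for example a Windows drive letter) intact.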
Project management:
- sampler project add <name> <path> --language <python|go|typescript|javascript|auto>
- sampler project update <name> [--path <abs-path>] [--language <lang>]
- sampler project list
- sampler project deps <name>
- sampler project remove <name>
Config:
- sampler config show
- sampler config embeddings [--provider P] [--model M]
Semantic and analysis:
- sampler embed <project> [--batch-size <n>]
- sampler stale-code <project> [--limit <n>]
Embeddings & Semantic Search
sampler search --semantic (and hybrid ranking) supports pluggable providers via the adapter pattern:
- Default: bge-small (BAAI/bge-small-en-v1.5 via fastembed; lightweight ONNX, 384 dimensions, local).
- Other built-ins: hash (always-on deterministic fallback), ollama (e.g. nomic-embed-text), nomic, openai, fastembed.
- TF-IDF (sklearn, on-the-fly, no pre-embed) remains the fast lexical primary when no provider embeddings are precomputed for the active model.
- Hash fingerprint is the final always-available fallback.
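As a rough illustration of the adapter idea and the always-available hash fallback, here is a minimal sketch. It is not Sampler's code: `EmbeddingProvider`, `HashProvider`, `pick_provider`, and the 64-dimension default are all assumptions made for the example, and the TF-IDF tier is left out for brevity.

```python
import hashlib
import math
import re
from typing import Protocol


class EmbeddingProvider(Protocol):
    """Hypothetical adapter interface: every provider exposes the same method."""
    name: str

    def embed(self, texts: list[str]) -> list[list[float]]: ...


class HashProvider:
    """Deterministic feature-hashing embedder; needs no model or network."""
    name = "hash"

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def embed(self, texts: list[str]) -> list[list[float]]:
        return [self._embed_one(t) for t in texts]

    def _embed_one(self, text: str) -> list[float]:
        vec = [0.0] * self.dim
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
            bucket = int.from_bytes(digest[:4], "big") % self.dim
            sign = 1.0 if digest[4] & 1 else -1.0  # signed hashing reduces collision bias
            vec[bucket] += sign
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else vec


def pick_provider(configured: str, registry: dict[str, EmbeddingProvider]) -> EmbeddingProvider:
    """Return the configured provider, or the hash fallback when it is not registered."""
    return registry.get(configured) or registry["hash"]


registry: dict[str, EmbeddingProvider] = {"hash": HashProvider()}
provider = pick_provider("bge-small", registry)  # bge-small is not installed in this sketch
print(provider.name, len(provider.embed(["retry request"])[0]))
```

Because the hash provider depends only on the standard library, it can stay registered even when no embeddings extra is installed.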
Configuration (in ~/.sampler/config.yaml or via sampler config embeddings ...):
embeddings:
  provider: "bge-small"
  # provider: "ollama"
  # model: "nomic-embed-text"
  # base_url: "http://localhost:11434"
Install:
# For default BGE (recommended for most users)
pip install 'sampler-cli[embeddings]'
# Or for Ollama / OpenAI only
pip install 'sampler-cli[ollama-embeddings]'
pip install 'sampler-cli[openai-embeddings]'
sampler embed <project> precomputes vectors with the currently configured provider and shows a progress bar. After changing the provider, update the config and re-run sampler embed; vectors from the previous provider are ignored until the project is re-embedded.
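One way to picture "old vectors are ignored until re-embedded" is to tag each stored vector with the model that produced it and filter on the active model at query time. The sketch below assumes that design; `StoredVector`, `usable_vectors`, and `needs_embed` are hypothetical names, and the real storage layout in graph.db is not described in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredVector:
    symbol_id: int
    model: str                 # embedding model that produced the vector
    values: tuple[float, ...]


def usable_vectors(rows: list[StoredVector], active_model: str) -> dict[int, tuple[float, ...]]:
    """Keep only vectors from the active model; others are skipped, not deleted."""
    return {r.symbol_id: r.values for r in rows if r.model == active_model}


def needs_embed(symbol_ids: list[int], rows: list[StoredVector], active_model: str) -> list[int]:
    """Symbols that still lack a vector for the active model."""
    have = usable_vectors(rows, active_model)
    return [s for s in symbol_ids if s not in have]


rows = [
    StoredVector(1, "bge-small", (0.1, 0.2)),
    StoredVector(2, "nomic-embed-text", (0.3, 0.4)),
]
print(needs_embed([1, 2], rows, "nomic-embed-text"))
```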
Offline / air-gapped: set provider: hash, or skip the embeddings extra entirely; TF-IDF and hash still work if the [semantic] extra is installed.
Language Support
- Python parser: stdlib AST (stable)
- Go parser: tree-sitter-go (real extraction)
- TypeScript/JavaScript parser: tree-sitter-typescript (real extraction)
- --language auto: per-file language detection for monorepos/multi-language projects
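Per-file detection could be as simple as a lookup on the file extension. The sketch below assumes exactly that; the extension table and the function names are illustrative, not taken from Sampler, which may use a different or richer rule.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Optional

# Assumed mapping for illustration; the README only lists the supported languages.
EXTENSION_LANGUAGE = {
    ".py": "python",
    ".go": "go",
    ".ts": "typescript",
    ".tsx": "typescript",
    ".js": "javascript",
    ".jsx": "javascript",
}


def detect_language(path: str) -> Optional[str]:
    """Return the language for one file, or None when the extension is unknown."""
    return EXTENSION_LANGUAGE.get(PurePosixPath(path).suffix.lower())


def group_by_language(paths: list[str]) -> dict[str, list[str]]:
    """Bucket a mixed file list so each parser only sees its own files."""
    groups: dict[str, list[str]] = defaultdict(list)
    for p in paths:
        lang = detect_language(p)
        if lang is not None:
            groups[lang].append(p)
    return dict(groups)


print(group_by_language(["api/main.go", "web/src/App.tsx", "tools/build.py", "README.md"]))
```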
Stale Code Detection
sampler stale-code <project> reports candidate stale functions/methods where:
- function is called from test files
- function has zero non-test callers in project call graph
- symbol is defined in production code (symbols defined in test files are excluded)
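The three conditions above can be written as a single predicate. This is a sketch under the assumption that callers have already been split into test and non-test groups; `SymbolInfo` and `is_stale_candidate` are hypothetical names, not Sampler's API.

```python
from dataclasses import dataclass, field


@dataclass
class SymbolInfo:
    name: str
    path: str
    defined_in_test: bool
    test_callers: list[str] = field(default_factory=list)
    non_test_callers: list[str] = field(default_factory=list)


def is_stale_candidate(sym: SymbolInfo) -> bool:
    """Production symbol that is exercised by tests but called nowhere else."""
    return (
        not sym.defined_in_test
        and len(sym.test_callers) > 0
        and len(sym.non_test_callers) == 0
    )


retry = SymbolInfo(
    name="retry_request",
    path="src/utils/retry.py",
    defined_in_test=False,
    test_callers=["tests.test_retry.test_retry_request"],
)
print(is_stale_candidate(retry))
```

A symbol with no callers at all does not qualify under these rules, since the first condition requires at least one test caller.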
Test file detection supports common multi-language patterns:
- Python: tests/, test_*.py, *_test.py
- Go: *_test.go
- TypeScript/JavaScript: __tests__/, test/, spec/, *.test.*, *.spec.*
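The patterns listed above map naturally onto directory checks plus filename globs. The following sketch assumes that approach and, for brevity, merges the per-language directory names into one set, which is a simplification; `is_test_file` is a hypothetical helper, not Sampler's implementation.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Directory names and filename globs taken from the list above, merged across languages.
TEST_DIRS = {"tests", "__tests__", "test", "spec"}
TEST_NAME_GLOBS = ("test_*.py", "*_test.py", "*_test.go", "*.test.*", "*.spec.*")


def is_test_file(path: str) -> bool:
    """True when a parent directory or the file name matches a test pattern."""
    p = PurePosixPath(path.replace("\\", "/"))
    if any(part in TEST_DIRS for part in p.parts[:-1]):
        return True
    return any(fnmatchcase(p.name, glob) for glob in TEST_NAME_GLOBS)


for candidate in ("tests/test_retry.py", "pkg/retry_test.go", "src/utils/retry.py"):
    print(candidate, is_test_file(candidate))
```

`fnmatchcase` is used instead of `fnmatch` so that matching is case-sensitive on every platform.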
This is a heuristic signal, not proof that the code is dead.
Examples
$ sampler search worker --project myproj
myproj:src/tasks/celery_app.py:70 function on_worker_ready def on_worker_ready(sender)
$ sampler related ConfigManager --project myproj --style bars
myproj:src/config.py:24-105 class ConfigManager [parent]
...
$ sampler stale-code myproj
myproj:src/utils/retry.py:12-28 function retry_request test_callers=2 non_test_callers=0 [tests.test_retry.test_retry_request]
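For agents that consume this compact output, a small parser may be handy. The sketch below infers the line shape (`project:path:line[-end] kind name rest`) purely from the examples above; it is not a documented format guarantee, and `parse_hit` and `Hit` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the sample output lines only.
LINE_RE = re.compile(
    r"^(?P<project>[^:\s]+):(?P<path>[^:\s]+):(?P<start>\d+)(?:-(?P<end>\d+))?"
    r"\s+(?P<kind>\w+)\s+(?P<name>\S+)(?:\s+(?P<rest>.*))?$"
)


class Hit(NamedTuple):
    project: str
    path: str
    start: int
    end: int
    kind: str
    name: str
    rest: str


def parse_hit(line: str) -> Optional[Hit]:
    """Parse one result line; return None for prompts, ellipses, or other text."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    start = int(m["start"])
    end = int(m["end"]) if m["end"] else start  # single-line hits span one line
    return Hit(m["project"], m["path"], start, end, m["kind"], m["name"], m["rest"] or "")


print(parse_hit("myproj:src/config.py:24-105 class ConfigManager [parent]"))
```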
Data Location
- Config: ~/.sampler/config.yaml
- DB: ~/.sampler/graph.db
Running Tests
pytest -q
Notes
- Compact output is default by design (token-efficient for agent workflows).
- For broader roadmap details, see TODO.md and PLAN.md.