Engineer Thesis: Explaining and modifying LLM responses using SAE and concepts.
Running Tests
The project uses pytest for testing. Tests are organized into unit tests and end-to-end tests.
Running All Tests
pytest
Running Specific Test Suites
Run only unit tests:
pytest --unit -q
Run only end-to-end tests:
pytest --e2e -q
You can also use pytest markers:
pytest -m unit -q
pytest -m e2e -q
Or specify the test directory directly:
pytest tests/unit -q
pytest tests/e2e -q
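The `--unit` and `--e2e` flags are not built into pytest, so they must be registered by the project, typically in a `conftest.py`. The following is a minimal sketch of how such flags could be wired to the `unit` and `e2e` markers. The helper `split_by_markers` and the help texts are hypothetical, and the project's actual conftest may work differently.

```python
# conftest.py (hypothetical sketch; the project's real conftest may differ)


def pytest_addoption(parser):
    """Register the --unit and --e2e command-line flags."""
    parser.addoption("--unit", action="store_true", default=False,
                     help="run only tests marked 'unit'")
    parser.addoption("--e2e", action="store_true", default=False,
                     help="run only tests marked 'e2e'")


def pytest_configure(config):
    """Declare the markers so `pytest -m unit` does not warn about unknown marks."""
    config.addinivalue_line("markers", "unit: fast, isolated unit tests")
    config.addinivalue_line("markers", "e2e: slower end-to-end tests")


def split_by_markers(items, wanted):
    """Partition items into (selected, deselected) by marker name.

    An empty `wanted` set selects everything.
    """
    if not wanted:
        return list(items), []
    selected, deselected = [], []
    for item in items:
        if any(item.get_closest_marker(name) for name in wanted):
            selected.append(item)
        else:
            deselected.append(item)
    return selected, deselected


def pytest_collection_modifyitems(config, items):
    """Keep only the suites requested via --unit / --e2e."""
    wanted = {name for name in ("unit", "e2e") if config.getoption(f"--{name}")}
    selected, deselected = split_by_markers(items, wanted)
    if deselected:
        # Report the dropped tests as "deselected" in the pytest summary.
        config.hook.pytest_deselected(items=deselected)
        items[:] = selected
```

With this sketch, passing neither flag runs everything, and passing both runs tests carrying either marker.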
Test Coverage
The test suite is configured to require at least 85% code coverage. Coverage reports are generated in both terminal and XML formats.
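The XML report can be checked programmatically, for example in a CI step. The sketch below assumes the report is the Cobertura-style XML that coverage.py writes, whose root element carries a `line-rate` attribute between 0 and 1. The function names are hypothetical and not part of this project.

```python
import xml.etree.ElementTree as ET

REQUIRED_COVERAGE = 85.0  # threshold stated above, in percent


def line_coverage_percent(xml_text):
    """Return overall line coverage (in percent) from a Cobertura-style report."""
    root = ET.fromstring(xml_text)
    # coverage.py stores the overall ratio as a 0..1 float on the root element.
    return float(root.attrib["line-rate"]) * 100.0


def meets_threshold(xml_text, required=REQUIRED_COVERAGE):
    """True when the report's line coverage is at least `required` percent."""
    return line_coverage_percent(xml_text) >= required
```

To check a report on disk, read the file contents and pass the text to `meets_threshold`.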
Backend (FastAPI) quickstart
Install server-only dependencies (kept out of the core library) with uv:
uv sync --group server
Run the API:
uv run --group server uvicorn server.main:app --reload
Smoke-test the server endpoints:
uv run --group server pytest tests/server/test_api.py --cov=server --cov-fail-under=0
SAE API usage
- Configure artifact location (optional):
  export SERVER_ARTIFACT_BASE_PATH=/path/to/mi_crow_artifacts
  (defaults to ~/.cache/mi_crow_server)
- Load a model:
  curl -X POST http://localhost:8000/models/load -H "Content-Type: application/json" -d '{"model_id":"bielik"}'
- Save activations from dataset (stored in LocalStore under activations/<model>/<run_id>):
  - HF dataset:
    {"dataset":{"type":"hf","name":"ag_news","split":"train","text_field":"text"}}
  - Local files:
    {"dataset":{"type":"local","paths":["/path/to/file.txt"]}}
  - Example:
    curl -X POST http://localhost:8000/sae/activations/save -H "Content-Type: application/json" -d '{"model_id":"bielik","layers":["dummy_root"],"dataset":{"type":"local","paths":["/tmp/data.txt"]},"sample_limit":100,"batch_size":4,"shard_size":64}'
    → returns a manifest path, run_id, token counts, and batch metadata.
- List activation runs:
  curl "http://localhost:8000/sae/activations?model_id=bielik"
- Start SAE training (async job, uses SaeTrainer):
  curl -X POST http://localhost:8000/sae/train -H "Content-Type: application/json" -d '{"model_id":"bielik","activations_path":"/path/to/manifest.json","layer":"<layer_name>","sae_class":"TopKSae","hyperparams":{"epochs":1,"batch_size":256}}'
  → returns job_id
- Check job status:
  curl http://localhost:8000/sae/train/status/<job_id>
  (returns sae_id, sae_path, metadata_path, progress, and logs)
- Cancel a job (best-effort):
  curl -X POST http://localhost:8000/sae/train/cancel/<job_id>
- Load an SAE:
  curl -X POST http://localhost:8000/sae/load -H "Content-Type: application/json" -d '{"model_id":"bielik","sae_path":"/path/to/sae.json"}'
- List SAEs:
  curl "http://localhost:8000/sae/saes?model_id=bielik"
- Run SAE inference (optionally save top texts and apply concept config):
  curl -X POST http://localhost:8000/sae/infer -H "Content-Type: application/json" -d '{"model_id":"bielik","sae_id":"<sae_id>","save_top_texts":true,"top_k_neurons":5,"concept_config_path":"/path/to/concepts.json","inputs":[{"prompt":"hi"}]}'
  → returns outputs, top neuron summary, sae metadata, and saved top-texts path when requested.
  - Per-token latents: add "return_token_latents": true (default off) to include top-k neuron activations per token.
- List concepts:
  curl "http://localhost:8000/sae/concepts?model_id=bielik&sae_id=<sae_id>"
- Load concepts from a file (validated against SAE latents):
  curl -X POST http://localhost:8000/sae/concepts/load -H "Content-Type: application/json" -d '{"model_id":"bielik","sae_id":"<sae_id>","source_path":"/path/to/concepts.json"}'
- Manipulate concepts (saves a config file for inference-time scaling):
  curl -X POST http://localhost:8000/sae/concepts/manipulate -H "Content-Type: application/json" -d '{"model_id":"bielik","sae_id":"<sae_id>","edits":{"0":1.2}}'
- List concept configs:
  curl "http://localhost:8000/sae/concepts/configs?model_id=bielik&sae_id=<sae_id>"
- Preview concept config (validate without saving):
  curl -X POST http://localhost:8000/sae/concepts/preview -H "Content-Type: application/json" -d '{"model_id":"bielik","sae_id":"<sae_id>","edits":{"0":1.2}}'
- Delete activation run or SAE (requires API key if set):
  curl -X DELETE "http://localhost:8000/sae/activations/<run_id>?model_id=bielik" -H "X-API-Key: <key>"
  curl -X DELETE "http://localhost:8000/sae/saes/<sae_id>?model_id=bielik" -H "X-API-Key: <key>"
- Health/metrics summary:
  curl http://localhost:8000/health/metrics
  (in-memory job counts; no persistence, no auth)
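The same calls can be scripted from Python. The following is a minimal sketch using only the standard library. The endpoint paths and payload fields are copied from the curl examples above, while the helper names (`build_post`, `send`, `load_model_request`, `save_activations_request`, `train_request`) are hypothetical and not part of the project.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same host and port as the curl examples


def build_post(path, payload, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for the given endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}{path}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30.0):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def load_model_request(model_id):
    """Request for POST /models/load."""
    return build_post("/models/load", {"model_id": model_id})


def save_activations_request(model_id, layers, paths, sample_limit=100):
    """Request for POST /sae/activations/save using local text files."""
    payload = {
        "model_id": model_id,
        "layers": layers,
        "dataset": {"type": "local", "paths": paths},
        "sample_limit": sample_limit,
        "batch_size": 4,   # values taken from the curl example
        "shard_size": 64,
    }
    return build_post("/sae/activations/save", payload)


def train_request(model_id, manifest_path, layer):
    """Request for POST /sae/train; the server replies with a job_id."""
    payload = {
        "model_id": model_id,
        "activations_path": manifest_path,
        "layer": layer,
        "sae_class": "TopKSae",
        "hyperparams": {"epochs": 1, "batch_size": 256},
    }
    return build_post("/sae/train", payload)
```

With the server running, `send(load_model_request("bielik"))` performs the first step. The manifest path needed by `train_request` comes from the save-activations response; its exact field name is not stated here, so inspect the response before relying on it.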
Notes:
- Job manager is in-memory/lightweight: jobs disappear on process restart; idempotency is best-effort via payload key.
- Training/inference currently run in-process threads; add your own resource guards when running heavy models.
- Optional API key protection: set SERVER_API_KEY=<value> to require X-API-Key on protected endpoints (delete).
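For scripted clean-up, the protected delete calls can be prepared in Python. This is a minimal sketch: the URL shapes and the X-API-Key header follow the curl examples above, while the function name `delete_request` and its parameters are hypothetical.

```python
import urllib.parse
import urllib.request


def delete_request(kind, resource_id, model_id, api_key=None,
                   base_url="http://localhost:8000"):
    """Build a DELETE request for an activation run or an SAE.

    `kind` is "activations" (resource_id is a run_id) or "saes"
    (resource_id is an sae_id). The X-API-Key header is attached only
    when a key is supplied, matching the optional protection described above.
    """
    if kind not in ("activations", "saes"):
        raise ValueError(f"unsupported resource kind: {kind!r}")
    query = urllib.parse.urlencode({"model_id": model_id})
    safe_id = urllib.parse.quote(resource_id, safe="")
    url = f"{base_url}/sae/{kind}/{safe_id}?{query}"
    headers = {"X-API-Key": api_key} if api_key else {}
    return urllib.request.Request(url, headers=headers, method="DELETE")
```

The returned object can be passed to `urllib.request.urlopen` once the server is running. How the server responds to a missing or wrong key is not described here, so handle `urllib.error.HTTPError` rather than assuming a specific status code.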