mi-crow

Engineering thesis: explaining and modifying LLM responses using sparse autoencoders (SAEs) and concepts.

Project description

Running Tests

The project uses pytest for testing. Tests are organized into unit tests and end-to-end tests.

Running All Tests

pytest

Running Specific Test Suites

Run only unit tests:

pytest --unit -q

Run only end-to-end tests:

pytest --e2e -q

You can also use pytest markers:

pytest -m unit -q
pytest -m e2e -q

Or specify the test directory directly:

pytest tests/unit -q
pytest tests/e2e -q
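As a rough sketch of how flags such as --unit and --e2e can be tied to markers, a conftest.py could look like the following. This illustrates the general pytest mechanism and is not the project's actual conftest.py: the helper wanted_markers is hypothetical, and the unit and e2e markers are assumed to be registered in the pytest configuration.

```python
# conftest.py (hypothetical sketch; the project's real configuration may differ)


def pytest_addoption(parser):
    """Register the --unit and --e2e command-line flags."""
    parser.addoption("--unit", action="store_true", default=False,
                     help="run only tests marked 'unit'")
    parser.addoption("--e2e", action="store_true", default=False,
                     help="run only tests marked 'e2e'")


def wanted_markers(unit, e2e):
    """Return the marker names selected by the flags (empty set = no filtering)."""
    return {name for name, enabled in (("unit", unit), ("e2e", e2e)) if enabled}


def pytest_collection_modifyitems(config, items):
    """Deselect every collected test that carries none of the requested markers."""
    wanted = wanted_markers(config.getoption("--unit"), config.getoption("--e2e"))
    if not wanted:
        return  # neither flag given: keep the full test suite

    selected, deselected = [], []
    for item in items:
        keep = any(item.get_closest_marker(name) is not None for name in wanted)
        (selected if keep else deselected).append(item)

    if deselected:
        # Report the dropped tests to pytest, then shrink the list in place.
        config.hook.pytest_deselected(items=deselected)
        items[:] = selected
```

With hooks like these, pytest --unit behaves much like pytest -m unit: tests without the marker are reported as deselected rather than skipped.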

Test Coverage

The test suite is configured to require at least 85% code coverage. Coverage reports are generated in both terminal and XML formats.
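To check the threshold outside pytest, for example in a CI step, the XML report can be read directly. The sketch below assumes the report is produced by coverage.py in its Cobertura-style layout, where the root coverage element carries a line-rate attribute; the function names are illustrative and not part of the project.

```python
import xml.etree.ElementTree as ET


def line_coverage_percent(xml_text: str) -> float:
    """Return overall line coverage in percent from a coverage XML report."""
    root = ET.fromstring(xml_text)
    # line-rate is a fraction between 0 and 1 on the root <coverage> element.
    return float(root.attrib["line-rate"]) * 100.0


def meets_threshold(xml_text: str, threshold: float = 85.0) -> bool:
    """Return True if line coverage reaches the threshold (85% by default)."""
    return line_coverage_percent(xml_text) >= threshold


# Usage, assuming the report was written to coverage.xml in the working directory:
# with open("coverage.xml", encoding="utf-8") as handle:
#     print(meets_threshold(handle.read()))
```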

Backend (FastAPI) quickstart

Install server-only dependencies (kept out of the core library) with uv:

uv sync --group server

Run the API:

uv run --group server uvicorn server.main:app --reload
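Scripts that start the server and then call it immediately may need to wait until it accepts connections. A minimal sketch, assuming the server listens on http://localhost:8000 and that the /health/metrics endpoint listed under SAE API usage answers with HTTP 200 once the app is up; probe and wait_until_ready are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request


def probe(url: str) -> bool:
    """Return True if the URL answers with HTTP 200, False if it is not reachable."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: not ready yet.
        return False


def wait_until_ready(url, attempts=30, delay=1.0, probe_fn=probe, sleep_fn=time.sleep):
    """Probe the URL up to `attempts` times, sleeping `delay` seconds in between."""
    for attempt in range(attempts):
        if probe_fn(url):
            return True
        if attempt < attempts - 1:
            sleep_fn(delay)
    return False


# Usage (blocks for up to roughly 30 seconds):
# wait_until_ready("http://localhost:8000/health/metrics")
```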

Smoke-test the server endpoints:

uv run --group server pytest tests/server/test_api.py --cov=server --cov-fail-under=0

SAE API usage

Configure artifact location (optional): export SERVER_ARTIFACT_BASE_PATH=/path/to/mi_crow_artifacts (defaults to ~/.cache/mi_crow_server)
Load a model: curl -X POST http://localhost:8000/models/load -H "Content-Type: application/json" -d '{"model_id":"bielik"}'
Save activations from a dataset (stored in LocalStore under activations/<model>/<run_id>):
- HF dataset: {"dataset":{"type":"hf","name":"ag_news","split":"train","text_field":"text"}}
- Local files: {"dataset":{"type":"local","paths":["/path/to/file.txt"]}}
- Example: curl -X POST http://localhost:8000/sae/activations/save -H "Content-Type: application/json" -d '{"model_id":"bielik","layers":["dummy_root"],"dataset":{"type":"local","paths":["/tmp/data.txt"]},"sample_limit":100,"batch_size":4,"shard_size":64}' → returns a manifest path, run_id, token counts, and batch metadata.
List activation runs: curl "http://localhost:8000/sae/activations?model_id=bielik"
Start SAE training (async job, uses SaeTrainer): curl -X POST http://localhost:8000/sae/train -H "Content-Type: application/json" -d '{"model_id":"bielik","activations_path":"/path/to/manifest.json","layer":"<layer_name>","sae_class":"TopKSae","hyperparams":{"epochs":1,"batch_size":256}}' → returns job_id
Check job status: curl http://localhost:8000/sae/train/status/<job_id> (returns sae_id, sae_path, metadata_path, progress, and logs)
Cancel a job (best-effort): curl -X POST http://localhost:8000/sae/train/cancel/<job_id>
Load an SAE: curl -X POST http://localhost:8000/sae/load -H "Content-Type: application/json" -d '{"model_id":"bielik","sae_path":"/path/to/sae.json"}'
List SAEs: curl "http://localhost:8000/sae/saes?model_id=bielik"
Run SAE inference (optionally save top texts and apply concept config): curl -X POST http://localhost:8000/sae/infer -H "Content-Type: application/json" -d '{"model_id":"bielik","sae_id":"<sae_id>","save_top_texts":true,"top_k_neurons":5,"concept_config_path":"/path/to/concepts.json","inputs":[{"prompt":"hi"}]}' → returns outputs, top neuron summary, sae metadata, and saved top-texts path when requested.
Per-token latents: add "return_token_latents": true (default off) to include top-k neuron activations per token.
List concepts: curl "http://localhost:8000/sae/concepts?model_id=bielik&sae_id=<sae_id>"
Load concepts from a file (validated against SAE latents): curl -X POST http://localhost:8000/sae/concepts/load -H "Content-Type: application/json" -d '{"model_id":"bielik","sae_id":"<sae_id>","source_path":"/path/to/concepts.json"}'
Manipulate concepts (saves a config file for inference-time scaling): curl -X POST http://localhost:8000/sae/concepts/manipulate -H "Content-Type: application/json" -d '{"model_id":"bielik","sae_id":"<sae_id>","edits":{"0":1.2}}'
List concept configs: curl "http://localhost:8000/sae/concepts/configs?model_id=bielik&sae_id=<sae_id>"
Preview concept config (validate without saving): curl -X POST http://localhost:8000/sae/concepts/preview -H "Content-Type: application/json" -d '{"model_id":"bielik","sae_id":"<sae_id>","edits":{"0":1.2}}'
Delete an activation run or an SAE (requires the API key if one is set): curl -X DELETE "http://localhost:8000/sae/activations/<run_id>?model_id=bielik" -H "X-API-Key: <key>" and curl -X DELETE "http://localhost:8000/sae/saes/<sae_id>?model_id=bielik" -H "X-API-Key: <key>"
Health/metrics summary: curl http://localhost:8000/health/metrics (in-memory job counts; no persistence, no auth)
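The same calls can be scripted from Python. The sketch below mirrors the curl examples for loading a model, saving activations from local files, and starting training, using only the standard library. The endpoint paths and payload fields are taken from the examples above; the function names, the base URL constant, and the assumption that responses are JSON are illustrative and not part of the package.

```python
import json
import urllib.request

# Assumption: the server runs locally on port 8000, as in the curl examples.
BASE_URL = "http://localhost:8000"


def post_json(path, payload, base_url=BASE_URL):
    """Build a POST request with a JSON body (like curl -X POST -d '{...}')."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def load_model(model_id):
    """Request for POST /models/load."""
    return post_json("/models/load", {"model_id": model_id})


def save_local_activations(model_id, layers, paths,
                           sample_limit=100, batch_size=4, shard_size=64):
    """Request for POST /sae/activations/save using a local-file dataset."""
    payload = {
        "model_id": model_id,
        "layers": list(layers),
        "dataset": {"type": "local", "paths": list(paths)},
        "sample_limit": sample_limit,
        "batch_size": batch_size,
        "shard_size": shard_size,
    }
    return post_json("/sae/activations/save", payload)


def train_sae(model_id, activations_path, layer,
              sae_class="TopKSae", epochs=1, batch_size=256):
    """Request for POST /sae/train; the server replies with a job_id."""
    payload = {
        "model_id": model_id,
        "activations_path": activations_path,
        "layer": layer,
        "sae_class": sae_class,
        "hyperparams": {"epochs": epochs, "batch_size": batch_size},
    }
    return post_json("/sae/train", payload)


def send(request):
    """Send a request and decode the body (assumption: the reply is JSON)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, send(load_model("bielik")) issues the same request as the first curl example. Building the request separately from sending it keeps the payloads easy to inspect and to test without a live server.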

Notes:

The job manager is lightweight and in-memory: jobs disappear when the process restarts, and idempotency is best-effort via a payload key.
Training and inference currently run in in-process threads; add your own resource guards when running heavy models.
Optional API key protection: set SERVER_API_KEY=<value> to require an X-API-Key header on protected endpoints (the delete endpoints).
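On the client side, the key only needs to be attached to the protected delete calls. A minimal sketch, assuming the client reads the same SERVER_API_KEY value from its own environment; delete_request is a hypothetical helper, and "my_sae" is a placeholder identifier.

```python
import os
import urllib.parse
import urllib.request

# Assumption: the server runs locally on port 8000, as in the curl examples.
BASE_URL = "http://localhost:8000"


def delete_request(kind, item_id, model_id, api_key=None, base_url=BASE_URL):
    """Build a DELETE request for an activation run ("activations") or an SAE ("saes")."""
    if kind not in ("activations", "saes"):
        raise ValueError("kind must be 'activations' or 'saes'")
    query = urllib.parse.urlencode({"model_id": model_id})
    # Percent-encode the identifier so unusual characters cannot break the path.
    url = f"{base_url}/sae/{kind}/{urllib.parse.quote(item_id, safe='')}?{query}"
    # Send X-API-Key only when a key is configured; unprotected servers ignore it.
    headers = {"X-API-Key": api_key} if api_key else {}
    return urllib.request.Request(url, headers=headers, method="DELETE")


# Build (but do not send) a delete call; pass it to urllib.request.urlopen to run it.
request = delete_request("saes", "my_sae", "bielik",
                         api_key=os.environ.get("SERVER_API_KEY"))
```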

Project details

Release history

- 1.0.0.post10: Feb 3, 2026
- 1.0.0.post9: Feb 1, 2026
- 1.0.0.post8: Jan 31, 2026
- 1.0.0.post7: Jan 31, 2026
- 1.0.0.post6: Jan 30, 2026
- 1.0.0.post5: Jan 30, 2026
- 1.0.0.post3: Jan 25, 2026
- 1.0.0.post2: Jan 25, 2026
- 1.0.0.post1: Jan 25, 2026
- 1.0.0: Jan 25, 2026
- 0.1.2: Jan 2, 2026
- 0.1.1.post17: Jan 2, 2026
- 0.1.1.post16: Jan 2, 2026
- 0.1.1.post15: Dec 30, 2025
- 0.1.1.post14: Dec 30, 2025
- 0.1.1.post13: Dec 30, 2025 (this version)
- 0.1.1.post12: Dec 30, 2025

Download files

Source Distribution

mi_crow-0.1.1.post13.tar.gz (349.8 kB)

Uploaded Dec 30, 2025 Source

Built Distribution

mi_crow-0.1.1.post13-py3-none-any.whl (90.8 kB)

Uploaded Dec 30, 2025 Python 3

File details

Details for the file mi_crow-0.1.1.post13.tar.gz.

File metadata

Download URL: mi_crow-0.1.1.post13.tar.gz
Upload date: Dec 30, 2025
Size: 349.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mi_crow-0.1.1.post13.tar.gz
Algorithm	Hash digest
SHA256	`c5b807cd2abf201228fa48bab26eb52510e865fe2d966a867e324caec3f8e83f`
MD5	`97eedec34ab5aac82c74b3fbb77ecce2`
BLAKE2b-256	`551fd30c1fa79fe4f0d993fb3146b17e5f371d68a48e99af95dff30ab8bd4359`

Provenance

The following attestation bundles were made for mi_crow-0.1.1.post13.tar.gz:

Publisher: publish.yml on AdamKaniasty/Mi-Crow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mi_crow-0.1.1.post13.tar.gz
- Subject digest: c5b807cd2abf201228fa48bab26eb52510e865fe2d966a867e324caec3f8e83f
- Sigstore transparency entry: 782524719
- Sigstore integration time: Dec 30, 2025
Source repository:
- Permalink: AdamKaniasty/Mi-Crow@78e525f4091b06098acd70826c3cebf38d2c5c52
- Branch / Tag: refs/heads/main
- Owner: https://github.com/AdamKaniasty
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@78e525f4091b06098acd70826c3cebf38d2c5c52
- Trigger Event: push

File details

Details for the file mi_crow-0.1.1.post13-py3-none-any.whl.

File metadata

Download URL: mi_crow-0.1.1.post13-py3-none-any.whl
Upload date: Dec 30, 2025
Size: 90.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mi_crow-0.1.1.post13-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d8f010eeb6e37743c2cf561c95553fcbfea2a33d98e46ea524a099bc61f8c31e`
MD5	`2bf8afb4796e53cd2ae002d5d5fdd883`
BLAKE2b-256	`c70b910e24f5aae0ac756487824a0338415021e1fbca1fc8670848aaef7d5ff4`

Provenance

The following attestation bundles were made for mi_crow-0.1.1.post13-py3-none-any.whl:

Publisher: publish.yml on AdamKaniasty/Mi-Crow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mi_crow-0.1.1.post13-py3-none-any.whl
- Subject digest: d8f010eeb6e37743c2cf561c95553fcbfea2a33d98e46ea524a099bc61f8c31e
- Sigstore transparency entry: 782524722
- Sigstore integration time: Dec 30, 2025
Source repository:
- Permalink: AdamKaniasty/Mi-Crow@78e525f4091b06098acd70826c3cebf38d2c5c52
- Branch / Tag: refs/heads/main
- Owner: https://github.com/AdamKaniasty
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@78e525f4091b06098acd70826c3cebf38d2c5c52
- Trigger Event: push
