DataKrypto FHEnom for AI™ — Automated POC Head-to-Head Test Suite for encrypted vs. plaintext model validation
dk-test-suite — FHEnom for AI™ POC Test Suite
Automated head-to-head validation framework that provides quantitative, side-by-side evidence that a FHEnom-encrypted model is equivalent to its plaintext counterpart across performance, accuracy, scalability, security, serving, and training dimensions.
Test Categories
| Category | IDs | Description |
|---|---|---|
| Performance | PERF-1 … PERF-5 | TTFT, throughput (tok/s), E2E latency, model footprint (disk), GPU VRAM usage |
| Accuracy | ACC-1 … ACC-5 | Exact match (deterministic equivalence), lm-eval-harness benchmarks, BERTScore semantic similarity, response-length t-test, perplexity |
| Scalability | SCALE-1 … SCALE-4 | Concurrent load (1–50 users), context length (512–8,192 tokens), batch processing (1–32), 24 h sustained operation |
| Security | SEC-1 … SEC-6 | Encryption at rest, runtime memory inspection, TLS transport, model binding, key isolation, log safety |
| Serving | SERV-1 … SERV-7 | File integrity, vLLM health, TEE status, inference coherence, encryption-reality bypass proof, output fidelity, FHE overhead |
| Training | T-1 … T-3 | Secure checkpoints (LoRA/MoE), convergence, post-training inference quality (Extended POC) |
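As an illustration of the simplest check in the table above, ACC-1's deterministic equivalence reduces to comparing completions string-for-string at temperature 0. A minimal sketch (function name and structure are illustrative, not the suite's actual code):

```python
def exact_match_rate(clear_outputs: list[str], encrypted_outputs: list[str]) -> float:
    """Fraction of prompts whose clear and encrypted completions are byte-identical."""
    assert len(clear_outputs) == len(encrypted_outputs)
    matches = sum(c == e for c, e in zip(clear_outputs, encrypted_outputs))
    return matches / len(clear_outputs)

# With temperature 0, a faithful encrypted model should score 1.0 here:
rate = exact_match_rate(["Paris", "4"], ["Paris", "4"])
```
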
Quick Start — Running from the GPU Machine
This is the most common setup: the test suite runs directly on the GPU server, connecting out to the remote TEE machine via HTTPS.
1. Install on the GPU machine
cd POC_TEST_SUITE/dk_test_suite
python3 -m venv env
source env/bin/activate
pip install -e .
2. Edit config/default.yaml
Set local_mode: true — the runner automatically sets gpu_host to 127.0.0.1 for vLLM HTTP calls and uses local subprocesses for all Docker/shell commands. No SSH key needed:
local_mode: true # no SSH key; gpu_host auto-set to 127.0.0.1
tee_host: "<TEE_IP>" # TEE machine (remote)
tee_user_token: "<your-token>"
model_name: "Llama-3.2-1B-Instruct"
encrypted_model_name: "Llama-3.2-1B-Instruct-encrypted"
encrypted_model_id: "<UUID from fhenomai model list>"
model_path_clear: "/home/<user>/models/Llama-3.2-1B-Instruct"
model_path_encrypted: "/home/<user>/models/Llama-3.2-1B-Instruct-encrypted"
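The effect of local_mode on host resolution can be sketched as follows (an illustrative snippet only; the real logic lives inside the runner):

```python
def resolve_gpu_host(cfg: dict) -> str:
    # In local mode the suite runs on the GPU machine itself, so vLLM
    # HTTP calls go to loopback and no SSH key is required.
    if cfg.get("local_mode"):
        return "127.0.0.1"
    # Remote mode: use the configured external IP for SSH and HTTP.
    return cfg["gpu_host"]
```
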
3. Run
# Option A — vLLM containers already running (from serving_cli_tutorial.ipynb)
dk-test run --skip-clear-vllm --skip-encrypted-vllm
# Option B — full suite, runner manages vLLM container lifecycle
docker rm -f dk_vllm_bench 2>/dev/null || true
dk-test run
# Option C — quick smoke test (serving + security only, ~5–10 min)
dk-test run -t serving -t security --skip-clear-vllm --skip-encrypted-vllm
# Option D — specific categories
dk-test run -t performance -t accuracy -t serving
# Override any config value from CLI
dk-test run --num-prompts 50 -v
4. View the report
# The HTML report is in:
ls results/poc_report_*.html
# Copy to your local machine for viewing:
scp root@<GPU_IP>:~/POC_TEST_SUITE/dk_test_suite/results/poc_report_*.html ./
Remote Runner Setup (Alternative)
If you run the suite from a separate machine (not the GPU server), do not set
local_mode. Instead provide the GPU's external IP and an SSH key:
gpu_host: "34.162.1.145"
gpu_ssh_user: "root"
gpu_ssh_key: "/path/to/gpu.key"
tee_host: "<TEE_IP>"
tee_user_token: "<your-token>"
Configuration
All settings live in config/default.yaml. Override via (in order of precedence):
- CLI flags (--gpu-host, --tee-host, --model, --num-prompts, etc.)
- Custom YAML file (-c my_config.yaml)
- Environment variables (DK_GPU_HOST, DK_TEE_HOST, DK_MODEL_NAME, DK_OUTPUT_DIR)
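That precedence order can be expressed as a layered dict merge in which later layers win. A hypothetical sketch, not the suite's implementation (the DK_ prefix convention is taken from the environment variables listed above):

```python
import os

def effective_config(defaults: dict, yaml_overrides: dict, cli_overrides: dict) -> dict:
    # Environment variables use a DK_ prefix with upper-cased key names,
    # e.g. DK_GPU_HOST overrides gpu_host.
    env_overrides = {
        key: os.environ[f"DK_{key.upper()}"]
        for key in defaults
        if f"DK_{key.upper()}" in os.environ
    }
    # Later layers take priority: defaults < env vars < custom YAML < CLI flags.
    return {**defaults, **env_overrides, **yaml_overrides, **cli_overrides}
```
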
Key parameters
| Key | Description |
|---|---|
| local_mode | Set true when running on the GPU machine — uses subprocesses and auto-sets gpu_host to 127.0.0.1; no SSH key needed |
| gpu_host | IP of the GPU server (only used when local_mode is false) |
| gpu_ssh_key | Path to SSH private key for the GPU (only used when local_mode is false) |
| tee_host | IP of the TEE machine |
| tee_user_token | Bearer token for TEE inference auth |
| model_name | Clear model HF name (e.g. Llama-3.2-1B-Instruct) |
| encrypted_model_name | Name as registered in the TEE |
| encrypted_model_id | UUID from fhenomai model list --show-details |
| model_path_clear | Absolute path to clear model weights on GPU |
| model_path_encrypted | Absolute path to encrypted model weights on GPU |
| num_prompts | Benchmark prompt count (default: 100) |
| temperature | Inference temperature (default: 0, deterministic) |
Architecture
dk-test run
│
├── Phase 1: Clear Model Tests
│ ├── Start vLLM (clear model)
│ ├── Run PERF / ACC / SCALE tests
│ └── Stop vLLM
│
├── Phase 2: Encrypted Model Tests
│ ├── Start vLLM (encrypted model)
│ ├── Register with TEE
│ ├── Run PERF / ACC / SCALE tests (via TEE)
│ └── Stop vLLM
│
├── Phase 3: Security Validation
│ └── SEC-1 … SEC-6 (automated)
│
├── Phase 4: Training Tests
│ └── T-1 … T-3 (if enabled)
│
└── Generate HTML Report
├── Side-by-side comparison tables
├── Pass/fail per criterion
└── Full configuration dump
Pass Conditions
PERF-*, ACC-*, SCALE-*, and T-* tests are measurement-only — they always PASS and document overhead. Hard pass/fail is applied to SEC-* and SERV-* tests only.
| ID | Pass Condition |
|---|---|
| PERF-1 | Measurement — TTFT comparison (no threshold) |
| PERF-2 | Measurement — throughput comparison (no threshold) |
| PERF-3 | Measurement — E2E latency comparison (no threshold) |
| PERF-4 | Measurement — disk size ratio (no threshold) |
| PERF-5 | Measurement — VRAM delta (no threshold) |
| ACC-1 | Measurement — exact match rate between clear and encrypted outputs |
| ACC-2 | Measurement — lm-eval benchmark scores (SKIP if not installed) |
| ACC-3 | Measurement — BERTScore F1 semantic similarity |
| ACC-4 | Measurement — t-test p-value for response length distribution |
| ACC-5 | Measurement — perplexity (encrypted TEE does not return logprobs — N/A) |
| SCALE-1–4 | Measurement — throughput / latency at each load level |
| SEC-1 | No plaintext weights or high-entropy data section on disk |
| SEC-2 | No plaintext weight patterns in process memory maps |
| SEC-3 | TEE endpoint uses HTTPS; no plaintext in network capture |
| SEC-4 | Encrypted model cannot be loaded outside FHEnom |
| SEC-5 | No key material in env vars, docker config, or accessible files |
| SEC-6 | No plaintext data in container or system logs |
| SERV-1 | Required model files present, total size ≥ 1.5 GB |
| SERV-2 | vLLM responds HTTP 200 on /health |
| SERV-3 | Encrypted model visible in TEE /v1/models |
| SERV-4 | TEE outputs coherent English (space ratio ≥ 0.08) |
| SERV-5 | Direct bypass of TEE produces garbled output (space ratio ≤ 0.04) |
| SERV-6 | SequenceMatcher similarity ≥ 0.70 vs clear model for every probe |
| SERV-7 | FHE enc overhead < 10 ms and dec overhead < 5 ms per token |
| T-1–3 | SKIP when no dataset_path configured |
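The SERV-4–6 coherence heuristics above are plain text statistics. A hedged Python sketch of how such checks could be computed (function names are illustrative; SequenceMatcher is from the standard library, matching the SERV-6 description):

```python
from difflib import SequenceMatcher

def space_ratio(text: str) -> float:
    """Fraction of characters that are spaces; coherent English
    scores well above garbled ciphertext on this metric."""
    return text.count(" ") / max(len(text), 1)

def probe_similarity(clear_out: str, tee_out: str) -> float:
    """SequenceMatcher ratio used in the spirit of the SERV-6
    fidelity check (pass threshold 0.70 per probe)."""
    return SequenceMatcher(None, clear_out, tee_out).ratio()

coherent = space_ratio("The capital of France is Paris.") >= 0.08  # SERV-4-style check
garbled = space_ratio("x9f#kq0z@v") <= 0.04                        # SERV-5-style check
```
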
Dependencies
Core: click, pyyaml, httpx, paramiko, rich, jinja2, numpy, scipy
Optional (accuracy): pip install dk-test-suite[accuracy]
- lm-eval (EleutherAI LM Evaluation Harness)
- deepeval (Confident AI)
- bert-score