
DataKrypto FHEnom for AI™ — Automated POC Head-to-Head Test Suite for encrypted vs. plaintext model validation


dk-test-suite — FHEnom for AI™ POC Test Suite

Automated head-to-head validation framework that collects quantitative, side-by-side evidence on whether an FHEnom-encrypted model is equivalent to its plaintext counterpart across six dimensions: performance, accuracy, scalability, security, serving, and training.

Test Categories

Category | IDs | Description
Performance | PERF-1 … PERF-5 | TTFT, throughput (tok/s), E2E latency, model footprint (disk), GPU VRAM usage
Accuracy | ACC-1 … ACC-5 | Exact match (deterministic equivalence), lm-eval-harness benchmarks, BERTScore semantic similarity, response-length t-test, perplexity
Scalability | SCALE-1 … SCALE-4 | Concurrent load (1–50 users), context length (512–8 192 tokens), batch processing (1–32), 24 h sustained operation
Security | SEC-1 … SEC-6 | Encryption at rest, runtime memory inspection, TLS transport, model binding, key isolation, log safety
Serving | SERV-1 … SERV-7 | File integrity, vLLM health, TEE status, inference coherence, encryption-reality bypass proof, output fidelity, FHE overhead
Training | T-1 … T-3 | Secure checkpoints (LoRA/MoE), convergence, post-training inference quality (Extended POC)
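The two simplest accuracy comparisons, ACC-1 (exact match) and ACC-4 (response-length t-test), can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the suite's implementation: the function names are hypothetical, response length is assumed to be a whitespace-split word count, and the test is assumed to be Welch's unequal-variance t-test. The p-value needs the t-distribution; `scipy.stats.ttest_ind(a, b, equal_var=False)` (SciPy is a core dependency) returns both the statistic and the p-value.

```python
import statistics
from math import sqrt


def exact_match_rate(clear_outputs, encrypted_outputs):
    """ACC-1 sketch: share of prompts whose clear and encrypted outputs are identical."""
    if not clear_outputs or len(clear_outputs) != len(encrypted_outputs):
        raise ValueError("need non-empty output lists aligned prompt-by-prompt")
    matches = sum(c == e for c, e in zip(clear_outputs, encrypted_outputs))
    return matches / len(clear_outputs)


def response_lengths(responses):
    """Assumed length measure: whitespace-separated word count."""
    return [len(r.split()) for r in responses]


def welch_t(a, b):
    """ACC-4 sketch: Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Needs at least two samples per group and non-zero combined variance.
    For the p-value, use scipy.stats.ttest_ind(a, b, equal_var=False).
    """
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    clear = ["Paris is the capital of France.", "2 + 2 = 4"]
    encrypted = ["Paris is the capital of France.", "2 + 2 = 4."]
    print(exact_match_rate(clear, encrypted))  # 0.5
```

With temperature 0 a byte-identical match is the strictest possible signal, which is why ACC-1 is listed as "deterministic equivalence"; the t-test is a looser check that the encrypted model does not systematically produce longer or shorter answers. For two equal-size groups with equal variances, the Welch degrees of freedom reduce to 2(n - 1).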

Quick Start — Running from the GPU Machine

This is the most common setup: the test suite runs directly on the GPU server, connecting out to the remote TEE machine via HTTPS.

1. Install on the GPU machine

cd POC_TEST_SUITE/dk_test_suite
python3 -m venv env
source env/bin/activate
pip install -e .

2. Edit config/default.yaml

Set local_mode: true — the runner automatically sets gpu_host to 127.0.0.1 for vLLM HTTP calls and uses local subprocesses for all Docker/shell commands. No SSH key needed:

local_mode: true                   # no SSH key; gpu_host auto-set to 127.0.0.1
tee_host: "<TEE_IP>"               # TEE machine (remote)
tee_user_token: "<your-token>"
model_name: "Llama-3.2-1B-Instruct"
encrypted_model_name: "Llama-3.2-1B-Instruct-encrypted"
encrypted_model_id: "<UUID from fhenomai model list>"
model_path_clear: "/home/<user>/models/Llama-3.2-1B-Instruct"
model_path_encrypted: "/home/<user>/models/Llama-3.2-1B-Instruct-encrypted"
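The local_mode rule described above can be expressed as a small resolver. This is a hypothetical sketch of the documented behavior, not code from dk-test-suite: the helper names are invented, an unset local_mode is assumed to mean false, and port 8000 (vLLM's default for its OpenAI-compatible server) is an assumption, since this page does not state the port the suite uses.

```python
# Hypothetical sketch of the documented local_mode behavior; not dk-test-suite code.
LOCALHOST = "127.0.0.1"
VLLM_PORT = 8000  # assumption: vLLM's default port; the suite's actual port is not documented here


def effective_gpu_host(cfg: dict) -> str:
    """Return the host used for vLLM HTTP calls."""
    if cfg.get("local_mode", False):  # assumption: unset local_mode means false
        # On the GPU machine itself, vLLM is reached over loopback; gpu_host is ignored.
        return LOCALHOST
    host = cfg.get("gpu_host")
    if not host:
        raise ValueError("gpu_host is required when local_mode is false")
    return host


def uses_ssh(cfg: dict) -> bool:
    """Docker/shell commands run as local subprocesses in local_mode, otherwise over SSH."""
    return not cfg.get("local_mode", False)


def vllm_base_url(cfg: dict, port: int = VLLM_PORT) -> str:
    return f"http://{effective_gpu_host(cfg)}:{port}"


if __name__ == "__main__":
    cfg = {"local_mode": True, "tee_host": "<TEE_IP>"}
    print(vllm_base_url(cfg))  # http://127.0.0.1:8000
    print(uses_ssh(cfg))       # False
```

Note that the TEE connection is unaffected by local_mode: tee_host still points at the remote machine, reached over HTTPS.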

3. Run

# Option A — vLLM containers already running (from serving_cli_tutorial.ipynb)
dk-test run --skip-clear-vllm --skip-encrypted-vllm

# Option B — full suite, runner manages vLLM container lifecycle
# Remove any stale dk_vllm_bench container left over from a previous run
docker rm -f dk_vllm_bench 2>/dev/null || true
dk-test run

# Option C — quick smoke test (serving + security only, ~5–10 min)
dk-test run -t serving -t security --skip-clear-vllm --skip-encrypted-vllm

# Option D — specific categories
dk-test run -t performance -t accuracy -t serving

# Override individual config values with CLI flags (see Configuration for the supported flags)
dk-test run --num-prompts 50 -v
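When scripting repeated runs (for example from CI), the command lines above can be assembled from the documented flags only (-t, --skip-clear-vllm, --skip-encrypted-vllm, --num-prompts, -v). The helper and its parameter names are hypothetical, and what -v does is not documented on this page.

```python
import shlex


def build_dk_test_args(categories=(), reuse_running_vllm=False, num_prompts=None, verbose=False):
    """Hypothetical helper: assemble a `dk-test run` argv from the flags documented above."""
    args = ["dk-test", "run"]
    for category in categories:
        args += ["-t", category]  # repeat -t once per category (Options C and D)
    if reuse_running_vllm:
        # vLLM containers already running (Option A): skip both container lifecycles
        args += ["--skip-clear-vllm", "--skip-encrypted-vllm"]
    if num_prompts is not None:
        args += ["--num-prompts", str(num_prompts)]
    if verbose:
        args.append("-v")  # assumed to mean verbose output
    return args


if __name__ == "__main__":
    smoke = build_dk_test_args(["serving", "security"], reuse_running_vllm=True)
    print(shlex.join(smoke))
    # dk-test run -t serving -t security --skip-clear-vllm --skip-encrypted-vllm
    # To execute: subprocess.run(smoke, check=True)
```

Passing the list to subprocess.run avoids shell quoting issues; shlex.join is used only to display the equivalent shell command.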

4. View the report

# The HTML report is in:
ls results/poc_report_*.html

# Copy to your local machine for viewing:
scp root@<GPU_IP>:~/POC_TEST_SUITE/dk_test_suite/results/poc_report_*.html ./
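Each run leaves a poc_report_*.html file in results/. A minimal sketch for picking the newest one, assuming the most recently modified file is the report you want (the helper name is hypothetical):

```python
from pathlib import Path
from typing import Optional


def latest_report(results_dir: str = "results") -> Optional[Path]:
    """Return the most recently modified poc_report_*.html, or None if none exists yet."""
    reports = list(Path(results_dir).glob("poc_report_*.html"))
    if not reports:
        return None
    return max(reports, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    report = latest_report()
    print(report if report is not None else "no report found in results/")
```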

Remote Runner Setup (Alternative)

If you run the suite from a separate machine (not the GPU server), leave local_mode unset (or set it to false) and instead provide the GPU's external IP, the SSH user, and the path to an SSH private key:

gpu_host: "34.162.1.145"
gpu_ssh_user: "root"
gpu_ssh_key: "/path/to/gpu.key"
tee_host: "<TEE_IP>"
tee_user_token: "<your-token>"

Configuration

All settings live in config/default.yaml. They can be overridden by the following sources, listed from highest to lowest precedence:

  1. CLI flags (--gpu-host, --tee-host, --model, --num-prompts, etc.)
  2. Custom YAML file (-c my_config.yaml)
  3. Environment variables (DK_GPU_HOST, DK_TEE_HOST, DK_MODEL_NAME, DK_OUTPUT_DIR)
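The precedence rules above amount to a layered dictionary merge. The sketch below is an illustration, not the suite's loader: the function name is hypothetical, the mapping of the documented DK_* variables to config keys (in particular output_dir for DK_OUTPUT_DIR) is assumed, and CLI flags left unset are assumed not to override anything. In practice the YAML layers would come from PyYAML's yaml.safe_load.

```python
import os

# Documented variables; the config keys they map to are assumptions (output_dir is hypothetical).
ENV_TO_KEY = {
    "DK_GPU_HOST": "gpu_host",
    "DK_TEE_HOST": "tee_host",
    "DK_MODEL_NAME": "model_name",
    "DK_OUTPUT_DIR": "output_dir",
}


def resolve_config(defaults, custom_yaml=None, env=None, cli=None):
    """Hypothetical merge: CLI flags > custom YAML > environment > config/default.yaml."""
    env = os.environ if env is None else env
    merged = dict(defaults)  # lowest precedence: config/default.yaml
    for var, key in ENV_TO_KEY.items():  # 3. environment variables
        if var in env:
            merged[key] = env[var]
    merged.update(custom_yaml or {})  # 2. custom YAML passed with -c
    # 1. CLI flags; flags the user did not pass (None) leave earlier values intact
    merged.update({k: v for k, v in (cli or {}).items() if v is not None})
    return merged


if __name__ == "__main__":
    cfg = resolve_config(
        {"gpu_host": "127.0.0.1", "num_prompts": 100},
        custom_yaml={"gpu_host": "10.0.0.7"},
        env={"DK_GPU_HOST": "10.0.0.8"},
        cli={"num_prompts": 50},
    )
    print(cfg)  # {'gpu_host': '10.0.0.7', 'num_prompts': 50}
```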

Key parameters

Key | Description
local_mode | Set true when running on the GPU machine: uses local subprocesses and auto-sets gpu_host to 127.0.0.1; no SSH key needed
gpu_host | IP of the GPU server (only used when local_mode is false)
gpu_ssh_key | Path to the SSH private key for the GPU (only used when local_mode is false)
tee_host | IP of the TEE machine
tee_user_token | Bearer token for TEE inference authentication
model_name | Hugging Face name of the clear (plaintext) model (e.g. Llama-3.2-1B-Instruct)
encrypted_model_name | Name of the encrypted model as registered in the TEE
encrypted_model_id | UUID from fhenomai model list --show-details
model_path_clear | Absolute path to the clear model weights on the GPU machine
model_path_encrypted | Absolute path to the encrypted model weights on the GPU machine
num_prompts | Benchmark prompt count (default: 100)
temperature | Inference temperature (default: 0, deterministic)
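Before a long run, a pre-flight check can catch unfilled placeholders and missing remote-mode keys. This hypothetical helper encodes the rules in the table (gpu_host and gpu_ssh_key only matter when local_mode is false; model paths must be absolute); treating every other listed key as mandatory is an assumption.

```python
import os

# Assumption: every key below is mandatory for a full head-to-head run.
ALWAYS_REQUIRED = [
    "tee_host", "tee_user_token", "model_name", "encrypted_model_name",
    "encrypted_model_id", "model_path_clear", "model_path_encrypted",
]
REMOTE_ONLY = ["gpu_host", "gpu_ssh_key"]  # consulted only when local_mode is false


def check_config(cfg: dict) -> list:
    """Hypothetical pre-flight check; returns problems as strings (empty list means OK)."""
    required = ALWAYS_REQUIRED + ([] if cfg.get("local_mode") else REMOTE_ONLY)
    problems = []
    for key in required:
        value = cfg.get(key)
        if value in (None, ""):
            problems.append(f"{key}: missing")
        elif isinstance(value, str) and "<" in value and ">" in value:
            # catches template values such as "<TEE_IP>" or "/home/<user>/models/..."
            problems.append(f"{key}: placeholder {value} not filled in")
    for key in ("model_path_clear", "model_path_encrypted"):
        path = cfg.get(key)
        if isinstance(path, str) and path and not os.path.isabs(path):
            problems.append(f"{key}: must be an absolute path")
    return problems
```

Loading config/default.yaml with yaml.safe_load and passing the resulting dict to check_config would, for example, flag the unedited <TEE_IP> and <your-token> placeholders from the sample configuration.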

Architecture

dk-test run
  │
  ├── Phase 1: Clear Model Tests
  │     ├── Start vLLM (clear model)
  │     ├── Run PERF / ACC / SCALE tests
  │     └── Stop vLLM
  │
  ├── Phase 2: Encrypted Model Tests  
  │     ├── Start vLLM (encrypted model)
  │     ├── Register with TEE
  │     ├── Run PERF / ACC / SCALE tests (via TEE)
  │     └── Stop vLLM
  │
  ├── Phase 3: Security Validation
  │     └── SEC-1 … SEC-6 (automated)
  │
  ├── Phase 4: Training Tests
  │     └── T-1 … T-3 (if enabled)
  │
  └── Generate HTML Report
        ├── Side-by-side comparison tables
        ├── Pass/fail per criterion
        └── Full configuration dump
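The phase structure above pairs every vLLM start with a stop. One way to guarantee the stop even when a test raises is a context manager, sketched below with hypothetical names. Real container commands are replaced by log entries, the -t category filter is not modeled, and the assumption that --skip-clear-vllm / --skip-encrypted-vllm leave already-running containers untouched (neither started nor stopped) is inferred from the Run options rather than documented.

```python
from contextlib import contextmanager


@contextmanager
def vllm_session(model: str, manage: bool, log: list):
    """Start vLLM for `model` on entry and always stop it on exit, unless manage is False."""
    if manage:
        log.append(f"start vLLM ({model})")  # a real runner would launch the container here
    try:
        yield
    finally:
        if manage:
            log.append(f"stop vLLM ({model})")  # runs even if the tests inside raised


def run_suite(skip_clear_vllm=False, skip_encrypted_vllm=False, training_enabled=False):
    """Hypothetical phase order mirroring the diagram; returns the steps taken."""
    log = []
    with vllm_session("clear", manage=not skip_clear_vllm, log=log):  # Phase 1
        log.append("PERF/ACC/SCALE (clear)")
    with vllm_session("encrypted", manage=not skip_encrypted_vllm, log=log):  # Phase 2
        log.append("register with TEE")
        log.append("PERF/ACC/SCALE (encrypted, via TEE)")
    log.append("SEC-1..SEC-6")  # Phase 3
    if training_enabled:
        log.append("T-1..T-3")  # Phase 4
    log.append("HTML report")
    return log
```

Keeping the start/stop logic in one place also makes it impossible for Phase 2 to begin while the clear-model container still holds GPU memory, which matters for the PERF-5 VRAM comparison.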

Pass Conditions

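PERF-*, ACC-*, SCALE-*, and T-* tests are measurement-only: they apply no threshold and never FAIL. They PASS (or SKIP when a prerequisite such as lm-eval or a training dataset is missing) and document the overhead. Hard pass/fail is applied only to SEC-* and SERV-* tests.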

ID | Pass condition
PERF-1 | Measurement: TTFT comparison (no threshold)
PERF-2 | Measurement: throughput comparison (no threshold)
PERF-3 | Measurement: E2E latency comparison (no threshold)
PERF-4 | Measurement: disk size ratio (no threshold)
PERF-5 | Measurement: VRAM delta (no threshold)
ACC-1 | Measurement: exact match rate between clear and encrypted outputs
ACC-2 | Measurement: lm-eval benchmark scores (SKIP if lm-eval is not installed)
ACC-3 | Measurement: BERTScore F1 semantic similarity
ACC-4 | Measurement: t-test p-value for the response-length distribution
ACC-5 | Measurement: perplexity (N/A for the encrypted model, because the TEE does not return logprobs)
SCALE-1–4 | Measurement: throughput / latency at each load level
SEC-1 | No plaintext weights or high-entropy data section on disk
SEC-2 | No plaintext weight patterns in process memory maps
SEC-3 | TEE endpoint uses HTTPS; no plaintext in network capture
SEC-4 | Encrypted model cannot be loaded outside FHEnom
SEC-5 | No key material in environment variables, Docker config, or accessible files
SEC-6 | No plaintext data in container or system logs
SERV-1 | Required model files present, total size ≥ 1.5 GB
SERV-2 | vLLM responds with HTTP 200 on /health
SERV-3 | Encrypted model visible in TEE /v1/models
SERV-4 | TEE outputs coherent English (space ratio ≥ 0.08)
SERV-5 | Direct bypass of the TEE produces garbled output (space ratio ≤ 0.04)
SERV-6 | SequenceMatcher similarity ≥ 0.70 vs. the clear model for every probe
SERV-7 | FHE encryption overhead < 10 ms and decryption overhead < 5 ms per token
T-1–3 | SKIP when no dataset_path is configured
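The SERV-4 to SERV-6 criteria are simple text heuristics. A sketch using the thresholds from the table and Python's difflib.SequenceMatcher follows; the definition of space ratio (ASCII spaces divided by total characters) and the helper names are assumptions, not taken from the suite.

```python
from difflib import SequenceMatcher

COHERENT_MIN_SPACE_RATIO = 0.08  # SERV-4
GARBLED_MAX_SPACE_RATIO = 0.04   # SERV-5
MIN_SIMILARITY = 0.70            # SERV-6


def space_ratio(text: str) -> float:
    """Assumed definition: ASCII spaces divided by total characters (0.0 for empty text)."""
    return text.count(" ") / len(text) if text else 0.0


def serv4_coherent(tee_output: str) -> bool:
    return space_ratio(tee_output) >= COHERENT_MIN_SPACE_RATIO


def serv5_bypass_garbled(bypass_output: str) -> bool:
    return space_ratio(bypass_output) <= GARBLED_MAX_SPACE_RATIO


def serv6_fidelity(clear_outputs, tee_outputs) -> bool:
    """Every probe must reach the threshold; a good average is not enough."""
    if len(clear_outputs) != len(tee_outputs):
        raise ValueError("outputs must be aligned probe-by-probe")
    return all(
        SequenceMatcher(None, clear, tee).ratio() >= MIN_SIMILARITY
        for clear, tee in zip(clear_outputs, tee_outputs)
    )
```

The space ratio works as a cheap coherence signal because English prose contains a space roughly every five to six characters, while output obtained by bypassing the TEE is expected to be garbled token noise with few spaces. The gap between 0.04 and 0.08 leaves a margin so that borderline output fails one test rather than passing both.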

Dependencies

Core: click, pyyaml, httpx, paramiko, rich, jinja2, numpy, scipy

Optional (accuracy): pip install "dk-test-suite[accuracy]" (the quotes stop shells such as zsh from expanding the brackets)

  • lm-eval (EleutherAI LM Evaluation Harness)
  • deepeval (Confident AI)
  • bert-score
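Since ACC-2 is skipped when lm-eval is missing, it can help to check which optional extras are importable before a run. A sketch using importlib.util.find_spec; the distribution-to-module mapping (lm-eval to lm_eval, bert-score to bert_score) reflects those packages' usual import names, and the helper is hypothetical.

```python
import importlib.util

# Distribution name -> import name for the optional [accuracy] extras
OPTIONAL_MODULES = {"lm-eval": "lm_eval", "deepeval": "deepeval", "bert-score": "bert_score"}


def optional_extras_status() -> dict:
    """Hypothetical helper: True for each optional extra that is importable here."""
    return {
        dist: importlib.util.find_spec(mod) is not None
        for dist, mod in OPTIONAL_MODULES.items()
    }


if __name__ == "__main__":
    for dist, ok in optional_extras_status().items():
        print(f"{dist}: {'installed' if ok else 'missing'}")
```

find_spec only locates the module without importing it, so the check stays fast even for heavy packages.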



Download files


Source Distribution

dk_test_suite-1.0.0.tar.gz (113.5 kB)

Uploaded Source

Built Distribution


dk_test_suite-1.0.0-py3-none-any.whl (89.3 kB)

Uploaded Python 3

File details

Details for the file dk_test_suite-1.0.0.tar.gz.

File metadata

  • Download URL: dk_test_suite-1.0.0.tar.gz
  • Upload date:
  • Size: 113.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for dk_test_suite-1.0.0.tar.gz
Algorithm Hash digest
SHA256 820d76c265a8add9703e6eebf37166a7473897c7d0efab3e5bc47e6dc11096fa
MD5 b3994be5265055de00c2e127c7b70d52
BLAKE2b-256 97f4ca3ffa65f941f27b5d9b3af3160c5b7ae969adceee5cfe55d564597ab7ba


File details

Details for the file dk_test_suite-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: dk_test_suite-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 89.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for dk_test_suite-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f5cbd6d0ecfefac44bf71277934cc49fdefba65739d51969e54df0e0bba305c0
MD5 ec4d08caf3511e613f2c0f0cf1524b4e
BLAKE2b-256 3c435ce8950eeee5560e963a41445539a75bc0cc524bb1c54c9bf3d629cce39e

