Causal-Multimodal Engine for Creative Performance Attribution

OmniProof

The engine that sees your creatives with Gemini Embedding 2, extracts their DNA automatically, and proves what causes performance.
Not vibes. Not correlations. Proof.

Installation · Quick Start · How It Works · Playground · Features · API · Contributing


OmniProof is an open-source Python engine that answers why creative assets perform differently. It replaces gut-feel marketing analytics with rigorous causal inference -- moving from "ads with blue backgrounds got more clicks" to "blue backgrounds cause a +12% CTR uplift for the 18-24 segment, controlling for platform, budget, and seasonality."

It combines Gemini Embedding 2 for native multimodal understanding, Double Machine Learning for causal estimation, and RAG-based brand compliance into a single, modular pipeline.

Highlights

  • Causal Engine -- DML plus refutation tests separate treatment effects from confounding, rather than reporting raw correlations.
  • Multimodal Embeddings -- Gemini Embedding 2 maps video, images, audio, PDFs, and text into a shared 3072-dim space.
  • Brand Intelligence -- Extract structured brand guidelines from any asset, then auto-check new creatives for compliance.
  • DICE-DML -- Disentangle visual confounders from treatment signals using counterfactual embedding pairs.
  • Creative Generation -- Causal insights feed directly into optimized creative prompts.
  • REST API -- 12 endpoints covering brand extraction, compliance, causal analysis, and generation.
  • Modular -- Use the full pipeline or any layer independently as a library.

Installation

pip install omni-proof

Or install from source:

git clone https://github.com/navidgh66/omni_proof.git
cd omni_proof
pip install -e ".[dev]"

Requires Python 3.11+. For the full pipeline you'll need a Gemini API key and a Pinecone account. The causal analysis layer works with local data only -- no API keys needed.

Quick Start

import asyncio
from pathlib import Path

import pandas as pd

from omni_proof import BrandExtractor, ComplianceChain, DMLEstimator, GeminiClient, Settings
from omni_proof.rag.brand_retriever import BrandRetriever
from omni_proof.storage.memory_store import InMemoryVectorStore

settings = Settings(gemini_api_key="AIza...", pinecone_api_key="pcsk_...",
                    pinecone_index_host="https://my-index.svc.pinecone.io")
client = GeminiClient(api_key=settings.gemini_api_key)
store = InMemoryVectorStore()


async def main():
    # 1. Extract brand identity from assets
    extractor = BrandExtractor(embedding_provider=client, gemini_client=client, vector_store=store)
    profile = await extractor.extract("AcmeCorp", [Path("brand_guide.pdf"), Path("logo.png")])

    # 2. Check a new creative for brand compliance
    retriever = BrandRetriever(gemini_client=client, vector_store=store)
    chain = ComplianceChain(gemini_client=client, brand_retriever=retriever)
    report = await chain.check_compliance("ad_001", Path("new_ad.jpg"))
    print(f"Compliant: {report.passed} (score: {report.score})")


# Steps 1 and 2 are awaited, so they need an event loop when run as a script
asyncio.run(main())

# 3. Estimate causal effect of a creative feature (no API keys needed)
data = pd.read_csv("campaign_data.csv")
estimator = DMLEstimator(cv=5, n_estimators=50)
ate = estimator.estimate_ate(data, "fast_pacing", "ctr", ["platform", "audience_segment", "budget"])
print(f"ATE: {ate.ate:+.3f} (p={ate.p_value:.4f})")
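Step 3 reads a campaign_data.csv that you have to supply. If you have no campaign export at hand, a toy file with the same column names can be generated with the standard library. This is only a sketch: the helper name, the platform and segment labels, and every coefficient are made up for illustration, and whether DMLEstimator expects categorical columns to be encoded beforehand is not stated here, so check that before passing string columns.

```python
import csv
import random


def write_synthetic_campaign_csv(path, n_rows=500, seed=0):
    """Write a toy campaign table; all values are invented for illustration."""
    rng = random.Random(seed)
    platforms = ["meta", "tiktok", "youtube"]  # hypothetical labels
    segments = ["18-24", "25-34", "35-44"]     # hypothetical labels
    fieldnames = ["fast_pacing", "ctr", "platform", "audience_segment", "budget"]
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        for _ in range(n_rows):
            budget = round(rng.uniform(100, 5000), 2)
            # Confounding: larger budgets are more likely to get fast-paced edits.
            fast_pacing = int(rng.random() < 0.3 + 0.4 * (budget / 5000))
            # Planted effect: fast pacing adds 0.5 to CTR; budget also raises CTR.
            ctr = 2.0 + 0.5 * fast_pacing + 0.0002 * budget + rng.gauss(0, 0.3)
            writer.writerow({
                "fast_pacing": fast_pacing,
                "ctr": round(ctr, 4),
                "platform": rng.choice(platforms),
                "audience_segment": rng.choice(segments),
                "budget": budget,
            })
    return fieldnames
```

Because the effect of fast_pacing is planted at 0.5, a reasonable estimator should recover something close to that value once budget is adjusted for.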

Or start the API server:

uvicorn omni_proof.api.app:create_app --factory --reload
curl localhost:8000/health  # {"status": "ok"}

How It Works

  Upload creatives (video, image, PDF, audio)
          |
          v
  +---------------------------+
  |  Gemini Embedding 2       |  3072-dim multimodal embeddings
  |  Gemini 3.1 Flash Lite    |  Structured feature extraction
  +------------+--------------+
               |
        +------+------+
        v             v
    Pinecone       SQL DB
    (vectors)      (metadata + outcomes)
        |             |
        +------+------+
               v
  +---------------------------+
  |  Causal Engine            |  DAG -> Identify -> DML -> Refute
  |  (DoWhy + EconML)         |  DICE-DML for visual embeddings
  +------------+--------------+
               |
        +------+-----------+
        v                  v
  Brand Compliance     Creative Generation
  (RAG retrieval)      (causal-informed prompts)

Configuration

Set environment variables with the OMNI_PROOF_ prefix, or pass them programmatically:

OMNI_PROOF_GEMINI_API_KEY=AIza...
OMNI_PROOF_PINECONE_API_KEY=pcsk_...
OMNI_PROOF_PINECONE_INDEX_HOST=https://my-index-abc123.svc.pinecone.io
OMNI_PROOF_DATABASE_URL=sqlite+aiosqlite:///./omni_proof.db  # default

| Variable | Required For | Where to Get |
| --- | --- | --- |
| OMNI_PROOF_GEMINI_API_KEY | Embeddings + extraction | Google AI Studio |
| OMNI_PROOF_PINECONE_API_KEY | Vector storage | Pinecone Console |
| OMNI_PROOF_PINECONE_INDEX_HOST | Vector storage | Pinecone Console |
| OMNI_PROOF_DATABASE_URL | Relational storage | PostgreSQL or SQLite URI |
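To make the prefix convention concrete, here is a minimal standard-library sketch of how prefixed variables can be collected and checked before a run. EnvConfig, load_env_config, and missing_for_full_pipeline are hypothetical names for illustration; they are not the library's Settings class, whose internals are not shown here.

```python
import os
from dataclasses import dataclass
from typing import Optional

PREFIX = "OMNI_PROOF_"
DEFAULT_DATABASE_URL = "sqlite+aiosqlite:///./omni_proof.db"


@dataclass(frozen=True)
class EnvConfig:
    """Hypothetical stand-in for illustration only."""
    gemini_api_key: Optional[str]
    pinecone_api_key: Optional[str]
    pinecone_index_host: Optional[str]
    database_url: str


def load_env_config(environ=None):
    """Read OMNI_PROOF_* variables from a mapping (defaults to os.environ)."""
    env = os.environ if environ is None else environ

    def get(name, default=None):
        return env.get(PREFIX + name.upper(), default)

    return EnvConfig(
        gemini_api_key=get("gemini_api_key"),
        pinecone_api_key=get("pinecone_api_key"),
        pinecone_index_host=get("pinecone_index_host"),
        database_url=get("database_url", DEFAULT_DATABASE_URL),
    )


def missing_for_full_pipeline(config):
    """List the variables the full pipeline needs that are still unset."""
    required = ["gemini_api_key", "pinecone_api_key", "pinecone_index_host"]
    return [PREFIX + name.upper() for name in required if not getattr(config, name)]
```

Calling missing_for_full_pipeline(load_env_config()) before starting the server gives an early, readable list of what still has to be exported.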

Architecture

OmniProof is organized into five layers. Each can be used independently:

| Layer | Module | Key Classes |
| --- | --- | --- |
| Ingestion | omni_proof.ingestion | GeminiClient, AssetPreprocessor, IngestPipeline |
| Storage | omni_proof.storage | PineconeVectorStore, InMemoryVectorStore, RelationalStore |
| Causal | omni_proof.causal | CausalDAGBuilder, DMLEstimator, CausalRefuter, VisualDMLEstimator |
| Orchestration | omni_proof.orchestration | ComplianceChain, InsightSynthesizer, BrandExtractor |
| API | omni_proof.api | FastAPI app, routes, GenerativePromptBuilder |

Key Abstractions

| Interface | Purpose | Implementations |
| --- | --- | --- |
| EmbeddingProvider | Generate embeddings from any content | GeminiClient |
| VectorStore | Store and search vectors | PineconeVectorStore, InMemoryVectorStore |
| Estimator | Estimate causal effects | DMLEstimator |
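The value of these abstractions is that any object with the right methods can be swapped in. The sketch below shows the idea for a vector store using a typing.Protocol and a tiny cosine-similarity implementation. The method names upsert and query, their signatures, and both class names are assumptions made for this example; the actual VectorStore interface may differ.

```python
import math
from typing import List, Protocol, Sequence, Tuple


class VectorStoreSketch(Protocol):
    """Hypothetical interface shape; method names are assumptions."""

    def upsert(self, item_id: str, vector: Sequence[float]) -> None: ...

    def query(self, vector: Sequence[float], top_k: int) -> List[Tuple[str, float]]: ...


def _cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyInMemoryStore:
    """Brute-force store that satisfies VectorStoreSketch structurally."""

    def __init__(self):
        self._vectors = {}

    def upsert(self, item_id, vector):
        self._vectors[item_id] = list(vector)

    def query(self, vector, top_k=3):
        scored = [(item_id, _cosine(vector, stored))
                  for item_id, stored in self._vectors.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]
```

Code written against the protocol does not care whether the store behind it is in memory or a hosted index, which is what makes the layers usable independently.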

Hands-On Playground

The examples/playground.ipynb notebook walks through every OmniProof capability interactively, from embedding creatives into Pinecone to causal analysis to DICE-DML on A/B video pairs.

| Walkthrough | What it shows | API keys? |
| --- | --- | --- |
| 0. Offline Demo | All 9 pipeline stages in ~4s | No |
| 1. Embed into Pinecone | Gemini Embedding 2 → Pinecone storage | Yes |
| 2. Extract Metadata | Structured creative metadata via Flash Lite | Yes |
| 3. Brand Identity | Multimodal brand profile extraction | Yes |
| 4. Brand Compliance | RAG: embed guidelines → retrieve → evaluate | Yes |
| 5. Causal Analysis | DAG → DML → ATE/CATE → refutation | No |
| 6. Embeddings + Causal | Merge Pinecone embeddings with campaign data | Yes |
| 7. DICE-DML | Counterfactual pairs → treatment fingerprint → visual ATE | Yes |
| 8. Design Brief | Causal insights → creative prompt generation | No |
| 9. API Server | FastAPI endpoints for all capabilities | Yes |

To run locally:

pip install -e ".[dev]"
jupyter notebook examples/playground.ipynb

Or try the offline demo without any API keys:

python examples/demo.py

What's in examples/

examples/
  creatives/
    runner_sunrise_*.png          # 10 image creatives (one per concept)
    trail_epic_*.png
    hiit_studio_*.png
    ...
    runner_sunrise_fast_pacing_A.mp4   # A/B video variants
    runner_sunrise_slow_pacing_B.mp4   #   (fast_pacing treatment)
    basketball_court_fast_pacing_A.mp4
    basketball_court_slow_pacing_B.mp4
  data/
    campaign_performance.csv      # 1,000 rows with planted causal effects
    brand_profile.json            # Velocity Sportswear brand profile
    brand_guidelines.json         # 12 brand rules for RAG
    compliance_samples.json       # 5 compliance reports (PASS/WARN/FAIL)
    creative_metadata_samples.json # 14 records (10 images + 4 videos)

API Reference

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /health | Health check |
| POST | /api/v1/brand/extract | Extract brand profile from uploaded assets |
| POST | /api/v1/brand/update/{id} | Update brand profile with new assets |
| GET | /api/v1/brand/profile/{id} | Retrieve a brand profile |
| POST | /api/v1/compliance/check | Check creative for brand compliance |
| GET | /api/v1/compliance/reports | Historical compliance reports |
| POST | /api/v1/causal/analyze | Trigger causal analysis |
| GET | /api/v1/causal/effects | List estimated causal effects |
| GET | /api/v1/causal/effects/{treatment} | CATE breakdown by segment |
| GET | /api/v1/insights/briefs | Design briefs from causal data |
| GET | /api/v1/insights/segments | Effects by audience segment |
| POST | /api/v1/generative/prompt | Generate optimized creative prompt |
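As a rough illustration of calling these endpoints from Python without extra dependencies, the sketch below builds requests with urllib. The base URL matches the local uvicorn default from the Quick Start. The JSON body sent to /api/v1/causal/analyze is a guess: the request schemas are not documented here, so the field names treatment and outcome are assumptions, as are the helper names build_request and send.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local uvicorn default, as in the Quick Start


def build_request(method, path, payload=None):
    """Build (but do not send) a JSON request for the given endpoint."""
    data = None
    headers = {"Accept": "application/json"}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


def send(request, timeout=10):
    """Send a request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


health = build_request("GET", "/health")
effects = build_request("GET", "/api/v1/causal/effects")
# Hypothetical body: these field names are assumptions, not a documented schema.
analyze = build_request("POST", "/api/v1/causal/analyze",
                        {"treatment": "fast_pacing", "outcome": "ctr"})
```

With the server running, send(health) should return the same payload as the curl health check. FastAPI apps normally serve an interactive schema at /docs, which is the place to confirm the real request bodies.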

Causal Methodology

OmniProof implements a four-stage causal pipeline:

  1. Model -- Build a DAG mapping treatments, outcomes, and confounders
  2. Identify -- Apply the backdoor criterion to find valid adjustment sets
  3. Estimate -- Double Machine Learning (Neyman orthogonalization) via EconML
  4. Refute -- Placebo tests, subset validation, and random confounder checks
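The Estimate and Refute stages can be illustrated on synthetic data with nothing but the standard library. The sketch below is a simplified partialling-out version of Double Machine Learning with two-fold cross-fitting: it uses plain linear fits as nuisance models and a single confounder, whereas the library delegates estimation to EconML, so treat it as a teaching aid rather than the library's implementation. All function names and the data-generating process are assumptions for this example; the true effect is planted at 2.0.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cross_fit_residuals(xs, targets, n_folds=2):
    """Residuals of targets after out-of-fold prediction from xs."""
    residuals = [0.0] * len(xs)
    for fold in range(n_folds):
        train = [i for i in range(len(xs)) if i % n_folds != fold]
        held_out = [i for i in range(len(xs)) if i % n_folds == fold]
        a, b = fit_line([xs[i] for i in train], [targets[i] for i in train])
        for i in held_out:
            residuals[i] = targets[i] - (a + b * xs[i])
    return residuals


def dml_effect(xs, treatment, outcome, n_folds=2):
    """Regress outcome residuals on treatment residuals (partialling out)."""
    t_res = cross_fit_residuals(xs, treatment, n_folds)
    y_res = cross_fit_residuals(xs, outcome, n_folds)
    return sum(t * y for t, y in zip(t_res, y_res)) / sum(t * t for t in t_res)


rng = random.Random(0)
n = 2000
confounder = [rng.gauss(0, 1) for _ in range(n)]
treatment = [0.8 * x + rng.gauss(0, 1) for x in confounder]
outcome = [2.0 * t + 1.5 * x + rng.gauss(0, 1)
           for t, x in zip(treatment, confounder)]

naive = fit_line(treatment, outcome)[1]          # biased upward by the confounder
adjusted = dml_effect(confounder, treatment, outcome)

# Placebo refutation: a shuffled treatment should show roughly zero effect.
placebo_treatment = treatment[:]
rng.shuffle(placebo_treatment)
placebo = dml_effect(confounder, placebo_treatment, outcome)

print(f"naive={naive:.2f} adjusted={adjusted:.2f} placebo={placebo:.2f}")
```

The naive slope absorbs part of the confounder's influence and overshoots, the adjusted estimate lands near the planted 2.0, and the placebo estimate sits near zero. A placebo estimate far from zero would be a warning that the model is picking up something other than the treatment.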

For visual embeddings where treatment and confounders are entangled, DICE-DML generates counterfactual pairs, isolates treatment fingerprints via vector subtraction, and applies orthogonal projection before estimation.
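The vector arithmetic behind that description can be sketched in a few lines. This is one plausible reading of the paragraph above, not the library's VisualDMLEstimator: the fingerprint is taken as the normalized mean difference between treated and control embeddings of each counterfactual pair, and the projection removes that direction from an embedding so the remainder can serve as confounder features. Function names and the three-dimensional toy vectors are assumptions for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(vector):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(dot(vector, vector))
    return [x / norm for x in vector] if norm else list(vector)


def treatment_fingerprint(pairs):
    """Mean (treated - control) difference over counterfactual pairs, unit length."""
    dim = len(pairs[0][0])
    mean_diff = [0.0] * dim
    for treated, control in pairs:
        for k in range(dim):
            mean_diff[k] += (treated[k] - control[k]) / len(pairs)
    return normalize(mean_diff)


def project_out(embedding, direction):
    """Remove the component of embedding that lies along a unit direction."""
    scale = dot(embedding, direction)
    return [e - scale * d for e, d in zip(embedding, direction)]


# Toy pairs: the two variants differ only along the first axis.
pairs = [
    ([1.0, 0.2, 0.5], [0.0, 0.2, 0.5]),
    ([0.9, -0.3, 0.1], [0.1, -0.3, 0.1]),
]
fingerprint = treatment_fingerprint(pairs)
residual = project_out([0.7, 0.4, -0.2], fingerprint)
```

After projection the residual is orthogonal to the fingerprint, so whatever the treatment contributes along that direction no longer leaks into the features used to adjust for confounding.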

Gemini Embedding 2

All modalities map to the same 3072-dimensional semantic space via Gemini Embedding 2:

| Modality | Limit |
| --- | --- |
| Text | 8,192 tokens |
| Images | 6 per request |
| Video | 80s (with audio) / 120s (without) |
| Audio | 80s |
| PDF | 1 document, 6 pages |
| Output | 3,072 dims (Matryoshka: truncate to 1536 / 768 / 128) |
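Matryoshka-style embeddings are trained so that a prefix of the vector is still a usable embedding. A minimal sketch of truncating on the client side follows. Renormalizing after the cut is a common precaution when vectors are compared by cosine similarity or dot product; whether the API already returns normalized vectors at reduced sizes is not stated here, and truncate_embedding is a hypothetical helper name.

```python
import math


def truncate_embedding(vector, dims):
    """Keep the first dims components and rescale the result to unit length."""
    if dims <= 0 or dims > len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = list(vector[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head


# Stand-in for a 3,072-dim embedding; real values would come from the model.
full = [0.01 * ((i % 7) - 3) for i in range(3072)]
compact = truncate_embedding(full, 768)
```

Smaller vectors cut storage and query cost in the vector database, at some loss of retrieval quality that is worth measuring on your own creatives.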

Tech Stack

| Component | Technology |
| --- | --- |
| Embeddings | Gemini Embedding 2 |
| Structured extraction | Gemini 3.1 Flash Lite |
| Vector DB | Pinecone Serverless |
| Relational DB | PostgreSQL / SQLite |
| Causal inference | DoWhy + EconML |
| API | FastAPI |
| ML models | LightGBM |
| Schemas | Pydantic v2 + SQLAlchemy 2.0 |

Testing

pytest tests/unit/ -v               # 150 unit tests
pytest tests/integration/ -v        # 53 integration tests
pytest tests/ -v                    # All 203 tests
ruff check src/ tests/              # Lint

Contributing

See CONTRIBUTING.md for development setup, testing, and PR guidelines.

License

MIT -- OmniProof Contributors
