
CAMP: Cumulative Agentic Masking and Pruning - session-aware PII protection for LLM pipelines


CAMP tracks cumulative PII exposure across an entire conversation - not just a single message - and pseudonymizes the full history the moment risk crosses a configurable threshold. Detection and scoring run locally, and once the threshold is crossed only synthetic identities are sent to the LLM.



How it works

Every conversation turn, CAMP runs a four-step pipeline entirely on-device:

  1. Extract - detects PII locally using Microsoft Presidio and spaCy NER, plus custom regex recognizers for financial and corporate data
  2. Graph - updates a co-occurrence graph where nodes are entity types and edges form when types appear together across turns
  3. Score - computes a Cumulative PII Exposure (CPE) score using the formula below
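  4. Decide - takes one of three actions per turn (see the table below)

CPE(t) = Σ w(v) × (1 + α × degree(v))

The sum runs over the entity-type nodes v in the graph; w(v) is the weight of node v, degree(v) is its number of edges, and α is the alpha parameter described under Configuration.

Decision      Condition                   Action
PASS          CPE below threshold         Send original text to LLM
PSEUDONYMIZE  CPE crossed threshold       Rewrite full conversation history with consistent synthetic identities
BLOCK         Hard-block entity detected  Redact immediately, regardless of CPE score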

Hard-blocked types (always redacted): US_SSN, CREDIT_CARD, ACCOUNT_NUMBER
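The scoring and decision steps above can be sketched in a few lines of plain Python. This is an illustration only, not CAMP's internal code: the `ExposureTracker` class, the weight values, the rule that each newly detected type is linked to every type already seen, and the use of `>=` for "crossed" are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical per-type weights; CAMP's real weights are not listed in this README.
WEIGHTS = {"PERSON": 0.6, "LOCATION": 0.4, "US_SSN": 1.0}
DEFAULT_WEIGHT = 0.5
HARD_BLOCK = {"US_SSN", "CREDIT_CARD", "ACCOUNT_NUMBER"}


class ExposureTracker:
    """Toy co-occurrence graph over entity types with a CPE-style score."""

    def __init__(self, threshold=2.0, alpha=0.3):
        self.threshold = threshold
        self.alpha = alpha
        self.neighbors = defaultdict(set)  # entity type -> connected types

    def cpe(self):
        # CPE = sum over nodes v of w(v) * (1 + alpha * degree(v))
        return sum(
            WEIGHTS.get(v, DEFAULT_WEIGHT) * (1 + self.alpha * len(adj))
            for v, adj in self.neighbors.items()
        )

    def observe(self, entity_types):
        """Add one turn's entity types and return (decision, score)."""
        new = set(entity_types)
        known = set(self.neighbors)
        for a in new:
            self.neighbors[a]  # create the node even if it has no edges yet
            # Assumed edge rule: link to every other type seen so far.
            for b in (known | new) - {a}:
                self.neighbors[a].add(b)
                self.neighbors[b].add(a)
        score = self.cpe()
        if new & HARD_BLOCK:
            return "BLOCK", score
        if score >= self.threshold:
            return "PSEUDONYMIZE", score
        return "PASS", score
```

Calling `observe(["PERSON"])` and then `observe(["LOCATION"])` on a fresh tracker shows how the second turn raises the score of the first entity as well, because both nodes gain an edge.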


Installation

Requirements: Python 3.11+

pip install campii

CAMP uses spaCy for named entity recognition. Download the required model after installation:

python -m spacy download en_core_web_lg

Optional extras

Extra             Command                              Adds
LangChain         pip install campii[langchain]        CAMPCallbackHandler, CAMPChain
Agent Framework   pip install campii[agent-framework]  CAMPAgentMiddleware
All integrations  pip install campii[all]              Everything above

Quick start

from camp import CAMPMasker

masker = CAMPMasker(threshold=2.0, alpha=0.3)

conversation = [
    "Hi, I need help with my bank account.",
    "My name is Michael Torres.",
    "I bank with Chase, account ending in 4872.",
    "I live in Austin, Texas.",
    "My SSN is 512-34-7891.",
]

for i, text in enumerate(conversation):
    result = masker.process_turn(text, turn_index=i)
    print(f"Turn {i}  [{result.decision:13}]  CPE={result.cpe_score:.2f}  |  {result.sent_to_llm}")

# Restore real identities in the LLM response before showing to the user
llm_response = "I can help you with that, Michael."
clean = masker.demask_response(llm_response)

Example output:

Turn 0  [PASS         ]  CPE=0.00  |  Hi, I need help with my bank account.
Turn 1  [PASS         ]  CPE=0.60  |  My name is Michael Torres.
Turn 2  [BLOCK        ]  CPE=1.55  |  I bank with Chase, account ending in [BLOCKED].
Turn 3  [PASS         ]  CPE=1.60  |  I live in Austin, Texas.
Turn 4  [BLOCK        ]  CPE=2.60  |  My SSN is [BLOCKED].

Integrations

Integration 1 - Any LLM callable

CAMPSession wraps any function that accepts a string and returns a string. No framework dependency required.

from camp import CAMPSession
import openai

client = openai.OpenAI()

def my_llm(prompt: str) -> str:
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content

# Wrap once - protection is applied automatically on every call
session = CAMPSession.wrap(my_llm, threshold=2.0, alpha=0.3)

response = session.chat("My name is Sarah Johnson")
response = session.chat("I live in Denver, Colorado")
response = session.chat("My SSN is 512-34-7891")  # blocked, LLM is never called

print(f"CPE score : {session.cpe_score:.2f}")
print(f"Triggered : {session.triggered}")

Manual mode - manage the LLM call yourself:

result = session.process("My email is sarah@example.com")
raw    = my_llm(result.sent_to_llm)   # call LLM with masked text
clean  = session.demask(raw)           # restore real identity in the response
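To make the mask and demask round trip concrete, here is a minimal sketch of a consistent pseudonym map. It is not CAMP's implementation: the `PseudonymMap` class is hypothetical, it takes already-detected entities as input instead of running detection, and it uses simple placeholders such as `PERSON_1` where CAMP generates synthetic identities.

```python
class PseudonymMap:
    """Toy two-way mapping between real values and stable placeholders."""

    def __init__(self):
        self.real_to_fake = {}
        self.fake_to_real = {}

    def pseudonym_for(self, real, entity_type):
        # Reuse the same placeholder every time the same real value appears.
        if real not in self.real_to_fake:
            fake = f"{entity_type}_{len(self.real_to_fake) + 1}"
            self.real_to_fake[real] = fake
            self.fake_to_real[fake] = real
        return self.real_to_fake[real]

    def mask(self, text, entities):
        """Replace each (real_value, entity_type) pair found in text."""
        # Longest values first, so a full name is replaced before any part of it.
        for real, entity_type in sorted(entities, key=lambda e: -len(e[0])):
            if real in text:
                text = text.replace(real, self.pseudonym_for(real, entity_type))
        return text

    def demask(self, text):
        """Restore real values in a response that mentions placeholders."""
        # Longest placeholders first, so PERSON_10 is not mistaken for PERSON_1.
        for fake in sorted(self.fake_to_real, key=len, reverse=True):
            text = text.replace(fake, self.fake_to_real[fake])
        return text
```

Because the map is kept for the whole session, rewriting every earlier turn with `mask` yields the same placeholder for the same person throughout the history, and `demask` can reverse it in the LLM's reply.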

Integration 2 - LangChain

Requires pip install campii[langchain]

Option A - callback handler (attach to any existing chain or LLM):

from camp.integrations.langchain import CAMPCallbackHandler
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain

handler = CAMPCallbackHandler(threshold=2.0)
chain   = ConversationChain(llm=ChatOpenAI(model="gpt-4o"), callbacks=[handler])

chain.invoke({"input": "My name is Sarah Johnson"})
chain.invoke({"input": "I live in Denver, Colorado"})
chain.invoke({"input": "My SSN is 512-34-7891"})

print(f"CPE           : {handler.cpe_score:.2f}")
print(f"Last decision : {handler.last_result.decision}")

Option B - CAMPChain wrapper (one-liner setup):

from camp.integrations.langchain import CAMPChain

# `chain` is the ConversationChain built in Option A
protected = CAMPChain.from_runnable(chain, threshold=2.0)
result    = protected.invoke({"input": "My SSN is 512-34-7891"})

print(protected.handler.triggered)

Integration 3 - Microsoft Agent Framework

Requires pip install campii[agent-framework]

Class-based middleware (recommended - maintains session state across all runs):

from camp.integrations.agent_framework import CAMPAgentMiddleware
from agent_framework import Agent
from agent_framework.foundry import FoundryChatClient
from azure.identity.aio import AzureCliCredential
import asyncio

async def main():
    async with (
        AzureCliCredential() as credential,
        Agent(
            client=FoundryChatClient(credential=credential),
            name="SupportAgent",
            instructions="You are a helpful customer support assistant.",
            middleware=[CAMPAgentMiddleware(threshold=2.0, alpha=0.3)],
        ) as agent,
    ):
        await agent.run("My name is Sarah Johnson")
        await agent.run("I live in Denver, Colorado")
        await agent.run("My SSN is 512-34-7891")
        # ^ Blocked before reaching the agent; returns a safe refusal message

        camp = agent.middleware[0]
        print(f"CPE score  : {camp.cpe_score:.2f}")
        print(f"Triggered  : {camp.triggered}")
        print(f"Pseudonyms : {camp.pseudonym_map}")

asyncio.run(main())

Function-based factory (lightweight, per-run):

from camp.integrations.agent_framework import create_camp_middleware

# Run this inside an async function, with `agent` constructed as in the example above
camp   = create_camp_middleware(threshold=1.5)
result = await agent.run("My name is Sarah Johnson", middleware=[camp])

Configuration

Constructor parameters

Parameter        Default    Description
threshold        2.0        CPE score at which pseudonymization triggers
alpha            0.3        Graph amplifier - controls how much entity co-occurrence raises the score
session_id       "default"  Session label used in the PII registry
redaction_map    None       Override default hard-block replacements
custom_patterns  None       Additional regex recognizers for domain-specific PII

Risk bands

CPE range   Band
0.0 - 1.0   LOW
1.0 - 2.0   MODERATE
2.0 - 3.0   HIGH
3.0+        CRITICAL
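The bands can be expressed as a small lookup. The `risk_band` helper below is hypothetical and not part of the package. The table does not say which band a boundary value such as 1.0 belongs to, so this sketch assumes each band includes its lower bound.

```python
def risk_band(cpe: float) -> str:
    """Map a CPE score to a band name (lower bound inclusive, by assumption)."""
    if cpe < 0:
        raise ValueError("CPE score cannot be negative")
    if cpe < 1.0:
        return "LOW"
    if cpe < 2.0:
        return "MODERATE"
    if cpe < 3.0:
        return "HIGH"
    return "CRITICAL"
```

With the default threshold of 2.0, this reading places the pseudonymization trigger at the start of the HIGH band.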

Custom recognizers

Pass domain-specific patterns at construction time:

masker = CAMPMasker(
    threshold=2.0,
    custom_patterns=[
        {"entity": "EMPLOYEE_ID", "pattern": r"\bEMP-\d{6}\b", "score": 0.9},
        {"entity": "PROJECT_CODE", "pattern": r"\bPRJ-[A-Z]{3}-\d{4}\b", "score": 0.85},
    ],
)
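Before handing patterns to the masker, it can help to check them against sample text with the standard `re` module alone. The `find_matches` helper and the sample sentence are made up for this sketch, and the snippet does not involve CAMP or Presidio at all.

```python
import re

custom_patterns = [
    {"entity": "EMPLOYEE_ID", "pattern": r"\bEMP-\d{6}\b", "score": 0.9},
    {"entity": "PROJECT_CODE", "pattern": r"\bPRJ-[A-Z]{3}-\d{4}\b", "score": 0.85},
]


def find_matches(text, patterns):
    """Return (entity, matched_text) pairs for every pattern hit in text."""
    hits = []
    for spec in patterns:
        for match in re.finditer(spec["pattern"], text):
            hits.append((spec["entity"], match.group()))
    return hits


sample = "Ticket from EMP-004217 about PRJ-ABC-2024; EMP-12 is too short."
matches = find_matches(sample, custom_patterns)
```

Here `matches` holds one `EMPLOYEE_ID` hit and one `PROJECT_CODE` hit; `EMP-12` is skipped because the pattern requires exactly six digits.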

Supported entity types

Category    Entity types
Identity    Person name, Date of birth, SSN, Driver license, Ethnicity
Contact     Email address, Phone number, Location, IP address
Financial   Credit card, Account number, IBAN, SWIFT/BIC, Crypto wallet, Transaction ID, US ITIN
Employment  Salary, Age, Organization
Medical     Medical condition
Corporate   Financial amount, Financial metric, Internal projection, Confidential data

Development

git clone https://github.com/aman-panjwani/camp
cd camp
pip install -e ".[dev]"
python -m spacy download en_core_web_lg

Run the test suite:

# Unit tests (no spaCy model required - Presidio is mocked)
pytest tests/ -v

# With coverage report
pytest tests/ --cov=camp --cov-report=term-missing

Lint and type-check:

ruff check src/ tests/
mypy src/

Research

CAMP is the reference implementation for the following paper:

@article{panjwani2026camp,
  title   = {CAMP: Cumulative Agentic Masking and Pruning for Session-Aware PII Protection in LLM Pipelines},
  author  = {Panjwani, Aman},
  journal = {arXiv preprint},
  year    = {2026}
}

License

MIT
