CAMP: Cumulative Agentic Masking and Pruning - session-aware PII protection for LLM pipelines



CAMP tracks cumulative PII exposure across an entire conversation - not just a single message - and pseudonymizes the full history the moment risk crosses a configurable threshold. Real identities never leave your machine.



How it works

Every conversation turn, CAMP runs a four-step pipeline entirely on-device:

  1. Extract - detects PII locally using Microsoft Presidio and spaCy NER, plus custom regex recognizers for financial and corporate data
  2. Graph - updates a co-occurrence graph where nodes are entity types and edges form when types appear together across turns
  3. Score - computes a Cumulative PII Exposure (CPE) score using the formula below
  4. Decide - takes one of three actions per turn

CPE(t) = Σ w(v) × (1 + α × degree(v))

| Decision | Condition | Action |
|---|---|---|
| PASS | CPE below threshold | Send original text to LLM |
| PSEUDONYMIZE | CPE crossed threshold | Rewrite full conversation history with consistent synthetic identities |
| BLOCK | Hard-block entity detected | Redact immediately, regardless of CPE score |
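
The Graph and Score steps can be illustrated with a small, self-contained sketch. It is not CAMP's implementation. It assumes that v ranges over the entity types seen so far in the session, that w(v) is a per-type weight, and that degree(v) counts the distinct types v has co-occurred with. The `WEIGHTS` values, the `CooccurrenceGraph` class, and the rule that only types detected in the same turn are linked are all hypothetical.

```python
from itertools import combinations

# Hypothetical per-type weights; CAMP's real weights are not shown in this document.
WEIGHTS = {"PERSON": 0.6, "LOCATION": 0.4, "ORGANIZATION": 0.3}


class CooccurrenceGraph:
    """Undirected graph whose nodes are entity types (illustrative only)."""

    def __init__(self) -> None:
        self.neighbors: dict[str, set[str]] = {}

    def add_turn(self, entity_types: set[str]) -> None:
        # Register every detected type as a node, even if it has no edges yet.
        for entity_type in entity_types:
            self.neighbors.setdefault(entity_type, set())
        # Assumption: types detected in the same turn are linked pairwise.
        for a, b in combinations(sorted(entity_types), 2):
            self.neighbors[a].add(b)
            self.neighbors[b].add(a)

    def degree(self, entity_type: str) -> int:
        return len(self.neighbors.get(entity_type, ()))


def cpe_score(graph: CooccurrenceGraph, alpha: float = 0.3) -> float:
    """Sum w(v) * (1 + alpha * degree(v)) over every entity type seen so far."""
    return sum(
        WEIGHTS.get(v, 0.0) * (1 + alpha * graph.degree(v))
        for v in graph.neighbors
    )


graph = CooccurrenceGraph()
graph.add_turn({"PERSON"})              # one isolated node
graph.add_turn({"PERSON", "LOCATION"})  # adds a PERSON-LOCATION edge
print(round(cpe_score(graph), 2))
```

Under these assumptions, each new edge raises the contribution of both of its endpoints. This is how a larger alpha makes co-occurring entity types cost more than the same types seen in isolation.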

Hard-blocked types (always redacted): US_SSN, CREDIT_CARD, ACCOUNT_NUMBER
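
The Decide step can be sketched as a plain function. This is a hedged illustration rather than CAMP's code: the `decide` helper is hypothetical, and it assumes that a score exactly equal to the threshold counts as having crossed it.

```python
# Hard-blocked types as listed above.
HARD_BLOCKED = {"US_SSN", "CREDIT_CARD", "ACCOUNT_NUMBER"}


def decide(entity_types: set[str], cpe: float, threshold: float = 2.0) -> str:
    """Return "PASS", "PSEUDONYMIZE", or "BLOCK" for a single turn."""
    # A hard-blocked type wins regardless of the CPE score.
    if entity_types & HARD_BLOCKED:
        return "BLOCK"
    # Assumption: reaching the threshold exactly counts as crossing it.
    if cpe >= threshold:
        return "PSEUDONYMIZE"
    return "PASS"


print(decide({"PERSON"}, cpe=0.6))
print(decide({"US_SSN"}, cpe=0.6))
print(decide({"LOCATION"}, cpe=2.4))
```

Checking hard-blocked types first means a turn containing an SSN is redacted even while the session score is still low.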


Installation

Requirements: Python 3.11+

pip install campii

CAMP uses spaCy for named entity recognition. Download the required model after installation:

python -m spacy download en_core_web_lg

Optional extras

| Extra | Command | Adds |
|---|---|---|
| LangChain | pip install campii[langchain] | CAMPCallbackHandler, CAMPChain |
| Agent Framework | pip install campii[agent-framework] | CAMPAgentMiddleware |
| All integrations | pip install campii[all] | Everything above |

Quick start

from camp import CAMPMasker

masker = CAMPMasker(threshold=2.0, alpha=0.3)

conversation = [
    "Hi, I need help with my bank account.",
    "My name is Michael Torres.",
    "I bank with Chase, account ending in 4872.",
    "I live in Austin, Texas.",
    "My SSN is 512-34-7891.",
]

for i, text in enumerate(conversation):
    result = masker.process_turn(text, turn_index=i)
    print(f"Turn {i}  [{result.decision:13}]  CPE={result.cpe_score:.2f}  |  {result.sent_to_llm}")

# Restore real identities in the LLM response before showing to the user
llm_response = "I can help you with that, Michael."
clean = masker.demask_response(llm_response)

Example output:

Turn 0  [PASS         ]  CPE=0.00  |  Hi, I need help with my bank account.
Turn 1  [PASS         ]  CPE=0.60  |  My name is Michael Torres.
Turn 2  [BLOCK        ]  CPE=1.55  |  I bank with Chase, account ending in [BLOCKED].
Turn 3  [PASS         ]  CPE=1.60  |  I live in Austin, Texas.
Turn 4  [BLOCK        ]  CPE=2.60  |  My SSN is [BLOCKED].
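
None of the turns above reaches the PSEUDONYMIZE decision, so the following sketch shows what consistent masking and demasking could look like. It is an illustration under stated assumptions, not CAMP's implementation. The `PseudonymMap` class and its methods are hypothetical, and it uses numbered placeholders such as `<PERSON_1>` where CAMP generates synthetic identities.

```python
class PseudonymMap:
    """Maps real values to stable placeholders and back (hypothetical helper)."""

    def __init__(self) -> None:
        self._forward: dict[str, str] = {}
        self._reverse: dict[str, str] = {}

    def pseudonym_for(self, real: str, entity_type: str) -> str:
        # Reuse the existing placeholder so a value always maps the same way.
        if real not in self._forward:
            fake = f"<{entity_type}_{len(self._forward) + 1}>"
            self._forward[real] = fake
            self._reverse[fake] = real
        return self._forward[real]

    def mask(self, text: str, entities: dict[str, str]) -> str:
        # `entities` maps each detected real value to its entity type.
        for real, entity_type in entities.items():
            text = text.replace(real, self.pseudonym_for(real, entity_type))
        return text

    def demask(self, text: str) -> str:
        # Swap every known placeholder back to the real value.
        for fake, real in self._reverse.items():
            text = text.replace(fake, real)
        return text


pmap = PseudonymMap()
detected = {"Michael Torres": "PERSON", "Austin": "LOCATION"}
history = ["My name is Michael Torres.", "I live in Austin, Texas."]

# Rewrite the whole history with the same mapping, then restore a reply.
masked_history = [pmap.mask(turn, detected) for turn in history]
print(masked_history)
print(pmap.demask("Thanks, <PERSON_1>."))
```

Keeping one mapping for the whole session is what makes the rewritten history consistent: the same real value always becomes the same placeholder, so the LLM's reply can be mapped back reliably. Plain `str.replace` is used for brevity. A real implementation would replace by detected character spans to avoid substring collisions.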

Integrations

Integration 1 - Any LLM callable

CAMPSession wraps any function that accepts a string and returns a string. No framework dependency required.

from camp import CAMPSession
import openai

client = openai.OpenAI()

def my_llm(prompt: str) -> str:
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content

# Wrap once - protection is applied automatically on every call
session = CAMPSession.wrap(my_llm, threshold=2.0, alpha=0.3)

response = session.chat("My name is Sarah Johnson")
response = session.chat("I live in Denver, Colorado")
response = session.chat("My SSN is 512-34-7891")  # blocked, LLM is never called

print(f"CPE score : {session.cpe_score:.2f}")
print(f"Triggered : {session.triggered}")

Manual mode - manage the LLM call yourself:

result = session.process("My email is sarah@example.com")
raw    = my_llm(result.sent_to_llm)   # call LLM with masked text
clean  = session.demask(raw)           # restore real identity in the response

Integration 2 - LangChain

Requires pip install campii[langchain]

Option A - callback handler (attach to any existing chain or LLM):

from camp.integrations.langchain import CAMPCallbackHandler
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain

handler = CAMPCallbackHandler(threshold=2.0)
chain   = ConversationChain(llm=ChatOpenAI(model="gpt-4o"), callbacks=[handler])

chain.invoke({"input": "My name is Sarah Johnson"})
chain.invoke({"input": "I live in Denver, Colorado"})
chain.invoke({"input": "My SSN is 512-34-7891"})

print(f"CPE           : {handler.cpe_score:.2f}")
print(f"Last decision : {handler.last_result.decision}")

Option B - CAMPChain wrapper (one-liner setup):

from camp.integrations.langchain import CAMPChain

# `chain` is the ConversationChain built in Option A
protected = CAMPChain.from_runnable(chain, threshold=2.0)
result    = protected.invoke({"input": "My SSN is 512-34-7891"})

print(protected.handler.triggered)

Integration 3 - Microsoft Agent Framework

Requires pip install campii[agent-framework]

Class-based middleware (recommended - maintains session state across all runs):

from camp.integrations.agent_framework import CAMPAgentMiddleware
from agent_framework import Agent
from agent_framework.foundry import FoundryChatClient
from azure.identity.aio import AzureCliCredential
import asyncio

async def main():
    async with (
        AzureCliCredential() as credential,
        Agent(
            client=FoundryChatClient(credential=credential),
            name="SupportAgent",
            instructions="You are a helpful customer support assistant.",
            middleware=[CAMPAgentMiddleware(threshold=2.0, alpha=0.3)],
        ) as agent,
    ):
        await agent.run("My name is Sarah Johnson")
        await agent.run("I live in Denver, Colorado")
        await agent.run("My SSN is 512-34-7891")
        # ^ Blocked before reaching the agent; returns a safe refusal message

        camp = agent.middleware[0]
        print(f"CPE score  : {camp.cpe_score:.2f}")
        print(f"Triggered  : {camp.triggered}")
        print(f"Pseudonyms : {camp.pseudonym_map}")

asyncio.run(main())

Function-based factory (lightweight, per-run):

from camp.integrations.agent_framework import create_camp_middleware

camp = create_camp_middleware(threshold=1.5)

# Run inside an async function, with `agent` created as in the example above
result = await agent.run("My name is Sarah Johnson", middleware=[camp])

Configuration

Constructor parameters

| Parameter | Default | Description |
|---|---|---|
| threshold | 2.0 | CPE score at which pseudonymization triggers |
| alpha | 0.3 | Graph amplifier - controls how much entity co-occurrence raises the score |
| session_id | "default" | Session label used in the PII registry |
| redaction_map | None | Override default hard-block replacements |
| extra_patterns | None | Additional regex recognizers for domain-specific PII |

Risk bands

| CPE range | Band |
|---|---|
| 0.0 - 1.0 | LOW |
| 1.0 - 2.0 | MODERATE |
| 2.0 - 3.0 | HIGH |
| 3.0+ | CRITICAL |
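
The bands can be expressed as a small lookup. This is a sketch, not part of CAMP's API: `risk_band` is a hypothetical helper. The ranges above share their endpoints, so it assumes each boundary value belongs to the higher band.

```python
def risk_band(cpe: float) -> str:
    """Map a CPE score to its risk band."""
    # Assumption: a boundary value (1.0, 2.0, 3.0) falls into the higher band.
    if cpe < 1.0:
        return "LOW"
    if cpe < 2.0:
        return "MODERATE"
    if cpe < 3.0:
        return "HIGH"
    return "CRITICAL"


for score in (0.60, 1.55, 2.60, 3.00):
    print(f"{score:.2f} -> {risk_band(score)}")
```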

Custom recognizers

Pass domain-specific patterns at construction time:

masker = CAMPMasker(
    threshold=2.0,
    extra_patterns=[
        {"entity": "EMPLOYEE_ID", "pattern": r"\bEMP-\d{6}\b", "score": 0.9},
        {"entity": "PROJECT_CODE", "pattern": r"\bPRJ-[A-Z]{3}-\d{4}\b", "score": 0.85},
    ],
)
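
Before wiring patterns into a masker, it can help to check them on sample text with the standard `re` module. The sketch below does only that. It does not exercise CAMP or Presidio, and the `find_matches` helper and the sample string are hypothetical.

```python
import re

# Same pattern definitions as in the constructor call above.
EXTRA_PATTERNS = [
    {"entity": "EMPLOYEE_ID", "pattern": r"\bEMP-\d{6}\b", "score": 0.9},
    {"entity": "PROJECT_CODE", "pattern": r"\bPRJ-[A-Z]{3}-\d{4}\b", "score": 0.85},
]


def find_matches(text: str) -> list[tuple[str, str]]:
    """Return (entity, matched text) pairs for every pattern hit."""
    hits = []
    for spec in EXTRA_PATTERNS:
        for match in re.finditer(spec["pattern"], text):
            hits.append((spec["entity"], match.group()))
    return hits


sample = "Ticket from EMP-004217 about PRJ-ABC-2024; EMP-42 is too short."
print(find_matches(sample))
```

The `\b` word boundaries keep a pattern from matching inside a longer token, and `EMP-42` is skipped because `\d{6}` requires exactly six digits.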

Supported entity types

| Category | Entity types |
|---|---|
| Identity | Person name, Date of birth, SSN, Driver license, Ethnicity |
| Contact | Email address, Phone number, Location, IP address |
| Financial | Credit card, Account number, IBAN, SWIFT/BIC, Crypto wallet, Transaction ID, US ITIN |
| Employment | Salary, Age, Organization |
| Medical | Medical condition |
| Corporate | Financial amount, Financial metric, Internal projection, Confidential data |

Development

git clone https://github.com/aman-panjwani/camp
cd camp
pip install -e ".[dev]"
python -m spacy download en_core_web_lg

Run the test suite:

# Unit tests (no spaCy model required - Presidio is mocked)
pytest tests/ -v

# With coverage report
pytest tests/ --cov=camp --cov-report=term-missing

Lint and type-check:

ruff check src/ tests/
mypy src/

Research

CAMP is the reference implementation for the following paper:

@article{panjwani2026camp,
  title   = {CAMP: Cumulative Agentic Masking and Pruning for Session-Aware PII Protection in LLM Pipelines},
  author  = {Panjwani, Aman},
  journal = {arXiv preprint},
  year    = {2026}
}

License

MIT
