Skip to main content

Privacy-preserving proxy for contributing local AI conversations to an open training commons

Project description

Common Parlance

A privacy-preserving tool for contributing your AI conversations to an open research dataset on HuggingFace.

What it does

Import conversations you already have, or capture new ones through a local proxy. Everything is scrubbed for PII on your machine, reviewed by you, then uploaded to the Common Parlance dataset.

[Your conversation exports]    [AI Client] → [Proxy :11435] → [Local Model]
            ↓                                      ↓
    common-parlance import               Automatic capture
                        ↘                ↙
                    Local SQLite database
                            ↓
                    PII scrubbing (local)
                            ↓
                    Your review & approval
                            ↓
                    Upload → Server NER
                            ↓
                    Published to dataset

What scrubbed data looks like

Before: My friend Alice Smith at alice@gmail.com helped me set up
        the server at 192.168.1.100 in /Users/john/projects/

After:  My friend [NAME_1] at [EMAIL] helped me set up
        the server at [IP] in [PATH]

Works with

Import from: ChatGPT exports, Claude exports, Open WebUI, Jan.ai, SillyTavern, oobabooga, OpenAI messages JSONL, ShareGPT format

Proxy captures from: Ollama, llama.cpp, vLLM, LM Studio, LocalAI, koboldcpp, or any OpenAI-compatible local endpoint

Quick Start

Requires Python 3.11+.

1. Install

# With uv (recommended)
uv tool install common-parlance

# Or with pipx
pipx install common-parlance

# Or with pip
pip install --user common-parlance

This installs the common-parlance command on your PATH.

2. Register

common-parlance register

This opens your browser for a Cloudflare Turnstile verification (no account or email needed), then saves an anonymous API key to your local config.

3. Consent

common-parlance consent --grant

Read and agree to the contribution terms. You can revoke anytime with common-parlance consent --revoke.

Note: Revoking consent also purges all local conversation data.

4. Contribute conversations

Option A — Import existing conversations:

common-parlance import ~/Downloads/chatgpt-export.zip
common-parlance import conversations.jsonl
common-parlance import ~/jan/threads/
common-parlance import ~/.open-webui/data/webui.db

Format is auto-detected. Use --dry-run to preview without importing.

Option B — Capture live conversations via proxy:

common-parlance proxy

# Or run in the background
nohup common-parlance proxy > /dev/null 2>&1 &

# Or install as a service that starts on login
common-parlance startup --enable

Point your AI client at http://localhost:11435 instead of the usual model URL.

Connecting your client:

Client How to connect
Open WebUI Settings → Connections → change 11434 to 11435
Any OpenAI-compatible app Set base URL to http://localhost:11435/v1

Note: The proxy sits between your chat client and Ollama. For clients with configurable URLs (like Open WebUI), just change the port. For ollama run, use transparent mode — move Ollama to a different port and let the proxy take the default:

OLLAMA_HOST=127.0.0.1:11436 open -a Ollama   # macOS (or set in systemd on Linux)
common-parlance proxy --port 11434 --upstream http://localhost:11436

5. Process, review, upload

common-parlance process    # scrub PII, run audit
common-parlance review     # approve/reject/edit each conversation
common-parlance upload     # send approved conversations to the dataset
common-parlance status     # check pipeline counts

During review you can manually redact additional text by pressing e (edit) — selected text is replaced with [REDACTED].

If you're using the proxy, background uploads run automatically every 24 hours for approved conversations.

Privacy

  • Opt-in only: nothing is captured or uploaded without your explicit consent
  • Scrubbed locally: PII removal happens on your machine before anything leaves
  • Server-side NER: a second pass catches names and locations that regex misses (Presidio + spaCy)
  • Anonymous: no user ID, device fingerprint, or metadata in the published dataset
  • Inspectable: your local data is a SQLite database you can query directly (sqlite3 ~/.local/share/common-parlance/conversations.db)

What gets uploaded

  • Human and assistant conversation turns only
  • PII replaced with typed placeholders ([NAME_1], [EMAIL], [PHONE], etc.)

What gets stripped

  • Model names and engine metadata
  • System prompts
  • Token counts, timing, performance data
  • IP addresses, user agents, all client metadata

Configuration

Settings are stored at ~/.config/common-parlance/config.json.

common-parlance config                          # view all
common-parlance config upstream http://myhost:8080  # set a value
Key Default Description
upstream http://localhost:11434 Your local model endpoint
port 11435 Port the proxy listens on
auto_approve false Skip review, auto-approve all scrubbed conversations
upload_interval_hours 24 How often background uploads run

Advanced

Local NER (optional)

Server-side NER handles name detection before publishing. For an extra local scrubbing layer:

uv pip install presidio-analyzer presidio-anonymizer spacy
python -m spacy download en_core_web_lg

The first time you run process, it will ask if you'd like to set this up.

Watch mode

# Re-scan a directory for new exports every 60 minutes
common-parlance import ~/Downloads/ --watch 60

# Install as a system service that survives restarts
common-parlance import ~/Downloads/ --watch 60 --daemon

Auto-start on login

common-parlance startup --enable   # launchd (macOS), systemd (Linux)
common-parlance startup --disable

License

The code is licensed under Apache-2.0.

The dataset is licensed under ODC-BY 1.0 (Open Data Commons Attribution) — use the data freely for any purpose with attribution. See COVENANT.md for the community request to keep model weights open.

Contributing

See CONTRIBUTING.md for development setup and guidelines.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

common_parlance-0.1.0.tar.gz (278.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

common_parlance-0.1.0-py3-none-any.whl (74.2 kB view details)

Uploaded Python 3

File details

Details for the file common_parlance-0.1.0.tar.gz.

File metadata

  • Download URL: common_parlance-0.1.0.tar.gz
  • Upload date:
  • Size: 278.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for common_parlance-0.1.0.tar.gz
Algorithm Hash digest
SHA256 d1ea206ccec665b6ea34d93e9d4511e2e83f2528bf5929278b15f8ab553d331d
MD5 13bdea75506f7848813b744260288481
BLAKE2b-256 882f67c90a5e0badedd936e9916fa0ad293597d19aef9a006f500962f86200aa

See more details on using hashes here.

Provenance

The following attestation bundles were made for common_parlance-0.1.0.tar.gz:

Publisher: ci.yml on common-parlance/common-parlance

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file common_parlance-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for common_parlance-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e2cc2269669085219488dd4d82938e11c2c7f9dd5c70ab43c5276fa03cb4c258
MD5 c18216b641b462b012bb0e48c20e38a3
BLAKE2b-256 09cd5900cb6bed7bbbd0f39e4afecf803554cfede43ac09572dca46f80e18114

See more details on using hashes here.

Provenance

The following attestation bundles were made for common_parlance-0.1.0-py3-none-any.whl:

Publisher: ci.yml on common-parlance/common-parlance

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page