Metadata encoding and extraction for AI-generated content

EncypherAI Core

A Python package for embedding and extracting metadata in text using Unicode variation selectors without affecting readability.

Overview

EncypherAI Core provides tools for invisibly encoding metadata (such as model information, timestamps, and custom data) into text generated by AI models. This enables:

Provenance tracking: Identify which AI model generated a piece of text
Timestamp verification: Know when text was generated
Custom metadata: Embed any additional information you need
Streaming support: Works with both streaming and non-streaming LLM outputs

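The encoding uses Unicode variation selectors (U+FE00 to U+FE0F and U+E0100 to U+E01EF). These code points normally select an alternative glyph for the character they follow, and most renderers display nothing for them when no such variant is defined, so the visible text is unchanged.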
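
To make the idea concrete, here is a minimal, standard-library-only sketch of how payload bytes could be carried by variation selectors. The byte-to-selector mapping and the helper names (byte_to_selector, selector_to_byte, hide, reveal) are assumptions made for illustration; they are not taken from EncypherAI's implementation, which may lay out its payload differently.

```python
from typing import Optional

# Sketch only: one payload byte maps to one variation selector.
# This mapping is an assumption, not EncypherAI's confirmed scheme.
VS_BASE = 0xFE00               # VS1..VS16   (U+FE00..U+FE0F)
VS_SUPPLEMENT_BASE = 0xE0100   # VS17..VS256 (U+E0100..U+E01EF)


def byte_to_selector(value: int) -> str:
    """Return the variation selector that stands for one byte (0-255)."""
    if not 0 <= value <= 255:
        raise ValueError("value must fit in one byte")
    if value < 16:
        return chr(VS_BASE + value)
    return chr(VS_SUPPLEMENT_BASE + value - 16)


def selector_to_byte(char: str) -> Optional[int]:
    """Return the byte a selector stands for, or None for ordinary characters."""
    code = ord(char)
    if VS_BASE <= code <= VS_BASE + 15:
        return code - VS_BASE
    if VS_SUPPLEMENT_BASE <= code <= VS_SUPPLEMENT_BASE + 239:
        return code - VS_SUPPLEMENT_BASE + 16
    return None


def hide(carrier: str, payload: bytes) -> str:
    """Append the payload, as selectors, right after the carrier character."""
    return carrier + "".join(byte_to_selector(b) for b in payload)


def reveal(text: str) -> bytes:
    """Collect every selector found in the text back into bytes."""
    values = (selector_to_byte(ch) for ch in text)
    return bytes(v for v in values if v is not None)


hidden = hide(" ", b"gpt-4")                     # one space followed by five selectors
recovered = reveal("Hello" + hidden + "world")   # ordinary letters are ignored
print(recovered)
```

The 16 + 240 selectors cover exactly the 256 possible byte values, which is why a one-selector-per-byte mapping is a natural choice for a sketch like this.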

Installation

uv pip install encypher-ai

Quick Start

Basic Encoding and Decoding

import time

from encypher.core.unicode_metadata import UnicodeMetadata

# Encode metadata into text
encoded_text = UnicodeMetadata.embed_metadata(
    text="This is a sample text generated by an AI model.",
    model_id="gpt-4",
    timestamp=int(time.time()),  # Current Unix timestamp
    target="whitespace"  # Embed in whitespace characters
)

# Extract metadata from encoded text
metadata = UnicodeMetadata.extract_metadata(encoded_text)
print(f"Model: {metadata.get('model_id')}")
print(f"Timestamp: {metadata.get('timestamp')}")
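
The timestamp in the example above is a Unix timestamp (seconds since 1970-01-01 UTC). For display or logging it is usually easier to read as an ISO 8601 string. The helper below is a small sketch using only the standard library; format_timestamp is a hypothetical name, not part of the package.

```python
from datetime import datetime, timezone


def format_timestamp(unix_seconds: int) -> str:
    """Render a Unix timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc).isoformat()


# 1742601600 is midnight UTC on 2025-03-22.
example = format_timestamp(1742601600)
print(example)  # 2025-03-22T00:00:00+00:00
```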

Using MetadataEncoder (Alternative Method)

import time

from encypher.core.metadata_encoder import MetadataEncoder

# Initialize encoder with optional HMAC secret key
encoder = MetadataEncoder(secret_key="your-secret-key")

# Encode metadata
metadata = {
    "model_id": "gpt-4",
    "timestamp": int(time.time()),  # Current Unix timestamp
    "custom_field": "custom value"
}
encoded_text = encoder.encode_metadata(
    text="This is a sample text generated by an AI model.",
    metadata=metadata
)

# Decode and verify metadata
is_valid, extracted_metadata, clean_text = encoder.verify_text(encoded_text)
if is_valid:
    print(f"Model: {extracted_metadata.get('model_id')}")
    print(f"Timestamp: {extracted_metadata.get('timestamp')}")
    print(f"Custom field: {extracted_metadata.get('custom_field')}")

Streaming Support

from encypher.streaming.handlers import StreamingHandler

# Initialize streaming handler
handler = StreamingHandler(
    metadata={
        "model_id": "gpt-4",
        "custom_field": "custom value"
    },
    target="whitespace",
    encode_first_chunk_only=True  # Only encode the first non-empty chunk
)

# Process streaming chunks
chunks = [
    "This is ",
    "a sample ",
    "text generated ",
    "by an AI model."
]

for chunk in chunks:
    processed_chunk = handler.process_chunk(chunk)
    print(processed_chunk)  # Use in your streaming response
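
The effect of encode_first_chunk_only=True can be pictured with a small generator that transforms the first non-empty chunk and passes every other chunk through untouched. This is a sketch of the idea only: embed_once and mark are hypothetical helpers, and the real StreamingHandler may buffer or select chunks differently.

```python
from typing import Callable, Iterable, Iterator, List


def embed_once(chunks: Iterable[str], embed: Callable[[str], str]) -> Iterator[str]:
    """Apply `embed` to the first non-empty chunk; yield all others unchanged."""
    done = False
    for chunk in chunks:
        if not done and chunk:
            done = True
            yield embed(chunk)
        else:
            yield chunk


def mark(chunk: str) -> str:
    """Stand-in for real embedding: append one variation selector (U+FE00)."""
    return chunk + "\ufe00"


out: List[str] = list(embed_once(["", "This is ", "a sample ", "text."], mark))
print([len(chunk) for chunk in out])  # only the second chunk grows by one character
```

Embedding once keeps the per-chunk overhead at zero for the rest of the stream, at the cost of losing the metadata if the first chunk is dropped or trimmed downstream.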

Configuration

from encypher.config.settings import Settings

# Load settings from environment variables and/or config file
settings = Settings(
    config_file="config.json",  # Optional
    env_prefix="ENCYPHER_"  # Environment variable prefix
)

# Get configuration values
metadata_target = settings.get_metadata_target()
hmac_secret_key = settings.get_hmac_secret_key()
encode_first_chunk_only = settings.get_encode_first_chunk_only()
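
The layering that such a settings object performs can be sketched with the standard library alone. Everything here is an assumption for illustration: the load_settings helper, the rule that environment variables override the config file, the lower-cased key names, and the "whitespace" default (taken from the target list in this README) are not confirmed details of the Settings class.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict, Mapping, Optional


def load_settings(
    config_file: Optional[str] = None,
    env_prefix: str = "ENCYPHER_",
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Merge defaults, an optional JSON file, and prefixed environment variables."""
    env = os.environ if environ is None else environ
    settings: Dict[str, Any] = {"metadata_target": "whitespace"}  # assumed default

    # Values from the JSON file override the defaults.
    if config_file and Path(config_file).is_file():
        settings.update(json.loads(Path(config_file).read_text(encoding="utf-8")))

    # Prefixed environment variables override the file (assumed precedence),
    # for example ENCYPHER_SECRET_KEY becomes the key "secret_key".
    for name, value in env.items():
        if name.startswith(env_prefix):
            settings[name[len(env_prefix):].lower()] = value
    return settings


demo = load_settings(environ={"ENCYPHER_SECRET_KEY": "s3cret", "PATH": "/usr/bin"})
print(demo)
```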

Including Custom Metadata

import time

from encypher.core.unicode_metadata import UnicodeMetadata

# Include custom metadata along with required fields
encoded_text = UnicodeMetadata.embed_metadata(
    text="This is a sample text generated by an AI model.",
    model_id="gpt-4",
    timestamp=int(time.time()),  # Current Unix timestamp
    custom_metadata={
        "user_id": "user123",
        "session_id": "abc456",
        "context": {
            "source": "knowledge_base",
            "reference_id": "doc789"
        }
    }
)

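# Later, extract and use all metadata.
# Note: this example unpacks an (is_valid, metadata) tuple, whereas the
# Basic Encoding and Decoding example uses the return value as a dict.
# Check which form your installed version returns.
is_valid, metadata = UnicodeMetadata.extract_metadata(encoded_text)
if is_valid:
    model = metadata["model_id"]  # "gpt-4"
    timestamp = metadata["timestamp"]  # Unix timestamp passed at encoding time
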
    # Access custom metadata
    if "custom" in metadata:
        user_id = metadata["custom"]["user_id"]  # "user123"
        context = metadata["custom"]["context"]  # Nested object

Metadata Target Options

You can specify where to embed metadata using the target parameter:

whitespace: Embed in whitespace characters (default, least noticeable)
punctuation: Embed in punctuation marks
first_letter: Embed in the first letter of each word
last_letter: Embed in the last letter of each word
all_characters: Embed in all characters (not recommended)
none: Don't embed metadata (for testing/debugging)
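
One way to picture the options above is as a function that selects which character positions may carry metadata. The sketch below is an illustration under assumptions: target_indices is a hypothetical helper, "punctuation" is taken to mean ASCII punctuation, and word boundaries are detected with str.isalnum. The package's own selection rules may differ.

```python
import string
from typing import List


def target_indices(text: str, target: str = "whitespace") -> List[int]:
    """Return the indices of characters that the given target option would select."""
    if target == "none":
        return []
    if target == "all_characters":
        return list(range(len(text)))
    if target == "whitespace":
        return [i for i, ch in enumerate(text) if ch.isspace()]
    if target == "punctuation":
        return [i for i, ch in enumerate(text) if ch in string.punctuation]
    if target in ("first_letter", "last_letter"):
        indices = []
        for i, ch in enumerate(text):
            if not ch.isalnum():
                continue
            starts_word = i == 0 or not text[i - 1].isalnum()
            ends_word = i == len(text) - 1 or not text[i + 1].isalnum()
            if (target == "first_letter" and starts_word) or (
                target == "last_letter" and ends_word
            ):
                indices.append(i)
        return indices
    raise ValueError(f"unknown target: {target!r}")


sample = "Hi, AI model."
for option in ("whitespace", "punctuation", "first_letter", "last_letter"):
    print(option, target_indices(sample, option))
```

Short texts with few spaces offer few whitespace positions, which is worth keeping in mind when choosing a target for very short outputs.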

Security Features

HMAC Authentication

EncypherAI uses HMAC (Hash-based Message Authentication Code) to authenticate embedded metadata and detect tampering:

Tamper Detection: Cryptographically verifies that metadata hasn't been modified
Authentication: Confirms metadata was created by an authorized source
Integrity Protection: Ensures the relationship between content and metadata remains intact

# Example of verifying metadata with HMAC
from encypher.core.unicode_metadata import UnicodeMetadata

encoder = UnicodeMetadata()  # Uses secret key from environment variable
encoded_text = "AI-generated text with embedded metadata..."

# Returns (is_valid, metadata)
is_valid, metadata = encoder.extract_metadata(encoded_text)

if is_valid:
    print(f"Verified metadata: {metadata}")
else:
    print("Warning: metadata is missing or failed HMAC verification.")

For production use, set your HMAC secret key via the ENCYPHER_SECRET_KEY environment variable or pass it directly to the constructor.
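
The tamper-detection property itself can be demonstrated with Python's standard hmac module. This is a generic sketch, not EncypherAI's code: the choice of SHA-256, the canonical JSON serialization, and the sign and verify helper names are assumptions, and the sketch covers only the metadata, not the surrounding text.

```python
import hashlib
import hmac
import json


def sign(metadata: dict, secret_key: str) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON form of the metadata."""
    message = json.dumps(metadata, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(secret_key.encode("utf-8"), message, hashlib.sha256).hexdigest()


def verify(metadata: dict, tag: str, secret_key: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign(metadata, secret_key), tag)


key = "demo-key-not-for-production"
meta = {"model_id": "gpt-4", "timestamp": 1742601600}
tag = sign(meta, key)

ok = verify(meta, tag, key)                                        # unchanged metadata
tampered = verify({**meta, "model_id": "other-model"}, tag, key)   # edited field
wrong_key = verify(meta, tag, "different-key")                     # unauthorized signer
print(ok, tampered, wrong_key)  # True False False
```

Sorting keys before serializing matters: without a canonical form, two dictionaries with the same content could produce different tags.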

FastAPI Integration

See examples/fastapi_example.py for a complete example of integrating EncypherAI with FastAPI, including:

Encoding endpoint
Decoding endpoint
Streaming support

CLI Usage

The package includes an example command-line interface for encoding and decoding:

# Encode metadata into text
python -m encypher.examples.cli_example encode --text "This is a test" --model-id "gpt-4" --target "whitespace"

# Encode with custom metadata
python -m encypher.examples.cli_example encode --input-file input.txt --output-file output.txt --model-id "gpt-4" --custom-metadata '{"source": "test", "user_id": 123}'

# Decode metadata from text
python -m encypher.examples.cli_example decode --input-file encoded.txt --show-clean

# Decode with debug information
python -m encypher.examples.cli_example decode --text "Your encoded text here" --debug
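
When the custom metadata comes from a program rather than being typed by hand, building the JSON argument with json.dumps avoids quoting mistakes. The sketch below only assembles the argument list for the encode command shown above; it reuses the flags from that command and does not run anything.

```python
import json
import shlex

custom = {"source": "test", "user_id": 123}

# Argument list for the encode command; the flags are the ones shown above.
argv = [
    "python", "-m", "encypher.examples.cli_example", "encode",
    "--input-file", "input.txt",
    "--output-file", "output.txt",
    "--model-id", "gpt-4",
    "--custom-metadata", json.dumps(custom),
]

# shlex.join quotes the JSON so the printed line can be pasted into a POSIX shell.
command_line = shlex.join(argv)
print(command_line)

# To execute it from Python instead, pass the list directly:
#   subprocess.run(argv, check=True)
```

Passing the list to subprocess.run sidesteps shell quoting entirely, which is the safer option when any value originates from user input.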

Development and Contributing

We welcome contributions to EncypherAI! Please see CONTRIBUTING.md for detailed guidelines.

Code Style

EncypherAI follows PEP 8 style guidelines with Black as our code formatter. All code must pass Black formatting checks before being merged. We use pre-commit hooks to automate code formatting and quality checks.

To set up the development environment:

# Clone the repository
git clone https://github.com/encypherai/encypher-ai.git
cd encypher-ai

# Install development dependencies
uv pip install -e ".[dev]"

# Set up pre-commit hooks
pre-commit install

The pre-commit hooks will automatically:

Format your code with Black (including Jupyter notebooks)
Sort imports with isort
Check for common issues with flake8 and ruff
Perform type checking with mypy

You can also run the formatting tools manually:

# Format all Python files
black encypher

# Format Python files including Jupyter notebooks
black --jupyter encypher

Running Tests

# Run all tests
pytest

# Run tests with coverage
pytest --cov=encypher

License

EncypherAI is provided under a dual licensing model:

Open Source License (AGPL-3.0)

The core EncypherAI package is released under the GNU Affero General Public License v3.0 (AGPL-3.0). This license allows you to use, modify, and distribute the software freely, provided that:

You disclose the source code when you distribute the software
Any modifications you make are also licensed under AGPL-3.0
If you run a modified version of the software as a service (e.g., over a network), you must make the complete source code available to users of that service

Commercial License

For organizations that wish to incorporate EncypherAI into proprietary applications without the source code disclosure requirements of AGPL-3.0, we offer a commercial licensing option.

Benefits of the commercial license include:

Proprietary Integration: Use EncypherAI in closed-source applications without AGPL obligations
Legal Certainty: Clear licensing terms for commercial use
Support & Indemnification: Access to professional support and IP indemnification

For commercial licensing inquiries, please contact enterprise@encypherai.com.

See the LICENSE file for details of the AGPL-3.0 license.

Acknowledgments

Thanks to all contributors who have helped shape this project
Special thanks to the open-source community for their invaluable tools and libraries

Contact

For questions, feedback, or support, please open an issue on our GitHub repository.

Release history

3.1.6: Apr 22, 2026
3.1.5: Apr 22, 2026
3.1.4: Apr 22, 2026
3.1.3: Apr 8, 2026
3.1.2: Apr 8, 2026
3.1.1: Apr 8, 2026
3.1.0: Apr 8, 2026
3.0.6: Feb 12, 2026
3.0.4: Jan 30, 2026
3.0.3: Jan 16, 2026
3.0.2: Jan 11, 2026
3.0.1: Jan 9, 2026
2.8.1: Sep 3, 2025
2.8.0: Aug 28, 2025
2.7.0: Jul 8, 2025
2.6.0: Jul 8, 2025
2.5.0: Jul 8, 2025
2.4.1: Jun 29, 2025
2.3.0: Jun 16, 2025
2.2.0: Jun 2, 2025
2.1.0: May 23, 2025
2.0.0: Apr 14, 2025
1.1.0: Mar 31, 2025
1.0.0: Mar 22, 2025 (this version)

