Metadata encoding and extraction for AI-generated content

EncypherAI Core

License: AGPL v3 · Python 3.8+

A Python package for embedding and extracting metadata in text using Unicode variation selectors without affecting readability.

Overview

EncypherAI Core provides tools for invisibly encoding metadata (such as model information, timestamps, and custom data) into text generated by AI models. This enables:

  • Provenance tracking: Identify which AI model generated a piece of text
  • Timestamp verification: Know when text was generated
  • Custom metadata: Embed any additional information you need
  • Streaming support: Works with both streaming and non-streaming LLM outputs

Metadata is encoded with Unicode variation selectors: code points designed to select an alternative form of the preceding character. Renderers ignore variation sequences they do not recognize, so the text's appearance and readability are unchanged.
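To make the idea concrete, here is a minimal sketch of how arbitrary bytes can be carried by variation selectors. It is not EncypherAI's actual encoding scheme; the byte-to-selector mapping and the helper names (`byte_to_selector`, `hide_bytes`, `reveal_bytes`, `strip_selectors`) are assumptions made for illustration. It relies only on the fact that Unicode defines 256 variation selectors (U+FE00 to U+FE0F and U+E0100 to U+E01EF).

```python
from typing import Optional


def byte_to_selector(value: int) -> str:
    """Map a byte (0-255) to one of the 256 Unicode variation selectors."""
    if not 0 <= value <= 255:
        raise ValueError("value must fit in one byte")
    if value < 16:
        return chr(0xFE00 + value)          # VS1..VS16
    return chr(0xE0100 + (value - 16))      # VS17..VS256


def selector_to_byte(char: str) -> Optional[int]:
    """Inverse of byte_to_selector; returns None for ordinary characters."""
    code_point = ord(char)
    if 0xFE00 <= code_point <= 0xFE0F:
        return code_point - 0xFE00
    if 0xE0100 <= code_point <= 0xE01EF:
        return code_point - 0xE0100 + 16
    return None


def hide_bytes(carrier_char: str, payload: bytes) -> str:
    """Append one selector per payload byte after the carrier character."""
    return carrier_char + "".join(byte_to_selector(b) for b in payload)


def reveal_bytes(text: str) -> bytes:
    """Collect every byte carried by variation selectors in the text."""
    decoded = (selector_to_byte(c) for c in text)
    return bytes(b for b in decoded if b is not None)


def strip_selectors(text: str) -> str:
    """Return the visible text with all variation selectors removed."""
    return "".join(c for c in text if selector_to_byte(c) is None)


text = "Hello" + hide_bytes(" ", b"gpt-4") + "world"
print(reveal_bytes(text))     # the hidden payload
print(strip_selectors(text))  # the visible text
```

In this sketch the payload rides on a single space character; a real encoder would also need to frame the payload and decide which characters carry it.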

Installation

uv pip install encypher-ai

Quick Start

Basic Encoding and Decoding

from encypher.core.unicode_metadata import UnicodeMetadata
import time

# Encode metadata into text
encoded_text = UnicodeMetadata.embed_metadata(
    text="This is a sample text generated by an AI model.",
    model_id="gpt-4",
    timestamp=int(time.time()),  # Current Unix timestamp
    target="whitespace"  # Embed in whitespace characters
)

# Extract metadata from encoded text
metadata = UnicodeMetadata.extract_metadata(encoded_text)
print(f"Model: {metadata.get('model_id')}")
print(f"Timestamp: {metadata.get('timestamp')}")
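The timestamp passed in above is a Unix timestamp in seconds. As a small convenience sketch, it can be turned into a readable UTC time with the standard library alone; `format_timestamp` is a hypothetical helper name, not part of the package.

```python
from datetime import datetime, timezone


def format_timestamp(unix_seconds: int) -> str:
    """Render a Unix timestamp (seconds) as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc).isoformat()


print(format_timestamp(0))
```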

Using MetadataEncoder (Alternative Method)

from encypher.core.metadata_encoder import MetadataEncoder
import time

# Initialize encoder with optional HMAC secret key
encoder = MetadataEncoder(secret_key="your-secret-key")

# Encode metadata
metadata = {
    "model_id": "gpt-4",
    "timestamp": int(time.time()),  # Current Unix timestamp
    "custom_field": "custom value"
}
encoded_text = encoder.encode_metadata(
    text="This is a sample text generated by an AI model.",
    metadata=metadata
)

# Decode and verify metadata
is_valid, extracted_metadata, clean_text = encoder.verify_text(encoded_text)
if is_valid:
    print(f"Model: {extracted_metadata.get('model_id')}")
    print(f"Timestamp: {extracted_metadata.get('timestamp')}")
    print(f"Custom field: {extracted_metadata.get('custom_field')}")

Streaming Support

from encypher.streaming.handlers import StreamingHandler

# Initialize streaming handler
handler = StreamingHandler(
    metadata={
        "model_id": "gpt-4",
        "custom_field": "custom value"
    },
    target="whitespace",
    encode_first_chunk_only=True  # Only encode the first non-empty chunk
)

# Process streaming chunks
chunks = [
    "This is ",
    "a sample ",
    "text generated ",
    "by an AI model."
]

for chunk in chunks:
    processed_chunk = handler.process_chunk(chunk)
    print(processed_chunk)  # Use in your streaming response
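The encode_first_chunk_only behaviour can be pictured with a short, library-independent sketch. This is an assumption about the general idea, not StreamingHandler's actual implementation: `tag_first_chunk` and the `embed` callback are hypothetical names, and the callback stands in for whatever embedding function you use.

```python
from typing import Callable, Iterable, Iterator


def tag_first_chunk(
    chunks: Iterable[str], embed: Callable[[str], str]
) -> Iterator[str]:
    """Apply `embed` to the first non-empty chunk and pass the rest through."""
    embedded = False
    for chunk in chunks:
        if not embedded and chunk.strip():
            embedded = True
            yield embed(chunk)
        else:
            yield chunk


# Stand-in embed function: appends a single variation selector (U+FE01).
for piece in tag_first_chunk(["", "This is ", "a sample "], lambda s: s + "\ufe01"):
    print(repr(piece))
```

Empty or whitespace-only chunks are skipped here so that the metadata lands on a chunk that actually contains text.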

Configuration

from encypher.config.settings import Settings

# Load settings from environment variables and/or config file
settings = Settings(
    config_file="config.json",  # Optional
    env_prefix="ENCYPHER_"  # Environment variable prefix
)

# Get configuration values
metadata_target = settings.get_metadata_target()
hmac_secret_key = settings.get_hmac_secret_key()
encode_first_chunk_only = settings.get_encode_first_chunk_only()
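For readers unfamiliar with prefix-based configuration, the lookup can be sketched as follows. This is not the Settings class itself: `load_setting` is a hypothetical helper, the variable name ENCYPHER_METADATA_TARGET is assumed for illustration, and the precedence shown (environment variable first, then config file, then default) is an assumption rather than documented behaviour.

```python
import json
import os
from pathlib import Path
from typing import Any, Optional


def load_setting(
    name: str,
    config_file: Optional[str] = None,
    env_prefix: str = "ENCYPHER_",
    default: Any = None,
) -> Any:
    """Look up a setting: environment variable, then JSON file, then default."""
    env_value = os.environ.get(env_prefix + name.upper())
    if env_value is not None:
        return env_value
    if config_file and Path(config_file).is_file():
        with open(config_file, encoding="utf-8") as handle:
            return json.load(handle).get(name, default)
    return default


# Assumed variable name, for illustration only.
os.environ["ENCYPHER_METADATA_TARGET"] = "punctuation"
print(load_setting("metadata_target", default="whitespace"))
```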

Including Custom Metadata

from encypher.core.unicode_metadata import UnicodeMetadata
import time

# Include custom metadata along with required fields
encoded_text = UnicodeMetadata.embed_metadata(
    text="This is a sample text generated by an AI model.",
    model_id="gpt-4",
    timestamp=int(time.time()),  # Current Unix timestamp
    custom_metadata={
        "user_id": "user123",
        "session_id": "abc456",
        "context": {
            "source": "knowledge_base",
            "reference_id": "doc789"
        }
    }
)

# Later extract and use all metadata
is_valid, metadata = UnicodeMetadata.extract_metadata(encoded_text)
if is_valid:
    model = metadata["model_id"]  # "gpt-4"
    timestamp = metadata["timestamp"]  # Timestamp
    
    # Access custom metadata
    if "custom" in metadata:
        user_id = metadata["custom"]["user_id"]  # "user123"
        context = metadata["custom"]["context"]  # Nested object

Metadata Target Options

You can specify where to embed metadata using the target parameter:

  • whitespace: Embed in whitespace characters (default, least noticeable)
  • punctuation: Embed in punctuation marks
  • first_letter: Embed in the first letter of each word
  • last_letter: Embed in the last letter of each word
  • all_characters: Embed in all characters (not recommended)
  • none: Don't embed metadata (for testing/debugging)
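The options above differ only in which character positions are eligible to carry metadata. A rough sketch of that selection step, assuming a simple reading of each option (the function `target_indices` is hypothetical and is not the package's implementation):

```python
import re
import string
from typing import List


def target_indices(text: str, target: str = "whitespace") -> List[int]:
    """Return the indices of characters eligible to carry metadata."""
    if target == "none":
        return []
    if target == "whitespace":
        return [i for i, c in enumerate(text) if c.isspace()]
    if target == "punctuation":
        return [i for i, c in enumerate(text) if c in string.punctuation]
    if target == "all_characters":
        return list(range(len(text)))
    if target in ("first_letter", "last_letter"):
        words = [m.span() for m in re.finditer(r"\w+", text)]
        return [start if target == "first_letter" else end - 1
                for start, end in words]
    raise ValueError(f"unknown target: {target!r}")


sample = "Hi, you all."
for option in ("whitespace", "punctuation", "first_letter", "last_letter"):
    print(option, target_indices(sample, option))
```

Note that short texts may offer very few positions for a given target; this sketch only lists the candidates.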

Security Features

HMAC Authentication

EncypherAI uses HMAC (Hash-based Message Authentication Code) to protect the integrity and authenticity of embedded metadata:

  • Tamper Detection: Cryptographically verifies that metadata hasn't been modified
  • Authentication: Confirms metadata was created by an authorized source
  • Integrity Protection: Ensures the relationship between content and metadata remains intact

# Example of verifying metadata with HMAC
from encypher.core.unicode_metadata import UnicodeMetadata

encoder = UnicodeMetadata()  # Uses secret key from environment variable
encoded_text = "AI-generated text with embedded metadata..."

# Returns (is_valid, metadata)
is_valid, metadata = encoder.extract_metadata(encoded_text)

if is_valid:
    print(f"Verified metadata: {metadata}")
else:
    print("Warning: metadata is missing or failed HMAC verification!")

For production use, set your HMAC secret key via the ENCYPHER_SECRET_KEY environment variable or pass it directly to the constructor.
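To illustrate what HMAC verification involves in general, here is a minimal standard-library sketch. It is not EncypherAI's internal scheme: the message layout (canonical JSON of the metadata followed by the text), the choice of SHA-256, and the function names `sign_metadata` and `verify_metadata` are all assumptions made for this example.

```python
import hashlib
import hmac
import json


def sign_metadata(metadata: dict, text: str, secret_key: str) -> str:
    """Compute an HMAC-SHA256 tag over the metadata and the text it belongs to."""
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    message = canonical.encode("utf-8") + b"\n" + text.encode("utf-8")
    return hmac.new(secret_key.encode("utf-8"), message, hashlib.sha256).hexdigest()


def verify_metadata(metadata: dict, text: str, tag: str, secret_key: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = sign_metadata(metadata, text, secret_key)
    return hmac.compare_digest(expected, tag)


meta = {"model_id": "gpt-4", "timestamp": 1700000000}
tag = sign_metadata(meta, "Sample text", "your-secret-key")
print(verify_metadata(meta, "Sample text", tag, "your-secret-key"))
print(verify_metadata({**meta, "model_id": "other"}, "Sample text", tag, "your-secret-key"))
```

Because the tag covers both the metadata and the text, changing either one, or verifying with a different key, makes the check fail. A failed check therefore cannot distinguish tampering from a key mismatch.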

FastAPI Integration

See examples/fastapi_example.py for a complete example of integrating EncypherAI with FastAPI, including:

  • Encoding endpoint
  • Decoding endpoint
  • Streaming support

CLI Usage

The package includes an example command-line interface with encode and decode subcommands:

# Encode metadata into text
python -m encypher.examples.cli_example encode --text "This is a test" --model-id "gpt-4" --target "whitespace"

# Encode with custom metadata
python -m encypher.examples.cli_example encode --input-file input.txt --output-file output.txt --model-id "gpt-4" --custom-metadata '{"source": "test", "user_id": 123}'

# Decode metadata from text
python -m encypher.examples.cli_example decode --input-file encoded.txt --show-clean

# Decode with debug information
python -m encypher.examples.cli_example decode --text "Your encoded text here" --debug

Development and Contributing

We welcome contributions to EncypherAI! Please see CONTRIBUTING.md for detailed guidelines.

Code Style

EncypherAI follows PEP 8 style guidelines with Black as our code formatter. All code must pass Black formatting checks before being merged. We use pre-commit hooks to automate code formatting and quality checks.

To set up the development environment:

# Clone the repository
git clone https://github.com/encypherai/encypher-ai.git
cd encypher-ai

# Install development dependencies
uv pip install -e ".[dev]"

# Set up pre-commit hooks
pre-commit install

The pre-commit hooks will automatically:

  • Format your code with Black (including Jupyter notebooks)
  • Sort imports with isort
  • Check for common issues with flake8 and ruff
  • Perform type checking with mypy

You can also run the formatting tools manually:

# Format all Python files
black encypher

# Notebooks (.ipynb) are formatted by the same command once Black's Jupyter extra is installed
uv pip install "black[jupyter]"
black encypher

Running Tests

# Run all tests
pytest

# Run tests with coverage
pytest --cov=encypher

License

EncypherAI is provided under a dual licensing model:

Open Source License (AGPL-3.0)

The core EncypherAI package is released under the GNU Affero General Public License v3.0 (AGPL-3.0). This license allows you to use, modify, and distribute the software freely, provided that:

  • You disclose the source code when you distribute the software
  • Any modifications you make are also licensed under AGPL-3.0
  • If you run a modified version of the software as a service (e.g., over a network), you must make the complete source code available to users of that service

Commercial License

For organizations that wish to incorporate EncypherAI into proprietary applications without the source code disclosure requirements of AGPL-3.0, we offer a commercial licensing option.

Benefits of the commercial license include:

  • Proprietary Integration: Use EncypherAI in closed-source applications without AGPL obligations
  • Legal Certainty: Clear licensing terms for commercial use
  • Support & Indemnification: Access to professional support and IP indemnification

For commercial licensing inquiries, please contact enterprise@encypherai.com.

See the LICENSE file for details of the AGPL-3.0 license.

Acknowledgments

  • Thanks to all contributors who have helped shape this project
  • Special thanks to the open-source community for their invaluable tools and libraries

Contact

For questions, feedback, or support, please open an issue on our GitHub repository.
