
User pattern learning with trajectory-aware DPO training


CognitiveTwin

User pattern learning with trajectory-aware DPO (Direct Preference Optimization) training.
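For readers new to DPO, the per-pair objective from the standard DPO formulation can be sketched in plain Python. This is the textbook loss, not necessarily the exact variant the package's trainer uses. The function name and arguments are illustrative assumptions, and each log-probability is assumed to be the summed token log-probability of a full response under either the policy being trained or a frozen reference model.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single preference pair (illustrative sketch).

    Each argument is the summed log-probability of a full response under
    the policy being trained or the frozen reference model.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp)
    x = beta * margin
    # -log(sigmoid(x)) == log(1 + exp(-x)); branch to avoid exp overflow.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))
```

A zero margin gives a loss of log 2 (about 0.693). The loss falls toward 0 as the policy's preference for the chosen response grows beyond the reference model's, and `beta` controls how sharply that happens.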

Overview

CognitiveTwin learns a user's communication patterns through the following stages:

  • Corpus Surgery: Data cleaning, validation, and quality filtering
  • WORMS: Trajectory generators for synthetic training data
    • Conversation Worm: Dialogue trajectory generation
    • Repo Worm: Code repository analysis
    • Task Worm: Task execution patterns
    • DPO Generator: Preference pair generation
  • Dataset Building: Preference pair labeling and export
  • Evaluation Suite: Evaluation framework and metrics
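The overview does not say how the DPO Generator selects pairs or what format the dataset builder exports. As an illustration only: one common approach scores several candidate responses per prompt, takes the best as `chosen` and the worst as `rejected`, and writes one JSON object per line with `prompt`/`chosen`/`rejected` keys (the field names several DPO trainers, such as Hugging Face TRL, expect). `PreferencePair`, `make_pair`, `to_jsonl`, the scores, and the `min_gap` threshold are all hypothetical.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class PreferencePair:
    # Hypothetical record; the package's real schema may differ.
    prompt: str
    chosen: str
    rejected: str

def make_pair(prompt, scored, min_gap=0.1):
    """Build a PreferencePair from (response, score) tuples, or return None.

    The highest-scoring response becomes `chosen` and the lowest becomes
    `rejected`. Near-ties (gap below min_gap) are skipped because they
    carry a weak preference signal.
    """
    if len(scored) < 2:
        return None
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    (best, best_score), (worst, worst_score) = ranked[0], ranked[-1]
    if best_score - worst_score < min_gap:
        return None
    return PreferencePair(prompt=prompt, chosen=best, rejected=worst)

def to_jsonl(pairs):
    """Serialize pairs as JSON Lines, one object per pair."""
    return "\n".join(json.dumps(asdict(p)) for p in pairs)
```

For example, `make_pair("Summarize the PR", [("Short, direct summary.", 0.9), ("Long rambling text.", 0.2)])` keeps the first response as `chosen` and the second as `rejected`.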

Installation

pip install cognitive-twin

# With training dependencies
pip install "cognitive-twin[training]"

Quick Start

from cognitive_twin.v3 import pipeline, schema
from cognitive_twin.framework import config

# Build the pipeline configuration
cfg = config.CognitiveTwinConfig(
    model_name="your-base-model",
    output_dir="./output"
)

# Run corpus surgery
pipeline.run_corpus_surgery(cfg)

# Generate training data
pipeline.generate_dpo_pairs(cfg)

# Train
pipeline.train(cfg)

Components

v3/ - Main Implementation

  • corpus_surgery/ - Data cleaning and validation
  • dataset/ - Dataset generation and labeling
  • eval/ - Evaluation framework
  • generators/ - Batch and DPO generators
  • ingest/ - Data ingestion (Claude, OpenAI, Supabase)
  • worms/ - Trajectory generators
  • pipeline.py - Main orchestrator
  • schema.py - Type definitions
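The `corpus_surgery/` stage is described only as data cleaning and validation. Below is a minimal sketch of the kind of filtering such a stage might perform. It assumes each record is a dict with `role` and `content` keys, which is a hypothetical schema and not the one defined in `schema.py`. The sketch drops malformed records, very short messages, and duplicates after whitespace and case normalization. `clean_corpus` and `min_chars` are illustrative names.

```python
import hashlib

def clean_corpus(records, min_chars=20):
    """Filter a list of {"role": ..., "content": ...} message dicts.

    Hypothetical schema and threshold. Keeps the first occurrence of each
    normalized message and drops messages shorter than min_chars.
    """
    seen = set()
    kept = []
    for rec in records:
        content = rec.get("content")
        # Validation: skip records without text or with an unexpected role.
        if not isinstance(content, str) or rec.get("role") not in ("user", "assistant"):
            continue
        # Collapse whitespace and case so trivial variants dedupe together.
        normalized = " ".join(content.split()).lower()
        if len(normalized) < min_chars:
            continue  # quality filter: too short to carry a usable pattern
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # duplicate after normalization
        seen.add(digest)
        kept.append(rec)
    return kept
```

Hashing the normalized text keeps the seen-set small even for long messages. Near-duplicate detection (e.g. MinHash) would need a different technique.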

framework/ - Supporting Infrastructure

  • config.py - Configuration management
  • twin.py - Core twin abstraction
  • trainer.py - Training loop

Documentation

See the docs/ directory for detailed documentation:

  • 00_OVERVIEW.md - System overview
  • 01_CORPUS_SURGERY.md - Data cleaning pipeline
  • 02_REPO_WORM.md - Repository analysis
  • 03_CONVERSATION_WORM.md - Dialogue generation
  • 04_ENHANCER_AGENT.md - Quality enhancement
  • 05_DATASET_BUILDER.md - Dataset construction
  • 06_TRAINING_PIPELINE.md - Training guide
  • 07_EVALUATION_SUITE.md - Evaluation metrics
  • 08_API_INTEGRATION.md - API usage

License

MIT



Download files

Download the file for your platform.

Source Distribution

cognitive_twin-3.0.0.tar.gz (319.1 kB)

Uploaded Source

Built Distribution


cognitive_twin-3.0.0-py3-none-any.whl (270.9 kB)

Uploaded Python 3

File details

Details for the file cognitive_twin-3.0.0.tar.gz.

File metadata

  • Download URL: cognitive_twin-3.0.0.tar.gz
  • Upload date:
  • Size: 319.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for cognitive_twin-3.0.0.tar.gz
Algorithm Hash digest
SHA256 09bcf2435279bdc10d32c7eae62f75d87b2712ead25e9ffd60682bc4cd17a97c
MD5 3a8c96b09a0acc6a73e45301843f4556
BLAKE2b-256 2790b7bac746fa0d42da6aa86e5ad86433c65950da4341d5e2b2854363e27e98

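To check a downloaded file against the published SHA256 digests, the standard library's `hashlib` is enough. This is a generic sketch: the digests are copied from the hash tables for the two 3.0.0 files, and `sha256_of`, `verify`, and `EXPECTED_SHA256` are illustrative names, not part of the package.

```python
import hashlib
from pathlib import Path

# SHA256 digests published for the 3.0.0 distribution files.
EXPECTED_SHA256 = {
    "cognitive_twin-3.0.0.tar.gz":
        "09bcf2435279bdc10d32c7eae62f75d87b2712ead25e9ffd60682bc4cd17a97c",
    "cognitive_twin-3.0.0-py3-none-any.whl":
        "c9f634c9b0b23418a663374d4122ce07ce5f55ebb260d5e8d95b755d79ae2962",
}

def sha256_of(path, chunk_size=1 << 16):
    """Hash the file in chunks so large archives are not read into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path):
    """Return True if the file's SHA256 matches the published digest."""
    name = Path(path).name
    expected = EXPECTED_SHA256.get(name)
    if expected is None:
        raise KeyError(f"no published hash for {name}")
    return sha256_of(path) == expected
```

pip can enforce the same check at install time: list `cognitive-twin==3.0.0 --hash=sha256:<digest>` in a requirements file and install with `pip install --require-hashes -r requirements.txt`.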

File details

Details for the file cognitive_twin-3.0.0-py3-none-any.whl.

File metadata

  • Download URL: cognitive_twin-3.0.0-py3-none-any.whl
  • Upload date:
  • Size: 270.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for cognitive_twin-3.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c9f634c9b0b23418a663374d4122ce07ce5f55ebb260d5e8d95b755d79ae2962
MD5 ed65b6b72224b6c95bf9ae55adedbbde
BLAKE2b-256 c449e369853b4940463e819da25d34c1ae650a000bd4c7ac1ab76fd6c717f20a

