h2oGPTe Migration Tool
Automated tool for migrating h2oGPTe collections with tracking, verification, and resume capabilities.
Overview
This tool helps administrators and collection owners migrate collections to new embedding models while:
- Preserving collection settings - permissions, lifecycle settings, scheduled connectors, document metadata
- Tracking migration state in a SQLite database with step-level granularity
- Supporting resume - interrupted migrations can be resumed without duplicating work
- Verifying job completion with document counts, embedding model, lifecycle settings, and document statuses
- RAG verification - optionally test each migrated collection with a chat query to confirm it works
- Optionally migrating chat sessions from old to new collections
- Manual move operations - move connectors and chats between any collections
- Supporting both admin bulk and self-service migrations
Installation
pip install h2ogpte-migration
This installs the h2ogpte-migrate command.
How It Works
Per-Collection Migration Flow
For each collection, the tool performs these steps:
1. Create new collection (target embedding model) --> DB: collection_created=1
2. Copy permissions (public, user, group) --> DB: permissions_copied=1
3. Import documents + settings (single server job) --> DB: import_submitted=1
- Documents (with preserved ingest modes)
- Lifecycle settings (expiry, inactivity, size limit)
- Document metadata
4. Migrate scheduled connectors (after import succeeds) --> DB: connectors_migrated=1
5. [Optional] Migrate chat sessions (after connectors) --> DB: chats_migrated=1
The import job (step 3) handles documents, lifecycle settings, and metadata in a single server-side operation. Connectors and chats are migrated separately after the import succeeds to ensure they are only moved to a working collection.
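The flow above can be sketched as a sequence of idempotent steps, each recorded in the tracking database before moving on. The sketch below is illustrative only — the step runner and its names are placeholders, not the tool's actual internals:

```python
# Sketch of the per-collection flow: each step flips a DB flag on success,
# so an interrupted run can resume without repeating completed work.
# All names here are illustrative, not the tool's real API.

STEPS = [
    ("collection_created", "create new collection with target model"),
    ("permissions_copied", "copy public/user/group permissions"),
    ("import_submitted", "submit server-side import job"),
    ("connectors_migrated", "move scheduled connectors"),
    ("chats_migrated", "move chat sessions (opt-in)"),
]

def run_pending_steps(state: dict, do_step) -> dict:
    """Run only the steps whose flag is still unset, flipping each on success."""
    for flag, description in STEPS:
        if not state.get(flag):
            do_step(flag, description)  # the real tool calls the h2oGPTe API here
            state[flag] = 1             # and persists the flag to SQLite
    return state
```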
Execution Modes
Parallel (default): Submit all import jobs without waiting. Jobs run in the background on the server. Use --verify later to check completion — verify also triggers connector migration for completed jobs, and optionally chat migration with --verify --migrate-chats. Note: passing --migrate-chats without --wait-for-completion will NOT migrate chats during the run — it is deferred to the next --verify --migrate-chats invocation.
Sequential (--wait-for-completion): Wait for each import job to complete before moving to the next collection. After each successful import, connectors are migrated immediately. If --migrate-chats is specified, chat sessions are migrated after connectors. Optionally use --max-concurrent-jobs N to process multiple collections concurrently while still waiting for each to complete.
Both execution modes produce identical end results — they differ only in when connectors/chats are migrated (inline for sequential, on next --verify for parallel).
Behavior Summary
| Action | --wait-for-completion | Parallel (default) |
|---|---|---|
| Create new collection | Automatic | Automatic |
| Copy permissions | Automatic | Automatic |
| Import documents + settings | Waits for completion | Submits and exits |
| Validate migration (doc counts, model, statuses) | Automatic after import | On --verify |
| RAG verification | Needs --verify-query in the migration command | On --verify --verify-query |
| Migrate connectors | Automatic after import | On --verify |
| Migrate chats | Needs --migrate-chats in the migration command | On --verify --migrate-chats |
Why connectors are automatic but chats are opt-in:
- Connectors handle scheduled document ingestion — if they aren't moved, the new collection won't receive future data updates
- Chats are optional — admins may want to verify the migration before moving users' chat history, giving flexibility on timing
What happens to the old collection
The old collection is not deleted by this tool. After migration:
- Documents are shared (referenced by both old and new collections via copy_document=False)
- Scheduled connectors have been moved to the new collection (the old collection has none)
- Chat sessions have been moved to the new collection (if --migrate-chats was used)
- The old collection can be manually deleted once you've confirmed the migration is complete
- With copy_document=False, deleting the old collection is safe — documents survive because the new collection still references them
Why connectors and chats are separate from the import job
Scheduled connectors are moved (not copied) from the old collection to the new one. This is a destructive operation — the old collection loses its connectors. If the import job failed (e.g., embedding model error), moving connectors to a broken collection would leave the old collection without connectors and the new collection unusable. By running connector migration only after a confirmed successful import, we ensure connectors are only moved to a working collection.
Chat sessions follow the same principle — they should only be moved after both the import and connector migration succeed.
Resume Capability
If the tool is interrupted, re-running the same command will:
- Skip collections that are fully migrated (unless --force-remigrate is used). If connectors/chats were previously moved to another collection, the tool logs the exact --move-connectors/--move-chats commands needed to recover them
- Reuse collections that were created but not yet imported (avoids orphaned collections)
- Direct to --verify for collections with submitted but incomplete imports
- Run pending post-import steps for collections where the import completed but connectors/chats haven't been migrated
- Retry failed collections when --retry-failed is specified (creates a new collection)
What Gets Migrated
| Item | How (API calls / configs) | When |
|---|---|---|
| Documents | Server-side import job (import_collection_into_collection) | During import |
| Document metadata | preserve_metadata=True | During import |
| Document ingest modes | preserve_document_status=True (agent_only stays agent_only) | During import |
| Lifecycle settings | copy_lifecycle_settings=True (expiry, inactivity, size limit) | During import |
| Public permissions | list_collection_public_permissions + make_collection_public | Before import |
| User/group permissions | list_collection_permissions + share_collection | Before import |
| Scheduled connectors | migrate_scheduled_connectors_to_collection (moved from old to new) | After import succeeds |
| Chat sessions | migrate_chat_sessions_to_collection (opt-in via --migrate-chats) | After connectors succeed |
Authentication Modes
Admin Mode (--admin-key)
- Migrate collections for any user
- Use --users, --all-users, or --collections to set scope
- Automatically creates temporary API keys for collection owners
- Best for bulk migrations across the organization
Self-Service Mode (--user-key)
- Migrate only collections you own
- Optionally use --collections to specify which collections
- No additional temporary API keys needed (your user key is used directly)
- For collection owners managing their own migrations
Model Mappings
When using the --use-model-mappings flag, the tool uses the following predefined source→target model mappings. Collections whose current embedding model matches a source model below will be migrated to the corresponding target model. Collections using any other model are skipped.
| Source Model (deprecated) | Target Model (compliant) |
|---|---|
| BAAI/bge-m3 | h2oai/embeddinggemma-300m-qat-q8_0-unquantized |
| BAAI/bge-large-en-v1.5 | mixedbread-ai/mxbai-embed-large-v1 |
To migrate to a model not listed here, use --target-model instead.
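With --use-model-mappings, collection selection amounts to a dictionary lookup. A minimal sketch of the behavior described above:

```python
# Predefined source -> target mappings from the table above.
MODEL_MAPPINGS = {
    "BAAI/bge-m3": "h2oai/embeddinggemma-300m-qat-q8_0-unquantized",
    "BAAI/bge-large-en-v1.5": "mixedbread-ai/mxbai-embed-large-v1",
}

def target_for(current_model):
    """Return the target model, or None when the collection should be skipped."""
    return MODEL_MAPPINGS.get(current_model)
```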
Flag Reference
| Flag | Description |
|---|---|
| Required | |
| --url <url> | h2oGPTe instance URL |
| Authentication (choose one) | |
| --admin-key [key] | Admin API key. Enables migration for any user via --users, --all-users, or --collections/--collections-file. Automatically creates and cleans up temporary API keys for collection owners. Pass without a value to read from the H2OGPTE_ADMIN_KEY env var |
| --user-key [key] | User API key. Migrates only collections you own. No additional temporary API keys created. Cannot use --users or --all-users. Pass without a value to read from the H2OGPTE_USER_KEY env var |
| Migration Scope (choose one) | |
| --users <names> | Comma-separated usernames to migrate (admin only). Example: "john.doe, jane.smith" |
| --all-users | Migrate all users in the organization (admin only). Use with caution on large organizations — this will consider all collections in the system |
| --collections <ids> | Comma-separated collection IDs. Works with both admin and user keys. Auto-detects owners in admin mode. Also used to filter --verify scope |
| --collections-file <path> | Path to a file containing collection IDs (one per line, # comments ignored). Works like --collections but reads from a file |
| Migration Mode (choose one) | |
| --use-model-mappings | Use predefined source→target model mappings (see Model Mappings section). Collections whose model isn't in the mapping are skipped. Cannot be combined with --source-model or --target-model |
| --target-model <model> | Target embedding model. Required when not using --use-model-mappings |
| --source-model <model> | Only migrate collections using this specific embedding model (optional). Without this, all collections are migrated to --target-model regardless of their current embedding model |
| Execution | |
| --wait-for-completion | Wait for each collection to fully complete migration before moving to the next (import, validation, connectors, and optionally chats). Without this flag, jobs are submitted in parallel and --verify must be run separately. Use --max-concurrent-jobs N to process multiple collections concurrently |
| --max-concurrent-jobs <N> | Number of collections to process concurrently with --wait-for-completion (default: 1). Each worker runs the full migration cycle (create, import, validate, connectors, chats) independently. Has no effect without --wait-for-completion |
| --verify | Check status of previously submitted import jobs. Validates completed imports (document counts, embedding model, lifecycle settings, document statuses). Migrates connectors for successfully completed imports. Often combined with --migrate-chats and --verify-query. Cannot be combined with migration flags (--use-model-mappings, --target-model, --source-model) |
| --migrate-chats | Migrate chat sessions from the old to the new collection after successful import and connector migration. With --wait-for-completion: migrates chats inline. Without it: deferred to --verify --migrate-chats. Chats are only moved after connectors succeed |
| --dry-run | Preview what would be migrated without making any changes. Shows target models, permissions, lifecycle settings, document counts, and import settings. No collections created, no database writes |
| Retry/Resume | |
| --retry-failed | Retry collections whose import jobs failed. Creates a new collection and re-submits the import. The previous failed collection remains and needs manual cleanup. Not needed when a collection's import succeeded but connector/chat migration failed — those are automatically retried on the next --verify --migrate-chats run. Cannot be combined with --force-remigrate |
| --force-remigrate | Re-migrate collections regardless of their migration status. Creates new collections even if previously migrated successfully. Overwrites database records. Use with caution — verify the state of the old collection before re-migrating. If a previous successful migration already moved connectors/chats, use --move-connectors/--move-chats to restore the original collection's state beforehand, or use the recovery commands logged during re-migration. Cannot be combined with --retry-failed |
| Manual Move (recovery actions, not part of regular migration workflows) | |
| --move-connectors | Move scheduled connectors from the --from collection to the --to collection. The h2oGPTe API enforces ownership — the user must own both collections. With --admin-key, the tool looks up the source collection owner and impersonates them. Cannot be combined with migration or verify flags |
| --move-chats | Move chat sessions from the --from collection to the --to collection. Can be combined with --move-connectors to move both in a single command. Same ownership rules apply |
| --from <id> | Source collection ID for --move-connectors/--move-chats |
| --to <id> | Target collection ID for --move-connectors/--move-chats. Must differ from --from |
| Verification | |
| --verify-query <query> | RAG verification query. For each completed migration, creates a temporary chat session on the new collection, sends the query, checks that the response includes document references, logs a response preview, and deletes the test chat session. Informational only — does not block connector/chat migration. Requires --verify or --wait-for-completion. The same query is sent to every collection being verified — often combined with --collections to target specific collections |
| Options | |
| --copy-document | Copy documents instead of referencing them. The default (False) references documents — both old and new collections point to the same document record (with different embeddings), which is faster (skips creating new document records, storage uploads, and cataloging) and saves storage. Use --copy-document for full storage isolation between collections |
| --skip-reparse | Re-embed existing chunks without re-parsing documents. Reads text chunks from the source collection, re-embeds them with the target embedding model, and stores them in the new collection — skipping file fetch, PDF conversion, OCR, and chunking. Significantly faster for migrations where only the embedding model changes. Requires copy_document=False (the default). Cannot be used with --copy-document |
| --ocr-model <model> | OCR model to use during document re-parsing (default: auto). Use this to override the source collection's OCR model, e.g., when migrating away from a CN model. Examples: auto, off, tesseract. Not applicable when --skip-reparse is used |
| --db-path <path> | Path to the SQLite database for tracking migration state (default: migration_tracking.db). The database file is created automatically in the directory where the tool is run; this flag is only needed for a custom location. Must match the path used for the original migration when running --verify, otherwise a new empty database is created and no pending jobs are found |
| --cert <path> | Path to a CA certificate file for SSL verification. Omit this flag if no certificate is required |
| --api-key-expiry <duration> | Expiry duration for temporary API keys created in admin mode (default: 30 days). Example: "7 days", "30 days" |
Usage Examples
Quick Start
1. Sequentially migrate specific collections (one at a time)
Migrate, validate, move chats, and verify RAG in a single command:
h2ogpte-migrate --url https://h2ogpte.example.com --user-key sk-xxx --collections "col-123, col-456" --use-model-mappings --wait-for-completion --migrate-chats --verify-query "What is our refund policy?"
What happens (for each collection, one at a time — fully completes before moving to the next):
- Creates a new collection with the target model from the predefined model mapping
- Copies permissions (public, user, group) and lifecycle settings (expiry, inactivity, size limit) to the new collection
- Imports documents and waits for the import job to complete
- Validates the import (document counts, embedding model, lifecycle settings, document statuses)
- Creates a temporary test chat session, sends the RAG verification query, checks for document references, logs the response preview, and deletes the test chat session
- Migrates scheduled connectors automatically
- Migrates chat sessions (because --migrate-chats is passed)
Note: The same --verify-query is sent to every collection. For collection-specific queries, run separate commands per collection (e.g., in multiple terminals).
Tip: Add --max-concurrent-jobs N to process multiple collections concurrently instead of one at a time. See Example 2 below.
2. Concurrent migration with controlled parallelism
Migrate many collections concurrently with --max-concurrent-jobs:
h2ogpte-migrate --url https://h2ogpte.example.com --admin-key sk-xxx --collections "col-1, col-2, ..., col-100" --use-model-mappings --wait-for-completion --max-concurrent-jobs 10 --migrate-chats
What happens:
- Up to 10 collections are processed concurrently
- Each worker independently: creates a new collection, copies permissions, imports documents, waits for completion, validates, migrates connectors and chats
- When a worker finishes one collection, it picks up the next from the queue
- At most 10 import jobs are active on the server at any time
- If any collection fails, the rest continue unaffected — failed collections can be retried later with --retry-failed
Note: Without --max-concurrent-jobs (or with --max-concurrent-jobs 1), --wait-for-completion processes one collection at a time. Use higher values to speed up large-scale migrations while controlling server load.
Tip: With concurrent workers, log lines from different collections are interleaved. Each line is prefixed with [Collection Name], so you can filter the log file for a specific collection:
grep "\[Collection Alpha\]" migration_20260316_225645.log
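The worker-pool behavior described in this example resembles a standard bounded executor: at most N collections in flight, and a worker picks up the next collection as soon as it finishes one. A minimal sketch under that assumption (migrate_one is a hypothetical stand-in for the full per-collection cycle, not the tool's real API):

```python
from concurrent.futures import ThreadPoolExecutor

def migrate_all(collections, migrate_one, max_concurrent_jobs=1):
    """Process collections with at most max_concurrent_jobs in flight.

    Failures are collected instead of aborting the batch, mirroring the
    tool's behavior of leaving failed collections for --retry-failed.
    """
    failed = []
    with ThreadPoolExecutor(max_workers=max_concurrent_jobs) as pool:
        futures = {pool.submit(migrate_one, c): c for c in collections}
        for future, collection in futures.items():
            try:
                future.result()  # re-raises any exception from the worker
            except Exception as exc:
                failed.append((collection, str(exc)))
    return failed
```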
3. Migrate specific collections in parallel (multiple at the same time)
Submit all jobs at once, then verify separately:
# Step 1: Submit migration jobs (runs in background)
h2ogpte-migrate --url https://h2ogpte.example.com --admin-key sk-xxx --collections "col-123, col-456, col-789" --use-model-mappings
# Step 2: Verify completion, migrate connectors + chats, and run RAG check
h2ogpte-migrate --url https://h2ogpte.example.com --admin-key sk-xxx --collections "col-123, col-456, col-789" --verify --migrate-chats --verify-query "What is our refund policy?"
What happens in Step 1:
- For each collection: looks up the owner, creates a temporary API key for them
- Creates new collections with the target model from the predefined model mapping
- Copies permissions, lifecycle settings, and submits import jobs for each collection
- Exits immediately once all collections have had jobs created — jobs continue running in the background on the server
What happens in Step 2:
- Checks the status of each import job
- For successfully completed imports: validates document counts, embedding model, lifecycle settings, and document statuses
- Runs the RAG verification query on each completed collection (because --verify-query was included)
- Migrates scheduled connectors automatically for successfully completed imports
- Migrates chat sessions (because --migrate-chats is passed) — chats are only moved after the import succeeds, so it's safe to include in the verify step
Note: The --collections flag in Step 2 limits verification to those specific collections. The same --verify-query is sent to every collection specified. Without --collections, --verify checks all pending jobs in the database (admin mode). With --user-key and no --collections, only jobs belonging to your account are checked.
Tip for admins: --users "john.doe, jane.smith" can be used to scope to specific users instead of collection IDs. --all-users is also available to migrate every collection across all users, but use with caution on large organizations as it submits import jobs for all collections at once.
More Examples
4. Dry run (preview changes)
h2ogpte-migrate --url https://h2ogpte.example.com --admin-key sk-xxx --users john.doe --use-model-mappings --dry-run
What happens:
- Shows which collections would be migrated and the target embedding models
- Displays permissions that would be copied (public, user, group)
- Shows lifecycle settings that would be copied (expiry, inactivity interval, size limit)
- Shows document counts and import settings
- No actual changes — no collections created, no database writes
5. Specific model migration with OCR model override
# Step 1: Submit migration jobs for a specific source model, using tesseract for OCR
h2ogpte-migrate --url https://h2ogpte.example.com --admin-key sk-xxx --users john.doe --source-model "BAAI/bge-large-en-v1.5" --target-model "mixedbread-ai/mxbai-embed-large-v1" --ocr-model "tesseract"
# Step 2: Verify completion, migrate connectors + chats
h2ogpte-migrate --url https://h2ogpte.example.com --admin-key sk-xxx --users john.doe --verify --migrate-chats
What happens in Step 1:
- Creates a temporary API key for the user
- Scans all collections owned by the user
- Only processes collections using BAAI/bge-large-en-v1.5 (ignores all others)
- For each matching collection: creates a new collection with mixedbread-ai/mxbai-embed-large-v1, copies permissions and lifecycle settings, and submits an import job
- The --ocr-model "tesseract" flag overrides the OCR model used during document re-parsing (default: auto). Use this when migrating away from a CN OCR model or to preserve a specific OCR model like Tesseract
- Useful for phased migrations — migrate one model at a time instead of using predefined mappings
What happens in Step 2:
- Checks the status of each import job for the user
- For successfully completed imports: validates document counts, embedding model, lifecycle settings, and document statuses
- Migrates scheduled connectors automatically for successfully completed imports
- Migrates chat sessions for successfully completed imports (because --migrate-chats is passed)
- No RAG verification is done (--verify-query was not included — add it to Step 2 if needed, but keep in mind the same query would apply to all collections verified)
6. Verify and migrate chats (check job status, migrate connectors + chats)
h2ogpte-migrate --url https://h2ogpte.example.com --admin-key sk-xxx --verify --migrate-chats
What happens:
- Does NOT run any new migrations
- Queries the database for all pending/submitted/running jobs, plus completed jobs with pending post-import steps (i.e., connectors and chats that weren't migrated inline because --wait-for-completion was omitted)
- Checks the status of each import job
- For successfully completed imports: validates document counts, embedding model, lifecycle settings, and document statuses
- Migrates scheduled connectors automatically for successfully completed imports
- Migrates chat sessions for successfully completed imports (because --migrate-chats is passed)
- Reports a summary (completed/failed/running/canceled counts)
Without --users or --collections, checks all pending jobs in the database with --admin-key. With --user-key, only jobs belonging to your account are checked.
Filter by user or collection:
h2ogpte-migrate --url https://h2ogpte.example.com --admin-key sk-xxx --verify --users john.doe
h2ogpte-migrate --url https://h2ogpte.example.com --admin-key sk-xxx --verify --collections "col-123, col-456"
h2ogpte-migrate --url https://h2ogpte.example.com --user-key sk-xxx --verify --collections "col-123, col-456"
7. Retry failed migrations
# Admin: retry specific collections
h2ogpte-migrate --url https://h2ogpte.example.com --admin-key sk-xxx --collections "col-123, col-456" --use-model-mappings --retry-failed --wait-for-completion --migrate-chats
# Self-service: retry all your failed collections (no --collections scope needed)
h2ogpte-migrate --url https://h2ogpte.example.com --user-key sk-xxx --use-model-mappings --retry-failed --wait-for-completion --migrate-chats
What happens:
- Skips collections that are completed, submitted, or running
- For collections with a failed import job status: creates a new collection and re-submits the import
- Logs a warning with the previous failed collection ID for reference (needs manual cleanup)
- With --wait-for-completion: waits for each retried import to complete, validates, and migrates connectors and chats inline
- Without --wait-for-completion: submits jobs in the background — run --verify --migrate-chats later to check completion and migrate connectors + chats
8. Force re-migration
# Step 1: Force re-migrate specific collections
h2ogpte-migrate --url https://h2ogpte.example.com --admin-key sk-xxx --collections "col-123, col-456" --use-model-mappings --force-remigrate
# Step 2: Verify completion, migrate connectors + chats
h2ogpte-migrate --url https://h2ogpte.example.com --admin-key sk-xxx --collections "col-123, col-456" --verify --migrate-chats
What happens in Step 1:
- Collections are picked up regardless of their migration status
- Creates new collections for ALL specified collections, even if they were previously migrated successfully
- Overwrites previous local migration database records for those collections
- Previously migrated collections remain in the user's account (needs manual cleanup)
- Caution: If a previous successful migration already moved connectors/chats to another collection, either move them back to the original collection first (via --move-connectors/--move-chats), or rely on the exact move commands the tool logs after the re-migration completes
What happens in Step 2:
- Checks the status of each import job
- For successfully completed imports: validates document counts, embedding model, lifecycle settings, and document statuses
- Migrates scheduled connectors from the original collection, if they still exist there
- Migrates chat sessions from the original collection, if they still exist there (because --migrate-chats is passed)
- Important: If a previous migration already moved connectors/chats out of the original collection, they won't be found here — use --move-connectors/--move-chats to recover them (see commands logged in Step 1)
9. Manually move connectors and/or chats between collections (for recovery purposes)
h2ogpte-migrate --url https://h2ogpte.example.com --user-key sk-xxx --move-connectors --from "col-abc" --to "col-def"
h2ogpte-migrate --url https://h2ogpte.example.com --user-key sk-xxx --move-chats --from "col-abc" --to "col-def"
h2ogpte-migrate --url https://h2ogpte.example.com --admin-key sk-xxx --move-connectors --move-chats --from "col-abc" --to "col-def"
What happens:
- Moves scheduled connectors and/or chat sessions from the source collection to the target collection
- The source collection will no longer have the moved items after this operation
- Useful for recovering connectors/chats after --force-remigrate, or for reorganizing collections
- Works with both --admin-key and --user-key (the server enforces ownership)
- In admin mode, automatically creates a temporary API key for the source collection's owner
Database Tracking
Schema
CREATE TABLE collection_migrations (
old_collection_id TEXT PRIMARY KEY,
old_collection_name TEXT,
new_collection_id TEXT,
new_collection_name TEXT,
old_model TEXT,
new_model TEXT,
job_id TEXT,
job_status TEXT,
user_id TEXT,
username TEXT,
created_at TIMESTAMP,
completed_at TIMESTAMP,
error TEXT,
-- Step tracking
collection_created BOOLEAN DEFAULT 0,
permissions_copied BOOLEAN DEFAULT 0,
import_submitted BOOLEAN DEFAULT 0,
import_completed BOOLEAN DEFAULT 0,
connectors_migrated BOOLEAN DEFAULT 0,
chats_migrated BOOLEAN DEFAULT 0
);
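Any SQLite client can inspect the tracking database. For example, in Python, using an abbreviated, in-memory copy of the schema purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the real file is migration_tracking.db
conn.execute("""CREATE TABLE collection_migrations (
    old_collection_id TEXT PRIMARY KEY,
    old_collection_name TEXT,
    job_status TEXT,
    import_completed BOOLEAN DEFAULT 0,
    connectors_migrated BOOLEAN DEFAULT 0,
    chats_migrated BOOLEAN DEFAULT 0)""")
conn.execute(
    "INSERT INTO collection_migrations VALUES (?, ?, ?, ?, ?, ?)",
    ("col-123", "Docs", "completed", 1, 1, 0),
)
# Collections whose import finished but still have pending post-import steps:
pending = conn.execute(
    "SELECT old_collection_id FROM collection_migrations "
    "WHERE import_completed = 1 "
    "AND (connectors_migrated = 0 OR chats_migrated = 0)"
).fetchall()
```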
Job Statuses
- pending - Collection created, import not yet submitted
- submitted - Import job submitted, running in background
- running - Import job verified as in-progress
- completed - Import job completed successfully
- failed - Import job failed, canceled, or had errors
Resume Behavior
| DB State | On Re-run |
|---|---|
| collection_created=1, import_submitted=0 | Reuses existing collection, re-copies permissions, submits import |
| import_submitted=1, import_completed=0 | Skips, tells user to run --verify |
| import_completed=1, connectors_migrated=0 | Migrates connectors |
| import_completed=1, connectors_migrated=1, chats_migrated=0 | With --migrate-chats: migrates chats |
| import_completed=1, connectors_migrated=1, chats_migrated=1 | Fully done, skips |
| job_status='failed' | With --retry-failed: creates new collection |
| Any state | With --force-remigrate: ignores DB, creates new collection. Logs --move-connectors/--move-chats commands if connectors/chats were previously moved |
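The resume table above can be read as a simple decision function. The sketch below is illustrative only — the return values are descriptive labels, not the tool's log messages:

```python
def resume_action(row, migrate_chats=False, retry_failed=False,
                  force_remigrate=False):
    """Map a tracking-DB row (dict of step flags plus job_status) to the
    action taken on re-run, mirroring the resume table above."""
    if force_remigrate:
        return "create new collection (ignore DB)"
    if row.get("job_status") == "failed":
        return "retry: create new collection" if retry_failed else "skip failed"
    if not row.get("import_submitted"):
        return "reuse collection, submit import"
    if not row.get("import_completed"):
        return "skip, run --verify"
    if not row.get("connectors_migrated"):
        return "migrate connectors"
    if migrate_chats and not row.get("chats_migrated"):
        return "migrate chats"
    return "fully done, skip"
```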
Output Files
- migration_YYYYMMDD_HHMMSS.log - Detailed log with timestamps, job IDs, and errors. Created in the directory where the tool is run
- migration_tracking.db - SQLite database with migration state. Created automatically in the directory where the tool is run. Use --db-path for a custom location. If you later run from a different directory (e.g., for --verify), pass --db-path pointing to the original database; otherwise a new empty database is created and no pending jobs are found
Troubleshooting
SSL Certificate Errors
--cert ~/path/to/ca-chain.crt # Provide certificate
# Omit --cert if no certificate is required
Check Migration Status
sqlite3 migration_tracking.db "SELECT old_collection_name, job_status, import_completed, connectors_migrated, chats_migrated, error FROM collection_migrations;"
Collection Already Migrated
--force-remigrate # Re-migrate (creates new collection, old one needs manual cleanup)
Caution: If a previous successful migration already moved connectors/chats, use --move-connectors/--move-chats to restore state before re-migrating, or use the recovery commands logged during re-migration.
Using --verify with a custom database path
If your initial migration used --db-path /custom/path/migration.db, you must use the same --db-path for --verify, otherwise it creates a new empty database and finds no pending jobs.
Failed Import - Retry
--retry-failed # Creates new collection for failed imports
Best Practices
- Always dry-run first - Use --dry-run to preview changes
- Test on a single collection or user - Understand and validate how the migration works before running at larger scale
- Run during off-hours - Minimize impact on users
- Use parallel mode for large batches - Submit jobs without waiting, verify later
- Always run --verify after parallel migrations - This checks completion, validates imports, and migrates connectors. Include --migrate-chats to also migrate chat sessions
- Use --verify-query for RAG validation - Sends a test query to each migrated collection, checks for document references, and cleans up the test chat session. Informational only — does not block connector/chat migration. The same query applies to all collections being verified, so use --collections to target collections with similar content for accurate results
- Use --move-connectors/--move-chats with --force-remigrate - A previous successful migration may have already moved connectors/chats to another collection; use --move-connectors/--move-chats to recover them to the appropriate collection. The tool logs the exact commands needed during re-migration
- Keep the database and logs - Archive them for an audit trail
- Clean up failed collections manually - After --retry-failed, old failed collections remain