Small toolkit and CLI to transfer and stream data between databases using async pipelines.

Datahood

A friendly toolkit and CLI to transfer and stream data between databases using async pipelines, plus automatic schema inference that generates TypedDict and Pydantic models from your existing documents.

This tool was born from repeatedly writing throwaway scripts to move data from point A to point B across various client projects. Instead of reinventing the wheel every time, I extracted this into a reusable solution. If it saves you from writing ad-hoc data transfer scripts, it has fulfilled its purpose!


🎯 What It Does

Datahood solves two common data engineering problems:

  1. Data Transfer - Move data between MongoDB collections and BSON files, streaming documents through async pipelines
  2. Schema Discovery - Automatically generate Python type definitions from your existing documents

Perfect for:

  • Moving production data slices to staging/development environments
  • Creating typed models from legacy collections for safer refactoring
  • Auditing document structure before migrations
  • Quick data exports and imports with compression support

🚀 Quick Start

# Get help
dh --help
dh transfer --help
dh schema --help

# Transfer data
dh transfer mongo-to-bson output.bson --source-uri mongodb://localhost --source-collection users
dh transfer bson-to-mongo data.bson --dest-uri mongodb://localhost --dest-collection users

# Generate Python types
dh schema from-mongo --uri mongodb://localhost --collection users --to-pydantic -o models.py
dh schema from-bson data.bson --to-typeddict -o models.py

📦 Installation

uv add datahood  # or pip install datahood

👷 Data Transfer

TIP: Add --dry-run to any command to preview without moving data

MongoDB → BSON

dh transfer mongo-to-bson output.bson \
  --source-uri "mongodb://user:pass@localhost:27017/?authSource=admin" \
  --source-database mydb \
  --source-collection users
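
A note on the URI above: when a username or password contains reserved characters such as `@`, `:` or `/`, MongoDB connection strings expect them to be percent-encoded. The sketch below shows one way to build such a URI with the Python standard library. The helper name `build_mongo_uri` and its parameters are hypothetical and not part of datahood.

```python
from urllib.parse import quote_plus


def build_mongo_uri(
    user: str,
    password: str,
    host: str = "localhost",
    port: int = 27017,
    auth_source: str = "admin",
) -> str:
    """Build a MongoDB URI with percent-encoded credentials (hypothetical helper)."""
    # Reserved characters in the username or password would otherwise be
    # read as URI delimiters, so both are encoded before interpolation.
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/?authSource={quote_plus(auth_source)}"
    )


uri = build_mongo_uri("user", "p@ss:w/rd")
print(uri)
```

The resulting string can be passed to `--source-uri` or `--dest-uri` in quotes, as in the commands in this section.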

BSON → MongoDB

dh transfer bson-to-mongo data.bson \
  --dest-uri "mongodb://user:pass@localhost:27017/?authSource=admin" \
  --dest-database mydb \
  --dest-collection users
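
For a quick sanity check of a BSON file before importing it, the sketch below walks a file document by document without loading it all into memory. It assumes the file is a plain concatenation of BSON documents (the layout `mongodump` produces); whether datahood writes exactly this layout, especially with compression enabled, is an assumption. The function name `iter_raw_bson` is hypothetical.

```python
import io
import struct
from typing import BinaryIO, Iterator


def iter_raw_bson(stream: BinaryIO) -> Iterator[bytes]:
    """Yield each raw document from a stream of concatenated BSON documents."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of stream
        if len(header) < 4:
            raise ValueError("truncated BSON length prefix")
        # Every BSON document starts with a little-endian int32 holding its
        # total size, including these 4 bytes and the trailing NUL byte.
        (size,) = struct.unpack("<i", header)
        if size < 5:
            raise ValueError(f"invalid BSON document size: {size}")
        body = stream.read(size - 4)
        if len(body) != size - 4 or body[-1] != 0:
            raise ValueError("truncated or malformed BSON document")
        yield header + body


# Two hand-built documents for illustration: {} and {"a": 1} (int32 value).
empty_doc = b"\x05\x00\x00\x00\x00"
a_is_one = b"\x0c\x00\x00\x00\x10a\x00\x01\x00\x00\x00\x00"
docs = list(iter_raw_bson(io.BytesIO(empty_doc + a_is_one)))
print(len(docs))
```

With a real file, open it in binary mode (`open("data.bson", "rb")`) and pass the handle in. Decoding each yielded document into a dict needs a BSON library, which this sketch leaves out.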

MongoDB → MongoDB

dh transfer mongo-to-mongo \
  --source-uri "mongodb://source:27017" \
  --source-database src_db --source-collection users \
  --dest-uri "mongodb://dest:27017" \
  --dest-database dest_db --dest-collection users_copy
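
To give a feel for what an async transfer pipeline looks like, here is a minimal producer/consumer sketch built on `asyncio.Queue`. It is an illustration of the general pattern only, not datahood's internal implementation: `read_source` stands in for a database cursor, the list `sink` stands in for a destination collection, and all names and the batch size are hypothetical.

```python
import asyncio
from typing import AsyncIterator, Optional


async def read_source(n: int) -> AsyncIterator[dict]:
    """Stand-in for an async cursor over a source collection."""
    for i in range(n):
        await asyncio.sleep(0)  # yield control, as real I/O would
        yield {"_id": i}


async def producer(queue: asyncio.Queue, n: int, batch_size: int) -> None:
    """Group documents into batches and push them onto the queue."""
    batch: list[dict] = []
    async for doc in read_source(n):
        batch.append(doc)
        if len(batch) == batch_size:
            await queue.put(batch)
            batch = []
    if batch:
        await queue.put(batch)  # flush the final partial batch
    await queue.put(None)  # sentinel: no more data


async def consumer(queue: asyncio.Queue, sink: list[dict]) -> None:
    """Drain batches until the sentinel arrives."""
    while True:
        batch: Optional[list[dict]] = await queue.get()
        if batch is None:
            break
        sink.extend(batch)  # stand-in for a bulk insert


async def transfer(n: int = 10, batch_size: int = 4) -> list[dict]:
    # A bounded queue applies backpressure: the reader pauses when the
    # writer falls behind, so memory use stays flat.
    queue: asyncio.Queue = asyncio.Queue(maxsize=2)
    sink: list[dict] = []
    await asyncio.gather(producer(queue, n, batch_size), consumer(queue, sink))
    return sink


result = asyncio.run(transfer())
print(len(result))
```

The bounded queue is the key design choice here: reading and writing overlap, but at most a couple of batches are held in memory at any time.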

🧬 Schema Generation

From MongoDB

# Generate Pydantic models
dh schema from-mongo --uri mongodb://localhost --collection users --to-pydantic -o models.py

# Generate TypedDict (default)
dh schema from-mongo --uri mongodb://localhost --collection users -o models.py

From BSON Files

# Generate Pydantic models
dh schema from-bson data.bson --to-pydantic -o models.py

# Generate TypedDict
dh schema from-bson data.bson --to-typeddict -o models.py

Smart Features:

  • Automatically handles nested objects and creates separate types
  • Detects optional fields and union types
  • Generates clean, production-ready code
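
To make the optional-field and union detection above more concrete, here is a toy sketch of the idea: scan sample documents, record which types each field takes, and mark a field optional when some documents lack it. This is not datahood's actual algorithm or output format; the function names are hypothetical, and the sketch does not recurse into nested objects or validate field names.

```python
from typing import Any


def infer_fields(docs: list[dict[str, Any]]) -> dict[str, tuple[set[str], bool]]:
    """Map each field name to (observed type names, is_optional)."""
    seen: dict[str, set[str]] = {}
    counts: dict[str, int] = {}
    for doc in docs:
        for key, value in doc.items():
            type_name = "None" if value is None else type(value).__name__
            seen.setdefault(key, set()).add(type_name)
            counts[key] = counts.get(key, 0) + 1
    # A field is optional when at least one document does not contain it.
    return {key: (types, counts[key] < len(docs)) for key, types in seen.items()}


def render_typeddict(name: str, docs: list[dict[str, Any]]) -> str:
    """Render the inferred fields as TypedDict source code."""
    lines = [f"class {name}(TypedDict):"]
    for key, (types, optional) in infer_fields(docs).items():
        annotation = " | ".join(sorted(types))  # several types become a union
        if optional:
            annotation = f"NotRequired[{annotation}]"
        lines.append(f"    {key}: {annotation}")
    return "\n".join(lines)


sample_docs = [
    {"name": "Ada", "age": 36},
    {"name": "Bob", "age": "unknown", "email": "bob@example.com"},
]
source = render_typeddict("User", sample_docs)
print(source)
```

For the two sample documents, `age` is rendered as a union because it appears as both an integer and a string, and `email` is wrapped in `NotRequired` because only one document has it.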

🧪 Development

make format lint type-check test    # Quality checks
make test-mongo-up                  # Start test MongoDB
RUN_INTEGRATION=1 pytest tests/integration
make test-mongo-down               # Clean up

🤝 Contributing

  1. Fork & create feature branch
  2. Add changes with tests
  3. Run: make format lint type-check test
  4. Submit PR

Version bumps: make bump-version PART=patch (or PART=minor, PART=major)

📬 Support

Issues: https://github.com/ericmiguel/datahood/issues

Happy data moving! ✨
