Small toolkit and CLI to transfer and stream data between databases using async pipelines.
Datahood
A friendly toolkit and CLI to transfer and stream data between databases using async pipelines, plus automatic schema inference that generates TypedDict and Pydantic models from your existing documents.
This tool was born from repeatedly writing throwaway scripts to move data from point A to point B across various client projects. Instead of reinventing the wheel every time, I extracted this into a reusable solution. If it saves you from writing ad-hoc data transfer scripts, it has fulfilled its purpose!
🎯 What It Does
Datahood solves two common data engineering problems:
- Data Transfer: stream data between MongoDB collections and BSON files
- Schema Discovery: automatically generate Python type definitions (TypedDict or Pydantic) from your existing documents
Perfect for:
- Moving production data slices to staging/development environments
- Creating typed models from legacy collections for safer refactoring
- Auditing document structure before migrations
- Quick data exports and imports with compression support
🚀 Quick Start
# Get help
dh --help
dh transfer --help
dh schema --help
# Transfer data
dh transfer mongo-to-bson output.bson --source-uri mongodb://localhost --source-collection users
dh transfer bson-to-mongo data.bson --dest-uri mongodb://localhost --dest-collection users
# Generate Python types
dh schema from-mongo --uri mongodb://localhost --collection users --to-pydantic -o models.py
dh schema from-bson data.bson --to-typeddict -o models.py
📦 Installation
uv add datahood # or pip install datahood
👷 Data Transfer
TIP: Add --dry-run to any command to preview without moving data.
MongoDB → BSON
dh transfer mongo-to-bson output.bson \
--source-uri "mongodb://user:pass@localhost:27017/?authSource=admin" \
--source-database mydb \
--source-collection users
BSON → MongoDB
dh transfer bson-to-mongo data.bson \
--dest-uri "mongodb://user:pass@localhost:27017/?authSource=admin" \
--dest-database mydb \
--dest-collection users
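A BSON dump is usually a sequence of BSON documents written back to back (the layout `mongodump` produces), where each document starts with a 4-byte little-endian int32 giving its total length, including those 4 bytes and a trailing NUL byte. That framing is what lets a loader read one document at a time instead of the whole file. The sketch below illustrates the idea with the standard library only, assuming the file uses that concatenated layout; `iter_raw_bson` is a hypothetical helper, not part of Datahood's API, and it yields undecoded bytes. PyMongo's `bson.decode_file_iter` performs the same framing and also decodes each document.

```python
import io
import struct
from typing import BinaryIO, Iterator

# Smallest valid BSON document ({}): int32 length prefix (5) + terminating NUL byte.
MIN_DOC_SIZE = 5


def iter_raw_bson(stream: BinaryIO) -> Iterator[bytes]:
    """Yield raw BSON documents from a stream of concatenated documents.

    Hypothetical helper for illustration: it splits documents by their
    length prefix but does not decode them.
    """
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of stream
        if len(header) < 4:
            raise ValueError("truncated length prefix")
        # Little-endian signed int32; the size includes the prefix itself.
        (size,) = struct.unpack("<i", header)
        if size < MIN_DOC_SIZE:
            raise ValueError(f"invalid document size: {size}")
        body = stream.read(size - 4)
        # Every BSON document must end with a NUL byte.
        if len(body) != size - 4 or body[-1:] != b"\x00":
            raise ValueError("truncated or malformed document")
        yield header + body


if __name__ == "__main__":
    empty_doc = b"\x05\x00\x00\x00\x00"  # {}
    doc_a = b"\x0c\x00\x00\x00\x10a\x00\x01\x00\x00\x00\x00"  # {"a": 1} as an int32 element
    for raw in iter_raw_bson(io.BytesIO(empty_doc + doc_a)):
        print(len(raw))
```

Because the helper only needs a binary file object, the same loop can read `open("data.bson", "rb")` directly, or a compressed dump through `gzip.open(..., "rb")`; gzip is just an example here, not a statement about which compression Datahood supports.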
MongoDB → MongoDB
dh transfer mongo-to-mongo \
--source-uri "mongodb://source:27017" \
--source-database src_db --source-collection users \
--dest-uri "mongodb://dest:27017" \
--dest-database dest_db --dest-collection users_copy
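Datahood describes its transfers as async pipelines. One common shape for such a pipeline is a bounded `asyncio.Queue` between a reader and a writer: reads and writes overlap, and memory stays capped by the queue size because the reader blocks when the queue is full. The sketch below shows that pattern with the standard library only. It is an assumption about the general technique, not Datahood's implementation; `read_source`, `transfer`, `write_batch`, `batch_size`, and `queue_size` are hypothetical names, and the in-memory source and sink stand in for a MongoDB cursor and a bulk insert. The `dry_run` flag illustrates one way a preview mode can count documents without writing them.

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable

Document = dict[str, Any]
_DONE = object()  # sentinel marking the end of the stream


async def read_source(docs: list[Document]) -> AsyncIterator[Document]:
    """Hypothetical stand-in for an async database cursor."""
    for doc in docs:
        await asyncio.sleep(0)  # yield control, as a real network read would
        yield doc


async def transfer(
    source: AsyncIterator[Document],
    write_batch: Callable[[list[Document]], Awaitable[None]],
    batch_size: int = 2,
    queue_size: int = 4,
    dry_run: bool = False,
) -> int:
    """Copy documents from source to write_batch in batches; return how many were read."""
    queue: asyncio.Queue[Any] = asyncio.Queue(maxsize=queue_size)

    async def producer() -> int:
        count = 0
        async for doc in source:
            await queue.put(doc)  # waits while the queue is full (backpressure)
            count += 1
        await queue.put(_DONE)
        return count

    async def consumer() -> None:
        batch: list[Document] = []
        while True:
            item = await queue.get()
            if item is _DONE:
                break
            batch.append(item)
            if len(batch) >= batch_size:
                if not dry_run:
                    await write_batch(batch)
                batch = []  # start a fresh list so the written batch is not mutated
        if batch and not dry_run:
            await write_batch(batch)  # flush the final partial batch

    count, _ = await asyncio.gather(producer(), consumer())
    return count


async def main() -> None:
    written: list[list[Document]] = []

    async def write_batch(batch: list[Document]) -> None:
        written.append(batch)  # stand-in for a bulk insert such as insert_many

    docs = [{"_id": i} for i in range(5)]
    n = await transfer(read_source(docs), write_batch)
    print(n, [len(b) for b in written])  # 5 [2, 2, 1]


if __name__ == "__main__":
    asyncio.run(main())
```

Batching amortizes per-request overhead on the destination, while the queue bound keeps a fast source from outrunning a slow destination.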
🧬 Schema Generation
From MongoDB
# Generate Pydantic models
dh schema from-mongo --uri mongodb://localhost --collection users --to-pydantic -o models.py
# Generate TypedDict (default)
dh schema from-mongo --uri mongodb://localhost --collection users -o models.py
From BSON Files
# Generate Pydantic models
dh schema from-bson data.bson --to-pydantic -o models.py
# Generate TypedDict
dh schema from-bson data.bson --to-typeddict -o models.py
Smart Features:
- Automatically handles nested objects and creates separate types
- Detects optional fields and union types
- Generates clean, production-ready code
🧪 Development
make format lint type-check test # Quality checks
make test-mongo-up # Start test MongoDB
RUN_INTEGRATION=1 pytest tests/integration # Run integration tests
make test-mongo-down # Clean up
🤝 Contributing
- Fork & create feature branch
- Add changes with tests
- Run: make format lint type-check test
- Submit PR
Version bumps: make bump-version PART=patch|minor|major
📬 Support
Issues: https://github.com/ericmiguel/datahood/issues
Happy data moving! ✨
Download files
File details
Details for the file datahood-0.1.0.tar.gz.
File metadata
- Download URL: datahood-0.1.0.tar.gz
- Upload date:
- Size: 194.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 633efa546b9590d1782907dbec49ab4750c76c92b9c5c121749313336c755085 |
| MD5 | 3685e0f87f7f6e6ff4dc64bb8be1b32e |
| BLAKE2b-256 | d36e9d6aef889f85884189db25e9375eda4fa5ad32f8e870a4dbd50838ddcd00 |
File details
Details for the file datahood-0.1.0-py3-none-any.whl.
File metadata
- Download URL: datahood-0.1.0-py3-none-any.whl
- Upload date:
- Size: 48.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9d617810cbe9f8df8a2968d9e1bed617dfaf248fa3b5b70fe79bf4d8301f2ab6 |
| MD5 | 21433dc25d758e7585854e899fa72193 |
| BLAKE2b-256 | 698860c4442a0109ef26efbab4175de9c5697f6780271925e044265f0b714466 |