Python Data Libraries
mabel is a fully-portable Data Engineering platform designed to run on low-spec compute nodes.
There is no server component; mabel just runs when you need it, where you want it.
- Documentation: GitHub Wiki
- Bug Reports: GitHub Issues
- Feature Requests: GitHub Issues
- Source Code: GitHub
- Discussions: GitHub Discussions
Focus on What Matters
We've built mabel to enable Data Analysts to write complex data engineering tasks quickly and easily, so they can get on with doing what they do best.
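from mabel import operator
from mabel.operators import EndOperator

# Turn a plain function into a pipeline step
@operator
def say_hello(name):
    print(f"Hello, {name}!")

# Chain the step to the terminating operator to build a flow
flow = say_hello > EndOperator()
with flow as runner:
    runner("world")  # Hello, world!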
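The `>` in the example above is ordinary Python operator overloading. As a rough illustration of how such chaining can work in general, here is a toy sketch; the `Step` class is hypothetical and is not mabel's operator class or its implementation.

```python
class Step:
    """Hypothetical wrapper used only for illustration; not part of mabel."""

    def __init__(self, func):
        self.func = func

    def __call__(self, value):
        return self.func(value)

    def __gt__(self, other):
        # `a > b` builds a new Step that runs a, then passes its result to b
        return Step(lambda value: other(self(value)))


strip = Step(str.strip)
greet = Step(lambda name: f"Hello, {name}!")

pipeline = strip > greet
print(pipeline("  world "))  # Hello, world!
```

In this toy version, three or more steps need parentheses, as in `(a > b) > c`, because Python treats a bare `a > b > c` as a chained comparison.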
Key Features
- Programmatically define data pipelines
- Treats datasets as immutable
- On-the-fly compression
- Automatic version tracking of processing operations
- Trace messages through the pipeline (random sampling)
- Automatic retry of failed operations
- Low-memory requirements, even with terabytes of data
- Indexing and partitioning of data for fast reads (beta)
- Cursors for tracking reading position (beta)
- SQL Query support (alpha)
- Schema and data_expectations validation
Note:
- alpha features are subject to change and are not recommended for production systems
- beta features may change to resolve issues during testing
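To make "automatic retry of failed operations" concrete, the following is a minimal, generic sketch of the idea in plain Python. The `with_retries` helper and its parameters are assumptions made for illustration; they are not mabel's API and do not describe how mabel implements retries.

```python
import time


def with_retries(func, attempts=3, delay_seconds=0.0):
    """Hypothetical helper: call func until it succeeds or attempts run out."""

    def wrapper(*args, **kwargs):
        last_error = None
        for attempt in range(1, attempts + 1):
            try:
                return func(*args, **kwargs)
            except Exception as error:
                last_error = error
                # Pause before the next attempt, but not after the final one
                if attempt < attempts:
                    time.sleep(delay_seconds)
        raise last_error

    return wrapper


calls = {"count": 0}


def flaky_double(value):
    # Fails on the first two calls to simulate a transient error
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return value * 2


result = with_retries(flaky_double, attempts=3)(21)
print(result)  # 42
```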
Installation
From PyPI (recommended)
pip install --upgrade mabel
From GitHub
pip install --upgrade git+https://github.com/mabel-dev/mabel
Guides
How to Write a Flow
How to Read Data
Dependencies
- dateutil is used to convert dates received as strings
- mmh3 is used for non-cryptographic hashing
- pydantic is used to define internal data models
- UltraJSON (AKA ujson) is used where orjson is not available. (1)
- zstandard is used for real-time compression
There are a number of optional dependencies which are usually only required for specific features and functionality. These are listed in the requirements.txt file in the tests folder which is used for testing. The exception is orjson, which is the preferred JSON library but is not available on all platforms.
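The preference for orjson with ujson as a fallback can be expressed as a guarded import. The sketch below shows the general pattern only; the `serialize` and `parse` names are illustrative assumptions, and the final standard-library fallback is added here so the example runs anywhere. It is not taken from mabel's source.

```python
import json

try:
    import orjson  # preferred, but not available on all platforms

    def serialize(obj):
        # orjson.dumps returns bytes, so decode to get a str
        return orjson.dumps(obj).decode("utf-8")

    parse = orjson.loads
except ImportError:
    try:
        import ujson  # UltraJSON fallback

        serialize = ujson.dumps
        parse = ujson.loads
    except ImportError:
        # Standard-library fallback, added only so this sketch always runs
        serialize = json.dumps
        parse = json.loads

record = {"name": "mabel", "beta": True}
round_tripped = parse(serialize(record))
```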
Integrations
mabel comes with adapters for the following services:
Service | Support |
---|---|
Google Cloud Storage | Read/Write |
MinIO | Read/Write |
S3 | Read/Write |
MongoDB and MQTT Readers are included in the base library but are not supported.
Deployment and Execution
mabel supports running on a range of platforms:
- Docker
- Kubernetes
- Raspberry Pi (1)
- Windows (2)
- Linux (3)
macOS is also supported.
Adapters for other data services can be written.
1 - Raspbian is fully functional with ujson.
2 - Multi-processing is not available on Windows. Alternate indexing libraries may be used on Windows.
3 - Tested on Debian and Ubuntu.
How Can I Contribute?
All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.
If you have a suggestion for an improvement or a bug, raise a ticket or start a discussion.
Want to help build mabel? See the contribution guidance.
License