
Python Data Libraries


mabel is a fully-portable Data Engineering platform designed to run on low-spec compute nodes.

There is no server component; mabel just runs when you need it, where you want it.


Documentation      GitHub Wiki
Bug Reports        GitHub Issues
Feature Requests   GitHub Issues
Source Code        GitHub
Discussions        GitHub Discussions

Focus on What Matters

We've built mabel to enable Data Analysts to write complex data engineering tasks quickly and easily, so they can get on with doing what they do best.

from mabel import operator
from mabel.operators import EndOperator

@operator
def say_hello(name):
    print(F"Hello, {name}!")

flow = say_hello > EndOperator()
with flow as runner:
    runner("world")  # Hello, world!
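
To illustrate how a `>` expression can join two steps into something usable in a `with` block, here is a toy sketch. The `Step` and `Pipeline` classes are hypothetical names invented for this illustration; this is not mabel's implementation, which may work quite differently.

```python
# Toy illustration only: Step and Pipeline are hypothetical and are
# not part of mabel. They show one way `>` could link two steps.

class Pipeline:
    """Holds the first step and runs a value through the linked steps."""

    def __init__(self, head):
        self.head = head

    def __enter__(self):
        # The `with` block receives a callable that runs the pipeline.
        return self.run

    def __exit__(self, exc_type, exc_value, traceback):
        # Do not suppress exceptions raised inside the block.
        return False

    def run(self, value):
        step = self.head
        while step is not None:
            value = step.func(value)
            step = step.successor
        return value


class Step:
    """Wraps a function so that two steps can be joined with `>`."""

    def __init__(self, func):
        self.func = func
        self.successor = None

    def __gt__(self, other):
        # `a > b` records b as the successor of a.
        # Python evaluates `a > b > c` as a chained comparison, so this
        # toy version only handles two steps.
        self.successor = other
        return Pipeline(self)


flow = Step(str.strip) > Step(str.upper)
with flow as runner:
    result = runner("  hello ")
```

Here `result` holds the input after it has passed through both steps in order.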

Key Features

  • Programmatically define data pipelines
  • Treats datasets as immutable
  • On-the-fly compression
  • Automatic version tracking of processing operations
  • Trace messages through the pipeline (random sampling)
  • Automatic retry of failed operations
  • Low-memory requirements, even with terabytes of data
  • Indexing and partitioning of data for fast reads
  • Cursors for tracking reading position
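
As a rough picture of what "automatic retry of failed operations" can mean in practice, the sketch below wraps a function so that it is called again after a failure. The `retry` decorator, its parameters, and `flaky_parse` are assumptions made up for this example; mabel's own retry mechanism and its configuration are not shown here.

```python
import functools
import time


def retry(max_attempts=3, delay_seconds=0.0):
    """Hypothetical decorator: re-run a function when it raises."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_error = None
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as error:
                    last_error = error
                    # Wait before the next attempt, but not after the last.
                    if attempt < max_attempts:
                        time.sleep(delay_seconds)
            # Every attempt failed, so surface the final error.
            raise last_error

        return wrapper

    return decorator


calls = {"count": 0}


@retry(max_attempts=3)
def flaky_parse(text):
    # Fails on the first two calls to simulate a transient fault.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ValueError("transient failure")
    return int(text)


parsed = flaky_parse("42")
```

The operation succeeds on the third attempt, so the caller never sees the two transient failures.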

Installation

From PyPI (recommended)

pip install --upgrade mabel

From GitHub

pip install --upgrade git+https://github.com/mabel-dev/mabel

Guides

How to Write a Flow
How to Read Data

Dependencies

  • dateutil is used to convert dates received as strings
  • mmh3 is used for non-cryptographic hashing
  • pydantic is used to define internal data models
  • UltraJSON (AKA ujson) is used where orjson is not available. (Notice1)
  • zstandard is used for real-time compression

There are a number of optional dependencies which are usually only required for specific features and functionality. These are listed in the requirements.txt file in the tests folder, which is used for testing. The key exception is orjson, which is the preferred JSON library but is not available on all platforms.
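
The preference for orjson with ujson as the fallback can be sketched with a guarded import. This is a minimal illustration, not mabel's actual code: the names `serialize`, `parse`, and `JSON_BACKEND` are hypothetical, and the final fallback to the standard library `json` module is an assumption added so the sketch runs anywhere.

```python
import json as _stdlib_json

try:
    import orjson

    def serialize(obj) -> str:
        # orjson.dumps returns bytes, so decode to match the other backends.
        return orjson.dumps(obj).decode("utf-8")

    parse = orjson.loads
    JSON_BACKEND = "orjson"
except ImportError:
    try:
        import ujson

        serialize = ujson.dumps
        parse = ujson.loads
        JSON_BACKEND = "ujson"
    except ImportError:
        # Assumption for this sketch only: use the standard library
        # when neither third-party package is installed.
        serialize = _stdlib_json.dumps
        parse = _stdlib_json.loads
        JSON_BACKEND = "json"

record = {"name": "mabel", "version": 1}
round_tripped = parse(serialize(record))
```

Whichever backend is selected, calling code uses the same `serialize` and `parse` names and always receives a `str` from `serialize`.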

Integrations

mabel comes with adapters for the following services:

Service                Support
Google Cloud Storage   Read/Write
MinIO                  Read/Write
AWS S3                 Read/Write
MongoDB                Read Only
MQTT                   Read Only

Deployment and Execution

mabel supports running on a range of platforms:

Platform
Docker
Kubernetes
Raspberry Pi (Notice1)
Windows (Notice2)
Linux (Notice3)

macOS is also supported.

Adapters for other data services can be written.

Notice1 - Raspbian fully functional with ujson.
Notice2 - Multi-Processing not available on Windows. Alternate indexing libraries may be used on Windows.
Notice3 - Tested on Debian and Ubuntu.

How Can I Contribute?

All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.

If you have a suggestion for an improvement or a bug, raise a ticket or start a discussion.

Want to help build mabel? See the contribution guidance.

License

Apache 2.0


Download files


Source Distribution

mabel-0.4.31.tar.gz (75.7 kB)

Uploaded Source

Built Distribution

mabel-0.4.31-py3-none-any.whl (106.3 kB)

Uploaded Python 3
