
Sync Time-Series Pipes with Meerschaum



What is Meerschaum?

Meerschaum is an ETL framework for time-series data. You define pipes — named data streams — and Meerschaum keeps them in sync: it fetches only the new or changed rows, deduplicates and upserts them, manages the schema, and handles scheduling, serving, and storage.

Write a few lines of fetch logic; Meerschaum handles the rest of the pipeline. No more copy/pasting ETL scripts, hand-rolling incremental windows, or babysitting cron jobs. Drop it into an existing stack, or stand up a full database-and-dashboard deployment in minutes.

import meerschaum as mrsm

pipe = mrsm.Pipe('plugin:noaa', 'weather', 'atl', instance='sql:local')
pipe.sync()  ### Pulls only what's new since the last sync.
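Conceptually, "only what's new" means each sync asks the source for rows starting from the newest timestamp already stored. The sketch below is a toy model of that idea, not Meerschaum's implementation; the helper name `compute_sync_begin` and the optional `backtrack` overlap window are assumptions for illustration (a small overlap lets late-arriving or corrected rows be fetched again and upserted).

```python
from datetime import datetime, timedelta
from typing import Optional


def compute_sync_begin(
    latest_synced: Optional[datetime],
    backtrack: timedelta = timedelta(0),
) -> Optional[datetime]:
    """Return the lower bound for the next fetch (hypothetical helper).

    - Nothing stored yet: return None, meaning "fetch everything".
    - Otherwise: start at the newest stored timestamp, minus an optional
      overlap window (an assumption) so late rows can be re-fetched.
    """
    if latest_synced is None:
        return None
    return latest_synced - backtrack


# First sync: no data stored yet, so there is no lower bound.
print(compute_sync_begin(None))  # None

# Later syncs: resume from the newest stored timestamp with a 1-hour overlap.
print(compute_sync_begin(datetime(2024, 7, 1, 12, 0), timedelta(hours=1)))
# 2024-07-01 11:00:00
```

Because re-fetched rows that already exist are deduplicated or upserted, an overlap window trades a little redundant fetching for robustness against late data.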

Features

  • ⚡️ Incremental by default: the sync engine fetches only new or changed rows and syncs many streams concurrently. Duplicate rows are ignored; rows with existing keys are updated.
  • 📊 Built for data scientists and analysts: integrate with Pandas, Grafana, and friends; persist DataFrames and always get the latest data. Skip pandas overhead and read rows as plain dicts with Pipe.get_docs().
  • 🗄️ Production-ready, batteries included: deploy a TimescaleDB + Grafana stack with one command, serve data org-wide via FastAPI (uvicorn/gunicorn), and secure API instances with scoped auth tokens. Supports PostGIS geometry (incl. ESRI CRS) for geospatial pipelines.
  • 💼 Jobs and scheduling: run any command as a background job with -d. The built-in scheduler handles cron and interval schedules, with no crontab or systemd setup required. Execute jobs locally, via systemd, or remotely on an API instance with --executor-keys.
  • 🔌 Easily expandable: ingest any source with a simple plugin that just returns a DataFrame. Add any function as a command, define parent/child pipe relationships for composable SQL pipelines, or embed Meerschaum via its Python API.
  • Tailored for your experience: a rich CLI that's surprisingly enjoyable, a web dashboard for the graphically inclined, and connectors for SQL, API, Valkey, and custom backends.
  • 🧳 Portable from the start: $MRSM_ROOT_DIR emulates multiple installations and groups instances. No dependencies required (anything needed installs into a virtual environment), and it's uv-compatible (uv tool install meerschaum).
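Parent/child pipe relationships imply a sync order: a child built from its parents' tables should run after them, while unrelated pipes can run side by side. As a rough sketch under that assumption (not Meerschaum's scheduler), the standard library's `graphlib.TopologicalSorter` (Python 3.9+) can group pipes into batches that could each be synced concurrently. The pipe names and the `parents` mapping are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical pipes mapped to the parent pipes they read from.
parents: Dict[str, Set[str]] = {
    'weather_atl': set(),
    'weather_clt': set(),
    'daily_summary': {'weather_atl', 'weather_clt'},
    'weekly_report': {'daily_summary'},
}


def sync_batches(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group pipes into batches; every pipe in a batch has all of its
    parents in earlier batches, so a batch could be synced concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # pipes whose parents are done
        batches.append(ready)
        sorter.done(*ready)                 # mark them synced
    return batches


print(sync_batches(parents))
# [['weather_atl', 'weather_clt'], ['daily_summary'], ['weekly_report']]
```

`prepare()` also raises `graphlib.CycleError` if the relationships form a loop, which is a useful sanity check for any dependency graph of this kind.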

Want to learn more?

Find a wealth of information at meerschaum.io, or read up on Meerschaum in the wild:

Installation

For a more thorough setup guide, visit the Getting Started page at meerschaum.io.

TL;DR

pip install meerschaum # or `uv tool install 'meerschaum[api]'`
mrsm stack up -d
mrsm bootstrap pipes

Usage

Visit meerschaum.io for setup, usage, and troubleshooting information. You can find technical documentation at docs.meerschaum.io.

CLI

### Install the NOAA weather plugin.
mrsm install plugin noaa

### Register a new pipe to the built-in SQLite DB.
### You can instead run `bootstrap pipe` for a wizard.
### Enter 'KATL' for Atlanta when prompted.
mrsm register pipe -c plugin:noaa -m weather -l atl -i sql:local

### Pull data and create the table "plugin_noaa_weather_atl".
mrsm sync pipes -l atl -i sql:local
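The table name in the comment above follows from the pipe's keys. Judging only from this example, the default table name appears to join the connector, metric, and location keys with underscores after replacing `:`. The helper below is a hypothetical reconstruction of that pattern, not a Meerschaum function, and its handling of a missing location key is an assumption.

```python
from typing import Optional


def default_target_name(
    connector_keys: str,
    metric_key: str,
    location_key: Optional[str] = None,
) -> str:
    """Hypothetical reconstruction of a pipe's default table name."""
    parts = [connector_keys, metric_key]
    if location_key is not None:  # assumption: no location -> no suffix
        parts.append(location_key)
    # 'plugin:noaa' contains a colon, which becomes an underscore.
    return '_'.join(parts).replace(':', '_')


print(default_target_name('plugin:noaa', 'weather', 'atl'))
# plugin_noaa_weather_atl
```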

Python API

import meerschaum as mrsm

pipe = mrsm.Pipe(
    'foo', 'bar',
    instance = 'sql:local',                  ### Built-in SQLite DB.
    columns  = {'datetime': 'dt', 'id': 'id'},
)

### Sync a DataFrame (or list of dicts) — creates the table on first run.
pipe.sync([{'dt': '2024-07-01', 'id': 1, 'val': 10}])

### Duplicates are ignored; rows with existing keys are updated.
pipe.sync([{'dt': '2024-07-01', 'id': 1, 'val': 100}])
assert pipe.get_rowcount() == 1

### Read back as a DataFrame, filtered by time range and params.
df = pipe.get_data(begin='2024-01-01', end='2025-01-01', params={'id': [1]})

### Or skip pandas and read plain dicts.
docs = pipe.get_docs(params={'id': [1]})
### [{'dt': datetime(2024, 7, 1), 'id': 1, 'val': 100}]
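To make the sync and read semantics above concrete, here is a toy in-memory model in plain Python: rows are keyed by the index columns (`dt` and `id` here), an identical row is skipped, a row with an existing key but new values is updated, and reads filter by a time range and by `params`, where a list value means "any of these". It is an illustration only; the class `ToyPipe`, the half-open `[begin, end)` window, and the use of ISO date strings instead of datetimes are assumptions, not Meerschaum internals.

```python
from typing import Any, Dict, List, Optional, Tuple


class ToyPipe:
    """Hypothetical in-memory stand-in for a pipe's upsert/read behavior."""

    def __init__(self, columns: Dict[str, str]):
        # Index columns identify a row, e.g. {'datetime': 'dt', 'id': 'id'}.
        self.dt_col = columns['datetime']
        self.key_cols = list(columns.values())
        self.rows: Dict[Tuple[Any, ...], Dict[str, Any]] = {}

    def sync(self, docs: List[Dict[str, Any]]) -> Dict[str, int]:
        """Insert new keys, update changed rows, skip exact duplicates."""
        stats = {'inserted': 0, 'updated': 0, 'skipped': 0}
        for doc in docs:
            key = tuple(doc[col] for col in self.key_cols)
            existing = self.rows.get(key)
            if existing is None:
                stats['inserted'] += 1
            elif existing == doc:
                stats['skipped'] += 1
                continue
            else:
                stats['updated'] += 1
            self.rows[key] = dict(doc)
        return stats

    def get_rowcount(self) -> int:
        return len(self.rows)

    def get_docs(
        self,
        begin: Optional[str] = None,
        end: Optional[str] = None,
        params: Optional[Dict[str, Any]] = None,
    ) -> List[Dict[str, Any]]:
        """Filter by [begin, end) on the datetime column and by params.

        Same-format ISO-8601 date strings compare correctly as strings.
        """
        out = []
        for doc in self.rows.values():
            dt = doc[self.dt_col]
            if begin is not None and dt < begin:
                continue
            if end is not None and dt >= end:
                continue
            # A list value means "column IN values"; a scalar means equality.
            if any(
                doc.get(col) not in (val if isinstance(val, list) else [val])
                for col, val in (params or {}).items()
            ):
                continue
            out.append(doc)
        return out


pipe = ToyPipe({'datetime': 'dt', 'id': 'id'})
print(pipe.sync([{'dt': '2024-07-01', 'id': 1, 'val': 10}]))
# {'inserted': 1, 'updated': 0, 'skipped': 0}
print(pipe.sync([{'dt': '2024-07-01', 'id': 1, 'val': 100}]))
# {'inserted': 0, 'updated': 1, 'skipped': 0}
assert pipe.get_rowcount() == 1
print(pipe.get_docs(begin='2024-01-01', end='2025-01-01', params={'id': [1]}))
# [{'dt': '2024-07-01', 'id': 1, 'val': 100}]
```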

For composable in-database SQL pipelines (reference inheritance and {{ Pipe(...) }} table resolution), see the SQL pipes guide.
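The `{{ Pipe(...) }}` syntax lets a SQL definition reference another pipe instead of hard-coding its table. A toy resolver conveys the idea: find each template, parse its positional string arguments, and substitute a table name. The regex, the `resolve_pipe_refs` name, and the naming rule (keys joined with underscores, `:` replaced) are assumptions for illustration; Meerschaum's actual resolution may differ (for example in quoting or schema handling).

```python
import ast
import re

# Matches e.g. {{ Pipe('sql:main', 'temps') }}; only positional literals.
PIPE_REF = re.compile(r"\{\{\s*Pipe\((.*?)\)\s*\}\}")


def resolve_pipe_refs(sql: str) -> str:
    """Replace {{ Pipe(...) }} templates with hypothetical table names."""

    def _replace(match: re.Match) -> str:
        # Safely parse the argument list as a tuple of Python literals.
        keys = ast.literal_eval(f"({match.group(1)},)")
        return '_'.join(keys).replace(':', '_')

    return PIPE_REF.sub(_replace, sql)


print(resolve_pipe_refs("SELECT * FROM {{ Pipe('sql:main', 'temps') }} WHERE val > 0"))
# SELECT * FROM sql_main_temps WHERE val > 0
```

Using `ast.literal_eval` rather than `eval` keeps the toy from executing arbitrary code embedded in a template.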

Plugins

Ingest any source by returning rows from a fetch function — Meerschaum handles the rest:

# ~/.config/meerschaum/plugins/example.py
__version__ = '1.0.0'
required = ['requests']

def register(pipe, **kw):
    return {'columns': {'datetime': 'dt', 'id': 'id'}}

def fetch(pipe, begin=None, end=None, **kw):
    import requests
    rows = requests.get('https://api.example.com/data').json()
    return rows  ### list of dicts or a Pandas DataFrame
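The example `fetch()` above ignores `begin` and `end`, so each sync requests the full payload and relies on duplicates being ignored. When a source can be filtered, honoring the bounds keeps syncs incremental. The sketch below replaces the HTTP call with a hypothetical in-memory `SOURCE` list so it runs offline, and it treats `begin` as inclusive and `end` as exclusive, which is an assumption; whether a real API accepts such bounds, and under which query parameters, depends on that API.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional

# Hypothetical stand-in for a remote source (replaces the HTTP call).
SOURCE: List[Dict[str, Any]] = [
    {'dt': datetime(2024, 7, 1, 0), 'id': 1, 'val': 10},
    {'dt': datetime(2024, 7, 1, 1), 'id': 1, 'val': 11},
    {'dt': datetime(2024, 7, 1, 2), 'id': 1, 'val': 12},
]


def fetch(
    pipe: Any,
    begin: Optional[datetime] = None,
    end: Optional[datetime] = None,
    **kw: Any,
) -> List[Dict[str, Any]]:
    """Return only rows inside [begin, end); None means unbounded."""
    return [
        row for row in SOURCE
        if (begin is None or row['dt'] >= begin)
        and (end is None or row['dt'] < end)
    ]


print(len(fetch(None)))  # 3
print(len(fetch(None, begin=datetime(2024, 7, 1, 1))))  # 2
```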

Support Meerschaum's Development

For consulting services and to support Meerschaum's development, please consider sponsoring me on GitHub Sponsors.

Additionally, you can always buy me a coffee ☕!

License

Copyright 2020-2026 Bennett Meares

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.


Download files


Source Distribution

meerschaum-3.4.0.tar.gz (1.2 MB)

Uploaded Source

Built Distribution


meerschaum-3.4.0-py3-none-any.whl (1.3 MB)

Uploaded Python 3

File details

Details for the file meerschaum-3.4.0.tar.gz.

File metadata

  • Download URL: meerschaum-3.4.0.tar.gz
  • Upload date:
  • Size: 1.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for meerschaum-3.4.0.tar.gz
Algorithm Hash digest
SHA256 cbf02d5ba85142d32acb45c7f0fd372f41a0278be03b4d07aa06d2ef4bae19c8
MD5 94f55ba5551dd742c38282b3dbd717bc
BLAKE2b-256 6e59bb1f1ea938b20cdbe97a30534f5dcef8d8423e5b43d892968957c2d3124e


File details

Details for the file meerschaum-3.4.0-py3-none-any.whl.

File metadata

  • Download URL: meerschaum-3.4.0-py3-none-any.whl
  • Upload date:
  • Size: 1.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for meerschaum-3.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 aa5eef1e5a44de098bb90ec9afa28e9f8dad5a41ff4b927e1c268ed6220050f5
MD5 8e3e1434de4f2c87c6af77040c7aeeb9
BLAKE2b-256 aff1bf380ca5da472fd67b8800b48d433f19d9f72e28da844872e35322c457cb

