
Sync Time-Series Pipes with Meerschaum


What is Meerschaum?

Meerschaum is a tool for quickly synchronizing time-series data streams called pipes. With Meerschaum, you can have a data pipeline and visualization stack running in minutes.

Why Meerschaum?

Two words: incremental updates. Fetch the data you need, and Meerschaum will handle the rest.
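
As a rough illustration of what "incremental updates" means, the sketch below appends only rows newer than the latest datetime already stored. It is not Meerschaum's implementation: `incremental_sync`, `fetch_rows`, and `SOURCE` are hypothetical names, and a plain list stands in for the database table.

```python
from datetime import datetime

# Hypothetical in-memory source standing in for a remote API.
SOURCE = [
    {"dt": datetime(2024, 7, 1), "val": 10},
    {"dt": datetime(2024, 7, 2), "val": 20},
    {"dt": datetime(2024, 7, 3), "val": 30},
]

def fetch_rows(begin=None):
    """Return source rows at or after `begin` (all rows if `begin` is None)."""
    return [row for row in SOURCE if begin is None or row["dt"] >= begin]

def incremental_sync(table, fetch):
    """Append only rows newer than the latest datetime already in `table`."""
    latest = max((row["dt"] for row in table), default=None)
    new_rows = [
        row for row in fetch(begin=latest)
        if latest is None or row["dt"] > latest
    ]
    table.extend(new_rows)
    return len(new_rows)

table = []
first = incremental_sync(table, fetch_rows)    # Initial sync loads everything.
SOURCE.append({"dt": datetime(2024, 7, 4), "val": 40})
second = incremental_sync(table, fetch_rows)   # Later syncs load only new rows.
```

The second call asks the source only for rows at or after the latest stored datetime, so the cost of a sync tracks the amount of new data rather than the size of the table.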

If you've worked with time-series data, you know the headaches that come with ETL. Data engineering often gets in analysts' way, and when work needs to get done, every minute spent on pipelining is time taken away from real analysis.

Rather than copying and pasting your ETL scripts, simply build pipes with Meerschaum! Meerschaum gives you the tools to design your data streams how you like, and don't worry: you can always incorporate Meerschaum into your existing systems!

Want to Learn More?

You can find a wealth of information at meerschaum.io!


Installation

For a more thorough setup guide, visit the Getting Started page at meerschaum.io.

TL;DR

pip install -U --user meerschaum
mrsm stack up -d db grafana
mrsm bootstrap pipes

Usage

Please visit meerschaum.io for setup, usage, and troubleshooting information. You can find technical documentation at docs.meerschaum.io, and here is a complete list of the Meerschaum actions.

CLI

### Install the NOAA weather plugin.
mrsm install plugin noaa

### Register a new pipe to the built-in SQLite DB.
### You can instead run `bootstrap pipe` for a wizard.
### Enter 'KATL' for Atlanta when prompted.
mrsm register pipe -c plugin:noaa -m weather -l atl -i sql:local

### Pull data and create the table "plugin_noaa_weather_atl".
mrsm sync pipes -l atl -i sql:local
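
The table name in the last comment appears to be built from the pipe's connector, metric, and location keys. The helper below is a guess at that rule, inferred only from the two names shown on this page ("plugin_noaa_weather_atl" here and the default 'foo_bar' in the Python API example). `default_target` is a hypothetical function, and the real rule may treat other characters differently.

```python
def default_target(connector_keys, metric_key, location_key=None):
    """Guess a pipe's default table name from its keys (assumed rule)."""
    parts = [connector_keys, metric_key]
    if location_key is not None:
        parts.append(location_key)
    # Assumption: keys are joined with underscores and ':' becomes '_'.
    return "_".join(parts).replace(":", "_")

atl_table = default_target("plugin:noaa", "weather", "atl")
foo_table = default_target("foo", "bar")
```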

Python API

import meerschaum as mrsm
pipe = mrsm.Pipe(
    'foo', 'bar',              ### Connector and metric labels.
    target   = 'MyTableName!', ### Table name. Defaults to 'foo_bar'.
    instance = 'sql:local',    ### Built-in SQLite DB. Defaults to 'sql:main'.
    columns  = {
        'datetime': 'dt',      ### Column for the datetime index.
        'id'      : 'id',      ### Column for the ID index (optional).
    },
)
### Pass a DataFrame or a list of dicts (as here) to create the table and indices.
pipe.sync([{'dt': '2024-07-01', 'id': 1, 'val': 10}])

### Duplicate rows are ignored.
pipe.sync([{'dt': '2024-07-01', 'id': 1, 'val': 10}])
assert len(pipe.get_data()) == 1

### Rows with existing keys (datetime and/or id) are updated.
pipe.sync([{'dt': '2024-07-01', 'id': 1, 'val': 100}])
assert len(pipe.get_data()) == 1

### Translates to this query for SQLite:
###
### SELECT *
### FROM "MyTableName!"
### WHERE "dt" >= datetime('2024-01-01', '0 minute')
###   AND "dt" <  datetime('2025-01-01', '0 minute')
###   AND "id" IN ('1')
df = pipe.get_data(
    begin  = '2024-01-01',
    end    = '2025-01-01',
    params = {'id': [1]},
)

### Shape of the DataFrame:
###           dt  id  val
### 0 2024-07-01   1  100

### Drop the table and remove the pipe's metadata.
pipe.delete()
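
To see the sync semantics and the generated query without installing Meerschaum, the sketch below reproduces them with Python's built-in `sqlite3` module. It is a stand-in only: the unique index, the `upsert` helper, and the `ON CONFLICT` clause (which needs SQLite 3.24 or newer) are assumptions chosen to mimic the behavior described above, not Meerschaum's actual sync strategy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "MyTableName!" ("dt" TEXT, "id" INTEGER, "val" INTEGER)')
# Assumption: a unique index on the datetime and ID columns identifies a row.
conn.execute('CREATE UNIQUE INDEX "ix_dt_id" ON "MyTableName!" ("dt", "id")')

def upsert(row):
    """Insert a row, or update "val" when the (dt, id) key already exists."""
    conn.execute(
        'INSERT INTO "MyTableName!" ("dt", "id", "val") '
        'VALUES (:dt, :id, :val) '
        'ON CONFLICT ("dt", "id") DO UPDATE SET "val" = excluded."val"',
        row,
    )

upsert({"dt": "2024-07-01 00:00:00", "id": 1, "val": 10})
upsert({"dt": "2024-07-01 00:00:00", "id": 1, "val": 10})   # Duplicate: no new row.
upsert({"dt": "2024-07-01 00:00:00", "id": 1, "val": 100})  # Same key: value updated.

# The query from the comments above, run as-is.
rows = conn.execute(
    """
    SELECT *
    FROM "MyTableName!"
    WHERE "dt" >= datetime('2024-01-01', '0 minute')
      AND "dt" <  datetime('2025-01-01', '0 minute')
      AND "id" IN ('1')
    """
).fetchall()
```

After the three calls the table holds a single row whose `val` is 100, matching the DataFrame shown above.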

Simple Plugin

# ~/.config/meerschaum/plugins/example.py

__version__ = '1.0.0'
required = ['requests']

def register(pipe, **kw):
    return {
        'columns': {
            'datetime': 'dt',
            'id'      : 'id',
        },
    }

def fetch(pipe, begin=None, end=None, **kw):
    import requests, random
    from datetime import datetime, timezone

    ### Fetch data from an external API (placeholder URL).
    response = requests.get('https://api.example.com/data')
    data = response.json()  ### List of dicts (unused in this toy example).

    now = datetime.now(timezone.utc).replace(tzinfo=None)

    ### You may also return a Pandas DataFrame.
    return [{
        "dt"   : now,
        "id"   : random.randint(1, 4),
        "value": random.uniform(1, 100),
    }]
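
The plugin above ignores its `begin` and `end` arguments. A variant that honors them might look like the sketch below, which uses only the standard library so it can be run directly. It assumes `begin` and `end` arrive as naive `datetime` objects or `None`; the hourly rows, the single sensor ID, and the 24-hour default window are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

__version__ = '0.0.1'

def register(pipe, **kw):
    return {'columns': {'datetime': 'dt', 'id': 'id'}}

def fetch(pipe, begin=None, end=None, **kw):
    """Return one row per hour in [begin, end) for a hypothetical sensor."""
    now = datetime.now(timezone.utc).replace(
        tzinfo=None, minute=0, second=0, microsecond=0,
    )
    # Assumption: default to the 24 hours before `end` when no bounds are given.
    end = end or now
    begin = begin or (end - timedelta(hours=24))

    rows = []
    ts = begin
    while ts < end:
        rows.append({'dt': ts, 'id': 1, 'value': float(ts.hour)})
        ts += timedelta(hours=1)
    return rows

# Calling the function directly, outside of Meerschaum:
rows = fetch(None, begin=datetime(2024, 7, 1), end=datetime(2024, 7, 1, 3))
```

Restricting the result to the requested window means each sync only has to return the rows it was asked for.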

Features

  • 📊 Built for Data Scientists and Analysts
    • Integrate with Pandas, Grafana, and other popular data analysis tools.
    • Persist your dataframes and always get the latest data.
    • Filter pipes by connector, metric, location, tags, or datetime column dtype (--dtype datetime|int|None).
  • ⚡️ Production-Ready, Batteries Included
    • Synchronization engine concurrently updates many time-series data streams.
    • One-click deploy a TimescaleDB and Grafana stack for prototyping.
    • Serve data to your entire organization through the power of uvicorn, gunicorn, and FastAPI.
    • Supports PostGIS geometry columns for geospatial pipelines.
  • 💼 Jobs and Scheduling
    • Run any command as a background job with -d (--daemon).
    • Built-in scheduler handles cron and interval schedules — no crontab or systemd setup required.
    • Execute jobs locally, via systemd services, or remotely on an API instance with --executor-keys.
    • Copy pipes across instances with the improved copy pipes command and --sync-data flag.
  • 🔌 Easily Expandable
    • Ingest any data source with a simple plugin. Just return a DataFrame, and Meerschaum handles the rest.
    • Add any function as a command to the Meerschaum system.
    • Define parent/child pipe relationships and pipe references for composable SQL pipelines.
    • Include Meerschaum in your projects with its easy-to-use Python API.
  • Tailored for Your Experience
    • Rich CLI makes managing your data streams surprisingly enjoyable!
    • Web dashboard for those who prefer a more graphical experience.
    • Manage your database connections with Meerschaum connectors (SQL, API, Valkey, and custom).
    • Utility commands with sensible syntax let you control many pipes with grace.
  • 💼 Portable from the Start
    • The environment variable $MRSM_ROOT_DIR lets you emulate multiple installations and group together your instances.
    • No dependencies required; anything needed will be installed into a virtual environment.
    • Compatible with uv — install with uv tool install meerschaum.
    • Specify required packages for your plugins, and users will get those packages in a virtual environment.
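
As one way to use `$MRSM_ROOT_DIR`, the sketch below builds a separate environment for each root directory, which could then be handed to a subprocess. `env_for_root` and the `dev` / `prod` directory names are hypothetical; only the variable name comes from the list above.

```python
import os
import tempfile
from pathlib import Path

def env_for_root(root_dir):
    """Return a copy of the environment with MRSM_ROOT_DIR set to `root_dir`."""
    root = Path(root_dir).expanduser().resolve()
    root.mkdir(parents=True, exist_ok=True)
    env = dict(os.environ)
    env["MRSM_ROOT_DIR"] = str(root)
    return env

base = Path(tempfile.mkdtemp())
dev_env = env_for_root(base / "dev")
prod_env = env_for_root(base / "prod")

# With Meerschaum installed, each environment could be passed to a subprocess:
#   subprocess.run(["mrsm", "sync", "pipes"], env=dev_env)
```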

Support Meerschaum's Development

For consulting services and to support Meerschaum's development, please consider sponsoring me on GitHub Sponsors.

Additionally, you can always buy me a coffee ☕!

License

Copyright 2020-2026 Bennett Meares

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Download files

Source Distribution

meerschaum-3.3.1.tar.gz (1.2 MB view details)

Uploaded Source

Built Distribution

meerschaum-3.3.1-py3-none-any.whl (1.3 MB view details)

Uploaded Python 3

File details

Details for the file meerschaum-3.3.1.tar.gz.

File metadata

  • Download URL: meerschaum-3.3.1.tar.gz
  • Upload date:
  • Size: 1.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for meerschaum-3.3.1.tar.gz
Algorithm Hash digest
SHA256 40089a613ba22ff4a05603bde6dd5f1a3cafc3e6916ca64d30be1c3805386453
MD5 fe727f8f132bab8d56ea0a03fe36ac87
BLAKE2b-256 301d6fe1ea0d1120aab770babf633b75f90c79baf128bb6b649b5f4851472d4e

File details

Details for the file meerschaum-3.3.1-py3-none-any.whl.

File metadata

  • Download URL: meerschaum-3.3.1-py3-none-any.whl
  • Upload date:
  • Size: 1.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for meerschaum-3.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 3e413cc091ca9605ac9f31ed5eab0520a2ec864ddadac5b0ea6d29d2706db51a
MD5 4972af565937365d413c0f34d24e4628
BLAKE2b-256 f8910044502bf5c0be80d332f3b4d955e13b9766a77168a3c96633df9355b0af
