Pull-based pandas dataframe syncing

pullframe

To reduce network traffic, pullframe syncs a dataframe from the other nodes only on demand. If your task is divide-and-conquer style, consider dask instead.

Features

  • Once the cache has been synced, reads do not call remote nodes, so the cache's locality is 1.
  • The ideal situation is one where you need to read the same dataframe multiple times on several nodes and the dataframe needs to be updated frequently.
  • A unique str name is the only configuration required when you add a new dataframe to the system.
  • No configuration or manual operation is needed when a new node is added or when a node crashes and is restored.
  • Because no configuration or manual operation is needed, it is easy to scale up in the cloud.
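
The pull-on-demand behaviour described above can be illustrated with a small toy cache. This is only a sketch of the idea, not pullframe's implementation: the class `PullCache`, the `fetch_remote` and `latest_version` callables, and the use of an integer version are all hypothetical names and assumptions introduced for the example.

```python
from typing import Callable, Dict, Tuple


class PullCache:
    """Toy pull-based cache: fetch from a remote only when the local copy is stale."""

    def __init__(
        self,
        fetch_remote: Callable[[str], object],
        latest_version: Callable[[str], int],
    ) -> None:
        self._fetch_remote = fetch_remote      # hypothetical network transfer
        self._latest_version = latest_version  # hypothetical coordinator lookup
        self._local: Dict[str, Tuple[int, object]] = {}
        self.remote_calls = 0                  # counts transfers, for illustration

    def load(self, name: str) -> object:
        wanted = self._latest_version(name)
        cached = self._local.get(name)
        if cached is not None and cached[0] == wanted:
            return cached[1]  # up to date: served locally, no remote call
        # Missing or stale: pull once, then remember the version we pulled.
        self.remote_calls += 1
        data = self._fetch_remote(name)
        self._local[name] = (wanted, data)
        return data


# Stand-ins for a remote node's data and for the coordinator's version record.
versions = {"prices": 1}
store = {"prices": [1.0, 2.0]}

cache = PullCache(
    fetch_remote=lambda name: list(store[name]),
    latest_version=lambda name: versions[name],
)

cache.load("prices")
cache.load("prices")
calls_after_two_reads = cache.remote_calls

# A writer updates the data and bumps the version; the next read pulls again.
store["prices"].append(3.0)
versions["prices"] = 2
refreshed = cache.load("prices")
calls_after_update = cache.remote_calls
```

In this sketch, repeated reads of an unchanged name cost a single transfer, which is the sense in which locality approaches 1 once the cache is synced.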

Communications

  • Coordination via ZooKeeper
  • File synchronization via HTTP POST
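
To make the file-transfer half of this concrete, here is a minimal standard-library sketch of one node pulling bytes from another over HTTP POST. It assumes the pulling node sends a POST that names the dataframe and receives the stored bytes in the response; pullframe's real endpoints, payload format, and direction of transfer are not documented here and may differ. `FILES`, `SenderHandler`, and `pull` are hypothetical names, and ZooKeeper coordination is left out entirely.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for a node's cached files: name -> serialized bytes.
FILES = {"prices": b"serialized dataframe bytes"}


class SenderHandler(BaseHTTPRequestHandler):
    """Answers a POST naming a dataframe with that dataframe's stored bytes."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length))
        payload = FILES.get(request.get("name"))
        if payload is None:
            self.send_response(404)
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def pull(host: str, port: int, name: str) -> bytes:
    """POST the dataframe name to a sender node and return the response body."""
    body = json.dumps({"name": name}).encode()
    request = urllib.request.Request(
        f"http://{host}:{port}/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.read()


def demo() -> bytes:
    # Port 0 asks the OS for any free port on the loopback interface.
    server = HTTPServer(("127.0.0.1", 0), SenderHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return pull("127.0.0.1", server.server_address[1], "prices")
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` starts a throwaway sender on localhost, pulls one named payload from it, and shuts the sender down again.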

Start Service

$ uvicorn pullframe.sender:app

Example

Load / Save

from pullframe import pullframe

# hosts, directory, name, start and end are supplied by the caller.
with pullframe(hosts, directory, sync_timeout=60.0) as pf:
    # start and end are each Optional[datetime]:
    # pass start=None to load from the very beginning,
    # pass end=None to load through the very end.
    df = pf.load(name, start, end)

    pf.save(name, df)
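
For readers unfamiliar with open-ended date ranges, the following sketch shows how optional start and end bounds behave when slicing a datetime-indexed dataframe in plain pandas. It does not call pullframe; `select_range` is a hypothetical helper, and treating both bounds as inclusive is an assumption borrowed from pandas label slicing, not a documented property of `pf.load`.

```python
from datetime import datetime
from typing import Optional

import pandas as pd


def select_range(
    df: pd.DataFrame,
    start: Optional[datetime],
    end: Optional[datetime],
) -> pd.DataFrame:
    """Return rows whose index lies in [start, end]; None leaves that side open."""
    # Sorting first keeps label slicing well defined; .loc includes both endpoints.
    return df.sort_index().loc[start:end]


index = pd.date_range("2020-01-01", periods=5, freq="D")
frame = pd.DataFrame({"value": range(5)}, index=index)

everything = select_range(frame, None, None)
head = select_range(frame, None, datetime(2020, 1, 2))
tail = select_range(frame, datetime(2020, 1, 4), None)
middle = select_range(frame, datetime(2020, 1, 2), datetime(2020, 1, 4))
```

Here `everything` keeps all five rows, while `head`, `tail`, and `middle` are bounded on one or both sides.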

TODO

  • Check for cache discrepancy/corruption between nodes.
  • Stable backup using Amazon S3 / Google Cloud Storage.
  • Replace the zookeeper client with zake (fake kazoo client) during tests.

Requirements

  • zookeeper
  • the dataframe's index should be a datetime index
  • linux
  • python>=3.7
  • python = "^3.7"
  • pandas = "^1.0.0"
  • tables = "^3.6.1"
  • fastapi = "^0.58.0"
  • aiofiles = "^0.5.0"
  • kazoo = "^2.7.0"
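
Since the index requirement above is easy to miss, a quick check before saving may help. The helpers below are a sketch using plain pandas; `has_datetime_index` and `with_datetime_index` are hypothetical names and are not part of pullframe, and whether pullframe itself validates the index is not stated here.

```python
import pandas as pd


def has_datetime_index(df: pd.DataFrame) -> bool:
    """True when the dataframe is indexed by a pandas DatetimeIndex."""
    return isinstance(df.index, pd.DatetimeIndex)


def with_datetime_index(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy indexed by `column` parsed as datetimes, sorted by time."""
    out = df.copy()
    out[column] = pd.to_datetime(out[column])
    return out.set_index(column).sort_index()


raw = pd.DataFrame({"ts": ["2020-01-02", "2020-01-01"], "value": [2, 1]})
fixed = with_datetime_index(raw, "ts")
```

In this example `raw` has a default integer index, while `fixed` is indexed and ordered by the parsed timestamps.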

Free software: MIT License

Credits
