pull based pandas dataframe syncing
pullframe
To reduce network consumption, pullframe syncs a dataframe from other nodes only on demand. If your task is divide-and-conquer style, consider dask instead.
Features
- Once a cache has been synced, reads do not call remote nodes, so cache locality is 1.
- The ideal situation is one where you need to read a dataframe multiple times on several nodes and the dataframe is updated frequently.
- The only configuration required to add a new dataframe to the system is a unique str name.
- No configuration or manual operation is needed when a new node is added, or when a node crashes and is restored.
- Because no configuration or operation is needed, it is easy to scale up in the cloud.
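A minimal sketch of the pull-on-demand idea, assuming a simplified model rather than pullframe's actual internals: a shared registry (an in-memory dict standing in for the ZooKeeper coordination) records the latest version of each named dataframe and which node holds it, and a node pulls data only when its local copy is missing or stale. `Node`, `Registry`, `save`, `load`, and `remote_calls` are hypothetical names, and plain lists of floats stand in for dataframes. Reading the registry models the coordination lookup; only the data pull is counted as a remote call.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical stand-in for the coordination service (ZooKeeper in pullframe):
# dataframe name -> (latest version, id of the node holding that version).
Registry = Dict[str, Tuple[int, str]]


@dataclass
class Node:
    """One node with a local cache of named tables (hypothetical, not pullframe's API)."""

    node_id: str
    registry: Registry
    peers: Dict[str, "Node"]
    cache: Dict[str, Tuple[int, List[float]]] = field(default_factory=dict)
    remote_calls: int = 0

    def save(self, name: str, rows: List[float]) -> None:
        # Bump the version and record this node as the holder of the newest copy.
        version = self.registry.get(name, (0, ""))[0] + 1
        self.cache[name] = (version, list(rows))
        self.registry[name] = (version, self.node_id)

    def load(self, name: str) -> List[float]:
        latest, owner = self.registry[name]  # coordination lookup only
        local = self.cache.get(name)
        if local is None or local[0] < latest:
            # Pull on demand: fetch only when the local copy is missing or stale.
            self.remote_calls += 1
            version, rows = self.peers[owner].cache[name]
            self.cache[name] = (version, list(rows))
        return list(self.cache[name][1])


if __name__ == "__main__":
    registry: Registry = {}
    peers: Dict[str, Node] = {}
    a, b = Node("a", registry, peers), Node("b", registry, peers)
    peers.update(a=a, b=b)
    a.save("prices", [1.0, 2.0])
    b.load("prices")  # first read on b pulls from a
    b.load("prices")  # served from b's cache, no pull
    print(b.remote_calls)  # → 1
    a.save("prices", [1.0, 2.0, 3.0])
    print(b.load("prices"), b.remote_calls)  # → [1.0, 2.0, 3.0] 2
```

In this sketch, small coordination metadata (version and owner) is kept separate from the data itself, which is what lets repeated reads of an unchanged dataframe stay local.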
Communications
- Coordination via ZooKeeper
- File synchronization via HTTP POST
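The README states only that files are synchronized via HTTP POST and that the serving side is a FastAPI app (`pullframe.sender:app`); it does not document the request format or which side initiates the POST. As a sketch under those uncertainties, using the standard library's `http.server` in place of FastAPI: the requesting node POSTs a dataframe name, and the serving node replies with that file's bytes. `SendHandler`, `pull`, `CACHE_DIR`, and the one-file-per-name layout are all hypothetical.

```python
import tempfile
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

# Hypothetical layout: one file per dataframe, named after the dataframe.
CACHE_DIR = Path(tempfile.mkdtemp())


class SendHandler(BaseHTTPRequestHandler):
    """Serving side: the POST body is a dataframe name; the response is its file."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        name = self.rfile.read(length).decode("utf-8")
        path = CACHE_DIR / name
        # Accept only plain file names that live directly in CACHE_DIR.
        if path.parent != CACHE_DIR or not path.is_file():
            self.send_error(404, "unknown dataframe")
            return
        data = path.read_bytes()
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format: str, *args: object) -> None:
        pass  # silence per-request logging in this example


def pull(host: str, port: int, name: str) -> bytes:
    """Requesting side: POST the dataframe name and return the file bytes."""
    request = urllib.request.Request(
        f"http://{host}:{port}/", data=name.encode("utf-8"), method="POST"
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()


if __name__ == "__main__":
    (CACHE_DIR / "prices").write_bytes(b"example bytes")
    server = HTTPServer(("127.0.0.1", 0), SendHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(pull("127.0.0.1", server.server_address[1], "prices"))  # → b'example bytes'
    server.shutdown()
    server.server_close()
```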
Start Service
$ uvicorn pullframe.sender:app
Example
Load / Save
from datetime import datetime
from typing import Optional

from pullframe import pullframe

# hosts, directory and name are placeholders for your own values
start: Optional[datetime] = None  # None: load from the very beginning
end: Optional[datetime] = None  # None: load through the very end

with pullframe(hosts, directory, sync_timeout=60.0) as pf:
    df = pf.load(name, start, end)
    pf.save(name, df)
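The README does not say whether `start` and `end` are inclusive. A sketch of one plausible reading, assuming both bounds are inclusive and `None` means open-ended, over a list of `(datetime, value)` rows sorted by time in place of a DataFrame; `load_range` is a hypothetical helper, not part of pullframe. On a pandas DataFrame with a sorted DatetimeIndex, label slicing with `df.loc[start:end]` behaves the same way: both endpoints are included and a `None` bound is open-ended.

```python
import bisect
from datetime import datetime
from typing import List, Optional, Tuple

Row = Tuple[datetime, float]


def load_range(rows: List[Row], start: Optional[datetime], end: Optional[datetime]) -> List[Row]:
    """Return rows with start <= timestamp <= end; a None bound is open-ended.

    `rows` must be sorted by timestamp, mirroring a sorted datetime index.
    """
    keys = [ts for ts, _ in rows]
    lo = 0 if start is None else bisect.bisect_left(keys, start)
    hi = len(rows) if end is None else bisect.bisect_right(keys, end)
    return rows[lo:hi]


if __name__ == "__main__":
    rows = [(datetime(2020, 1, d), float(d)) for d in (1, 2, 3, 4)]
    print([v for _, v in load_range(rows, datetime(2020, 1, 2), None)])  # → [2.0, 3.0, 4.0]
```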
TODO
- Check for cache discrepancy/corruption between nodes.
- Stable backup using Amazon S3 / Google Cloud Storage.
- Replace the ZooKeeper client with zake (a fake kazoo client) during tests.
Requirements
- ZooKeeper
- the dataframe's index should be a datetime index
- Linux
- python>=3.7
- python = "^3.7"
- pandas = "^1.0.0"
- tables = "^3.6.1"
- fastapi = "^0.58.0"
- aiofiles = "^0.5.0"
- kazoo = "^2.7.0"
Free software: MIT License
Credits
- This package was created with Cookiecutter and adapted from the audreyr/cookiecutter-pypackage project template.
Download files
Source Distribution
pullframe-0.1.0.tar.gz (10.4 kB)
Built Distribution
pullframe-0.1.0-py3-none-any.whl (13.9 kB)
Hashes for pullframe-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 8225b4d37ae4cd7f43d2506b569cc069bb79b07d4b1771aa45fa99191a53f8a1
MD5 | d5768d6d198149403b2790cd6c968c9a
BLAKE2b-256 | 6e6adb992a3a8d5024bdf19cf5966a127fb37ed18a16a0a7671b7478ae78f28a