# Cached Historical Data Fetcher

A Python utility for fetching historical data with caching. Suitable for acquiring data that is added frequently and incrementally, e.g. news, posts, or weather records.
## Installation

Install this via pip (or your favourite package manager):

```shell
pip install cached-historical-data-fetcher
```
## Features

- Uses a cache built on top of `joblib`, `lz4` and `aiofiles`.
- Ready to use with `asyncio`, `aiohttp` and `aiohttp-client-cache`. Uses `asyncio.gather` to fetch chunks in parallel; a minimal sketch of this pattern follows the list. For performance reasons, relying on `aiohttp-client-cache` alone is probably not a good idea when fetching a large number of chunks (web requests).
- Based on `pandas` and supports `MultiIndex`.
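For reference, here is a minimal, self-contained sketch of the `asyncio.gather` fan-out pattern mentioned above. `fetch_chunk` is a hypothetical stand-in for one chunk request and is not part of this package:

```python
import asyncio

async def fetch_chunk(i: int) -> int:
    """Hypothetical stand-in for fetching one chunk (e.g. one web request)."""
    await asyncio.sleep(0.1)  # simulate network latency
    return i

async def main() -> None:
    # All five chunk requests run concurrently instead of one after another.
    results = await asyncio.gather(*(fetch_chunk(i) for i in range(5)))
    print(results)  # [0, 1, 2, 3, 4]

asyncio.run(main())
```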
## Usage

### `HistoricalDataCache`, `HistoricalDataCacheWithChunk` and `HistoricalDataCacheWithFixedChunk`

Override the `get_one()` method to fetch the data for one chunk. The `update()` method calls `get_one()` for each unfetched chunk, concatenates the results, and saves them to the cache.
```python
from cached_historical_data_fetcher import HistoricalDataCacheWithFixedChunk
from pandas import DataFrame, Timedelta, Timestamp
from typing import Any

# define cache class
class MyCacheWithFixedChunk(HistoricalDataCacheWithFixedChunk[Timestamp, Timedelta, Any]):
    delay_seconds = 0.0  # delay between chunks (requests) in seconds
    interval = Timedelta(days=1)  # interval between chunks, can be any type
    start_index = Timestamp.utcnow().floor("10D")  # start index, can be any type

    async def get_one(self, start: Timestamp, *args: Any, **kwargs: Any) -> DataFrame:
        """Fetch data for one chunk."""
        return DataFrame({"day": [start.day]}, index=[start])

# get complete data
print(await MyCacheWithFixedChunk().update())
```
```
                           day
2023-09-30 00:00:00+00:00   30
2023-10-01 00:00:00+00:00    1
2023-10-02 00:00:00+00:00    2
```
See `example.ipynb` for a real-world example.
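In the example above, `get_one()` returns synthetic data; in real use it is where the network request would live. Below is a minimal sketch using `aiohttp`, following the same override pattern; the URL and the shape of the JSON response are hypothetical, not an API provided by this package:

```python
from typing import Any

import aiohttp
from cached_historical_data_fetcher import HistoricalDataCacheWithFixedChunk
from pandas import DataFrame, Timedelta, Timestamp

class NewsCache(HistoricalDataCacheWithFixedChunk[Timestamp, Timedelta, Any]):
    delay_seconds = 1.0  # pause between requests to avoid hammering the server
    interval = Timedelta(days=1)  # one chunk per day
    start_index = Timestamp.utcnow().floor("D") - Timedelta(days=7)  # last week

    async def get_one(self, start: Timestamp, *args: Any, **kwargs: Any) -> DataFrame:
        # Hypothetical endpoint returning a JSON list of records for one day.
        url = f"https://example.com/news?date={start:%Y-%m-%d}"
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as resp:
                resp.raise_for_status()
                records = await resp.json()
        return DataFrame(records, index=[start] * len(records))
```

Once a day's chunk has been fetched, subsequent `update()` calls serve it from the on-disk cache and only request days that are still missing.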
### `IdCacheWithFixedChunk`

Override the `get_one()` method to fetch the data for one chunk, in the same way as in `HistoricalDataCacheWithFixedChunk`. After updating `ids` by calling `set_ids()`, the `update()` method calls `get_one()` for every unfetched id, concatenates the results, and saves them to the cache.
```python
from cached_historical_data_fetcher import IdCacheWithFixedChunk
from pandas import DataFrame
from typing import Any

class MyIdCache(IdCacheWithFixedChunk[str, Any]):
    delay_seconds = 0.0  # delay between chunks (requests) in seconds

    async def get_one(self, start: str, *args: Any, **kwargs: Any) -> DataFrame:
        """Fetch data for one chunk."""
        return DataFrame({"id+hello": [start + "+hello"]}, index=[start])

cache = MyIdCache()   # create cache
cache.set_ids(["a"])  # set ids
cache.set_ids(["b"])  # set ids again, now `cache.ids` is ["a", "b"]
print(await cache.update(reload=True))  # discard previous cache and fetch again
cache.set_ids(["b", "c"])  # set ids again, now `cache.ids` is ["a", "b", "c"]
print(await cache.update())  # fetch only new data
```
```
  id+hello
a  a+hello
b  b+hello
  id+hello
a  a+hello
b  b+hello
c  c+hello
```
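The features list mentions `MultiIndex` support. A minimal sketch of what that could look like, assuming each chunk may return a frame with a two-level index (the `(time, symbol)` layout and the symbols here are illustrative assumptions, not output from this package's tests):

```python
from typing import Any

from cached_historical_data_fetcher import HistoricalDataCacheWithFixedChunk
from pandas import DataFrame, MultiIndex, Timedelta, Timestamp

class MyMultiIndexCache(HistoricalDataCacheWithFixedChunk[Timestamp, Timedelta, Any]):
    delay_seconds = 0.0
    interval = Timedelta(days=1)
    start_index = Timestamp.utcnow().floor("D") - Timedelta(days=2)

    async def get_one(self, start: Timestamp, *args: Any, **kwargs: Any) -> DataFrame:
        # Each chunk carries several symbols keyed by a (time, symbol) MultiIndex.
        index = MultiIndex.from_product(
            [[start], ["AAPL", "MSFT"]], names=["time", "symbol"]
        )
        return DataFrame({"price": [1.0, 2.0]}, index=index)

print(await MyMultiIndexCache().update())
```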
## Contributors ✨

Thanks go to these wonderful people (emoji key):

This project follows the all-contributors specification. Contributions of any kind are welcome!