
Cached Historical Data Fetcher


A Python utility for fetching historical data with caching. Suitable for acquiring data that is added frequently and incrementally, e.g. news, posts, or weather.

Installation

Install this via pip (or your favourite package manager):

pip install cached-historical-data-fetcher

Features

  • Uses cache built on top of joblib, lz4 and aiofiles.
  • Ready to use with asyncio, aiohttp and aiohttp-client-cache. Uses asyncio.gather to fetch chunks in parallel. (For performance reasons, relying on aiohttp-client-cache alone is probably not a good idea when fetching a large number of chunks, i.e. web requests.)
  • Based on pandas and supports MultiIndex.
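The parallel-fetch pattern mentioned above can be sketched with plain asyncio. This is illustrative only; the function names and chunk handling here are assumptions, not the library's internals:

```python
import asyncio

async def fetch_chunk(start: int) -> dict:
    # Placeholder for one web request; real code would use aiohttp.
    await asyncio.sleep(0)  # simulate I/O
    return {"start": start, "rows": 1}

async def fetch_all(starts: list[int]) -> list[dict]:
    # asyncio.gather runs all chunk fetches concurrently, mirroring
    # how the library dispatches its per-chunk requests in parallel.
    return await asyncio.gather(*(fetch_chunk(s) for s in starts))

results = asyncio.run(fetch_all([0, 1, 2]))
```

Results come back in the same order as the input chunks, since asyncio.gather preserves argument order.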

Usage

HistoricalDataCache, HistoricalDataCacheWithChunk and HistoricalDataCacheWithFixedChunk

Override the get_one() method to fetch data for one chunk. The update() method calls get_one() for each unfetched chunk, concatenates the results, and saves them to cache.

from cached_historical_data_fetcher import HistoricalDataCacheWithFixedChunk
from pandas import DataFrame, Timedelta, Timestamp
from typing import Any

# define cache class
class MyCacheWithFixedChunk(HistoricalDataCacheWithFixedChunk[Timestamp, Timedelta, Any]):
    delay_seconds = 0.0 # delay between chunks (requests) in seconds
    interval = Timedelta(days=1) # interval between chunks, can be any type
    start_index = Timestamp.utcnow().floor("10D") # start index, can be any type

    async def get_one(self, start: Timestamp, *args: Any, **kwargs: Any) -> DataFrame:
        """Fetch data for one chunk."""
        return DataFrame({"day": [start.day]}, index=[start])

# get complete data
print(await MyCacheWithFixedChunk().update())
                           day
2023-09-30 00:00:00+00:00   30
2023-10-01 00:00:00+00:00    1
2023-10-02 00:00:00+00:00    2

See example.ipynb for a real-world example.
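Conceptually, an incremental update only needs to fetch chunks whose start index is not yet in the cache. A minimal pandas sketch of that idea (the helper name here is hypothetical, not the library's API):

```python
import pandas as pd

def missing_starts(cached: pd.DataFrame, all_starts: pd.DatetimeIndex) -> pd.DatetimeIndex:
    # Chunks whose start timestamp is absent from the cached index still need fetching.
    return all_starts.difference(cached.index)

# Full range of chunk starts, one cached chunk already present.
all_starts = pd.date_range("2023-09-30", periods=3, freq="D", tz="UTC")
cached = pd.DataFrame({"day": [30]}, index=all_starts[:1])

to_fetch = missing_starts(cached, all_starts)

# "Fetch" the missing chunks and merge them with the cache.
fetched = pd.DataFrame({"day": to_fetch.day}, index=to_fetch)
updated = pd.concat([cached, fetched]).sort_index()
```

Only the two missing days are fetched; the previously cached row is reused as-is.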

IdCacheWithFixedChunk

Override the get_one() method to fetch data for one chunk, in the same way as in HistoricalDataCacheWithFixedChunk. After updating ids by calling set_ids(), the update() method calls get_one() for every unfetched id, concatenates the results, and saves them to cache.

from cached_historical_data_fetcher import IdCacheWithFixedChunk
from pandas import DataFrame
from typing import Any

class MyIdCache(IdCacheWithFixedChunk[str, Any]):
    delay_seconds = 0.0 # delay between chunks (requests) in seconds

    async def get_one(self, start: str, *args: Any, **kwargs: Any) -> DataFrame:
        """Fetch data for one chunk."""
        return DataFrame({"id+hello": [start + "+hello"]}, index=[start])

cache = MyIdCache() # create cache
cache.set_ids(["a"]) # set ids
cache.set_ids(["b"]) # set ids again, now `cache.ids` is ["a", "b"]
print(await cache.update(reload=True)) # discard previous cache and fetch again
cache.set_ids(["b", "c"]) # set ids again, now `cache.ids` is ["a", "b", "c"]
print(await cache.update()) # fetch only new data
       id+hello
    a   a+hello
    b   b+hello
       id+hello
    a   a+hello
    b   b+hello
    c   c+hello
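The second update() above fetches only the id "c" because "a" and "b" are already cached. That id-difference logic can be sketched in plain pandas (the helper name is hypothetical, not the library's API):

```python
import pandas as pd

def update_ids(cached: pd.DataFrame, ids: list[str]) -> pd.DataFrame:
    # Fetch only ids not already present in the cached index, then append.
    new_ids = [i for i in ids if i not in cached.index]
    fetched = pd.DataFrame({"id+hello": [i + "+hello" for i in new_ids]}, index=new_ids)
    return pd.concat([cached, fetched])

cached = pd.DataFrame({"id+hello": ["a+hello", "b+hello"]}, index=["a", "b"])
updated = update_ids(cached, ["a", "b", "c"])
```

Passing reload=True would instead discard the cache and refetch every id.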

Contributors ✨

Thanks goes to these wonderful people (emoji key):

This project follows the all-contributors specification. Contributions of any kind welcome!
