
Cached Historical Data Fetcher


A Python utility for fetching historical data with caching. Suitable for acquiring data that is added frequently and incrementally, e.g. news, posts, or weather.

Installation

Install this via pip (or your favourite package manager):

pip install cached-historical-data-fetcher

Features

  • Uses cache built on top of joblib, lz4 and aiofiles.
  • Ready to use with asyncio, aiohttp, and aiohttp-client-cache. Fetches chunks in parallel with asyncio.gather. (For performance reasons, relying on aiohttp-client-cache alone is probably not a good idea when fetching a large number of chunks, i.e. web requests.)
  • Based on pandas and supports MultiIndex.
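
The parallel chunk fetching mentioned above can be sketched independently of this library. The names below (fetch_chunk, fetch_all) are illustrative, not part of cached-historical-data-fetcher's API; the idea is simply to launch one coroutine per chunk with asyncio.gather and concatenate the resulting DataFrames:

```python
import asyncio

import pandas as pd

async def fetch_chunk(start: pd.Timestamp) -> pd.DataFrame:
    # Stand-in for a real web request; returns one row per chunk.
    await asyncio.sleep(0)  # yield control, as a real request would
    return pd.DataFrame({"day": [start.day]}, index=[start])

async def fetch_all(starts: list[pd.Timestamp]) -> pd.DataFrame:
    # Run all chunk fetches concurrently, then concatenate the results.
    frames = await asyncio.gather(*(fetch_chunk(s) for s in starts))
    return pd.concat(frames).sort_index()

starts = list(pd.date_range("2023-09-30", periods=3, freq="D", tz="UTC"))
df = asyncio.run(fetch_all(starts))
print(df)
```

The library layers on-disk caching (joblib, lz4, aiofiles) on top of this pattern, so only unfetched chunks hit the network on subsequent runs.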

Usage

HistoricalDataCache, HistoricalDataCacheWithChunk and HistoricalDataCacheWithFixedChunk

Override the get_one() method to fetch the data for one chunk. The update() method calls get_one() for each unfetched chunk, concatenates the results, and saves them to the cache.

from cached_historical_data_fetcher import HistoricalDataCacheWithFixedChunk
from pandas import DataFrame, Timedelta, Timestamp
from typing import Any

# define cache class
class MyCacheWithFixedChunk(HistoricalDataCacheWithFixedChunk[Timestamp, Timedelta, Any]):
    delay_seconds = 0.0 # delay between chunks (requests) in seconds
    interval = Timedelta(days=1) # interval between chunks, can be any type
    start_index = Timestamp.utcnow().floor("10D") # start index, can be any type

    async def get_one(self, start: Timestamp, *args: Any, **kwargs: Any) -> DataFrame:
        """Fetch data for one chunk."""
        return DataFrame({"day": [start.day]}, index=[start])

# get complete data
print(await MyCacheWithFixedChunk().update())
                           day
2023-09-30 00:00:00+00:00   30
2023-10-01 00:00:00+00:00    1
2023-10-02 00:00:00+00:00    2

See example.ipynb for a real-world example.

IdCacheWithFixedChunk

Override the get_one() method to fetch the data for one chunk, just as with HistoricalDataCacheWithFixedChunk. After the ids have been set via set_ids(), the update() method calls get_one() for every unfetched id, concatenates the results, and saves them to the cache.

from cached_historical_data_fetcher import IdCacheWithFixedChunk
from pandas import DataFrame
from typing import Any

class MyIdCache(IdCacheWithFixedChunk[str, Any]):
    delay_seconds = 0.0 # delay between chunks (requests) in seconds

    async def get_one(self, start: str, *args: Any, **kwargs: Any) -> DataFrame:
        """Fetch data for one chunk."""
        return DataFrame({"id+hello": [start + "+hello"]}, index=[start])

cache = MyIdCache() # create cache
cache.set_ids(["a"]) # set ids
cache.set_ids(["b"]) # set ids again, now `cache.ids` is ["a", "b"]
print(await cache.update(reload=True)) # discard previous cache and fetch again
cache.set_ids(["b", "c"]) # set ids again, now `cache.ids` is ["a", "b", "c"]
print(await cache.update()) # fetch only new data
       id+hello
    a   a+hello
    b   b+hello
       id+hello
    a   a+hello
    b   b+hello
    c   c+hello

Contributors ✨

Thanks goes to these wonderful people (emoji key):

This project follows the all-contributors specification. Contributions of any kind welcome!
