Smart caching wrapper for 'yfinance' module

yfinance-cache

Persistent caching wrapper for the yfinance module. The caching is intelligent rather than a dumb cache of web requests: the cache is only updated where data is missing or outdated and new data is expected.

Only price data caching is fully implemented. Everything else is cached once but never updated (unless you delete the cached files). I ran out of time to implement e.g. financials cache updates.

The persistent cache is stored in your user cache folder:

  • Windows = C:/Users/<USER>/AppData/Local/py-yfinance-cache
  • Linux = /home/<USER>/.cache/py-yfinance-cache
  • MacOS = /Users/<USER>/Library/Caches/py-yfinance-cache
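As a rough illustration of the list above, the cache folder can be derived from the platform like this. The function name guess_cache_dir is hypothetical and is not part of yfinance_cache; the package may resolve the folder differently (for example via a helper library), so treat this as a sketch only.

```python
import sys
from pathlib import Path
from typing import Optional


def guess_cache_dir(platform: str = sys.platform, home: Optional[Path] = None) -> Path:
    """Return the cache folder listed above for a given platform string.

    Hypothetical helper for illustration; not the package's own lookup.
    """
    home = Path.home() if home is None else home
    if platform.startswith("win"):
        return home / "AppData" / "Local" / "py-yfinance-cache"
    if platform == "darwin":
        return home / "Library" / "Caches" / "py-yfinance-cache"
    # Assume Linux-style layout for everything else
    return home / ".cache" / "py-yfinance-cache"


if __name__ == "__main__":
    print(guess_cache_dir())
```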

Price cache

The idea behind this cache is to minimise fetch frequency and quantity. Yahoo's API officially only cares about frequency, but I'm guessing they also care about server load from scrapers.

How is this caching different to caching URL fetches? Simple: a URL cache does not adjust cached data for new stock splits or dividends.

What makes the cache smart? It adds a 'fetched date' to each row of price data, then combines this with the exchange schedule to know when new price data is expected.
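To make the 'fetched date plus schedule' idea concrete, here is a minimal sketch for daily bars. It is not the package's implementation: the session times, the Monday-to-Friday rule (no holidays) and the function names are all assumptions chosen for illustration.

```python
from datetime import date, datetime, time, timedelta

# Assumed exchange-local session times; a real schedule varies by exchange
MARKET_OPEN = time(9, 30)
MARKET_CLOSE = time(16, 0)


def is_trading_day(d: date) -> bool:
    """Simplified schedule: Monday to Friday, holidays ignored."""
    return d.weekday() < 5


def row_is_final(bar_day: date, fetched_at: datetime) -> bool:
    """Treat a daily bar as final if it was fetched at or after that day's close."""
    return fetched_at >= datetime.combine(bar_day, MARKET_CLOSE)


def new_data_expected(last_bar_day: date, fetched_at: datetime, now: datetime) -> bool:
    """Decide whether a fetch could return anything the cache lacks."""
    if not row_is_final(last_bar_day, fetched_at):
        # The last cached bar was still forming when it was fetched
        return now > fetched_at
    # Otherwise look for any session that has opened since the last cached bar
    d = last_bar_day + timedelta(days=1)
    while d <= now.date():
        if is_trading_day(d) and now >= datetime.combine(d, MARKET_OPEN):
            return True
        d += timedelta(days=1)
    return False
```

With these assumptions, a bar fetched after Friday's close triggers no fetch over the weekend, and the next fetch only becomes worthwhile once Monday's session has opened.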

Note:

  • '1d' price data is always fetched from the start date to today (i.e. end is ignored), because all dividends and stock splits since start need to be known.
  • price repair is enabled, to prevent bad Yahoo data corrupting the cache. See the yfinance Wiki for details.

Financials cache

I planned to implement this after the prices cache, but ran out of time. The strategy to minimise fetch frequency is to fetch at/after the next earnings date, inferred from Ticker.calendar and/or Ticker.earnings_dates.
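Since this part is not implemented, the following is only a sketch of how the planned rule could look. The function name and arguments are hypothetical, and the dates are assumed to have been extracted already from Ticker.calendar or Ticker.earnings_dates.

```python
from datetime import date
from typing import Optional


def financials_refresh_due(last_fetch: date, next_earnings: Optional[date], today: date) -> bool:
    """Refetch only once an earnings date has arrived since the last fetch.

    Hypothetical helper; next_earnings is None when no date could be inferred.
    """
    if next_earnings is None:
        # Unknown earnings date: this sketch never refetches; a real
        # implementation would need some fallback policy here
        return False
    return last_fetch < next_earnings <= today
```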

Interface

Interaction is almost identical to yfinance. Differences are highlighted underneath the code:

import yfinance_cache as yfc

msft = yfc.Ticker("MSFT")

# get stock info
msft.info

# get historical market data
hist = msft.history(period="1d")

# bulk download
yfc.download("MSFT AMZN", period="1d")
...
# etc. See yfinance documentation for full API

Refreshing cache

df = msft.history(interval="1d", max_age="1h", trigger_at_market_close=False, ...)

max_age controls when to update the cache. If the market is still open and max_age time has passed since the last fetch, then today's cached price data will be refreshed. If trigger_at_market_close=True then a refresh is also triggered if the market has closed since the last fetch. max_age must be a Timedelta or equivalent str, and defaults to half of interval.
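The refresh rule described above can be sketched as a small decision function. This is an illustration of the described behaviour, not the package's code: the function names and the two boolean inputs (which the package would derive from the exchange schedule) are assumptions.

```python
from datetime import datetime, timedelta


def default_max_age(interval: timedelta) -> timedelta:
    """Default described above: half of the interval."""
    return interval / 2


def should_refresh(last_fetch: datetime, now: datetime, max_age: timedelta,
                   market_open_now: bool, closed_since_fetch: bool,
                   trigger_at_market_close: bool = False) -> bool:
    """Hypothetical helper mirroring the max_age description."""
    # Market still open and the cached data is older than max_age
    if market_open_now and now - last_fetch > max_age:
        return True
    # Optional extra trigger: market has closed since the last fetch
    if trigger_at_market_close and closed_since_fetch:
        return True
    return False
```

For example, with max_age="1h" a second call 30 minutes after the first would be served from the cache, while a call 90 minutes later during market hours would refetch today's data.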

The returned table has 2 new columns:

  • FetchDate = when data was fetched
  • Final? = true if future fetches are not expected to change the data

Adjusting price

Price can be adjusted for stock splits, dividends, or both.

msft.history(..., adjust_splits=True, adjust_divs=True)
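For readers unfamiliar with what these adjustments do to the numbers, here is a worked sketch on a plain list of closing prices. It uses one common back-adjustment convention (divide earlier prices by the split ratio; scale earlier prices by 1 - dividend / previous close); whether yfinance_cache uses exactly this arithmetic is an assumption, and the function names are hypothetical.

```python
from typing import List


def back_adjust_for_split(closes: List[float], split_index: int, ratio: float) -> List[float]:
    """Divide every close before the split by the split ratio.

    closes is in chronological order; the split takes effect at split_index.
    """
    return [c / ratio if i < split_index else c for i, c in enumerate(closes)]


def back_adjust_for_dividend(closes: List[float], ex_index: int, dividend: float) -> List[float]:
    """Scale every close before the ex-dividend bar by 1 - dividend / previous close."""
    factor = 1.0 - dividend / closes[ex_index - 1]
    return [c * factor if i < ex_index else c for i, c in enumerate(closes)]


if __name__ == "__main__":
    # 2-for-1 split taking effect on the third bar
    print(back_adjust_for_split([100.0, 102.0, 51.5], split_index=2, ratio=2.0))
```

This is also why a plain URL cache goes stale: after a new split or dividend, every earlier cached price needs rescaling even though those rows were fetched correctly at the time.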

Verifying cache

Cached prices can be compared against the latest Yahoo Finance data, and any differences corrected:

# Verify prices of one ticker symbol
msft.verify_cached_prices(
	rtol=0.0001,  # relative tolerance for differences
	vol_rtol=0.005,  # relative tolerance specifically for Volume
	correct=False,  # delete incorrect cached data?
	discard_old=False,  # if cached data too old to check (e.g. 30m), assume incorrect and delete?
	quiet=True,  # enable to print nothing, disable to print summary detail of why cached data wrong
	debug=False,  # enable even more detail for debugging 
	debug_interval=None)  # only verify this interval (note: 1d always verified)

# Verify prices of entire cache, ticker symbols processed alphabetically. Recommend using `requests_cache` session.
yfc.verify_cached_tickers_prices(
	session=None,  # recommend you provide a requests_cache here
	rtol=0.0001,
	vol_rtol=0.005,
	correct=False,
	halt_on_fail=True,  # stop verifying on first fail
	resume_from_tkr=None,  # in case you aborted verification, can jump ahead to this ticker symbol. Append '+1' to start AFTER the ticker
	debug_tkr=None,  # only verify this ticker symbol
	debug_interval=None)
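To show what the rtol and vol_rtol tolerances mean in practice, here is a minimal sketch of a relative-tolerance comparison for one row of price data. It is not the package's verification code: find_mismatches is a hypothetical name, and math.isclose is used as a stand-in for whatever comparison the package performs.

```python
import math
from typing import Dict, List


def find_mismatches(cached: Dict[str, float], fresh: Dict[str, float],
                    rtol: float = 0.0001, vol_rtol: float = 0.005) -> List[str]:
    """Return the columns whose cached value differs from the fresh value.

    Volume gets its own, looser tolerance; all other columns use rtol.
    """
    bad = []
    for col, cached_val in cached.items():
        tol = vol_rtol if col == "Volume" else rtol
        if not math.isclose(cached_val, fresh[col], rel_tol=tol):
            bad.append(col)
    return bad


if __name__ == "__main__":
    cached_row = {"Close": 100.0, "Volume": 1000}
    fresh_row = {"Close": 100.02, "Volume": 1004}
    print(find_mismatches(cached_row, fresh_row))
```

Here the Close differs by about 0.02%, which exceeds rtol=0.0001, while the 0.4% Volume change stays inside vol_rtol=0.005.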

With the latest version, the only genuine differences you should see are tiny Volume differences (~0.5%). It seems Yahoo is still adjusting Volume more than 24 hours after the day ended, e.g. updating Monday's Volume on Wednesday.

If you see big differences in the OHLC prices of recent intervals (last few days), Yahoo is probably wrong! At least in my experience, Yahoo has messed up its data some time after that price data was fetched on the day or the day after. Cross-check against TradingView or the stock exchange website.

Performance

For each ticker, YFC basically performs 2 tasks:

1 - check if fetch needed

2 - fetch data and integrate into cache

Throughput on 1 thread of a decent CPU: task 1 @ ~60/sec, task 2 @ ~5/sec.
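As a back-of-envelope use of these figures, the sketch below estimates single-threaded run time for a batch of tickers. The rates are the approximate numbers quoted above; the function name is hypothetical, and the model assumes every ticker is checked while only some need a fetch.

```python
CHECK_RATE = 60.0  # task 1: approximate checks per second
FETCH_RATE = 5.0   # task 2: approximate fetch-and-integrate operations per second


def estimate_seconds(n_tickers: int, n_needing_fetch: int) -> float:
    """Rough single-thread time: every ticker is checked, some are fetched."""
    if n_needing_fetch > n_tickers:
        raise ValueError("n_needing_fetch cannot exceed n_tickers")
    return n_tickers / CHECK_RATE + n_needing_fetch / FETCH_RATE


if __name__ == "__main__":
    # 600 cached tickers, of which 100 need new data
    print(estimate_seconds(600, 100))
```

Under these assumptions the fetch step dominates: 600 checks take about 10 seconds, while 100 fetches take about 20 seconds.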

Installation

Available on PyPI: pip install yfinance_cache

Limitations

  • only price data is checked for whether a refresh is needed
  • intraday pre/post-market price data is not available
