ETF holdings scraper
ETFTracker
ETFTracker is a small Python package for collecting ETF holdings data, normalizing the results, and storing the holdings in a local DuckDB database for reuse.
Features
- Scrapes ETF holdings tables into a dataframe with symbol, name, weight, shares_owned, and shares_value.
- Normalizes holdings data into a consistent schema with etf_ticker and collected_at.
- Stores companies separately from ETF holdings so repeated company metadata is not copied for every ETF.
- Stores holdings in DuckDB with parsed numeric fields for percentages, share counts, and dollar values.
- Uses PRIMARY KEY (etf_ticker, collected_at, symbol) so ETF holdings can be tracked historically across scrape runs.
- Reads holdings back from DuckDB with a simple ETF ticker filter.
- Deletes individual holdings rows by etf_ticker and symbol.
- Refreshes stale data automatically based on a configurable stale_threshold.
- Supports fetching one ETF or multiple ETFs in a single call.
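The parsed numeric fields mentioned above can be pictured with a small sketch. The helper names parse_percent and parse_number are hypothetical, as is the assumption that raw values look like "7.25%" or "$1,234,567.89"; the package's own parsing may differ.

```python
import re


def parse_percent(text):
    """Turn a string such as '7.25%' into 7.25; return None if it cannot be parsed."""
    if text is None:
        return None
    cleaned = text.strip().rstrip("%").replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None


def parse_number(text):
    """Turn a string such as '$1,234,567.89' into a float; return None if it cannot be parsed."""
    if text is None:
        return None
    # Drop dollar signs, thousands separators, and whitespace before converting.
    cleaned = re.sub(r"[$,\s]", "", text)
    try:
        return float(cleaned)
    except ValueError:
        return None
```

Returning None for unparseable input keeps the raw text column usable even when the numeric column is empty, which matches the schema's pairing of text and numeric fields.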
Installation
pip install etftracker
Command Line
After installation, the package exposes an etftracker command.
Fetch one ticker and print the normalized holdings to stdout:
etftracker SPY --headless
Fetch multiple tickers and write the result to CSV:
etftracker SPY VTI VOO --headless --csv holdings.csv
Force a fresh scrape instead of using cached database rows:
etftracker SPY --headless --force-update
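As a rough illustration of how the flags shown above fit together, here is an argparse sketch. It is not the package's actual parser: the function name build_parser, the help strings, and the defaults are assumptions; only the flag names and the positional tickers come from the examples above.

```python
import argparse


def build_parser():
    """Build a parser that accepts the same flags as the examples above (hypothetical)."""
    parser = argparse.ArgumentParser(prog="etftracker")
    parser.add_argument("tickers", nargs="+", help="one or more ETF tickers")
    parser.add_argument("--headless", action="store_true", help="run the browser without a window")
    parser.add_argument("--csv", metavar="PATH", default=None, help="write holdings to this CSV file")
    parser.add_argument("--force-update", action="store_true", help="scrape even if cached rows exist")
    return parser


# Parse the multi-ticker CSV example from above.
args = build_parser().parse_args(["SPY", "VTI", "VOO", "--headless", "--csv", "holdings.csv"])
```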
Requirements
- Python 3.12+
- Firefox installed for Selenium WebDriver usage
- A working geckodriver / Selenium Firefox setup on the machine
Quick Start
import datetime as dt
from etftracker import get_etf_holdings
df = get_etf_holdings("SPY", stale_threshold=dt.timedelta(days=7))
print(df.head())
Fetch multiple ETFs:
from etftracker import get_etf_holdings
df = get_etf_holdings(["SPY", "VTI", "VOO"])
print(df[["etf_ticker", "symbol", "name"]].head())
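The stale_threshold argument suggests a simple age check on the most recent scrape. The sketch below shows one way such a check could work; is_stale and its now parameter are hypothetical and are not part of the documented API.

```python
import datetime as dt


def is_stale(collected_at, stale_threshold, now=None):
    """Return True when the snapshot is older than the allowed threshold."""
    if now is None:
        now = dt.datetime.now()
    return (now - collected_at) > stale_threshold


# Fixed timestamps keep the example deterministic.
now = dt.datetime(2024, 1, 10)
week = dt.timedelta(days=7)
fresh = is_stale(dt.datetime(2024, 1, 8), week, now=now)  # 2 days old
old = is_stale(dt.datetime(2024, 1, 1), week, now=now)    # 9 days old
```

Under this reading, a stale result would trigger a new scrape, while a fresh one would be served from the database.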
Database Helpers
Read holdings for a single ETF:
from etftracker import read_holdings
df = read_holdings("SPY")
Read historical holdings for a single ETF:
from etftracker import read_holdings_history
df = read_holdings_history("SPY")
Delete a single holding:
from etftracker import delete_holding
deleted = delete_holding("SPY", "AAPL")
print(deleted)
Delete all holdings for one or more ETFs:
from etftracker import delete_etf_holdings
deleted = delete_etf_holdings(["SPY", "VTI"])
print(deleted)
Delete every holding row in the database:
from etftracker import delete_all_holdings
deleted = delete_all_holdings()
print(deleted)
Save a freshly scraped dataframe manually:
from etftracker import pipeline, save_holdings
df = pipeline("SPY")
save_holdings(df, "SPY")
For a quick script entry point, the repository also includes main.py, but the
packaged interface is the etftracker command above.
Data Model
The DuckDB schema stores company metadata once:
CREATE TABLE companies (
symbol TEXT PRIMARY KEY,
name TEXT NOT NULL
);
ETF holdings reference those company symbols and keep one row per ETF, scrape timestamp, and holding symbol:
CREATE TABLE etf_holdings (
etf_ticker TEXT NOT NULL,
collected_at TIMESTAMP NOT NULL,
symbol TEXT NOT NULL,
weight TEXT,
weight_pct DOUBLE,
shares_owned TEXT,
shares_owned_num DOUBLE,
shares_value TEXT,
shares_value_num DOUBLE,
PRIMARY KEY (etf_ticker, collected_at, symbol),
FOREIGN KEY (symbol) REFERENCES companies(symbol)
);
read_holdings() returns the latest snapshot joined with company names. The
returned dataframe includes:
- etf_ticker
- collected_at
- symbol
- name
- weight
- weight_pct
- shares_owned
- shares_owned_num
- shares_value
- shares_value_num
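The "latest snapshot joined with company names" idea can be sketched as a query. To keep the example dependency-free it uses Python's built-in sqlite3 module as a stand-in for DuckDB, with a trimmed version of the schema. The function latest_snapshot, the query itself, and the sample rows are illustrative assumptions, not the package's implementation or real holdings data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE companies (symbol TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE etf_holdings (
        etf_ticker TEXT NOT NULL,
        collected_at TIMESTAMP NOT NULL,
        symbol TEXT NOT NULL,
        weight_pct DOUBLE,
        PRIMARY KEY (etf_ticker, collected_at, symbol),
        FOREIGN KEY (symbol) REFERENCES companies(symbol)
    );
    """
)

# Dummy rows: two scrape runs for the same ETF (values are made up).
conn.executemany(
    "INSERT INTO companies VALUES (?, ?)",
    [("AAPL", "Apple Inc."), ("MSFT", "Microsoft Corp.")],
)
conn.executemany(
    "INSERT INTO etf_holdings VALUES (?, ?, ?, ?)",
    [
        ("SPY", "2024-01-01 00:00:00", "AAPL", 7.0),
        ("SPY", "2024-01-01 00:00:00", "MSFT", 6.5),
        ("SPY", "2024-01-08 00:00:00", "AAPL", 7.2),
        ("SPY", "2024-01-08 00:00:00", "MSFT", 6.8),
    ],
)


def latest_snapshot(conn, etf_ticker):
    """Return rows from the most recent scrape of one ETF, joined with company names."""
    query = """
        SELECT h.etf_ticker, h.collected_at, h.symbol, c.name, h.weight_pct
        FROM etf_holdings AS h
        JOIN companies AS c ON c.symbol = h.symbol
        WHERE h.etf_ticker = ?
          AND h.collected_at = (
              SELECT MAX(collected_at) FROM etf_holdings WHERE etf_ticker = ?
          )
        ORDER BY h.symbol
    """
    return conn.execute(query, (etf_ticker, etf_ticker)).fetchall()


rows = latest_snapshot(conn, "SPY")
```

Because collected_at is part of the primary key, older scrape runs stay in the table and only the filter on the maximum timestamp decides which snapshot is returned.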
Database Location
By default the DuckDB database is stored in a user config directory:
- Linux/BSD: ~/.config/etftracker/etftracker.duckdb
- macOS: ~/Library/Application Support/etftracker/etftracker.duckdb
- Windows: %APPDATA%/etftracker/etftracker.duckdb
You can override that path with the ETFTRACKER_DB environment variable.
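The lookup order described above (environment override first, then a per-platform default) could be sketched as follows. The function default_db_path and its parameters are hypothetical, and the fallback used when APPDATA is unset is an assumption; the package may resolve paths differently, for example by honoring XDG_CONFIG_HOME.

```python
import os
import sys
from pathlib import Path


def default_db_path(platform=None, env=None, home=None):
    """Resolve the database path: ETFTRACKER_DB wins, otherwise a per-platform default."""
    platform = platform or sys.platform
    env = os.environ if env is None else env
    home = Path(home) if home is not None else Path.home()

    override = env.get("ETFTRACKER_DB")
    if override:
        return Path(override)

    if platform.startswith("win"):
        # Assumed fallback when APPDATA is missing from the environment.
        base = Path(env.get("APPDATA", home / "AppData" / "Roaming"))
    elif platform == "darwin":
        base = home / "Library" / "Application Support"
    else:
        base = home / ".config"
    return base / "etftracker" / "etftracker.duckdb"
```

Passing platform, env, and home explicitly makes the function easy to exercise without touching the real environment.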
License
This repository's source code is licensed under the MIT License. See LICENSE.
Third-Party Data Notice
This package license applies only to the code in this repository. It does not grant any rights to data obtained from third-party websites or services.
Users are responsible for ensuring their use of this package complies with the terms of service, contracts, licenses, and other restrictions that apply to any data source they access.
This repository does not ship third-party holdings datasets, cached database files, or redistributed source data as part of the package itself.