Historical financial data management system with support for equities, indices, and multi-asset data
FinBase - Historical Financial Data Management
A comprehensive system for managing historical financial time series data across multiple asset classes. Built for quantitative researchers, traders, and financial engineers who need reliable, well-organized market data.
Purpose
FinBase is a data management layer designed to:
- Download and store historical OHLCV data from multiple sources
- Maintain a centralized SQLite database of time series
- Track index constituents
- Manage risk factor groups (equities, indices, FX, rates, commodities)
- Provide a clean API for data access by analysis projects
Philosophy: Separate data acquisition from data analysis. FinBase handles the messy work of downloading, validating, and organizing financial data so your analysis code stays clean.
Features
Core Capabilities
- Multi-Asset Support: Equities, indices, FX (planned), rates (planned), commodities (planned)
- Index Management: Track constituents for SP500, DOW30, NASDAQ-100, FTSE 100, DAX
- Temporal Tracking: Historical point-in-time index composition queries
- Smart Loading: Automatic skip of existing data with resumable downloads
- Rate Limiting: Conservative API throttling to respect data provider limits
- Data Quality: Metadata tracking, audit trails, validation
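The smart-loading and rate-limiting ideas above can be pictured with a minimal sketch. The names `RateLimiter`, `load_missing`, and `fetch` are hypothetical and are not FinBase's API; the sketch only assumes that a loader skips symbols that are already stored and waits a minimum interval between provider calls.

```python
import time


class RateLimiter:
    """Enforce a minimum interval between calls (hypothetical helper)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        if self._last_call is not None:
            remaining = self.min_interval - (self._clock() - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
        self._last_call = self._clock()


def load_missing(symbols, existing, fetch, limiter):
    """Fetch only symbols not already stored; return the newly loaded ones."""
    loaded = []
    for symbol in symbols:
        if symbol in existing:
            continue  # already stored, so no download is needed
        limiter.wait()
        fetch(symbol)
        existing.add(symbol)  # a rerun after an interruption resumes from here
        loaded.append(symbol)
    return loaded


stored = {"AAPL"}  # stand-in for symbols already in the database
demo_limiter = RateLimiter(min_interval=0.0)
print(load_missing(["AAPL", "MSFT"], stored, fetch=lambda s: None, limiter=demo_limiter))
```

Because the set of stored symbols is updated after each successful fetch, rerunning the same call after a failure only requests what is still missing.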
Index Support (v0.1.0)
| Index | Constituents | Country | Data Source |
|---|---|---|---|
| S&P 500 | 503 | 🇺🇸 US | Wikipedia |
| DOW 30 | 30 | 🇺🇸 US | Wikipedia |
| NASDAQ-100 | 101 | 🇺🇸 US | Wikipedia |
| FTSE 100 | 100 | 🇬🇧 UK | Wikipedia |
| DAX | 41 | 🇩🇪 Germany | Wikipedia |
Data Sources Support
- YFinance: Equity and index data (current)
- FRED API: US Treasury rates, economic indicators (planned)
- Alpha Vantage: FX, commodity, and alternative equity data (planned)
- Polygon.io: Alternative equity data (planned)
Quick Start
Installation
Option 1: Conda (Recommended)
git clone https://github.com/yourusername/finbase.git
cd finbase
conda env create -f environment.yml
conda activate finbase
Option 2: Pip
git clone https://github.com/yourusername/finbase.git
cd finbase
pip install -e .
# Or with extras
pip install -e ".[dev,dashboard]"
Basic Usage
1. Initialize Database
# Creates ~/.finbase/timeseries.db and ~/.finbaserc
python scripts/setup_database.py --init
2. Update Index Constituents
# Get current index memberships from Wikipedia
python scripts/setup_database.py --update-index SP500
python scripts/setup_database.py --update-index DOW30
# Or update all at once
python scripts/setup_database.py --update-all-indices
3. Download Historical Data
# Load price data for all DOW30 constituents
python scripts/setup_database.py --load-index-data DOW30
# Load SP500 from 2020 (faster than full history)
python scripts/setup_database.py --load-index-data SP500 --index-start-date 2020-01-01
# Test with first 10 stocks
python scripts/setup_database.py --load-index-data SP500 --index-max-symbols 10
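If the loading step needs to be driven from Python rather than a shell, the command line can be assembled programmatically. This is a sketch only: `build_load_command` is a hypothetical helper, it uses just the flags shown above, and it assumes that `--index-start-date` and `--index-max-symbols` can be combined in one call.

```python
import sys


def build_load_command(index_code, start_date=None, max_symbols=None):
    """Build the argv list for scripts/setup_database.py --load-index-data."""
    cmd = [sys.executable, "scripts/setup_database.py", "--load-index-data", index_code]
    if start_date is not None:
        cmd += ["--index-start-date", start_date]
    if max_symbols is not None:
        cmd += ["--index-max-symbols", str(max_symbols)]
    return cmd


# Print the arguments that would be passed to the script.
print(build_load_command("SP500", start_date="2020-01-01", max_symbols=10)[1:])
```

Passing the result to `subprocess.run(cmd, check=True)` from the repository root would run the script and raise if it exits with an error.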
4. Access Data via API
from finbase import DataClient
client = DataClient()
# Get closing prices for portfolio
portfolio = ['AAPL', 'MSFT', 'GOOGL', 'AMZN']
prices = client.get_closes(portfolio, start='2020-01-01')
# Get all DOW30 constituents
dow30 = client.get_index_constituents('DOW30')
dow30_prices = client.get_closes(dow30['symbol'].tolist())
# Calculate returns
returns = prices.pct_change()
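For readers unfamiliar with `pct_change()`, the calculation is the simple return r_t = P_t / P_(t-1) - 1 applied down each column. The sketch below reproduces it in plain Python on made-up prices; `simple_returns` is a hypothetical name used only for illustration.

```python
def simple_returns(prices):
    """Return period-over-period simple returns for a list of prices."""
    if not prices:
        return []
    returns = [None]  # the first observation has no predecessor
    for previous, current in zip(prices, prices[1:]):
        returns.append(current / previous - 1.0)
    return returns


closes = [100.0, 102.0, 99.96]  # made-up closing prices
print(simple_returns(closes))
```

pandas reports the first row as NaN rather than `None`, so analysis code usually drops it before computing statistics.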
Project Structure
finbase/
├── src/                          # Source code
│   ├── client/                   # DataClient API for external projects
│   ├── config/                   # Configuration management
│   ├── data/
│   │   ├── database/             # TimeSeriesDB, IndexDB, schema
│   │   ├── loaders/              # EquityLoader (YFinance)
│   │   ├── parsers/              # Wikipedia parsers
│   │   ├── risk_factor_groups/   # Risk factor group management
│   │   └── validators/           # Data validation
│   ├── dashboard/                # Optional Streamlit dashboard
│   └── utils/                    # Logging utilities
│
├── scripts/                      # Command-line scripts
│   └── setup_database.py         # Main data loading script
│
├── data/                         # Data files (created on init)
│   ├── risk_factor_groups/       # JSON group definitions
│   └── index_configs/            # Index configuration files
│
├── examples/                     # Usage examples
│   ├── client_api_examples.py
│   ├── index_management_example.py
│   └── load_index_data_example.py
│
├── tests/                        # Unit tests
└── docs/                         # Quick start guides
User space (created on init):
~/.finbase/
└── timeseries.db # SQLite database (shared with other projects)
~/.finbaserc # User configuration (YAML)
Documentation
- QUICK_START_INDEX_DATA.md - Loading index data guide
- QUICKSTART_INDEX_MANAGEMENT.md - Managing indices
- DASHBOARD.md - Running the web dashboard
- CHANGELOG.md - Version history
Key Concepts
Database Schema
risk_factors: Master table with metadata
- symbol, asset_class, asset_subclass
- description, country, currency, sector
- data_source (yfinance, fred, etc.)
- frequency, start_date, end_date
timeseries_data: OHLCV price data
- risk_factor_id (FK), date
- open, high, low, close, adj_close, volume
- Optimized indexes for fast queries
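To make the relationship between the two tables concrete, here is a minimal in-memory sketch. The DDL is an assumption based on the column lists above (types, constraints, and the `id` key are guesses, and only a few `risk_factors` columns are included); the real schema lives in the package. The prices are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE risk_factors (
        id INTEGER PRIMARY KEY,
        symbol TEXT NOT NULL UNIQUE,
        asset_class TEXT,
        data_source TEXT
    );
    CREATE TABLE timeseries_data (
        risk_factor_id INTEGER NOT NULL REFERENCES risk_factors(id),
        date TEXT NOT NULL,  -- ISO dates sort correctly as text
        open REAL, high REAL, low REAL, close REAL, adj_close REAL,
        volume INTEGER,
        PRIMARY KEY (risk_factor_id, date)  -- also serves as the lookup index
    );
    """
)
conn.execute(
    "INSERT INTO risk_factors (id, symbol, asset_class, data_source) "
    "VALUES (1, 'AAPL', 'equity', 'yfinance')"
)
conn.executemany(
    "INSERT INTO timeseries_data VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "2020-01-02", 10.0, 11.0, 9.5, 10.5, 10.4, 1000),
        (1, "2020-01-03", 10.5, 10.8, 10.1, 10.2, 10.1, 1200),
    ],
)

# Look up closes by symbol and start date through the foreign key.
rows = conn.execute(
    """
    SELECT t.date, t.close
    FROM timeseries_data AS t
    JOIN risk_factors AS r ON r.id = t.risk_factor_id
    WHERE r.symbol = ? AND t.date >= ?
    ORDER BY t.date
    """,
    ("AAPL", "2020-01-03"),
).fetchall()
print(rows)
```

Keying the price table on `(risk_factor_id, date)` lets SQLite answer per-symbol date-range queries from the primary-key index without scanning the whole table.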
indices: Index metadata
- index_code, index_name, country
- data_source, last_updated
index_constituents: Temporal membership tracking
- index_id, symbol, effective_date, end_date
- Slowly changing dimension pattern for historical queries
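A point-in-time membership query over this kind of table can be sketched as follows. The table layout, the convention that a NULL `end_date` means "still a member", the inclusive end date, and the symbols are all assumptions made for illustration; `members_as_of` is a hypothetical helper, not part of FinBase.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE index_constituents (
        index_id INTEGER NOT NULL,
        symbol TEXT NOT NULL,
        effective_date TEXT NOT NULL,
        end_date TEXT  -- NULL is assumed to mean the membership is still open
    )
    """
)
conn.executemany(
    "INSERT INTO index_constituents VALUES (?, ?, ?, ?)",
    [
        (1, "AAA", "2015-01-01", None),
        (1, "BBB", "2015-01-01", "2019-06-30"),  # removed from the index
        (1, "CCC", "2019-07-01", None),          # added as the replacement
    ],
)


def members_as_of(conn, index_id, as_of_date):
    """Return symbols whose membership interval covers as_of_date."""
    rows = conn.execute(
        """
        SELECT symbol
        FROM index_constituents
        WHERE index_id = ?
          AND effective_date <= ?
          AND (end_date IS NULL OR end_date >= ?)
        ORDER BY symbol
        """,
        (index_id, as_of_date, as_of_date),
    ).fetchall()
    return [row[0] for row in rows]


print(members_as_of(conn, 1, "2018-01-01"))
print(members_as_of(conn, 1, "2020-01-01"))
```

Each membership change closes one row and opens another instead of overwriting, which is what allows a backtest to see the index as it stood on a past date.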
DataClient API
The recommended way to access data from external projects:
from finbase import DataClient
client = DataClient()
# Discovery
stats = client.get_stats()
symbols = client.list_symbols(asset_class='equity', sector='Technology')
info = client.get_symbol_info('AAPL')
# Data Retrieval (long format)
df = client.get_data(['AAPL', 'MSFT'], start='2020-01-01')
# Data Retrieval (wide format for analysis)
prices = client.get_closes(['AAPL', 'MSFT'], start='2020-01-01')
# Index Queries
sp500 = client.get_index_constituents('SP500')
sp500_2020 = client.get_index_constituents('SP500', as_of_date='2020-01-01')
# Bulk Retrieval
tech_stocks = client.get_by_sector('Technology')
See examples/client_api_examples.py for comprehensive usage.
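The difference between the long and wide layouts can be shown without any dependencies. In this sketch the long rows are assumed to be `(date, symbol, close)` tuples with made-up values; the actual columns returned by `get_data` may differ, and `to_wide` is a hypothetical helper.

```python
from collections import defaultdict

# Long format: one row per (date, symbol) observation.
long_rows = [
    ("2020-01-02", "AAPL", 10.0),
    ("2020-01-02", "MSFT", 20.0),
    ("2020-01-03", "AAPL", 11.0),
    ("2020-01-03", "MSFT", 19.0),
]


def to_wide(rows):
    """Pivot long rows into {date: {symbol: close}}, ordered by date."""
    wide = defaultdict(dict)
    for date, symbol, close in rows:
        wide[date][symbol] = close
    return dict(sorted(wide.items()))


wide = to_wide(long_rows)
for date, closes in wide.items():
    print(date, closes)
```

The wide layout puts one date per row and one symbol per column, which is the shape that column-wise operations such as return calculations expect. With pandas, `DataFrame.pivot(index=..., columns=..., values=...)` performs the same reshaping.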
Database Performance
- The SQLite backend targets datasets of under 1M records
- A typical portfolio (100 stocks, 20 years of daily bars) comes to roughly 500K records
- For larger datasets, a migration to DuckDB is planned for v0.3.0
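The record estimate is simple arithmetic and is easy to rerun for other universes. The sketch assumes daily bars and about 252 trading days per year, a common approximation for US markets; `estimated_rows` is a hypothetical helper.

```python
TRADING_DAYS_PER_YEAR = 252  # assumption: typical US trading calendar


def estimated_rows(n_symbols, years, days_per_year=TRADING_DAYS_PER_YEAR):
    """Estimate the number of daily OHLCV rows for a universe of symbols."""
    return n_symbols * years * days_per_year


print(estimated_rows(100, 20))  # 504000
print(estimated_rows(500, 20))  # 2520000
```

Under the same assumptions, 20 years of history for a 500-stock universe already exceeds 1M records.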
Advanced Usage
Adding New Indices
Create a config file in data/index_configs/:
{
"index_code": "FTSE250",
"index_name": "FTSE 250",
"url": "https://en.wikipedia.org/wiki/FTSE_250_Index",
"country": "GB",
"asset_class": "equity",
"data_source": "wikipedia",
"constituents_table": {
"table_index": 2,
"column_mapping": {
"Company": "company_name",
"Ticker": "symbol"
}
}
}
Then run: python scripts/setup_database.py --update-index FTSE250
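One way to read the `column_mapping` block is as a rename from the Wikipedia table's headers to database field names. The sketch below illustrates that reading; which keys count as required, the choice to drop unmapped columns, and the sample row are assumptions, and both function names are hypothetical.

```python
import json

# Assumption: these keys are treated as mandatory for this illustration.
REQUIRED_KEYS = {"index_code", "index_name", "url", "constituents_table"}


def load_index_config(text):
    """Parse an index config and check that the assumed required keys exist."""
    config = json.loads(text)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return config


def apply_column_mapping(rows, mapping):
    """Rename source columns to target names, dropping unmapped columns."""
    return [
        {target: row[source] for source, target in mapping.items()}
        for row in rows
    ]


config = load_index_config(
    """
    {
      "index_code": "FTSE250",
      "index_name": "FTSE 250",
      "url": "https://en.wikipedia.org/wiki/FTSE_250_Index",
      "constituents_table": {
        "table_index": 2,
        "column_mapping": {"Company": "company_name", "Ticker": "symbol"}
      }
    }
    """
)
# Made-up row standing in for one parsed table row.
scraped = [{"Company": "Example plc", "Ticker": "EXM", "Sector": "Industrials"}]
records = apply_column_mapping(scraped, config["constituents_table"]["column_mapping"])
print(records)
```

Validating the file before scraping surfaces a misspelled key as a clear error instead of a failure partway through an update.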
Custom Risk Factor Groups
Create JSON files in data/risk_factor_groups/:
{
"group_name": "tech_giants",
"asset_class": "equity",
"asset_subclass": "stock",
"data_source": "yfinance",
"frequency": "daily",
"risk_factors": [
{
"symbol": "AAPL",
"description": "Apple Inc.",
"country": "US",
"currency": "USD",
"sector": "Technology"
}
]
}
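A group file like this pairs group-level fields with per-symbol entries. The sketch below flattens one into rows shaped like the `risk_factors` table, assuming that each entry inherits the group-level `asset_class`, `asset_subclass`, `data_source`, and `frequency`. That inheritance rule and the name `flatten_group` are assumptions, not documented FinBase behavior.

```python
import json

GROUP_LEVEL_FIELDS = ("asset_class", "asset_subclass", "data_source", "frequency")


def flatten_group(text):
    """Return one dict per risk factor, merged with the group-level fields."""
    group = json.loads(text)
    shared = {key: group[key] for key in GROUP_LEVEL_FIELDS}
    # Entry-level values win if a key appears at both levels.
    return [{**shared, **factor} for factor in group["risk_factors"]]


rows = flatten_group(
    """
    {
      "group_name": "tech_giants",
      "asset_class": "equity",
      "asset_subclass": "stock",
      "data_source": "yfinance",
      "frequency": "daily",
      "risk_factors": [
        {"symbol": "AAPL", "description": "Apple Inc.", "country": "US",
         "currency": "USD", "sector": "Technology"}
      ]
    }
    """
)
print(rows[0])
```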
Running the Dashboard
# Install dashboard dependencies
pip install -e ".[dashboard]"
# Run Streamlit dashboard
streamlit run dashboard_app.py
Development
Running Tests
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# With coverage
pytest --cov=src tests/
Project Status
- ✅ Core database system
- ✅ Index management (5 major indices)
- ✅ DataClient API
- ✅ Smart loading with rate limiting
- ✅ Dashboard
- ⏳ FX data support (planned v0.2.0)
- ⏳ Rates data via FRED (planned v0.2.0)
- ⏳ Alternative data sources (planned v0.3.0)
- ⏳ DuckDB migration (planned v0.3.0)
License
This project is licensed under the MIT License - see the LICENSE file for details.