
Historical financial data management system with support for equities, indices, and multi-asset data


FinBase - Historical Financial Data Management

Python 3.12 | License: MIT

A comprehensive system for managing historical financial time series data across multiple asset classes. Built for quantitative researchers, traders, and financial engineers who need reliable, well-organized market data.

🎯 Purpose

FinBase is a data management layer designed to:

  • Download and store historical OHLCV data from multiple sources
  • Maintain a centralized SQLite database of time series
  • Track index constituents
  • Manage risk factor groups (equities, indices, FX, rates, commodities)
  • Provide a clean API for data access by analysis projects

Philosophy: Separate data acquisition from data analysis. FinBase handles the messy work of downloading, validating, and organizing financial data so your analysis code stays clean.

✨ Features

Core Capabilities

  • Multi-Asset Support: Equities, indices, FX (planned), rates (planned), commodities (planned)
  • Index Management: Track constituents for SP500, DOW30, NASDAQ-100, FTSE 100, DAX
  • Temporal Tracking: Historical point-in-time index composition queries
  • Smart Loading: Automatically skips data already in the database, so interrupted downloads can be resumed
  • Rate Limiting: Conservative API throttling to respect data provider limits
  • Data Quality: Metadata tracking, audit trails, validation

Index Support (v0.1.0)

Index       | Constituents | Country    | Data Source
S&P 500     | 503          | 🇺🇸 US      | Wikipedia
DOW 30      | 30           | 🇺🇸 US      | Wikipedia
NASDAQ-100  | 101          | 🇺🇸 US      | Wikipedia
FTSE 100    | 100          | 🇬🇧 UK      | Wikipedia
DAX         | 41           | 🇩🇪 Germany | Wikipedia

Data Sources Support

  • YFinance: Equity and index data (current)
  • FRED API: US Treasury rates, economic indicators (planned)
  • Alpha Vantage: FX, commodity, and alternative equity data (planned)
  • Polygon.io: Alternative equity data (planned)

🚀 Quick Start

Installation

Option 1: Conda (Recommended)

git clone https://github.com/yourusername/finbase.git
cd finbase
conda env create -f environment.yml
conda activate finbase

Option 2: Pip

git clone https://github.com/yourusername/finbase.git
cd finbase
pip install -e .

# Or with extras
pip install -e ".[dev,dashboard]"

Basic Usage

1. Initialize Database

# Creates ~/.finbase/timeseries.db and ~/.finbaserc
python scripts/setup_database.py --init
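To confirm that initialization created the database, it can be inspected with Python's standard `sqlite3` module. This sketch only assumes the default path `~/.finbase/timeseries.db` mentioned above; the helper name `list_tables` is hypothetical, and the table names it reports are whatever FinBase actually created.

```python
import sqlite3
from pathlib import Path
from typing import List

DEFAULT_DB = Path.home() / ".finbase" / "timeseries.db"  # default location from the README


def list_tables(db_path: Path = DEFAULT_DB) -> List[str]:
    """Return the user-defined table names in a SQLite file, sorted alphabetically."""
    if not db_path.exists():
        raise FileNotFoundError(f"{db_path} not found; run setup_database.py --init first")
    # Open read-only through a URI so the inspection cannot create or modify the file.
    conn = sqlite3.connect(db_path.resolve().as_uri() + "?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling `list_tables()` with no argument inspects the default database; opening it in read-only mode avoids accidentally creating an empty file when the path is wrong.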

2. Update Index Constituents

# Get current index memberships from Wikipedia
python scripts/setup_database.py --update-index SP500
python scripts/setup_database.py --update-index DOW30

# Or update all at once
python scripts/setup_database.py --update-all-indices

3. Download Historical Data

# Load price data for all DOW30 constituents
python scripts/setup_database.py --load-index-data DOW30

# Load SP500 from 2020 (faster than full history)
python scripts/setup_database.py --load-index-data SP500 --index-start-date 2020-01-01

# Test with first 10 stocks
python scripts/setup_database.py --load-index-data SP500 --index-max-symbols 10

4. Access Data via API

from finbase import DataClient

client = DataClient()

# Get closing prices for a portfolio
portfolio = ['AAPL', 'MSFT', 'GOOGL', 'AMZN']
prices = client.get_closes(portfolio, start='2020-01-01')

# Get all DOW30 constituents
dow30 = client.get_index_constituents('DOW30')
dow30_prices = client.get_closes(dow30['symbol'].tolist())

# Calculate returns
returns = prices.pct_change()
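Assuming `prices` is a pandas DataFrame, as the `.pct_change()` call suggests, that call computes simple period-over-period returns, r_t = P_t / P_{t-1} - 1. The dependency-free sketch below spells out the same arithmetic for a single price series; it is an illustration rather than FinBase code, and the function name is hypothetical. The first period has no prior price, so pandas reports NaN there and this sketch returns None.

```python
from typing import List, Optional


def simple_returns(closes: List[float]) -> List[Optional[float]]:
    """Simple returns r_t = P_t / P_{t-1} - 1; the first element has no prior price."""
    returns: List[Optional[float]] = [None]
    for prev, curr in zip(closes, closes[1:]):
        returns.append(curr / prev - 1.0)
    return returns


rounded = [None if r is None else round(r, 4) for r in simple_returns([100.0, 110.0, 99.0])]
print(rounded)  # [None, 0.1, -0.1]
```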

📊 Project Structure

finbase/
├── src/                          # Source code
│   ├── client/                   # DataClient API for external projects
│   ├── config/                   # Configuration management
│   ├── data/
│   │   ├── database/             # TimeSeriesDB, IndexDB, schema
│   │   ├── loaders/              # EquityLoader (YFinance)
│   │   ├── parsers/              # Wikipedia parsers
│   │   ├── risk_factor_groups/   # Risk factor group management
│   │   └── validators/           # Data validation
│   ├── dashboard/                # Optional Streamlit dashboard
│   └── utils/                    # Logging utilities
│
├── scripts/                      # Command-line scripts
│   └── setup_database.py         # Main data loading script
│
├── data/                         # Data files (created on init)
│   ├── risk_factor_groups/       # JSON group definitions
│   └── index_configs/            # Index configuration files
│
├── examples/                     # Usage examples
│   ├── client_api_examples.py
│   ├── index_management_example.py
│   └── load_index_data_example.py
│
├── tests/                        # Unit tests
└── docs/                         # Quick start guides

User space (created on init):
~/.finbase/
└── timeseries.db                 # SQLite database (shared with other projects)
~/.finbaserc                      # User configuration (YAML)

📖 Documentation

🔑 Key Concepts

Database Schema

risk_factors: Master table with metadata

  • symbol, asset_class, asset_subclass
  • description, country, currency, sector
  • data_source (yfinance, fred, etc.)
  • frequency, start_date, end_date

timeseries_data: OHLCV price data

  • risk_factor_id (FK), date
  • open, high, low, close, adj_close, volume
  • Optimized indexes for fast queries

indices: Index metadata

  • index_code, index_name, country
  • data_source, last_updated

index_constituents: Temporal membership tracking

  • index_id, symbol, effective_date, end_date
  • Slowly changing dimension pattern: each membership row is valid from effective_date until end_date, enabling point-in-time historical queries
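The four tables can be pictured with the hypothetical SQLite DDL below, written with Python's `sqlite3`. Column names come from the lists above; the types, keys, `id` columns, and the convention that a NULL `end_date` marks a current member are assumptions, not FinBase's actual schema. The query shows how the effective_date/end_date pair answers a point-in-time question such as "which symbols were in the index on 2020-01-01?".

```python
import sqlite3
from typing import List

# Hypothetical DDL: column names follow the README; types and keys are assumed.
SCHEMA = """
CREATE TABLE risk_factors (
    id INTEGER PRIMARY KEY,
    symbol TEXT NOT NULL UNIQUE,
    asset_class TEXT, asset_subclass TEXT,
    description TEXT, country TEXT, currency TEXT, sector TEXT,
    data_source TEXT, frequency TEXT, start_date TEXT, end_date TEXT
);
CREATE TABLE timeseries_data (
    risk_factor_id INTEGER NOT NULL REFERENCES risk_factors(id),
    date TEXT NOT NULL,
    open REAL, high REAL, low REAL, close REAL, adj_close REAL, volume INTEGER,
    PRIMARY KEY (risk_factor_id, date)
);
CREATE TABLE indices (
    id INTEGER PRIMARY KEY,
    index_code TEXT NOT NULL UNIQUE, index_name TEXT, country TEXT,
    data_source TEXT, last_updated TEXT
);
CREATE TABLE index_constituents (
    index_id INTEGER NOT NULL REFERENCES indices(id),
    symbol TEXT NOT NULL,
    effective_date TEXT NOT NULL,
    end_date TEXT  -- NULL while the symbol is still a member (assumed convention)
);
"""

# A membership row is valid on the half-open interval [effective_date, end_date).
AS_OF_QUERY = """
SELECT c.symbol
FROM index_constituents AS c
JOIN indices AS i ON i.id = c.index_id
WHERE i.index_code = ?
  AND c.effective_date <= ?
  AND (c.end_date IS NULL OR c.end_date > ?)
ORDER BY c.symbol
"""


def constituents_as_of(conn: sqlite3.Connection, index_code: str, as_of: str) -> List[str]:
    """Return the symbols that belonged to index_code on the ISO date as_of."""
    rows = conn.execute(AS_OF_QUERY, (index_code, as_of, as_of)).fetchall()
    return [symbol for (symbol,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO indices (id, index_code) VALUES (1, 'DOW30')")
    # Illustrative rows with made-up symbols and dates.
    conn.executemany(
        "INSERT INTO index_constituents VALUES (1, ?, ?, ?)",
        [("OLD", "2010-01-01", "2020-08-31"),
         ("NEW", "2020-08-31", None),
         ("STAY", "2015-03-19", None)],
    )
    print(constituents_as_of(conn, "DOW30", "2020-01-01"))  # ['OLD', 'STAY']
    print(constituents_as_of(conn, "DOW30", "2021-01-01"))  # ['NEW', 'STAY']
```

Storing ISO-8601 date strings keeps the comparisons correct, because such strings sort lexicographically in date order.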

DataClient API

The recommended way to access data from external projects:

from finbase import DataClient

client = DataClient()

# Discovery
stats = client.get_stats()
symbols = client.list_symbols(asset_class='equity', sector='Technology')
info = client.get_symbol_info('AAPL')

# Data Retrieval (long format)
df = client.get_data(['AAPL', 'MSFT'], start='2020-01-01')

# Data Retrieval (wide format for analysis)
prices = client.get_closes(['AAPL', 'MSFT'], start='2020-01-01')

# Index Queries
sp500 = client.get_index_constituents('SP500')
sp500_2020 = client.get_index_constituents('SP500', as_of_date='2020-01-01')

# Bulk Retrieval
tech_stocks = client.get_by_sector('Technology')

See examples/client_api_examples.py for comprehensive usage.
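The difference between the long format of `get_data` and the wide format of `get_closes` is a pivot: one row per (date, symbol) versus one row per date with a column per symbol. The stdlib sketch below shows only that reshaping idea; the `(date, symbol, close)` record layout and the function name are assumptions, not FinBase internals.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed long-format record: (ISO date, symbol, close), one row per symbol per date.
LongRow = Tuple[str, str, float]


def long_to_wide(
    rows: Iterable[LongRow],
) -> Tuple[List[str], List[str], List[List[Optional[float]]]]:
    """Pivot long rows into (dates, symbols, matrix) with one column per symbol.

    A (date, symbol) pair with no row becomes None, much as pandas leaves NaN.
    """
    cells: Dict[Tuple[str, str], float] = {}
    for day, symbol, close in rows:
        cells[(day, symbol)] = close
    dates = sorted({day for day, _ in cells})
    symbols = sorted({symbol for _, symbol in cells})
    matrix = [[cells.get((day, s)) for s in symbols] for day in dates]
    return dates, symbols, matrix
```

For example, closes for AAPL on two dates and MSFT on only the first produce a 2 × 2 matrix whose MSFT cell on the second date is None. If the long data is a pandas DataFrame with columns named `date`, `symbol` and `close` (an assumption), `df.pivot(index="date", columns="symbol", values="close")` performs the same reshaping.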

Database Performance

  • SQLite handles FinBase's current target scale (up to roughly 1M records) well
  • A typical portfolio of 100 stocks over 20 years (~252 trading days per year) is ~500K records
  • Migration to DuckDB for larger datasets is planned for v0.3.0
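The ~500K figure is simply symbols × years × trading days. The small helper below (its name is hypothetical, and 252 trading days per year is an approximation since exchange calendars vary) reproduces that estimate and shows how quickly a full S&P 500 history passes 1M records.

```python
TRADING_DAYS_PER_YEAR = 252  # approximate US equity trading calendar


def estimate_rows(n_symbols: int, years: float, days_per_year: int = TRADING_DAYS_PER_YEAR) -> int:
    """Approximate daily OHLCV rows: one row per symbol per trading day."""
    return round(n_symbols * years * days_per_year)


print(estimate_rows(100, 20))  # 504000, the "~500K records" portfolio
print(estimate_rows(503, 20))  # 2535120 for all S&P 500 constituents over 20 years
```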

🛠️ Advanced Usage

Adding New Indices

Create a config file in data/index_configs/:

{
  "index_code": "FTSE250",
  "index_name": "FTSE 250",
  "url": "https://en.wikipedia.org/wiki/FTSE_250_Index",
  "country": "GB",
  "asset_class": "equity",
  "data_source": "wikipedia",
  "constituents_table": {
    "table_index": 2,
    "column_mapping": {
      "Company": "company_name",
      "Ticker": "symbol"
    }
  }
}

Then run: python scripts/setup_database.py --update-index FTSE250
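As a rough picture of what such a config drives, the sketch below loads the file and applies `column_mapping` to rows scraped from the Wikipedia table, renaming source columns such as `Ticker` to FinBase fields such as `symbol`. It is not FinBase's parser: the set of required keys, the row-as-dict input, and both function names are assumptions.

```python
import json
from pathlib import Path
from typing import Dict, List

REQUIRED_KEYS = {"index_code", "index_name", "url", "constituents_table"}  # assumed minimum


def load_index_config(path: Path) -> dict:
    """Read an index config JSON file and check that the assumed required keys exist."""
    config = json.loads(path.read_text(encoding="utf-8"))
    missing = REQUIRED_KEYS.difference(config)
    if missing:
        raise ValueError(f"{path.name}: missing keys {sorted(missing)}")
    return config


def apply_column_mapping(rows: List[Dict[str, str]], config: dict) -> List[Dict[str, str]]:
    """Rename scraped table columns to FinBase field names.

    Columns absent from column_mapping (e.g. 'Sector' here) are dropped,
    and values are stripped of surrounding whitespace.
    """
    mapping = config["constituents_table"]["column_mapping"]
    return [
        {target: row[source].strip() for source, target in mapping.items() if source in row}
        for row in rows
    ]
```

Validating the config before any network request means a typo in the JSON fails fast instead of after a page download.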

Custom Risk Factor Groups

Create JSON files in data/risk_factor_groups/:

{
  "group_name": "tech_giants",
  "asset_class": "equity",
  "asset_subclass": "stock",
  "data_source": "yfinance",
  "frequency": "daily",
  "risk_factors": [
    {
      "symbol": "AAPL",
      "description": "Apple Inc.",
      "country": "US",
      "currency": "USD",
      "sector": "Technology"
    }
  ]
}
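One plausible way to read a group file is to let each entry in `risk_factors` inherit the group-level fields (`asset_class`, `asset_subclass`, `data_source`, `frequency`), producing records shaped like rows of the `risk_factors` table. That inheritance rule and the function name are assumptions for illustration, not documented FinBase behaviour.

```python
import json
from typing import Dict, List

GROUP_LEVEL_FIELDS = ("asset_class", "asset_subclass", "data_source", "frequency")


def flatten_group(group_json: str) -> List[Dict[str, str]]:
    """Expand a group definition into one dict per risk factor.

    Group-level fields act as defaults; a field set on an individual entry wins.
    """
    group = json.loads(group_json)
    defaults = {k: group[k] for k in GROUP_LEVEL_FIELDS if k in group}
    records = []
    for entry in group.get("risk_factors", []):
        if "symbol" not in entry:
            raise ValueError(f"{group.get('group_name', '?')}: entry without a symbol")
        records.append({**defaults, **entry})
    return records
```

Letting per-entry values override the group defaults keeps files short while still allowing, say, one ETF inside an otherwise all-stock group.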

Running the Dashboard

# Install dashboard dependencies
pip install -e ".[dashboard]"

# Run Streamlit dashboard
streamlit run dashboard_app.py

🧪 Development

Running Tests

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# With coverage
pytest --cov=src tests/

Project Status

  • ✅ Core database system
  • ✅ Index management (5 major indices)
  • ✅ DataClient API
  • ✅ Smart loading with rate limiting
  • ✅ Dashboard
  • ⏳ FX data support (planned v0.2.0)
  • ⏳ Rates data via FRED (planned v0.2.0)
  • ⏳ Alternative data sources (planned v0.3.0)
  • ⏳ DuckDB migration (planned v0.3.0)

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

Download files

Source Distribution

finbase-0.1.1.tar.gz (65.0 kB)

Built Distribution


finbase-0.1.1-py3-none-any.whl (49.4 kB)

File details

Details for the file finbase-0.1.1.tar.gz.

File metadata

  • Download URL: finbase-0.1.1.tar.gz
  • Size: 65.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for finbase-0.1.1.tar.gz
Algorithm Hash digest
SHA256 8de7e5c6a881bc50cce1a8f5f30515c23dabef83e22bcbf3ac44a10a41858921
MD5 192f5b88c76a22a3669c5b279227a924
BLAKE2b-256 9fb047619c2213b8cb216b1b4e4ab8caac317d28f85f2aa4e55aa923ddf21325


File details

Details for the file finbase-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: finbase-0.1.1-py3-none-any.whl
  • Size: 49.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for finbase-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 fc4a86de36749491fc81dc2bd5435f90bd4330be209fa5f113e96ec5849112b2
MD5 df09366f1753c18d9976156ff45e9064
BLAKE2b-256 33c122e11fa425a96e39ca2f92fe41cc6a99b1d2c027d53aacb97da20a48ecdc

