Elegant factor analysis for quantitative finance, built on Polars
factr
Factor library for quantitative finance built on Polars.
Overview
Composable factor definitions with automatic scope-based execution. Factors wrap Polars expressions with metadata to handle time-series vs cross-sectional operations correctly.
Installation
pip install git+https://github.com/gilvir/factr.git
For development (using uv):
git clone https://github.com/gilvir/factr.git
cd factr
uv pip install -e ".[dev]"
Or with pip:
pip install -e ".[dev]"
Quick Start
Basic Factor Composition
import polars as pl
from factr.datasets import EquityPricing
from factr.pipeline import Pipeline
# Load sample data
data = pl.DataFrame({
'date': ['2024-01-01'] * 3 + ['2024-01-02'] * 3,
'asset': ['AAPL', 'GOOGL', 'MSFT'] * 2,
'close': [150.0, 2800.0, 380.0, 152.0, 2820.0, 382.0],
'volume': [1e6, 2e6, 1.5e6, 1.1e6, 1.9e6, 1.6e6],
})
# Define factors using natural composition
close = EquityPricing.close
returns = close.pct_change(1)
momentum = returns.rolling_sum(252)
ranked = momentum.rank(pct=True)
# Build and run pipeline
pipeline = Pipeline(data).add_factors({
'momentum': momentum,
'rank': ranked,
})
result = pipeline.run()
print(result)
Sector-Neutral Strategy
from factr.datasets import EquityPricing
from factr.universe import Q500US
from factr.pipeline import Pipeline
from factr import factors as F
# Define factors
close = EquityPricing.close
momentum = F.momentum(252, 21) # 252-day momentum, skip last 21 days
volatility = F.volatility(60)
# Sector-neutral ranking
risk_adjusted = momentum / volatility
sector_neutral = risk_adjusted.demean(by='sector')
ranked = sector_neutral.rank(pct=True)
# Build pipeline with universe filter
# Assumes you have a LazyFrame 'prices_lf' with columns: date, asset, close, volume, sector
pipeline = (
Pipeline(prices_lf)
.add_factors({
'sector_neutral_mom': sector_neutral,
'rank': ranked,
})
.screen(Q500US())
)
# Show execution plan
print(pipeline.explain())
# Run pipeline
result = pipeline.run(start_date='2020-01-01')
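To make the sector-neutral step concrete, here is a dependency-free sketch of what demeaning by sector and percentile ranking do on a single date's cross-section. The helper names `demean_by_group` and `pct_rank` are hypothetical and the numbers are made up; factr's own tie handling and null handling may differ.

```python
from collections import defaultdict

def demean_by_group(values, groups):
    """Subtract each group's mean from its members (one date's cross-section)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]

def pct_rank(values):
    """Ordinal rank divided by count, so results lie in (0, 1]; ties are not averaged."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for position, i in enumerate(order, start=1):
        ranks[i] = position / len(values)
    return ranks

# One date, four assets in two sectors (illustrative numbers only)
scores = [1.0, 3.0, 10.0, 14.0]
sectors = ["tech", "tech", "energy", "energy"]
neutral = demean_by_group(scores, sectors)
print(neutral)            # [-1.0, 1.0, -2.0, 2.0]
print(pct_rank(neutral))  # [0.5, 0.75, 0.25, 1.0]
```

After demeaning, the energy names no longer dominate the ranking just because their raw scores are larger; only the position within each sector matters.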
Dataset Loading
Configure data sources and field transforms using Pydantic-inspired patterns:
from factr.datasets import DataSet, Column
from factr.data import ParquetSource, SQLSource, DataContext
import polars as pl
# Define dataset with field-level configuration
class EquityPricing(DataSet):
# Simple columns
close = Column(pl.Float64)
volume = Column(pl.Int64)
# With alias (source has different name)
market_cap = Column(pl.Float64, alias='mkt_cap', fill_strategy='forward')
# With null filling
sentiment = Column(pl.Float64, fill_null=0.0)
sector = Column(pl.Utf8, default='Unknown', required=False)
class Config:
source = ParquetSource('data/prices.parquet')
date_column = 'date'
entity_column = 'ticker'
# Load data - automatically applies:
# - Column name mappings (aliases)
# - Null filling strategies
# - Defaults for missing columns
prices = EquityPricing.load(start_date='2020-01-01')
# Explicit source binding with DataContext
class Fundamentals(DataSet):
market_cap = Column(pl.Float64)
pe_ratio = Column(pl.Float64)
# No Config - bind source explicitly at runtime
# Bind source and load
ctx = DataContext()
ctx.bind(Fundamentals, ParquetSource('data/fundamentals.parquet'))
funds = ctx.load(Fundamentals, start_date='2020-01-01')
# Use DataContext for complex workflows (with concurrent loading)
ctx = DataContext()
ctx.bind(EquityPricing, ParquetSource('prices.parquet'))
ctx.bind(Fundamentals, SQLSource('db', table='fundamentals'))
# Concurrent collection for better performance
data = ctx.load_many(
EquityPricing,
Fundamentals,
start_date='2020-01-01',
collect=True # Collect all datasets concurrently
)
Key Features:
- Pydantic-inspired Column fields - alias, default, fill_null, validation bounds
- Composition over inheritance - datasets compose columns and sources
- Field-level transformations - automatic null filling, bounds enforcement
- Flexible source configuration - direct source in Config or explicit binding via DataContext
- Concurrent loading - use collect=True for parallel execution
- No global state - DataContext is explicit and composable
- Testing-friendly - clone contexts, swap sources easily
See examples/data_loading_example.py for comprehensive examples.
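The field-level transforms listed above (alias mapping, null filling, forward fill, defaults for missing columns) can be pictured with a small pure-Python sketch. `ColumnSpec` and `apply_specs` are hypothetical stand-ins, not factr classes, and the sketch assumes the rows belong to one entity and are already sorted by date.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class ColumnSpec:
    """Hypothetical stand-in for a Column definition (not factr's class)."""
    name: str
    alias: Optional[str] = None   # field name used by the source
    fill_null: Any = None         # constant that replaces None
    forward_fill: bool = False    # carry the last seen value forward
    default: Any = None           # used when the source lacks the field

def apply_specs(rows, specs):
    """Return new rows with aliases resolved, defaults applied and nulls filled."""
    out, last_seen = [], {}
    for row in rows:
        clean = {}
        for spec in specs:
            source_name = spec.alias or spec.name
            # The default only applies when the field is absent, not when it is None
            value = row.get(source_name, spec.default)
            if value is None and spec.forward_fill:
                value = last_seen.get(spec.name)
            if value is None and spec.fill_null is not None:
                value = spec.fill_null
            if value is not None:
                last_seen[spec.name] = value
            clean[spec.name] = value
        out.append(clean)
    return out

rows = [
    {"mkt_cap": 100.0, "sentiment": None},
    {"mkt_cap": None, "sentiment": 0.4},
]
specs = [
    ColumnSpec("market_cap", alias="mkt_cap", forward_fill=True),
    ColumnSpec("sentiment", fill_null=0.0),
    ColumnSpec("sector", default="Unknown"),
]
cleaned = apply_specs(rows, specs)
print(cleaned[1])  # {'market_cap': 100.0, 'sentiment': 0.4, 'sector': 'Unknown'}
```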
Core Concepts
Factors = Expressions + Scope
from factr.core import Factor, Scope
close = EquityPricing.close # RAW scope
returns = close.pct_change(1) # TIME_SERIES scope
ranked = returns.rank(pct=True) # CROSS_SECTION scope
Scopes:
- RAW - raw column data
- TIME_SERIES - per-entity operations (rolling windows, shifts)
- CROSS_SECTION - per-date operations (rank, demean, zscore)
Pipeline
pipeline = Pipeline(data).add_factors({'momentum': momentum, 'rank': ranked})
result = pipeline.run()
The pipeline applies .over() automatically, based on each factor's scope.
Datasets
from factr.datasets import EquityPricing, Fundamentals
close = EquityPricing.close
volume = EquityPricing.volume
market_cap = Fundamentals.market_cap
Built-in Factors
from factr import factors as F
momentum = F.momentum(252, 21)
returns = F.returns(1)
sma_50 = F.sma(50)
rsi = F.rsi(14)
Factor Library
The library includes 26+ production-ready financial indicators across multiple categories:
Price & Returns
- returns(window=1) - Simple returns
- log_returns(window=1) - Logarithmic returns
- momentum(window=252, skip=21) - Price momentum
- reversal(window=21) - Short-term mean reversion
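As a rough illustration of the `window` and `skip` parameters, the sketch below uses a common convention for skip-period momentum: the return from `window` days ago up to `skip` days ago. This convention is an assumption; the source does not spell out factr's exact formula.

```python
def momentum(closes, window=252, skip=21):
    """Return from t - window to t - skip for each index t (None if history is short).

    Assumed convention, not factr's confirmed formula.
    """
    out = []
    for t in range(len(closes)):
        if t < window:
            out.append(None)  # not enough history yet
        else:
            out.append(closes[t - skip] / closes[t - window] - 1.0)
    return out

# Tiny window so the arithmetic is easy to follow by hand
closes = [100.0, 110.0, 121.0, 90.0, 95.0]
values = momentum(closes, window=3, skip=1)
print(values)  # first three entries are None; then about 0.21 and about -0.18
```

Skipping the most recent days keeps short-term reversal effects out of the momentum signal, which is why the two are listed as separate factors.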
Technical Indicators
- sma(window=20) - Simple moving average
- ema(window=20) - Exponential moving average
- macd(fast=12, slow=26, signal=9) - MACD with signal line and histogram
- rsi(window=14) - Relative Strength Index
- bollinger_bands(window=20, num_std=2.0) - Bollinger Bands (lower, middle, upper)
- stochastic(window=14, smooth_k=3, smooth_d=3) - Stochastic Oscillator (%K, %D)
- atr(window=14) - Average True Range (volatility measure)
- parabolic_sar() - Parabolic Stop and Reverse
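Several of these indicators return more than one series. A minimal single-asset sketch of Bollinger Bands shows the (lower, middle, upper) shape; it assumes the middle band is the simple moving average and uses the population standard deviation, which may not match factr's choice.

```python
from statistics import mean, pstdev

def bollinger_bands(closes, window=20, num_std=2.0):
    """Return (lower, middle, upper) lists; entries are None until a full window exists."""
    lower, middle, upper = [], [], []
    for t in range(len(closes)):
        if t + 1 < window:
            lower.append(None)
            middle.append(None)
            upper.append(None)
            continue
        recent = closes[t + 1 - window : t + 1]
        mid = mean(recent)                # the SMA is the middle band
        width = num_std * pstdev(recent)  # population std: an assumption
        lower.append(mid - width)
        middle.append(mid)
        upper.append(mid + width)
    return lower, middle, upper

lower, middle, upper = bollinger_bands([1.0, 2.0, 3.0, 4.0], window=2)
print(middle)  # [None, 1.5, 2.5, 3.5]
print(lower)   # [None, 0.5, 1.5, 2.5]
print(upper)   # [None, 2.5, 3.5, 4.5]
```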
Momentum & Trend
- acceleration(window=21) - Price acceleration (2nd derivative)
- trend_strength(window=63) - Linear regression R-squared
Risk Indicators
- volatility(window=21, annualize=True) - Rolling volatility
- max_drawdown(window=252) - Maximum peak-to-trough decline
- downside_deviation(window=21, annualize=True) - Semi-deviation (downside only)
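To show what "maximum peak-to-trough decline" means over a rolling window, here is a pure-Python sketch for one asset. Reporting the drawdown as a non-positive fraction is an assumed sign convention, and the sketch expects strictly positive prices.

```python
def max_drawdown(closes, window=252):
    """Largest peak-to-trough decline in each trailing window, as a fraction <= 0."""
    out = []
    for t in range(len(closes)):
        if t + 1 < window:
            out.append(None)  # not enough history yet
            continue
        peak, worst = float("-inf"), 0.0
        for price in closes[t + 1 - window : t + 1]:
            peak = max(peak, price)                 # highest price seen so far
            worst = min(worst, price / peak - 1.0)  # decline from that peak
        out.append(worst)
    return out

closes = [100.0, 120.0, 90.0, 110.0, 130.0]
drawdowns = max_drawdown(closes, window=3)
print(drawdowns)  # [None, None, -0.25, -0.25, 0.0]
```

The peak must precede the trough, so the inner loop walks the window in time order instead of comparing the window's overall maximum and minimum.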
Volume Indicators
- dollar_volume(window=1) - Price × Volume
- vwap(window=20) - Volume-Weighted Average Price
- vwap_bands(window=20, num_std=2.0) - VWAP with std bands
- obv() - On-Balance Volume
- chaikin_money_flow(window=21) - CMF indicator
- volume_profile(window=21) - Normalized volume
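Two of the volume indicators are simple enough to sketch in a few lines. The versions below follow the textbook definitions (rolling VWAP as traded value divided by traded volume, OBV as a running signed volume total) using close prices for a single asset; factr's implementations may use different price inputs or starting values.

```python
def rolling_vwap(closes, volumes, window=20):
    """Volume-weighted average price over each trailing window (None until full)."""
    out = []
    for t in range(len(closes)):
        if t + 1 < window:
            out.append(None)
            continue
        lo = t + 1 - window
        traded_value = sum(p * v for p, v in zip(closes[lo : t + 1], volumes[lo : t + 1]))
        out.append(traded_value / sum(volumes[lo : t + 1]))
    return out

def obv(closes, volumes):
    """On-Balance Volume: add volume on up days, subtract it on down days."""
    total, out = 0.0, []
    for t in range(len(closes)):
        if t > 0 and closes[t] > closes[t - 1]:
            total += volumes[t]
        elif t > 0 and closes[t] < closes[t - 1]:
            total -= volumes[t]
        out.append(total)  # unchanged on flat days and on the first day
    return out

closes = [10.0, 11.0, 10.0, 10.0]
volumes = [100, 200, 300, 400]
print(obv(closes, volumes))  # [0.0, 200.0, -100.0, -100.0]
vwap_values = rolling_vwap(closes, volumes, window=2)
print(vwap_values)           # None first, then values between the two prices in each window
```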
Statistical
- autocorrelation(window=21, lag=1) - Rolling autocorrelation
Value Factors
- earnings_yield() - 1 / P/E ratio
- book_to_market() - 1 / P/B ratio
- profit_margin() - Earnings / Revenue
Growth Factors
- revenue_growth(window=252) - YoY revenue growth
- earnings_growth(window=252) - YoY earnings growth
Usage Examples
from factr import factors as F
from factr.datasets import EquityPricing, Fundamentals
# Technical indicators
macd_line, signal, histogram = F.macd(fast=12, slow=26)
percent_k, percent_d = F.stochastic(window=14)
atr_value = F.atr(window=14)
# Risk-adjusted momentum
mom = F.momentum(252, 21)
vol = F.volatility(60)
sharpe = mom / vol
# Volume analysis
obv_factor = F.obv()
cmf = F.chaikin_money_flow(window=21)
lower, vwap_mid, upper = F.vwap_bands(window=20)
# Value investing
earnings_yield = F.earnings_yield(Fundamentals.pe_ratio)
margin = F.profit_margin()
growth = F.earnings_growth(window=252)
# Mean reversion
reversal = F.reversal(window=21) # Negative of recent returns
acf = F.autocorrelation(window=21, lag=1)
# Combine factors
alpha = (
F.momentum(252, 21).zscore() +
F.earnings_yield().zscore() +
F.volume_profile().zscore()
) / 3
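The "combine factors" pattern above standardizes each factor before averaging so that no single factor dominates because of its units. A dependency-free sketch for one date's cross-section follows; `zscore` and `combine` are hypothetical helpers, the inputs are made up, and the population standard deviation is an assumption.

```python
from statistics import mean, pstdev

def zscore(values):
    """Standardize one date's cross-section to mean 0 and unit (population) std."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)  # a constant factor carries no ranking information
    return [(v - mu) / sigma for v in values]

def combine(*factors):
    """Equal-weight average of z-scored factors, asset by asset."""
    standardized = [zscore(f) for f in factors]
    return [sum(column) / len(column) for column in zip(*standardized)]

# Three assets on one date (illustrative numbers only)
momentum = [0.10, 0.20, 0.30]
earnings_yield = [0.02, 0.06, 0.04]
alpha = combine(momentum, earnings_yield)
print(alpha)  # the first asset scores lowest; the other two tie
```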
Custom Factors
Composing Polars Expressions
from factr import custom
from factr import factors as F
from factr.datasets import EquityPricing
@custom.time_series
def momentum_quality(window: int = 5):
close = EquityPricing.close
momentum = close.pct_change(window)
trend = close > close.shift(1)
return trend * momentum
@custom.cross_section(by='sector')
def sector_neutral_momentum():
mom = F.momentum(252, 21)
return mom.demean()
Custom Python Functions
For calculations that can't be expressed in Polars (e.g., using numpy, scipy, ta-lib):
from factr import custom_factor
from factr.core import Scope
from factr.datasets import EquityPricing
from factr.pipeline import Pipeline
import polars as pl
# Time-series custom factor (per-entity)
# Can use Factor objects or string column names as inputs
@custom_factor(
scope=Scope.TIME_SERIES,
inputs=[EquityPricing.close, EquityPricing.volume] # Factor objects for type safety
)
def custom_indicator(df: pl.DataFrame) -> pl.Series:
"""Uses numpy/scipy for complex calculations."""
import numpy as np
close = df['close'].to_numpy()
volume = df['volume'].to_numpy()
# Your custom logic here; as a runnable placeholder, weight each price change by log volume
result = np.diff(close, prepend=close[0]) * np.log1p(volume)
return pl.Series(result)
# Or use string column names
@custom_factor(scope=Scope.TIME_SERIES, inputs=['close', 'volume'])
def custom_indicator_v2(df: pl.DataFrame) -> pl.Series:
return df['close'] * df['volume']
# Cross-sectional custom factor (per-date)
@custom_factor(scope=Scope.CROSS_SECTION, inputs=['returns'], groupby='sector')
def sector_adjusted(df: pl.DataFrame) -> pl.Series:
"""Custom sector-neutral calculation."""
import numpy as np
returns = df['returns'].to_numpy()
# Apply custom transformation; as a runnable placeholder, subtract the group median
adjusted = returns - np.nanmedian(returns)
return pl.Series(adjusted)
# Use in pipeline like any other factor
factor = custom_indicator()
pipeline = Pipeline(data).add_factors({'custom': factor})
result = pipeline.run()
Note: Custom Python factors use map_batches, which prevents Polars from optimizing that part of the query. Use them only when necessary and prefer pure Polars expressions when possible.
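Conceptually, a time-series custom factor means "call the user's function once per entity, then put the results back in the original row order". The sketch below shows that contract in plain Python; `apply_time_series` and `dollar_volume_change` are hypothetical names, and the real decorator works on Polars DataFrames and Series, not lists of dicts.

```python
from collections import defaultdict

def apply_time_series(rows, func, entity_key="asset"):
    """Call func once per entity on that entity's rows, then restore the input order."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[row[entity_key]].append(index)
    out = [None] * len(rows)
    for indices in groups.values():
        values = func([rows[i] for i in indices])
        for i, value in zip(indices, values):
            out[i] = value  # scatter results back to their original positions
    return out

def dollar_volume_change(entity_rows):
    """Example user function: day-over-day change in close * volume for one entity."""
    dv = [r["close"] * r["volume"] for r in entity_rows]
    return [None] + [b - a for a, b in zip(dv, dv[1:])]

rows = [
    {"asset": "A", "close": 10, "volume": 100},
    {"asset": "B", "close": 20, "volume": 10},
    {"asset": "A", "close": 11, "volume": 100},
    {"asset": "B", "close": 19, "volume": 10},
]
changes = apply_time_series(rows, dollar_volume_change)
print(changes)  # [None, None, 100, -10]
```

The user function never sees rows from another entity, so it can be written as if the data contained a single asset.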
Universe Filters
from factr.universe import Q500US, LiquidUniverse
q500 = Q500US()
pipeline.screen(q500)
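A universe filter can be thought of as a per-date rule that decides which rows survive. The source does not define how Q500US or LiquidUniverse select assets, so the sketch below uses a stand-in rule, "keep the top N assets by dollar volume on each date", purely for illustration; `top_n_by_dollar_volume` is a hypothetical helper.

```python
from collections import defaultdict

def top_n_by_dollar_volume(rows, n):
    """Keep the n rows with the highest close * volume on each date (stand-in rule)."""
    by_date = defaultdict(list)
    for row in rows:
        by_date[row["date"]].append(row)
    kept = []
    for date in sorted(by_date):
        ranked = sorted(by_date[date], key=lambda r: r["close"] * r["volume"], reverse=True)
        kept.extend(ranked[:n])
    return kept

rows = [
    {"date": "2024-01-01", "asset": "AAPL", "close": 150.0, "volume": 1_000_000},
    {"date": "2024-01-01", "asset": "GOOGL", "close": 2800.0, "volume": 2_000_000},
    {"date": "2024-01-01", "asset": "MSFT", "close": 380.0, "volume": 1_500_000},
]
universe = top_n_by_dollar_volume(rows, n=2)
print([r["asset"] for r in universe])  # ['GOOGL', 'MSFT']
```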
Architecture
Factor = pl.Expr + Scope metadata
Polars handles expression dependencies via lazy evaluation. We track scope to apply .over() correctly.
factr/
├── core/ # Factor, Filter, Classifier, Scope
├── pipeline.py # Multi-stage orchestration
├── factors.py # Built-in factors
├── datasets.py # Type-safe column access
├── universe.py # Universe filters
└── custom.py # Decorators
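The "Factor = pl.Expr + Scope metadata" idea can be sketched without Polars by letting a string stand in for the expression. Everything here is hypothetical: the class layout, the rule that a binary operation takes the wider of the two scopes, and the `asset` and `date` partition keys are assumptions for illustration, not factr's actual implementation.

```python
from dataclasses import dataclass
from enum import IntEnum

class Scope(IntEnum):
    RAW = 0
    TIME_SERIES = 1
    CROSS_SECTION = 2

@dataclass(frozen=True)
class Factor:
    expr: str                 # stands in for a pl.Expr
    scope: Scope = Scope.RAW

    def pct_change(self, n=1):
        # Shifts and rolling windows look along one entity's history
        return Factor(f"pct_change({self.expr}, {n})", Scope.TIME_SERIES)

    def rank(self):
        # Ranking compares entities on the same date
        return Factor(f"rank({self.expr})", Scope.CROSS_SECTION)

    def __truediv__(self, other):
        # Assumed rule: a combination inherits the wider of the two scopes
        return Factor(f"({self.expr} / {other.expr})", max(self.scope, other.scope))

    def partition_key(self):
        """Column the expression would be evaluated over, or None for raw data."""
        return {Scope.RAW: None, Scope.TIME_SERIES: "asset", Scope.CROSS_SECTION: "date"}[self.scope]

close = Factor("close")
returns = close.pct_change(1)
ranked = returns.rank()
print(ranked.expr, ranked.scope.name, ranked.partition_key())
# rank(pct_change(close, 1)) CROSS_SECTION date
```

Because each factor carries its scope, a pipeline can decide how to partition every expression without the user writing `.over()` by hand.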
Examples
See the examples/ directory for complete runnable examples:
- quickstart.py - Get started in 5 minutes
- factor_api_example.py - Comprehensive API coverage
- data_loading_example.py - Data loading patterns
- sqlite_example.py - SQLite integration
- performance_example.py - Large-scale benchmarking
from factr import factors as F
from factr.pipeline import Pipeline
from factr.universe import Q500US
# Multi-factor alpha combining momentum and value
mom = F.momentum(252, 21)
value = F.earnings_yield()
alpha = (mom + value) / 2
# Build pipeline (assumes you have data loaded)
pipeline = Pipeline(prices_lf).add_factors({'alpha': alpha}).screen(Q500US())
result = pipeline.run(start_date='2020-01-01')
Testing
# Run tests
uv run pytest
# Run tests with coverage
uv run pytest --cov=factr --cov-report=term-missing
# Format code
uv run ruff format factr tests examples
# Lint and auto-fix
uv run ruff check factr tests examples --fix
See CONTRIBUTING.md for detailed development guidelines.
License
MIT