
Financial data processing, feature engineering and AI agent toolkit for Python


finasys

From raw market data to ML-ready features in five lines of code.


Documentation: finasys Docs


finasys is a financial data processing toolkit for ML pipelines and AI agents that replaces manual data wrangling. It takes you from raw market data to production-ready features in a few lines of code, whether you're building trading models, running portfolio analysis, or powering financial AI agents.

finasys is Polars-first: every indicator and feature runs as a native Polars expression, making it 10-100x faster than pandas-based alternatives, with zero C dependencies (no ta-lib build headaches). It supports 37+ international markets, crypto, forex, commodities, and macro indicators out of the box. Learn more in the official documentation, or start contributing through the GitHub repo.

Quick Start

import finasys as fs

# Load stock data (auto-cached with DuckDB)
df = fs.load("AAPL", start="2024-01-01")

# Add technical indicators + returns in one call
df = fs.features.add_all(df)

# Generate an LLM-ready summary
print(fs.agents.summarize(df))

Install

pip install finasys

Optional extras:

pip install finasys[langchain]   # LangChain tool integration
pip install finasys[pandas]      # Pandas interop
pip install finasys[all]         # Everything

Features

Data Sources (fs.load())

  • Single fs.load() entry point for Yahoo Finance, CSV, and Parquet files
  • Standardized OHLCV column names across all sources
  • DuckDB-backed local caching (second call is instant)
  • Multi-symbol fetching with automatic alignment

df = fs.load("AAPL", start="2024-01-01")
df = fs.load(["AAPL", "GOOGL", "MSFT"], start="2024-01-01")
df = fs.load("./data/prices.csv")
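As a rough illustration of what standardized OHLCV column names involve, the sketch below maps a few common header spellings onto one lowercase schema. It is plain Python and independent of finasys; the `ALIASES` table and the `standardize_columns` helper are hypothetical names, not the library's actual mapping.

```python
# Hypothetical alias table for illustration only; not finasys's real mapping.
ALIASES = {
    "date": "date", "datetime": "date", "timestamp": "date",
    "open": "open", "high": "high", "low": "low", "close": "close",
    "adj close": "adj_close", "adj_close": "adj_close",
    "volume": "volume", "vol": "volume",
}


def standardize_columns(columns):
    """Map source-specific column names onto one lowercase OHLCV schema."""
    renamed = []
    for name in columns:
        key = name.strip().lower()
        # Unknown columns are kept (lowercased, spaces to underscores), not dropped.
        renamed.append(ALIASES.get(key, key.replace(" ", "_")))
    return renamed


print(standardize_columns(["Date", "Open", "High", "Low", "Close", "Adj Close", "Volume"]))
# ['date', 'open', 'high', 'low', 'close', 'adj_close', 'volume']
```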

Feature Engineering (fs.features)

  • 15+ technical indicators, including RSI, MACD, Bollinger Bands, ATR, VWAP, OBV, Stochastic, ADX, CCI, Williams %R, MFI, ROC, and Momentum
  • Returns: simple, log, cumulative, drawdown
  • Rolling statistics: mean, std, min, max, skew, z-score
  • Lag features with built-in look-ahead bias protection
  • Calendar features: day of week, month, quarter
  • Cross-sectional: rank, percentile, z-score across symbols

All of these are implemented as pure Polars expressions: no ta-lib C dependency, and 10-100x faster than pandas-ta.

df = fs.features.rsi(df, period=14)
df = fs.features.macd(df)
df = fs.features.returns(df, periods=[1, 5, 21])
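For readers who want to see what an indicator like RSI computes, here is a minimal plain-Python sketch using Wilder smoothing. It is not the finasys implementation (which runs as Polars expressions); the function name and the choice to return `None` during warm-up and 50 for a completely flat series are assumptions made for this example.

```python
def rsi(closes, period=14):
    """Wilder-smoothed RSI; positions without `period` prior changes are None."""
    out = [None] * len(closes)
    if period < 1 or len(closes) <= period:
        return out

    changes = [b - a for a, b in zip(closes, closes[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]

    def to_rsi(avg_gain, avg_loss):
        if avg_loss == 0:
            # No losses in the window: 100 if there were gains, 50 if flat (assumed convention).
            return 100.0 if avg_gain > 0 else 50.0
        return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)

    # Seed with simple averages over the first `period` changes.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    out[period] = to_rsi(avg_gain, avg_loss)

    # Wilder smoothing for every later bar; changes[i] ends at closes[i + 1].
    for i in range(period, len(changes)):
        avg_gain = (avg_gain * (period - 1) + gains[i]) / period
        avg_loss = (avg_loss * (period - 1) + losses[i]) / period
        out[i + 1] = to_rsi(avg_gain, avg_loss)
    return out
```

Each value depends only on prices at or before its own position, which is the property that keeps an indicator free of look-ahead bias.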

Target / Label Engineering (fs.features)

  • Forward returns for regression targets
  • Ternary classification labels (up/flat/down) with configurable thresholds
  • Triple-barrier labeling (López de Prado method), a widely used labeling scheme in financial ML
  • Volatility-adjusted labels that adapt to the current regime

# Forward returns for regression
df = fs.features.forward_returns(df, periods=[1, 5])

# Classification labels
df = fs.features.classify_returns(df, period=5, thresholds=(-0.01, 0.01))

# Triple-barrier method
df = fs.features.triple_barrier_labels(df, profit_take=0.02, stop_loss=0.02, max_holding=10)

# Volatility-adjusted labels (adapts to regime)
df = fs.features.volatility_adjusted_labels(df, period=5, vol_multiplier=1.0)
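The triple-barrier idea can be sketched in a few lines of plain Python. This is an illustrative version under simplifying assumptions, not the library's code: it checks barriers against close prices only, uses symmetric percentage barriers, and leaves a label as `None` when the series ends before the holding window completes. The parameter names mirror the call above, but the function itself is a stand-in.

```python
def triple_barrier_labels(closes, profit_take=0.02, stop_loss=0.02, max_holding=10):
    """Label each bar +1 (profit barrier hit first), -1 (stop hit first),
    0 (holding period expired), or None (not enough future data)."""
    n = len(closes)
    labels = [None] * n
    for i, entry in enumerate(closes):
        upper = entry * (1.0 + profit_take)
        lower = entry * (1.0 - stop_loss)
        last = min(i + max_holding, n - 1)
        for j in range(i + 1, last + 1):
            if closes[j] >= upper:
                labels[i] = 1
                break
            if closes[j] <= lower:
                labels[i] = -1
                break
        else:
            # No horizontal barrier was touched; label 0 only if the
            # full holding window was actually observed.
            if i + max_holding <= n - 1:
                labels[i] = 0
    return labels


print(triple_barrier_labels([100, 101, 103, 99, 97, 97.5, 98], max_holding=2))
# [1, 0, -1, -1, 0, None, None]
```

Because labels look forward by construction, the trailing `None` values should be dropped before training rather than filled.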

Distribution Features (fs.features)

  • Rolling kurtosis, skewness, tail ratio -- capture fat-tail dynamics
  • Rolling Jarque-Bera normality test
  • Z-score of returns vs rolling distribution

df = fs.features.rolling_kurtosis(df, window=30)
df = fs.features.rolling_skewness(df, window=30)
df = fs.features.tail_ratio(df, window=30)
df = fs.features.zscore_returns(df, window=30)
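To make the distribution features concrete, the sketch below computes skewness, kurtosis, and the Jarque-Bera statistic over a window, plus a generic rolling wrapper. It is plain Python for illustration; the helper names are hypothetical, and it assumes population moments with non-excess kurtosis (a normal distribution scores 3), which may differ from the convention finasys uses.

```python
import math


def skew_kurtosis(xs):
    """Population skewness and (non-excess) kurtosis of a sample."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    if m2 == 0:
        return math.nan, math.nan  # constant window: moments are undefined
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def jarque_bera(xs):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4); larger values mean less normal."""
    skew, kurt = skew_kurtosis(xs)
    return len(xs) / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


def rolling(values, window, fn):
    """Apply fn to each trailing window; None until the window is full."""
    return [
        None if i + 1 < window else fn(values[i + 1 - window : i + 1])
        for i in range(len(values))
    ]
```

A rolling kurtosis series would then be `rolling(returns, 30, lambda w: skew_kurtosis(w)[1])`, using only past observations at each step.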

Market Regime Features (fs.features)

  • Volatility regimes from fast/slow rolling volatility
  • Trend strength via rolling Hurst-style approximation
  • Combined market states such as trending/high-volatility or ranging/low-volatility
  • Breakout flags and strength scores

df = fs.features.volatility_regime(df, fast_window=21, slow_window=63)
df = fs.features.trend_strength(df, window=63)
df = fs.features.market_state(df)
df = fs.features.breakout_detection(df, window=20)
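One simple way to derive a volatility regime from fast and slow rolling volatility is to compare their ratio against a threshold. The sketch below does that in plain Python; the `threshold` parameter, the `"high"`/`"normal"`/`"low"` labels, and the helper names are assumptions for this example rather than finasys's actual rule.

```python
import statistics


def rolling_std(returns, window):
    """Trailing population standard deviation; None until the window is full."""
    out = [None] * len(returns)
    for i in range(window - 1, len(returns)):
        out[i] = statistics.pstdev(returns[i - window + 1 : i + 1])
    return out


def volatility_regime(returns, fast_window=21, slow_window=63, threshold=1.25):
    """Label each bar by comparing fast volatility with slow volatility."""
    fast = rolling_std(returns, fast_window)
    slow = rolling_std(returns, slow_window)
    regimes = []
    for f, s in zip(fast, slow):
        if f is None or s is None or s == 0:
            regimes.append(None)          # warm-up or degenerate window
        elif f > threshold * s:
            regimes.append("high")        # recent volatility well above baseline
        elif f < s / threshold:
            regimes.append("low")         # recent volatility well below baseline
        else:
            regimes.append("normal")
    return regimes
```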

Risk & Performance Metrics (fs.stats)

  • Sharpe, Sortino, Calmar ratios
  • Value at Risk (historical, parametric, Cornish-Fisher)
  • Conditional VaR (Expected Shortfall)
  • CAPM alpha/beta, information ratio
  • Max drawdown duration tracking
  • Dual mode: scalar for reporting, rolling columns for ML features

# Scalar metrics (whole-series)
sharpe = fs.stats.sharpe_ratio(df)                         # => 1.47
var = fs.stats.value_at_risk(df, confidence=0.95)           # => -0.0216
cvar = fs.stats.cvar(df, confidence=0.95)                   # => -0.0285

# Rolling metrics (ML features)
df = fs.stats.sharpe_ratio(df, window=63)                   # adds sharpe_63
df = fs.stats.value_at_risk(df, window=63)                  # adds var_63
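For reference, the scalar versions of these metrics reduce to short formulas. The following plain-Python sketch shows an annualized Sharpe ratio and historical VaR/CVaR on a list of periodic returns. It is not the finasys code: the function signatures, the 252-period annualization, and the tail-size convention (worst `floor((1 - confidence) * n)` observations, at least one) are assumptions, and other quantile conventions give slightly different numbers.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from periodic returns (sample std dev)."""
    excess = [r - risk_free / periods_per_year for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)


def _tail_size(n, confidence):
    # round() guards against float noise such as (1 - 0.9) * 20 == 1.9999999999999996
    return max(1, math.floor(round((1.0 - confidence) * n, 9)))


def value_at_risk(returns, confidence=0.95):
    """Historical VaR: the k-th worst return, reported as a (negative) return."""
    k = _tail_size(len(returns), confidence)
    return sorted(returns)[k - 1]


def cvar(returns, confidence=0.95):
    """Historical CVaR (expected shortfall): mean of the k worst returns."""
    k = _tail_size(len(returns), confidence)
    return statistics.mean(sorted(returns)[:k])
```

Applying the same functions to trailing windows gives the rolling variant that becomes a feature column.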

Portfolio Analytics (fs.portfolio)

  • Correlation and covariance matrices for multi-symbol DataFrames
  • Pairwise rolling correlation
  • Weighted and equal-weight portfolio returns
  • Minimum-variance portfolio weights

df = fs.load(["AAPL", "GOOGL", "MSFT"], start="2024-01-01")
corr = fs.portfolio.correlation_matrix(df)
portfolio = fs.portfolio.equal_weight_returns(df)
weights = fs.portfolio.minimum_variance_weights(df)
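To show what minimum-variance weights mean, here is the closed-form solution for the two-asset case in plain Python. It is a simplified sketch: the general multi-asset problem needs a covariance-matrix solve, this version allows negative (short) weights, and the function names are hypothetical rather than taken from finasys.

```python
def sample_covariance(xs, ys):
    """Sample covariance (n - 1 denominator) of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def min_variance_weights_two_assets(returns_a, returns_b):
    """Weights (w_a, w_b), summing to 1, that minimize portfolio variance.

    w_a = (var_b - cov_ab) / (var_a + var_b - 2 * cov_ab)
    """
    var_a = sample_covariance(returns_a, returns_a)
    var_b = sample_covariance(returns_b, returns_b)
    cov_ab = sample_covariance(returns_a, returns_b)
    denom = var_a + var_b - 2.0 * cov_ab
    if denom == 0:
        return 0.5, 0.5  # assets are indistinguishable; fall back to equal weight
    w_a = (var_b - cov_ab) / denom
    return w_a, 1.0 - w_a


def portfolio_returns(returns_a, returns_b, weights):
    """Per-period weighted portfolio returns for two assets."""
    w_a, w_b = weights
    return [w_a * a + w_b * b for a, b in zip(returns_a, returns_b)]
```

With uncorrelated assets the less volatile one receives the larger weight, in inverse proportion to its variance.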

Data Quality Checks (fs.quality)

  • Missing business-day gaps
  • Outlier flags
  • Suspected split flags
  • Completeness reports with nulls, duplicates, zero-volume days, gaps, and flags

gaps = fs.quality.detect_gaps(df)
df = fs.quality.flag_outliers(df)
df = fs.quality.detect_splits(df)
report = fs.quality.completeness_report(df)
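A missing business-day check can be approximated with the standard library alone. The sketch below treats every Monday-to-Friday date between the first and last observation as expected, so exchange holidays (for example 2024-01-15, a US market holiday) are reported as gaps unless a holiday calendar is layered on top. The function is a hypothetical stand-in, not the finasys implementation.

```python
from datetime import date, timedelta


def detect_gaps(dates):
    """Return weekdays between min(dates) and max(dates) that are absent."""
    if not dates:
        return []
    present = set(dates)
    missing = []
    day, end = min(dates), max(dates)
    while day <= end:
        # weekday() is 0-4 for Monday-Friday; weekends are never expected.
        if day.weekday() < 5 and day not in present:
            missing.append(day)
        day += timedelta(days=1)
    return missing


observed = [date(2024, 1, 11), date(2024, 1, 12), date(2024, 1, 16)]
print(detect_gaps(observed))
# [datetime.date(2024, 1, 15)]
```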

Smart Profiler (fs.profiler)

  • One-call data quality assessment for financial time series
  • Detects: missing dates, price outliers, suspected stock splits, zero-volume days
  • Distribution analysis: skewness, kurtosis, Jarque-Bera normality test, tail ratio
  • LLM-ready text summaries and JSON-serializable structured reports

# Text summary (great for LLM system prompts)
print(fs.profiler.profile_summary(df))
# DATA PROFILE | 252 rows x 7 columns
# Quality issues: 9 missing dates; 11 price outliers
# Returns distribution: skew=0.501, kurtosis=3.647, non-normal (JB p=0.0000)

# Full structured report
report = fs.profiler.profile(df)
report.quality.missing_dates      # ['2024-01-15', '2024-02-19', ...]
report.distribution.is_normal     # False
report.to_dict()                  # JSON-serializable

AI Agent Tools (fs.agents)

  • LLM-ready summaries of financial DataFrames
  • Tool definitions in OpenAI function-calling format
  • Extended tools for risk reports, portfolio analysis, stock screening, quality checks, and profile summaries
  • Context extraction for RAG-style usage
  • Schema descriptions for system prompts
  • LangChain integration (optional)

summary = fs.agents.summarize(df)
tools = fs.agents.tools(symbols=["AAPL", "GOOGL"])

from finasys.agents.langchain import get_tools
lc_tools = get_tools(symbols=["AAPL"])
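For context, a tool definition in OpenAI function-calling format is a JSON-serializable dictionary with a name, a description, and a JSON Schema for the parameters. The sketch below builds one by hand; the `make_tool` helper and the `load_prices` tool are hypothetical examples, and the definitions finasys actually returns from `fs.agents.tools()` may be named and structured differently.

```python
import json


def make_tool(name, description, properties, required):
    """Build one tool entry in OpenAI function-calling format."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


# Hypothetical tool for illustration; not a finasys-provided definition.
load_prices_tool = make_tool(
    name="load_prices",
    description="Load daily OHLCV prices for one ticker symbol.",
    properties={
        "symbol": {"type": "string", "description": "Ticker symbol, e.g. AAPL"},
        "start": {"type": "string", "description": "Start date, YYYY-MM-DD"},
    },
    required=["symbol"],
)

print(json.dumps(load_prices_tool, indent=2))
```

A list of such dictionaries is what gets passed as the `tools` argument of a chat completion request, and the model replies with the tool name plus JSON arguments for your code to execute.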

Composable Pipelines (fs.FeatureSet)

Serializable, reproducible feature pipelines with 21 built-in step classes.

pipeline = fs.FeatureSet([
    fs.features.RSI(period=14),
    fs.features.Returns(periods=[1, 5, 21]),
    fs.features.RollingStats(windows=[5, 21]),
    fs.features.RollingKurtosis(window=30),
    fs.features.VolatilityRegime(),
    fs.features.ForwardReturns(periods=[1, 5]),
    fs.features.TripleBarrier(profit_take=0.02, stop_loss=0.02),
])
df = pipeline.transform(df)
pipeline.save("pipeline.json")  # version control your feature engineering
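The idea behind a serializable pipeline is that each step is fully described by its type and parameters, so the whole sequence can be written to JSON and rebuilt later. Here is a minimal plain-Python sketch of that pattern; the `Lag`, `PctChange`, and `Pipeline` classes are hypothetical and much simpler than `fs.FeatureSet`, and they operate on lists of dictionaries instead of Polars DataFrames.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Lag:
    column: str
    periods: int = 1

    def transform(self, rows):
        # Only earlier rows are read, so no future information leaks in.
        name = f"{self.column}_lag_{self.periods}"
        return [
            {**row, name: rows[i - self.periods][self.column] if i >= self.periods else None}
            for i, row in enumerate(rows)
        ]


@dataclass
class PctChange:
    column: str
    periods: int = 1

    def transform(self, rows):
        name = f"{self.column}_pct_{self.periods}"
        return [
            {**row, name: row[self.column] / rows[i - self.periods][self.column] - 1.0
             if i >= self.periods else None}
            for i, row in enumerate(rows)
        ]


STEP_TYPES = {"Lag": Lag, "PctChange": PctChange}


class Pipeline:
    def __init__(self, steps):
        self.steps = list(steps)

    def transform(self, rows):
        for step in self.steps:
            rows = step.transform(rows)
        return rows

    def to_json(self):
        return json.dumps([{"type": type(s).__name__, "params": asdict(s)} for s in self.steps])

    @classmethod
    def from_json(cls, text):
        return cls([STEP_TYPES[d["type"]](**d["params"]) for d in json.loads(text)])
```

Storing only step names and parameters keeps the saved file small and diff-friendly, which is what makes it practical to keep under version control.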

Why finasys?

| | finasys | pandas-ta | ta-lib |
|---|---|---|---|
| Engine | Polars (fast) | pandas (slow) | C library |
| Install | pip install finasys | pip install pandas-ta | Requires C build tools |
| ML Targets | Triple-barrier, vol-adjusted labels | None | None |
| Risk Metrics | Sharpe, VaR, CVaR, alpha/beta | None | None |
| Data Profiling | Financial-specific quality checks | None | None |
| AI Agent support | Built-in | None | None |
| Caching | DuckDB auto-cache | None | None |
| Look-ahead protection | Built-in | None | None |

License

Apache-2.0
