High-performance quantitative factor analysis and purification toolkit
AlphaPurify: Factor analytics for quants
AlphaPurify is a Python library for financial data aggregation, factor construction, IC testing, factor return attribution, full-pipeline backtesting, and large-scale experimentation, built to help quants validate ideas quickly.
AlphaPurify consists of four main modules:
- alphapurify.FactorAnalyzer: IC testing and quantile portfolio analysis to evaluate factor predictive ability.
- alphapurify.AlphaPurifier: factor preprocessing, including 40+ winsorization, neutralization, and standardization methods.
- alphapurify.Database: reading, writing, and aggregating financial and factor datasets.
- alphapurify.Exposures: factor correlation analysis and factor-based return attribution.
Why AlphaPurify?
Unlike traditional factor research tools, AlphaPurify needs only a DataFrame as input.
• Optimized for single-machine research
Many independent researchers work on a single laptop where memory overflow and slow computation are common issues.
AlphaPurify is designed with optimized caching, vectorized computation, and multiprocessing wherever possible.
For example, full factor evaluation of a 15-year daily dataset covering the CSI 300 universe (long-only, long-short, and short portfolios plus IC analysis) completes in around 30 seconds on a typical laptop.
• Adaptive to arbitrary bar frequency
AlphaPurify works with any bar frequency (daily, hourly, minute-level, etc.).
Return aggregation automatically adapts to the data frequency, while allowing users to explicitly specify the horizon if needed.
The framework is designed to prevent look-ahead bias.
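To illustrate what an explicit horizon and the absence of look-ahead bias mean in practice, here is a minimal pandas sketch. It is not AlphaPurify's implementation; the function name `forward_returns` and its parameters are hypothetical. The factor observed at bar t is paired with the return from bar t to bar t + horizon, computed separately for each symbol, so no future price enters the factor side.

```python
import pandas as pd


def forward_returns(df, horizon=1, price_col="close",
                    symbol_col="symbol", date_col="datetime"):
    """Attach the return from bar t to bar t + horizon to each row (hypothetical helper)."""
    out = df.sort_values([symbol_col, date_col]).copy()
    # Shift future prices back onto the current bar, one symbol at a time,
    # so returns never mix prices from different symbols.
    future_price = out.groupby(symbol_col)[price_col].shift(-horizon)
    out[f"fwd_ret_{horizon}"] = future_price / out[price_col] - 1.0
    return out


prices = pd.DataFrame({
    "datetime": [1, 2, 3, 1, 2, 3],
    "symbol": ["A", "A", "A", "B", "B", "B"],
    "close": [100.0, 110.0, 121.0, 50.0, 45.0, 45.0],
})
with_returns = forward_returns(prices, horizon=1)
```

The last `horizon` bars of each symbol have no forward return and stay NaN; dropping them rather than filling them keeps the evaluation free of invented data.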
• Professional factor preprocessing toolkit
AlphaPurify provides 40+ built-in preprocessing methods for factor research, including common operations such as:
- winsorization
- neutralization
- standardization
This allows researchers to rapidly experiment with different factor cleaning pipelines.
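As a rough picture of what two of these steps do, the sketch below applies MAD winsorization followed by z-score standardization within each cross-section using plain pandas. It is an illustration under stated assumptions, not the library's code: the helper names, the unscaled MAD, and the default of 3 MADs are choices made here for the example.

```python
import pandas as pd


def winsorize_mad(x: pd.Series, n_mad: float = 3.0) -> pd.Series:
    """Clip values further than n_mad median absolute deviations from the median."""
    med = x.median()
    mad = (x - med).abs().median()
    return x.clip(lower=med - n_mad * mad, upper=med + n_mad * mad)


def zscore(x: pd.Series) -> pd.Series:
    """Centre to zero mean and scale to unit (population) standard deviation."""
    std = x.std(ddof=0)
    if std > 0:
        return (x - x.mean()) / std
    return x * 0.0  # constant cross-section: nothing to standardize


def preprocess(df, factor_col, date_col="datetime"):
    """Winsorize, then standardize, separately for each timestamp."""
    out = df.copy()
    out[factor_col] = out.groupby(date_col)[factor_col].transform(winsorize_mad)
    out[factor_col] = out.groupby(date_col)[factor_col].transform(zscore)
    return out


raw = pd.DataFrame({
    "datetime": ["d1"] * 5,
    "factor": [1.0, 2.0, 3.0, 4.0, 100.0],  # 100.0 is an obvious outlier
})
clipped = winsorize_mad(raw["factor"])
cleaned = preprocess(raw, "factor")
```

Working per timestamp matters: cleaning across the whole panel at once would let later observations influence earlier ones.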
• Lightweight high-performance data backend
AlphaPurify integrates a fast Parquet + DuckDB data layer for factor storage and aggregation.
This avoids the need for configuring complex database systems while still providing high-performance querying and fast factor construction workflows.
Quick Start
1. Install with pip
Install AlphaPurify with pip using the following command.
pip install alphapurify
Note: pip installs the latest stable release of AlphaPurify. The main branch is under active development, so to test its latest scripts or functions, install AlphaPurify from a clone of the repository instead.
2. Load your DataFrame
| datetime | symbol | close | volume | alpha_003 | momentum_12_1 | vol_60 | beta_252 |
|---|---|---|---|---|---|---|---|
| 2024-01-01 09:30 | AAPL | 189.9 | 120034 | 0.42 | 0.15 | 0.21 | 1.08 |
| 2024-01-01 09:31 | AAPL | 190.0 | 98321 | 0.38 | 0.16 | 0.22 | 1.07 |
| 2024-01-01 09:32 | AAPL | 190.4 | 101245 | 0.41 | 0.17 | 0.23 | 1.06 |
| 2024-01-01 09:30 | MSFT | 378.5 | 84211 | -0.15 | -0.05 | 0.18 | 0.95 |
| 2024-01-01 09:31 | MSFT | 378.9 | 90122 | -0.12 | -0.04 | 0.19 | 0.96 |
| 2024-01-01 09:32 | MSFT | 379.1 | 95433 | -0.08 | -0.03 | 0.20 | 0.97 |
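A long-format table like this one can be built with pandas as sketched below. The values are copied from the sample rows; the variable `FACTOR_COL` is introduced here only for illustration, and whatever name it holds has to match the factor name passed to the analysis classes in the next step (`alpha_003` in this README's example).

```python
import pandas as pd

# Name of the factor column; it must match the factor name used in later steps.
FACTOR_COL = "alpha_003"

rows = [
    ("2024-01-01 09:30", "AAPL", 189.9, 120034, 0.42, 0.15, 0.21, 1.08),
    ("2024-01-01 09:31", "AAPL", 190.0, 98321, 0.38, 0.16, 0.22, 1.07),
    ("2024-01-01 09:32", "AAPL", 190.4, 101245, 0.41, 0.17, 0.23, 1.06),
    ("2024-01-01 09:30", "MSFT", 378.5, 84211, -0.15, -0.05, 0.18, 0.95),
    ("2024-01-01 09:31", "MSFT", 378.9, 90122, -0.12, -0.04, 0.19, 0.96),
    ("2024-01-01 09:32", "MSFT", 379.1, 95433, -0.08, -0.03, 0.20, 0.97),
]
columns = ["datetime", "symbol", "close", "volume", FACTOR_COL,
           "momentum_12_1", "vol_60", "beta_252"]

df = pd.DataFrame(rows, columns=columns)
# Parse timestamps so that sorting and any time-based grouping behave correctly.
df["datetime"] = pd.to_datetime(df["datetime"])
df = df.sort_values(["symbol", "datetime"]).reset_index(drop=True)
```

In real use the same shape would typically come from `pd.read_csv` or `pd.read_parquet`: one row per timestamp and symbol, with price, factor, and exposure columns side by side.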
3. Create reports
from alphapurify import AlphaPurifier, FactorAnalyzer, Pure_Exposures

# preprocess
df = (
    AlphaPurifier(df, factor_col="alpha_003")
    .winsorize(method="mad")
    .standardize(method="zscore")
    .to_result()
)

# backtest
FA = FactorAnalyzer(
    base_df=df,
    trade_date_col='datetime',
    symbol_col='symbol',
    price_col='close',
    factor_name='alpha_003',
)
FA.run()
FA.create_long_return_sheet()
FA.create_long_short_return_sheet()
FA.create_short_return_sheet()
FA.create_single_fac_ic_sheet()

# contributions of other factors
Ex = Pure_Exposures(
    base_df=df,
    trade_date_col='datetime',
    symbol_col='symbol',
    price_col='close',
    factor_name='alpha_003',
    exposure_cols=['momentum_12_1', 'vol_60', 'beta_252'],
)
Ex.run()
Ex.plot_pure_exposures()
Ex.plot_pure_returns()
Ex.plot_pure_exposures_and_returns()
Ex.plot_correlations()
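For readers new to the quantities behind the IC and long-short reports, the following pandas sketch computes a per-date rank IC and a simple quantile long-short return. It is a conceptual illustration only and says nothing about how AlphaPurify computes its sheets; the function names, the equal-weighted buckets, and the "top minus bottom" convention are assumptions made here.

```python
import pandas as pd


def rank_ic(df, factor_col, ret_col, date_col="datetime"):
    """Spearman rank correlation between factor and forward return, per date."""
    # Pearson correlation of within-date ranks equals Spearman correlation.
    ranks = df.groupby(date_col)[[factor_col, ret_col]].rank()
    ranks[date_col] = df[date_col]
    return pd.Series(
        {d: g[factor_col].corr(g[ret_col]) for d, g in ranks.groupby(date_col)},
        name="rank_ic",
    )


def long_short_returns(df, factor_col, ret_col, date_col="datetime", n_quantiles=5):
    """Equal-weighted top-bucket minus bottom-bucket return, per date."""
    out = df[[date_col, factor_col, ret_col]].dropna().copy()
    rank = out.groupby(date_col)[factor_col].rank(method="first")
    count = out.groupby(date_col)[factor_col].transform("count")
    # Bucket 1 holds the lowest factor values, bucket n_quantiles the highest.
    out["bucket"] = ((rank - 1) * n_quantiles // count).astype(int) + 1
    mean_ret = out.groupby([date_col, "bucket"])[ret_col].mean().unstack("bucket")
    result = pd.DataFrame({"long": mean_ret[n_quantiles], "short": -mean_ret[1]})
    result["long_short"] = result["long"] + result["short"]
    return result


demo = pd.DataFrame({
    "datetime": ["d1"] * 4 + ["d2"] * 4,
    "factor": [1.0, 2.0, 3.0, 4.0] * 2,
    "fwd_ret": [0.00, 0.02, 0.04, 0.06, 0.06, 0.04, 0.02, 0.00],
})
ic = rank_ic(demo, "factor", "fwd_ret")
ls = long_short_returns(demo, "factor", "fwd_ret", n_quantiles=2)
```

In the demo data the factor orders returns perfectly on the first date and inversely on the second, so the rank IC and the long-short return both flip sign between the two dates.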
Examples of Outputs
Portfolio for long positions only:
Contributions of other factors:
P.S.
More detailed documentation and examples will be released soon.
Suggestions and improvements are welcome. Feel free to open an issue, submit a pull request, or contact me via email.
Elias Wu