Extract quarterly EPS estimates from FactSet Earnings Insight reports using OCR and calculate S&P 500 P/E ratios
FactSet Report Analyzer
A Python package for extracting quarterly EPS (Earnings Per Share) estimates from FactSet financial reports using OCR and image processing techniques.
⚠️ Disclaimer: This package is for educational and research purposes only. For production use, please use FactSet's official API. This package processes publicly available PDF reports and is not affiliated with or endorsed by FactSet.
Overview
This project processes chart images containing S&P 500 quarterly EPS data and extracts quarter labels (e.g., Q1'14, Q2'15) and corresponding EPS values. The extracted data is saved in CSV format for further analysis.
Motivation
Financial data providers (FactSet, Bloomberg, Investing.com, etc.) typically offer historical EPS data as actual values: once a quarter's earnings are reported, the estimate is overwritten with the actual figure. This creates a challenge for backtesting predictive models: using historical data means testing against information that was already reflected in stock prices at the time, making it difficult to evaluate the true predictive power of EPS estimates.
To address this, this project extracts point-in-time EPS estimates from historical earnings insight reports. By preserving the estimates as they appeared at each report date (before actual earnings were announced), a dataset can be built that accurately reflects what was known and expected at each point in time, enabling more meaningful backtesting and predictive analysis.
Current P/E Ratio Analysis (Auto-updated every Monday)
The following graph shows the current S&P 500 price with trailing and forward P/E ratios, highlighting periods outside the ±1.5σ range.
Installation
Install from PyPI:
pip install factset-report-analyzer
Or with uv:
uv pip install factset-report-analyzer
Workflow Overview
The complete workflow from PDF documents to final P/E ratio calculation:
┌──────────────────────────────────────────────────────────────────────┐
│ Step 1: PDF Download                                                 │
│                                                                      │
│ FactSet Earnings Insight Reports                                     │
│   └─> Download PDFs from FactSet website                             │
│       (e.g., EarningsInsight_20251114_111425.pdf)                    │
└──────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────────────┐
│ Step 2: EPS Chart Page Extraction                                    │
│                                                                      │
│ PDF Document                                                         │
│   ├─> Extract EPS chart page (Page 6)                                │
│   └─> Convert to PNG image                                           │
│       (e.g., 20161209-6.png)                                         │
└──────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────────────┐
│ Step 3: OCR Processing & Data Extraction                             │
│                                                                      │
│ Chart Image                                                          │
│   ├─> Google Cloud Vision API (149 text regions detected)            │
│   ├─> Coordinate-based matching (Q1'14 → 27.85)                      │
│   ├─> Bar classification (dark = actual, light = estimate)           │
│   └─> Extract quarter labels and EPS values                          │
│                                                                      │
│ Output: CSV with quarterly EPS estimates                             │
│   └─> extracted_estimates.csv                                        │
└──────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────────────┐
│ Step 4: P/E Ratio Calculation                                        │
│                                                                      │
│ EPS Estimates + S&P 500 Prices                                       │
│   ├─> Load EPS data from public URL                                  │
│   ├─> Load S&P 500 prices from yfinance (2016-12-09 to today)        │
│   ├─> Calculate 4-quarter EPS sum (e.g. forward: Q(0)+Q(1)+Q(2)+Q(3))│
│   └─> Calculate P/E Ratio = Price / EPS_4Q_Sum                       │
│                                                                      │
│ Output: DataFrame with P/E ratios                                    │
└──────────────────────────────────────────────────────────────────────┘
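The coordinate-based matching in Step 3 can be pictured as pairing each quarter label with the numeric text region that sits closest to it horizontally. The sketch below is only an illustration of that idea, not the package's implementation: `TextBox` and `match_labels_to_values` are hypothetical names, and the pixel coordinates and the second value are made up (only the `Q1'14` → `27.85` pairing comes from the diagram).

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    """One OCR text region: recognized text plus the center of its box (pixels)."""
    text: str
    x: float
    y: float


def match_labels_to_values(labels, values):
    """Pair each quarter label with the numeric box nearest to it along the x-axis."""
    pairs = {}
    for label in labels:
        nearest = min(values, key=lambda v: abs(v.x - label.x))
        pairs[label.text] = float(nearest.text)
    return pairs


# Illustrative coordinates only: labels sit on the axis, values above the bars.
labels = [TextBox("Q1'14", 100, 500), TextBox("Q2'14", 160, 500)]
values = [TextBox("29.67", 161, 210), TextBox("27.85", 99, 230)]

matched = match_labels_to_values(labels, values)
print(matched)
```

A real chart would probably also need a tolerance on the horizontal distance so that a label with no value above it is left unmatched rather than paired with a distant number.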
Visual Workflow
Step 1: PDF Document - Downloads FactSet Earnings Insight PDF reports
Step 2: EPS Chart Page Extraction - Extracts chart page from PDF and converts to PNG image
Step 3: OCR Processing & Bar Classification - Extracts quarter labels and EPS values, classifies bars (dark = actual, light = estimate)
Step 4: P/E Ratio Calculation - See example output below
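Extracting quarter labels in Step 3 implies telling labels such as `Q1'14` apart from the other text regions OCR returns. One way to do that is a small regular expression; the sketch below is an assumption about how this could look, and `parse_quarter` is a hypothetical helper rather than part of the package. It assumes two-digit years belong to the 2000s.

```python
import re

# Matches labels like Q1'14; also accepts a typographic apostrophe from OCR.
QUARTER_RE = re.compile(r"^Q([1-4])['\u2019](\d{2})$")


def parse_quarter(label):
    """Return (year, quarter) for a label like "Q1'14", or None if it is not a label."""
    match = QUARTER_RE.match(label.strip())
    if match is None:
        return None
    quarter = int(match.group(1))
    year = 2000 + int(match.group(2))  # assumption: all labels are in the 2000s
    return year, quarter


print(parse_quarter("Q1'14"))
print(parse_quarter("27.85"))
```

Returning a `(year, quarter)` tuple keeps the labels sortable, which helps when the quarters are later summed in order.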
Usage
Python API
from factset_report_analyzer import SP500
# Initialize SP500 class (auto-loads CSV and S&P 500 prices)
sp500 = SP500()
# Get P/E ratio DataFrame (default: forward type)
pe_df = sp500.pe_ratio
print(pe_df)
# Switch to trailing type
sp500.set_type('trailing')
pe_trailing = sp500.pe_ratio
print(pe_trailing)
# Get current P/E ratio
current = sp500.current_pe
print(f"Current P/E: {current['pe_ratio']:.2f} on {current['date']}")
P/E Types:
- forward: Q(0) + Q(1) + Q(2) + Q(3) - Report date quarter and next 3 quarters
- trailing: Q(-4) + Q(-3) + Q(-2) + Q(-1) - Last 4 quarters before report date
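To make the two windows concrete, here is a minimal sketch of the 4-quarter sum. It assumes the quarterly EPS values are held in a chronologically ordered list and that the position of the report-date quarter Q(0) is known; `four_quarter_sum` is a hypothetical helper and the EPS numbers are invented for illustration.

```python
def four_quarter_sum(eps_by_quarter, report_idx, kind="forward"):
    """Sum four quarterly EPS values around the report-date quarter Q(0).

    forward:  Q(0) + Q(1) + Q(2) + Q(3)
    trailing: Q(-4) + Q(-3) + Q(-2) + Q(-1)
    """
    if kind == "forward":
        window = eps_by_quarter[report_idx:report_idx + 4]
    elif kind == "trailing":
        if report_idx < 4:
            raise ValueError("need four quarters before the report quarter")
        window = eps_by_quarter[report_idx - 4:report_idx]
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    if len(window) != 4:
        raise ValueError("need four quarters in the window")
    return sum(window)


# Invented values; index 4 plays the role of Q(0).
eps = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
forward_sum = four_quarter_sum(eps, 4, "forward")    # 14 + 15 + 16 + 17
trailing_sum = four_quarter_sum(eps, 4, "trailing")  # 10 + 11 + 12 + 13
print(forward_sum, trailing_sum)
```

The P/E ratio is then the index price divided by the chosen sum.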
Plotting P/E Ratios
from factset_report_analyzer.utils.plot import plot_pe_ratio_with_price
from pathlib import Path
# Generate and save P/E ratio plot
plot_pe_ratio_with_price(
    output_path=Path("output/pe_ratio_plot.png"),
    std_threshold=1.5,  # Highlight periods outside ±1.5σ
    figsize=(14, 12),
)
The plot shows:
- Top panel: S&P 500 Price with Trailing P/E Ratio (Q(-4)+Q(-3)+Q(-2)+Q(-1))
- Bottom panel: S&P 500 Price with Forward P/E Ratio (Q(0)+Q(1)+Q(2)+Q(3))
- Highlighting: Periods where P/E ratios are outside the ±1.5σ range
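The highlighting rule can be sketched as a simple band test. The example below assumes the band is built from the mean and population standard deviation of the whole series; the package may instead use a sample standard deviation or a rolling window, so treat `outside_band` as a hypothetical illustration with made-up P/E values.

```python
import statistics


def outside_band(values, k=1.5):
    """Flag values lying more than k standard deviations from the series mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population std over the full series
    return [abs(v - mean) > k * std for v in values]


# Made-up P/E ratios; only the last one is far from the rest.
pe_values = [18.0, 19.0, 20.0, 21.0, 22.0, 30.0]
flags = outside_band(pe_values, k=1.5)
print(flags)
```

The `k` argument plays the same role as `std_threshold` in the plotting call.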
Example: P/E Ratio Calculation Result
from factset_report_analyzer import SP500
# Initialize and get trailing P/E ratios
sp500 = SP500()
sp500.set_type('trailing')
pe_df = sp500.pe_ratio
print(pe_df)
Output:
Loading S&P 500 data...
EPS data: 381 reports
Price data: 2251 trading days
            Date        Price  EPS_4Q_Sum   PE_Ratio      Type
0     2016-12-09  2259.530029      117.49  19.231680  trailing
1     2016-12-12  2256.959961      117.49  19.209805  trailing
2     2016-12-13  2271.719971      117.49  19.335433  trailing
3     2016-12-14  2253.280029      117.49  19.178484  trailing
4     2016-12-15  2262.030029      117.49  19.252958  trailing
...          ...          ...         ...        ...       ...
2246  2025-11-17  6672.410156      267.21  24.970660  trailing
2247  2025-11-18  6617.319824      267.21  24.764492  trailing
2248  2025-11-19  6642.160156      267.21  24.857454  trailing
2249  2025-11-20  6650.740234      267.21  24.900000  trailing
2250  2025-11-21  6660.000000      267.21  24.950000  trailing
[2251 rows x 5 columns]
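In the output, `EPS_4Q_Sum` stays constant across consecutive trading days while `Price` changes, which suggests each day is paired with the most recent report on or before it. The sketch below reproduces that pairing with a plain as-of lookup; it is an assumption about the mechanism, `eps_as_of` is a hypothetical helper, and the numbers are copied from the first rows of the output.

```python
from bisect import bisect_right
from datetime import date

# (report_date, EPS_4Q_Sum), sorted by date; value copied from the output above.
reports = [(date(2016, 12, 9), 117.49)]
report_dates = [d for d, _ in reports]

# (trading_day, closing price) copied from the first rows of the output above.
prices = [
    (date(2016, 12, 9), 2259.530029),
    (date(2016, 12, 12), 2256.959961),
    (date(2016, 12, 13), 2271.719971),
]


def eps_as_of(day):
    """Return the EPS sum from the latest report dated on or before `day`."""
    i = bisect_right(report_dates, day) - 1
    return reports[i][1] if i >= 0 else None


rows = []
for day, price in prices:
    eps_sum = eps_as_of(day)
    if eps_sum is None:
        continue  # no report published yet for this day
    rows.append((day, price, eps_sum, price / eps_sum))

for day, price, eps_sum, pe in rows:
    print(f"{day}  {price:.6f}  {eps_sum:.2f}  {pe:.6f}")
```

With more than one report in `reports`, the lookup would switch to the newer EPS sum from that report's date onward.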
API Reference
SP500 Class
S&P 500 Market Data with EPS and P/E ratio calculations.
Initialization:
from factset_report_analyzer import SP500
sp500 = SP500()
Properties:
- sp500.price: DataFrame with S&P 500 price data (Date, Price)
- sp500.eps: DataFrame with EPS data (Date, EPS) - depends on current type
- sp500.pe_ratio: DataFrame with P/E ratio data (Date, Price, EPS, PE_Ratio) - depends on current type
- sp500.current_pe: Dictionary with latest P/E ratio info ({'date': ..., 'pe_ratio': ...})
Methods:
- sp500.set_type(type): Set P/E type to 'forward' or 'trailing'
P/E Types:
- 'forward': Q(0) + Q(1) + Q(2) + Q(3) - Report date quarter and next 3 quarters
- 'trailing': Q(-4) + Q(-3) + Q(-2) + Q(-1) - Last 4 quarters before report date
Features:
- No API keys required
- Always loads latest data from public URL
- No local files needed
- Auto-loads S&P 500 prices from yfinance
- Caches data for efficient repeated access
plot_pe_ratio_with_price() Function
Generate visualization of S&P 500 Price with P/E Ratios.
Function Signature:
from factset_report_analyzer.utils.plot import plot_pe_ratio_with_price
from pathlib import Path
plot_pe_ratio_with_price(
    output_path: Path | None = None,
    std_threshold: float = 1.5,
    figsize: tuple[int, int] = (14, 12)
) -> None
Parameters:
- output_path (Path | None): Path to save the plot. If None, displays the plot.
- std_threshold (float): Standard deviation threshold for highlighting outliers (default: 1.5)
- figsize (tuple[int, int]): Figure size in inches (default: (14, 12))
Example:
from factset_report_analyzer.utils.plot import plot_pe_ratio_with_price
from pathlib import Path
# Save plot to file
plot_pe_ratio_with_price(
    output_path=Path("output/pe_ratio_plot.png"),
    std_threshold=1.5,
    figsize=(14, 12),
)
# Or display interactively
plot_pe_ratio_with_price()
Legal Disclaimer
This package is provided for educational and research purposes only.
- This package processes publicly available PDF reports from FactSet's website
- The data extraction and processing methods are implemented for academic research
- This package is NOT affiliated with, endorsed by, or sponsored by FactSet
- For production use, please use FactSet's official API
No Warranty: This software is provided "as is" without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose, and noninfringement.
Limitation of Liability: In no event shall the authors or copyright holders be liable for any claim, damages, or other liability arising from the use of this software.
Data Usage: Users are responsible for ensuring compliance with FactSet's terms of service and any applicable data usage agreements when using this package.
License
MIT License
Recent Updates
v0.4.2 (2025-11-21) - Plotting Module Refactoring
- Code Organization: Moved plot functions to the utils/plot/ module for better separation of concerns
  - plot_time_series() → utils/plot/time_series.py (generic time series plotting)
  - plot_pe_ratio_with_price() → utils/plot/pe_ratio.py (P/E ratio specific plotting)
- Clean Architecture: Analysis module now focuses on calculation logic only
- Improved Reusability: Plot functions can be easily reused by other modules
- Documentation: Added plotting examples and API reference
Links
- GitHub: seung-gu/factset-report-analyzer
- PyPI: factset-report-analyzer