Extract quarterly EPS estimates from FactSet Earnings Insight reports using OCR and calculate S&P 500 P/E ratios

FactSet Report Analyzer

A Python package for extracting quarterly EPS (Earnings Per Share) estimates from FactSet financial reports using OCR and image processing techniques.

⚠️ Disclaimer: This package is for educational and research purposes only. For production use, please use FactSet's official API. This package processes publicly available PDF reports and is not affiliated with or endorsed by FactSet.

Overview

This project processes chart images containing S&P 500 quarterly EPS data and extracts quarter labels (e.g., Q1'14, Q2'15) and corresponding EPS values. The extracted data is saved in CSV format for further analysis.
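As a rough illustration of what recognizing a quarter label involves, the sketch below parses strings such as Q1'14 into a (year, quarter) pair. The regular expression and the `parse_quarter_label` helper are assumptions based only on the label examples above; they are not taken from the package's source.

```python
import re
from typing import Optional, Tuple

# Pattern inferred from the README examples (Q1'14, Q2'15). OCR may return a
# typographic apostrophe, so both forms are accepted. This is an assumption.
QUARTER_RE = re.compile(r"^Q([1-4])['\u2019](\d{2})$")


def parse_quarter_label(text: str) -> Optional[Tuple[int, int]]:
    """Return (year, quarter) for a label like "Q1'14", or None if it does not match."""
    match = QUARTER_RE.match(text.strip())
    if match is None:
        return None
    quarter = int(match.group(1))
    year = 2000 + int(match.group(2))  # assumes every label refers to 2000 or later
    return year, quarter


print(parse_quarter_label("Q1'14"))
print(parse_quarter_label("27.85"))
```

Anything that is not a quarter label (for example an EPS value such as 27.85) returns None, so the same helper can be used to separate labels from values.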

Motivation

Financial data providers (FactSet, Bloomberg, Investing.com, etc.) typically offer historical EPS data as actual values: once a quarter's earnings are reported, the estimate is overwritten with the actual figure. This creates a challenge for backtesting predictive models: using historical data means testing against information that was already reflected in stock prices at the time, making it difficult to evaluate the true predictive power of EPS estimates.

To address this, this project extracts point-in-time EPS estimates from historical earnings insight reports. By preserving the estimates as they appeared at each report date (before actual earnings were announced), a dataset can be built that accurately reflects what was known and expected at each point in time, enabling more meaningful backtesting and predictive analysis.

Current P/E Ratio Analysis (🔄 Auto-updated every Monday)

The following graph shows the current S&P 500 Price with Trailing and Forward P/E Ratios, highlighting periods outside the ±1.5σ range.

[Figure: P/E Ratio Analysis]

Installation

Install from PyPI:

pip install factset-report-analyzer

Or with uv:

uv pip install factset-report-analyzer

Workflow Overview

The complete workflow from PDF documents to final P/E ratio calculation:

┌─────────────────────────────────────────────────────────────────────┐
│                    📄 Step 1: PDF Download                          │
│                                                                     │
│  FactSet Earnings Insight Reports                                   │
│  └─> Download PDFs from FactSet website                             │
│      (e.g., EarningsInsight_20251114_111425.pdf)                    │
└─────────────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────────────┐
│              🖼️  Step 2: EPS Chart Page Extraction                  │
│                                                                     │
│  PDF Document                                                       │
│  └─> Extract EPS chart page (Page 21)                               │
│      └─> Convert to PNG image                                       │
│          (e.g., 20161209.png)                                       │
└─────────────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────────────┐
│              🔍 Step 3: OCR Processing & Data Extraction            │
│                                                                     │
│  Chart Image                                                        │
│  ├─> Google Cloud Vision API (149 text regions detected)            │
│  ├─> Coordinate-based matching (Q1'14 ↔ 27.85)                      │
│  ├─> Bar classification (dark = actual, light = estimate)           │
│  └─> Extract quarter labels and EPS values                          │
│                                                                     │
│  Output: CSV with quarterly EPS estimates                           │
│  └─> extracted_estimates.csv                                        │
└─────────────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────────────┐
│              📊 Step 4: P/E Ratio Calculation                       │
│                                                                     │
│  EPS Estimates + S&P 500 Prices                                     │
│  ├─> Load EPS data from public URL                                  │
│  ├─> Load S&P 500 prices from yfinance (2016-12-09 to today)        │
│  ├─> Calculate 4-quarter EPS sum (e.g. forward: Q(0)+Q(1)+Q(2)+Q(3))│
│  └─> Calculate P/E Ratio = Price / EPS_4Q_Sum                       │
│                                                                     │
│  Output: DataFrame with P/E ratios                                  │
└─────────────────────────────────────────────────────────────────────┘
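
The coordinate-based matching in Step 3 can be pictured with a small sketch. It assumes each OCR text region has already been reduced to its text and the center of its bounding box, and it pairs every quarter label with the numeric value whose horizontal center is closest. The `TextRegion` type, the `match_labels_to_values` helper, and the pixel coordinates are hypothetical; the package's actual matching logic may differ.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TextRegion:
    """Hypothetical, simplified OCR result: text plus bounding-box center in pixels."""
    text: str
    x: float
    y: float


QUARTER_RE = re.compile(r"^Q[1-4]'\d{2}$")   # e.g. Q1'14
VALUE_RE = re.compile(r"^\d+\.\d+$")         # e.g. 27.85


def match_labels_to_values(regions: List[TextRegion]) -> Dict[str, float]:
    """Pair each quarter label with the EPS value nearest to it along the x axis."""
    labels = [r for r in regions if QUARTER_RE.match(r.text)]
    values = [r for r in regions if VALUE_RE.match(r.text)]
    pairs: Dict[str, float] = {}
    for label in labels:
        if not values:
            break
        nearest = min(values, key=lambda v: abs(v.x - label.x))
        pairs[label.text] = float(nearest.text)
    return pairs


# Coordinates and the second EPS value are made up for illustration.
regions = [
    TextRegion("Q1'14", 100.0, 500.0),
    TextRegion("27.85", 102.0, 300.0),
    TextRegion("Q2'14", 150.0, 500.0),
    TextRegion("29.67", 149.0, 290.0),
    TextRegion("Bottom-Up EPS", 40.0, 20.0),  # ignored: neither label nor value
]
print(match_labels_to_values(regions))
```

Matching on the x coordinate alone relies on the chart being a vertical bar chart, where each value sits directly above its quarter label.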

Visual Workflow

Step 1: PDF Document → Downloads FactSet Earnings Insight PDF reports

Step 2: EPS Chart Page Extraction → Extracts chart page from PDF and converts to PNG image

Step 3: OCR Processing & Bar Classification → Extracts quarter labels and EPS values, classifies bars (dark = actual, light = estimate)

Step 4: P/E Ratio Calculation → See example output below
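
The dark/light bar classification in Step 3 could be approximated by thresholding the brightness of a pixel sampled from each bar. The sketch below is only one plausible approach: the `luma` and `classify_bar` helpers, the Rec. 709 weights applied to 8-bit RGB values, the threshold of 128, and the sample colors are all assumptions, not the package's implementation.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def luma(rgb: RGB) -> float:
    """Approximate perceived brightness (0-255) using Rec. 709 channel weights."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def classify_bar(rgb: RGB, threshold: float = 128.0) -> str:
    """Label a bar 'actual' if its color is dark, 'estimate' if it is light."""
    return "actual" if luma(rgb) < threshold else "estimate"


# Sample colors are hypothetical stand-ins for a dark and a light bar.
print(classify_bar((0, 51, 102)))
print(classify_bar((173, 216, 230)))
```

In practice the color would be sampled from the chart image at each bar's position, and the threshold would need tuning against real report pages.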

Usage

Python API

from factset_report_analyzer import SP500

# Initialize SP500 class (auto-loads CSV and S&P 500 prices)
sp500 = SP500()

# Get P/E ratio DataFrame (default: forward type)
pe_df = sp500.pe_ratio
print(pe_df)

# Switch to trailing type
sp500.set_type('trailing')
pe_trailing = sp500.pe_ratio
print(pe_trailing)

# Get current P/E ratio
current = sp500.current_pe
print(f"Current P/E: {current['pe_ratio']:.2f} on {current['date']}")

P/E Types:

  • forward: Q(0) + Q(1) + Q(2) + Q(3) - Report date quarter and next 3 quarters
  • trailing: Q(-4) + Q(-3) + Q(-2) + Q(-1) - Last 4 quarters before report date
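
To make the two windows concrete, here is a minimal sketch of the 4-quarter sum and the resulting ratio. It assumes quarterly EPS values are held in a chronologically ordered list and that `report_index` points at Q(0), the report date quarter. The `eps_4q_sum` and `pe_ratio` helpers and the quarterly figures are hypothetical and only mirror the definitions above.

```python
from typing import List


def eps_4q_sum(eps_by_quarter: List[float], report_index: int, pe_type: str = "forward") -> float:
    """Sum four quarters of EPS around Q(0), which sits at report_index."""
    if pe_type == "forward":
        # Q(0) + Q(1) + Q(2) + Q(3)
        window = eps_by_quarter[report_index:report_index + 4]
    elif pe_type == "trailing":
        # Q(-4) + Q(-3) + Q(-2) + Q(-1)
        if report_index < 4:
            raise ValueError("need four quarters before the report quarter")
        window = eps_by_quarter[report_index - 4:report_index]
    else:
        raise ValueError("pe_type must be 'forward' or 'trailing'")
    if len(window) != 4:
        raise ValueError("need four quarters in the window")
    return sum(window)


def pe_ratio(price: float, eps_sum: float) -> float:
    """P/E Ratio = Price / EPS_4Q_Sum."""
    return price / eps_sum


# Made-up quarterly EPS values; index 4 is the report date quarter Q(0).
eps = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
print(eps_4q_sum(eps, 4, "trailing"))
print(eps_4q_sum(eps, 4, "forward"))
print(pe_ratio(2300.0, eps_4q_sum(eps, 4, "trailing")))
```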

Plotting Time Series

Generic time series plotting with optional sigma threshold highlighting:

from factset_report_analyzer import SP500
from factset_report_analyzer.utils.plot import plot_time_series
from pathlib import Path

# Get SP500 data
sp500 = SP500()
sp500.set_type('trailing')
pe_df = sp500.pe_ratio.sort_values('Date')

# Plot single series with sigma highlighting
plot_time_series(
    dates=pe_df['Date'],
    values=pe_df['PE_Ratio'],
    sigma=1.5,  # Highlight periods outside ±1.5σ
    labels=['Trailing P/E Ratio'],
    colors=['green'],
    output_path=Path("output/pe_ratio_plot.png")
)

# Plot dual axis (price and P/E ratio)
plot_time_series(
    dates=pe_df['Date'],
    values=[pe_df['Price'], pe_df['PE_Ratio']],
    sigma=1.5,
    sigma_index=1,  # Apply sigma to P/E ratio (second series)
    labels=['S&P 500 Price', 'Trailing P/E Ratio'],
    colors=['black', 'green'],
    output_path=Path("output/price_pe_plot.png")
)

Features:

  • Single or dual-axis plotting (up to 2 series)
  • Optional sigma threshold highlighting for outlier detection
  • Automatic legend with mean and ±σ lines
  • Customizable colors and labels
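
The sigma threshold can be read as a band of mean ± sigma × standard deviation around the series, with points outside the band flagged. The sketch below shows that calculation with the standard library. The `sigma_outliers` helper is hypothetical, and whether `plot_time_series` uses the population or the sample standard deviation is not stated here, so the population form is an assumption.

```python
from statistics import mean, pstdev
from typing import List


def sigma_outliers(values: List[float], sigma: float = 1.5) -> List[bool]:
    """Return True for each value outside mean ± sigma * population std dev."""
    mu = mean(values)
    sd = pstdev(values)
    lower, upper = mu - sigma * sd, mu + sigma * sd
    return [v < lower or v > upper for v in values]


# Made-up series: mean is 11 and population std dev is 3, so the band is 6.5 to 15.5.
series = [10, 10, 10, 10, 10, 10, 10, 10, 10, 20]
print(sigma_outliers(series, sigma=1.5))
```

Only the final value (20) falls outside the band in this example.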

Plotting P/E Ratios

from factset_report_analyzer.utils.plot import plot_pe_ratio_with_price
from pathlib import Path

# Generate and save P/E ratio plot
plot_pe_ratio_with_price(
    output_path=Path("output/pe_ratio_plot.png"),
    std_threshold=1.5,  # Highlight periods outside ±1.5σ
    figsize=(14, 12)
)

The plot shows:

  • Top panel: S&P 500 Price with Trailing P/E Ratio (Q(-4)+Q(-3)+Q(-2)+Q(-1))
  • Bottom panel: S&P 500 Price with Forward P/E Ratio (Q(0)+Q(1)+Q(2)+Q(3))
  • Highlighting: Periods where P/E ratios are outside the ±1.5σ range

Example: P/E Ratio Calculation Result

from factset_report_analyzer import SP500

# Initialize and get trailing P/E ratios
sp500 = SP500()
sp500.set_type('trailing')
pe_df = sp500.pe_ratio
print(pe_df)

Output:

📊 Loading S&P 500 data...
  ✅ EPS data: 381 reports
  ✅ Price data: 2251 trading days
        Date        Price  EPS_4Q_Sum   PE_Ratio      Type
0 2016-12-09  2259.530029      117.49  19.231680  trailing
1 2016-12-12  2256.959961      117.49  19.209805  trailing
2 2016-12-13  2271.719971      117.49  19.335433  trailing
3 2016-12-14  2253.280029      117.49  19.178484  trailing
4 2016-12-15  2262.030029      117.49  19.252958  trailing
...          ...          ...         ...        ...       ...
2246  2025-11-17  6672.410156      267.21  24.970660  trailing
2247  2025-11-18  6617.319824      267.21  24.764492  trailing
2248  2025-11-19  6642.160156      267.21  24.857454  trailing
2249  2025-11-20  6650.740234      267.21  24.889563  trailing
2250  2025-11-21  6660.000000      267.21  24.924217  trailing

[2251 rows x 5 columns]
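
In the output above, EPS_4Q_Sum stays constant across consecutive trading days, which is consistent with each trading day using the most recent report published on or before that date. A minimal sketch of such an as-of lookup follows. The `latest_report_value` helper, the second report date, and its value are hypothetical, and the package may implement this differently (for example with a pandas as-of merge).

```python
from bisect import bisect_right
from datetime import date
from typing import List, Optional


def latest_report_value(report_dates: List[date], report_values: List[float], day: date) -> Optional[float]:
    """Return the value of the latest report dated on or before `day`.

    `report_dates` must be sorted in ascending order and aligned with `report_values`.
    """
    idx = bisect_right(report_dates, day) - 1
    if idx < 0:
        return None  # no report had been published yet
    return report_values[idx]


# First entry matches the output above; the second entry is made up.
report_dates = [date(2016, 12, 9), date(2016, 12, 16)]
report_values = [117.49, 118.00]

print(latest_report_value(report_dates, report_values, date(2016, 12, 13)))
print(latest_report_value(report_dates, report_values, date(2016, 12, 8)))
```

Using only reports dated on or before each trading day is what keeps the series point-in-time and avoids look-ahead bias.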

API Reference

SP500 Class

S&P 500 Market Data with EPS and P/E ratio calculations.

Initialization:

from factset_report_analyzer import SP500
sp500 = SP500()

Properties:

  • sp500.price: DataFrame with S&P 500 price data (Date, Price)
  • sp500.eps: DataFrame with EPS data (Date, EPS) - depends on current type
  • sp500.pe_ratio: DataFrame with P/E ratio data (Date, Price, EPS_4Q_Sum, PE_Ratio, Type) - depends on current type
  • sp500.current_pe: Dictionary with latest P/E ratio info ({'date': ..., 'pe_ratio': ...})

Methods:

  • sp500.set_type(type): Set P/E type to 'forward' or 'trailing'

P/E Types:

  • 'forward': Q(0) + Q(1) + Q(2) + Q(3) - Report date quarter and next 3 quarters
  • 'trailing': Q(-4) + Q(-3) + Q(-2) + Q(-1) - Last 4 quarters before report date

Features:

  • ✅ No API keys required
  • ✅ Always loads latest data from public URL
  • ✅ No local files needed
  • ✅ Auto-loads S&P 500 prices from yfinance
  • ✅ Caches data for efficient repeated access

Plotting Functions

For detailed API documentation, see function docstrings:

  • plot_time_series() - Generic time series plotting with sigma highlighting
  • plot_pe_ratio_with_price() - S&P 500 P/E ratio visualization

# View help
from factset_report_analyzer.utils.plot import plot_time_series, plot_pe_ratio_with_price
help(plot_time_series)
help(plot_pe_ratio_with_price)

Legal Disclaimer

This package is provided for educational and research purposes only.

  • This package processes publicly available PDF reports from FactSet's website
  • The data extraction and processing methods are implemented for academic research
  • This package is NOT affiliated with, endorsed by, or sponsored by FactSet
  • For production use, please use FactSet's official API

No Warranty: This software is provided "as is" without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose, and noninfringement.

Limitation of Liability: In no event shall the authors or copyright holders be liable for any claim, damages, or other liability arising from the use of this software.

Data Usage: Users are responsible for ensuring compliance with FactSet's terms of service and any applicable data usage agreements when using this package.

License

MIT License
