A Python package to detect AI-generated text in financial documents using the msperlin/finbert-ai-detector model.

FinBERT AI Detector

This is an easy-to-use Python package for the msperlin/finbert-ai-detector model.

The model is designed specifically to detect AI-generated text in financial documents, such as corporate annual reports (e.g., 10-K filings). It is fine-tuned from yiyanghkust/finbert-pretrain.

The model is used in the working paper:

Perlin, Marcelo and Foguesatto, Cristian and Karagrigoriou Galanos, Aliki and Affonso, Felipe, The use of AI in 10-K Filings: An Empirical Analysis of S&P 500 Reports (January 21, 2026). Available at SSRN: https://ssrn.com/abstract=6108946 or http://dx.doi.org/10.2139/ssrn.6108946

Features

  • Automatic GPU Support: CUDA and Apple Silicon (MPS) devices are used automatically when available.
  • Easy Interface: Detect AI-generated text with a single method call.
  • Batched Inference: Run predictions efficiently on large datasets using batched inputs.
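
The package handles device selection internally, and its exact logic is not documented here. As a rough illustration of how this kind of selection is commonly done with PyTorch, the sketch below defines a hypothetical helper, `pick_device` (not part of finbert-ai-detector), that prefers CUDA, then MPS, and otherwise falls back to the CPU.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu".

    Hypothetical helper for illustration only; it is not the package's API.
    """
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no accelerator to choose from.
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"

    # Older PyTorch builds may not expose the MPS backend at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


print(pick_device())
```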

Installation

Install using pip:

pip install finbert-ai-detector

Quick Start

from finbert_ai_detector import FinbertAIDetector

# Initialize the detector (downloads the model if not cached)
detector = FinbertAIDetector()

# Example text
text = "The Tax Cuts and Jobs Act enacted in 2017 in the United States, significantly changed the tax rules applicable to U.S.-domiciled corporations. Changes such as lower corporate tax rates, full expensing for qualified property, taxation of offshore earnings, limitations on interest expense deductions, and changes to the municipal bond tax exemption may impact demand for our products and services."

# Predict a single text
result = detector.predict(text)
print(f"Prediction: {result['label']}")
print(f"AI Probability: {result['ai_probability']:.2%}")

Batched Prediction

For analyzing multiple documents or sentences quickly, use batched inference:

from finbert_ai_detector import FinbertAIDetector

detector = FinbertAIDetector()

texts = [
    "Company revenue grew by 15% due to increased demand in the European market.",
    "A machine learning model generated this text based on recent financial statements."
]

results = detector.predict_batch(texts)
for result in results:
    print(f"Text: {result['text']}")
    print(f"AI Probability: {result['ai_probability']:.2%}")
    print("---")
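
Batched results are often filtered or ranked before review. The sketch below assumes each result is a dictionary with the `text` and `ai_probability` keys used in the example above. The helper `flag_likely_ai` and the 0.6 threshold are illustrative assumptions, not part of the package, and the sample values are made up.

```python
def flag_likely_ai(results, threshold=0.5):
    """Return results at or above `threshold`, highest AI probability first.

    Hypothetical helper; the threshold is an arbitrary illustrative choice.
    """
    flagged = [r for r in results if r["ai_probability"] >= threshold]
    return sorted(flagged, key=lambda r: r["ai_probability"], reverse=True)


# Stand-in data shaped like the batched output; the values are invented.
sample_results = [
    {"text": "First passage.", "ai_probability": 0.12},
    {"text": "Second passage.", "ai_probability": 0.91},
    {"text": "Third passage.", "ai_probability": 0.67},
]

for r in flag_likely_ai(sample_results, threshold=0.6):
    print(f"{r['ai_probability']:.2%}  {r['text']}")
```

With a real detector, the same helper could be applied to the list returned by `detector.predict_batch(texts)`.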

Intended Use & Limitations

  • Intended Usage: Analyzing formal financial reports, press releases, corporate filings, and similar structured financial disclosures.
  • Limitations: The model is optimized for the formal, complex tone of financial documents. Its accuracy may be lower on texts outside the financial domain, such as social media posts, casual emails, news articles, or creative writing.
  • Length Constraint: The underlying FinBERT (BERT-based) architecture has a maximum sequence length of 512 tokens. Longer texts are truncated before prediction, so any content beyond that limit does not affect the result.
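
Because of the length constraint, one possible workaround for long documents is to split the text into smaller chunks, score each chunk, and aggregate the results. The sketch below is only one such approach, under stated assumptions: the helpers `chunk_words` and `score_long_text` are hypothetical, word count is used as a rough proxy for token count (300 words is a conservative guess, not a guarantee of staying under 512 tokens), and averaging chunk probabilities is an illustrative choice rather than the method used in the working paper.

```python
from statistics import mean


def chunk_words(text, max_words=300):
    """Split text into consecutive chunks of at most `max_words` words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def score_long_text(text, predict_batch, max_words=300):
    """Average the AI probability over all chunks of `text`.

    `predict_batch` is any callable that takes a list of strings and
    returns a list of dicts containing an "ai_probability" key.
    """
    chunks = chunk_words(text, max_words)
    if not chunks:
        raise ValueError("text is empty")
    results = predict_batch(chunks)
    return mean(r["ai_probability"] for r in results)


# Usage with the real detector (requires the package and model download):
# from finbert_ai_detector import FinbertAIDetector
# detector = FinbertAIDetector()
# score = score_long_text(long_text, detector.predict_batch)
```

Passing the prediction function as an argument keeps the helper independent of the package, which also makes it easy to test with a stand-in predictor.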
