A Python package to detect AI-generated text in financial documents using the msperlin/finbert-ai-detector model.
FinBERT AI Detector
This package is an easy-to-use Python wrapper around the msperlin/finbert-ai-detector model.
The model is designed specifically to detect AI-generated text in financial documents, such as corporate annual reports (e.g., 10-K filings). It is fine-tuned from yiyanghkust/finbert-pretrain.
The model is used in the working paper:
Perlin, Marcelo and Foguesatto, Cristian and Karagrigoriou Galanos, Aliki and Affonso, Felipe, The use of AI in 10-K Filings: An Empirical Analysis of S&P 500 Reports (January 21, 2026). Available at SSRN: https://ssrn.com/abstract=6108946 or http://dx.doi.org/10.2139/ssrn.6108946
Features
- Automatic GPU support: CUDA and Apple Silicon (MPS) are used when available.
- Easy Interface: Detect AI-generated text with a single method call.
- Batched Inference: Run predictions efficiently on large datasets using batched inputs.
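The automatic device selection mentioned in the feature list can be pictured as a simple preference order. The sketch below is an assumption about how such a fallback is commonly written with PyTorch, not the package's actual code: `pick_device` and `detect_device` are hypothetical names, and trying CUDA before MPS is an assumed ordering.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred device name: CUDA first, then MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch for available accelerators (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to probe, so fall back to CPU.
        return "cpu"
    # torch.backends.mps is absent in older PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    mps_available = mps is not None and mps.is_available()
    return pick_device(torch.cuda.is_available(), mps_available)


print(detect_device())
```

Keeping the preference logic in a pure function makes it easy to check without any particular hardware.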
Installation
Install using pip:
pip install finbert-ai-detector
Quick Start
from finbert_ai_detector import FinbertAIDetector
# Initialize the detector (downloads the model if not cached)
detector = FinbertAIDetector()
# Example text
text = "The Tax Cuts and Jobs Act enacted in 2017 in the United States, significantly changed the tax rules applicable to U.S.-domiciled corporations. Changes such as lower corporate tax rates, full expensing for qualified property, taxation of offshore earnings, limitations on interest expense deductions, and changes to the municipal bond tax exemption may impact demand for our products and services."
# Predict a single text
result = detector.predict(text)
print(f"Prediction: {result['label']}")
print(f"AI Probability: {result['ai_probability']:.2%}")
Batched Prediction
For analyzing multiple documents or sentences quickly, use batched inference:
from finbert_ai_detector import FinbertAIDetector
detector = FinbertAIDetector()
texts = [
    "Company revenue grew by 15% due to increased demand in the European market.",
    "A machine learning model generated this text based on recent financial statements."
]
results = detector.predict_batch(texts)
for result in results:
    print(f"Text: {result['text']}")
    print(f"AI Probability: {result['ai_probability']:.2%}")
    print("---")
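When the input is too large to hold in one list, the texts can be fed to `predict_batch` one slice at a time. The helper below is a minimal sketch under assumptions: `predict_in_slices` and `slice_size` are hypothetical names, and the README does not say whether `predict_batch` already splits its input internally, in which case this wrapper only bounds how many texts are held in memory at once.

```python
from typing import Any, Dict, Iterable, Iterator, List


def predict_in_slices(
    detector: Any, texts: Iterable[str], slice_size: int = 256
) -> Iterator[Dict[str, Any]]:
    """Yield one result dict per text, calling predict_batch on one slice at a time."""
    if slice_size < 1:
        raise ValueError("slice_size must be at least 1")
    pending: List[str] = []
    for text in texts:
        pending.append(text)
        if len(pending) == slice_size:
            # A full slice is ready: score it and hand the results on.
            yield from detector.predict_batch(pending)
            pending = []
    if pending:
        # Score the final, possibly shorter, slice.
        yield from detector.predict_batch(pending)
```

With a real detector this would be used as `for result in predict_in_slices(FinbertAIDetector(), texts): ...`, where `texts` can be any iterable, including a generator that reads documents from disk.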
Intended Use & Limitations
- Intended Usage: Analyzing formal financial reports, press releases, corporate filings, and similar structured financial disclosures.
- Limitations: The model is optimized specifically for the formal, complex tone of financial documents. Its accuracy may be lower when applied to texts outside the financial domain, such as social media posts, casual emails, news articles, or creative text.
- Length Constraint: The underlying FinBERT architecture has a maximum sequence length of 512 tokens. Longer texts are truncated before prediction.
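One way to work around the 512-token limit is to split a long document into smaller pieces and score each piece. The sketch below rests on loud assumptions: `split_into_chunks` and `score_long_text` are hypothetical helpers, the 300-word budget is only a rough stand-in for 512 tokens (BERT tokenizers often split one word into several tokens, so a chunk may still be truncated), and averaging the per-chunk probabilities is an illustrative aggregation rule, not something the package or the paper prescribes.

```python
from typing import Any, List


def split_into_chunks(text: str, max_words: int = 300) -> List[str]:
    """Split text on whitespace into consecutive chunks of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def score_long_text(detector: Any, text: str, max_words: int = 300) -> float:
    """Average the per-chunk AI probabilities of a long text (assumed aggregation rule)."""
    chunks = split_into_chunks(text, max_words)
    if not chunks:
        raise ValueError("text contains no words")
    # predict_batch returns one dict per chunk, each with an 'ai_probability' entry.
    results = detector.predict_batch(chunks)
    return sum(r["ai_probability"] for r in results) / len(results)
```

A stricter variant would count tokens with the model's own tokenizer instead of counting words.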
File details
Details for the file finbert_ai_detector-0.1.0.tar.gz.
File metadata
- Download URL: finbert_ai_detector-0.1.0.tar.gz
- Upload date:
- Size: 5.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e08135930c5f2a04518b53f75509e3bcdd428f4088a9aab09e9f9e490519f3b8 |
| MD5 | e6059cd0e53a8750eabc6c0abb466df1 |
| BLAKE2b-256 | caef88c7c7607e678ea23f57bd753742b3bcfe5efd2d90a9d364be9d947ce376 |
File details
Details for the file finbert_ai_detector-0.1.0-py3-none-any.whl.
File metadata
- Download URL: finbert_ai_detector-0.1.0-py3-none-any.whl
- Upload date:
- Size: 5.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3fc0f41ef5ddf9cda016205552ddd1e4d86d13fcf69663c8ea509bdb9d95290e |
| MD5 | 02abf7e2144d778c6a78fd49616ad749 |
| BLAKE2b-256 | 643e829dd8db0c6170e05242c5e3bfba9339076840842fb619b83e9b598dcc69 |