FiCi: a lightweight detector for fake/hallucinated citations in scientific papers.

FiCi

FiCi (Fictitious Citations) is a lightweight Python package for detecting fabricated or hallucinated citations in scientific PDFs. It is tuned for standard single- and double-column conference layouts (NeurIPS, ICLR, ACM acmart sigconf) and uses no LLMs or heavy ML models.

Install

From PyPI:

pip install fici

From source (editable, for development):

git clone https://github.com/sadjadeb/fici.git
cd fici
pip install -e ".[dev]"

Command-line usage

Installing the package registers a fici console script:

fici paper.pdf --email you@example.org

Useful flags:

fici paper.pdf --email you@example.org --workers 8        # more concurrency
fici paper.pdf --email you@example.org --json > out.json  # machine-readable
fici paper.pdf --email you@example.org --quiet            # summary only
fici --help

The CLI returns a non-zero exit code if any citation is flagged, which makes it easy to drop into CI pipelines:

Exit code	Meaning
`0`	All references verified.
`1`	At least one reference is flagged or errored.
`2`	Bad input (e.g. PDF not found).

python -m fici ... is equivalent to the fici script. Use it if your Python bin directory is not on PATH.
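
For CI use, the exit codes above can be wrapped in a small script. The following is a minimal sketch, not part of FiCi: describe_exit_code and check_paper are hypothetical helper names, and only the flags shown above (--email, --quiet) are used.

```python
import subprocess
import sys

# Exit codes as documented in the table above.
EXIT_MEANINGS = {
    0: "All references verified.",
    1: "At least one reference is flagged or errored.",
    2: "Bad input (e.g. PDF not found).",
}


def describe_exit_code(code: int) -> str:
    """Map a fici exit code to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"Unexpected exit code {code}.")


def check_paper(pdf_path: str, email: str) -> int:
    """Run `python -m fici` on one PDF and return its exit code.

    Hypothetical helper: it assumes fici is installed in the current
    interpreter and uses only flags documented in this README.
    """
    cmd = [sys.executable, "-m", "fici", pdf_path, "--email", email, "--quiet"]
    result = subprocess.run(cmd)
    print(f"{pdf_path}: {describe_exit_code(result.returncode)}")
    return result.returncode
```

In a CI step, sys.exit(check_paper("paper.pdf", "you@example.org")) would propagate FiCi's result to the job status.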

Programmatic usage

from fici import FiCiPipeline

pipeline = FiCiPipeline(email="you@example.org")  # polite pool
reports = pipeline.run("paper.pdf")

for r in reports:
    print(r.index, r.verdict.value, round(r.score, 1), r.suspected_title)

print(FiCiPipeline.summarize(reports))

See example.py for a complete programmatic usage example.
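
A common follow-up is to list only the references that need a manual look. The sketch below runs without FiCi installed: Verdict and Report are stand-ins that mimic the attributes used in the loop above, and the verdict value strings are assumed to match the verdict names listed under "How it works". With real reports from pipeline.run, only the flagged function would be needed.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    # Stand-in enum; the value strings are an assumption, not FiCi's API.
    VERIFIED = "Verified"
    SUSPICIOUS = "Suspicious/Mismatch"
    FAKE = "Highly Likely Fake"
    ERROR = "Error"


@dataclass
class Report:
    # Stand-in exposing the same attributes as the loop above.
    index: int
    verdict: Verdict
    score: float
    suspected_title: str


def flagged(reports):
    """Return every report that is not Verified, lowest score first."""
    bad = [r for r in reports if r.verdict.value != "Verified"]
    return sorted(bad, key=lambda r: r.score)


demo = [
    Report(1, Verdict.VERIFIED, 97.0, "A real paper"),
    Report(2, Verdict.SUSPICIOUS, 78.5, "A slightly garbled title"),
    Report(3, Verdict.FAKE, 0.0, "A title no index knows"),
]
for r in flagged(demo):
    print(r.index, r.verdict.value, round(r.score, 1), r.suspected_title)
```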

How it works

The pipeline has four phases, each exposed as a standalone class:

Extraction (ReferenceExtractor): PyMuPDF pulls text, heuristics locate the References / Bibliography section, and regex splitters handle the dominant reference styles ([1] ..., 1. ..., Author-Year).
Structuring + Search (primary) (CitationSearcher.search_openalex): each raw citation is sent to the OpenAlex /works endpoint as a free-text query (title only, for precision), using the polite pool via mailto. The hits are then handed to the verifier.
Search (second opinion) (CitationSearcher.search_crossref): whenever the OpenAlex-based verdict is anything other than Verified (suspicious match, no match, or error), FiCi also queries Crossref's query.bibliographic endpoint and verifies its hits. The pipeline returns whichever of the two reports is stronger — Verified always beats other verdicts, and within the same tier the higher score wins. If OpenAlex verifies on the first try, Crossref is skipped to save latency.

Verification (CitationVerifier): rapidfuzz.fuzz.token_sort_ratio compares the API-returned title to the suspected title in the raw string, with a small bonus for corroborating author surnames. The pipeline emits one of four verdicts:

Verdict	Condition
`Verified`	Score ≥ verify threshold (default 85).
`Suspicious/Mismatch`	API found candidates but the best score is below the verify threshold (default band: 75 to 85).
`Highly Likely Fake`	Neither API returned any results.
`Error`	API call raised an unrecoverable exception.
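
The scoring and selection rules above can be sketched in a few lines. This is an illustration under stated assumptions, not FiCi's code: the score uses difflib from the standard library as a stand-in for rapidfuzz.fuzz.token_sort_ratio (the two do not produce identical numbers), the author-surname bonus and the Error verdict are left out, mismatch_threshold is omitted because its exact role is not spelled out here, and "tier" is read as Verified versus everything else. The names token_sort_score, classify, and pick_stronger are hypothetical.

```python
from difflib import SequenceMatcher

VERIFIED = "Verified"
SUSPICIOUS = "Suspicious/Mismatch"
FAKE = "Highly Likely Fake"


def token_sort_score(a: str, b: str) -> float:
    """Similarity in [0, 100] after lowercasing and sorting tokens.

    Stdlib approximation of a token-sort ratio, not rapidfuzz itself.
    """
    norm_a = " ".join(sorted(a.lower().split()))
    norm_b = " ".join(sorted(b.lower().split()))
    return 100.0 * SequenceMatcher(None, norm_a, norm_b).ratio()


def classify(suspected_title, candidate_titles, verify_threshold=85.0):
    """Return (verdict, best_score) for one citation, following the table."""
    if not candidate_titles:
        return FAKE, 0.0  # the API returned nothing
    best = max(token_sort_score(suspected_title, t) for t in candidate_titles)
    verdict = VERIFIED if best >= verify_threshold else SUSPICIOUS
    return verdict, best


def pick_stronger(primary, secondary):
    """Verified beats any other verdict; otherwise the higher score wins."""
    def rank(report):
        verdict, score = report
        return (verdict == VERIFIED, score)
    # On an exact tie, max() keeps the first argument (the primary report).
    return max(primary, secondary, key=rank)
```

For example, pick_stronger((SUSPICIOUS, 80.0), (VERIFIED, 86.0)) keeps the Verified report, which mirrors the Crossref second-opinion step.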

Tuning knobs

FiCiPipeline(verify_threshold=85, mismatch_threshold=75): move the cutoffs up/down to trade precision for recall.
FiCiPipeline(max_workers=4): API calls are dispatched concurrently via a thread pool (I/O-bound work). Default is 4, which stays under the OpenAlex / Crossref polite-pool rate limits. Set to 1 to force sequential execution, or override per-call with pipeline.run(pdf, max_workers=N).
CitationSearcher(max_results=5, timeout=15, retries=2): control API politeness and robustness.
Inject a custom ReferenceExtractor subclass if you need to support a non-standard template (e.g. workshop-specific layouts).
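
To make the max_workers knob concrete, here is a minimal sketch of dispatching I/O-bound lookups through a thread pool. It is not FiCi's implementation: check_all and fake_lookup are hypothetical names, and fake_lookup stands in for a real network call.

```python
from concurrent.futures import ThreadPoolExecutor


def check_all(references, check_one, max_workers=4):
    """Apply check_one to every reference, preserving input order."""
    if max_workers <= 1:
        # Sequential path, comparable to setting max_workers=1.
        return [check_one(ref) for ref in references]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, not completion order.
        return list(pool.map(check_one, references))


def fake_lookup(ref):
    # Placeholder for a network-bound API lookup.
    return (ref, len(ref))


results = check_all(["[1] First ref", "[2] Second"], fake_lookup, max_workers=2)
print(results)
```

Threads suit this workload because each lookup spends most of its time waiting on the network. Raising max_workers shortens wall-clock time only until the API's rate limit becomes the bottleneck.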

Current limitations

Title extraction from raw strings is heuristic; unusual punctuation or missing years can occasionally yield an incomplete suspected_title, which is why scoring also consults the full raw string.
Author matching uses surname containment rather than a structured parse. If you'd like structured parsing via anystyle or GROBID, that's a clean extension point on CitationSearcher._prepare_query.
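
Surname containment, and why it is only a heuristic, can be illustrated as follows. This is a sketch under assumptions: surname_hits is a hypothetical name, and taking the last whitespace-separated token of each author name as the surname is a simplification, not necessarily what FiCi does.

```python
def surname_hits(raw_citation: str, api_authors: list[str]) -> int:
    """Count API authors whose surname appears in the raw citation string."""
    haystack = raw_citation.casefold()
    hits = 0
    for full_name in api_authors:
        parts = full_name.split()
        if not parts:
            continue
        surname = parts[-1].casefold()  # naive: last token is the surname
        if surname in haystack:
            hits += 1
    return hits


raw = "A. Vaswani, N. Shazeer, et al. Attention is all you need. NeurIPS, 2017."
print(surname_hits(raw, ["Ashish Vaswani", "Noam Shazeer", "Jakob Uszkoreit"]))
```

Plain containment also matches short surnames inside unrelated words: "Li" is found in "Reliable inference". A structured parse would avoid that kind of false corroboration.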

Release history

0.1.2 (Apr 21, 2026)
0.1.1 (Apr 20, 2026): this version
0.1.0 (Apr 20, 2026): yanked. Reason: this version has wrong requirements versioning.

Download files

File	Type	Size	Uploaded
`fici-0.1.1.tar.gz`	Source distribution	34.1 kB	Apr 20, 2026
`fici-0.1.1-py3-none-any.whl`	Built distribution (Python 3)	35.7 kB	Apr 20, 2026

File details

Details for the file fici-0.1.1.tar.gz.

File metadata

Download URL: fici-0.1.1.tar.gz
Upload date: Apr 20, 2026
Size: 34.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes

Hashes for fici-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`4e32d7ffc9c748444d030ce779c46c75f2b0bc7af9e7832e2acc69ab4c4042c5`
MD5	`45097ced81c37767686210d5ee667e6e`
BLAKE2b-256	`0af772dccacd094a0e58c8094fa6e5caa6f9af1846f2219ba5ef693b5d07a7d9`

File details

Details for the file fici-0.1.1-py3-none-any.whl.

File metadata

Download URL: fici-0.1.1-py3-none-any.whl
Upload date: Apr 20, 2026
Size: 35.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes

Hashes for fici-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`93b2f61198d14ae264c94dc1d05433c805160e2f41e9fbb1e2f44ffe5dd06248`
MD5	`ca98dd3c90e7fd32acb42eaed05f56f7`
BLAKE2b-256	`1d051e3e14ea8d463876e63d42d20ec3f7c7890d5f829cd43b3787ac2827ed82`
