AI-powered fact extraction and citation mapping for documents (PDF, Word, web, text)
ai-citer
AI-powered fact extraction and citation mapping for documents — PDF, Word, web pages, and plain text.
Built on FastAPI + Anthropic Claude. Extracts verbatim-quoted facts from documents, maps each quote back to its exact character offset, and optionally assigns PDF page numbers.
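The page-number step is not documented in detail here. As a hypothetical illustration only (`page_for_offset` and `page_starts` are not part of ai-citer), suppose you know the character offset at which each PDF page's text begins in the concatenated raw text. A binary search can then turn a citation's character offset into a 1-based page number:

```python
from bisect import bisect_right


def page_for_offset(page_starts: list[int], char_offset: int) -> int:
    """Return the 1-based page that contains char_offset.

    page_starts[i] is the offset where page i + 1 begins in the raw text,
    so page_starts must be sorted and start with 0.
    """
    if char_offset < 0:
        raise ValueError("char_offset must be non-negative")
    # Count how many pages start at or before the offset.
    return bisect_right(page_starts, char_offset)


# Hypothetical document whose three pages start at offsets 0, 1200 and 2750.
starts = [0, 1200, 2750]
print(page_for_offset(starts, 0))     # 1
print(page_for_offset(starts, 1199))  # 1
print(page_for_offset(starts, 1200))  # 2
print(page_for_offset(starts, 3000))  # 3
```

Binary search keeps each lookup at O(log pages), which matters little for one quote but adds up when a long report yields hundreds of citations.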
Install
```bash
pip install ai-citer
```
Requires Python 3.11+ and a PostgreSQL database.
Quick start
Run as a standalone server
Set environment variables (or create a .env file):
```
ANTHROPIC_API_KEY=sk-ant-...
DATABASE_URL=postgresql://user:pass@localhost/ai_citer
```
Then start the server:
```bash
ai-citer serve                        # starts on :3001
ai-citer serve --port 8080 --reload
```
Or with uvicorn directly:
```bash
uvicorn ai_citer.main:app --port 3001
```
Embed the router in your own FastAPI app
```python
from fastapi import FastAPI
from ai_citer import documents_router

app = FastAPI()
app.include_router(documents_router, prefix="/ai-citer")
```
Note: the router reads `app.state.pool` (an asyncpg pool) and `app.state.anthropic_client` from the FastAPI app state. Use the lifespan from `app.main` as a reference, or set them up yourself.
Use the core functions directly
```python
import anthropic
import asyncio
from ai_citer import (
    create_pool, init_db,
    extract_facts, map_citations, assign_page_numbers,
    parse_pdf, parse_word, parse_web, parse_text,
)


async def main():
    pool = await create_pool("postgresql://localhost/mydb")
    await init_db(pool)
    client = anthropic.AsyncAnthropic(api_key="sk-ant-...")

    # Parse a PDF
    with open("report.pdf", "rb") as f:
        content = parse_pdf(f.read())

    # Extract facts
    extraction, usage = await extract_facts(client, content.rawText)

    # Map quotes back to character offsets
    facts = map_citations(content.rawText, extraction.facts)
    print(facts[0].citations[0].charOffset)  # exact position in raw text
    print(f"Cost: ${usage.costUsd:.4f}")


asyncio.run(main())
```
REST API
When running as a server, the following endpoints are available under /api/documents:
| Method | Path | Description |
|---|---|---|
| GET | `/` | List all documents |
| POST | `/` | Upload a file (`multipart/form-data`) or URL (`url` form field) |
| GET | `/:id` | Get a document (includes `pdfData` for PDFs) |
| POST | `/:id/extract` | Run fact extraction (optional `{ "prompt": "..." }` body) |
| GET | `/:id/facts` | Get all accumulated facts for a document |
| POST | `/:id/chat` | Chat with a document (`{ "message": "...", "history": [] }`) |
MCP server
ai-citer ships an MCP server that exposes extraction tools to AI assistants (Claude Desktop, etc.):
```bash
ai-citer mcp
```
Tools: `upload_document_url`, `extract_facts`, `get_facts`, `list_documents`.
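Claude Desktop reads MCP servers from `claude_desktop_config.json` under an `mcpServers` key, where each entry has a `command`, `args` and optional `env`. A minimal sketch that generates such an entry, assuming `ai-citer mcp` speaks the stdio transport Claude Desktop launches by default and reads the same environment variables as the server (both are assumptions, not documented above):

```python
import json


def claude_desktop_config(api_key: str, database_url: str) -> dict:
    """Build an mcpServers entry that launches `ai-citer mcp`."""
    return {
        "mcpServers": {
            # "ai-citer" is an arbitrary display name for the server.
            "ai-citer": {
                "command": "ai-citer",
                "args": ["mcp"],
                "env": {
                    "ANTHROPIC_API_KEY": api_key,
                    "DATABASE_URL": database_url,
                },
            }
        }
    }


print(json.dumps(
    claude_desktop_config("sk-ant-...", "postgresql://user:pass@localhost/ai_citer"),
    indent=2,
))
```

Merge the printed JSON into your existing config rather than overwriting it. If Claude Desktop cannot find `ai-citer` on its `PATH`, replace `command` with the absolute path to the executable.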
Environment variables
| Variable | Required | Default | Description |
|---|---|---|---|
| `ANTHROPIC_API_KEY` | Yes | — | Anthropic API key |
| `DATABASE_URL` | Yes | — | PostgreSQL connection string |
Development
```bash
git clone https://github.com/czawora/ai-citer
cd ai-citer/server
pip install -e ".[dev]"
pytest
```
Download files
Source Distribution
Built Distribution
File details
Details for the file ai_citer-1.0.1.tar.gz.
File metadata
- Download URL: ai_citer-1.0.1.tar.gz
- Upload date:
- Size: 22.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 59ffd4a899eca2cc7fecf29f0642fcda192bf8c6fa8190d98d214823ddffa1f5 |
| MD5 | 66449fdf7175b92a8afe3209a717ce19 |
| BLAKE2b-256 | 956e211ad0b18d5eb040dd791eacbcc430d468844a7a745d794de0eb8456477c |
File details
Details for the file ai_citer-1.0.1-py3-none-any.whl.
File metadata
- Download URL: ai_citer-1.0.1-py3-none-any.whl
- Upload date:
- Size: 18.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e506299d63996a667704f19e821f23fd46c7f9101739358f569355b2aaa756e7 |
| MD5 | 29ea8e28bf2e14d9ec6d7751b0f416f9 |
| BLAKE2b-256 | 879549f121c1dfb83b8daf2a0053996de0c91d584049447747f27a6147a397fe |