Convert arXiv research papers into runnable Python implementations using Claude AI

PaperForge SDK

Convert arXiv research papers into runnable Python implementations — in seconds.

PyPI version · Python 3.10+ · License: MIT

PaperForge reads an arXiv paper, extracts the methodology using Claude AI, and generates a working Python implementation — complete with section references, usage examples, and honest limitations.


Installation

pip install paperforge-sdk

Quick Start

from paperforge import PaperForge

pf = PaperForge(base_url="http://localhost:8000")  # or your deployed URL

# Analyze a paper
analysis = pf.analyze("https://arxiv.org/abs/1706.03762")
print(analysis.key_algorithm)              # Transformer
print(analysis.implementation_difficulty)  # hard
print(analysis.reported_results)           # {'WMT 2014 EN-DE BLEU': '28.4'}

# Generate a Python implementation
code = pf.generate("https://arxiv.org/abs/1706.03762")
print(code.strategy)        # core  (hard papers get core mechanism only)
print(code.estimated_lines) # 67
code.save("transformer.py") # save to disk

# Full pipeline in one call
result = pf.paper("https://arxiv.org/abs/1706.03762")
result.save_code("output/")
print(f"Used {result.total_tokens:,} tokens")

Features

Feature | Description
Paper analysis | Extracts algorithm, datasets, metrics, novelty, reproducibility notes
Code generation | Generates runnable Python with paper section references
Smart strategy | Easy → full impl · Hard → core mechanism · Non-implementable → skeleton
PDF upload | Works with local PDFs, not just arXiv
Benchmarking | Run generated code against your CSV dataset via E2B sandbox
Type-safe | Full dataclass models with properties and methods

API Reference

PaperForge(base_url, timeout)

Main client class.

# Against local dev server
pf = PaperForge(base_url="http://localhost:8000")

# Against deployed API
pf = PaperForge(base_url="https://paperforge.onrender.com")

# As context manager (auto-closes HTTP client)
with PaperForge(base_url="http://localhost:8000") as pf:
    analysis = pf.analyze("1706.03762")

pf.analyze(url) → PaperAnalysis

Fetch and analyze any arXiv paper.

analysis = pf.analyze("https://arxiv.org/abs/1706.03762")
# or bare ID:
analysis = pf.analyze("1706.03762")

print(analysis.title)                    # "Attention Is All You Need"
print(analysis.key_algorithm)            # "Transformer"
print(analysis.implementation_difficulty) # "hard"
print(analysis.is_hard)                  # True
print(analysis.datasets_used)           # ["WMT 2014 English-German", ...]
print(analysis.evaluation_metrics)      # ["BLEU"]
print(analysis.reported_results)        # {"WMT 2014 EN-DE BLEU": "28.4"}
print(analysis.dependencies)            # ["torch", "numpy"]
print(analysis.reproducibility_notes)   # "Full hyperparameters in appendix..."
print(analysis.tokens_used)             # 12868
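
Because `analyze` accepts either a full URL or a bare ID, it can be useful to normalize inputs on the client side, for example to use the ID as a cache key. The sketch below is not part of the SDK: `normalize_arxiv_id` is a hypothetical helper, and it only recognizes new-style identifiers (`YYMM.NNNNN`, optionally followed by a version such as `v5`).

```python
import re

# Optional scheme/host/path prefix, then a new-style arXiv ID,
# then an optional version suffix and an optional ".pdf" extension.
_ARXIV_PATTERN = re.compile(
    r"(?:https?://)?(?:www\.)?(?:arxiv\.org/(?:abs|pdf)/)?"
    r"(\d{4}\.\d{4,5})(?:v\d+)?(?:\.pdf)?"
)


def normalize_arxiv_id(url_or_id: str) -> str:
    """Return the bare, unversioned arXiv ID from a URL or ID string.

    Hypothetical helper for illustration; not a PaperForge API.
    """
    match = _ARXIV_PATTERN.fullmatch(url_or_id.strip())
    if match is None:
        raise ValueError(f"Not a recognizable arXiv URL or ID: {url_or_id!r}")
    return match.group(1)


print(normalize_arxiv_id("https://arxiv.org/abs/1706.03762"))  # 1706.03762
print(normalize_arxiv_id("1706.03762"))                        # 1706.03762
```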

PaperAnalysis properties:

  • .is_hard — True if difficulty is "hard"
  • .is_easy — True if difficulty is "easy"
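
These properties are thin conveniences over `implementation_difficulty`. A minimal sketch of how such a model might be defined follows; `AnalysisSketch` is a hypothetical stand-in with only two fields, not the SDK's actual `PaperAnalysis` class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisSketch:
    """Hypothetical stand-in for PaperAnalysis, reduced to two fields."""

    title: str
    implementation_difficulty: str  # "easy", "medium", or "hard"

    @property
    def is_hard(self) -> bool:
        # True only when the difficulty label is exactly "hard".
        return self.implementation_difficulty == "hard"

    @property
    def is_easy(self) -> bool:
        # True only when the difficulty label is exactly "easy".
        return self.implementation_difficulty == "easy"


paper = AnalysisSketch("Attention Is All You Need", "hard")
print(paper.is_hard, paper.is_easy)  # True False
```

Note that a "medium" paper is neither easy nor hard, so both properties return False for it.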

pf.generate(url) → GeneratedCode

Generate a Python implementation from an arXiv paper.

code = pf.generate("https://arxiv.org/abs/1603.02754")

print(code.strategy)          # "full"  (XGBoost is medium difficulty)
print(code.estimated_lines)   # 85
print(code.explanation)       # "Implements XGBoost gradient boosting..."
print(code.install_command)   # "pip install scikit-learn numpy"
print(code.limitations)       # "No distributed training, no GPU support..."
print(code.code)              # Full Python source code

# Save to file
code.save("xgboost_impl.py")          # saves to file
code.save("output/")                   # saves as paperforge_implementation.py
code.print_usage()                     # prints usage example to stdout
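
The two `save` calls above differ only in whether the argument names a file or a directory. The sketch below shows one way that distinction could be resolved. It is an assumption, not the SDK's code: `resolve_save_path` is a hypothetical helper, and treating a trailing separator or an existing directory as "directory" is a guess at the rule.

```python
from pathlib import Path

DEFAULT_FILENAME = "paperforge_implementation.py"  # default name stated above


def resolve_save_path(target: str) -> Path:
    """Map a save() argument to the file path that would be written.

    Assumption: a target counts as a directory if it ends with a path
    separator or already exists as a directory on disk.
    """
    path = Path(target)
    if target.endswith(("/", "\\")) or path.is_dir():
        return path / DEFAULT_FILENAME
    return path


print(resolve_save_path("output/").as_posix())  # output/paperforge_implementation.py
print(resolve_save_path("xgboost_impl.py").as_posix())
```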

Generation strategies:

  • "full" — complete implementation (easy/medium papers)
  • "core" — core mechanism only (hard papers like Transformer, BERT)
  • "skeleton" — documented stubs (non-implementable or theory papers)
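
The mapping from difficulty to strategy listed above can be written as a small lookup, which is handy for asserting on results in your own tests. This is a sketch of the documented mapping only: the server makes the real decision, and both `expected_strategy` and its `implementable` flag are hypothetical names introduced here.

```python
def expected_strategy(difficulty: str, implementable: bool = True) -> str:
    """Return the strategy the documentation associates with a difficulty.

    Hypothetical helper; the actual choice is made server-side.
    """
    if not implementable:
        return "skeleton"  # non-implementable or theory papers
    if difficulty in ("easy", "medium"):
        return "full"      # complete implementation
    if difficulty == "hard":
        return "core"      # core mechanism only
    raise ValueError(f"Unknown difficulty: {difficulty!r}")


print(expected_strategy("medium"))                     # full
print(expected_strategy("hard"))                       # core
print(expected_strategy("hard", implementable=False))  # skeleton
```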

pf.generate_from_analysis(analysis) → GeneratedCode

Skip re-parsing when you already have an analysis.

analysis = pf.analyze("1706.03762")
# Regenerate without fetching the paper again:
code = pf.generate_from_analysis(analysis)

pf.paper(url) → PaperResult

Full pipeline: analyze + generate in one call.

result = pf.paper("https://arxiv.org/abs/1706.03762")

print(result.arxiv_id)                    # "1706.03762"
print(result.analysis.key_algorithm)      # "Transformer"
print(result.code.strategy)               # "core"
print(result.total_tokens)                # 14968

result.save_code("output/")               # save generated code
result.save_code("transformer_impl.py")   # save to specific file

pf.analyze_pdf(path) → PaperAnalysis

Analyze a local PDF file.

analysis = pf.analyze_pdf("papers/my_paper.pdf")
print(analysis.title)

pf.benchmark(csv_path, analysis, code) → BenchmarkResult

Run generated code against your dataset in an E2B cloud sandbox.

Requires E2B_API_KEY to be configured on the server.

analysis = pf.analyze("https://arxiv.org/abs/1603.02754")
code = pf.generate_from_analysis(analysis)
result = pf.benchmark("data/iris.csv", analysis, code)

print(result.status)          # "success"
print(result.dataset_rows)    # 150
print(result.interpretation)  # Claude's plain-English analysis
print(result.execution_time_ms) # 6690

for metric in result.metrics:
    print(metric.name, metric.your_value, metric.paper_value, metric.gap_pct)
    print(metric.beat_paper)  # True/False/None
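
This README does not say how `gap_pct` and `beat_paper` are computed. The sketch below shows one plausible reading, under two loudly stated assumptions: the gap is the relative difference to the paper's value in percent, and a metric is "beaten" when your value is strictly better in the chosen direction. The function names mirror the attribute names for readability but are hypothetical helpers, not SDK functions.

```python
from typing import Optional


def gap_pct(your_value: float, paper_value: Optional[float]) -> Optional[float]:
    """Relative gap to the paper's value in percent (assumed formula).

    Returns None when the paper reports no value or reports zero.
    """
    if paper_value is None or paper_value == 0:
        return None
    return (your_value - paper_value) * 100.0 / abs(paper_value)


def beat_paper(
    your_value: float,
    paper_value: Optional[float],
    higher_is_better: bool = True,  # assumption: accuracy-like metric by default
) -> Optional[bool]:
    """True/False when comparable, None when the paper reports no value."""
    if paper_value is None:
        return None
    if higher_is_better:
        return your_value > paper_value
    return your_value < paper_value


print(gap_pct(90.0, 100.0))     # -10.0
print(beat_paper(90.0, 100.0))  # False
print(beat_paper(90.0, None))   # None
```

For error-like metrics such as RMSE, pass `higher_is_better=False` so that a lower value counts as beating the paper.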

Error Handling

from paperforge import PaperForge
from paperforge.exceptions import (
    PaperNotFoundError,
    InvalidArxivURLError,
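    TimeoutError,      # note: this import shadows Python's built-in TimeoutError here
    ConnectionError,   # note: this import shadows Python's built-in ConnectionError here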
    APIError,
)

pf = PaperForge(base_url="http://localhost:8000")

try:
    analysis = pf.analyze("https://arxiv.org/abs/1706.03762")
except PaperNotFoundError:
    print("Paper not found on arXiv — check the ID")
except InvalidArxivURLError:
    print("Invalid arXiv URL format")
except TimeoutError:
    print("Request timed out — try increasing the timeout parameter")
except ConnectionError:
    print("Cannot connect to PaperForge API — is the server running?")
except APIError as e:
    print(f"API error {e.status_code}: {e}")
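
Timeouts and connection failures are often transient, so a caller may prefer to retry before giving up. The helper below is a generic sketch, not an SDK feature: `call_with_retry` and its parameters are hypothetical, and it uses plain exponential backoff (delays of `base_delay`, then double, and so on).

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exception types with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # base, 2x base, 4x base, ...
    raise ValueError("attempts must be at least 1")
```

With the SDK this might be called as `call_with_retry(lambda: pf.analyze("1706.03762"), retry_on=(TimeoutError, ConnectionError))`, using the exception classes imported from `paperforge.exceptions`. Errors such as `PaperNotFoundError` are deliberately left out of `retry_on`, since retrying will not make a missing paper appear.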

Self-Hosting

The SDK can point to any PaperForge API instance. To run one locally:

# Clone the PaperForge backend
git clone https://github.com/GPREETHAMSAXON/PaperForge
cd PaperForge

# Install and configure
pip install -r requirements.txt
cp .env.example .env
# Add ANTHROPIC_API_KEY to .env

# Start the server
uvicorn app.main:app --reload

Then use the SDK:

pf = PaperForge(base_url="http://localhost:8000")

Examples

# Run the quickstart example (requires local server running)
python examples/quickstart.py

License

MIT © Saxon

