
Perplexity WebUI Scraper

Python scraper to extract AI responses from Perplexity's web interface.



Installation

As a Library

# From PyPI (stable)
uv add perplexity-webui-scraper

# From GitHub prod branch (latest fixes)
uv add git+https://github.com/henrique-coder/perplexity-webui-scraper.git@prod

As MCP Server

No installation required — uvx handles everything automatically:

# From PyPI (stable)
uvx --from "perplexity-webui-scraper[mcp]@latest" perplexity-webui-scraper-mcp

# From GitHub prod branch (latest fixes)
uvx --from "perplexity-webui-scraper[mcp]@git+https://github.com/henrique-coder/perplexity-webui-scraper.git@prod" perplexity-webui-scraper-mcp

# From local directory (for development)
uv --directory /path/to/perplexity-webui-scraper run perplexity-webui-scraper-mcp

Requirements

  • Perplexity Pro or Max account
  • Session token (__Secure-next-auth.session-token cookie)

Getting Your Session Token

Option 1: Automatic (CLI Tool)

uv run get-perplexity-session-token

This interactive tool will:

  1. Ask for your Perplexity email
  2. Send a verification code to your email
  3. Accept either a 6-digit code or magic link
  4. Extract and display your session token
  5. Optionally save it to your .env file

Option 2: Manual (Browser)

  1. Log in at perplexity.ai
  2. Open DevTools (F12) → Application/Storage → Cookies
  3. Copy the value of __Secure-next-auth.session-token
  4. Store in .env: PERPLEXITY_SESSION_TOKEN="your_token"

Quick Start

from perplexity_webui_scraper import Perplexity

client = Perplexity(session_token="YOUR_TOKEN")
conversation = client.create_conversation()

conversation.ask("What is quantum computing?")
print(conversation.answer)

# Follow-up (context is preserved automatically)
conversation.ask("Explain it simpler")
print(conversation.answer)

Streaming

for chunk in conversation.ask("Explain AI", stream=True):
    if chunk.answer:
        print(chunk.answer, end="\r")

With Options

from perplexity_webui_scraper import (
    CitationMode,
    ConversationConfig,
    Coordinates,
    SourceFocus,
)

config = ConversationConfig(
    model="deep-research",
    citation_mode=CitationMode.MARKDOWN,
    source_focus=[SourceFocus.WEB, SourceFocus.ACADEMIC],
    language="en-US",
    coordinates=Coordinates(latitude=12.3456, longitude=-98.7654),
)

conversation = client.create_conversation(config)
conversation.ask("Latest AI research", files=["paper.pdf"])
print(conversation.answer)

API Reference

Perplexity(session_token, config?)

Main client — create once and reuse across multiple conversations.

| Parameter | Type | Description |
|---|---|---|
| session_token | str | Browser cookie value |
| config | ClientConfig | Timeout, retry, TLS settings |

from perplexity_webui_scraper import ClientConfig, LogLevel, Perplexity

client = Perplexity(
    session_token="YOUR_TOKEN",
    config=ClientConfig(
        timeout=7200,
        max_retries=3,
        logging_level=LogLevel.DEBUG,
        log_file=".debug/perplexity.log",
    ),
)

client.create_conversation(config?)

Returns a Conversation object. Each conversation maintains its own context for follow-up questions.

conversation = client.create_conversation(ConversationConfig(model="gpt-5.4"))

Conversation.ask(query, model?, files?, citation_mode?, stream?)

| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | required | The question to ask |
| model | str \| None | "best" | Model ID string |
| files | list[FileInput] \| None | None | File attachments |
| citation_mode | CitationMode \| None | None | Override conversation config |
| stream | bool | False | Yield chunks as they arrive |

Returns self (the Conversation) for method chaining or iteration when streaming.

Conversation Properties

| Property | Type | Description |
|---|---|---|
| answer | str \| None | Full response text |
| title | str \| None | Auto-generated conversation title |
| search_results | list[SearchResultItem] | Source URLs used in the response |
| uuid | str \| None | Conversation backend UUID |

Models

Models are specified as plain strings — the same style as the OpenAI SDK:

ConversationConfig(model="gpt-5.4-thinking")
conversation.ask("...", model="gemini-3.1-pro")

| Model ID | Name | Description | Min. Tier |
|---|---|---|---|
| "best" | Pro | Automatically selects the most responsive model based on the query | pro |
| "deep-research" | Deep research | Fast and thorough for routine research | pro |
| "sonar" | Sonar | Perplexity's latest model | pro |
| "gemini-3-flash" | Gemini 3 Flash | Google's fast model | pro |
| "gemini-3-flash-thinking" | Gemini 3 Flash Thinking | Google's fast model with thinking | pro |
| "gemini-3.1-pro" | Gemini 3.1 Pro | Google's latest model | pro |
| "gemini-3.1-pro-thinking" | Gemini 3.1 Pro Thinking | Google's latest model with thinking | pro |
| "gpt-5.4" | GPT-5.4 | OpenAI's latest model | pro |
| "gpt-5.4-thinking" | GPT-5.4 Thinking | OpenAI's latest model with thinking | pro |
| "claude-sonnet-4.6" | Claude Sonnet 4.6 | Anthropic's fast model | pro |
| "claude-sonnet-4.6-thinking" | Claude Sonnet 4.6 Thinking | Anthropic's newest reasoning model | pro |
| "claude-opus-4.6" | Claude Opus 4.6 | Anthropic's most advanced model | max |
| "claude-opus-4.6-thinking" | Claude Opus 4.6 Thinking | Anthropic's Opus reasoning model with thinking | max |
| "grok-4.1" | Grok 4.1 | xAI's latest model | pro |
| "grok-4.1-thinking" | Grok 4.1 Thinking | xAI's latest model with thinking | pro |
| "kimi-k2.5-thinking" | Kimi K2.5 Thinking | Moonshot AI's latest model with thinking | pro |

You can also inspect available models programmatically:

from perplexity_webui_scraper import MODELS

for model_id, model in MODELS.items():
    print(f"{model_id!r:35}{model.name} [{model.subscription_tier}]")

File Attachments (FileInput)

ask() accepts files in multiple formats via the FileInput type:

from pathlib import Path

import requests  # only needed for the raw-bytes example below

from perplexity_webui_scraper import FileInput  # for type annotations

# 1. Local file path (str or Path)
conversation.ask("Describe this image", files=["photo.jpg"])
conversation.ask("Summarize this", files=[Path("document.pdf")])

# 2. Raw bytes — filename defaults to "file", mimetype to "application/octet-stream"
image_bytes: bytes = requests.get("https://example.com/image.jpg").content
conversation.ask("What's in this image?", files=[image_bytes])

# 3. Bytes + filename — mimetype is guessed from the filename extension
conversation.ask("Analyze this", files=[(image_bytes, "photo.jpg")])

# 4. Bytes + filename + explicit mimetype — full control
conversation.ask("Read this PDF", files=[(pdf_bytes, "report.pdf", "application/pdf")])

# Mix and match different types in one call
conversation.ask("Compare these", files=["local.jpg", (remote_bytes, "remote.png")])

Limits: up to 30 files per prompt, 50 MB each.
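A pre-flight check against these limits can save a failed round trip. A hedged sketch (the constants mirror the limits above; the library itself raises FileValidationError for such cases, but this standalone helper uses a plain ValueError):

```python
from pathlib import Path

MAX_FILES = 30                  # files per prompt
MAX_SIZE = 50 * 1024 * 1024     # 50 MB each


def check_attachments(paths: list[str]) -> None:
    """Raise ValueError if the attachment list exceeds the documented limits."""
    if len(paths) > MAX_FILES:
        raise ValueError(f"too many files: {len(paths)} > {MAX_FILES}")
    for name in paths:
        size = Path(name).stat().st_size  # raises FileNotFoundError if missing
        if size > MAX_SIZE:
            raise ValueError(f"{name} is {size} bytes, limit is {MAX_SIZE}")
```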

CitationMode

| Mode | Output format | Description |
|---|---|---|
| DEFAULT | text[1] | Keep original markers |
| MARKDOWN | text[1](url) | Convert to markdown links |
| CLEAN | text | Remove all citations |
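To make the three output shapes concrete, here is an illustrative transformation of a response carrying [n] markers. This is not the library's implementation, just a sketch assuming each marker indexes into a list of source URLs:

```python
import re

text = "Quantum computers use qubits[1] and superposition[2]."
urls = ["https://example.com/qubits", "https://example.com/superposition"]

# MARKDOWN mode: turn each [n] marker into a markdown link to the n-th source
markdown = re.sub(
    r"\[(\d+)\]",
    lambda m: f"[{m.group(1)}]({urls[int(m.group(1)) - 1]})",
    text,
)

# CLEAN mode: strip the markers entirely
clean = re.sub(r"\[\d+\]", "", text)
```

DEFAULT mode would leave text unchanged.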

ConversationConfig

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str \| None | None ("best") | Model ID string |
| citation_mode | CitationMode | CLEAN | Citation format |
| save_to_library | bool | False | Save conversation to Perplexity library |
| search_focus | SearchFocus | WEB | Search type (WEB or WRITING) |
| source_focus | SourceFocus \| list[SourceFocus] | WEB | Source types to prioritize |
| time_range | TimeRange | ALL | Recency filter for results |
| language | str | "en-US" | Language for the response |
| timezone | str \| None | None | IANA timezone (e.g. "America/Sao_Paulo") |
| coordinates | Coordinates \| None | None | Geographic location (lat/lng) |

ClientConfig

| Parameter | Type | Default | Description |
|---|---|---|---|
| timeout | int | 3600 | Request timeout in seconds |
| impersonate | str | "chrome" | Browser fingerprint to impersonate |
| max_retries | int | 3 | Maximum retry attempts on transient errors |
| retry_base_delay | float | 1.0 | Initial backoff delay in seconds |
| retry_max_delay | float | 60.0 | Maximum backoff delay in seconds |
| retry_jitter | float | 0.5 | Jitter factor for retry delay randomization |
| requests_per_second | float | 0.5 | Rate limit (requests per second) |
| rotate_fingerprint | bool | True | Rotate browser fingerprint on each retry |
| max_init_query_length | int | 2000 | Truncate init query to avoid HTTP 414 |
| logging_level | LogLevel | DISABLED | Log verbosity |
| log_file | str \| PathLike \| None | None | Write logs to file instead of stderr |
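The retry parameters suggest a capped exponential backoff with jitter. A sketch of the resulting delay schedule under that assumption (the library's exact formula may differ):

```python
import random


def retry_delay(attempt: int, base: float = 1.0, cap: float = 60.0, jitter: float = 0.5) -> float:
    """Capped exponential backoff: base * 2**attempt up to cap, plus up to jitter*delay of noise."""
    delay = min(base * (2 ** attempt), cap)
    return delay + random.uniform(0, jitter * delay)


# Deterministic part of the schedule for attempts 0..6 with the defaults above
schedule = [min(1.0 * 2 ** n, 60.0) for n in range(7)]
# schedule == [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0]
```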

Enums

SourceFocus

| Value | Targets |
|---|---|
| WEB | General web search |
| ACADEMIC | Academic papers and scholarly articles |
| SOCIAL | Social media (Reddit, Twitter, etc.) |
| FINANCE | SEC EDGAR filings |

SearchFocus

| Value | Description |
|---|---|
| WEB | Search the web |
| WRITING | Writing-focused mode |

TimeRange

| Value | Description |
|---|---|
| ALL | No time filter |
| TODAY | Last 24 hours |
| LAST_WEEK | Last 7 days |
| LAST_MONTH | Last 30 days |
| LAST_YEAR | Last 365 days |

LogLevel

| Value | Description |
|---|---|
| DISABLED | No logging (default) |
| DEBUG | All messages including debug |
| INFO | Info, warnings, and errors |
| WARNING | Warnings and errors only |
| ERROR | Errors only |
| CRITICAL | Critical/fatal errors only |

Exceptions

| Exception | Description |
|---|---|
| PerplexityError | Base exception for all library errors |
| HTTPError | HTTP error with status code and response body |
| AuthenticationError | Session token is invalid or expired (HTTP 403) |
| RateLimitError | Rate limit exceeded (HTTP 429) |
| FileUploadError | File upload to Perplexity's S3 failed |
| FileValidationError | File validation failed (size, type, not found) |
| ResearchClarifyingQuestionsError | Research mode requires clarifying questions |
| ResponseParsingError | API response could not be parsed |
| StreamingError | Error during streaming response |

from perplexity_webui_scraper import (
    AuthenticationError,
    PerplexityError,
    ResearchClarifyingQuestionsError,
)

try:
    conversation.ask("Analyze recent market trends", model="deep-research")
except ResearchClarifyingQuestionsError as e:
    print("Needs clarification:", e.questions)
except AuthenticationError:
    print("Token expired — refresh your session token")
except PerplexityError as e:
    print(f"Library error: {e}")

MCP Server (Model Context Protocol)

The library includes an MCP server that exposes every model as a separate tool for AI assistants like Claude Desktop and Antigravity. Enable only the models you need to keep agent context size small.

Configuration

Add to your MCP config file (no installation required):

Claude Desktop (~/.config/claude/claude_desktop_config.json):

{
  "mcpServers": {
    "perplexity-webui-scraper": {
      "command": "uvx",
      "args": [
        "--from",
        "perplexity-webui-scraper[mcp]@latest",
        "perplexity-webui-scraper-mcp"
      ],
      "env": {
        "PERPLEXITY_SESSION_TOKEN": "your_token_here"
      }
    }
  }
}

From GitHub prod branch:

{
  "mcpServers": {
    "perplexity-webui-scraper": {
      "command": "uvx",
      "args": [
        "--from",
        "perplexity-webui-scraper[mcp]@git+https://github.com/henrique-coder/perplexity-webui-scraper.git@prod",
        "perplexity-webui-scraper-mcp"
      ],
      "env": {
        "PERPLEXITY_SESSION_TOKEN": "your_token_here"
      }
    }
  }
}

From local directory (for development):

{
  "mcpServers": {
    "perplexity-webui-scraper": {
      "command": "uv",
      "args": [
        "--directory",
        "/absolute/path/to/perplexity-webui-scraper",
        "run",
        "perplexity-webui-scraper-mcp"
      ],
      "env": {
        "PERPLEXITY_SESSION_TOKEN": "your_token_here"
      }
    }
  }
}

Available Tools

Each tool uses a specific AI model. Enable only the ones you need:

| Tool | Model | Description | Min. Tier |
|---|---|---|---|
| pplx_ask | Pro | Automatically selects the most responsive model based on the query | pro |
| pplx_deep_research | Deep research | Fast and thorough for routine research | pro |
| pplx_sonar | Sonar | Perplexity's latest model | pro |
| pplx_gemini_flash | Gemini 3 Flash | Google's fast model | pro |
| pplx_gemini_flash_think | Gemini 3 Flash Thinking | Google's fast model with thinking | pro |
| pplx_gemini31_pro | Gemini 3.1 Pro | Google's latest model | pro |
| pplx_gemini31_pro_think | Gemini 3.1 Pro Thinking | Google's latest model with thinking | pro |
| pplx_gpt54 | GPT-5.4 | OpenAI's latest model | pro |
| pplx_gpt54_thinking | GPT-5.4 Thinking | OpenAI's latest model with thinking | pro |
| pplx_claude_s46 | Claude Sonnet 4.6 | Anthropic's fast model | pro |
| pplx_claude_s46_think | Claude Sonnet 4.6 Thinking | Anthropic's newest reasoning model | pro |
| pplx_claude_o46 | Claude Opus 4.6 | Anthropic's most advanced model | max |
| pplx_claude_o46_think | Claude Opus 4.6 Thinking | Anthropic's Opus reasoning model with thinking | max |
| pplx_grok41 | Grok 4.1 | xAI's latest model | pro |
| pplx_grok41_think | Grok 4.1 Thinking | xAI's latest model with thinking | pro |
| pplx_kimi_k25_think | Kimi K2.5 Thinking | Moonshot AI's latest model with thinking | pro |

All tools support source_focus: web, academic, social, finance, all

Disclaimer

This is an unofficial library. It uses internal APIs that may change without notice. Use at your own risk.

By using this library, you agree to Perplexity AI's Terms of Service.
