Official Python SDK for the WellMarked API — convert any URL to clean Markdown.

wellmarked

pip install wellmarked

Quick start

from wellmarked import WellMarked

with WellMarked(api_key="wm_...") as wm:
    result = wm.extract("https://example.com/article")
    print(result.markdown)
    print(result.metadata.title, "by", result.metadata.author)
    print("retrieved at", result.metadata.retrieved_at)

result.metadata.retrieved_at is a datetime (UTC) recording when WellMarked actually fetched the page. It is distinct from result.metadata.date, the article's published date, which is often None. Callers can use retrieved_at for cache-freshness checks.
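A cache-freshness check could look like the sketch below. It assumes retrieved_at is a timezone-aware UTC datetime; if the SDK returns a naive datetime, the subtraction raises TypeError and the value would need a tzinfo attached first. The helper name is_fresh and the max_age parameter are illustrative and not part of the SDK.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_fresh(retrieved_at: datetime, max_age: timedelta,
             now: Optional[datetime] = None) -> bool:
    """Return True if the page was fetched no longer than max_age ago."""
    # Assumption: retrieved_at is timezone-aware (UTC), so compare it
    # against an aware "now" in UTC.
    now = now or datetime.now(timezone.utc)
    return now - retrieved_at <= max_age


# Fixed timestamps keep the example deterministic.
fetched = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
checked = datetime(2025, 1, 1, 12, 30, tzinfo=timezone.utc)
print(is_fresh(fetched, timedelta(hours=1), now=checked))     # True
print(is_fresh(fetched, timedelta(minutes=10), now=checked))  # False
```

With a real result, the call would be is_fresh(result.metadata.retrieved_at, timedelta(hours=1)).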

The API key can also be picked up from the WELLMARKED_API_KEY environment variable, in which case WellMarked() is enough.

Get a key at wellmarked.io.

Pricing

Feature            Free        Pro                       Enterprise
Monthly Price      $0          $29/mo                    $199/mo
Annual Price                   $299/yr                   $1,999/yr
Included Requests  500/mo      7,500/mo                  150,000/mo
Bulk Requests                  ✅ (up to 50/request)     ✅ (Unlimited)
Crawl                          ✅ (depth 5, 1k pages)    ✅ (Unlimited)
Overage Rate                   $0.004/req                $0.002/req
JS Rendering
Priority Queue     Standard    High                      Highest

See additional pricing information at wellmarked.io/#pricing.
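As a worked example of the overage arithmetic, the sketch below estimates a monthly bill from the listed figures. It assumes overage is charged per request beyond the included monthly allowance, which the table implies but does not state outright; the function name and the PLANS dictionary are illustrative only.

```python
from decimal import Decimal

# Figures copied from the pricing table (Pro and Enterprise only; the
# table lists no overage rate for Free).
PLANS = {
    "pro": {"monthly": Decimal("29"), "included": 7_500,
            "overage": Decimal("0.004")},
    "enterprise": {"monthly": Decimal("199"), "included": 150_000,
                   "overage": Decimal("0.002")},
}


def estimate_monthly_cost(plan: str, requests: int) -> Decimal:
    """Base price plus per-request overage beyond the included allowance."""
    p = PLANS[plan]
    extra = max(0, requests - p["included"])  # requests past the allowance
    return p["monthly"] + p["overage"] * extra


# 9,000 requests on Pro: 1,500 over the allowance at $0.004 each.
print(estimate_monthly_cost("pro", 9_000))  # 35.000
```

Under the same assumption, Pro at 50,000 requests per month comes to $199, the Enterprise base price, so that is roughly where the two plans break even on monthly billing.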

Async

AsyncWellMarked is a drop-in async equivalent — every endpoint method is a coroutine.

import asyncio
from wellmarked import AsyncWellMarked

async def main():
    async with AsyncWellMarked() as wm:
        result = await wm.extract("https://example.com/article")
        print(result.markdown)

asyncio.run(main())
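Because every endpoint method is a coroutine, several URLs can be fetched concurrently. The sketch below caps the number of calls in flight with a semaphore. gather_limited and its limit parameter are hypothetical helpers, not part of the SDK; fetch stands for any coroutine function that takes a URL, such as wm.extract on an AsyncWellMarked client.

```python
import asyncio
from typing import Any, Awaitable, Callable, List


async def gather_limited(fetch: Callable[[str], Awaitable[Any]],
                         urls: List[str], limit: int = 5) -> List[Any]:
    """Run fetch(url) for every URL with at most `limit` calls in flight."""
    sem = asyncio.Semaphore(limit)

    async def one(url: str) -> Any:
        async with sem:  # wait for a free slot before calling fetch
            return await fetch(url)

    # return_exceptions=True keeps one failed URL from cancelling the rest;
    # failures come back as exception objects, in the same order as urls.
    return await asyncio.gather(*(one(u) for u in urls),
                                return_exceptions=True)
```

Inside an async with AsyncWellMarked() as wm: block this would be called as results = await gather_limited(wm.extract, urls), followed by an isinstance(item, Exception) check on each entry.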

Bulk extraction

Submit many URLs at once (Pro: up to 50; Enterprise: unlimited). The call returns immediately with a job_id. Poll with get_job or block until done with wait_for_job.

job = wm.bulk([
    "https://example.com/article-1",
    "https://example.com/article-2",
])
job = wm.wait_for_job(job.job_id)         # blocks until status == "done"

for item in job.results:
    if item.ok:
        print(item.metadata.title)
    else:
        print(f"{item.url} failed: {item.error}")
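On the Pro plan a list longer than 50 URLs has to be split across several bulk calls. A minimal sketch of that batching follows; chunked is a hypothetical helper, and the default size of 50 is taken from the Pro limit stated above.

```python
from typing import Iterator, List, Sequence


def chunked(urls: Sequence[str], size: int = 50) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` URLs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])


urls = [f"https://example.com/article-{i}" for i in range(120)]
print([len(batch) for batch in chunked(urls)])  # [50, 50, 20]
```

Each batch can then be submitted separately, for example jobs = [wm.bulk(batch) for batch in chunked(urls)], and each returned job_id awaited in turn.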

get_job and wait_for_job are polymorphic — they work for both bulk and crawl job_ids. The SDK reads a kind discriminator from the API response and returns either a BulkJob or a CrawlJob. Use isinstance(job, CrawlJob) (or check job.kind == "crawl") before reading crawl-specific fields like job.truncated or item.depth.
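wait_for_job covers the common case. When custom backoff or progress reporting is needed, a manual loop around get_job might look like the sketch below. It assumes the returned job exposes a status attribute whose terminal value is "done", as the comment in the example above suggests; poll_until_done and its interval, max_interval, timeout, and sleep parameters are hypothetical.

```python
import time
from typing import Any, Callable


def poll_until_done(get_job: Callable[[str], Any], job_id: str,
                    interval: float = 1.0, max_interval: float = 30.0,
                    timeout: float = 600.0,
                    sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call get_job(job_id) with exponential backoff until status is "done"."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_job(job_id)
        if job.status == "done":  # assumed terminal status
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not done after {timeout}s")
        sleep(interval)
        interval = min(interval * 2, max_interval)  # back off, capped
```

It would be called as job = poll_until_done(wm.get_job, job.job_id). The injectable sleep argument exists only to make the loop easy to test without real delays.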

Crawl

Crawl a site BFS-style from a root URL — same-site links only, with per-plan depth and page caps (Pro: depth 5, up to 1,000 pages; Enterprise: unlimited). Like bulk, this returns a queued job; poll with get_job or block until done with wait_for_job — the same two functions work on both kinds.

job = wm.crawl("https://docs.example.com", depth=2)
job = wm.wait_for_job(job.job_id)          # works for crawl AND bulk job ids

for page in job.results:
    if page.ok:
        print(f"depth={page.depth} {page.metadata.title}")
    else:
        print(f"{page.url} failed: {page.error}")

if job.truncated:
    print(f"crawl stopped early: {job.truncated_reason}")

Each successful page consumes one request from your monthly quota. Failed pages (timeouts, robots-disallowed, no-content) are not billed. If you run out of quota mid-crawl, the job finishes with truncated=True and truncated_reason="quota_exhausted".
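To see how many requests a finished crawl consumed, the successful pages can be counted directly. The sketch below relies only on the ok and depth fields shown in the example above and uses stand-in objects so it runs without the SDK; summarize_crawl is a hypothetical helper.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_crawl(results):
    """Count billed (successful) and unbilled (failed) pages, plus successful pages per depth."""
    ok_pages = [page for page in results if page.ok]
    return {
        "billed": len(ok_pages),
        "unbilled": len(results) - len(ok_pages),
        "by_depth": dict(Counter(page.depth for page in ok_pages)),
    }


# Stand-in objects that mimic only the fields used above (ok, depth).
pages = [
    SimpleNamespace(ok=True, depth=0),
    SimpleNamespace(ok=True, depth=1),
    SimpleNamespace(ok=False, depth=1),
    SimpleNamespace(ok=True, depth=1),
]
print(summarize_crawl(pages))
# {'billed': 3, 'unbilled': 1, 'by_depth': {0: 1, 1: 2}}
```

With a real job the call would be summarize_crawl(job.results), assuming job.results is a list.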

Custom headers

Pass extra HTTP headers on every request — useful for correlation IDs, multi-tenant identifiers, or a custom user-agent suffix:

with WellMarked(
    api_key="wm_...",
    headers={"X-Trace-Id": "req-abc-123", "X-Tenant": "acme"},
) as wm:
    wm.extract("https://example.com")

Headers can also be added or removed at runtime:

wm.set_header("X-Run-Id", "run-99")
wm.extract(...)                  # carries X-Run-Id
wm.remove_header("X-Run-Id")

Authorization, Content-Type, and Accept are reserved — the SDK manages them itself, and entries passed in headers= for those keys are silently ignored. To rotate the bearer token, use rotate_key().
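The effect of the reserved-key rule can be illustrated with a small filter. This is not the SDK's internal code; it only mirrors the documented outcome, and the case-insensitive comparison is an assumption based on HTTP header names being case-insensitive.

```python
from typing import Dict

# The three header names the text above lists as reserved.
RESERVED = {"authorization", "content-type", "accept"}


def drop_reserved(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of headers without the reserved keys."""
    return {k: v for k, v in headers.items() if k.lower() not in RESERVED}


print(drop_reserved({"Authorization": "Bearer x", "X-Tenant": "acme"}))
# {'X-Tenant': 'acme'}
```

Running custom headers through a check like this before passing them to headers= makes it obvious which entries will have no effect.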

Usage & rate limits

get_usage() is the source of truth for your current-period quota. Quota state is tracked on the account, so call get_usage() whenever you need the current figures:

usage = wm.get_usage()
print(f"{usage.used} / {usage.limit} used this period ({usage.plan}) — {usage.remaining} left")

GET /usage itself does not count toward your quota.
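One way to use the remaining figure is as a pre-flight check before submitting a batch. The sketch below is deliberately conservative: it budgets one request per URL, although the crawl section says failed pages are not billed, and it ignores any overage allowance on paid plans. split_by_quota is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def split_by_quota(urls: Sequence[str],
                   remaining: int) -> Tuple[List[str], List[str]]:
    """Split urls into (fits in remaining quota, deferred for later)."""
    budget = max(0, remaining)  # guard against a negative remaining value
    return list(urls[:budget]), list(urls[budget:])


send_now, deferred = split_by_quota(["a", "b", "c"], remaining=2)
print(send_now, deferred)  # ['a', 'b'] ['c']
```

With the client, the budget would come from wm.get_usage().remaining.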

Key rotation

rotated = wm.rotate_key()
print("New key:", rotated.api_key)  # shown once — store it before the program exits

After rotate_key() the client automatically switches to the new key for subsequent calls; you still need to persist rotated.api_key somewhere durable, because the previous key stops working immediately and there is no recovery flow.
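Since the old key stops working immediately, the new one should reach durable storage before anything else can fail. The sketch below writes it to a file atomically using only the standard library. The file-based store and the persist_key helper are assumptions for illustration; a secrets manager would serve the same purpose.

```python
import os
import tempfile


def persist_key(path: str, api_key: str) -> None:
    """Atomically write api_key to path, readable only by the current user."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the file readable and writable only by its owner.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(api_key)
            f.flush()
            os.fsync(f.fileno())  # force the bytes to disk before renaming
        os.replace(tmp, path)     # atomic swap: old or new file, never partial
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)        # clean up the temp file on failure
        raise
```

Usage would be rotated = wm.rotate_key() followed directly by persist_key(key_path, rotated.api_key), where key_path is wherever the application keeps its credentials.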

Errors

Every non-2xx response is translated into a typed exception. Catch the base class to handle anything, or the specific subclass to handle one failure mode:

from wellmarked import (
    WellMarked,
    AuthenticationError,
    PermissionDeniedError,
    NotFoundError,
    UnprocessableEntityError,
    RateLimitError,
    APIConnectionError,
)

with WellMarked() as wm:
    try:
        result = wm.extract("https://example.com/paywalled")
    except RateLimitError as e:
        print(f"Quota hit. Resets in {e.retry_after}s.")
    except UnprocessableEntityError as e:
        # e.code is one of: no_content, target_timeout, js_rendering_disabled, ...
        print(f"Extraction failed ({e.code}): {e.message}")

Exception                  HTTP   Typical code values
AuthenticationError        401    missing_api_key, invalid_api_key
PermissionDeniedError      403    account_inactive, plan_not_supported, forbidden
NotFoundError              404    job_not_found
UnprocessableEntityError   422    no_content, target_timeout, js_rendering_disabled, bulk_cap_exceeded, crawl_depth_exceeded
RateLimitError             429    rate_limit_exceeded (carries retry_after in seconds)
InternalServerError        5xx    -
APIConnectionError         -      DNS / TCP / TLS / timeout failures, raised before any HTTP round-trip

All inherit from WellMarkedError.
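Because RateLimitError carries retry_after, a caller can wait and retry instead of failing outright. The sketch below shows one such wrapper. To stay runnable on its own it defines a stand-in RateLimitError; in real code the class imported from wellmarked would be used instead. call_with_retry and its max_attempts, max_wait, and sleep parameters are hypothetical.

```python
import time
from typing import Any, Callable


class RateLimitError(Exception):
    """Stand-in for wellmarked.RateLimitError so the sketch runs alone."""

    def __init__(self, retry_after: float) -> None:
        super().__init__(f"rate limited, retry in {retry_after}s")
        self.retry_after = retry_after


def call_with_retry(fn: Callable[[], Any], max_attempts: int = 3,
                    max_wait: float = 60.0,
                    sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call fn(), sleeping for retry_after seconds after each rate limit."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitError as exc:
            # Give up on the last attempt, or when the requested wait is
            # longer than we are willing to block (e.g. a monthly reset).
            if attempt == max_attempts or exc.retry_after > max_wait:
                raise
            sleep(exc.retry_after)
```

A call would look like call_with_retry(lambda: wm.extract(url)). The max_wait cap matters because a quota that resets at the end of the billing period may report a retry_after far too long to sleep through.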

Configuration

WellMarked(
    api_key="wm_...",                   # or set WELLMARKED_API_KEY
    base_url="https://api.wellmarked.io",
    timeout=30.0,                       # seconds, per request
    http_client=my_httpx_client,        # optional: bring your own httpx.Client
    headers={"X-Trace-Id": "..."},      # optional: extra headers on every request
)

Passing your own httpx.Client/httpx.AsyncClient is useful for custom transports, proxies, or shared connection pools. When you do, the SDK won't close it on __exit__ — you remain responsible for its lifecycle.

For Agents

If you are an agent, feel free to find additional context here!

License

MIT
