wellmarked
Official Python SDK for the WellMarked API — convert any URL to clean Markdown.
pip install wellmarked
Quick start
from wellmarked import WellMarked

with WellMarked(api_key="wm_...") as wm:
    result = wm.extract("https://example.com/article")
    print(result.markdown)
    print(result.metadata.title, "by", result.metadata.author)
    print("retrieved at", result.metadata.retrieved_at)
result.metadata.retrieved_at is a datetime (UTC) recording when WellMarked actually fetched the page — distinct from result.metadata.date (the article's published date, often None). Useful for cache-freshness checks on the caller's side.
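As a rough illustration of the cache-freshness check mentioned above, the helper below compares a stored retrieved_at against a maximum age. The name is_stale, the one-hour default, and the handling of naive timestamps are assumptions made for this sketch; none of them come from the SDK.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_stale(
    retrieved_at: datetime,
    max_age: timedelta = timedelta(hours=1),
    now: Optional[datetime] = None,
) -> bool:
    """Return True when a cached extraction is older than max_age."""
    if retrieved_at.tzinfo is None:
        # Assumption: a naive timestamp is already expressed in UTC.
        retrieved_at = retrieved_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return now - retrieved_at > max_age
```

A caller could then write something like is_stale(result.metadata.retrieved_at, timedelta(minutes=15)) to decide whether to call wm.extract again.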
The API key can also be picked up from the WELLMARKED_API_KEY environment variable, in which case WellMarked() is enough.
Get a key at wellmarked.io.
Pricing
| | Free | Pro | Enterprise |
|---|---|---|---|
| Monthly Price | $0 | $29/mo | $199/mo |
| Annual Price | — | $299/yr | $1,999/yr |
| Included Requests | 500/mo | 7,500/mo | 150,000/mo |
| Bulk Requests | ❌ | ✅ (up to 50/request) | ✅ (Unlimited) |
| Crawl | ❌ | ✅ (depth 5, 1k pages) | ✅ (Unlimited) |
| Overage Rate | — | $0.004/req | $0.002/req |
| JS Rendering | ❌ | ✅ | ✅ |
| Priority Queue | Standard | High | Highest |
See additional pricing information at wellmarked.io/#pricing.
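To make the overage arithmetic concrete, here is a small estimator built from the monthly figures in the table. It assumes overage is billed linearly for every request beyond the included quota, which the table suggests but does not spell out. The Free plan is left out because it lists no overage rate. PLANS and estimate_monthly_cost are names invented for this sketch.

```python
from decimal import Decimal

# Figures copied from the pricing table above (monthly billing).
PLANS = {
    "pro": {"base": Decimal("29"), "included": 7_500, "overage": Decimal("0.004")},
    "enterprise": {"base": Decimal("199"), "included": 150_000, "overage": Decimal("0.002")},
}


def estimate_monthly_cost(plan: str, requests: int) -> Decimal:
    """Base price plus per-request overage beyond the included quota."""
    p = PLANS[plan]
    extra = max(0, requests - p["included"])  # requests above the quota
    return p["base"] + p["overage"] * extra
```

Under these assumptions, 10,000 requests on Pro come to $29 + 2,500 × $0.004 = $39, and Pro matches the Enterprise base price at 50,000 requests per month.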
Async
AsyncWellMarked is a drop-in async equivalent — every endpoint method is a coroutine.
import asyncio
from wellmarked import AsyncWellMarked

async def main():
    async with AsyncWellMarked() as wm:
        result = await wm.extract("https://example.com/article")
        print(result.markdown)

asyncio.run(main())
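Because every endpoint method is a coroutine, a handful of extractions can be run concurrently. The sketch below bounds concurrency with a semaphore and records failures per URL instead of aborting the batch. It assumes a single AsyncWellMarked instance can safely be shared across tasks, which the README does not confirm. extract_many and max_concurrency are hypothetical names.

```python
import asyncio


async def extract_many(wm, urls, max_concurrency=5):
    """Extract several URLs concurrently, at most max_concurrency at a time.

    Returns (url, result_or_exception) pairs in the same order as urls.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def extract_one(url):
        async with semaphore:
            try:
                return url, await wm.extract(url)
            except Exception as exc:  # keep going; report the failure per URL
                return url, exc

    return await asyncio.gather(*(extract_one(u) for u in urls))
```

Inside main() this could be called as pairs = await extract_many(wm, urls). For larger batches the bulk endpoint is probably the better fit.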
Bulk extraction
Submit many URLs at once (Pro: up to 50; Enterprise: unlimited). The call returns immediately with a job_id. Poll with get_job or block until done with wait_for_job.
job = wm.bulk([
    "https://example.com/article-1",
    "https://example.com/article-2",
])
job = wm.wait_for_job(job.job_id)  # blocks until status == "done"
for item in job.results:
    if item.ok:
        print(item.metadata.title)
    else:
        print(f"{item.url} failed: {item.error}")
get_job and wait_for_job are polymorphic — they work for both bulk and crawl job_ids. The SDK reads a kind discriminator from the API response and returns either a BulkJob or a CrawlJob. Use isinstance(job, CrawlJob) (or check job.kind == "crawl") before reading crawl-specific fields like job.truncated or item.depth.
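If wait_for_job does not fit your control flow, a manual polling loop around get_job might look like the following. This is a sketch, not the SDK's own implementation: it assumes the returned job exposes a status attribute whose terminal value is "done", as the comment in the bulk example implies. poll_until_done, interval, and timeout are hypothetical names.

```python
import time


def poll_until_done(get_job, job_id, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Call get_job(job_id) until status is "done" or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_job(job_id)
        if job.status == "done":  # assumed terminal status
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not done after {timeout}s")
        sleep(interval)
```

Usage would be along the lines of job = poll_until_done(wm.get_job, job.job_id). Because get_job handles both kinds of job, the same loop serves bulk and crawl ids.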
Crawl
Crawl a site BFS-style from a root URL — same-site links only, with per-plan depth and page caps (Pro: depth 5, up to 1,000 pages; Enterprise: unlimited). Like bulk, this returns a queued job; poll with get_job or block until done with wait_for_job — the same two functions work on both kinds.
job = wm.crawl("https://docs.example.com", depth=2)
job = wm.wait_for_job(job.job_id)  # works for crawl AND bulk job ids
for page in job.results:
    if page.ok:
        print(f"depth={page.depth} {page.metadata.title}")
    else:
        print(f"{page.url} failed: {page.error}")
if job.truncated:
    print(f"crawl stopped early: {job.truncated_reason}")
Each successful page consumes one request from your monthly quota — failed pages (timeouts, robots-disallowed, no-content) are not billed. If you run out of quota mid-crawl the job finishes with truncated=True, truncated_reason="quota_exhausted".
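One possible way to summarize a finished job of either kind is sketched below. It reads only the fields named in this README (kind, results, ok, depth, truncated) and touches the crawl-specific ones only after checking job.kind. summarize_job and its dictionary keys are hypothetical. The billed_requests figure simply applies the billing rule stated above for crawls.

```python
def summarize_job(job):
    """Count successes and failures; add crawl-only fields when present."""
    ok = [item for item in job.results if item.ok]
    summary = {
        "kind": job.kind,
        "ok": len(ok),
        "failed": len(job.results) - len(ok),
    }
    if job.kind == "crawl":
        # Crawl-specific fields: only read them after checking the kind.
        summary["max_depth"] = max((item.depth for item in ok), default=0)
        summary["truncated"] = job.truncated
        # Per the note above, only successful crawl pages are billed.
        summary["billed_requests"] = len(ok)
    return summary
```

Checking isinstance(job, CrawlJob) instead of job.kind would work equally well, as described in the bulk section.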
Custom headers
Pass extra HTTP headers on every request — useful for correlation IDs, multi-tenant identifiers, or a custom user-agent suffix:
with WellMarked(
    api_key="wm_...",
    headers={"X-Trace-Id": "req-abc-123", "X-Tenant": "acme"},
) as wm:
    wm.extract("https://example.com")
Headers can also be added or removed at runtime:
wm.set_header("X-Run-Id", "run-99")
wm.extract(...) # carries X-Run-Id
wm.remove_header("X-Run-Id")
Authorization, Content-Type, and Accept are reserved — the SDK manages them itself, and entries passed in headers= for those keys are silently ignored. To rotate the bearer token, use rotate_key().
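Since reserved entries are dropped silently, a caller may want to detect them before constructing the client. The helper below is a caller-side sketch and says nothing about how the SDK filters headers internally. It matches names case-insensitively, as HTTP header names are; whether the SDK does the same is an assumption. RESERVED and split_reserved are hypothetical names.

```python
RESERVED = {"authorization", "content-type", "accept"}


def split_reserved(headers):
    """Return (kept, dropped) so ignored entries can be logged by the caller."""
    kept, dropped = {}, {}
    for name, value in headers.items():
        # Route each header into the matching dict based on its lowercased name.
        (dropped if name.lower() in RESERVED else kept)[name] = value
    return kept, dropped
```

The kept dictionary can be passed as headers=, and anything in dropped is worth a warning in your own logs.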
Usage & rate limits
get_usage() is the source of truth for your current-period quota. Quota state lives on the account rather than on the client, so call get_usage() whenever you need a current reading:
usage = wm.get_usage()
print(f"{usage.used} / {usage.limit} used this period ({usage.plan}) — {usage.remaining} left")
GET /usage itself does not count toward your quota.
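A usage reading can be used to size a bulk submission before sending it. The sketch below trims a URL list to the remaining quota and splits it into batches of at most 50, the Pro bulk cap from the pricing table. plan_batches is a hypothetical helper, and capping at usage.remaining is a choice for callers who prefer to stay within the included quota rather than incur overage.

```python
def plan_batches(urls, remaining, batch_size=50):
    """Split urls into bulk-sized batches, keeping at most `remaining` of them.

    Returns (batches, skipped), where skipped holds URLs that did not fit.
    """
    allowed = urls[: max(0, remaining)]
    skipped = urls[len(allowed):]
    batches = [allowed[i : i + batch_size] for i in range(0, len(allowed), batch_size)]
    return batches, skipped
```

For instance, batches, skipped = plan_batches(urls, wm.get_usage().remaining), followed by one wm.bulk(batch) call per batch.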
Key rotation
rotated = wm.rotate_key()
print("New key:", rotated.api_key) # shown once — store it before the program exits
After rotate_key() the client automatically switches to the new key for subsequent calls; you still need to persist rotated.api_key somewhere durable, because the previous key stops working immediately and there is no recovery flow.
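Because the old key stops working immediately, the new one should be written somewhere durable right away. The helper below is one minimal way to do that with a local file: it writes to a temporary file that only the current user can read, then renames it over the target so a crash cannot leave a half-written key. save_key and the file location are assumptions for this sketch, and a secrets manager is usually a better home for credentials.

```python
import os
import tempfile


def save_key(api_key, path):
    """Atomically write api_key to path, readable by the current user only."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the file with owner-only permissions on POSIX systems.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(api_key)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach the disk
        os.replace(tmp_path, path)  # atomic rename on the same filesystem
    except BaseException:
        os.unlink(tmp_path)  # clean up the temporary file on failure
        raise
```

Calling save_key(rotated.api_key, "wellmarked.key") directly after rotate_key() would then cover the case where the program exits unexpectedly.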
Errors
Every non-2xx response is translated into a typed exception. Catch the base class to handle anything, or the specific subclass to handle one failure mode:
from wellmarked import (
    WellMarked,
    AuthenticationError,
    PermissionDeniedError,
    NotFoundError,
    UnprocessableEntityError,
    RateLimitError,
    APIConnectionError,
)

with WellMarked() as wm:
    try:
        result = wm.extract("https://example.com/paywalled")
    except RateLimitError as e:
        print(f"Quota hit. Resets in {e.retry_after}s.")
    except UnprocessableEntityError as e:
        # e.code is one of: no_content, target_timeout, js_rendering_disabled, ...
        print(f"Extraction failed ({e.code}): {e.message}")
| Exception | HTTP | Typical code values |
|---|---|---|
| AuthenticationError | 401 | missing_api_key, invalid_api_key |
| PermissionDeniedError | 403 | account_inactive, plan_not_supported, forbidden |
| NotFoundError | 404 | job_not_found |
| UnprocessableEntityError | 422 | no_content, target_timeout, js_rendering_disabled, bulk_cap_exceeded, crawl_depth_exceeded |
| RateLimitError | 429 | rate_limit_exceeded (carries retry_after in seconds) |
| InternalServerError | 5xx | — |
| APIConnectionError | — | DNS / TCP / TLS / timeout failures, raised before any HTTP round-trip |
All inherit from WellMarkedError.
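The retry_after value on RateLimitError lends itself to a small retry wrapper. The sketch below is generic: it takes the exception class as a parameter, so nothing here depends on SDK internals, and it gives up if the suggested wait exceeds max_wait, since a quota reset may be far away. call_with_retry and its parameters are hypothetical names.

```python
import time


def call_with_retry(fn, retry_on, max_attempts=3, default_wait=1.0,
                    max_wait=60.0, sleep=time.sleep):
    """Call fn(); on retry_on, wait exc.retry_after seconds (if set) and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on as exc:
            wait = getattr(exc, "retry_after", None)
            wait = default_wait if wait is None else wait
            # Re-raise when out of attempts or when the wait is too long.
            if attempt == max_attempts or wait > max_wait:
                raise
            sleep(wait)
```

With the SDK this might be used as call_with_retry(lambda: wm.extract(url), RateLimitError).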
Configuration
WellMarked(
    api_key="wm_...",                # or set WELLMARKED_API_KEY
    base_url="https://api.wellmarked.io",
    timeout=30.0,                    # seconds, per request
    http_client=my_httpx_client,     # optional: bring your own httpx.Client
    headers={"X-Trace-Id": "..."},   # optional: extra headers on every request
)
Passing your own httpx.Client/httpx.AsyncClient is useful for custom transports, proxies, or shared connection pools. When you do, the SDK won't close it on __exit__ — you remain responsible for its lifecycle.
For Agents
If you are an agent, feel free to find additional context here!
License
MIT