WebLM API Client
The official Python SDK for WebLM, providing HTML to Markdown conversion and web content extraction capabilities.
Features
- Convert HTML content from URLs to Markdown
- Smart conversion with AI enhancement
- Extract links from webpages
- Transform web content into structured data using Pydantic models
- Synchronous and asynchronous API support
- Simple API key authentication
Installation
You can install the WebLM Python SDK directly from PyPI:
pip install weblm-python
For development, including testing and linting tools:
pip install "weblm-python[dev]"
Or install the latest development version from GitHub:
pip install git+https://github.com/WebLM/weblm-python.git
Quick Start
Basic Usage
from weblm import WebLM
# Initialize with your API key
client = WebLM(api_key="your_api_key")
# Convert HTML to Markdown
result = client.convert(url="https://example.com")
print(result["markdown"])
# Smart convert with AI enhancement
smart_result = client.smart_convert(url="https://example.com")
print(smart_result["markdown"])
# Extract links from a webpage
links = client.scrape_links(url="https://example.com")
print(links["urls"])
Asynchronous Usage
import asyncio
from weblm import AsyncWebLM
async def main():
    # Initialize with your API key
    client = AsyncWebLM(api_key="your_api_key")
    try:
        # Convert HTML to Markdown
        result = await client.convert(url="https://example.com")
        print(result["markdown"][:100] + "...")
        # Run multiple operations concurrently
        smart_result, links = await asyncio.gather(
            client.smart_convert(url="https://example.com"),
            client.scrape_links(url="https://example.com")
        )
        print(f"Smart conversion: {smart_result['markdown'][:100]}...")
        print(f"Found {len(links['urls'])} links")
    finally:
        # Always close the client when done
        await client.close()
# Run the async function
asyncio.run(main())
Transform Web Content with Pydantic Models
Define a Pydantic model and transform web content directly into structured data:
from pydantic import BaseModel
from typing import List, Optional
from weblm import WebLM
# Define your data model
class Article(BaseModel):
    title: str
    author: Optional[str] = None
    content: str
    categories: List[str] = []
# Initialize client
client = WebLM(api_key="your_api_key")
# Transform web content into your model
article = client.transform(
    url="https://example.com/article",
    model_class=Article
)
# Work with structured data
print(f"Title: {article.title}")
print(f"Author: {article.author}")
print(f"Content preview: {article.content[:100]}...")
print(f"Categories: {', '.join(article.categories)}")
API Reference
WebLM
Initialization
client = WebLM(api_key="your_api_key")
api_key: Your API key for authentication
Methods
- convert(url, return_token_count=False, model_name="gemini-2.0-flash"): Convert HTML to Markdown
- smart_convert(url, return_token_count=False, model_name="gemini-2.0-flash"): Convert HTML to refined Markdown using AI
- scrape_links(url, include_media=False, domain_only=True): Extract links from a webpage
- get_models(): Get list of available language models
- transform(url, model_class): Transform web content into a Pydantic model
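The optional keyword arguments listed above can be passed explicitly. The following is a minimal sketch rather than part of the SDK: `page_overview` is a hypothetical helper, and it assumes only what the Quick Start shows, namely that `convert` returns a mapping with a "markdown" key and `scrape_links` returns one with a "urls" key. A small stand-in client is used so the snippet runs without an API key; a real `WebLM` instance could presumably be passed in its place.

```python
from typing import Any, Dict


def page_overview(client: Any, url: str) -> Dict[str, Any]:
    """Hypothetical helper: convert a page and count its links."""
    # Keyword arguments mirror the signatures listed under Methods.
    converted = client.convert(url=url, return_token_count=True)
    links = client.scrape_links(url=url, include_media=False, domain_only=True)
    return {
        "url": url,
        "markdown_chars": len(converted["markdown"]),
        "link_count": len(links["urls"]),
        # Any extra keys (for example a token count) are passed through as-is,
        # because their exact names are not documented here.
        "extra": {k: v for k, v in converted.items() if k != "markdown"},
    }


class FakeClient:
    """Stand-in for WebLM so the sketch runs offline (not part of the SDK)."""

    def convert(self, url, return_token_count=False, model_name="gemini-2.0-flash"):
        return {"markdown": "# Example Domain"}

    def scrape_links(self, url, include_media=False, domain_only=True):
        return {"urls": ["https://example.com/a", "https://example.com/b"]}


overview = page_overview(FakeClient(), "https://example.com")
print(overview["markdown_chars"], overview["link_count"])
```

Inspecting `overview["extra"]` against the real API is one way to find out which additional fields `return_token_count=True` adds to the response.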
AsyncWebLM
Provides the same methods as WebLM but with asynchronous support. Additionally includes:
close(): Close the underlying HTTP session (should be called when done)
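Because `close()` has to be called on every code path, it can be convenient to wrap the client in an async context manager. The sketch below is an assumption-laden illustration: `open_client` is a hypothetical helper built on the standard library's `contextlib.asynccontextmanager`, not something the SDK is documented to provide, and `FakeAsyncClient` is a stand-in so the example runs offline.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def open_client(factory):
    """Hypothetical helper: yield a client and always close it afterwards."""
    client = factory()
    try:
        yield client
    finally:
        # Runs on normal exit and when the body raises.
        await client.close()


class FakeAsyncClient:
    """Stand-in for AsyncWebLM so the sketch runs offline (not part of the SDK)."""

    def __init__(self):
        self.closed = False

    async def convert(self, url):
        return {"markdown": f"# Page at {url}"}

    async def close(self):
        self.closed = True


async def demo():
    async with open_client(FakeAsyncClient) as client:
        result = await client.convert(url="https://example.com")
    return client, result


client, result = asyncio.run(demo())
print(result["markdown"], client.closed)
```

With the real SDK the factory would presumably be something like `lambda: AsyncWebLM(api_key="your_api_key")`.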
Error Handling
from weblm import WebLM, WebLMAPIError
client = WebLM(api_key="your_api_key")
try:
    result = client.convert(url="https://example.com")
    print(result["markdown"])
except WebLMAPIError as e:
    print(f"API Error: {e}")
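If some failures turn out to be transient, the call can be retried with exponential backoff. This is a generic sketch, not SDK behaviour: `call_with_retries` is a hypothetical helper, and `FlakyError` stands in for `WebLMAPIError` so the example runs without the package installed. Whether a given `WebLMAPIError` is worth retrying is not stated in this document, so treat the retry policy as an assumption.

```python
import time


def call_with_retries(func, retry_on=(Exception,), attempts=3, base_delay=0.5, sleep=time.sleep):
    """Hypothetical helper: call func(), retrying on the given exception types."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # Out of attempts: let the caller see the last error.
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))


class FlakyError(Exception):
    """Stand-in for WebLMAPIError in this offline demo."""


calls = {"count": 0}


def flaky_convert():
    # Fails twice, then succeeds, to exercise the retry loop.
    calls["count"] += 1
    if calls["count"] < 3:
        raise FlakyError("temporary failure")
    return {"markdown": "# OK"}


delays = []  # Record the requested delays instead of actually sleeping.
result = call_with_retries(flaky_convert, retry_on=(FlakyError,), sleep=delays.append)
print(result["markdown"], calls["count"], delays)
```

With the real client, the call would look roughly like `call_with_retries(lambda: client.convert(url="https://example.com"), retry_on=(WebLMAPIError,))`.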
Advanced Usage
Concurrent Processing with AsyncWebLM
import asyncio
from weblm import AsyncWebLM
from pydantic import BaseModel
from typing import List
class ArticlePreview(BaseModel):
    title: str
    summary: str
async def process_multiple_urls(urls):
    client = AsyncWebLM(api_key="your_api_key")
    try:
        # Create tasks for all URLs
        tasks = [
            client.transform(url=url, model_class=ArticlePreview)
            for url in urls
        ]
        # Process all URLs concurrently
        articles = await asyncio.gather(*tasks)
        # Return the processed articles
        return articles
    finally:
        await client.close()
# Example usage
urls = [
    "https://example.com/article1",
    "https://example.com/article2",
    "https://example.com/article3"
]
articles = asyncio.run(process_multiple_urls(urls))
for i, article in enumerate(articles):
    print(f"Article {i+1}: {article.title}")
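The example above starts every request at once, and a single failing URL makes `asyncio.gather` raise. For longer URL lists, a variant that caps concurrency and collects failures separately may be preferable. The sketch below is one possible approach under stated assumptions: `transform_all` and its `limit` parameter are hypothetical, `FakeAsyncClient` is a stand-in for `AsyncWebLM`, and a plain class replaces the Pydantic model so the snippet has no third-party dependencies.

```python
import asyncio


async def transform_all(client, urls, model_class, limit=5):
    """Hypothetical helper: transform URLs with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def one(url):
        async with semaphore:
            return await client.transform(url=url, model_class=model_class)

    # return_exceptions=True keeps one failing URL from discarding the other results.
    results = await asyncio.gather(*(one(u) for u in urls), return_exceptions=True)
    succeeded = {u: r for u, r in zip(urls, results) if not isinstance(r, Exception)}
    failed = {u: r for u, r in zip(urls, results) if isinstance(r, Exception)}
    return succeeded, failed


class FakeAsyncClient:
    """Stand-in for AsyncWebLM so the sketch runs offline (not part of the SDK)."""

    async def transform(self, url, model_class):
        await asyncio.sleep(0)  # Yield control, as a real network call would.
        if url.endswith("/broken"):
            raise RuntimeError(f"could not transform {url}")
        return model_class(title=url.rsplit("/", 1)[-1])


class Preview:
    """Plain class used in place of a Pydantic model to keep the demo dependency-free."""

    def __init__(self, title):
        self.title = title


demo_urls = [
    "https://example.com/article1",
    "https://example.com/broken",
    "https://example.com/article2",
]
succeeded, failed = asyncio.run(transform_all(FakeAsyncClient(), demo_urls, Preview, limit=2))
print(sorted(succeeded), sorted(failed))
```

A suitable `limit` depends on the API's rate limits, which are not documented here.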
License
MIT
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.