Every web site provides APIs.
toapi
Turn any website into a JSON API — declaratively.
toapi lets you point at a web page, declare the fields you want with CSS
selectors, and get back a clean JSON API. No crawler to babysit, no database to
maintain — pages are fetched and parsed on demand, with built‑in caching.
Install
pip install toapi
Requires Python 3.10+.
Quickstart
from htmlparsing import Attr, Text
from toapi import Api, Item

api = Api()


@api.site("https://news.ycombinator.com")
@api.list(".athing")
@api.route("/posts", "/news")
@api.route("/posts?page={page}", "/news?p={page}")
class Post(Item):
    title = Text(".titleline > a")
    url = Attr(".titleline > a", "href")


api.run(host="127.0.0.1", port=5000)
Run it:
python app.py
Then visit http://127.0.0.1:5000/posts and you get:
{
  "Post": [
    {"title": "Mathematicians Crack the Cursed Curve", "url": "https://www.quantamagazine.org/..."},
    {"title": "Stuffing a Tesla Drivetrain into a 1981 Honda Accord", "url": "https://jalopnik.com/..."}
  ]
}
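As a rough illustration of consuming that response from another program, the sketch below reads the titles out of the payload using only the standard library. The helper names `fetch_posts` and `titles` are hypothetical, and the sample payload simply repeats the response shown above (with its URLs still elided), so the example runs without a server.

```python
import json
from urllib.request import urlopen


def fetch_posts(base_url: str = "http://127.0.0.1:5000") -> dict:
    """Call the running API and decode its JSON body (needs the server up)."""
    with urlopen(f"{base_url}/posts") as response:
        return json.load(response)


def titles(payload: dict) -> list:
    """Pull every title out of a payload keyed by the Item class name."""
    return [post["title"] for post in payload["Post"]]


# Offline sample mirroring the response above; the URLs are elided in the source.
SAMPLE = json.loads("""
{
  "Post": [
    {"title": "Mathematicians Crack the Cursed Curve", "url": "https://www.quantamagazine.org/..."},
    {"title": "Stuffing a Tesla Drivetrain into a 1981 Honda Accord", "url": "https://jalopnik.com/..."}
  ]
}
""")

for title in titles(SAMPLE):
    print(title)
```

With the quickstart app running, `titles(fetch_posts())` would do the same against live data.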
How it works
┌────────────┐ ┌────────────┐ ┌────────────┐
│ /posts │ ─▶ │ fetch │ ─▶ │ parse │ ─▶ JSON
│ (route) │ │ (cache) │ │ (Item) │
└────────────┘ └────────────┘ └────────────┘
- Route: `@api.route("/posts", "/news")` maps your API path to a source URL.
- Fetch: pages are fetched with `requests` (or a headless browser if you pass `browser=`) and cached in memory.
- Parse: each `Item` extracts fields with CSS selectors via `htmlparsing`.
- Serve: Flask returns the result as JSON; subsequent calls hit the cache.
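To make the fetch-and-cache step concrete, here is a minimal sketch of an in-memory page cache. It is not toapi's actual implementation: `CachedFetcher` and `fake_fetch` are hypothetical names, and the fake fetcher stands in for a real HTTP request so the example runs offline.

```python
from typing import Callable, Dict, List


class CachedFetcher:
    """Fetch each URL at most once and serve repeats from memory."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch              # function that downloads a page
        self._pages: Dict[str, str] = {}

    def get(self, url: str) -> str:
        if url not in self._pages:       # cache miss: fetch once and remember
            self._pages[url] = self._fetch(url)
        return self._pages[url]


# Stand-in for a real HTTP call; records every URL it is asked for.
calls: List[str] = []


def fake_fetch(url: str) -> str:
    calls.append(url)
    return f"<html><!-- body of {url} --></html>"


fetcher = CachedFetcher(fake_fetch)
first = fetcher.get("https://news.ycombinator.com/news")
second = fetcher.get("https://news.ycombinator.com/news")  # served from memory
print(len(calls))
```

A cache like this never expires entries; whether and when toapi invalidates cached pages is not covered here.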
Features
- Declarative: describe data, not scraping logic.
- Routes: map clean API paths to messy source URLs with `{param}` placeholders.
- Multi-site: merge several websites behind one API.
- Cleaning hooks: define `clean_<field>` methods to post-process values.
- Caching: pages and parsed results are cached automatically.
- Headless browser: pass `Api(browser="/path/to/geckodriver")` for JS-heavy sites.
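The `{param}` placeholders can be pictured as a small path-to-path translation. The sketch below shows one way such a mapping could work, using the two routes from the quickstart; `compile_route` and `resolve` are hypothetical helpers written for illustration, and the matching rules (for example, that a placeholder stops at `/`, `&`, or `?`) are assumptions rather than toapi's documented behavior.

```python
import re
from typing import Callable, Optional

BASE = "https://news.ycombinator.com"


def compile_route(api_pattern: str, source_pattern: str) -> Callable[[str], Optional[str]]:
    """Build a function that maps a concrete API path to a source path."""
    # Split into literal text and placeholder names: literal, name, literal, ...
    parts = re.split(r"\{(\w+)\}", api_pattern)
    regex = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            regex += re.escape(part)              # literal text, matched exactly
        else:
            regex += f"(?P<{part}>[^/&?]+)"       # placeholder becomes a named group
    matcher = re.compile(f"^{regex}$")

    def translate(path: str) -> Optional[str]:
        match = matcher.match(path)
        if match is None:
            return None
        # Substitute the captured values into the source pattern.
        return source_pattern.format(**match.groupdict())

    return translate


ROUTES = [
    compile_route("/posts", "/news"),
    compile_route("/posts?page={page}", "/news?p={page}"),
]


def resolve(path: str) -> Optional[str]:
    """Return the full source URL for an API path, or None if no route matches."""
    for translate in ROUTES:
        source = translate(path)
        if source is not None:
            return BASE + source
    return None


print(resolve("/posts"))
print(resolve("/posts?page=2"))
```

Routes are tried in declaration order and the first match wins, which is a design choice of this sketch.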
Cleaning values
Add a `clean_<fieldname>` method on the `Item` to transform a value before it's
returned:
@api.site("https://news.ycombinator.com")
@api.route("/posts", "/news")
class Page(Item):
    next_page = Attr(".morelink", "href")

    def clean_next_page(self, value):
        return f"/posts?{value.split('?', 1)[1]}"
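To show what a cleaning hook does to a value, the sketch below imitates the dispatch with a toy base class. `ToyItem`, its `fields` tuple, and its `clean` method are hypothetical stand-ins, not toapi's real `Item`, and the raw value `"news?p=2"` is an assumed example of what the `href` might contain.

```python
class ToyItem:
    """Toy stand-in for an item class: applies clean_<field> hooks if defined."""

    fields = ()

    def clean(self, raw: dict) -> dict:
        cleaned = {}
        for name in self.fields:
            value = raw.get(name)
            hook = getattr(self, f"clean_{name}", None)   # look up the hook by name
            cleaned[name] = hook(value) if callable(hook) else value
        return cleaned


class Page(ToyItem):
    fields = ("next_page",)

    def clean_next_page(self, value):
        # Keep only the query string and re-root it under the API path.
        return f"/posts?{value.split('?', 1)[1]}"


result = Page().clean({"next_page": "news?p=2"})
print(result)
```

Note that `value.split('?', 1)[1]` raises `IndexError` when the value contains no `?`, so a hook like this may want a guard if the link can be missing or lack a query string.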
Development
git clone https://github.com/elliotgao2/toapi.git
cd toapi
uv sync # install deps into .venv
uv run pytest # run tests
uv run ruff check .
We use uv for packaging and ruff for lint + format. Pre-commit hooks keep both clean:
uv run pre-commit install
Contributing
Pull requests are welcome. For non-trivial changes, please open an issue first
to discuss what you'd like to change. Make sure `uv run pytest` and
`uv run ruff check .` pass before submitting.
License
MIT © Elliot Gao