Every web site provides APIs.

toapi

Turn any website into a JSON API — declaratively.

toapi lets you point at a web page, declare the fields you want with CSS selectors, and get back a clean JSON API. No crawler to babysit, no database to maintain — pages are fetched and parsed on demand, with built‑in caching.

Install

pip install toapi

Requires Python 3.10+.

Quickstart

from htmlparsing import Attr, Text
from toapi import Api, Item

api = Api()


@api.site("https://news.ycombinator.com")
@api.list(".athing")
@api.route("/posts", "/news")
@api.route("/posts?page={page}", "/news?p={page}")
class Post(Item):
    title = Text(".titleline > a")
    url = Attr(".titleline > a", "href")


api.run(host="127.0.0.1", port=5000)

Save the code above as app.py and run it:

python app.py

Then visit http://127.0.0.1:5000/posts and you get:

{
  "Post": [
    {"title": "Mathematicians Crack the Cursed Curve", "url": "https://www.quantamagazine.org/..."},
    {"title": "Stuffing a Tesla Drivetrain into a 1981 Honda Accord", "url": "https://jalopnik.com/..."}
  ]
}
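
To consume the response from another program, something like the following sketch may help. It uses only the standard library and assumes the server from the Quickstart is running on 127.0.0.1:5000; the helper names fetch_posts and titles are made up for this example and are not part of toapi.

```python
import json
from urllib.request import urlopen


def fetch_posts(base_url="http://127.0.0.1:5000"):
    """Request /posts from a running toapi server and decode the JSON body."""
    with urlopen(f"{base_url}/posts") as resp:
        return json.load(resp)


def titles(payload):
    """Pull the titles out of a payload shaped like the response shown above."""
    return [post["title"] for post in payload.get("Post", [])]


# Offline sample with the same shape as the response above (values are made up).
sample = json.loads('{"Post": [{"title": "A", "url": "https://example.com/a"}]}')
sample_titles = titles(sample)
```

With the server running, titles(fetch_posts()) should give the list of post titles.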

How it works

   ┌────────────┐    ┌────────────┐    ┌────────────┐
   │  /posts    │ ─▶ │  fetch     │ ─▶ │  parse     │ ─▶  JSON
   │  (route)   │    │  (cache)   │    │  (Item)    │
   └────────────┘    └────────────┘    └────────────┘
  1. Route — @api.route("/posts", "/news") maps your API path to a source URL.
  2. Fetch — pages are fetched with requests (or a headless browser if you pass browser=) and cached in memory.
  3. Parse — each Item extracts fields with CSS selectors via htmlparsing.
  4. Serve — Flask returns the result as JSON; subsequent calls hit the cache.
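
The four steps above can be sketched in plain Python. This is an illustration of the route, fetch, parse, serve flow, not toapi's actual implementation: MiniApi, TitleParser, and fake_fetch are hypothetical names, the page content is invented, and the parser uses the standard library's html.parser instead of CSS selectors.

```python
from html.parser import HTMLParser

# Invented page content standing in for a real website.
FAKE_PAGES = {
    "https://example.com/news": (
        '<a class="title" href="/a">First</a>'
        '<a class="title" href="/b">Second</a>'
    ),
}


class TitleParser(HTMLParser):
    """Collects text and href of every <a class="title"> (stands in for an Item)."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "title":
            self._current = {"title": "", "url": attrs.get("href")}

    def handle_data(self, data):
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.items.append(self._current)
            self._current = None


class MiniApi:
    """Hypothetical miniature of the pipeline: route, fetch, parse, serve."""

    def __init__(self, site, fetch):
        self.site = site
        self.fetch = fetch  # callable taking a URL and returning HTML
        self.routes = {}    # API path -> source path
        self.cache = {}     # source URL -> parsed result

    def route(self, api_path, source_path):
        self.routes[api_path] = source_path

    def handle(self, api_path):
        url = self.site + self.routes[api_path]   # 1. route
        if url not in self.cache:
            html = self.fetch(url)                 # 2. fetch (only on a cache miss)
            parser = TitleParser()
            parser.feed(html)                      # 3. parse
            self.cache[url] = {"Post": parser.items}
        return self.cache[url]                     # 4. serve from the cache


calls = []


def fake_fetch(url):
    """Records each call so the caching behaviour can be observed."""
    calls.append(url)
    return FAKE_PAGES[url]


mini = MiniApi("https://example.com", fake_fetch)
mini.route("/posts", "/news")
first = mini.handle("/posts")
second = mini.handle("/posts")
```

The second call to handle returns the cached result, so fake_fetch runs only once.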

Features

  • Declarative — describe data, not scraping logic.
  • Routes — map clean API paths to messy source URLs with {param} placeholders.
  • Multi-site — merge several websites behind one API.
  • Cleaning hooks — define clean_<field> methods to post-process values.
  • Caching — pages and parsed results are cached automatically.
  • Headless browser — pass Api(browser="/path/to/geckodriver") for JS-heavy sites.
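
As a rough illustration of the Routes feature, {param} placeholders could be resolved along these lines. This is a sketch under assumptions, not toapi's routing code: compile_route, resolve, and ROUTES are hypothetical names, and the rule that a placeholder matches any run of characters other than /, ? and & is a choice made for this example.

```python
import re


def compile_route(api_pattern):
    """Turn '/posts?page={page}' into a regex with one named group per placeholder."""
    # re.split with a capture group alternates literal text and placeholder names.
    parts = re.split(r"\{(\w+)\}", api_pattern)
    regex = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            regex += re.escape(part)            # literal text
        else:
            regex += f"(?P<{part}>[^/?&]+)"     # placeholder value
    return re.compile(regex)


def resolve(api_path, routes):
    """Return the source path for the first route matching api_path, else None."""
    for api_pattern, source_pattern in routes:
        match = compile_route(api_pattern).fullmatch(api_path)
        if match:
            return source_pattern.format(**match.groupdict())
    return None


# Same path pairs as the Quickstart routes.
ROUTES = [
    ("/posts", "/news"),
    ("/posts?page={page}", "/news?p={page}"),
]

plain = resolve("/posts", ROUTES)
paged = resolve("/posts?page=2", ROUTES)
missing = resolve("/comments", ROUTES)
```

Here plain is "/news", paged is "/news?p=2", and missing is None.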

Cleaning values

Add a clean_<fieldname> method on the Item to transform a value before it's returned (api, Item, and Attr are the same objects as in the Quickstart):

@api.site("https://news.ycombinator.com")
@api.route("/posts", "/news")
class Page(Item):
    next_page = Attr(".morelink", "href")

    def clean_next_page(self, value):
        return f"/posts?{value.split('?', 1)[1]}"
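
One way to picture how a clean_<fieldname> hook gets applied is a lookup by name with getattr. The sketch below is an assumption about the general mechanism, not toapi's internals: MiniItem and its clean method are hypothetical, and the raw href "news?p=2" is an invented sample value.

```python
class MiniItem:
    """Hypothetical stand-in for an Item: applies clean_<field> hooks to raw values."""

    def clean(self, raw):
        cleaned = {}
        for field, value in raw.items():
            # Look for a method named clean_<field>; leave the value alone if absent.
            hook = getattr(self, f"clean_{field}", None)
            cleaned[field] = hook(value) if callable(hook) else value
        return cleaned


class Page(MiniItem):
    def clean_next_page(self, value):
        # Keep only the query string and point it at the API path.
        return f"/posts?{value.split('?', 1)[1]}"


result = Page().clean({"next_page": "news?p=2", "title": "Hacker News"})
```

Here result["next_page"] becomes "/posts?p=2", while title passes through unchanged because Page defines no clean_title method.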

Development

git clone https://github.com/elliotgao2/toapi.git
cd toapi
uv sync          # install deps into .venv
uv run pytest    # run tests
uv run ruff check .

We use uv for packaging and ruff for lint + format. Pre-commit hooks keep both clean:

uv run pre-commit install

Contributing

Pull requests are welcome. For non-trivial changes, please open an issue first to discuss what you'd like to change. Make sure uv run pytest and uv run ruff check . pass before submitting.

License

MIT © Elliot Gao
