harvex

The data-harvesting foundation for the AI era — a zero-boilerplate framework for AI agents and vibecoding.

Let the AI (or your own vibecoding) write only the part it is good at, how to fetch and how to parse, and leave everything else to the framework: concurrent execution, field consolidation, deduplicated writes, run metadata, HTTP retries, logging, alerting, data-health checks, scheduled runs, a read-only web UI for browsing data, LLM translation/enrichment, and a TUI control panel.

pip install harvex                              # core, zero heavy deps
pip install "harvex[web,llm,browser,tui]"       # opt into extras as needed

Why "a harvesting foundation for the AI era"

When an LLM writes a scraper, the model is good at one question: how do I turn this page or endpoint into structured data? It is weakest, and most likely to make mistakes, on the surrounding engineering: retry/backoff, concurrency isolation, incremental dedup, schema drift, scheduling, and observability. harvex turns all of that into a stable foundation and gives the AI a narrow, rock-solid contract surface:

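from harvex import BaseSource, SourceProfile


class GithubTrending(BaseSource):
    # Identity of the source; the slug is the name used on the command line.
    profile = SourceProfile(slug="gh_trending", name="GitHub Trending")

    def fetch(self):
        # Return the raw payload; HTTP retries are the framework's job.
        return self.ctx.http.get_json("https://api.example.com/trending")

    def parse(self, raw):
        # Yield one plain dict per record.
        for item in raw["items"]:
            yield {"title": item["name"], "stars": item["stars"]}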

The AI only needs to produce a class like this, and harvex run drives the whole chain: fetch → parse → validate → consolidate → deduplicated store → run metadata → health check. New fields won't blow up your main table (they fold into an extra column automatically), one failing source won't take down the round, and dirty data is rejected before it reaches the database.
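
The "one failing source won't take down the round" guarantee can be illustrated with a small standard-library sketch. This is not harvex's runner: the function name run_round and the shape of the sources mapping are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_round(sources, max_workers=4):
    """Run every source once and collect failures instead of raising.

    `sources` maps a slug to a zero-argument callable that returns rows.
    Both this shape and the name `run_round` are hypothetical.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn): slug for slug, fn in sources.items()}
        for future in as_completed(futures):
            slug = futures[future]
            try:
                results[slug] = future.result()
            except Exception as exc:
                # The failure is recorded for this source only.
                errors[slug] = repr(exc)
    return results, errors


def ok_source():
    return [{"title": "harvex", "stars": 1}]


def broken_source():
    raise RuntimeError("upstream returned HTML instead of JSON")


results, errors = run_round({"ok": ok_source, "broken": broken_source})
print(results)
print(errors)
```

The healthy source still delivers its rows, and the broken one shows up only as an entry in errors that a later alerting step could report.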

Design principles

Zero heavy core deps: the core depends only on pydantic / httpx / tenacity. Browser automation (playwright), LLM access (openai), the web UI, and the TUI are all extras installed on demand.
Field consolidation as a contract: HarvestRecord (pydantic v2) enforces the "don't let the main table become a sparse matrix" discipline. Undeclared fields fold into an extra column automatically, and dirty data is caught before writing.
Fault isolation: one failing source never breaks the whole round.
SQLite first, with a Sink abstraction reserved for other backends: works out of the box and leaves a clean extension point.
Scheduling decoupled from the web: runs are triggered by the CLI plus system launchd/cron, instead of piggybacking on a timer thread inside the web process.
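
To make the field-consolidation principle concrete, here is a dependency-free sketch of the idea. harvex itself does this through HarvestRecord on pydantic v2; the DECLARED mapping and the consolidate function below are hypothetical stand-ins, and only the extra column name comes from the text above.

```python
import json

# Hypothetical declared schema: column name -> required Python type.
DECLARED = {"title": str, "stars": int}


def consolidate(row, declared=DECLARED):
    """Split a parsed row into declared columns plus one JSON `extra` column.

    Raises ValueError for dirty data (a declared field that is missing
    or has the wrong type), so bad rows never reach the database.
    """
    record = {}
    for name, expected in declared.items():
        if name not in row:
            raise ValueError(f"missing field: {name}")
        value = row[name]
        if type(value) is not expected:
            raise ValueError(f"{name}: expected {expected.__name__}")
        record[name] = value
    # Everything undeclared folds into a single column.
    extras = {k: v for k, v in row.items() if k not in declared}
    record["extra"] = json.dumps(extras, sort_keys=True)
    return record


row = consolidate({"title": "harvex", "stars": 42, "language": "Python"})
print(row)
```

A source that starts emitting a new key, such as language here, widens only the JSON in extra, so the main table keeps its declared columns.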

Layers

sources/*.py (you / AI write)   BaseSource subclass: fetch() + parse()
     ↓ raw → list[dict]
core/pipeline                   validate(pydantic) → consolidate(extra fold) → store → metadata → health
     ↓
storage/sqlite_sink             create/alter table + upsert dedup + per-round backup
     ↓
SQLite business DB + metadata DB
     ↓ (extras)
extras/web  browse    extras/llm  translate    extras/tui  control panel
orchestration: core/runner (concurrency) + cli (harvex run / health / gen-launchd)
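
The upsert dedup step in the storage layer could look roughly like the following sqlite3 sketch. The table name, the choice of title as the dedup key, and the store helper are assumptions for illustration, not the actual sqlite_sink schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: `title` is the dedup key, `extra` holds folded fields.
conn.execute(
    "CREATE TABLE items (title TEXT PRIMARY KEY, stars INTEGER, extra TEXT)"
)

UPSERT = """
INSERT INTO items (title, stars, extra) VALUES (:title, :stars, :extra)
ON CONFLICT(title) DO UPDATE SET stars = excluded.stars, extra = excluded.extra
"""


def store(rows):
    # The connection context manager commits on success, rolls back on error.
    with conn:
        conn.executemany(UPSERT, rows)


# Round 1 inserts one row; round 2 updates it and adds a second one.
store([{"title": "harvex", "stars": 10, "extra": "{}"}])
store([
    {"title": "harvex", "stars": 12, "extra": "{}"},
    {"title": "other", "stars": 3, "extra": "{}"},
])

rows = conn.execute("SELECT title, stars FROM items ORDER BY title").fetchall()
print(rows)
```

Running the same source twice leaves one row per key with the latest values, which is what makes repeated rounds safe. The ON CONFLICT clause needs SQLite 3.24 or newer.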

New project skeleton

my_project/
├── config.toml         # source toggles / schedule / filters / notifications
├── .env.local          # secrets (openai key, webhook url)
├── fields.py           # your HarvestRecord subclass — standard fields
├── sources/            # one file per source, just fetch/parse
└── database/  logs/

A complete runnable template lives in templates/project/.

CLI

harvex list                 # list discovered sources
harvex run --all            # run one round over all sources
harvex run gh_trending      # run specific sources
harvex health               # data-health check (zeroed-out / sharp drop)
harvex gen-launchd          # generate a macOS launchd schedule
harvex gen-cron             # generate a crontab line
harvex web                  # start the read-only browse UI (needs [web])
harvex tui                  # start the local control panel (needs [tui])
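
The source does not describe how harvex health decides that data is zeroed-out or has dropped sharply. One plausible rule, sketched under that caveat, compares the current round's row count with the average of earlier rounds. The function name, the return labels, and the 0.5 threshold are all hypothetical.

```python
def check_health(history, current, drop_ratio=0.5):
    """Classify this round's row count against earlier rounds.

    history: row counts from previous rounds; current: this round's count.
    drop_ratio: hypothetical threshold, as a fraction of the average.
    """
    if not history:
        return "ok"  # nothing to compare against yet
    baseline = sum(history) / len(history)
    if current == 0 and baseline > 0:
        return "zeroed_out"
    if current < baseline * drop_ratio:
        return "sharp_drop"
    return "ok"


print(check_health([100, 110, 90], 0))
print(check_health([100, 110, 90], 40))
print(check_health([100, 110, 90], 95))
```

With a baseline of 100 rows, a round of 0 rows is flagged as zeroed out, 40 rows as a sharp drop, and 95 rows passes.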

Development

uv venv && uv pip install -e ".[dev]"
uv run pytest

License

MIT

Release history

0.2.0 (this version): Jun 24, 2026
0.1.1: Jun 24, 2026
0.1.0: Jun 24, 2026

Download files

Source Distribution

harvex-0.2.0.tar.gz (144.0 kB)

Uploaded Jun 24, 2026 Source

Built Distribution

harvex-0.2.0-py3-none-any.whl (105.1 kB)

Uploaded Jun 24, 2026 Python 3

File details

Details for the file harvex-0.2.0.tar.gz.

File metadata

Download URL: harvex-0.2.0.tar.gz
Upload date: Jun 24, 2026
Size: 144.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.10

File hashes

Hashes for harvex-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`a425c03ee7c06c90bc0b4834ee3c8cba2bc1aad68ff1f67aefb3f475d9447734`
MD5	`4d7322d81bd35ebe1ba69984415f6684`
BLAKE2b-256	`9d1ffa80c27d4752ae37804980aaa838e6207db4249f3ca49de292823c198550`

File details

Details for the file harvex-0.2.0-py3-none-any.whl.

File metadata

Download URL: harvex-0.2.0-py3-none-any.whl
Upload date: Jun 24, 2026
Size: 105.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.10

File hashes

Hashes for harvex-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`878d032be32c9eebc1a7cbc1849236e68f136c97c11d2173cb18f7cb5b6b7c8f`
MD5	`0482eb29f84f0d4633d8f981464684b7`
BLAKE2b-256	`76807708f49b63304d801a2fab7729abfc7ad1dbb703db3eee043f5182dea786`
