AI-Powered Selector Discovery - Discover once, scrape forever
Warning: Yosoi is currently in alpha, and its API is expected to change significantly. We do not expect a stable API until the project is out of beta.
Yosoi - You Only Scrape Once (iteratively)
Discover once, scrape forever
Give Yosoi a URL, domain, or group of URLs, and it uses AI to automatically discover the best selectors for structured content.
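The toy sketch below makes the "discover once, scrape forever" idea concrete. It is not Yosoi's implementation or API. SelectorCache, fake_discover, and the selector strings are all hypothetical, and fake_discover stands in for the expensive AI-backed discovery step. Discovery runs once per domain, and every later page on that domain reuses the result.

```python
from typing import Callable, Dict
from urllib.parse import urlparse

# Toy illustration only, NOT Yosoi's implementation: discover_fn stands in
# for an expensive, AI-backed selector discovery step.
SelectorMap = Dict[str, str]


class SelectorCache:
    """Hypothetical cache that runs discovery once per domain."""

    def __init__(self, discover_fn: Callable[[str], SelectorMap]) -> None:
        self._discover = discover_fn
        self._by_domain: Dict[str, SelectorMap] = {}

    def selectors_for(self, url: str) -> SelectorMap:
        domain = urlparse(url).netloc
        # Pay the discovery cost only the first time a domain is seen;
        # later pages on the same domain reuse the cached selectors.
        if domain not in self._by_domain:
            self._by_domain[domain] = self._discover(url)
        return self._by_domain[domain]


calls = []


def fake_discover(url: str) -> SelectorMap:
    calls.append(url)  # record each (expensive) discovery call
    return {"title": "h1.product-title", "price": "span.price"}


cache = SelectorCache(fake_discover)
cache.selectors_for("https://example.com/catalog?page=1")
cache.selectors_for("https://example.com/catalog?page=2")
print(len(calls))  # 1: discovery ran once for both pages
```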
Installation
# Install yosoi using uv
uv add yosoi
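Because the API is still in alpha, you may want to record which Yosoi release you are running when you report issues or pin behavior. The minimal sketch below uses only the standard library's importlib.metadata. The helper name installed_version is our own and is not part of Yosoi.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "yosoi") -> str | None:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string, or None if yosoi is not installed here.
print(installed_version())
```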
Quick Start
API Key
Export your API key as an environment variable, or add it to a .env file:
# Set keys for whichever providers you want to use
<PROVIDER_NAME>_API_KEY=your_api_key_here
GROQ_API_KEY=your_groq_key_here # groq/...
GEMINI_API_KEY=your_gemini_api_key_here # gemini/...
OPENAI_API_KEY=your_openai_api_key_here # openai/...
CEREBRAS_API_KEY=your_cerebras_api_key_here # cerebras/...
OPENROUTER_API_KEY=your_openrouter_key_here # openrouter/...
See the full list of supported providers
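Each key in the list above is paired with a provider prefix, and the CLI's -m flag takes provider:model-name. The sketch below is a hypothetical pre-flight check built on that pattern. It assumes that each provider reads a variable named PROVIDER_API_KEY, as in the examples above. That assumption is inferred from the list, not taken from Yosoi's documented lookup logic.

```python
import os
from collections.abc import Mapping
from typing import List

# Assumed provider -> env var mapping, copied from the examples above;
# this is NOT Yosoi's own lookup table.
PROVIDER_KEYS = {
    "groq": "GROQ_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
    "cerebras": "CEREBRAS_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def configured_providers(env: Mapping = os.environ) -> List[str]:
    """Return providers whose key variable is set to a non-empty value."""
    return sorted(p for p, var in PROVIDER_KEYS.items() if env.get(var))


def key_for_model(model: str) -> str:
    """Map a 'provider:model-name' spec (as passed to -m) to its env var."""
    provider, sep, name = model.partition(":")
    if not provider or not sep or not name:
        raise ValueError(f"expected provider:model-name, got {model!r}")
    # Assumption: the variable follows the <PROVIDER>_API_KEY pattern.
    return f"{provider.upper()}_API_KEY"


print(configured_providers())
print(key_for_model("groq:llama-3.3-70b-versatile"))
```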
Basic Usage
CLI Usage
# Specify model explicitly with -m provider:model-name
uv run yosoi -m groq:llama-3.3-70b-versatile --url "https://qscrape.dev/l1/eshop/catalog/?cat=Forge%20%26%20Smithing" --contract Product
Scraped content, selectors, and logs are written to ./.yosoi, relative to the directory you run the CLI from.
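The --url value in the example is percent-encoded. The category "Forge & Smithing" becomes Forge%20%26%20Smithing for two reasons: an unencoded & would start a second query parameter, and a literal space is not allowed in a URL. If you generate such URLs programmatically, the standard library can build them. Passing quote_via=quote encodes spaces as %20 instead of the default quote_plus behavior of using +, so the result matches the URL used above.

```python
from urllib.parse import quote, urlencode

# Rebuild the catalog URL from the CLI example above.
base = "https://qscrape.dev/l1/eshop/catalog/"
# quote_via=quote encodes " " as %20 and "&" as %26 inside the value.
query = urlencode({"cat": "Forge & Smithing"}, quote_via=quote)
url = f"{base}?{query}"
print(url)
# https://qscrape.dev/l1/eshop/catalog/?cat=Forge%20%26%20Smithing
```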
Python Usage
We also provide example scripts; you can find them in our example docs.
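If you want to stay within what this page documents, you can drive the CLI from Python. The sketch below is not Yosoi's Python API. build_yosoi_command and run_yosoi are hypothetical helpers, and they pass only the -m, --url, and --contract flags shown in the CLI example. Passing an argument list without a shell means characters such as ? and & in the URL need no quoting. The cwd argument controls where the ./.yosoi output directory is created.

```python
from __future__ import annotations

import shutil
import subprocess


def build_yosoi_command(model: str, url: str, contract: str) -> list[str]:
    # Only the flags from the documented CLI example; no shell is involved,
    # so the URL is passed through verbatim.
    return ["uv", "run", "yosoi", "-m", model, "--url", url, "--contract", contract]


def run_yosoi(
    model: str, url: str, contract: str, cwd: str | None = None
) -> subprocess.CompletedProcess[str]:
    """Hypothetical wrapper: run the CLI and capture its output."""
    if shutil.which("uv") is None:
        raise RuntimeError("uv is not on PATH")
    return subprocess.run(
        build_yosoi_command(model, url, contract),
        cwd=cwd,              # ./.yosoi is created relative to this directory
        check=True,           # raise CalledProcessError on a non-zero exit
        capture_output=True,
        text=True,
    )


print(build_yosoi_command(
    "groq:llama-3.3-70b-versatile",
    "https://qscrape.dev/l1/eshop/catalog/?cat=Forge%20%26%20Smithing",
    "Product",
))
```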
Citation
If you use Yosoi in your research or projects, please cite it using the metadata in the CITATION.cff file.
Download files
- Source distribution: yosoi-0.0.1a13.tar.gz
- Built distribution: yosoi-0.0.1a13-py3-none-any.whl
File details
Details for the file yosoi-0.0.1a13.tar.gz.
File metadata
- Download URL: yosoi-0.0.1a13.tar.gz
- Upload date:
- Size: 100.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 91c8d93f0de750e2ca9e2a7da632657fcb2099eb87da59f9812f8b5f5e9f9043 |
| MD5 | d3a0c3207a0413d633a55cbc6375ec6a |
| BLAKE2b-256 | ff33db0c847d4e7317713df6980ea0a9ce372649f94bc2dc51d597f6828be6c1 |
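To check a downloaded archive against the SHA256 digest in the table above, you can use hashlib, as in the minimal sketch below. sha256_of and verify_download are hypothetical helper names. pip can enforce the same check at install time through its hash-checking mode: add a requirements line such as yosoi==0.0.1a13 --hash=sha256:<digest> and install with --require-hashes.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for yosoi-0.0.1a13.tar.gz.
EXPECTED_SHA256 = "91c8d93f0de750e2ca9e2a7da632657fcb2099eb87da59f9812f8b5f5e9f9043"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.strip().lower()


# Usage, assuming the sdist was downloaded to the current directory:
# print(verify_download(Path("yosoi-0.0.1a13.tar.gz")))
```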
Provenance
The following attestation bundles were made for yosoi-0.0.1a13.tar.gz:
- Publisher: publish.yaml on CascadingLabs/Yosoi
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: yosoi-0.0.1a13.tar.gz
  - Subject digest: 91c8d93f0de750e2ca9e2a7da632657fcb2099eb87da59f9812f8b5f5e9f9043
  - Sigstore transparency entry: 1191831677
  - Sigstore integration time:
- Permalink: CascadingLabs/Yosoi@31ee7813bfbe29f3e1988a6590535917877b9efb
- Branch / Tag: refs/tags/0.0.1a13
- Owner: https://github.com/CascadingLabs
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yaml@31ee7813bfbe29f3e1988a6590535917877b9efb
- Trigger Event: release
File details
Details for the file yosoi-0.0.1a13-py3-none-any.whl.
File metadata
- Download URL: yosoi-0.0.1a13-py3-none-any.whl
- Upload date:
- Size: 129.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
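The wheel's file name encodes its compatibility tags. py3-none-any marks a pure-Python wheel: it runs on any Python 3 interpreter, has no ABI dependency, and has no platform restriction. The version 0.0.1a13 is a PEP 440 pre-release (alpha 13), which matches the alpha warning at the top. The minimal standard-library sketch below splits a wheel name into its fields. For real use, the packaging library's parse_wheel_filename is more thorough. This version also does not expand compressed tag sets such as py2.py3.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split name-version[-build]-python-abi-platform.whl into its fields."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:  # no optional build tag
        name, ver, py, abi, plat = parts
        build = None
    elif len(parts) == 6:  # build tag present
        name, ver, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel name: {filename!r}")
    return WheelName(name, ver, build, py, abi, plat)


print(parse_wheel_filename("yosoi-0.0.1a13-py3-none-any.whl"))
# WheelName(name='yosoi', version='0.0.1a13', build=None, python_tag='py3', abi_tag='none', platform_tag='any')
```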
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a1fb3ee679448da5779a48917901dbdf1999903a5dcf7618bc3e5e36e96996f6 |
| MD5 | df387c98a48cdf4389a28bb7262059d9 |
| BLAKE2b-256 | 28dfe7b9f712e7d4e9fa0af507d4177df06ddf0724f621410107af0ab98987b7 |
Provenance
The following attestation bundles were made for yosoi-0.0.1a13-py3-none-any.whl:
- Publisher: publish.yaml on CascadingLabs/Yosoi
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: yosoi-0.0.1a13-py3-none-any.whl
  - Subject digest: a1fb3ee679448da5779a48917901dbdf1999903a5dcf7618bc3e5e36e96996f6
  - Sigstore transparency entry: 1191831696
  - Sigstore integration time:
- Permalink: CascadingLabs/Yosoi@31ee7813bfbe29f3e1988a6590535917877b9efb
- Branch / Tag: refs/tags/0.0.1a13
- Owner: https://github.com/CascadingLabs
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yaml@31ee7813bfbe29f3e1988a6590535917877b9efb
- Trigger Event: release