CLI for Anysite API - web data extraction for humans and AI agents

Anysite CLI

Web data extraction for humans and AI agents.

Installation

pip install anysite-cli

Optional extras:

pip install "anysite-cli[data]"       # DuckDB + PyArrow for dataset pipelines
pip install "anysite-cli[postgres]"   # PostgreSQL support
pip install "anysite-cli[all]"        # All optional dependencies

Or install from source:

git clone https://github.com/anysiteio/anysite-cli.git
cd anysite-cli
python -m venv .venv
source .venv/bin/activate
pip install -e .

Quick Start

1. Configure your API key

anysite config set api_key sk-xxxxx

Or set the environment variable:

export ANYSITE_API_KEY=sk-xxxxx

2. Update the schema cache

anysite schema update

3. Make your first request

anysite api /api/linkedin/user user=satyanadella
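
Scripts that rely on the environment variable can fail early when the key is missing. A minimal sketch, assuming the key is supplied through ANYSITE_API_KEY rather than the config file; `require_api_key` is a hypothetical helper name, not part of the CLI:

```shell
# Hypothetical helper: stop early if ANYSITE_API_KEY is unset or empty.
require_api_key() {
  if [ -z "${ANYSITE_API_KEY:-}" ]; then
    echo "ANYSITE_API_KEY is not set" >&2
    return 1
  fi
}

# Usage (requires the anysite CLI): only call the API once the key is present.
# require_api_key && anysite api /api/linkedin/user user=satyanadella
```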

The `api` Command

A single universal command for calling any API endpoint:

anysite api <endpoint> [key=value ...] [OPTIONS]

Parameters are passed as key=value pairs. Types are auto-converted using the schema cache.

# LinkedIn
anysite api /api/linkedin/user user=satyanadella
anysite api /api/linkedin/company company=anthropic
anysite api /api/linkedin/search/users title=CTO count=50 --format csv

# Instagram
anysite api /api/instagram/user user=cristiano
anysite api /api/instagram/user/posts user=nike count=20

# Twitter/X
anysite api /api/twitter/user user=elonmusk --format table

# Web parsing
anysite api /api/web/parse url=https://example.com

# Y Combinator
anysite api /api/yc/company company=anthropic

Endpoint Discovery

Browse and search all available API endpoints:

# List all endpoints
anysite describe

# Describe a specific endpoint (input params + output fields)
anysite describe /api/linkedin/company
anysite describe linkedin.user

# Search by keyword
anysite describe --search "company"

# JSON output for scripts/agents
anysite describe --json -q
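
The describe examples above use both a path (/api/linkedin/company) and a dotted name (linkedin.user). A minimal sketch of converting one form to the other, assuming dotted names map one-to-one onto paths under /api/, which the examples suggest but do not state; `endpoint_path` is a hypothetical helper, not part of the CLI:

```shell
# Hypothetical helper: turn a dotted name such as "linkedin.user"
# into a path such as "/api/linkedin/user". Paths pass through unchanged.
endpoint_path() {
  case "$1" in
    /*) printf '%s\n' "$1" ;;
    *)  printf '/api/%s\n' "$(printf '%s' "$1" | tr '.' '/')" ;;
  esac
}

# Usage (requires the anysite CLI):
# anysite describe "$(endpoint_path linkedin.user)"
```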

Output Formats

--format json    # Default: Pretty JSON
--format jsonl   # Newline-delimited JSON (for streaming)
--format csv     # CSV with headers
--format table   # Rich table for terminal

Field Selection

# Include specific fields (dot notation and wildcards supported)
anysite api /api/linkedin/user user=satyanadella --fields "name,headline,follower_count"

# Exclude fields
anysite api /api/linkedin/user user=satyanadella --exclude "certifications,recommendations"

# Compact JSON
anysite api /api/linkedin/user user=satyanadella --compact

Built-in field presets: minimal, contact, recruiting.

Save to File

anysite api /api/linkedin/search/users title=CTO count=100 --output ctos.json
anysite api /api/linkedin/search/users title=CTO count=100 --output ctos.csv --format csv
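
The CSV example above passes --format explicitly alongside --output. A script that writes to varying file names could derive the value from the extension. A minimal sketch; `format_for_file` is a hypothetical helper, not part of the CLI, and it covers only the file-oriented formats listed under Output Formats:

```shell
# Hypothetical helper: choose a --format value from the output file extension.
format_for_file() {
  case "$1" in
    *.jsonl) echo jsonl ;;
    *.csv)   echo csv ;;
    *)       echo json ;;   # json is the documented default
  esac
}

# Usage (requires the anysite CLI):
# out=ctos.csv
# anysite api /api/linkedin/search/users title=CTO count=100 \
#   --output "$out" --format "$(format_for_file "$out")"
```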

Pipe to jq

anysite api /api/linkedin/user user=satyanadella -q | jq '.follower_count'

Batch Processing

Process multiple inputs from a file or stdin:

# From a text file (one value per line)
anysite api /api/linkedin/user --from-file users.txt --input-key user

# From JSONL (one JSON object per line)
anysite api /api/linkedin/user --from-file users.jsonl

# From stdin
cat users.txt | anysite api /api/linkedin/user --stdin --input-key user

# Parallel execution
anysite api /api/linkedin/user --from-file users.txt --input-key user --parallel 5

# Rate limiting
anysite api /api/linkedin/user --from-file users.txt --input-key user --rate-limit "10/s"

# Error handling
anysite api /api/linkedin/user --from-file users.txt --input-key user --on-error skip

# Progress bar and stats
anysite api /api/linkedin/user --from-file users.txt --input-key user --progress --stats

Input file formats: plain text (one value per line), JSONL, CSV.
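
Plain-text input files often pick up blank lines, stray whitespace, or repeated values, each of which could turn into a wasted request. A minimal sketch of cleaning such a file before passing it to --from-file; `clean_inputs` is a hypothetical helper, not part of the CLI:

```shell
# Hypothetical helper: read raw values on stdin, trim surrounding whitespace,
# drop blank lines, and remove duplicates while keeping first-seen order.
clean_inputs() {
  sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' | awk 'NF && !seen[$0]++'
}

# Usage (requires the anysite CLI; raw_users.txt is a placeholder file name):
# clean_inputs < raw_users.txt > users.txt
# anysite api /api/linkedin/user --from-file users.txt --input-key user
```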

Dataset Pipelines

Collect multi-source datasets with dependency chains, store as Parquet, query with DuckDB, and load into a relational database. Includes per-source transforms, file/webhook exports, run history, scheduling, and webhook notifications.

Create a dataset

anysite dataset init my-dataset

Edit my-dataset/dataset.yaml to define sources:

name: my-dataset
sources:
  - id: companies
    endpoint: /api/linkedin/company
    from_file: companies.txt
    input_key: company
    transform:                          # Post-collection transform (for exports)
      filter: '.employee_count > 10'
      fields: [name, url, employee_count]
      add_columns:
        batch: "q1-2026"
    export:                             # Export to file/webhook after Parquet write
      - type: file
        path: ./output/companies-{{date}}.csv
        format: csv
    db_load:
      key: _input_value                    # Unique key for incremental sync
      sync: full                           # full (default) or append (no DELETE)
      fields: [name, url, employee_count]

  - id: employees
    endpoint: /api/linkedin/company/employees
    dependency:
      from_source: companies
      field: urn.value
    input_key: companies
    input_template:
      companies:
        - type: company
          value: "{value}"
      count: 5
    refresh: always                       # Re-collect every run with --incremental
    db_load:
      key: urn.value                       # Unique key for incremental sync
      sync: append                         # Keep old records (no DELETE on diff)
      fields: [name, url, headline]

storage:
  format: parquet
  path: ./data/

schedule:
  cron: "0 9 * * *"                    # Daily at 9 AM

notifications:
  on_complete:
    - url: "https://hooks.slack.com/xxx"
  on_failure:
    - url: "https://alerts.example.com/fail"
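
For a first experiment, a much smaller config may be enough. A minimal sketch that writes a one-source dataset.yaml and its input file, using only keys that appear in the example above. `write_minimal_dataset` is a hypothetical helper, and placing companies.txt next to dataset.yaml assumes from_file is resolved relative to the config file, which the text does not state:

```shell
# Hypothetical helper: write a one-source dataset config plus its input file
# into the given directory. Only keys shown in the example above are used.
write_minimal_dataset() {
  dir="$1"
  mkdir -p "$dir"
  printf '%s\n' anthropic > "$dir/companies.txt"
  cat > "$dir/dataset.yaml" <<'EOF'
name: my-dataset
sources:
  - id: companies
    endpoint: /api/linkedin/company
    from_file: companies.txt
    input_key: company

storage:
  format: parquet
  path: ./data/
EOF
}

# Usage (requires the anysite CLI for the second step):
# write_minimal_dataset my-dataset
# anysite dataset collect my-dataset/dataset.yaml --dry-run
```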

Collect, query, and load

# Preview collection plan
anysite dataset collect dataset.yaml --dry-run

# Collect data (supports --incremental to skip already-collected inputs)
anysite dataset collect dataset.yaml

# Collect and auto-load into PostgreSQL
anysite dataset collect dataset.yaml --load-db pg

# Check status
anysite dataset status dataset.yaml

# Query with SQL (DuckDB)
anysite dataset query dataset.yaml --sql "SELECT * FROM companies LIMIT 10"

# Query with dot-notation field extraction
anysite dataset query dataset.yaml --source profiles --fields "name, urn.value AS urn_id"

# Interactive SQL shell
anysite dataset query dataset.yaml --interactive

# Column stats and data profiling
anysite dataset stats dataset.yaml --source companies
anysite dataset profile dataset.yaml

# Load into PostgreSQL with automatic FK linking (incremental sync with db_load.key)
anysite dataset load-db dataset.yaml -c pg

# Drop and reload from latest snapshot
anysite dataset load-db dataset.yaml -c pg --drop-existing

# Load a specific snapshot date
anysite dataset load-db dataset.yaml -c pg --snapshot 2026-01-15

# Run history and logs
anysite dataset history my-dataset
anysite dataset logs my-dataset --run 42

# Generate cron/systemd schedule
anysite dataset schedule dataset.yaml --incremental --load-db pg

# Compare snapshots (diff two collection dates, supports dot-notation keys)
anysite dataset diff dataset.yaml --source employees --key _input_value
anysite dataset diff dataset.yaml --source profiles --key urn.value --fields "name,headline"

# Reset incremental state
anysite dataset reset-cursor dataset.yaml
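
Several of the commands above are commonly chained into one run. A minimal sketch that previews the plan, collects incrementally, and then reports status, stopping at the first step that fails. It assumes the CLI exits with a non-zero status on failure, which is not stated above; `run_dataset` is a hypothetical wrapper, not part of the CLI:

```shell
# Hypothetical wrapper: preview, collect, then report status.
# Each step runs only if the previous one exited successfully.
run_dataset() {
  config="$1"
  anysite dataset collect "$config" --dry-run &&
    anysite dataset collect "$config" --incremental &&
    anysite dataset status "$config"
}

# Usage (requires the anysite CLI):
# run_dataset dataset.yaml
```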

Database

Manage database connections and run queries.

# Add a connection (--password auto-stores via env var reference)
anysite db add pg --type postgres --host localhost --database mydb --user app --password secret
# Or reference an existing env var
anysite db add pg --type postgres --host localhost --database mydb --user app --password-env PGPASS

# List and test connections
anysite db list
anysite db test pg

# Query
anysite db query pg --sql "SELECT * FROM companies" --format table

# Insert data (auto-create table from schema inference)
cat data.jsonl | anysite db insert pg --table users --stdin --auto-create

# Upsert with conflict handling
cat updates.jsonl | anysite db upsert pg --table users --conflict-columns id --stdin

# Inspect schema
anysite db schema pg --table users

Supports SQLite and PostgreSQL. Passwords are stored as environment variable references.

LLM Analysis

LLM-powered analysis of collected dataset records. Summarize, classify, enrich, generate text, match records across sources, and find semantic duplicates.

pip install "anysite-cli[llm]"        # OpenAI + Anthropic SDKs

Setup

anysite llm setup

Configures the provider (OpenAI or Anthropic), the API key environment variable, and the default model, then tests the connection.

Commands

# Classify records into categories (auto-detects categories if --categories omitted)
anysite llm classify dataset.yaml --source posts --categories "positive,negative,neutral" --format table

# Summarize each record
anysite llm summarize dataset.yaml --source profiles --fields "name,headline" --max-length 50

# Enrich records with LLM-extracted attributes
anysite llm enrich dataset.yaml --source posts \
  --add "sentiment:positive/negative/neutral" \
  --add "language:string" \
  --add "quality_score:1-10"

# Generate text using record fields as template variables
anysite llm generate dataset.yaml --source profiles \
  --prompt "Write a LinkedIn intro for {name} who works as {headline}" \
  --temperature 0.7

# Match records between two sources
anysite llm match dataset.yaml --source-a profiles --source-b companies --top-k 3

# Find semantic duplicates
anysite llm deduplicate dataset.yaml --source profiles --key name --threshold 0.8

Common options: --provider, --model, --fields, --format, --output, --parallel, --rate-limit, --temperature, --dry-run, --no-cache, --prompt, --prompt-file.

Cache

anysite llm cache-stats    # Show cache statistics
anysite llm cache-clear    # Clear all cached responses

Responses are cached in SQLite at ~/.anysite/llm_cache.db. Use --no-cache to skip cache lookup.

Configuration

Configuration is stored in ~/.anysite/config.yaml.

# Set a value
anysite config set api_key sk-xxxxx
anysite config set defaults.format table

# Get a value
anysite config get api_key

# List all settings
anysite config list

# Show config file path
anysite config path

# Initialize interactively
anysite config init

# Reset to defaults
anysite config reset --force

Configuration Priority

Settings are resolved in this order, highest priority first:

1. CLI arguments (--api-key)
2. Environment variables (ANYSITE_API_KEY)
3. Config file (~/.anysite/config.yaml)
4. Defaults
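
Reading the list from top to bottom as highest to lowest priority, the lookup for the API key can be sketched as follows. This illustrates the order only and is not the CLI's own code. `resolve_api_key` is a hypothetical helper, and it assumes the config file holds a top-level `api_key: VALUE` line:

```shell
# Hypothetical sketch of the lookup order: CLI argument, then environment
# variable, then config file, then a default (empty here).
# Assumes the config file holds a top-level "api_key: VALUE" line.
resolve_api_key() {
  cli_value="$1"
  config_file="${2:-$HOME/.anysite/config.yaml}"
  if [ -n "$cli_value" ]; then
    printf '%s\n' "$cli_value"
  elif [ -n "${ANYSITE_API_KEY:-}" ]; then
    printf '%s\n' "$ANYSITE_API_KEY"
  elif [ -f "$config_file" ] && grep -q '^api_key:' "$config_file"; then
    sed -n 's/^api_key:[[:space:]]*//p' "$config_file" | head -n 1
  else
    printf '\n'
  fi
}
```

The first argument stands in for a value passed with --api-key. An empty string means the flag was not given.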

Global Options

anysite [OPTIONS] COMMAND

Options:
  --api-key TEXT     API key (or set ANYSITE_API_KEY)
  --base-url TEXT    API base URL
  --debug            Enable debug output
  --no-color         Disable colored output
  --version, -v      Show version
  --help             Show help

Claude Code Skill

Install the anysite-cli skill for Claude Code to get AI-assisted data collection:

# Add marketplace
/plugin marketplace add https://github.com/anysiteio/agent-skills

# Install skill
/plugin install anysite-cli@anysite-skills

The skill gives Claude Code knowledge of all anysite commands, dataset pipeline configuration, and database operations.

Development

Setup

git clone https://github.com/anysiteio/anysite-cli.git
cd anysite-cli
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

# With dataset + database support
pip install -e ".[dev,data]"

Run Tests

pytest
pytest --cov=anysite --cov-report=term-missing

Linting

ruff check src/
ruff format src/
mypy src/

License

MIT

Release history

Version	Release date
0.3.23	Mar 13, 2026
0.3.22	Mar 13, 2026
0.3.21	Mar 12, 2026
0.3.20	Mar 9, 2026
0.3.19	Mar 9, 2026
0.3.17	Mar 5, 2026
0.3.16	Mar 5, 2026
0.3.15	Mar 4, 2026
0.3.14	Mar 4, 2026
0.3.13	Feb 16, 2026
0.3.12	Feb 16, 2026
0.3.11	Feb 16, 2026
0.3.10	Feb 15, 2026
0.3.9	Feb 15, 2026
0.3.8	Feb 15, 2026
0.3.7	Feb 15, 2026
0.3.6	Feb 15, 2026
0.3.5	Feb 14, 2026
0.3.4	Feb 12, 2026
0.3.3	Feb 12, 2026
0.3.2	Feb 8, 2026
0.3.1	Feb 6, 2026
0.3.0	Feb 6, 2026
0.2.0	Feb 6, 2026
0.1.12	Feb 3, 2026
0.1.11	Feb 3, 2026 (this version)
0.1.10	Feb 3, 2026
0.1.9	Feb 2, 2026
0.1.8	Feb 2, 2026
0.1.7	Feb 2, 2026
0.1.6	Feb 2, 2026
0.1.5	Feb 2, 2026
0.1.4	Feb 2, 2026
0.1.3	Feb 2, 2026
0.1.2	Feb 1, 2026
0.1.1	Feb 1, 2026
0.1.0	Feb 1, 2026

Download files

Source Distribution

anysite_cli-0.1.11.tar.gz (160.0 kB)

Uploaded Feb 3, 2026 Source

Built Distribution

anysite_cli-0.1.11-py3-none-any.whl (125.7 kB)

Uploaded Feb 3, 2026 Python 3

File details

Details for the file anysite_cli-0.1.11.tar.gz.

File metadata

Download URL: anysite_cli-0.1.11.tar.gz
Upload date: Feb 3, 2026
Size: 160.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for anysite_cli-0.1.11.tar.gz
Algorithm	Hash digest
SHA256	`54cef6a0a2acd6f38aa8e52d5f5d82184d6240b0fa504ddc7ab6e084eaab2a06`
MD5	`7a7f04b771ddb08f4db5960870ceac85`
BLAKE2b-256	`2e35059104e0a8fb2768af640c7d458fad4c1e47889b2e2b5370eca07f21c53d`

File details

Details for the file anysite_cli-0.1.11-py3-none-any.whl.

File metadata

Download URL: anysite_cli-0.1.11-py3-none-any.whl
Upload date: Feb 3, 2026
Size: 125.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for anysite_cli-0.1.11-py3-none-any.whl
Algorithm	Hash digest
SHA256	`eab7b9602d7e1a21cdcd2f8981186393f60f6b49637d95e33bae3f53bbbd50a7`
MD5	`51ce4139dd9fb8764d0d58ea32c28806`
BLAKE2b-256	`600c9cc263b945844615ecde62920358a00d83b97ffe0a7fb930a184d310f9d6`
