Parallel Tools: CLI and data enrichment utilities for the Parallel API

Parallel-Web-Tools

CLI and data enrichment utilities for the Parallel API.

Note: The parallel-web-tools package provides the parallel-cli command-line tool and data enrichment utilities. It depends on parallel-web, the official Parallel Python SDK, but does not bundle it. Install parallel-web separately if you need direct SDK access.

Features

  • CLI for Humans & AI Agents - Works interactively or fully via command-line arguments
  • Web Search - AI-powered search with domain filtering and date ranges
  • Content Extraction - Extract clean markdown from any URL
  • Data Enrichment - Enrich CSV, JSON, DuckDB, and BigQuery data with AI
  • AI-Assisted Planning - Use natural language to define what data you want
  • Multiple Integrations - Polars, DuckDB, Snowflake, BigQuery, Spark

Installation

The Python package requires Python 3.10+. The standalone binary and the npm package do not need Python.

Standalone CLI (Recommended)

Install the standalone parallel-cli binary for search, extract, enrichment, and deep research (no Python required):

curl -fsSL https://parallel.ai/install.sh | bash

This automatically detects your platform (macOS/Linux, x64/arm64) and installs to ~/.local/bin.

Note: The standalone binary supports search, extract, research, and enrich run when driven by CLI arguments with CSV or JSON files. YAML config files, the interactive planner, DuckDB/BigQuery sources, and deployment commands require the pip install (see Python Package).
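
As a rough illustration of what platform detection involves, the sketch below maps the values of Python's platform.system() and platform.machine() onto the macOS/Linux and x64/arm64 combinations mentioned above. It is not the install script's actual logic: the function name normalize_platform and the label strings are assumptions made for this example.

```python
import platform
from typing import Tuple

# Hypothetical label tables; the real install script may use different names.
_OS_LABELS = {"Darwin": "macos", "Linux": "linux"}
_ARCH_LABELS = {
    "x86_64": "x64",
    "amd64": "x64",
    "arm64": "arm64",
    "aarch64": "arm64",
}


def normalize_platform(system: str, machine: str) -> Tuple[str, str]:
    """Map raw platform strings to (os, arch) labels, or raise ValueError."""
    os_label = _OS_LABELS.get(system)
    arch_label = _ARCH_LABELS.get(machine.lower())
    if os_label is None or arch_label is None:
        raise ValueError(f"unsupported platform: {system}/{machine}")
    return os_label, arch_label


if __name__ == "__main__":
    try:
        print(normalize_platform(platform.system(), platform.machine()))
    except ValueError as exc:
        # Anything outside the four listed combinations ends up here.
        print(exc)
```

On a platform outside those four combinations, the pip install route is the alternative described in this section.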

npm

npm install -g parallel-web-cli

This downloads the pre-built binary for your platform. No Python or Go required.

Python Package

For programmatic usage or additional features:

# Minimal CLI (search, extract, enrich with CLI args)
pip install parallel-web-tools

# + YAML config files and interactive planner
pip install parallel-web-tools[cli]

# + Data integrations
pip install parallel-web-tools[duckdb]       # DuckDB (includes cli, polars)
pip install parallel-web-tools[bigquery]     # BigQuery (includes cli)
pip install parallel-web-tools[spark]        # Apache Spark

# Full install with all features
pip install parallel-web-tools[all]

CLI Overview

parallel-cli
├── auth                    # Check authentication status
├── login                   # OAuth login (or use PARALLEL_API_KEY env var)
├── logout                  # Remove stored credentials
├── search                  # Web search
├── extract / fetch         # Extract content from URLs
├── research                # Deep research commands
│   ├── run                 # Run deep research on a question or topic
│   ├── status              # Check status of a research task
│   ├── poll                # Poll until completion
│   └── processors          # List available research processors
├── enrich                  # Data enrichment commands
│   ├── run                 # Run enrichment
│   ├── status              # Check status of a task group
│   ├── poll                # Poll until completion and collect results
│   ├── plan                # Create YAML config
│   ├── suggest             # AI suggests output columns
│   └── deploy              # Deploy to cloud systems (requires pip install)
├── findall                 # Web-scale entity discovery
│   ├── run                 # Discover entities matching a natural language objective
│   ├── ingest              # Preview the schema before running
│   ├── status              # Check status of a FindAll run
│   ├── poll                # Poll until completion
│   ├── result              # Fetch results of a completed run
│   └── cancel              # Cancel a running FindAll
└── monitor                 # Continuous web change tracking
    ├── create              # Create a new web monitor
    ├── list                # List all monitors
    ├── get                 # Get monitor details
    ├── update              # Update monitor configuration
    ├── delete              # Delete a monitor
    ├── events              # List events for a monitor
    ├── event-group         # Get event group details
    └── simulate            # Simulate webhook event for testing

Quick Start

1. Authenticate

# Interactive OAuth login
parallel-cli login

# Or set environment variable
export PARALLEL_API_KEY=your_api_key

2. Search the Web

# Natural language search
parallel-cli search "What is Anthropic's latest AI model?" --json

# Keyword search with filters
parallel-cli search -q "bitcoin price" --after-date 2026-01-01 --json

# Search specific domains
parallel-cli search "SEC filings for Apple" --include-domains sec.gov --json
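
When a script or agent drives these searches, building the argument list in code avoids shell quoting mistakes. A minimal Python sketch, assuming only the flags shown in the examples above (-q, --after-date, --include-domains, --json). The helper name build_search_args is hypothetical and not part of parallel-web-tools. How several domains are passed is not shown above, so the helper accepts a single domain.

```python
import shlex
from typing import List, Optional


def build_search_args(
    query: str,
    *,
    keyword: bool = False,
    after_date: Optional[str] = None,
    include_domain: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `parallel-cli search` from the flags shown above."""
    args = ["parallel-cli", "search"]
    # Keyword searches pass the query via -q; natural language queries are positional.
    args += ["-q", query] if keyword else [query]
    if after_date:
        args += ["--after-date", after_date]  # YYYY-MM-DD, as in the example above
    if include_domain:
        args += ["--include-domains", include_domain]
    args.append("--json")
    return args


if __name__ == "__main__":
    argv = build_search_args("SEC filings for Apple", include_domain="sec.gov")
    # shlex.join renders the list as a copy-pasteable shell command.
    print(shlex.join(argv))
```

The returned list can be handed to subprocess.run directly, so the query never passes through a shell.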

3. Extract Content from URLs

# Extract content as markdown
parallel-cli extract https://example.com --json

# Extract with a specific focus
parallel-cli extract https://company.com --objective "Find pricing info" --json

# Get full page content
parallel-cli extract https://example.com --full-content --json

4. Enrich Data

# Let AI suggest what columns to add
parallel-cli enrich suggest "Find the CEO and annual revenue" --json

# Create a config file (interactive)
parallel-cli enrich plan -o config.yaml

# Create a config file (non-interactive, for AI agents)
parallel-cli enrich plan -o config.yaml \
    --source-type csv \
    --source companies.csv \
    --target enriched.csv \
    --source-columns '[{"name": "company", "description": "Company name"}]' \
    --intent "Find the CEO and annual revenue"

# Run enrichment from config
parallel-cli enrich run config.yaml

# Run enrichment directly (no config file needed)
parallel-cli enrich run \
    --source-type csv \
    --source companies.csv \
    --target enriched.csv \
    --source-columns '[{"name": "company", "description": "Company name"}]' \
    --intent "Find the CEO and annual revenue"

# Enrich a JSON file
parallel-cli enrich run \
    --source-type json \
    --source companies.json \
    --target enriched.json \
    --source-columns '[{"name": "company", "description": "Company name"}]' \
    --enriched-columns '[{"name": "ceo", "description": "CEO name"}]'
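
The --source-columns and --enriched-columns values are JSON strings, which are easy to mis-quote by hand. The sketch below builds them with json.dumps and assembles an argument list for enrich run. It assumes only the flags shown above. The helper names column_spec and build_enrich_run_args are hypothetical and not part of the package.

```python
import json
from typing import Dict, List, Optional


def column_spec(name: str, description: str) -> Dict[str, str]:
    """One column entry in the shape used by the examples above."""
    return {"name": name, "description": description}


def build_enrich_run_args(
    source_type: str,
    source: str,
    target: str,
    source_columns: List[Dict[str, str]],
    *,
    intent: Optional[str] = None,
    enriched_columns: Optional[List[Dict[str, str]]] = None,
) -> List[str]:
    """Build an argv list for `parallel-cli enrich run` (flags as shown above)."""
    args = [
        "parallel-cli", "enrich", "run",
        "--source-type", source_type,
        "--source", source,
        "--target", target,
        # json.dumps handles quoting and escaping of the column descriptions.
        "--source-columns", json.dumps(source_columns),
    ]
    if enriched_columns is not None:
        args += ["--enriched-columns", json.dumps(enriched_columns)]
    if intent is not None:
        args += ["--intent", intent]
    return args


if __name__ == "__main__":
    argv = build_enrich_run_args(
        "csv",
        "companies.csv",
        "enriched.csv",
        [column_spec("company", "Company name")],
        intent="Find the CEO and annual revenue",
    )
    print(argv)
```

Passing the list to subprocess.run without a shell keeps the JSON intact, including descriptions that contain quotes.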

5. Deploy to Cloud Systems

# Deploy to BigQuery for SQL-native enrichment
parallel-cli enrich deploy --system bigquery --project my-gcp-project

Non-Interactive Mode (for AI Agents & Scripts)

All commands support --json output and can be fully controlled via CLI arguments.

Key patterns for agents

# Every command supports --json for structured output
parallel-cli search "query" --json
parallel-cli auth --json
parallel-cli research processors --json

# Read input from stdin with "-"
echo "What is the latest funding for Anthropic?" | parallel-cli search - --json
echo "Research question" | parallel-cli research run - --json

# Async: launch then poll separately
parallel-cli research run "question" --no-wait --json   # returns run_id
parallel-cli research status trun_xxx --json             # check status
parallel-cli research poll trun_xxx --json               # wait and get result

# Exit codes: 0=ok, 2=bad input, 3=auth error, 4=api error, 5=timeout
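
A script that wraps the CLI can branch on these exit codes instead of parsing error text. The following is a minimal sketch, assuming the exit code list above is complete. The names describe_exit_code, CliResult, and run_cli are hypothetical helpers, and the demo uses a stand-in child process so it runs without parallel-cli installed.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional

# Exit codes as listed above; any other value is reported as "unknown".
EXIT_CODE_MEANINGS = {
    0: "ok",
    2: "bad input",
    3: "auth error",
    4: "api error",
    5: "timeout",
}


def describe_exit_code(code: int) -> str:
    """Return the documented label for an exit code, or an 'unknown' marker."""
    return EXIT_CODE_MEANINGS.get(code, f"unknown ({code})")


@dataclass
class CliResult:
    returncode: int
    meaning: str
    stdout: str
    stderr: str


def run_cli(args: List[str], stdin_text: Optional[str] = None) -> CliResult:
    """Run a command, optionally feeding stdin, and label its exit code."""
    proc = subprocess.run(args, input=stdin_text, capture_output=True, text=True)
    return CliResult(
        proc.returncode,
        describe_exit_code(proc.returncode),
        proc.stdout,
        proc.stderr,
    )


if __name__ == "__main__":
    # Stand-in for a failing CLI call: a child Python process exiting with 3.
    fake = [sys.executable, "-c", "import sys; sys.exit(3)"]
    result = run_cli(fake)
    print(result.returncode, result.meaning)
```

With the real tool, run_cli(["parallel-cli", "search", "-", "--json"], stdin_text="...") would correspond to the stdin pattern above.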

More examples

# Search with JSON output
parallel-cli search "query" --json

# Extract with JSON output
parallel-cli extract https://url.com --json

# Suggest columns with JSON output
parallel-cli enrich suggest "Find CEO" --json

# FindAll: discover entities
parallel-cli findall run "AI startups in healthcare" --json

# Monitor: track web changes
parallel-cli monitor create "Track Tesla SEC filings" --cadence daily --json

# Plan without prompts (provide all args)
parallel-cli enrich plan -o config.yaml \
    --source-type csv \
    --source input.csv \
    --target output.csv \
    --source-columns '[{"name": "company", "description": "Company name"}]' \
    --enriched-columns '[{"name": "ceo", "description": "CEO name"}]'

# Or use --intent to let AI determine the columns
parallel-cli enrich plan -o config.yaml \
    --source-type csv \
    --source input.csv \
    --target output.csv \
    --source-columns '[{"name": "company", "description": "Company name"}]' \
    --intent "Find CEO, revenue, and headquarters"

Integrations

Integration | Type             | Install                                   | Documentation
Polars      | Python DataFrame | pip install parallel-web-tools[polars]    | Setup Guide
DuckDB      | SQL + Python     | pip install parallel-web-tools[duckdb]    | Setup Guide
Snowflake   | SQL UDF          | pip install parallel-web-tools[snowflake] | Setup Guide
BigQuery    | Cloud Function   | pip install parallel-web-tools[bigquery]  | Setup Guide
Spark       | SQL UDF          | pip install parallel-web-tools[spark]     | Demo Notebook

Quick Integration Examples

Polars:

import polars as pl
from parallel_web_tools.integrations.polars import parallel_enrich

df = pl.DataFrame({"company": ["Google", "Microsoft"]})
result = parallel_enrich(
    df,
    input_columns={"company_name": "company"},
    output_columns=["CEO name", "Founding year"],
)
print(result.result)

DuckDB:

import duckdb
from parallel_web_tools.integrations.duckdb import enrich_table

conn = duckdb.connect()
conn.execute("CREATE TABLE companies AS SELECT 'Google' as name")
result = enrich_table(
    conn,
    source_table="companies",
    input_columns={"company_name": "name"},
    output_columns=["CEO name", "Founding year"],
)
print(result.result.fetchdf())

Programmatic Usage

from parallel_web_tools import run_enrichment, run_enrichment_from_dict

# From YAML file
run_enrichment("config.yaml")

# From dictionary
run_enrichment_from_dict({
    "source": "data.csv",
    "target": "enriched.csv",
    "source_type": "csv",
    "source_columns": [{"name": "company", "description": "Company name"}],
    "enriched_columns": [{"name": "ceo", "description": "CEO name"}]
})
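
Checking a configuration dictionary before submitting it can surface typos early. The sketch below is a hypothetical pre-flight check, not the library's own validation. It assumes the four top-level keys used in the example above are required, that source_type is one of the values listed under YAML Configuration Format, and that each column entry carries a name and a description.

```python
from typing import Any, Dict, List

# Assumptions drawn from the examples in this document, not from the library.
REQUIRED_KEYS = ("source", "target", "source_type", "source_columns")
SOURCE_TYPES = {"csv", "json", "duckdb", "bigquery"}
COLUMN_FIELDS = ("name", "description")


def find_config_problems(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in config:
            problems.append(f"missing key: {key}")
    source_type = config.get("source_type")
    if source_type is not None and source_type not in SOURCE_TYPES:
        problems.append(f"unknown source_type: {source_type}")
    for section in ("source_columns", "enriched_columns"):
        for index, column in enumerate(config.get(section, [])):
            for field in COLUMN_FIELDS:
                if field not in column:
                    problems.append(f"{section}[{index}] missing {field}")
    return problems


if __name__ == "__main__":
    config = {
        "source": "data.csv",
        "target": "enriched.csv",
        "source_type": "csv",
        "source_columns": [{"name": "company", "description": "Company name"}],
        "enriched_columns": [{"name": "ceo", "description": "CEO name"}],
    }
    print(find_config_problems(config))
```

If the returned list is empty, the same dictionary could then be passed to run_enrichment_from_dict.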

YAML Configuration Format

source: input.csv
target: output.csv
source_type: csv  # csv, json, duckdb, or bigquery
processor: core-fast  # lite, base, core, pro, ultra (add -fast for speed)

source_columns:
  - name: company_name
    description: The name of the company

enriched_columns:
  - name: ceo
    description: The CEO of the company
    type: str  # str, int, float, bool
  - name: revenue
    description: Annual revenue in USD
    type: float
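
The processor comment above describes a naming pattern: a base tier, optionally followed by -fast. A small sketch of that pattern in Python, assuming the five tiers in the comment are the complete set. The helper split_processor is hypothetical and not part of the package.

```python
from typing import Tuple

# Tiers and suffix taken from the comment in the YAML example above.
BASE_PROCESSORS = ("lite", "base", "core", "pro", "ultra")
FAST_SUFFIX = "-fast"


def split_processor(name: str) -> Tuple[str, bool]:
    """Split a processor name into (base_tier, is_fast), or raise ValueError."""
    is_fast = name.endswith(FAST_SUFFIX)
    base = name[: -len(FAST_SUFFIX)] if is_fast else name
    if base not in BASE_PROCESSORS:
        raise ValueError(f"unknown processor: {name!r}")
    return base, is_fast


if __name__ == "__main__":
    for candidate in ("core-fast", "ultra", "mega"):
        try:
            print(candidate, split_processor(candidate))
        except ValueError as exc:
            print(candidate, "rejected:", exc)
```

The authoritative list is whatever parallel-cli research processors reports, so a check like this is only a guard against typos in a config file.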

Environment Variables

Variable         | Description
PARALLEL_API_KEY | API key for authentication (alternative to parallel-cli login)
DUCKDB_FILE      | Default DuckDB file path
BIGQUERY_PROJECT | Default BigQuery project ID
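
A wrapper script may want to read these variables in one place. The sketch below collects them into a dictionary using only the three variable names listed above. The function name load_settings and the dictionary keys are assumptions for illustration, and the package may resolve these variables differently.

```python
import os
from typing import Dict, Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, Optional[str]]:
    """Collect the documented variables; unset ones come back as None."""
    if env is None:
        env = os.environ
    return {
        "api_key": env.get("PARALLEL_API_KEY"),
        "duckdb_file": env.get("DUCKDB_FILE"),
        "bigquery_project": env.get("BIGQUERY_PROJECT"),
    }


if __name__ == "__main__":
    settings = load_settings()
    # Report only whether the key is set, so the secret is never printed.
    print("API key set:", settings["api_key"] is not None)
    print("DuckDB file:", settings["duckdb_file"])
    print("BigQuery project:", settings["bigquery_project"])
```

Accepting a mapping argument keeps the function easy to test with a plain dictionary instead of the real environment.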

Related Packages

  • parallel-web - Official Parallel Python SDK (this package depends on it)

Development

git clone https://github.com/parallel-web/parallel-web-tools.git
cd parallel-web-tools
uv sync --all-extras
uv run pytest tests/ -v

License

MIT

Download files

Source Distribution

parallel_web_tools-0.1.0rc1.tar.gz (74.6 kB)

Uploaded Source

Built Distribution

parallel_web_tools-0.1.0rc1-py3-none-any.whl (96.4 kB)

Uploaded Python 3

File details

Details for the file parallel_web_tools-0.1.0rc1.tar.gz.

File metadata

  • Download URL: parallel_web_tools-0.1.0rc1.tar.gz
  • Upload date:
  • Size: 74.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for parallel_web_tools-0.1.0rc1.tar.gz
Algorithm   | Hash digest
SHA256      | 6f60cbad4e307baf4ec49dfee94f5c8b4ded6aafd7b5b37d12fb78b3d2433411
MD5         | daba8e151aa53a7acc7e593336424728
BLAKE2b-256 | d8fb12a346bc5143e3436d2ac23b6273497ce4b9c475bc9c5f5b207e3598e909
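
A downloaded file can be checked against a published SHA256 digest with the Python standard library. The sketch below is a generic example, and the helper names sha256_of_file and matches_sha256 are hypothetical. The demo hashes a small temporary file rather than the real archive.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with an expected hex string, ignoring case."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())


if __name__ == "__main__":
    # Demo on a throwaway file; replace with the path of the downloaded archive.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"abc")
    try:
        print(sha256_of_file(tmp.name))
    finally:
        os.remove(tmp.name)
```

For the source distribution, the expected value passed to matches_sha256 would be the SHA256 digest listed above.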

Provenance

The following attestation bundles were made for parallel_web_tools-0.1.0rc1.tar.gz:

Publisher: publish.yml on parallel-web/parallel-web-tools

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file parallel_web_tools-0.1.0rc1-py3-none-any.whl.

File hashes

Hashes for parallel_web_tools-0.1.0rc1-py3-none-any.whl
Algorithm   | Hash digest
SHA256      | a198a39db182838f6a04e3568d57fa100b35323954d03c4c03b89946420bad68
MD5         | 89819a301ca31119f46c053f68debf48
BLAKE2b-256 | 3ccfd28884b0b1bed621d27724d0f85d705b057838ef8126ba915ed1f96288e8

Provenance

The following attestation bundles were made for parallel_web_tools-0.1.0rc1-py3-none-any.whl:

Publisher: publish.yml on parallel-web/parallel-web-tools

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
