AI-powered CLI for WordLift knowledge graph and SEO workflows.

worai

Command-line toolkit for WordLift operations and SEO checks. Pronunciation: "waw-RYE"

Docs: https://docs.wordlift.io/worai/

Install

  • One-line installer (macOS/Linux):
    • curl -fsSL https://raw.githubusercontent.com/wordlift/worai-install/main/install-worai.sh | bash
  • One-line installer (Windows PowerShell):
    • irm https://raw.githubusercontent.com/wordlift/worai-install/main/install-worai.ps1 | iex

After install:

  • The script automatically persists pipx/worai PATH entries in your shell profile when needed.
  • For immediate use in the current shell:
    • zsh: source ~/.zshrc
    • bash: source ~/.bashrc
  • Then run: worai --help

Verify the script before running it (recommended in stricter environments):

  • macOS/Linux:
    • curl -fsSL -o /tmp/install-worai.sh https://raw.githubusercontent.com/wordlift/worai-install/main/install-worai.sh && less /tmp/install-worai.sh && bash /tmp/install-worai.sh
  • Windows PowerShell:
    • irm https://raw.githubusercontent.com/wordlift/worai-install/main/install-worai.ps1 -OutFile $env:TEMP\install-worai.ps1; notepad $env:TEMP\install-worai.ps1; & $env:TEMP\install-worai.ps1

Manual install:

  • pipx install worai
  • pip install worai

Full docs: https://docs.wordlift.io/worai/

Runtime dependency note:

  • wordlift-sdk>=8.0.2,<9.0.0 (installed automatically by pip)
  • copier (required by worai graph sync create, installed automatically by pip)

If you plan to run seocheck, install Playwright browsers:

  • playwright install chromium

Optional output format extras:

  • pip install 'worai[parquet]' — enables --format parquet (Apache Parquet via pyarrow)
  • pip install 'worai[gsheets]' — enables --format gsheets (write to Google Sheets via gspread)

Quick Start

  • worai --help
  • worai seocheck https://example.com/sitemap.xml
  • worai google-search-console --site sc-domain:example.com --client-secrets ./client_secrets.json
  • worai canonicals dedupe --input pages_with_titles.csv --site sc-domain:example.com --service-account ./service-account.json
  • worai <command> --help

Configuration

Config file (TOML) discovery order:

  • --config
  • WORAI_CONFIG
  • nearest worai.toml from current directory upward (for example ./worai.toml, ../worai.toml, ../../worai.toml)
  • ~/.config/worai/config.toml
  • ~/.worai.toml
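
The discovery order above can be sketched in Python. This is an illustrative reading of the list, not worai's actual implementation: the function name find_config and its parameters are hypothetical, and the choice to return explicit paths without checking that they exist is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_config(cli_config: Optional[str] = None,
                env: Optional[Mapping[str, str]] = None,
                cwd: Optional[Path] = None,
                home: Optional[Path] = None) -> Optional[Path]:
    """Return the first config path matching the documented discovery order."""
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)
    home = Path.home() if home is None else Path(home)

    # 1. --config and 2. WORAI_CONFIG: explicit paths win.
    # Assumption: they are returned as given, without an existence check.
    if cli_config:
        return Path(cli_config).expanduser()
    if env.get("WORAI_CONFIG"):
        return Path(env["WORAI_CONFIG"]).expanduser()

    # 3. Nearest worai.toml, walking from the current directory upward.
    for directory in (cwd, *cwd.parents):
        candidate = directory / "worai.toml"
        if candidate.is_file():
            return candidate

    # 4. ~/.config/worai/config.toml, then 5. ~/.worai.toml.
    for candidate in (home / ".config" / "worai" / "config.toml",
                      home / ".worai.toml"):
        if candidate.is_file():
            return candidate
    return None
```

Calling find_config() with no arguments uses the real environment, working directory, and home directory; passing them explicitly makes the lookup easy to exercise in isolation.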

Profiles:

  • [profiles.<name>] with --profile or WORAI_PROFILE

Common keys:

  • profiles.<name>.api_key
  • log_level (global default logging level: debug|info|warning|error)
  • profiles.<name>.log_level (profile-specific override for root logging level)
  • profiles._base.log_level (shared profile fallback when selected profile has no log_level)
  • profiles.<name>.mapping (SDK profile contract)
  • profiles.<name>.gsc_site_id (GSC property for commands that query Search Console)
  • profiles.<name>.oauth.client_secrets (OAuth Desktop app client file)
  • profiles.<name>.oauth.token (shared OAuth token file path)
  • profiles.<name>.oauth.service_account (service account credential as file path or inline JSON)
  • profiles.<name>.ga_property_id (preferred GA4 property key for analytics; ga.id remains supported)
  • profiles.<name>.canonicals.output (supports {profile}, {date}, {seq} interpolation)
  • profiles.<name>.canonicals.interval
  • profiles.<name>.canonicals.concurrency
  • profiles.<name>.canonicals.request_timeout_sec
  • exactly one source per profile (urls, sitemap_url, or sheets_url + sheets_name + sheets_service_account); the SDK requires this for a profile to be valid
  • postprocessor_runtime (graph sync runtime: oneshot or persistent; profile override supported)
  • graph_write_strategy (graph sync write mode: patch or put; defaults to patch; profile override supported)
  • ingest.source (auto|urls|sitemap|sheets|local)
  • ingest.loader (auto|simple|proxy|playwright|premium_scraper|web_scrape_api|passthrough)
  • ingest.passthrough_when_html (default: true)
  • ingest.timeout_ms (optional override; SDK default: 30000)
  • ingest.playwright_wait_until (optional override; SDK default: domcontentloaded)
  • command-specific OAuth/GSC/GA options should be passed via CLI flags or environment variables.

Supported environment variables:

  • WORAI_CONFIG — path to a config TOML file (overrides discovery order).
  • WORAI_PROFILE — profile name under [profiles.<name>].
  • WORAI_LOG_LEVEL — default log level (debug|info|warning|error).
  • WORAI_LOG_FORMAT — default log format (text|json).
  • WORDLIFT_API_KEY — WordLift API key for entity operations.
  • GSC_CLIENT_SECRETS — path to OAuth client secrets JSON for GSC.
  • GSC_ID — GSC property URL.
  • OAUTH_TOKEN — path to store the shared OAuth token (GSC + GA).
  • GSC_OUTPUT — default output CSV path for GSC export.
  • GA_ID — GA4 property ID for Analytics sections.
  • GSC_TOKEN / GA_TOKEN — legacy aliases for OAUTH_TOKEN (must point to the same file if used).
  • WORAI_DISABLE_UPDATE_CHECK — set to 1|true|yes|on to disable startup update checks.
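
As a rough illustration of how a flag such as WORAI_DISABLE_UPDATE_CHECK might be interpreted, the sketch below accepts the four documented values. The helper name update_check_disabled is hypothetical, and ignoring case and surrounding whitespace is an assumption rather than documented behavior.

```python
import os
from typing import Mapping, Optional

# The four values listed for WORAI_DISABLE_UPDATE_CHECK.
TRUTHY = {"1", "true", "yes", "on"}


def update_check_disabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the update check is switched off via the environment."""
    env = os.environ if env is None else env
    value = env.get("WORAI_DISABLE_UPDATE_CHECK", "")
    # Assumption: matching is case-insensitive and tolerant of whitespace.
    return value.strip().lower() in TRUTHY
```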

.env support:

  • worai loads .env from the current working directory (and parent lookup) at startup.
  • values from .env are treated as environment variables.
  • existing environment variables take precedence over .env values.
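
The precedence rule above (real environment variables beat .env values) can be sketched as a simple merge. This is a simplified illustration, not worai's loader: parse_dotenv and merge_env are hypothetical names, and the parser only handles plain KEY=value lines.

```python
from typing import Dict, Mapping


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse plain KEY=value lines; comments and blank lines are skipped."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def merge_env(dotenv_values: Mapping[str, str],
              environ: Mapping[str, str]) -> Dict[str, str]:
    """Start from .env values, then let existing environment variables win."""
    merged = dict(dotenv_values)
    merged.update(environ)
    return merged
```

For example, if .env sets WORAI_PROFILE=dev and the shell already exports WORAI_PROFILE=prod, the merged result keeps prod.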

Logging level precedence:

  • --log-level (highest)
  • WORAI_LOG_LEVEL
  • profiles.<name>.log_level in worai.toml (when a profile is selected)
  • profiles._base.log_level in worai.toml (when a profile is selected and no profile-specific value is set)
  • global log_level in worai.toml
  • info (default)
  • The selected level configures the root logger; dependencies may still emit their own INFO logs when they set explicit logger levels.
  • For graph sync run, the effective level is forwarded to morph-kgc as logging_level (DEBUG|INFO|WARNING|ERROR|CRITICAL).
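
The precedence chain above can be read as "first non-empty value wins". A minimal sketch, assuming the config has already been parsed into a dictionary; the function name effective_log_level and its parameters are hypothetical and do not come from worai's code.

```python
from typing import Any, Mapping, Optional


def effective_log_level(cli_level: Optional[str],
                        env: Mapping[str, str],
                        config: Mapping[str, Any],
                        profile: Optional[str]) -> str:
    """Resolve the log level using the documented precedence order."""
    profiles = config.get("profiles", {})
    candidates = [cli_level, env.get("WORAI_LOG_LEVEL")]
    if profile:
        # Profile-specific value first, then the shared _base fallback.
        candidates.append(profiles.get(profile, {}).get("log_level"))
        candidates.append(profiles.get("_base", {}).get("log_level"))
    candidates.append(config.get("log_level"))  # global value in worai.toml
    for level in candidates:
        if level:
            return level.lower()
    return "info"  # documented default
```

With a profile that has no log_level of its own, the _base value is used before the global log_level, matching the order in the list.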

Example environment setup:

export WORDLIFT_API_KEY="wl_..."
export WORAI_CONFIG="$HOME/worai.toml"
export WORAI_PROFILE="dev"
export GSC_CLIENT_SECRETS="$HOME/client_secrets.json"
export OAUTH_TOKEN="$HOME/oauth_token.json"

Example worai.toml:

[profiles.default]
api_key = "${WORDLIFT_API_KEY}"
mapping = "default.yarrrml"
sitemap_url = "https://example.com/sitemap.xml"
ingest_loader = "web_scrape_api"

Ingestion profile examples:

[profiles.inventory_local]
api_key = "${WORDLIFT_API_KEY}"
mapping = "default.yarrrml"
urls = ["https://example.com/page"]
ingest_source = "local"
ingest_loader = "passthrough"

[profiles.inventory_remote]
api_key = "${WORDLIFT_API_KEY}"
mapping = "default.yarrrml"
sitemap_url = "https://example.com/sitemap.xml"
ingest_source = "sitemap"
ingest_loader = "web_scrape_api"

[profiles.graph_sync_proxy]
api_key = "${WORDLIFT_API_KEY}"
mapping = "default.yarrrml"
urls = ["https://example.com/a", "https://example.com/b"]
ingest_source = "urls"
ingest_loader = "proxy"
ingest_timeout_ms = 30000
playwright_wait_until = "domcontentloaded"

Commands

Full docs: https://docs.wordlift.io/worai/

  • seocheck — run SEO checks for sitemap URLs and URL lists.
  • google-search-console — export GSC page metrics (7d/28d/3m) in table, JSON, CSV, TSV, Parquet, or Google Sheets format.
  • canonicals dedupe — dedupe canonical URLs by title using GSC impressions.
  • dedupe — deduplicate WordLift entities by schema:url.
  • entity-matrix — build a URL × entity-type pivot table from a graph file.
  • canonicalize-duplicate-pages — select canonical URLs using GSC KPIs.
  • delete-entities-from-csv — delete entities listed in a CSV.
  • find-faq-page-wrong-type — find and patch FAQPage typing issues.
  • find-missing-names — find entities missing schema:name/headline.
  • find-url-by-type — list schema:url values by type from RDF.
  • graph — run graph-specific workflows (sync, create, export, validate, property delete, audit, reset).
  • list-entities-outside-dataset — list entity IRIs that fall outside the account dataset.
  • link-groups — build or apply LinkGroup data from CSV.
  • patch — patch entities from RDF.
  • structured-data — generate JSON-LD/YARRRML mappings or materialize RDF from YARRRML.
  • agent — launch codex/claude/gemini with worai MCP + skill guidance.
  • web-pages — run ingestion-backed web page workflows.
  • validate — deprecated JSON-LD validator command (use graph validate for RDF files/URLs; use structured-data validate page for webpage URLs).
  • self update — check for new worai versions and optionally run the upgrade command.
  • upload-entities-from-turtle — upload .ttl files with resume.
  • dil-import - upload DILs from a CSV file.

Command help:

  • worai <command> --help

Autocompletion:

  • worai --install-completion
  • worai --show-completion

Updates:

  • worai checks for new versions periodically and prints a non-blocking notice when an update is available.
  • run worai self update to check manually and see/apply the suggested upgrade command.

Examples

seocheck

  • worai seocheck https://example.com/sitemap.xml
  • worai seocheck https://example.com/sitemap.xml --output-dir ./seocheck-report --save-html
  • worai seocheck https://example.com/sitemap.xml --output-dir ./seocheck-report --no-open-report
  • worai seocheck https://example.com/sitemap.xml --user-agent "Mozilla/5.0 ..."
  • worai seocheck https://example.com/sitemap.xml --sitemap-fetch-mode browser
  • worai seocheck https://example.com/sitemap.xml --no-report-ui
  • worai seocheck https://example.com/sitemap.xml --recheck-failed --recheck-from ./seocheck-report

google-search-console

  • worai google-search-console --site sc-domain:example.com --client-secrets ./client_secrets.json
    • Writes to gsc_pages.csv by default (CSV format).
  • worai google-search-console --site sc-domain:example.com --client-secrets ./client_secrets.json --format table
    • Prints a rich table to stdout.
  • worai google-search-console --site sc-domain:example.com --client-secrets ./client_secrets.json --format json
    • Prints a JSON array to stdout.
  • worai google-search-console --site sc-domain:example.com --client-secrets ./client_secrets.json --format tsv --output gsc.tsv
  • worai google-search-console --site sc-domain:example.com --client-secrets ./client_secrets.json --format parquet --output gsc.parquet
    • Requires pip install 'worai[parquet]'.
  • worai google-search-console --site sc-domain:example.com --client-secrets ./client_secrets.json --output custom.csv
    • Uses OAuth redirect port 8080 by default.

canonicals dedupe

  • worai canonicals dedupe --input pages_with_titles.csv --site sc-domain:example.com --service-account ./service-account.json
  • worai canonicals dedupe --input pages_with_titles.csv --site sc-domain:example.com --token oauth_token.json

seoreport (with Analytics)

  • worai seoreport --site sc-domain:example.com --ga-id 123456789 --format html

canonicalize-duplicate-pages

  • worai canonicalize-duplicate-pages --input gsc_pages.csv --output canonical_targets.csv --kpi-window 28d --kpi-metric clicks
  • worai canonicalize-duplicate-pages --input gsc_pages.csv --entity-type Product

dedupe

  • worai dedupe --dry-run

entity-matrix

  • worai entity-matrix graph.ttl
  • worai entity-matrix graph.ttl --exclude-type WebPage --exclude-type BreadcrumbList --format csv --output matrix.csv
  • worai entity-matrix graph.ttl --cluster --format tsv

find-faq-page-wrong-type

  • worai find-faq-page-wrong-type ./data.ttl --dry-run --replace-type
  • worai find-faq-page-wrong-type ./data.ttl --patch --replace-type

find-missing-names

  • worai find-missing-names ./data.ttl

find-url-by-type

  • worai find-url-by-type ./data.ttl schema:Service schema:Product

link-groups

  • worai link-groups ./links.csv --format turtle
  • worai link-groups ./links.csv --apply --dry-run --concurrency 4

graph

  • worai --config ./worai.toml --profile acme graph sync run
  • worai --profile acme graph sync run --debug
  • worai graph sync create ./acme-graph
  • worai graph sync create ./acme-graph --template ./graph-sync-template --defaults
  • worai graph sync create ./acme-graph --data-file ./answers.yml --non-interactive
  • worai graph sync create ./acme-graph --vcs-ref v1.2.3
  • worai graph export
  • worai --profile acme graph export
  • worai --profile acme graph export ./acme-export.jsonld
  • worai graph export ./acme-export.ttl --validate
  • worai graph validate ./graph.ttl ./graph.jsonld --builtin-shape google-required --level warning --format text
  • worai graph property delete seovoc:html --dry-run
  • worai graph property delete https://w3id.org/seovoc/html --yes --workers 4
  • worai graph audit ./acme-export.ttl
  • worai graph audit ./acme-export.ttl --format json --show-url-violations
  • worai graph audit ./acme-export.ttl --rich-snippets-granularity entities --issue-level error
  • worai --profile acme graph reset --yes
  • worai --profile acme graph reset --keep-country --keep-language
    • graph export reads API key from worai.toml profile (root --profile, then WORAI_PROFILE, then default) and calls /dataset/export.
    • graph export output format is inferred from extension: .ttl, .nt, .nq, .rdf/.xml, .jsonld/.json.
    • graph export default filename: export_<profile>_<yyyyMMdd>_<seq>.ttl (sequence starts at 1).
    • graph export --validate runs SHACL validation on the exported file and fails on SHACL errors/warnings.
    • graph validate accepts one or more local files or URLs and supports shape composition with:
      • --builtin-shape <name>
      • --exclude-builtin-shape <name>
      • --shape <file-or-url>
    • graph validate --level warning|error controls failure threshold; --format text|json controls output.
    • graph property delete sends X-include-Private: true by default for both GraphQL match discovery and entity PATCH requests.
    • graph sync create runs Copier in trusted mode by default so template _tasks execute.
    • graph sync run profile resolution is: root --profile, then WORAI_PROFILE, then default.
    • Mapping docs (for [profiles.<name>]): docs/graph-sync-mappings-reference.md, docs/graph-sync-mappings-guide.md, docs/graph-sync-mappings-examples.md
    • Internal template-agent workflow docs: specs/graph-sync/AGENTS.md, specs/graph-sync/INDEX.md, specs/graph-sync/developer-agent-workflow.md
    • Profile loading standard for non-sync commands: specs/profile-loading-standard.md
    • Configure exactly one source mode per run: urls, sitemap_url (+ optional pattern), or sheets_url + sheets_name.
    • With wordlift-sdk>=6.10.0, Playwright-backed ingestion defaults are owned by the SDK: INGEST_TIMEOUT_MS=30000 and PLAYWRIGHT_WAIT_UNTIL="domcontentloaded".
    • web_page_import_timeout remains supported for graph sync as a legacy seconds-based alias.
    • SDK 6 defaults to persistent postprocessor runtime.
    • set postprocessor_runtime = "oneshot" in worai.toml to keep old one-process-per-callback behavior.
    • wordlift-sdk 5.1.1+ postprocessor context migration:
      • context.settings -> context.profile (for example context.profile["settings"]["api_url"])
      • context.account.key -> context.account_key
      • context.account remains the clean /me account object
    • SDK 6 ingestion uses explicit keys:
      • INGEST_SOURCE (urls|sitemap|sheets|local|auto)
      • INGEST_LOADER (web_scrape_api|proxy|premium_scraper|playwright|simple|passthrough|auto)
      • INGEST_TIMEOUT_MS (milliseconds)
      • PLAYWRIGHT_WAIT_UNTIL (domcontentloaded|load|networkidle) when explicitly configured
    • SDK 6 migration deprecates integration use of WEB_PAGE_IMPORT_MODE and WEB_PAGE_IMPORT_TIMEOUT.
    • graph sync run uses run_cloud_workflow and emits per-graph progress and final KPI summaries through CLI logs (on_info, on_progress, on_kpi).
    • graph sync run --debug writes SDK callback artifacts under output/debug_cloud/<profile>/ from the current working directory:
      • static_templates.ttl
      • cloud_<sha256(url)>.ttl for each callback URL.
    • SHACL validation settings mapping for SDK 6.2+:
      • use shacl_validate_mode = "warn"|"fail"|"off"
      • use shacl_builtin_shapes, shacl_exclude_builtin_shapes, shacl_extra_shapes
      • shacl_validate_sync and shacl_shape_specs are no longer supported

patch

  • worai patch ./data.ttl --dry-run --add-types

structured-data

  • worai structured-data create https://example.com/article Review --output-dir ./structured-data
  • worai structured-data create https://example.com/article --type Review --output-dir ./structured-data
  • worai structured-data create https://example.com/article --type Review --debug
  • worai structured-data create https://example.com/article --type Review --max-xhtml-chars 40000 --max-nesting-depth 2
  • worai structured-data generate https://example.com/sitemap.xml --yarrrml ./mapping.yarrrml --output-dir ./out
  • worai structured-data generate https://example.com/page --yarrrml ./mapping.yarrrml --format jsonld
  • worai structured-data inventory https://example.com/sitemap.xml --output ./structured-data-inventory.csv
  • worai structured-data inventory ./urls.txt --output ./structured-data-inventory.csv
  • worai structured-data inventory https://docs.google.com/spreadsheets/d/<id>/edit --sheet-name URLs_US --output ./structured-data-inventory.csv
  • worai structured-data inventory https://example.com/sitemap.xml --destination-sheet-id <spreadsheet_id> --destination-sheet-name Inventory
  • worai structured-data inventory https://example.com/sitemap.xml --output ./structured-data-inventory.csv --concurrency auto
  • worai structured-data inventory https://example.com/sitemap.xml --url-regex "/blog/" --output ./structured-data-inventory.csv
  • worai structured-data inventory /path/to/debug_cloud/us --source-type debug-cloud --output ./structured-data-inventory.csv
  • worai structured-data inventory /path/to/debug_cloud/us --ingest-source local --ingest-loader passthrough --output ./structured-data-inventory.csv
  • worai structured-data inventory https://example.com/sitemap.xml --ingest-loader web_scrape_api --output ./structured-data-inventory.csv

agent

  • worai agent --agent-cli codex
  • worai agent --agent-cli codex -- --yolo --search
  • worai agent --agent-cli claude --profile acme
  • worai agent --agent-cli gemini --config ./worai.toml --profile acme
  • worai agent mcp serve --profile acme

web-pages

  • worai web-pages classify-types https://example.com/sitemap.xml --ingest-source sitemap --ingest-loader playwright --url-regex "/blog/" --output ./types.csv
  • worai web-pages classify-types ./urls.txt --ingest-source urls --output ./types.csv
  • worai web-pages classify-types https://docs.google.com/spreadsheets/d/<id>/edit --ingest-source sheets --sheet-name URLs --service-account ./service-account.json --output ./types.csv
  • worai web-pages classify-types https://example.com/sitemap.xml --ingest-source sitemap --output ./types.csv --yes (skip credit-consumption confirmation)
  • worai web-pages classify-types https://example.com/sitemap.xml --ingest-source sitemap --output ./types.csv --no-resume --yes (force a full rerun instead of resuming)

validate

  • worai graph validate ./data.jsonld --builtin-shape review-snippet --shape ./custom.ttl --level warning --format json
  • worai validate jsonld --shape review-snippet --shape schema-review ./data.jsonld
  • worai validate jsonld --format raw https://api.wordlift.io/data/example.jsonld
  • worai structured-data validate page https://example.com/article --shape review-snippet

self update

  • worai self update --check-only
  • worai self update --yes

upload-entities-from-turtle

  • worai upload-entities-from-turtle ./entities --recursive --limit 50

list-entities-outside-dataset

  • worai list-entities-outside-dataset
  • worai --profile acme list-entities-outside-dataset
  • worai --profile acme list-entities-outside-dataset --limit 100
  • worai --profile acme list-entities-outside-dataset --dataset-uri https://data.example.com/

dil-import

  • worai dil-import <wordlift_key> <path_to_csv_file>

Troubleshooting

  • Playwright missing browsers:
    • playwright install chromium
  • YARRRML conversion:
    • npm install -g @rmlio/yarrrml-parser
  • RML execution:
    • morph-kgc is included in project dependencies
  • Dependency notes:
    • Common runtime libs (e.g., requests, rdflib, tqdm, advertools, Google auth helpers) are provided transitively by wordlift-sdk.
    • --format parquet requires pip install 'worai[parquet]' (pyarrow>=14).
    • --format gsheets requires pip install 'worai[gsheets]' (gspread>=6).
  • OAuth token issues:
    • Remove the token file and re-run worai google-search-console or worai canonicals dedupe.
    • If you are prompted to re-authenticate on every run, delete the token file to force a new consent flow that includes a refresh token.
