AI-powered CLI for WordLift knowledge graph and SEO workflows.

worai

Command-line toolkit for WordLift operations and SEO checks. Pronunciation: "waw-RYE"

Docs: https://docs.wordlift.io/worai/

Install

  • pipx install worai
  • pip install worai

Runtime dependency note:

  • wordlift-sdk>=6.10.0,<7.0.0 (installed automatically by pip)
  • copier (required by worai graph sync create, installed automatically by pip)

If you plan to run seocheck, install Playwright browsers:

  • playwright install chromium

Quick Start

  • worai --help
  • worai seocheck https://example.com/sitemap.xml
  • worai google-search-console --site sc-domain:example.com --client-secrets ./client_secrets.json
  • worai canonicals dedupe --input pages_with_titles.csv --site sc-domain:example.com --service-account ./service-account.json
  • worai <command> --help

Configuration

Config file (TOML) discovery order:

  • --config
  • WORAI_CONFIG
  • nearest worai.toml from current directory upward (for example ./worai.toml, ../worai.toml, ../../worai.toml)
  • ~/.config/worai/config.toml
  • ~/.worai.toml
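A minimal Python sketch of the discovery order above can help when debugging which file gets picked up. The function name `discover_config` and its parameters are hypothetical and this is not worai's implementation; it also assumes an explicit --config or WORAI_CONFIG path is used as-is, without checking that the file exists.

```python
from pathlib import Path
from typing import Mapping, Optional


def discover_config(
    cli_config: Optional[str],
    env: Mapping[str, str],
    cwd: Path,
    home: Path,
) -> Optional[Path]:
    """Hypothetical helper mirroring the documented discovery order (not worai's code)."""
    # 1. An explicit --config value wins.
    if cli_config:
        return Path(cli_config).expanduser()
    # 2. Then the WORAI_CONFIG environment variable.
    if env.get("WORAI_CONFIG"):
        return Path(env["WORAI_CONFIG"]).expanduser()
    # 3. Then the nearest worai.toml, walking up from the current directory.
    for directory in (cwd, *cwd.parents):
        candidate = directory / "worai.toml"
        if candidate.is_file():
            return candidate
    # 4-5. Finally the per-user locations, in documented order.
    for candidate in (home / ".config" / "worai" / "config.toml", home / ".worai.toml"):
        if candidate.is_file():
            return candidate
    return None
```

For example, `discover_config(None, os.environ, Path.cwd(), Path.home())` points at the file a plain `worai` invocation would most likely read.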

Profiles:

  • Define profiles as [profiles.<name>] tables and select one with --profile or WORAI_PROFILE.

Common keys:

  • profiles.<name>.api_key
  • log_level (global default logging level: debug|info|warning|error)
  • profiles.<name>.log_level (profile-specific override for root logging level)
  • profiles._base.log_level (shared profile fallback when selected profile has no log_level)
  • profiles.<name>.mapping (SDK profile contract)
  • profiles.<name>.gsc_site_id (GSC property for commands that query Search Console)
  • profiles.<name>.oauth.client_secrets (OAuth Desktop app client file)
  • profiles.<name>.oauth.token (shared OAuth token file path)
  • profiles.<name>.oauth.service_account (service account credential as file path or inline JSON)
  • profiles.<name>.ga_property_id (preferred GA4 property key for analytics; ga.id remains supported)
  • profiles.<name>.canonicals.output (supports {profile}, {date}, {seq} interpolation)
  • profiles.<name>.canonicals.interval
  • profiles.<name>.canonicals.concurrency
  • profiles.<name>.canonicals.request_timeout_sec
  • one source per profile (urls, sitemap_url, or sheets_url + sheets_name + sheets_service_account) for SDK profile validity
  • postprocessor_runtime (graph sync runtime: oneshot or persistent; profile override supported)
  • ingest.source (auto|urls|sitemap|sheets|local)
  • ingest.loader (auto|simple|proxy|playwright|premium_scraper|web_scrape_api|passthrough)
  • ingest.passthrough_when_html (default: true)
  • ingest.timeout_ms (optional override; SDK default: 30000)
  • ingest.playwright_wait_until (optional override; SDK default: domcontentloaded)
  • Other command-specific OAuth/GSC/GA options should be passed via CLI flags or environment variables.
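The {profile}, {date} and {seq} placeholders accepted by profiles.<name>.canonicals.output can be illustrated with a small Python sketch. The helper name is hypothetical, and the {date} format (yyyyMMdd) and the "first unused number starting at 1" rule for {seq} are assumptions borrowed from the graph export default filename documented under Examples, not confirmed behavior of canonicals.output.

```python
from datetime import date
from pathlib import Path


def expand_output_template(template: str, profile: str, directory: Path, today: date) -> Path:
    """Hypothetical helper: fill {profile}, {date} and {seq} in an output template.

    Assumes {date} renders as yyyyMMdd and {seq} is the first number, starting at 1,
    whose resulting file does not exist yet.
    """
    stamp = today.strftime("%Y%m%d")
    if "{seq}" not in template:
        # Without a sequence placeholder there is nothing to increment.
        return directory / template.format(profile=profile, date=stamp)
    seq = 1
    while True:
        candidate = directory / template.format(profile=profile, date=stamp, seq=seq)
        if not candidate.exists():
            return candidate
        seq += 1
```

For instance, the template export_{profile}_{date}_{seq}.ttl reproduces the shape of the graph export default filename.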

Supported environment variables:

  • WORAI_CONFIG — path to a config TOML file (overrides discovery order).
  • WORAI_PROFILE — profile name under [profiles.<name>].
  • WORAI_LOG_LEVEL — default log level (debug|info|warning|error).
  • WORAI_LOG_FORMAT — default log format (text|json).
  • WORDLIFT_API_KEY — WordLift API key for entity operations.
  • GSC_CLIENT_SECRETS — path to OAuth client secrets JSON for GSC.
  • GSC_ID — GSC property URL.
  • OAUTH_TOKEN — path to store the shared OAuth token (GSC + GA).
  • GSC_OUTPUT — default output CSV path for GSC export.
  • GA_ID — GA4 property ID for Analytics sections.
  • GSC_TOKEN / GA_TOKEN — legacy aliases for OAUTH_TOKEN (must point to the same file if used).
  • WORAI_DISABLE_UPDATE_CHECK — set to 1|true|yes|on to disable startup update checks.
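Two of the rules above are easy to get wrong in scripts: the accepted truthy values for WORAI_DISABLE_UPDATE_CHECK, and the requirement that GSC_TOKEN/GA_TOKEN point at the same file as OAUTH_TOKEN. A hedged Python sketch of both checks follows; the function names are hypothetical, and worai's own parsing may differ (for example, in case sensitivity).

```python
import os
from typing import Mapping, Optional

TRUTHY = {"1", "true", "yes", "on"}


def update_check_disabled(env: Mapping[str, str]) -> bool:
    """True when WORAI_DISABLE_UPDATE_CHECK holds one of the documented truthy values."""
    # Case-insensitive matching is an assumption of this sketch.
    return env.get("WORAI_DISABLE_UPDATE_CHECK", "").strip().lower() in TRUTHY


def resolve_oauth_token(env: Mapping[str, str]) -> Optional[str]:
    """Return the shared OAuth token path from OAUTH_TOKEN or its legacy aliases.

    Raises ValueError when the variables that are set point at different files.
    """
    paths = {
        name: os.path.normpath(os.path.expanduser(env[name]))
        for name in ("OAUTH_TOKEN", "GSC_TOKEN", "GA_TOKEN")
        if env.get(name)
    }
    distinct = set(paths.values())
    if len(distinct) > 1:
        raise ValueError(f"OAuth token variables disagree: {paths}")
    return distinct.pop() if distinct else None
```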

.env support:

  • At startup, worai loads .env from the current working directory, looking up through parent directories when needed.
  • values from .env are treated as environment variables.
  • existing environment variables take precedence over .env values.
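The .env precedence rule (existing variables win) can be sketched in a few lines of Python. This is a simplified, hypothetical loader that only understands KEY=VALUE lines, comments and an optional export prefix; full-featured libraries such as python-dotenv handle quoting and interpolation properly, and their load_dotenv also leaves existing variables untouched unless override=True is passed.

```python
from pathlib import Path
from typing import MutableMapping, Optional


def find_dotenv(start: Path) -> Optional[Path]:
    """Return the nearest .env file, searching start and then its parents."""
    for directory in (start, *start.parents):
        candidate = directory / ".env"
        if candidate.is_file():
            return candidate
    return None


def load_dotenv_file(path: Path, environ: MutableMapping[str, str]) -> None:
    """Copy KEY=VALUE pairs into environ without overriding existing variables."""
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, _, value = line.partition("=")
        key = key.strip().removeprefix("export ").strip()
        value = value.strip().strip('"').strip("'")  # naive quote removal
        # Existing environment variables take precedence over .env values.
        environ.setdefault(key, value)
```

Calling `load_dotenv_file(path, os.environ)` with the path returned by `find_dotenv(Path.cwd())` applies the values to the current process.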

Logging level precedence:

  • --log-level (highest)
  • WORAI_LOG_LEVEL
  • profiles.<name>.log_level in worai.toml (when a profile is selected)
  • profiles._base.log_level in worai.toml (when a profile is selected and no profile-specific value is set)
  • global log_level in worai.toml
  • info (default)
  • The selected level is enforced on both the root logger and its active handlers, so INFO logs from dependencies are suppressed when using warning or error.
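The precedence list maps naturally onto a first-non-empty lookup. The Python sketch below is hypothetical (the function names and the dictionary shape of the parsed worai.toml are assumptions), but it also shows why handlers matter: a dependency that sets its own logger to INFO still propagates records to the root handlers, and propagation checks handler levels rather than the root logger's level.

```python
import logging
from typing import Any, Mapping, Optional

LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
}


def resolve_log_level(
    cli_level: Optional[str],
    env: Mapping[str, str],
    config: Mapping[str, Any],
    profile: Optional[str],
) -> str:
    """Return the first configured level, following the precedence list (hypothetical helper)."""
    profiles = config.get("profiles", {})
    candidates = [cli_level, env.get("WORAI_LOG_LEVEL")]
    if profile is not None:
        # Profile-specific value first, then the shared _base fallback.
        candidates.append(profiles.get(profile, {}).get("log_level"))
        candidates.append(profiles.get("_base", {}).get("log_level"))
    candidates.append(config.get("log_level"))
    for value in candidates:
        if value:
            return value.lower()
    return "info"


def apply_log_level(level_name: str) -> None:
    """Set the level on the root logger and on every handler attached to it."""
    level = LEVELS[level_name]
    root = logging.getLogger()
    root.setLevel(level)
    for handler in root.handlers:
        handler.setLevel(level)
```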

Example environment setup:

export WORDLIFT_API_KEY="wl_..."
export WORAI_CONFIG="$HOME/worai.toml"
export WORAI_PROFILE="dev"
export GSC_CLIENT_SECRETS="$HOME/client_secrets.json"
export OAUTH_TOKEN="$HOME/oauth_token.json"

Example worai.toml:

[profiles.default]
api_key = "${WORDLIFT_API_KEY}"
mapping = "default.yarrrml"
sitemap_url = "https://example.com/sitemap.xml"
ingest_loader = "web_scrape_api"

Ingestion profile examples:

[profiles.inventory_local]
api_key = "${WORDLIFT_API_KEY}"
mapping = "default.yarrrml"
urls = ["https://example.com/page"]
ingest_source = "local"
ingest_loader = "passthrough"

[profiles.inventory_remote]
api_key = "${WORDLIFT_API_KEY}"
mapping = "default.yarrrml"
sitemap_url = "https://example.com/sitemap.xml"
ingest_source = "sitemap"
ingest_loader = "web_scrape_api"

[profiles.graph_sync_proxy]
api_key = "${WORDLIFT_API_KEY}"
mapping = "default.yarrrml"
urls = ["https://example.com/a", "https://example.com/b"]
ingest_source = "urls"
ingest_loader = "proxy"
ingest_timeout_ms = 30000
playwright_wait_until = "domcontentloaded"

Commands

Full docs: https://docs.wordlift.io/worai/

  • seocheck — run SEO checks for sitemap URLs and URL lists.
  • google-search-console — export GSC page metrics as CSV.
  • canonicals dedupe — dedupe canonical URLs by title using GSC impressions.
  • dedupe — deduplicate WordLift entities by schema:url.
  • canonicalize-duplicate-pages — select canonical URLs using GSC KPIs.
  • delete-entities-from-csv — delete entities listed in a CSV.
  • find-faq-page-wrong-type — find and patch FAQPage typing issues.
  • find-missing-names — find entities missing schema:name/headline.
  • find-url-by-type — list schema:url values by type from RDF.
  • graph — run knowledge graph workflows (sync, export, validate, property delete).
  • link-groups — build or apply LinkGroup data from CSV.
  • patch — patch entities from RDF.
  • structured-data — generate JSON-LD/YARRRML mappings or materialize RDF from YARRRML.
  • agent — launch codex/claude/gemini with worai MCP + skill guidance.
  • web-pages — run ingestion-backed web page workflows.
  • validate — deprecated JSON-LD validator command (use graph validate for RDF files/URLs; use structured-data validate page for webpage URLs).
  • self update — check for new worai versions and optionally run the upgrade command.
  • upload-entities-from-turtle — upload .ttl files with resume.
  • dil-import - upload DILs from a CSV file.

Command help:

  • worai <command> --help

Autocompletion:

  • worai --install-completion
  • worai --show-completion

Updates:

  • worai checks for new versions periodically and prints a non-blocking notice when an update is available.
  • run worai self update to check manually and view or apply the suggested upgrade command.

Examples

seocheck

  • worai seocheck https://example.com/sitemap.xml
  • worai seocheck https://example.com/sitemap.xml --output-dir ./seocheck-report --save-html
  • worai seocheck https://example.com/sitemap.xml --output-dir ./seocheck-report --no-open-report
  • worai seocheck https://example.com/sitemap.xml --user-agent "Mozilla/5.0 ..."
  • worai seocheck https://example.com/sitemap.xml --sitemap-fetch-mode browser
  • worai seocheck https://example.com/sitemap.xml --no-report-ui
  • worai seocheck https://example.com/sitemap.xml --recheck-failed --recheck-from ./seocheck-report

google-search-console

  • worai google-search-console --site sc-domain:example.com --client-secrets ./client_secrets.json
    • Uses OAuth redirect port 8080 by default.

canonicals dedupe

  • worai canonicals dedupe --input pages_with_titles.csv --site sc-domain:example.com --service-account ./service-account.json
  • worai canonicals dedupe --input pages_with_titles.csv --site sc-domain:example.com --token oauth_token.json
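The canonicals dedupe idea (group pages that share a title, keep the one with the most GSC impressions) can be sketched in Python. The column names url and title, the case-insensitive title matching and the first-seen tie-breaking are assumptions for illustration; the real command's input schema and selection rules may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def pick_canonicals(rows: Iterable[Mapping[str, str]], impressions: Mapping[str, int]) -> Dict[str, str]:
    """Map each URL to the canonical URL of its title group (hypothetical helper).

    Within a group of pages sharing a title, the URL with the most impressions wins;
    on a tie, max() keeps the first URL seen.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        groups[row["title"].strip().lower()].append(row["url"])
    canonical_for: Dict[str, str] = {}
    for urls in groups.values():
        winner = max(urls, key=lambda url: impressions.get(url, 0))
        for url in urls:
            canonical_for[url] = winner
    return canonical_for
```

Rows could come from `csv.DictReader` over pages_with_titles.csv, and impressions from a GSC export keyed by page URL.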

seoreport (with Analytics)

  • worai seoreport --site sc-domain:example.com --ga-id 123456789 --format html

canonicalize-duplicate-pages

  • worai canonicalize-duplicate-pages --input gsc_pages.csv --output canonical_targets.csv --kpi-window 28d --kpi-metric clicks
  • worai canonicalize-duplicate-pages --input gsc_pages.csv --entity-type Product

dedupe

  • worai dedupe --dry-run

find-faq-page-wrong-type

  • worai find-faq-page-wrong-type ./data.ttl --dry-run --replace-type
  • worai find-faq-page-wrong-type ./data.ttl --patch --replace-type

find-missing-names

  • worai find-missing-names ./data.ttl

find-url-by-type

  • worai find-url-by-type ./data.ttl schema:Service schema:Product

link-groups

  • worai link-groups ./links.csv --format turtle
  • worai link-groups ./links.csv --apply --dry-run --concurrency 4

graph

  • worai --config ./worai.toml --profile acme graph sync run
  • worai --profile acme graph sync run --debug
  • worai graph sync create ./acme-graph
  • worai graph sync create ./acme-graph --template ./graph-sync-template --defaults
  • worai graph sync create ./acme-graph --data-file ./answers.yml --non-interactive
  • worai graph sync create ./acme-graph --vcs-ref v1.2.3
  • worai graph export
  • worai --profile acme graph export
  • worai --profile acme graph export ./acme-export.jsonld
  • worai graph export ./acme-export.ttl --validate
  • worai graph validate ./graph.ttl ./graph.jsonld --builtin-shape google-required --level warning --format text
  • worai graph property delete seovoc:html --dry-run
  • worai graph property delete https://w3id.org/seovoc/html --yes --workers 4
    • graph export reads the API key from the selected worai.toml profile (root --profile, then WORAI_PROFILE, then default) and calls /dataset/export.
    • graph export output format is inferred from extension: .ttl, .nt, .nq, .rdf/.xml, .jsonld/.json.
    • graph export default filename: export_<profile>_<yyyyMMdd>_<seq>.ttl (sequence starts at 1).
    • graph export --validate runs SHACL validation on the exported file and fails on SHACL errors/warnings.
    • graph validate accepts one or more local files or URLs and supports shape composition with:
      • --builtin-shape <name>
      • --exclude-builtin-shape <name>
      • --shape <file-or-url>
    • graph validate --level warning|error controls failure threshold; --format text|json controls output.
    • graph property delete sends X-include-Private: true by default for both GraphQL match discovery and entity PATCH requests.
    • graph sync create runs Copier in trusted mode by default so template _tasks execute.
    • graph sync run profile resolution is: root --profile, then WORAI_PROFILE, then default.
    • Mapping docs (for [profiles.<name>]): docs/graph-sync-mappings-reference.md, docs/graph-sync-mappings-guide.md, docs/graph-sync-mappings-examples.md
    • Internal template-agent workflow docs: specs/graph-sync/AGENTS.md, specs/graph-sync/INDEX.md, specs/graph-sync/developer-agent-workflow.md
    • Profile loading standard for non-sync commands: specs/profile-loading-standard.md
    • Configure exactly one source mode per run: urls, sitemap_url (+ optional pattern), or sheets_url + sheets_name.
    • With wordlift-sdk>=6.10.0, Playwright-backed ingestion defaults are owned by the SDK: INGEST_TIMEOUT_MS=30000 and PLAYWRIGHT_WAIT_UNTIL="domcontentloaded".
    • web_page_import_timeout remains supported for graph sync as a legacy seconds-based alias.
    • SDK 6 defaults to the persistent postprocessor runtime; set postprocessor_runtime = "oneshot" in worai.toml to keep the old one-process-per-callback behavior.
    • wordlift-sdk 5.1.1+ postprocessor context migration:
      • context.settings -> context.profile (for example context.profile["settings"]["api_url"])
      • context.account.key -> context.account_key
      • context.account remains the clean /me account object
    • SDK 6 ingestion uses explicit keys:
      • INGEST_SOURCE (urls|sitemap|sheets|local|auto)
      • INGEST_LOADER (web_scrape_api|proxy|premium_scraper|playwright|simple|passthrough|auto)
      • INGEST_TIMEOUT_MS (milliseconds)
      • PLAYWRIGHT_WAIT_UNTIL (domcontentloaded|load|networkidle) when explicitly configured
    • SDK 6 migration deprecates integration use of WEB_PAGE_IMPORT_MODE and WEB_PAGE_IMPORT_TIMEOUT.
    • graph sync run uses run_cloud_workflow and emits per-graph progress and final KPI summaries through CLI logs (on_info, on_progress, on_kpi).
    • graph sync run --debug writes SDK callback artifacts under output/debug_cloud/<profile>/ from the current working directory:
      • static_templates.ttl
      • cloud_<sha256(url)>.ttl for each callback URL.
    • SHACL validation settings mapping for SDK 6.2+:
      • use shacl_validate_mode = "warn"|"fail"|"off"
      • use shacl_builtin_shapes, shacl_exclude_builtin_shapes, shacl_extra_shapes
      • shacl_validate_sync and shacl_shape_specs are no longer supported
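The SDK 6 ingestion keys above can be illustrated with a hedged Python sketch that turns the flat profile keys used in the worai.toml examples (ingest_source, ingest_loader, ingest_timeout_ms, playwright_wait_until) into explicit settings. The function name is hypothetical, and so are two rules it encodes: ingest_timeout_ms winning over the legacy seconds-based web_page_import_timeout, and values being validated against the documented option lists. Unset keys are omitted so the SDK defaults (30000 ms, domcontentloaded) apply.

```python
from typing import Any, Dict, Mapping

ALLOWED = {
    "INGEST_SOURCE": {"urls", "sitemap", "sheets", "local", "auto"},
    "INGEST_LOADER": {"web_scrape_api", "proxy", "premium_scraper", "playwright", "simple", "passthrough", "auto"},
    "PLAYWRIGHT_WAIT_UNTIL": {"domcontentloaded", "load", "networkidle"},
}


def build_ingest_settings(profile: Mapping[str, Any]) -> Dict[str, Any]:
    """Hypothetical translation of profile keys into explicit SDK 6 ingestion settings."""
    settings: Dict[str, Any] = {}
    if "ingest_source" in profile:
        settings["INGEST_SOURCE"] = profile["ingest_source"]
    if "ingest_loader" in profile:
        settings["INGEST_LOADER"] = profile["ingest_loader"]
    if "ingest_timeout_ms" in profile:
        settings["INGEST_TIMEOUT_MS"] = int(profile["ingest_timeout_ms"])
    elif "web_page_import_timeout" in profile:
        # Legacy alias expressed in seconds; convert to milliseconds.
        settings["INGEST_TIMEOUT_MS"] = int(float(profile["web_page_import_timeout"]) * 1000)
    if "playwright_wait_until" in profile:
        settings["PLAYWRIGHT_WAIT_UNTIL"] = profile["playwright_wait_until"]
    # Reject values outside the documented option lists.
    for key, allowed in ALLOWED.items():
        if key in settings and settings[key] not in allowed:
            raise ValueError(f"{key}={settings[key]!r} is not one of {sorted(allowed)}")
    return settings
```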

patch

  • worai patch ./data.ttl --dry-run --add-types

structured-data

  • worai structured-data create https://example.com/article Review --output-dir ./structured-data
  • worai structured-data create https://example.com/article --type Review --output-dir ./structured-data
  • worai structured-data create https://example.com/article --type Review --debug
  • worai structured-data create https://example.com/article --type Review --max-xhtml-chars 40000 --max-nesting-depth 2
  • worai structured-data generate https://example.com/sitemap.xml --yarrrml ./mapping.yarrrml --output-dir ./out
  • worai structured-data generate https://example.com/page --yarrrml ./mapping.yarrrml --format jsonld
  • worai structured-data inventory https://example.com/sitemap.xml --output ./structured-data-inventory.csv
  • worai structured-data inventory ./urls.txt --output ./structured-data-inventory.csv
  • worai structured-data inventory https://docs.google.com/spreadsheets/d/<id>/edit --sheet-name URLs_US --output ./structured-data-inventory.csv
  • worai structured-data inventory https://example.com/sitemap.xml --destination-sheet-id <spreadsheet_id> --destination-sheet-name Inventory
  • worai structured-data inventory https://example.com/sitemap.xml --output ./structured-data-inventory.csv --concurrency auto
  • worai structured-data inventory https://example.com/sitemap.xml --url-regex "/blog/" --output ./structured-data-inventory.csv
  • worai structured-data inventory /path/to/debug_cloud/us --source-type debug-cloud --output ./structured-data-inventory.csv
  • worai structured-data inventory /path/to/debug_cloud/us --ingest-source local --ingest-loader passthrough --output ./structured-data-inventory.csv
  • worai structured-data inventory https://example.com/sitemap.xml --ingest-loader web_scrape_api --output ./structured-data-inventory.csv
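To preview which sitemap URLs a --url-regex filter keeps, a standard-library Python sketch can help. It assumes the pattern is matched anywhere in the URL (re.search semantics) and handles a single urlset sitemap, not sitemap indexes; both are assumptions rather than documented worai behavior, and the function name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Optional

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text: str, url_regex: Optional[str] = None) -> List[str]:
    """Return <loc> URLs from a urlset sitemap, keeping only regex matches if given."""
    root = ET.fromstring(xml_text)
    urls = [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]
    if url_regex is None:
        return urls
    pattern = re.compile(url_regex)
    # re.search: the pattern may match anywhere in the URL (an assumption).
    return [url for url in urls if pattern.search(url)]
```

Feeding it the body of https://example.com/sitemap.xml with url_regex="/blog/" approximates the URL set the /blog/ inventory example above would process.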

agent

  • worai agent --agent-cli codex
  • worai agent --agent-cli codex -- --yolo --search
  • worai agent --agent-cli claude --profile acme
  • worai agent --agent-cli gemini --config ./worai.toml --profile acme
  • worai agent mcp serve --profile acme

web-pages

  • worai web-pages classify-types https://example.com/sitemap.xml --ingest-source sitemap --ingest-loader playwright --url-regex "/blog/" --output ./types.csv
  • worai web-pages classify-types ./urls.txt --ingest-source urls --output ./types.csv
  • worai web-pages classify-types https://docs.google.com/spreadsheets/d/<id>/edit --ingest-source sheets --sheet-name URLs --service-account ./service-account.json --output ./types.csv
  • worai web-pages classify-types https://example.com/sitemap.xml --ingest-source sitemap --output ./types.csv --yes (skip credit-consumption confirmation)

validate

  • worai graph validate ./data.jsonld --builtin-shape review-snippet --shape ./custom.ttl --level warning --format json
  • worai validate jsonld --shape review-snippet --shape schema-review ./data.jsonld
  • worai validate jsonld --format raw https://api.wordlift.io/data/example.jsonld
  • worai structured-data validate page https://example.com/article --shape review-snippet

self update

  • worai self update --check-only
  • worai self update --yes

upload-entities-from-turtle

  • worai upload-entities-from-turtle ./entities --recursive --limit 50

dil-import

  • worai dil-import <wordlift_key> <path_to_csv_file>

Troubleshooting

  • Playwright missing browsers:
    • playwright install chromium
  • YARRRML conversion:
    • npm install -g @rmlio/yarrrml-parser
  • RML execution:
    • morph-kgc is included in project dependencies
  • Dependency notes:
    • Common runtime libs (e.g., requests, rdflib, tqdm, advertools, Google auth helpers) are provided transitively by wordlift-sdk.
  • OAuth token issues:
    • Remove the token file and re-run worai google-search-console or worai canonicals dedupe.
    • If you are prompted to re-authenticate on every run, delete the token file to force a new consent flow that includes a refresh token.

Download files

Source Distribution

worai-6.13.0.tar.gz (190.6 kB)

Uploaded Source

Built Distribution

worai-6.13.0-py3-none-any.whl (158.1 kB)

Uploaded Python 3

File details

Details for the file worai-6.13.0.tar.gz.

File metadata

  • Download URL: worai-6.13.0.tar.gz
  • Size: 190.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for worai-6.13.0.tar.gz
  • SHA256: 5d50820ca1fedbc1b83ee60fbb101f9d2b9c12264bd6d2cae91328b2f5a5fa67
  • MD5: b4b10f3a9e244e8c9dad0689abd02208
  • BLAKE2b-256: 93abcbaf2ee23cc124bad9fb86f6813ab55a3748797e8ae7a0b51bf7c4c78064

Provenance

The following attestation bundles were made for worai-6.13.0.tar.gz:

Publisher: publish.yml on wordlift/worai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file worai-6.13.0-py3-none-any.whl.

File metadata

  • Download URL: worai-6.13.0-py3-none-any.whl
  • Size: 158.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for worai-6.13.0-py3-none-any.whl
  • SHA256: 8b898b57392f35c8cb27f24345e1878525e26968c73964d6c15665c6a05dd0c7
  • MD5: 588ddfcf873fb3b08b2cb03760878995
  • BLAKE2b-256: eeb15ad50daddee846cd8f491256ab984a6dc65d72bf6347c05948a9ee245941

Provenance

The following attestation bundles were made for worai-6.13.0-py3-none-any.whl:

Publisher: publish.yml on wordlift/worai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
