AI-powered CLI for WordLift knowledge graph and SEO workflows.
worai
Command-line toolkit for WordLift operations and SEO checks. Pronunciation: "waw-RYE"
Docs: https://docs.wordlift.io/worai/
Install
- One-line installer (macOS/Linux):
curl -fsSL https://raw.githubusercontent.com/wordlift/worai-install/main/install-worai.sh | bash
- One-line installer (Windows PowerShell):
irm https://raw.githubusercontent.com/wordlift/worai-install/main/install-worai.ps1 | iex
After install:
- The script automatically persists pipx/worai PATH entries in your shell profile when needed.
- For immediate use in the current shell:
  - zsh: source ~/.zshrc
  - bash: source ~/.bashrc
- Then run:
worai --help
Verify the script before running it (recommended in stricter environments):
- macOS/Linux:
curl -fsSL -o /tmp/install-worai.sh https://raw.githubusercontent.com/wordlift/worai-install/main/install-worai.sh && less /tmp/install-worai.sh && bash /tmp/install-worai.sh
- Windows PowerShell:
irm https://raw.githubusercontent.com/wordlift/worai-install/main/install-worai.ps1 -OutFile $env:TEMP\install-worai.ps1; notepad $env:TEMP\install-worai.ps1; & $env:TEMP\install-worai.ps1
Manual install:
pipx install worai
pip install worai
Full docs: https://docs.wordlift.io/worai/
Runtime dependency note:
- wordlift-sdk>=8.0.2,<9.0.0 (installed automatically by pip)
- copier (required by worai graph sync create; installed automatically by pip)
If you plan to run seocheck, install Playwright browsers:
playwright install chromium
Optional output format extras:
- pip install 'worai[parquet]': enables --format parquet (Apache Parquet via pyarrow)
- pip install 'worai[gsheets]': enables --format gsheets (write to Google Sheets via gspread)
Quick Start
worai --help
worai seocheck https://example.com/sitemap.xml
worai google-search-console --site sc-domain:example.com --client-secrets ./client_secrets.json
worai canonicals dedupe --input pages_with_titles.csv --site sc-domain:example.com --service-account ./service-account.json
worai <command> --help
Configuration
Config file (TOML) discovery order:
- --config
- WORAI_CONFIG
- nearest worai.toml from the current directory upward (for example ./worai.toml, ../worai.toml, ../../worai.toml)
- ~/.config/worai/config.toml
- ~/.worai.toml
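The discovery order above behaves like a first-match search. The sketch below is a hypothetical illustration of that order, not worai's own resolver: the function name `find_config` and its parameters are assumptions, and it assumes an explicit path from --config or WORAI_CONFIG is returned without checking that the file exists.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_config(
    cli_config: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    cwd: Optional[Path] = None,
    home: Optional[Path] = None,
) -> Optional[Path]:
    """Return the first config file found, following the documented order (sketch)."""
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else cwd
    home = Path.home() if home is None else home

    # Explicit choices win: --config first, then WORAI_CONFIG.
    # Assumption: explicit paths are returned without an existence check.
    if cli_config:
        return Path(cli_config).expanduser()
    if env.get("WORAI_CONFIG"):
        return Path(env["WORAI_CONFIG"]).expanduser()

    # Nearest worai.toml from the current directory upward.
    for directory in (cwd, *cwd.parents):
        candidate = directory / "worai.toml"
        if candidate.is_file():
            return candidate

    # User-level fallbacks, in documented order.
    for candidate in (home / ".config" / "worai" / "config.toml", home / ".worai.toml"):
        if candidate.is_file():
            return candidate
    return None
```

Passing `env`, `cwd` and `home` explicitly keeps the lookup testable without touching the real environment.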
Profiles:
- Define [profiles.<name>] tables and select one with --profile or WORAI_PROFILE.
Common keys:
- profiles.<name>.api_key
- log_level (global default logging level: debug|info|warning|error)
- profiles.<name>.log_level (profile-specific override for root logging level)
- profiles._base.log_level (shared profile fallback when the selected profile has no log_level)
- profiles.<name>.mapping (SDK profile contract)
- profiles.<name>.gsc_site_id (GSC property for commands that query Search Console)
- profiles.<name>.oauth.client_secrets (OAuth Desktop app client file)
- profiles.<name>.oauth.token (shared OAuth token file path)
- profiles.<name>.oauth.service_account (service account credential as file path or inline JSON)
- profiles.<name>.ga_property_id (preferred GA4 property key for analytics; ga.id remains supported)
- profiles.<name>.canonicals.output (supports {profile}, {date}, {seq} interpolation)
- profiles.<name>.canonicals.interval
- profiles.<name>.canonicals.concurrency
- profiles.<name>.canonicals.request_timeout_sec
- one source per profile (urls, sitemap_url, or sheets_url + sheets_name + sheets_service_account) for SDK profile validity
- postprocessor_runtime (graph sync runtime: oneshot or persistent; profile override supported)
- graph_write_strategy (graph sync write mode: patch or put; defaults to patch; profile override supported)
- ingest.source (auto|urls|sitemap|sheets|local)
- ingest.loader (auto|simple|proxy|playwright|premium_scraper|web_scrape_api|passthrough)
- ingest.passthrough_when_html (default: true)
- ingest.timeout_ms (optional override; SDK default: 30000)
- ingest.playwright_wait_until (optional override; SDK default: domcontentloaded)
- command-specific OAuth/GSC/GA options should be passed via CLI flags or environment variables.
Supported environment variables:
- WORAI_CONFIG: path to a config TOML file (overrides discovery order).
- WORAI_PROFILE: profile name under [profiles.<name>].
- WORAI_LOG_LEVEL: default log level (debug|info|warning|error).
- WORAI_LOG_FORMAT: default log format (text|json).
- WORDLIFT_API_KEY: WordLift API key for entity operations.
- GSC_CLIENT_SECRETS: path to OAuth client secrets JSON for GSC.
- GSC_ID: GSC property URL.
- OAUTH_TOKEN: path to store the shared OAuth token (GSC + GA).
- GSC_OUTPUT: default output CSV path for GSC export.
- GA_ID: GA4 property ID for Analytics sections.
- GSC_TOKEN / GA_TOKEN: legacy aliases for OAUTH_TOKEN (must point to the same file if used).
- WORAI_DISABLE_UPDATE_CHECK: set to 1|true|yes|on to disable startup update checks.
.env support:
- worai loads .env from the current working directory (and parent lookup) at startup.
- Values from .env are treated as environment variables.
- Existing environment variables take precedence over .env values.
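The .env rules above (nearest file found by walking up, existing variables win) can be sketched with the standard library alone. This is a minimal illustration, not worai's loader: the helper names are hypothetical and the parser only handles plain `KEY=VALUE` lines, optional `export ` prefixes and one pair of surrounding quotes. In real code, python-dotenv's `load_dotenv(override=False)` gives the same "existing variables win" precedence.

```python
from pathlib import Path
from typing import MutableMapping, Optional


def find_dotenv_upward(start: Path) -> Optional[Path]:
    """Return the nearest .env at or above `start`, or None (hypothetical helper)."""
    for directory in (start, *start.parents):
        candidate = directory / ".env"
        if candidate.is_file():
            return candidate
    return None


def load_dotenv_no_override(path: Path, environ: MutableMapping[str, str]) -> None:
    """Merge simple KEY=VALUE lines into `environ`; existing keys always win."""
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        if line.startswith("export "):
            line = line[len("export "):]
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        environ.setdefault(key, value)  # never overwrite an existing variable
```

Usage: `load_dotenv_no_override(find_dotenv_upward(Path.cwd()), os.environ)` when a .env file exists.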
Logging level precedence:
- --log-level (highest)
- WORAI_LOG_LEVEL
- profiles.<name>.log_level in worai.toml (when a profile is selected)
- profiles._base.log_level in worai.toml (when a profile is selected and no profile-specific value is set)
- global log_level in worai.toml
- info (default)

Notes:
- The selected level configures root logging; dependencies may still emit their own INFO logs when they set explicit logger levels.
- For graph sync run, the effective level is forwarded to morph-kgc's logging_level (DEBUG|INFO|WARNING|ERROR|CRITICAL).
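The logging precedence reads top to bottom, with the first non-empty value winning. The sketch below models that order; `resolve_log_level` is a hypothetical name, the config is assumed to be the parsed worai.toml as a dict, and rejecting unknown names is an assumption (this README does not say how worai handles invalid values).

```python
import logging
from typing import Any, Mapping, Optional

_LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
}


def resolve_log_level(
    cli_level: Optional[str],
    env: Mapping[str, str],
    config: Mapping[str, Any],
    profile: Optional[str],
) -> str:
    """Pick the effective level name following the documented precedence (sketch)."""
    profiles = config.get("profiles", {})
    candidates = [cli_level, env.get("WORAI_LOG_LEVEL")]
    if profile:  # profile-level keys only apply when a profile is selected
        candidates.append(profiles.get(profile, {}).get("log_level"))
        candidates.append(profiles.get("_base", {}).get("log_level"))
    candidates.append(config.get("log_level"))
    for value in candidates:
        if value:
            level = str(value).lower()
            if level not in _LEVELS:
                raise ValueError(f"unsupported log level: {value!r}")
            return level
    return "info"  # documented default
```

The result can feed `logging.basicConfig(level=_LEVELS[level])`; upper-casing it gives the DEBUG|INFO|WARNING|ERROR form listed for morph-kgc.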
Example environment setup:
export WORDLIFT_API_KEY="wl_..."
export WORAI_CONFIG="$HOME/worai.toml"
export WORAI_PROFILE="dev"
export GSC_CLIENT_SECRETS="$HOME/client_secrets.json"
export OAUTH_TOKEN="$HOME/oauth_token.json"
Example worai.toml:
[profiles.default]
api_key = "${WORDLIFT_API_KEY}"
mapping = "default.yarrrml"
sitemap_url = "https://example.com/sitemap.xml"
ingest_loader = "web_scrape_api"
Ingestion profile examples:
[profiles.inventory_local]
api_key = "${WORDLIFT_API_KEY}"
mapping = "default.yarrrml"
urls = ["https://example.com/page"]
ingest_source = "local"
ingest_loader = "passthrough"
[profiles.inventory_remote]
api_key = "${WORDLIFT_API_KEY}"
mapping = "default.yarrrml"
sitemap_url = "https://example.com/sitemap.xml"
ingest_source = "sitemap"
ingest_loader = "web_scrape_api"
[profiles.graph_sync_proxy]
api_key = "${WORDLIFT_API_KEY}"
mapping = "default.yarrrml"
urls = ["https://example.com/a", "https://example.com/b"]
ingest_source = "urls"
ingest_loader = "proxy"
ingest_timeout_ms = 30000
playwright_wait_until = "domcontentloaded"
Commands
Full docs: https://docs.wordlift.io/worai/
- seocheck: run SEO checks for sitemap URLs and URL lists.
- google-search-console: export GSC page metrics (7d/28d/3m) in table, JSON, CSV, TSV, Parquet, or Google Sheets format.
- canonicals dedupe: dedupe canonical URLs by title using GSC impressions.
- dedupe: deduplicate WordLift entities by schema:url.
- entity-matrix: build a URL × entity-type pivot table from a graph file.
- canonicalize-duplicate-pages: select canonical URLs using GSC KPIs.
- delete-entities-from-csv: delete entities listed in a CSV.
- find-faq-page-wrong-type: find and patch FAQPage typing issues.
- find-missing-names: find entities missing schema:name/headline.
- find-url-by-type: list schema:url values by type from RDF.
- graph: run graph-specific workflows (sync, create, export, validate, property delete, audit, reset).
- list-entities-outside-dataset: list entity IRIs that fall outside the account dataset.
- link-groups: build or apply LinkGroup data from CSV.
- patch: patch entities from RDF.
- structured-data: generate JSON-LD/YARRRML mappings or materialize RDF from YARRRML.
- agent: launch codex/claude/gemini with worai MCP + skill guidance.
- web-pages: run ingestion-backed web page workflows.
- validate: deprecated JSON-LD validator command (use graph validate for RDF files/URLs; use structured-data validate page for webpage URLs).
- self update: check for new worai versions and optionally run the upgrade command.
- upload-entities-from-turtle: upload .ttl files with resume.
- dil-import: upload DILs from a CSV file.
Command help:
worai <command> --help
Autocompletion:
worai --install-completion
worai --show-completion
Updates:
- worai checks for new versions periodically and prints a non-blocking notice when an update is available.
- Run worai self update to check manually and see/apply the suggested upgrade command.
Examples
seocheck
worai seocheck https://example.com/sitemap.xml
worai seocheck https://example.com/sitemap.xml --output-dir ./seocheck-report --save-html
worai seocheck https://example.com/sitemap.xml --output-dir ./seocheck-report --no-open-report
worai seocheck https://example.com/sitemap.xml --user-agent "Mozilla/5.0 ..."
worai seocheck https://example.com/sitemap.xml --sitemap-fetch-mode browser
worai seocheck https://example.com/sitemap.xml --no-report-ui
worai seocheck https://example.com/sitemap.xml --recheck-failed --recheck-from ./seocheck-report
google-search-console
worai google-search-console --site sc-domain:example.com --client-secrets ./client_secrets.json
- Writes to gsc_pages.csv by default (CSV format).
worai google-search-console --site sc-domain:example.com --client-secrets ./client_secrets.json --format table
- Prints a rich table to stdout.
worai google-search-console --site sc-domain:example.com --client-secrets ./client_secrets.json --format json
- Prints a JSON array to stdout.
worai google-search-console --site sc-domain:example.com --client-secrets ./client_secrets.json --format tsv --output gsc.tsv
worai google-search-console --site sc-domain:example.com --client-secrets ./client_secrets.json --format parquet --output gsc.parquet
- Requires pip install 'worai[parquet]'.
worai google-search-console --site sc-domain:example.com --client-secrets ./client_secrets.json --output custom.csv
- Uses OAuth redirect port 8080 by default.
canonicals dedupe
worai canonicals dedupe --input pages_with_titles.csv --site sc-domain:example.com --service-account ./service-account.json
worai canonicals dedupe --input pages_with_titles.csv --site sc-domain:example.com --token oauth_token.json
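canonicals dedupe is described as deduplicating canonical URLs by title using GSC impressions. The sketch below shows that idea in its simplest form: group rows by normalized title and keep the URL with the most impressions. The column names `url`, `title` and `impressions` are hypothetical, and the title normalization and tie-breaking (first row wins) are assumptions, not worai's documented behavior.

```python
import csv
import io
from typing import Iterable, Mapping


def pick_canonicals(rows: Iterable[Mapping[str, str]]) -> dict[str, str]:
    """Map each normalized title to the URL with the most impressions (sketch)."""
    best: dict[str, tuple[float, str]] = {}
    for row in rows:
        # Treat case and surrounding-whitespace variants as the same title.
        title = row["title"].strip().casefold()
        impressions = float(row.get("impressions") or 0)
        # Strict ">" keeps the first URL seen when impressions tie.
        if title not in best or impressions > best[title][0]:
            best[title] = (impressions, row["url"])
    return {title: url for title, (_, url) in best.items()}


if __name__ == "__main__":
    sample = "url,title,impressions\nhttps://e.com/a,Red Shoes,100\nhttps://e.com/b,red shoes ,250\n"
    print(pick_canonicals(csv.DictReader(io.StringIO(sample))))
```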
seoreport (with Analytics)
worai seoreport --site sc-domain:example.com --ga-id 123456789 --format html
canonicalize-duplicate-pages
worai canonicalize-duplicate-pages --input gsc_pages.csv --output canonical_targets.csv --kpi-window 28d --kpi-metric clicks
worai canonicalize-duplicate-pages --input gsc_pages.csv --entity-type Product
dedupe
worai dedupe --dry-run
entity-matrix
worai entity-matrix graph.ttl
worai entity-matrix graph.ttl --exclude-type WebPage --exclude-type BreadcrumbList --format csv --output matrix.csv
worai entity-matrix graph.ttl --cluster --format tsv
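A URL × entity-type pivot turns (URL, type) pairs into one row per URL and one column per type. The sketch below assumes the pairs have already been extracted from the graph and that each cell holds a count; both are assumptions, since this README does not say what entity-matrix puts in its cells. The `exclude_types` argument mirrors the intent of --exclude-type; --cluster is not modeled.

```python
from collections import Counter, defaultdict
from typing import Iterable


def url_type_matrix(
    pairs: Iterable[tuple[str, str]], exclude_types: Iterable[str] = ()
) -> tuple[list[str], dict[str, dict[str, int]]]:
    """Pivot (url, entity_type) pairs into per-URL type counts (sketch)."""
    excluded = set(exclude_types)
    counts: dict[str, Counter] = defaultdict(Counter)
    for url, entity_type in pairs:
        if entity_type not in excluded:
            counts[url][entity_type] += 1
    # URLs whose only types were excluded never get a row.
    columns = sorted({t for row in counts.values() for t in row})
    matrix = {url: {t: row.get(t, 0) for t in columns} for url, row in sorted(counts.items())}
    return columns, matrix
```

Writing the result with `csv.writer` (header `["url", *columns]`, then one row per URL) gives a CSV shaped like a pivot table.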
find-faq-page-wrong-type
worai find-faq-page-wrong-type ./data.ttl --dry-run --replace-type
worai find-faq-page-wrong-type ./data.ttl --patch --replace-type
find-missing-names
worai find-missing-names ./data.ttl
find-url-by-type
worai find-url-by-type ./data.ttl schema:Service schema:Product
link-groups
worai link-groups ./links.csv --format turtle
worai link-groups ./links.csv --apply --dry-run --concurrency 4
graph
worai --config ./worai.toml --profile acme graph sync run
worai --profile acme graph sync run --debug
worai graph sync create ./acme-graph
worai graph sync create ./acme-graph --template ./graph-sync-template --defaults
worai graph sync create ./acme-graph --data-file ./answers.yml --non-interactive
worai graph sync create ./acme-graph --vcs-ref v1.2.3
worai graph export
worai --profile acme graph export
worai --profile acme graph export ./acme-export.jsonld
worai graph export ./acme-export.ttl --validate
worai graph validate ./graph.ttl ./graph.jsonld --builtin-shape google-required --level warning --format text
worai graph property delete seovoc:html --dry-run
worai graph property delete https://w3id.org/seovoc/html --yes --workers 4
worai graph audit ./acme-export.ttl
worai graph audit ./acme-export.ttl --format json --show-url-violations
worai graph audit ./acme-export.ttl --rich-snippets-granularity entities --issue-level error
worai --profile acme graph reset --yes
worai --profile acme graph reset --keep-country --keep-language

- graph export reads the API key from the worai.toml profile (root --profile, then WORAI_PROFILE, then default) and calls /dataset/export.
- graph export output format is inferred from the extension: .ttl, .nt, .nq, .rdf/.xml, .jsonld/.json.
- graph export default filename: export_<profile>_<yyyyMMdd>_<seq>.ttl (sequence starts at 1).
- graph export --validate runs SHACL validation on the exported file and fails on SHACL errors/warnings.
- graph validate accepts one or more local files or URLs and supports shape composition with:
  - --builtin-shape <name>
  - --exclude-builtin-shape <name>
  - --shape <file-or-url>
- graph validate --level warning|error controls the failure threshold; --format text|json controls output.
- graph property delete sends X-include-Private: true by default for both GraphQL match discovery and entity PATCH requests.
- graph sync create runs Copier in trusted mode by default so template _tasks execute.
- graph sync run profile resolution is: root --profile, then WORAI_PROFILE, then default.
- Mapping docs (for [profiles.<name>]): docs/graph-sync-mappings-reference.md, docs/graph-sync-mappings-guide.md, docs/graph-sync-mappings-examples.md
- Internal template-agent workflow docs: specs/graph-sync/AGENTS.md, specs/graph-sync/INDEX.md, specs/graph-sync/developer-agent-workflow.md
- Profile loading standard for non-sync commands: specs/profile-loading-standard.md
- Configure exactly one source mode per run: urls, sitemap_url (+ optional pattern), or sheets_url + sheets_name.
- With wordlift-sdk>=6.10.0, Playwright-backed ingestion defaults are owned by the SDK: INGEST_TIMEOUT_MS=30000 and PLAYWRIGHT_WAIT_UNTIL="domcontentloaded".
- web_page_import_timeout remains supported for graph sync as a legacy seconds-based alias.
- SDK 6 defaults to the persistent postprocessor runtime.
  - Set postprocessor_runtime = "oneshot" in worai.toml to keep the old one-process-per-callback behavior.
- wordlift-sdk 5.1.1+ postprocessor context migration:
  - context.settings -> context.profile (for example context.profile["settings"]["api_url"])
  - context.account.key -> context.account_key
  - context.account remains the clean /me account object
- SDK 6 ingestion uses explicit keys:
  - INGEST_SOURCE (urls|sitemap|sheets|local|auto)
  - INGEST_LOADER (web_scrape_api|proxy|premium_scraper|playwright|simple|passthrough|auto)
  - INGEST_TIMEOUT_MS (milliseconds)
  - PLAYWRIGHT_WAIT_UNTIL (domcontentloaded|load|networkidle) when explicitly configured
- The SDK 6 migration deprecates integration use of WEB_PAGE_IMPORT_MODE and WEB_PAGE_IMPORT_TIMEOUT.
- graph sync run uses run_cloud_workflow and emits per-graph progress and final KPI summaries through CLI logs (on_info, on_progress, on_kpi).
- graph sync run --debug writes SDK callback artifacts under output/debug_cloud/<profile>/ relative to the current working directory:
  - static_templates.ttl
  - cloud_<sha256(url)>.ttl for each callback URL
- SHACL validation settings mapping for SDK 6.2+:
  - use shacl_validate_mode = "warn"|"fail"|"off"
  - use shacl_builtin_shapes, shacl_exclude_builtin_shapes, shacl_extra_shapes
  - shacl_validate_sync and shacl_shape_specs are no longer supported
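The graph export and debug naming rules listed above can be sketched as three small helpers. Everything here is hypothetical: the serialization labels ("turtle", "ntriples", and so on) are illustrative names, picking the first free sequence number is an assumption (the README only says the sequence starts at 1), and `sha256(url)` is assumed to mean the hex digest of the UTF-8 URL string.

```python
import hashlib
from datetime import date
from pathlib import Path
from typing import Optional

# Extension -> serialization label, following the extensions documented for graph export.
EXPORT_FORMATS = {
    ".ttl": "turtle",
    ".nt": "ntriples",
    ".nq": "nquads",
    ".rdf": "rdfxml",
    ".xml": "rdfxml",
    ".jsonld": "jsonld",
    ".json": "jsonld",
}


def infer_export_format(path: str) -> str:
    """Infer the output serialization from the file extension (case-insensitive)."""
    suffix = Path(path).suffix.lower()
    if suffix not in EXPORT_FORMATS:
        raise ValueError(f"unsupported export extension: {suffix or '(none)'}")
    return EXPORT_FORMATS[suffix]


def default_export_path(profile: str, directory: Path, today: Optional[date] = None) -> Path:
    """export_<profile>_<yyyyMMdd>_<seq>.ttl, using the first unused seq starting at 1."""
    stamp = (today or date.today()).strftime("%Y%m%d")
    seq = 1
    while (candidate := directory / f"export_{profile}_{stamp}_{seq}.ttl").exists():
        seq += 1
    return candidate


def debug_artifact_name(url: str) -> str:
    """cloud_<sha256(url)>.ttl, assuming the hex digest of the UTF-8 URL string."""
    return f"cloud_{hashlib.sha256(url.encode('utf-8')).hexdigest()}.ttl"
```

Hashing the URL gives each callback a fixed-length, filesystem-safe file name regardless of how long or punctuation-heavy the URL is.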
patch
worai patch ./data.ttl --dry-run --add-types
structured-data
worai structured-data create https://example.com/article Review --output-dir ./structured-data
worai structured-data create https://example.com/article --type Review --output-dir ./structured-data
worai structured-data create https://example.com/article --type Review --debug
worai structured-data create https://example.com/article --type Review --max-xhtml-chars 40000 --max-nesting-depth 2
worai structured-data generate https://example.com/sitemap.xml --yarrrml ./mapping.yarrrml --output-dir ./out
worai structured-data generate https://example.com/page --yarrrml ./mapping.yarrrml --format jsonld
worai structured-data inventory https://example.com/sitemap.xml --output ./structured-data-inventory.csv
worai structured-data inventory ./urls.txt --output ./structured-data-inventory.csv
worai structured-data inventory https://docs.google.com/spreadsheets/d/<id>/edit --sheet-name URLs_US --output ./structured-data-inventory.csv
worai structured-data inventory https://example.com/sitemap.xml --destination-sheet-id <spreadsheet_id> --destination-sheet-name Inventory
worai structured-data inventory https://example.com/sitemap.xml --output ./structured-data-inventory.csv --concurrency auto
worai structured-data inventory https://example.com/sitemap.xml --url-regex "/blog/" --output ./structured-data-inventory.csv
worai structured-data inventory /path/to/debug_cloud/us --source-type debug-cloud --output ./structured-data-inventory.csv
worai structured-data inventory /path/to/debug_cloud/us --ingest-source local --ingest-loader passthrough --output ./structured-data-inventory.csv
worai structured-data inventory https://example.com/sitemap.xml --ingest-loader web_scrape_api --output ./structured-data-inventory.csv
agent
worai agent --agent-cli codex
worai agent --agent-cli codex -- --yolo --search
worai agent --agent-cli claude --profile acme
worai agent --agent-cli gemini --config ./worai.toml --profile acme
worai agent mcp serve --profile acme
web-pages
worai web-pages classify-types https://example.com/sitemap.xml --ingest-source sitemap --ingest-loader playwright --url-regex "/blog/" --output ./types.csv
worai web-pages classify-types ./urls.txt --ingest-source urls --output ./types.csv
worai web-pages classify-types https://docs.google.com/spreadsheets/d/<id>/edit --ingest-source sheets --sheet-name URLs --service-account ./service-account.json --output ./types.csv
worai web-pages classify-types https://example.com/sitemap.xml --ingest-source sitemap --output ./types.csv --yes
- Skips the credit-consumption confirmation.
worai web-pages classify-types https://example.com/sitemap.xml --ingest-source sitemap --output ./types.csv --no-resume --yes
- Forces a full rerun instead of resuming.
validate
worai graph validate ./data.jsonld --builtin-shape review-snippet --shape ./custom.ttl --level warning --format json
worai validate jsonld --shape review-snippet --shape schema-review ./data.jsonld
worai validate jsonld --format raw https://api.wordlift.io/data/example.jsonld
worai structured-data validate page https://example.com/article --shape review-snippet
self update
worai self update --check-only
worai self update --yes
upload-entities-from-turtle
worai upload-entities-from-turtle ./entities --recursive --limit 50
list-entities-outside-dataset
worai list-entities-outside-dataset
worai --profile acme list-entities-outside-dataset
worai --profile acme list-entities-outside-dataset --limit 100
worai --profile acme list-entities-outside-dataset --dataset-uri https://data.example.com/
dil-import
worai dil-import <wordlift_key> <path_to_csv_file>
Troubleshooting
- Playwright missing browsers:
playwright install chromium
- YARRRML conversion:
npm install -g @rmlio/yarrrml-parser
- RML execution:
morph-kgc is included in project dependencies.
- Dependency notes:
  - Common runtime libs (e.g., requests, rdflib, tqdm, advertools, Google auth helpers) are provided transitively by wordlift-sdk.
  - --format parquet requires pip install 'worai[parquet]' (pyarrow>=14).
  - --format gsheets requires pip install 'worai[gsheets]' (gspread>=6).
- OAuth token issues:
  - Remove the token file and re-run worai google-search-console or worai canonicals dedupe.
  - If you are prompted to re-auth every run, delete the token file to force a new consent flow that includes a refresh token.
Download files
File details
Details for the file worai-6.17.6.tar.gz.
File metadata
- Download URL: worai-6.17.6.tar.gz
- Upload date:
- Size: 211.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0f5f722c852513cd95f774672449741f024c5052488353ba7425524b61b31555 |
| MD5 | e81f2d24c4f6f9f5ea8f475a7b5bdc0b |
| BLAKE2b-256 | b4c4d631f4f81e0f04bb86a7e6204be83f289b5cc6aa75ddd36fe334332d83f9 |
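To confirm that a downloaded sdist matches the SHA256 digest published above, hash the file and compare. This is a minimal standard-library sketch; the function names are illustrative. pip's hash-checking mode (`--require-hashes`) is the built-in alternative at install time, though it requires every dependency to be pinned with hashes as well.

```python
import hashlib
from pathlib import Path

# SHA256 published above for worai-6.17.6.tar.gz.
EXPECTED_SDIST_SHA256 = "0f5f722c852513cd95f774672449741f024c5052488353ba7425524b61b31555"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    """True when the file's SHA256 matches `expected` (compared case-insensitively)."""
    return sha256_of(path) == expected.lower()
```

Usage: `verify(Path("worai-6.17.6.tar.gz"))` after downloading the sdist.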
Provenance
The following attestation bundles were made for worai-6.17.6.tar.gz:
Publisher: publish.yml on wordlift/worai
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: worai-6.17.6.tar.gz
- Subject digest: 0f5f722c852513cd95f774672449741f024c5052488353ba7425524b61b31555
- Sigstore transparency entry: 1185763405
- Sigstore integration time:
- Permalink: wordlift/worai@6b9b8f8c7b15ef2f02d8bbfc4548e519c2216acb
- Branch / Tag: refs/tags/v6.17.6
- Owner: https://github.com/wordlift
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6b9b8f8c7b15ef2f02d8bbfc4548e519c2216acb
- Trigger Event: push
File details
Details for the file worai-6.17.6-py3-none-any.whl.
File metadata
- Download URL: worai-6.17.6-py3-none-any.whl
- Upload date:
- Size: 174.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0f51439cc13c5da341b0c88d5647f599d5cf910577f23b40c92eb083a8f37987 |
| MD5 | 3485318857998cdf886bdd5a65429c34 |
| BLAKE2b-256 | 36b7efc8c6061c76b39cd3f80644e7d097cc280f9af3a699686a5888b0761c77 |
Provenance
The following attestation bundles were made for worai-6.17.6-py3-none-any.whl:
Publisher: publish.yml on wordlift/worai
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: worai-6.17.6-py3-none-any.whl
- Subject digest: 0f51439cc13c5da341b0c88d5647f599d5cf910577f23b40c92eb083a8f37987
- Sigstore transparency entry: 1185763414
- Sigstore integration time:
- Permalink: wordlift/worai@6b9b8f8c7b15ef2f02d8bbfc4548e519c2216acb
- Branch / Tag: refs/tags/v6.17.6
- Owner: https://github.com/wordlift
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6b9b8f8c7b15ef2f02d8bbfc4548e519c2216acb
- Trigger Event: push