Microsoft Purview CLI with comprehensive automation capabilities

pvw-cli — Microsoft Purview Command-Line Interface

A Python CLI and library for automating Microsoft Purview. Covers the Data Map, Unified Catalog, Collections, Search, Lineage, Scan, and Management APIs.


📊 API Coverage (auto-updated 2026-05-20)

Static Coverage — Defined vs Implemented

| Metric | Value |
| --- | --- |
| Service groups defined in endpoints.py | 17 |
| Groups with a CLI file | 17 / 17 (100%) |
| Total operations defined | 415 |
| Total CLI commands wired | 383 (~92% op-level) |

| Group | Defined ops | CLI file(s) | Commands |
| --- | --- | --- | --- |
| account | 10 | account.py | 9 |
| collections | 12 | collections.py | 11 |
| data_quality | 35 | quality.py | 35 |
| devops_policies | 7 | policystore.py | 10 |
| discovery | 11 | search.py | 8 |
| entity | 40 | entity.py | 56 |
| glossary | 44 | glossary.py | 32 |
| lineage | 12 | lineage.py | 22 |
| management | 28 | management.py | 11 |
| metadata_policies | 6 | policystore.py | 10 |
| relationship | 9 | relationship.py | 5 |
| scanning | 36 | scan.py | 23 |
| self_service_policies | 5 | policystore.py | 10 |
| sharing | 16 | share.py | 31 |
| types | 27 | types.py | 23 |
| unified_catalog | 66 | unified_catalog.py, domain.py | 76 |
| workflow | 51 | workflow.py | 11 |
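
The op-level percentage is the total number of wired commands divided by the total number of defined operations. A minimal Python sketch that recomputes both totals from the per-group rows; the numbers are copied from the table, and the names COVERAGE and coverage_summary are illustrative only:

```python
# (defined operations, CLI commands) per service group, copied from the table.
COVERAGE = {
    "account": (10, 9),
    "collections": (12, 11),
    "data_quality": (35, 35),
    "devops_policies": (7, 10),
    "discovery": (11, 8),
    "entity": (40, 56),
    "glossary": (44, 32),
    "lineage": (12, 22),
    "management": (28, 11),
    "metadata_policies": (6, 10),
    "relationship": (9, 5),
    "scanning": (36, 23),
    "self_service_policies": (5, 10),
    "sharing": (16, 31),
    "types": (27, 23),
    "unified_catalog": (66, 76),
    "workflow": (51, 11),
}


def coverage_summary(rows):
    """Return (total_defined, total_commands, percent) for the given rows."""
    total_defined = sum(defined for defined, _ in rows.values())
    total_commands = sum(commands for _, commands in rows.values())
    percent = 100.0 * total_commands / total_defined
    return total_defined, total_commands, percent


defined, commands, percent = coverage_summary(COVERAGE)
print(f"{commands} / {defined} = {percent:.1f}%")  # 383 / 415 = 92.3%
```

The 383 total only adds up if the 10 commands listed for policystore.py are counted once for each of the three policy groups that share that file, so the per-group command counts are not strictly comparable to the per-group operation counts.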

Live Probe Results

Endpoint Method Status Result

Live probe summary: 0 live, 0 deployed (needs payload/auth fix), 0 missing/404

Install

pip install pvw-cli

For the latest development version:

git clone https://github.com/Keayoub/pvw-cli.git
cd pvw-cli
pip install -r requirements.txt
pip install -e .

Configuration

Set these three environment variables before running any command:

| Variable | Description |
| --- | --- |
| PURVIEW_ACCOUNT_NAME | Your Purview account name (e.g. mycompany-purview) |
| PURVIEW_ACCOUNT_ID | Your Azure Tenant ID (used as the Purview account ID for UC APIs) |
| PURVIEW_RESOURCE_GROUP | The resource group containing your Purview account |

PowerShell:

$env:PURVIEW_ACCOUNT_NAME = "your-purview-account"
$env:PURVIEW_ACCOUNT_ID   = "your-tenant-id-guid"
$env:PURVIEW_RESOURCE_GROUP = "your-resource-group"

Bash / Linux / macOS:

export PURVIEW_ACCOUNT_NAME=your-purview-account
export PURVIEW_ACCOUNT_ID=your-tenant-id-guid
export PURVIEW_RESOURCE_GROUP=your-resource-group

To find your Tenant ID:

az account show --query tenantId -o tsv
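
When the CLI is driven from a script, it can help to check the three variables before invoking any command. The following is a minimal sketch, not part of pvw-cli; the helper name missing_purview_vars is hypothetical:

```python
import os

# The three variables the Configuration section asks for.
REQUIRED_VARS = (
    "PURVIEW_ACCOUNT_NAME",
    "PURVIEW_ACCOUNT_ID",
    "PURVIEW_RESOURCE_GROUP",
)


def missing_purview_vars(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


missing = missing_purview_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All Purview environment variables are set.")
```

Passing a plain dict instead of os.environ makes the check easy to exercise in tests.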

Authentication

The CLI uses DefaultAzureCredential, which walks a fixed chain of credential sources and uses the first one that succeeds. The sources most relevant here, in the order the chain tries them:

  1. Service Principal: set AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET
  2. Managed Identity: works automatically on Azure VMs, App Service, etc.
  3. Azure CLI: run az login (easiest for local use)

Legacy tenant note: If you get AADSTS500011: resource principal https://purview.azure.com not found, your tenant uses the older service principal. Set:

export PURVIEW_AUTH_SCOPE=https://purview.azure.net/.default

Check which one your tenant uses:

az ad sp show --id "73c2949e-da2d-457a-9607-fcc665198967" --query servicePrincipalNames -o json
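
To confirm outside the CLI that a token can be issued for a given scope, a short check with the azure-identity package may help. This is a sketch under assumptions, not the CLI's own code: the default scope below is inferred from the resource named in the AADSTS500011 message, and resolve_scope and fetch_token are hypothetical helper names. Only DefaultAzureCredential and its get_token method come from azure-identity.

```python
import os

# Assumption: the non-legacy scope is inferred from the resource named in the
# AADSTS500011 error message; the README does not state the CLI's default.
DEFAULT_SCOPE = "https://purview.azure.com/.default"


def resolve_scope(env=None):
    """Return PURVIEW_AUTH_SCOPE if set, otherwise the assumed default scope."""
    env = os.environ if env is None else env
    return env.get("PURVIEW_AUTH_SCOPE") or DEFAULT_SCOPE


def fetch_token(scope=None):
    """Request an access token (requires `pip install azure-identity`)."""
    # Imported lazily so resolve_scope works without the package installed.
    from azure.identity import DefaultAzureCredential

    credential = DefaultAzureCredential()
    return credential.get_token(scope or resolve_scope())
```

If fetch_token() fails with the default scope but succeeds after exporting the legacy scope, the tenant most likely needs PURVIEW_AUTH_SCOPE set as described in the note.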

Command Groups

pvw account          Account management
pvw collections      Collections CRUD and permissions
pvw entity           Entity read, create, update, bulk operations
pvw glossary         Classic glossary terms
pvw lineage          Lineage creation and CSV import
pvw scan             Data source scanning
pvw search           Search and discovery
pvw types            Type definitions
pvw uc               Unified Catalog (domains, terms, data products, OKRs, CDEs, quality)
pvw workflow         Approval workflows
pvw diagnostics      Cache stats and profile info

Run pvw <command> --help for full options on any command.


📚 Quick Start & Documentation

Quick Reference Guide

For a comprehensive command reference with examples, see docs/quick-reference.md.

This guide covers:

  • All Unified Catalog commands (terms, domains, data products, CDEs, OKRs)
  • Data Quality commands and workflow examples
  • Facets, hierarchy, and relationship operations
  • Common patterns and troubleshooting tips

Examples

Search

# Search by keyword
pvw search query --keywords "customer" --limit 10

# Table output (default), JSON, or colored JSON
pvw search query --keywords "sales" --limit 5
pvw search query --keywords "sales" --limit 5 --output json
pvw search query --keywords "sales" --limit 5 --output jsonc

# Show GUIDs in output (useful for follow-up operations)
pvw search query --keywords "customer" --show-ids

# Autocomplete and suggestions
pvw search autocomplete --keywords "ord" --limit 5
pvw search suggest --keywords "prod" --limit 5

Entity

# List entities (up to 25)
pvw entity list --limit 25

# Filter by type
pvw entity list --type-name azure_sql_table --limit 10

# Read entity by GUID
pvw entity read --guid "4fae348b-e960-42f7-834c-38f6f6f60000"

# Update a single attribute
pvw entity update-attribute \
  --guid "4fae348b-e960-42f7-834c-38f6f6f60000" \
  --attribute description \
  --value "Customer address data - SalesLT schema"

# Add a classification
pvw entity add-classification \
  --guid "ea3412c3-7387-4bc1-9923-11f6f6f60000" \
  --classification "MICROSOFT.PERSONAL.EMAIL"

# Business metadata
pvw entity add-business-metadata \
  --guid "entity-guid" \
  --bm-name "Compliance" \
  --attr-name "DataOwner" \
  --attr-value "finance-team"

Collections

# List collections and hierarchy
pvw collections list
pvw collections read-hierarchy --collection-name "Data Engineering"

# Create a collection
pvw collections create \
  --name "analytics" \
  --friendly-name "Analytics Team" \
  --description "Assets for the analytics team"

# View permissions
pvw collections read-permissions --collection-name "analytics"

Unified Catalog (UC)

# Domains
pvw uc domain list
pvw uc domain create --name "Finance" --description "Financial data governance"
pvw uc domain get --domain-id "abc-123"

# Glossary terms
pvw uc term list --domain-id "abc-123"
pvw uc term list --domain-id "abc-123" --output json
pvw uc term create --name "Customer" --domain-id "abc-123" --description "A person who purchases products"
pvw uc term show --term-id "term-456"
pvw uc term update --term-id "term-456" --description "Updated definition"
pvw uc term delete --term-id "term-456" --confirm

# Bulk term import from CSV
pvw uc term import-csv --csv-file samples/csv/uc_terms_bulk_example.csv --domain-id "abc-123" --dry-run
pvw uc term import-csv --csv-file samples/csv/uc_terms_bulk_example.csv --domain-id "abc-123"

# Bulk term import from JSON
pvw uc term import-json --json-file samples/json/term/uc_terms_bulk_example.json --domain-id "abc-123"

# Sync UC terms to a classic glossary
pvw uc term sync-classic --domain-id "abc-123" --glossary-guid "gloss-guid"
pvw uc term sync-classic --domain-id "abc-123" --glossary-guid "gloss-guid" --update-existing
pvw uc term sync-classic --domain-id "abc-123" --glossary-guid "gloss-guid" --update-existing --delete-removed
pvw uc term sync-classic --domain-id "abc-123" --glossary-guid "gloss-guid" --update-existing --dry-run

# Data products
pvw uc dataproduct list --domain-id "abc-123"
pvw uc dataproduct create --name "Customer Analytics" --domain-id "abc-123" --type Analytical --status Draft
pvw uc dataproduct update --product-id "prod-789" --status Published --endorsed

# Link a data product to an entity
pvw uc dataproduct link-entity \
  --id "prod-789" \
  --entity-id "4fae348b-e960-42f7-834c-38f6f6f60000" \
  --type-name azure_sql_table

# Business metadata cleanup
pvw uc metadata list
pvw uc metadata cleanup --name "SecteursActivite" --check-only --verbose
pvw uc metadata cleanup --name "SecteursActivite" --verbose

# Delete a definition directly (definition/group name)
pvw uc metadata delete-definition --name "Glossaire" --dry-run
pvw uc metadata delete-definition --name "Glossaire"

# Objectives (OKRs)
pvw uc objective list --domain-id "abc-123"
pvw uc objective create --definition "Improve data quality score to 95%" --domain-id "abc-123"

# Critical Data Elements (CDEs)
pvw uc cde list --domain-id "abc-123"
pvw uc cde create --name "Social Security Number" --data-type String --domain-id "abc-123"
pvw uc cde link-entity --id "cde-789" --entity-id "ea3412c3-7387-4bc1-9923-11f6f6f60000"

# Facets and analytics
pvw uc term facets --output table
pvw uc dataproduct facets --domain-id "abc-123" --output json
pvw uc cde facets --output table

# Governance health
pvw uc health query
pvw uc health query --severity High
pvw uc health summary
pvw uc health update --action-id "action-guid" --status InProgress
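
The bulk term import shown earlier pairs a --dry-run pass with the real import. A minimal Python sketch of that two-step pattern follows. It uses only the flags shown in the examples, and it assumes the dry run exits with a non-zero status when validation fails, which the README does not state. The helper names are hypothetical.

```python
import subprocess


def term_import_cmd(csv_file, domain_id, dry_run=False):
    """Build the `pvw uc term import-csv` argument list."""
    cmd = [
        "pvw", "uc", "term", "import-csv",
        "--csv-file", csv_file,
        "--domain-id", domain_id,
    ]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def import_terms(csv_file, domain_id, run=subprocess.run):
    """Run the dry run first; start the real import only if it exits with 0."""
    preview = run(term_import_cmd(csv_file, domain_id, dry_run=True))
    if preview.returncode != 0:
        # Assumption: a non-zero exit code signals a failed dry run.
        return preview.returncode
    return run(term_import_cmd(csv_file, domain_id)).returncode
```

A call such as import_terms("samples/csv/uc_terms_bulk_example.csv", "abc-123") would run both steps. The run parameter exists so the function can be tested without invoking the CLI.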

Lineage

# Create column-level lineage
pvw lineage create-column \
  --process-name "ETL_Sales_Transform" \
  --source-table-guid "9ebbd583-4987-4d1b-b4f5-d8f6f6f60000" \
  --target-table-guids "c88126ba-5fb5-4d33-bbe2-5ff6f6f60000" \
  --column-mapping "ProductID:ProductID,Name:Name"

# Import from CSV
pvw lineage validate lineage_data.csv
pvw lineage import lineage_data.csv
pvw lineage sample output.csv --num-samples 10 --template detailed

Lineage CSV columns: source_entity_guid, target_entity_guid, relationship_type, process_name, description, confidence_score, owner, metadata
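
A quick local check that a file's header row contains the columns listed above can catch typos before running pvw lineage validate. The sketch below is not a substitute for that command. Whether every column is mandatory is not stated, so the function only reports which listed names are absent. The function name is hypothetical.

```python
import csv
import io

# Column names as listed in the README.
LINEAGE_COLUMNS = [
    "source_entity_guid",
    "target_entity_guid",
    "relationship_type",
    "process_name",
    "description",
    "confidence_score",
    "owner",
    "metadata",
]


def missing_lineage_columns(csv_text):
    """Return the listed columns that are absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return [name for name in LINEAGE_COLUMNS if name not in present]
```

For a file on disk, read it first, for example with open("lineage_data.csv", encoding="utf-8-sig").read(), so that a leading byte-order mark does not corrupt the first column name.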

Classic Glossary

pvw glossary list-terms --glossary-guid "your-glossary-guid"
pvw glossary create-term --payload-file term.json

Workflows

pvw workflow list
pvw workflow get --workflow-id "workflow-123"
pvw workflow create --workflow-id "approval-1" --payload-file workflow-definition.json
pvw workflow execute --workflow-id "workflow-123"
pvw workflow executions --workflow-id "workflow-123"

Diagnostics

pvw diagnostics cache-stats
pvw diagnostics profile-info
pvw diagnostics clear-cache

Output Formats

Most list commands support --output:

| Format | Use case |
| --- | --- |
| table | Default, human-readable Rich table |
| json | Plain JSON for piping to PowerShell, bash, jq |
| jsonc | Colored JSON for viewing in terminal |

PowerShell example:

$terms = pvw uc term list --domain-id $domainId --output json | ConvertFrom-Json
$terms | Where-Object { $_.status -eq "Draft" } | Export-Csv draft_terms.csv -NoTypeInformation

Bash / jq example:

pvw uc term list --domain-id "$DOMAIN_ID" --output json | jq '.[] | .name'
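
The same pattern works from Python with the standard library. This sketch assumes, as the PowerShell and jq examples imply, that --output json prints a JSON array of objects with name and status fields. The helper names pvw_json and draft_term_names are hypothetical.

```python
import json
import subprocess


def pvw_json(args, run=subprocess.run):
    """Run `pvw <args> --output json` and return the parsed JSON document."""
    result = run(
        ["pvw", *args, "--output", "json"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)


def draft_term_names(terms):
    """Return the names of terms whose status is Draft."""
    # Assumption: each term is a dict with "name" and "status" keys.
    return [term.get("name") for term in terms if term.get("status") == "Draft"]
```

For example, draft_term_names(pvw_json(["uc", "term", "list", "--domain-id", "abc-123"])) mirrors the PowerShell filter above.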

Bulk Import CSV Format (Terms)

name,description,status,acronym,owner_id,resource_name,resource_url
Customer Acquisition Cost,Cost to acquire a new customer,Draft,CAC,<entra-object-id-guid>,Metrics Guide,https://docs.example.com

Notes:

  • owner_id must be an Entra ID Object ID (GUID), not an email address
  • Terms in unpublished domains must use Draft status
  • Sample files: samples/csv/uc_terms_bulk_example.csv, samples/json/term/uc_terms_bulk_example.json
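
Because an email address in owner_id is an easy mistake to make, a local pre-check of the CSV can be useful before importing. The sketch below flags non-empty owner_id values that do not parse as a GUID. It is not part of pvw-cli, the function name is hypothetical, and it does not check the Draft status rule, which depends on the domain's state in Purview.

```python
import csv
import io
import uuid


def invalid_owner_rows(csv_text):
    """Return (line_number, owner_id) pairs for owner_id values that are not GUIDs."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # The header is line 1, so the first data row is line 2
    # (assuming no field contains an embedded newline).
    for line_number, row in enumerate(reader, start=2):
        owner_id = (row.get("owner_id") or "").strip()
        if not owner_id:
            continue  # the README does not say whether owner_id is mandatory
        try:
            # uuid.UUID also accepts un-hyphenated and braced forms,
            # so this is a loose check rather than a strict format match.
            uuid.UUID(owner_id)
        except ValueError:
            problems.append((line_number, owner_id))
    return problems
```

An empty result means every owner_id that is present parses as a GUID. It does not confirm that the object exists in Entra ID.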

Sample Files

| Path | Contents |
| --- | --- |
| samples/csv/uc_terms_bulk_example.csv | 8 sample UC terms for import |
| samples/json/term/uc_terms_bulk_example.json | 8 data management terms (JSON format) |
| samples/csv/lineage_example.csv | Sample lineage relationships |
| samples/notebooks (basic)/ | Basic Purview CLI notebook examples |
| samples/notebooks (plus)/ | Advanced examples including bulk import |

Requirements

  • Python 3.8+
  • Microsoft Purview account
  • Azure CLI (az login) or Service Principal credentials

License

This project is licensed under the Apache License 2.0.

See LICENSE for the full license text.
