Microsoft Purview CLI with comprehensive automation capabilities

pvw-cli — Microsoft Purview Command-Line Interface

A Python CLI and library for automating Microsoft Purview. Covers the Data Map, Unified Catalog, Collections, Search, Lineage, Scan, and Management APIs.


Install

pip install pvw-cli

For the latest development version:

git clone https://github.com/Keayoub/pvw-cli.git
cd pvw-cli
pip install -r requirements.txt
pip install -e .

Configuration

Set these three environment variables before running any command:

Variable                 Description
PURVIEW_ACCOUNT_NAME     Your Purview account name (e.g. mycompany-purview)
PURVIEW_ACCOUNT_ID       Your Azure Tenant ID (used as the Purview account ID for UC APIs)
PURVIEW_RESOURCE_GROUP   The resource group containing your Purview account

PowerShell:

$env:PURVIEW_ACCOUNT_NAME = "your-purview-account"
$env:PURVIEW_ACCOUNT_ID   = "your-tenant-id-guid"
$env:PURVIEW_RESOURCE_GROUP = "your-resource-group"

Bash / Linux / macOS:

export PURVIEW_ACCOUNT_NAME=your-purview-account
export PURVIEW_ACCOUNT_ID=your-tenant-id-guid
export PURVIEW_RESOURCE_GROUP=your-resource-group

To find your Tenant ID:

az account show --query tenantId -o tsv
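
A missing variable is easy to overlook when switching shells. The sketch below is not part of pvw-cli; it is a small stand-alone check, assuming only the three variable names listed above, that reports which of them are unset or empty. The function name missing_purview_vars is made up for this example.

```python
import os

# The three variable names come from the Configuration table above.
REQUIRED_VARS = (
    "PURVIEW_ACCOUNT_NAME",
    "PURVIEW_ACCOUNT_ID",
    "PURVIEW_RESOURCE_GROUP",
)


def missing_purview_vars(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_purview_vars()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All Purview variables are set.")
```

Passing a plain dict instead of os.environ makes the check easy to try without touching the real environment.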

Authentication

The CLI authenticates with DefaultAzureCredential, which tries a chain of credential sources and uses the first one that succeeds. The sources most relevant here, in the order the chain checks them:

  1. Service Principal: set AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET
  2. Managed Identity: works automatically on Azure VMs, App Service, etc.
  3. Azure CLI: run az login (easiest for local use)

Legacy tenant note: If you get AADSTS500011: resource principal https://purview.azure.com not found, your tenant uses the older service principal. Set:

export PURVIEW_AUTH_SCOPE=https://purview.azure.net/.default

To check which resource URI your tenant's Purview service principal exposes:

az ad sp show --id "73c2949e-da2d-457a-9607-fcc665198967" --query servicePrincipalNames -o json
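
The override logic can be pictured with a short sketch. This is not the CLI's own code: the default scope below is an assumption inferred from the error message above, and both function names are hypothetical. The second helper filters the JSON array printed by the az ad sp show command down to the Purview resource URIs.

```python
import json
import os

# Assumption: inferred from the AADSTS500011 message above; the CLI's
# actual default scope may differ.
ASSUMED_DEFAULT_SCOPE = "https://purview.azure.com/.default"


def resolve_auth_scope(env=None):
    """Return PURVIEW_AUTH_SCOPE if set, otherwise the assumed default."""
    env = os.environ if env is None else env
    override = env.get("PURVIEW_AUTH_SCOPE", "").strip()
    return override or ASSUMED_DEFAULT_SCOPE


def purview_resource_uris(sp_names_json):
    """Pick the Purview resource URIs out of a servicePrincipalNames JSON array."""
    names = json.loads(sp_names_json)
    return [name for name in names if name.startswith("https://purview.azure.")]
```

If the filtered list contains only the purview.azure.net URI, the tenant is on the legacy service principal and the override above applies.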

Command Groups

pvw account          Account management
pvw collections      Collections CRUD and permissions
pvw entity           Entity read, create, update, bulk operations
pvw glossary         Classic glossary terms
pvw lineage          Lineage creation and CSV import
pvw scan             Data source scanning
pvw search           Search and discovery
pvw types            Type definitions
pvw uc               Unified Catalog (domains, terms, data products, OKRs, CDEs, quality)
pvw workflow         Approval workflows
pvw diagnostics      Cache stats and profile info

Run pvw <command> --help for full options on any command.


📚 Quick Start & Documentation

Quick Reference Guide

For a comprehensive command reference with examples, see docs/quick-reference.md

This guide covers:

  • All Unified Catalog commands (terms, domains, data products, CDEs, OKRs)
  • Data Quality commands and workflow examples
  • Facets, hierarchy, and relationship operations
  • Common patterns and troubleshooting tips


Examples

Search

# Search by keyword
pvw search query --keywords "customer" --limit 10

# Table output (default), JSON, or colored JSON
pvw search query --keywords "sales" --limit 5
pvw search query --keywords "sales" --limit 5 --output json
pvw search query --keywords "sales" --limit 5 --output jsonc

# Show GUIDs in output (useful for follow-up operations)
pvw search query --keywords "customer" --show-ids

# Autocomplete and suggestions
pvw search autocomplete --keywords "ord" --limit 5
pvw search suggest --keywords "prod" --limit 5
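
The same search can be driven from a script. The following is a minimal sketch that shells out to the CLI with the flags shown above and parses the result; it assumes pvw is on PATH and that --output json writes nothing but JSON to stdout. The helper names build_search_argv and run_search are made up for this example and are not part of the pvw-cli library.

```python
import json
import subprocess


def build_search_argv(keywords, limit=10):
    """Build the argument list for `pvw search query` with JSON output."""
    return [
        "pvw", "search", "query",
        "--keywords", keywords,
        "--limit", str(limit),
        "--output", "json",
    ]


def run_search(keywords, limit=10):
    """Run the search and return the parsed JSON (assumes clean JSON on stdout)."""
    result = subprocess.run(
        build_search_argv(keywords, limit),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when keywords contain spaces.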

Entity

# List entities (up to 25)
pvw entity list --limit 25

# Filter by type
pvw entity list --type-name azure_sql_table --limit 10

# Read entity by GUID
pvw entity read --guid "4fae348b-e960-42f7-834c-38f6f6f60000"

# Update a single attribute
pvw entity update-attribute \
  --guid "4fae348b-e960-42f7-834c-38f6f6f60000" \
  --attribute description \
  --value "Customer address data - SalesLT schema"

# Add a classification
pvw entity add-classification \
  --guid "ea3412c3-7387-4bc1-9923-11f6f6f60000" \
  --classification "MICROSOFT.PERSONAL.EMAIL"

# Business metadata
pvw entity add-business-metadata \
  --guid "entity-guid" \
  --bm-name "Compliance" \
  --attr-name "DataOwner" \
  --attr-value "finance-team"

Collections

# List collections and hierarchy
pvw collections list
pvw collections read-hierarchy --collection-name "Data Engineering"

# Create a collection
pvw collections create \
  --name "analytics" \
  --friendly-name "Analytics Team" \
  --description "Assets for the analytics team"

# View permissions
pvw collections read-permissions --collection-name "analytics"

Unified Catalog (UC)

# Domains
pvw uc domain list
pvw uc domain create --name "Finance" --description "Financial data governance"
pvw uc domain get --domain-id "abc-123"

# Glossary terms
pvw uc term list --domain-id "abc-123"
pvw uc term list --domain-id "abc-123" --output json
pvw uc term create --name "Customer" --domain-id "abc-123" --description "A person who purchases products"
pvw uc term show --term-id "term-456"
pvw uc term update --term-id "term-456" --description "Updated definition"
pvw uc term delete --term-id "term-456" --confirm

# Bulk term import from CSV
pvw uc term import-csv --csv-file samples/csv/uc_terms_bulk_example.csv --domain-id "abc-123" --dry-run
pvw uc term import-csv --csv-file samples/csv/uc_terms_bulk_example.csv --domain-id "abc-123"

# Bulk term import from JSON
pvw uc term import-json --json-file samples/json/term/uc_terms_bulk_example.json --domain-id "abc-123"

# Sync UC terms to a classic glossary
pvw uc term sync-classic --domain-id "abc-123" --glossary-guid "gloss-guid"
pvw uc term sync-classic --domain-id "abc-123" --glossary-guid "gloss-guid" --update-existing
pvw uc term sync-classic --domain-id "abc-123" --glossary-guid "gloss-guid" --update-existing --delete-removed
pvw uc term sync-classic --domain-id "abc-123" --glossary-guid "gloss-guid" --update-existing --dry-run

# Data products
pvw uc dataproduct list --domain-id "abc-123"
pvw uc dataproduct create --name "Customer Analytics" --domain-id "abc-123" --type Analytical --status Draft
pvw uc dataproduct update --product-id "prod-789" --status Published --endorsed

# Link a data product to an entity
pvw uc dataproduct link-entity \
  --id "prod-789" \
  --entity-id "4fae348b-e960-42f7-834c-38f6f6f60000" \
  --type-name azure_sql_table

# Objectives (OKRs)
pvw uc objective list --domain-id "abc-123"
pvw uc objective create --definition "Improve data quality score to 95%" --domain-id "abc-123"

# Critical Data Elements (CDEs)
pvw uc cde list --domain-id "abc-123"
pvw uc cde create --name "Social Security Number" --data-type String --domain-id "abc-123"
pvw uc cde link-entity --id "cde-789" --entity-id "ea3412c3-7387-4bc1-9923-11f6f6f60000"

# Facets and analytics
pvw uc term facets --output table
pvw uc dataproduct facets --domain-id "abc-123" --output json
pvw uc cde facets --output table

# Governance health
pvw uc health query
pvw uc health query --severity High
pvw uc health summary
pvw uc health update --action-id "action-guid" --status InProgress

Lineage

# Create column-level lineage
pvw lineage create-column \
  --process-name "ETL_Sales_Transform" \
  --source-table-guid "9ebbd583-4987-4d1b-b4f5-d8f6f6f60000" \
  --target-table-guids "c88126ba-5fb5-4d33-bbe2-5ff6f6f60000" \
  --column-mapping "ProductID:ProductID,Name:Name"

# Import from CSV
pvw lineage validate lineage_data.csv
pvw lineage import lineage_data.csv
pvw lineage sample output.csv --num-samples 10 --template detailed

Lineage CSV columns: source_entity_guid, target_entity_guid, relationship_type, process_name, description, confidence_score, owner, metadata
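
A file with these columns can be generated rather than typed by hand. The sketch below uses Python's csv module with the column names listed above; the function name lineage_csv is hypothetical, and which columns are required and which values each accepts are not stated here, so check the output with pvw lineage validate before importing.

```python
import csv
import io

# Column names as listed above.
LINEAGE_COLUMNS = [
    "source_entity_guid",
    "target_entity_guid",
    "relationship_type",
    "process_name",
    "description",
    "confidence_score",
    "owner",
    "metadata",
]


def lineage_csv(rows):
    """Render a list of dicts as lineage CSV text; missing columns are left blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=LINEAGE_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({col: row.get(col, "") for col in LINEAGE_COLUMNS})
    return buffer.getvalue()


if __name__ == "__main__":
    # GUIDs reused from the create-column example above.
    print(lineage_csv([{
        "source_entity_guid": "9ebbd583-4987-4d1b-b4f5-d8f6f6f60000",
        "target_entity_guid": "c88126ba-5fb5-4d33-bbe2-5ff6f6f60000",
        "process_name": "ETL_Sales_Transform",
    }]))
```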

Classic Glossary

pvw glossary list-terms --glossary-guid "your-glossary-guid"
pvw glossary create-term --payload-file term.json

Workflows

pvw workflow list
pvw workflow get --workflow-id "workflow-123"
pvw workflow create --workflow-id "approval-1" --payload-file workflow-definition.json
pvw workflow execute --workflow-id "workflow-123"
pvw workflow executions --workflow-id "workflow-123"

Diagnostics

pvw diagnostics cache-stats
pvw diagnostics profile-info
pvw diagnostics clear-cache

Output Formats

Most list commands support --output:

Format   Use case
table    Default; human-readable Rich table
json     Plain JSON for piping to PowerShell, bash, jq
jsonc    Colored JSON for viewing in terminal

PowerShell example:

$terms = pvw uc term list --domain-id $domainId --output json | ConvertFrom-Json
$terms | Where-Object { $_.status -eq "Draft" } | Export-Csv draft_terms.csv -NoTypeInformation

Bash / jq example:

pvw uc term list --domain-id $DOMAIN_ID --output json | jq '.[] | .name'
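
The same filtering works in Python. This sketch assumes, as the PowerShell and jq examples imply, that the JSON output is an array of objects with name and status fields; the function name names_with_status is made up for this example.

```python
import json


def names_with_status(json_text, status="Draft"):
    """Return the names of terms whose status matches, given JSON array text."""
    terms = json.loads(json_text)
    return [term.get("name") for term in terms if term.get("status") == status]


if __name__ == "__main__":
    # Hypothetical sample shaped like the assumed CLI output.
    sample = '[{"name": "Customer", "status": "Draft"}, {"name": "Order", "status": "Published"}]'
    print(names_with_status(sample))
```

In practice the JSON text would come from the captured stdout of pvw uc term list --output json.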

Bulk Import CSV Format (Terms)

name,description,status,acronym,owner_id,resource_name,resource_url
Customer Acquisition Cost,Cost to acquire a new customer,Draft,CAC,<entra-object-id-guid>,Metrics Guide,https://docs.example.com

Notes:

  • owner_id must be an Entra ID Object ID (GUID), not an email address
  • Terms in unpublished domains must use Draft status
  • Sample files: samples/csv/uc_terms_bulk_example.csv, samples/json/term/uc_terms_bulk_example.json
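
The owner_id rule is the one most likely to trip up an import, so it can be worth checking a file first. The sketch below is a stand-alone pre-check, not the CLI's own validation: it assumes the header shown above and flags rows whose name is blank or whose owner_id does not parse as a GUID. The function name check_term_rows is hypothetical, and whether owner_id may be left empty is an assumption (empty values are skipped here).

```python
import csv
import io
import uuid

# Header as shown in the example above.
TERM_COLUMNS = [
    "name", "description", "status", "acronym",
    "owner_id", "resource_name", "resource_url",
]


def check_term_rows(csv_text):
    """Return a list of human-readable problems found in term CSV text."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != TERM_COLUMNS:
        problems.append("header differs from the documented example")
    for line_no, row in enumerate(reader, start=2):
        if not (row.get("name") or "").strip():
            problems.append(f"line {line_no}: name is empty")
        owner = (row.get("owner_id") or "").strip()
        if owner:
            try:
                # uuid.UUID is lenient about braces and hyphens but
                # rejects email addresses and other non-GUID text.
                uuid.UUID(owner)
            except ValueError:
                problems.append(f"line {line_no}: owner_id is not a GUID: {owner!r}")
    return problems
```

An empty result only means the file passed these two checks; the --dry-run flag on pvw uc term import-csv remains the authoritative test.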

Sample Files

Path                                           Contents
samples/csv/uc_terms_bulk_example.csv          8 sample UC terms for import
samples/json/term/uc_terms_bulk_example.json   8 data management terms (JSON format)
samples/csv/lineage_example.csv                Sample lineage relationships
samples/notebooks (basic)/                     Basic Purview CLI notebook examples
samples/notebooks (plus)/                      Advanced examples including bulk import

Requirements

  • Python 3.8+
  • Microsoft Purview account
  • Azure CLI (az login) or Service Principal credentials

License

See LICENSE for details.
