Microsoft Purview CLI with comprehensive automation capabilities
pvw-cli — Microsoft Purview Command-Line Interface
A Python CLI and library for automating Microsoft Purview. Covers the Data Map, Unified Catalog, Collections, Search, Lineage, Scan, and Management APIs.
Install
pip install pvw-cli
For the latest development version:
git clone https://github.com/Keayoub/pvw-cli.git
cd pvw-cli
pip install -r requirements.txt
pip install -e .
Configuration
Set these three environment variables before running any command:
| Variable | Description |
|---|---|
| PURVIEW_ACCOUNT_NAME | Your Purview account name (e.g. mycompany-purview) |
| PURVIEW_ACCOUNT_ID | Your Azure Tenant ID (used as the Purview account ID for UC APIs) |
| PURVIEW_RESOURCE_GROUP | The resource group containing your Purview account |
PowerShell:
$env:PURVIEW_ACCOUNT_NAME = "your-purview-account"
$env:PURVIEW_ACCOUNT_ID = "your-tenant-id-guid"
$env:PURVIEW_RESOURCE_GROUP = "your-resource-group"
Bash / Linux / macOS:
export PURVIEW_ACCOUNT_NAME=your-purview-account
export PURVIEW_ACCOUNT_ID=your-tenant-id-guid
export PURVIEW_RESOURCE_GROUP=your-resource-group
To find your Tenant ID:
az account show --query tenantId -o tsv
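A script that wraps the CLI can check for these variables before it calls any command. The following is a minimal sketch, not part of pvw-cli: the function name `missing_purview_vars` is hypothetical, and only the three variable names come from the table above.

```python
import os

# Variable names taken from the configuration table above.
REQUIRED_VARS = (
    "PURVIEW_ACCOUNT_NAME",
    "PURVIEW_ACCOUNT_ID",
    "PURVIEW_RESOURCE_GROUP",
)


def missing_purview_vars(env=None):
    """Return the required variables that are unset or blank.

    Hypothetical helper for illustration; pvw-cli does not ship it.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


missing = missing_purview_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("Purview environment looks complete.")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to unit test.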
Authentication
The CLI uses DefaultAzureCredential and tries methods in this order:
- Azure CLI: run az login (easiest for local use)
- Service Principal: set AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET
- Managed Identity: works automatically on Azure VMs, App Service, etc.
Legacy tenant note: If you get AADSTS500011: resource principal https://purview.azure.com not found, your tenant uses the older service principal. Set:
export PURVIEW_AUTH_SCOPE=https://purview.azure.net/.default
Check which one your tenant uses:
az ad sp show --id "73c2949e-da2d-457a-9607-fcc665198967" --query servicePrincipalNames -o json
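The output of that query can be turned into a value for PURVIEW_AUTH_SCOPE. The sketch below is illustrative only. It assumes the query prints a JSON array of strings, and that the default scope is `https://purview.azure.com/.default` (inferred from the error text above, not confirmed by the CLI documentation). The helper name `pick_auth_scope` is hypothetical.

```python
import json

# Assumption: the default scope mirrors the resource named in the error text.
DEFAULT_SCOPE = "https://purview.azure.com/.default"
# The legacy scope is the value given in the note above.
LEGACY_SCOPE = "https://purview.azure.net/.default"


def pick_auth_scope(sp_names_json):
    """Choose a scope from the JSON printed by the `az ad sp show` query.

    Hypothetical helper; expects a JSON array of servicePrincipalNames.
    """
    names = {name.rstrip("/").lower() for name in json.loads(sp_names_json)}
    if "https://purview.azure.com" in names:
        return DEFAULT_SCOPE
    if "https://purview.azure.net" in names:
        return LEGACY_SCOPE
    raise ValueError("No Purview resource URI found in servicePrincipalNames")
```

If the function returns the legacy scope, export it as PURVIEW_AUTH_SCOPE as described above.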
Command Groups
pvw account Account management
pvw collections Collections CRUD and permissions
pvw entity Entity read, create, update, bulk operations
pvw glossary Classic glossary terms
pvw lineage Lineage creation and CSV import
pvw scan Data source scanning
pvw search Search and discovery
pvw types Type definitions
pvw uc Unified Catalog (domains, terms, data products, OKRs, CDEs, quality)
pvw workflow Approval workflows
pvw diagnostics Cache stats and profile info
Run pvw <command> --help for full options on any command.
📚 Quick Start & Documentation
Quick Reference Guide
For a comprehensive command reference with examples, see docs/quick-reference.md.
This guide covers:
- All Unified Catalog commands (terms, domains, data products, CDEs, OKRs)
- Data Quality commands and workflow examples
- Facets, hierarchy, and relationship operations
- Common patterns and troubleshooting tips
Additional Documentation
- API Implementation Status - Complete API coverage analysis
- Performance Guide - Optimization techniques and caching
- Authentication Troubleshooting - Fix auth issues
- Sample Notebooks - Jupyter notebooks with working examples
- Advanced Notebooks - Data visualization and analytics
Examples
Search
# Search by keyword
pvw search query --keywords "customer" --limit 10
# Table output (default), JSON, or colored JSON
pvw search query --keywords "sales" --limit 5
pvw search query --keywords "sales" --limit 5 --output json
pvw search query --keywords "sales" --limit 5 --output jsonc
# Show GUIDs in output (useful for follow-up operations)
pvw search query --keywords "customer" --show-ids
# Autocomplete and suggestions
pvw search autocomplete --keywords "ord" --limit 5
pvw search suggest --keywords "prod" --limit 5
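The search commands above can also be driven from Python by shelling out to the CLI. This sketch uses only the flags shown in the examples (`--keywords`, `--limit`, `--output json`). The function names are hypothetical, and the shape of the returned JSON is not assumed.

```python
import json
import subprocess


def build_search_cmd(keywords, limit=10):
    """Build the argument list for `pvw search query` with JSON output."""
    return [
        "pvw", "search", "query",
        "--keywords", keywords,
        "--limit", str(limit),
        "--output", "json",
    ]


def run_search(keywords, limit=10):
    """Run the search and return the parsed JSON.

    Requires pvw on PATH and valid credentials; raises CalledProcessError
    if the command exits with a non-zero status.
    """
    result = subprocess.run(
        build_search_cmd(keywords, limit),
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)
```

Passing the arguments as a list avoids shell quoting problems when keywords contain spaces.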
Entity
# List entities (first 25)
pvw entity list --limit 25
# Filter by type
pvw entity list --type-name azure_sql_table --limit 10
# Read entity by GUID
pvw entity read --guid "4fae348b-e960-42f7-834c-38f6f6f60000"
# Update a single attribute
pvw entity update-attribute \
--guid "4fae348b-e960-42f7-834c-38f6f6f60000" \
--attribute description \
--value "Customer address data - SalesLT schema"
# Add a classification
pvw entity add-classification \
--guid "ea3412c3-7387-4bc1-9923-11f6f6f60000" \
--classification "MICROSOFT.PERSONAL.EMAIL"
# Business metadata
pvw entity add-business-metadata \
--guid "entity-guid" \
--bm-name "Compliance" \
--attr-name "DataOwner" \
--attr-value "finance-team"
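To apply the same attribute update to several entities, the `update-attribute` command above can be generated once per GUID. This is a sketch built only from the documented flags. The CLI may offer its own bulk commands, which this example does not use, and the helper names are hypothetical.

```python
import shlex


def update_attribute_cmds(guids, attribute, value):
    """Yield one `pvw entity update-attribute` argument list per GUID."""
    for guid in guids:
        yield [
            "pvw", "entity", "update-attribute",
            "--guid", guid,
            "--attribute", attribute,
            "--value", value,
        ]


def preview(cmds):
    """Render commands as shell-quoted strings for review before running."""
    return [shlex.join(cmd) for cmd in cmds]
```

Each argument list can be passed to `subprocess.run` once the previewed commands look right.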
Collections
# List collections and hierarchy
pvw collections list
pvw collections read-hierarchy --collection-name "Data Engineering"
# Create a collection
pvw collections create \
--name "analytics" \
--friendly-name "Analytics Team" \
--description "Assets for the analytics team"
# View permissions
pvw collections read-permissions --collection-name "analytics"
Unified Catalog (UC)
# Domains
pvw uc domain list
pvw uc domain create --name "Finance" --description "Financial data governance"
pvw uc domain get --domain-id "abc-123"
# Glossary terms
pvw uc term list --domain-id "abc-123"
pvw uc term list --domain-id "abc-123" --output json
pvw uc term create --name "Customer" --domain-id "abc-123" --description "A person who purchases products"
pvw uc term show --term-id "term-456"
pvw uc term update --term-id "term-456" --description "Updated definition"
pvw uc term delete --term-id "term-456" --confirm
# Bulk term import from CSV
pvw uc term import-csv --csv-file samples/csv/uc_terms_bulk_example.csv --domain-id "abc-123" --dry-run
pvw uc term import-csv --csv-file samples/csv/uc_terms_bulk_example.csv --domain-id "abc-123"
# Bulk term import from JSON
pvw uc term import-json --json-file samples/json/term/uc_terms_bulk_example.json --domain-id "abc-123"
# Sync UC terms to a classic glossary
pvw uc term sync-classic --domain-id "abc-123" --glossary-guid "gloss-guid"
pvw uc term sync-classic --domain-id "abc-123" --glossary-guid "gloss-guid" --update-existing
pvw uc term sync-classic --domain-id "abc-123" --glossary-guid "gloss-guid" --update-existing --delete-removed
pvw uc term sync-classic --domain-id "abc-123" --glossary-guid "gloss-guid" --update-existing --dry-run
# Data products
pvw uc dataproduct list --domain-id "abc-123"
pvw uc dataproduct create --name "Customer Analytics" --domain-id "abc-123" --type Analytical --status Draft
pvw uc dataproduct update --product-id "prod-789" --status Published --endorsed
# Link a data product to an entity
pvw uc dataproduct link-entity \
--id "prod-789" \
--entity-id "4fae348b-e960-42f7-834c-38f6f6f60000" \
--type-name azure_sql_table
# Objectives (OKRs)
pvw uc objective list --domain-id "abc-123"
pvw uc objective create --definition "Improve data quality score to 95%" --domain-id "abc-123"
# Critical Data Elements (CDEs)
pvw uc cde list --domain-id "abc-123"
pvw uc cde create --name "Social Security Number" --data-type String --domain-id "abc-123"
pvw uc cde link-entity --id "cde-789" --entity-id "ea3412c3-7387-4bc1-9923-11f6f6f60000"
# Facets and analytics
pvw uc term facets --output table
pvw uc dataproduct facets --domain-id "abc-123" --output json
pvw uc cde facets --output table
# Governance health
pvw uc health query
pvw uc health query --severity High
pvw uc health summary
pvw uc health update --action-id "action-guid" --status InProgress
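The `import-csv` examples above run once with `--dry-run` and once without. That two-step pattern can be scripted as follows. This sketch assumes a failed dry run is reported through a non-zero exit code, which the documentation above does not state. The function names are hypothetical.

```python
import subprocess


def import_csv_cmd(csv_file, domain_id, dry_run=False):
    """Build `pvw uc term import-csv`, optionally with --dry-run."""
    cmd = [
        "pvw", "uc", "term", "import-csv",
        "--csv-file", csv_file,
        "--domain-id", domain_id,
    ]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def import_terms(csv_file, domain_id, runner=subprocess.run):
    """Run the dry run first; import for real only if it exits cleanly.

    Assumption: a non-zero return code means the dry run found a problem.
    """
    dry = runner(import_csv_cmd(csv_file, domain_id, dry_run=True))
    if dry.returncode != 0:
        return False
    real = runner(import_csv_cmd(csv_file, domain_id))
    return real.returncode == 0
```

The `runner` parameter exists so the flow can be tested with a stub instead of the real CLI.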
Lineage
# Create column-level lineage
pvw lineage create-column \
--process-name "ETL_Sales_Transform" \
--source-table-guid "9ebbd583-4987-4d1b-b4f5-d8f6f6f60000" \
--target-table-guids "c88126ba-5fb5-4d33-bbe2-5ff6f6f60000" \
--column-mapping "ProductID:ProductID,Name:Name"
# Import from CSV
pvw lineage validate lineage_data.csv
pvw lineage import lineage_data.csv
pvw lineage sample output.csv --num-samples 10 --template detailed
Lineage CSV columns: source_entity_guid, target_entity_guid, relationship_type, process_name, description, confidence_score, owner, metadata
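A file with these columns can be generated with Python's csv module. The sketch below reuses the GUIDs and process name from the column-level example above and leaves every other column empty, because the accepted values (for example for relationship_type) are not listed here. Whether empty cells pass validation is an assumption to confirm with `pvw lineage validate`.

```python
import csv
import io

# Column names as listed above.
LINEAGE_COLUMNS = [
    "source_entity_guid", "target_entity_guid", "relationship_type",
    "process_name", "description", "confidence_score", "owner", "metadata",
]


def lineage_csv(rows):
    """Serialize row dicts to CSV text; missing columns become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=LINEAGE_COLUMNS, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


example = lineage_csv([{
    "source_entity_guid": "9ebbd583-4987-4d1b-b4f5-d8f6f6f60000",
    "target_entity_guid": "c88126ba-5fb5-4d33-bbe2-5ff6f6f60000",
    "process_name": "ETL_Sales_Transform",
}])
print(example)
```

`DictWriter` raises `ValueError` for a key that is not in `LINEAGE_COLUMNS`, which catches misspelled column names early.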
Classic Glossary
pvw glossary list-terms --glossary-guid "your-glossary-guid"
pvw glossary create-term --payload-file term.json
Workflows
pvw workflow list
pvw workflow get --workflow-id "workflow-123"
pvw workflow create --workflow-id "approval-1" --payload-file workflow-definition.json
pvw workflow execute --workflow-id "workflow-123"
pvw workflow executions --workflow-id "workflow-123"
Diagnostics
pvw diagnostics cache-stats
pvw diagnostics profile-info
pvw diagnostics clear-cache
Output Formats
Most list commands support --output:
| Format | Use case |
|---|---|
| table | Default; human-readable Rich table |
| json | Plain JSON for piping to PowerShell, bash, jq |
| jsonc | Colored JSON for viewing in terminal |
PowerShell example:
$terms = pvw uc term list --domain-id $domainId --output json | ConvertFrom-Json
$terms | Where-Object { $_.status -eq "Draft" } | Export-Csv draft_terms.csv -NoTypeInformation
Bash / jq example:
pvw uc term list --domain-id $DOMAIN_ID --output json | jq '.[] | .name'
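The same filtering can be done in Python on the JSON output. This sketch assumes the output is a JSON array of objects with `name` and `status` keys, which is what the PowerShell and jq examples above imply. The function name is hypothetical and the sample data is made up for illustration.

```python
import json


def draft_term_names(json_text):
    """Return the names of terms whose status is Draft.

    Assumes a JSON array of objects with `name` and `status` keys.
    """
    terms = json.loads(json_text)
    return [term["name"] for term in terms if term.get("status") == "Draft"]


# Made-up sample standing in for `pvw uc term list ... --output json`.
sample = (
    '[{"name": "Customer", "status": "Draft"},'
    ' {"name": "Order", "status": "Published"}]'
)
print(draft_term_names(sample))
```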
Bulk Import CSV Format (Terms)
name,description,status,acronym,owner_id,resource_name,resource_url
Customer Acquisition Cost,Cost to acquire a new customer,Draft,CAC,<entra-object-id-guid>,Metrics Guide,https://docs.example.com
Notes:
- owner_id must be an Entra ID Object ID (GUID), not an email address
- Terms in unpublished domains must use Draft status
- Sample files: samples/csv/uc_terms_bulk_example.csv, samples/json/term/uc_terms_bulk_example.json
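The owner_id rule can be checked locally before running an import. The sketch below is a hypothetical pre-flight check, not part of pvw-cli. It assumes the header matches the example above exactly, although the importer itself may be more lenient, and it only inspects owner_id values that are present.

```python
import csv
import io
import uuid

# Header taken from the example above.
TERM_COLUMNS = ["name", "description", "status", "acronym",
                "owner_id", "resource_name", "resource_url"]


def is_guid(value):
    """True if value is a UUID in canonical hyphenated form."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False


def check_terms_csv(text):
    """Return (line_number, message) problems found in term CSV text.

    Line numbers assume one line per record (no multi-line fields).
    """
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != TERM_COLUMNS:
        return [(1, "unexpected header")]
    problems = []
    for line_no, row in enumerate(reader, start=2):
        owner = (row["owner_id"] or "").strip()
        if owner and not is_guid(owner):
            problems.append((line_no, "owner_id is not a GUID"))
    return problems
```

An empty result means no problems were found by this check; running the import with `--dry-run` is still the authoritative test.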
Sample Files
| Path | Contents |
|---|---|
| samples/csv/uc_terms_bulk_example.csv | 8 sample UC terms for import |
| samples/json/term/uc_terms_bulk_example.json | 8 data management terms (JSON format) |
| samples/csv/lineage_example.csv | Sample lineage relationships |
| samples/notebooks (basic)/ | Basic Purview CLI notebook examples |
| samples/notebooks (plus)/ | Advanced examples including bulk import |
Documentation
- Full docs
- Unified Catalog commands
- Term bulk import guide
- Performance optimization guide
- Release archive
Requirements
- Python 3.8+
- Microsoft Purview account
- Azure CLI (az login) or Service Principal credentials
Support
- Issues: GitHub Issues
- Email: keayoub@msn.com
License
See LICENSE for details.