CLI tool for building and querying ArcGIS item dependency graphs

ArcGIS Item Dependency Management

Overview

This tool builds and maintains an organization-wide ArcGIS item dependency graph that shows how Web Maps, Dashboards, Feature Services, and other items depend on one another. You can query the graph by item ID or portal search string and receive CSV, Excel, interactive HTML, and GML outputs, which makes it safer to audit, migrate, and clean up portal content without breaking downstream items.


Quick Start

1. Install

Standard Python / Mac / Linux:

pip install arcgis-item-graph

ArcGIS Pro (Windows) — uses Pro's bundled Python:

"%PROGRAMFILES%\ArcGIS\Pro\bin\Python\Scripts\pip.exe" install arcgis-item-graph

Windows one-click installer (end-user deployment):

Your GIS admin provides a tool folder containing install.bat, install.ps1, launch_query.py, and a pre-configured config/config.yaml. Double-click install.bat to install and launch.

What it does automatically:

  1. Detects conda (Miniconda / Anaconda / ArcGIS Pro)
  2. Downloads and installs Miniconda silently if conda is not found (~75 MB, one-time)
  3. Creates an arcgis-graph conda environment with Python 3.11
  4. Installs the latest ArcGIS API for Python and arcgis-item-graph
  5. Launches the interactive query tool

Subsequent runs are fast — existing environments and packages are reused, and pip upgrades to the latest version automatically.

2. Configure

arcgis-graph setup

The wizard prompts for your portal URL, authentication method (named profile or username/password), and output preferences. Your credentials are never stored in config.yaml — they go to a gitignored .env file.

3. Build the graph (run once)

arcgis-graph create

This crawls your portal, saves a dependency graph (outputs/graph.gml), and builds a SQLite metadata cache (outputs/meta.sqlite) used by the health subcommand and governance risk scoring. For large organizations (5,000+ items) it can take 30–90 minutes.

4. Query

arcgis-graph query --item-id abc123
arcgis-graph query --search "owner:jsmith type:Dashboard"

Prerequisites

  • Python 3.9 or later
  • ArcGIS API for Python 2.4.0 or later (arcgis>=2.4.0)

Setup

See Quick Start above for installation and configuration.

For development setup, see For Contributors below.


Configuration

config/config.yaml controls authentication and all run-time settings. Two auth options are available:

Option 1 — Named ArcGIS profile (recommended for GIS admins)

Set the auth.profile key to the name of a saved ArcGIS credential profile:

auth:
  profile: "my_portal_profile"   # created via arcgis.gis.GIS(profile=...)
  verify_cert: true

Run python -c "from arcgis.gis import GIS; GIS(profile='my_portal_profile')" to verify the profile name is correct.

Option 2 — Environment variables

Leave auth.profile blank and create a .env file in the project root:

ARCGIS_URL=https://your-portal/portal
ARCGIS_USER=your_username
ARCGIS_PASSWORD=your_password

The CLI loads .env automatically when a profile is not set.
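
As a rough illustration of the fallback described above, the sketch below parses a .env-style file and chooses between a named profile and the three environment variables. The helper names parse_env_text and resolve_auth are hypothetical and are not part of arcgis-item-graph; the actual CLI may load .env differently (for example, through a dotenv library).

```python
from typing import Dict, Optional

REQUIRED_KEYS = ("ARCGIS_URL", "ARCGIS_USER", "ARCGIS_PASSWORD")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_auth(profile: Optional[str], env: Dict[str, str]) -> Dict[str, str]:
    """Prefer a named profile; otherwise fall back to the .env values."""
    if profile:
        return {"profile": profile}
    missing = [key for key in REQUIRED_KEYS if key not in env]
    if missing:
        raise ValueError(f"Missing .env keys: {', '.join(missing)}")
    return {
        "url": env["ARCGIS_URL"],
        "username": env["ARCGIS_USER"],
        "password": env["ARCGIS_PASSWORD"],
    }


env = parse_env_text(
    "ARCGIS_URL=https://your-portal/portal\n"
    "ARCGIS_USER=your_username\n"
    "ARCGIS_PASSWORD=your_password\n"
)
auth = resolve_auth(profile="", env=env)  # blank profile -> use .env values
print(auth["url"])
```

The returned dictionary is shaped so that it could be passed as keyword arguments to arcgis.gis.GIS, which accepts url, username, password, and profile.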

Other settings

| Key | Default | Description |
| --- | --- | --- |
| paths.output_dir | outputs/ | Where all output files are written |
| paths.gml_file | outputs/graph.gml | Persistent graph file |
| create.max_items | 10000 | Upper limit on items indexed |
| update.max_retries | 5 | Retries on transient API errors |
| query.output_formats | excel, html, gml | Default outputs for each query (allowed values: excel, csv, html, gml) |
| query.traversal_direction | upstream | Controls which graph edges are followed. upstream: items that reference X (what breaks if X is removed). downstream: items X depends on. both: union of the two, without cross-contamination |

Usage

All commands are run via the unified CLI entry point:

python -m cli [--config /path/to/config.yaml] {create,update,query} [options]

Build the graph (run once)

Crawls the entire portal and saves a GML snapshot. For large organizations (5,000+ items) this can take 30–90 minutes.

python -m cli create

Keep the graph current (run on a schedule)

Finds items modified since the last run and merges changes into the existing GML. Designed for a daily cron job.

python -m cli update

Query the graph

# Query by item ID
python -m cli query --item-id abc123

# Query by portal search string
python -m cli query --search "owner:jsmith type:Dashboard"

# Request specific output formats for a single run
python -m cli query --item-id abc123 --format excel
python -m cli query --item-id abc123 --format csv --format html

# Use a different config file
python -m cli --config /path/to/other/config.yaml query --item-id abc123

Interactive dashboard (live server mode)

Add --serve to any query command to start a local HTTP server and open the dashboard in your browser automatically:

arcgis-graph query --item-id abc123 --serve
arcgis-graph query --search "owner:jsmith" --serve

# Use a different port if 8765 is taken
arcgis-graph query --item-id abc123 --serve --port 9000

The server runs at http://localhost:8765/ by default. It exposes:

  • GET / — the interactive HTML dashboard
  • GET /query?id=<item_id> — live re-query from inside the dashboard (click any node)
  • GET /export/excel?ids=<id1>,<id2> — download an Excel report for selected items

Press Ctrl+C in the terminal to stop the server.

Note: Opening the saved .html file directly (file://...) will not work for node re-queries or Excel exports because those features require the live server. Always use --serve for the full interactive experience.
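
For scripting against the live server, the endpoint URLs listed above can be assembled with the standard library. This is only a sketch: the helper name dashboard_url is hypothetical, and the item IDs are placeholders.

```python
from urllib.parse import urlencode, urlunsplit


def dashboard_url(path: str, params: dict, host: str = "localhost", port: int = 8765) -> str:
    """Build a URL for the local dashboard server started with --serve."""
    # Keep commas unescaped so that ids=<id1>,<id2> matches the documented form.
    query = urlencode(params, safe=",")
    return urlunsplit(("http", f"{host}:{port}", path, query, ""))


query_url = dashboard_url("/query", {"id": "abc123"})
export_url = dashboard_url("/export/excel", {"ids": ",".join(["abc123", "def456"])})
print(query_url)
print(export_url)
```

While the server is running, these URLs can be fetched with urllib.request.urlopen or any HTTP client.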

Run python -m cli --help or python -m cli <command> --help for the full list of options and overrides.

Triage (migration planning)

Identify the highest-traffic consumer items in your portal and classify the services they depend on — prioritized by view count and dependency breadth. Designed for migration planning and portal housekeeping.

arcgis-graph triage                    # rank top 50 items (config default)
arcgis-graph triage --top-n 20         # rank top 20
arcgis-graph triage --min-dependents 2 # only items with 2+ service dependencies
arcgis-graph triage --deep             # Tier 3 layer introspection (slower, more accurate)
arcgis-graph triage --no-usage-stats   # rank by dependency count only (skip portal API)
arcgis-graph triage --force-refresh    # bypass the triage_cache_hours window and re-run

Outputs to outputs/reports/triage/<timestamp>/:

| File | Contents |
| --- | --- |
| triage_report.xlsx | 5-sheet workbook (see below) |
| triage_manifest.json | Machine-readable version of all triage data |

Excel workbook sheets:

| Sheet | Description |
| --- | --- |
| High Traffic Items | Ranked consumer items (Web Maps, Dashboards, Apps) by composite score (view count + dependency breadth) |
| Service Inventory | All map/feature services those items consume, with data_source_type (egdb / hosted / fgdb / external) and combined_view_impact |
| Dependency Matrix | One row per item × service pair, showing which item uses which service |
| Migration Hotspots | Services referenced by 2+ items (configurable), sorted by combined view impact; these are the highest-risk services to touch during a migration |
| Consumer Chain | Items in the graph that depend on each ranked item; useful for understanding blast radius before deprecating or migrating a service |

Note: data_source_type classification uses URL pattern matching (Tier 1), service JSON inspection (Tier 2), and optionally layer-level inspection (Tier 3 with --deep). Enterprise ArcGIS Server services backed by an Enterprise Geodatabase are classified as egdb; hosted services as hosted_relational; file-based data as fgdb.

Health check (broken references and orphan candidates)

After running arcgis-graph create, a SQLite metadata cache (outputs/meta.sqlite) is built alongside the GML graph. Use the health subcommand to query it for org-wide quality issues:

arcgis-graph health

The command prints a summary of broken node references (items in the graph that no longer exist in the portal) and orphan candidates (items with no dependents and low recent activity). It also writes a health_report_<timestamp>.xlsx workbook to outputs/ with two sheets:

| Sheet | Contents |
| --- | --- |
| Broken References | Node IDs whose items could not be resolved, sorted by governance risk score (🔴 red → 🟡 yellow → 🟢 green) |
| Orphan Candidates | Items with zero dependents and below the inactivity threshold (configurable via cache.orphan_inactive_days) |

Note: health requires the metadata cache. Run arcgis-graph create (or update) first to build outputs/meta.sqlite.

Remap item references

When an item is replaced (e.g., a service migrated from one URL to another), use remap to update all dependent items that reference the old item:

# Preview what would change (dry run)
arcgis-graph remap --from-id <old-item-id> --to-id <new-item-id> --preview

# Execute the remap
arcgis-graph remap --from-id <old-item-id> --to-id <new-item-id>

# Remap all broken nodes in the health cache (bulk repair workflow)
arcgis-graph remap --from-health-report

The --from-health-report flag reads broken node IDs directly from the metadata cache and walks you through a remap for each one. A JSON manifest is written to outputs/ recording every item updated, the old and new references, and success/failure status.


Shared Deployment (Team Use)

For team environments, point paths.gml_file and paths.output_dir at a UNC share so all users read from the same graph without running create individually.

1. Admin: initial setup

# On the admin machine, configure config.yaml to point at the share:
#   paths.gml_file: "\\\\server\\share\\arcgis-graph\\graph.gml"
#   paths.output_dir: "\\\\server\\share\\arcgis-graph\\outputs"

arcgis-graph create   # one-time full crawl (30-90 min for large orgs)

2. Automation: scheduled updates

Windows Task Scheduler (hourly):

arcgis-graph --config \\server\share\arcgis-graph\config.yaml update --skip-if-fresh

Linux/macOS cron (hourly):

0 * * * *  arcgis-graph --config /mnt/share/arcgis-graph/config.yaml update --skip-if-fresh

--skip-if-fresh prevents double-runs if automation fires while a manual update is in progress.

3. Users: install and run

Option A — Windows installer (no Python required):

Distribute the tool folder (install.bat, install.ps1, launch_query.py, config/) to users. They double-click install.bat. The installer handles everything: conda, packages, and launch.

The tool folder can live on a UNC share — users can run it directly from there:

\\server\share\arcgis-graph\install.bat

Option B — CLI (Python already installed):

Users point their local config.yaml at the share paths and run:

arcgis-graph query --item-id <id>

If the same item was queried within 24 hours, the cached outputs are returned instantly. Use --force-refresh to bypass the cache and re-run the query.

Freshness thresholds (configurable)

cache:
  update_warn_hours: 24    # Warn in query if graph is older than this (24 = daily, the default)
  query_cache_hours: 24    # Reuse cached query outputs within this window
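
The two thresholds amount to the same age comparison applied to different timestamps. The sketch below shows that comparison under stated assumptions: the function name is_stale and the sample timestamps are hypothetical, and the tool's own bookkeeping may differ.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_run: datetime, max_age_hours: float, now: datetime) -> bool:
    """Return True when last_run is older than max_age_hours."""
    return now - last_run > timedelta(hours=max_age_hours)


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
graph_built = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)   # 27 hours before now
query_cached = datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc)  # 6 hours before now

# update_warn_hours: warn when the graph is older than the threshold.
warn_about_graph = is_stale(graph_built, max_age_hours=24, now=now)
# query_cache_hours: reuse cached outputs while they are still inside the window.
reuse_cached_query = not is_stale(query_cached, max_age_hours=24, now=now)
print(warn_about_graph, reuse_cached_query)
```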

Output files

All output files land in the directory set by paths.output_dir (default: outputs/).

| Command | Output files |
| --- | --- |
| create | graph.gml, graph.timestamp, meta.sqlite (metadata cache) |
| update | Updates graph.gml in place |
| query | dependency_report_<timestamp>.csv: tabular summary |
| | dependency_report_<timestamp>.xlsx: 4-sheet Excel workbook (All Items, Dependency Edges, Broken Dependencies, External Refs) |
| | dependency_graph_<timestamp>.html: interactive visualization |
| | query_subgraph_<timestamp>.gml: sub-graph for further analysis |

Project structure

arcgis_item_graph/   Core library
  creator.py           Full org crawl → graph.gml
  updater.py           Incremental update since last run
  query.py             Direction-aware graph traversal + live hydration
  reporter.py          DataFrame → CSV, Excel (All Items, Dependency Edges, Broken Deps, External Refs)
  visualizer.py        Jinja2 + Cytoscape.js → interactive HTML dashboard
  cache.py             SQLite metadata cache (outputs/meta.sqlite) — broken nodes, orphan detection, risk scoring
  parsers.py           Custom JSON parsers (View Admin, Dashboard, ExB) that augment graph edges
  risk.py              Governance risk scoring (RiskScore, score_item) — 0-100 score, green/yellow/red tier
  remapper.py          ItemGraphRemapper — remap item references across all dependents
  auditor.py           ItemDependencyAuditor — audit accuracy via dependent_to() API
  triage.py            ItemTriageRunner — rank high-traffic items and classify service dependencies
  utils.py             Shared helpers (URL classification, batch fetch, retry)
cli/                 Unified CLI entry point (python -m cli ...)
config/              config.example.yaml template — copy to config.yaml and fill in credentials
docs/                Documentation and design plans
lib/                 Vendored frontend assets (cytoscape.js, dagre, cytoscape-dagre) for offline HTML
outputs/             Generated output files (gitignored); outputs/meta.sqlite is the metadata cache
tests/               Unit and integration tests (pytest) — 442 tests

The CLI uses Rich for terminal output. Progress bars, error panels, and completion summaries all go through arcgis_item_graph/console.py — the single file in the project that imports Rich. Library modules (creator, updater, triage, etc.) remain UI-free and communicate with the CLI via on_progress/on_warning callback kwargs.
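
The callback pattern described above might look roughly like the following. This is a minimal sketch, not the project's code: the function crawl and the callback signatures (done, total) and (message) are assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Optional


def crawl(
    item_ids: Iterable[str],
    on_progress: Optional[Callable[[int, int], None]] = None,
    on_warning: Optional[Callable[[str], None]] = None,
) -> List[str]:
    """Library-style function: it never prints and never imports a UI library."""
    ids = list(item_ids)
    processed: List[str] = []
    for index, item_id in enumerate(ids, start=1):
        if not item_id:
            if on_warning:
                on_warning(f"Skipping empty item ID at position {index}")
        else:
            processed.append(item_id)
        if on_progress:
            on_progress(index, len(ids))
    return processed


# The caller (a CLI layer, for example) decides how events are rendered.
events: List[str] = []
result = crawl(
    ["abc123", "", "def456"],
    on_progress=lambda done, total: events.append(f"{done}/{total}"),
    on_warning=lambda message: events.append(f"warning: {message}"),
)
print(result)
print(events)
```

Keeping the callbacks optional means the library functions can also be called from scripts or tests with no terminal output at all.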


For Contributors

1. Clone the repository

git clone https://github.com/your-org/ArcGIS-Item-Dependency-Management.git
cd ArcGIS-Item-Dependency-Management

2. Install in editable mode with dev dependencies

pip install -e ".[dev]"

3. Activate the commit-message hook

git config core.hooksPath .githooks

4. Create your configuration file

cp config/config.example.yaml config/config.yaml
# or just run: arcgis-graph setup

Running tests

pytest tests/ -v

Performance & Architecture Notes

Graph Traversal

The query BFS uses collections.deque for O(1) popleft (O(V+E) total). Seed items not found in the cached GML file are fetched live in parallel via ThreadPoolExecutor (default 10 workers, configurable via fetch_workers on ItemGraphQuery).

Traversal direction is controlled by query.traversal_direction in config:

  • upstream (default) — follows contained_by() edges: finds items that reference the queried item. Answers "what breaks if X is removed?" — the correct mode for migration impact analysis.
  • downstream — follows contains() edges: finds items the queried item depends on. Answers "what does X need to function?"
  • both — runs separate upstream and downstream passes. No cross-directional contamination (forward deps of upstream-reached nodes are not included).
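
The three directions can be sketched with a plain dictionary graph and a deque-based BFS. This is a simplified model under stated assumptions: the toy graph, the CONTAINS and CONTAINED_BY dictionaries, and the function names are hypothetical, and the library itself works on graph nodes through contains() and contained_by().

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical toy graph: item -> items it depends on (the "contains" direction).
CONTAINS: Dict[str, List[str]] = {
    "dashboard": ["webmap"],
    "app": ["webmap"],
    "webmap": ["service_a", "service_b"],
    "service_a": [],
    "service_b": [],
}


def invert(edges: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Build the reverse adjacency list (the "contained_by" direction)."""
    reverse: Dict[str, List[str]] = {node: [] for node in edges}
    for source, targets in edges.items():
        for target in targets:
            reverse.setdefault(target, []).append(source)
    return reverse


CONTAINED_BY = invert(CONTAINS)


def bfs(seed: str, edges: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first search; deque.popleft() is O(1), so the pass is O(V+E)."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(seed)
    return seen


def traverse(seed: str, direction: str = "upstream") -> Set[str]:
    if direction == "upstream":
        return bfs(seed, CONTAINED_BY)
    if direction == "downstream":
        return bfs(seed, CONTAINS)
    if direction == "both":
        # Two independent passes; neither pass follows the other's edges.
        return bfs(seed, CONTAINED_BY) | bfs(seed, CONTAINS)
    raise ValueError(f"Unknown direction: {direction}")


print(sorted(traverse("webmap", "upstream")))
print(sorted(traverse("webmap", "downstream")))
print(sorted(traverse("service_a", "both")))
```

In the last call, service_b is not returned even though webmap (reached upstream) depends on it, which is the "no cross-directional contamination" behavior described above.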

Update Hydration

ItemGraphUpdater hydrates all cached graph nodes concurrently using ThreadPoolExecutor (default 10 workers, configurable via hydration_workers). Graph mutations (node removal) happen serially on the main thread after all fetches complete. The modified-items search enforces a max_items cap (defaults to create.max_items from config) and warns when results may be truncated.
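
The fetch-concurrently, mutate-serially pattern can be sketched as follows. The graph here is a plain dictionary and fake_fetch stands in for a portal lookup; both are assumptions for illustration and do not reflect the internals of ItemGraphUpdater.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


def hydrate_nodes(
    graph: Dict[str, dict],
    fetch: Callable[[str], Optional[dict]],
    workers: int = 10,
) -> List[str]:
    """Fetch every node concurrently, then mutate the graph on the calling thread."""
    node_ids = list(graph)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with node_ids.
        results = list(pool.map(fetch, node_ids))

    removed: List[str] = []
    # All fetches are complete here; mutations happen serially.
    for node_id, item in zip(node_ids, results):
        if item is None:  # item could not be found, so drop the node
            del graph[node_id]
            removed.append(node_id)
        else:
            graph[node_id].update(item)
    return removed


def fake_fetch(item_id: str) -> Optional[dict]:
    """Stand-in for a portal lookup; returns None for a missing item."""
    portal = {"a": {"title": "Roads"}, "c": {"title": "Parcels"}}
    return portal.get(item_id)


graph = {"a": {}, "b": {}, "c": {}}
removed = hydrate_nodes(graph, fake_fetch)
print(removed, sorted(graph))
```

Deferring the mutations until every fetch has finished avoids changing the graph while worker threads are still reading node IDs from it.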

Timestamps

All timestamps are stored as integer milliseconds since the Unix epoch, which preserves sub-second precision (int(t.timestamp() * 1000)).
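
A short round-trip example of that conversion, using only the standard library. The helper names to_epoch_ms and from_epoch_ms are hypothetical; only the expression int(t.timestamp() * 1000) comes from the text above.

```python
from datetime import datetime, timezone


def to_epoch_ms(t: datetime) -> int:
    """Convert a datetime to integer milliseconds since the Unix epoch."""
    return int(t.timestamp() * 1000)


def from_epoch_ms(ms: int) -> datetime:
    """Convert epoch milliseconds back to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


t = datetime(2024, 1, 1, 0, 0, 0, 250000, tzinfo=timezone.utc)  # 00:00:00.250 UTC
ms = to_epoch_ms(t)
print(ms)  # 1704067200250
print(from_epoch_ms(ms).isoformat())
```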

Excel Reports

ItemGraphReporter.to_excel() builds all four sheets from a single pass through to_dataframe(); node.contains() is called only once per node.

The dep_status column in the Dependency Edges sheet reflects the state of each dependency reference:

| dep_status value | Meaning |
| --- | --- |
| healthy | Dependency resolved to a live portal item |
| broken | Portal item ID could not be hydrated (deleted, permission error, or malformed ID) |
| not_in_result | Dep ID is referenced in an edge but was outside the traversal scope |
| live_no_item | URL-type dep is reachable but is not a registered portal item |
| inaccessible | URL-type dep returned an auth or permission error |
| dead | URL-type dep returned a 404 or connection failure |
| unchecked | URL-type dep is present in the graph but has not yet been probed |

URL-type dep nodes show the richer statuses (live_no_item, inaccessible, dead, unchecked) only when the metadata cache is available (i.e., after running arcgis-graph create and then arcgis-graph query --service-check). Without the cache they fall back to broken.

The Broken Dependencies sheet includes rows where dep_status is broken, inaccessible, or dead.
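
The selection rule for that sheet can be expressed as a simple filter. In this sketch the edge rows and their field names (source, dep) are hypothetical sample data; only the dep_status values come from the table above.

```python
# Statuses that place an edge on the Broken Dependencies sheet.
BROKEN_STATUSES = {"broken", "inaccessible", "dead"}

# Hypothetical edge rows, shaped loosely like the Dependency Edges sheet.
edges = [
    {"source": "webmap1", "dep": "item_a", "dep_status": "healthy"},
    {"source": "webmap1", "dep": "item_b", "dep_status": "broken"},
    {"source": "dash1", "dep": "https://example.com/svc1", "dep_status": "dead"},
    {"source": "dash1", "dep": "https://example.com/svc2", "dep_status": "live_no_item"},
    {"source": "app1", "dep": "https://example.com/svc3", "dep_status": "inaccessible"},
    {"source": "app1", "dep": "item_c", "dep_status": "not_in_result"},
]

broken_rows = [row for row in edges if row["dep_status"] in BROKEN_STATUSES]
for row in broken_rows:
    print(row["source"], row["dep"], row["dep_status"])
```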
