
Static analysis tool for tracing Python API calls to their origin library


PCResolve — Python Project-Level Third-Party Library Usage Provenance Analyzer

PCResolve is an explainable static analysis engine that answers: which third-party libraries does a Python project use, and through which imports, symbols, call chains, return values, and container propagation paths?

It is designed for CI pipelines, IDE integration, audit workflows, and large-scale codebase scanning. Every classification comes with a traceable chain, a reason, a confidence score, and alternatives when the analysis cannot decide between multiple candidates.

Zero runtime third-party dependencies. Python 3.9+.


Why PCResolve

import numpy as np is just the entry point. The real question is: how does np become df, how does df flow into function parameters, how does a method return a pandas object through three layers of local wrappers, and how does a container element carry library provenance through iteration?

PCResolve tracks two kinds of provenance:

| Object | Question |
|---|---|
| API call provenance | Which top-level library does this call expression belong to? |
| Symbol provenance | Where did this variable / return value / attribute / container element come from? |

What It Produces

| Output | Description |
|---|---|
| all_api_calls | Every call expression with top_library, reason, confidence, alternatives, decorated_by, trace chain, and source location. |
| all_symbol_provenance | Per-symbol origin records: import aliases, variable assignments, function returns, parameters, attributes, container items, decorator evidence. |
| library_usage | Aggregated per-library index with call/symbol counts, file lists, reason distributions, and confidence ranges. |
| diagnostics | Structured parse errors, encoding failures, and trace warnings (non-fatal by default). |

Quick Start

Installation

```shell
pip install pcresolve        # >= 1.0.4
```

CLI

```shell
pcresolve /path/to/project                         # human summary (v2 default)
pcresolve /path/to/project --json                  # full provenance JSON
pcresolve /path/to/project --json-summary          # compact CI summary
pcresolve /path/to/project --explain-library numpy
pcresolve /path/to/project --explain-call "np.array"
pcresolve /path/to/project --scope-model v1        # legacy mode
```

Library API

```python
from pcresolve import analyze_project

result = analyze_project("/path/to/project")
for call in result.all_api_calls:
    print(f"{call.expression} -> {call.top_library}")
    print(f"  reason={call.reason} confidence={call.confidence}")
    print(f"  alternatives={call.alternatives}")
```

Output Profiles

| Flag | Profile | Use Case |
|---|---|---|
| (default) | Human summary | Terminal browsing |
| --json | Full provenance | Machine consumption, debugging |
| --json-summary | Compact aggregate | CI pipelines, dashboards |
| --debug-dump | Legacy full text | Debugging |

1.0.4 Stable Contract

PCResolve 1.0.4 is the first release with a stable provenance contract. JSON output from earlier versions is experimental and is not guaranteed to be compatible.

JSON output excerpt (--json)

Abbreviated for readability; see docs/output-contract.md for the complete stable field list.

```json
{
  "schema_version": "1.0",
  "profile": "full",
  "project_root": ".",
  "stats": {
    "total_modules": 4,
    "parsed_modules": 4,
    "skipped_modules": 0,
    "scope_model": "v2",
    "api_call_count": 18,
    "library_count": 3
  },
  "diagnostics": [],
  "files": [{
    "file_path": "main.py",
    "module_name": "main",
    "api_calls": [{
      "expression": "np.array([1,2,3])",
      "func_name": "np.array",
      "parameters": "[1,2,3]",
      "top_library": "numpy",
      "reason": "DIRECT_IMPORT",
      "confidence": 1.0,
      "alternatives": ["numpy"],
      "decorated_by": [],
      "file_path": "main.py",
      "lineno": 4,
      "col_offset": 0,
      "end_lineno": 4,
      "end_col_offset": 18,
      "chain": ["np.array([1,2,3])", "np", "numpy"],
      "resolved_func": "numpy.array",
      "resolved_chain": ["np.array", "numpy.array", "numpy"]
    }],
    "symbol_provenance": [{
      "symbol": "np",
      "kind": "import",
      "top_library": "numpy",
      "reason": "DIRECT_IMPORT",
      "confidence": 1.0
    }]
  }],
  "all_api_calls": [
    {
      "expression": "np.array([1,2,3])",
      "top_library": "numpy",
      "reason": "DIRECT_IMPORT",
      "confidence": 1.0,
      "alternatives": ["numpy"],
      "decorated_by": [],
      "file_path": "main.py",
      "lineno": 4
    }
  ],
  "all_symbol_provenance": [
    {
      "symbol": "np",
      "kind": "import",
      "top_library": "numpy",
      "reason": "DIRECT_IMPORT",
      "confidence": 1.0
    }
  ],
  "library_usage": {
    "numpy": {
      "library": "numpy",
      "api_call_count": 6,
      "symbol_count": 3,
      "files": ["main.py", "utils.py"],
      "imports": ["np"],
      "reason_counts": { "DIRECT_IMPORT": 6, "RETURN_PROPAGATION": 3 },
      "kind_counts": { "import": 2, "variable": 4 },
      "has_evidence": true,
      "min_confidence": 0.9,
      "max_confidence": 1.0
    }
  }
}
```
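For audit workflows, the `--json` report can be consumed with the standard library alone. The sketch below is a minimal example that assumes only the field names shown in the excerpt above. It refuses schema versions other than `1.x` and lists calls whose `confidence` is below a threshold. The helper names, the 0.9 default threshold and the file name `pcresolve.json` are hypothetical choices, not part of PCResolve.

```python
import json
from typing import Any, Dict, List


def low_confidence_calls(report: Dict[str, Any], threshold: float = 0.9) -> List[str]:
    """Describe every call in all_api_calls whose confidence is below threshold."""
    # Only the 1.x schema is assumed; refuse anything else instead of guessing.
    version = str(report.get("schema_version", ""))
    if not version.startswith("1."):
        raise ValueError(f"unsupported schema_version: {version!r}")

    flagged = []
    for call in report.get("all_api_calls", []):
        if call["confidence"] < threshold:
            flagged.append(
                f"{call['file_path']}:{call['lineno']} {call['expression']} "
                f"-> {call['top_library']} (reason={call['reason']}, "
                f"alternatives={call['alternatives']})"
            )
    return flagged


def load_report(path: str) -> Dict[str, Any]:
    """Read a report previously saved from `pcresolve <project> --json`."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Assuming `--json` prints to standard output, you could save a report with `pcresolve /path/to/project --json > pcresolve.json` and then call `low_confidence_calls(load_report("pcresolve.json"))`.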

Contract highlights

  • Default scope model: v2 (lexical scopes). Pass --scope-model v1 to use the legacy model.
  • Path normalization: all paths are relative POSIX paths (/ separators). Paths outside the project use <external>/....
  • alternatives: individual third-party library names, never merged display labels.
  • decorated_by: names of the libraries that decorate the call target; local, python, and unknown entries are filtered out.
  • reason and confidence: every call has both. See docs/output-contract.md for the full table.
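These highlights can be turned into a quick sanity check for a single call record. The helper below is hypothetical and not part of PCResolve. It assumes that a merged display label would join names with `/`, `,` or `|`. It also treats any `file_path` starting with `<external>/` as already normalized.

```python
from pathlib import PurePosixPath, PureWindowsPath
from typing import Any, Dict, List

# Labels the contract says are filtered out of decorated_by.
NON_LIBRARY_LABELS = {"local", "python", "unknown"}
# Assumption: a merged display label would join names with one of these.
MERGE_SEPARATORS = ("/", ",", "|")


def contract_violations(call: Dict[str, Any]) -> List[str]:
    """List ways a single api_calls record departs from the 1.0.4 highlights."""
    problems = []
    for key in ("reason", "confidence"):
        if key not in call:
            problems.append(f"missing {key}")

    path = call.get("file_path", "")
    if not path.startswith("<external>/"):
        # Relative POSIX: no backslashes, no leading "/", no drive letter.
        if "\\" in path or PurePosixPath(path).is_absolute() or PureWindowsPath(path).drive:
            problems.append(f"file_path not relative POSIX: {path!r}")

    for name in call.get("alternatives", []):
        if any(sep in name for sep in MERGE_SEPARATORS):
            problems.append(f"merged label in alternatives: {name!r}")
    for name in call.get("decorated_by", []):
        if name in NON_LIBRARY_LABELS:
            problems.append(f"non-library label in decorated_by: {name!r}")
    return problems
```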

Breaking changes from pre-1.0.4

  • Default scope_model changed from v1 to v2.
  • --json changed from legacy dataclass dump to full provenance schema.
  • --json-stable deprecated and hidden.
  • JSON produced before 1.0.4 is experimental; no backward compatibility is provided.

Supported Analysis Patterns

Imports

Direct imports, from/as aliases, wildcard imports, cross-file re-exports, transitive imports through local modules.
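To make the import cases concrete, here is a deliberately tiny sketch of alias resolution that uses the standard `ast` module. It is not PCResolve's implementation. It handles only `import x`, `import x as y` and `from x import y as z` within one file, and maps each bound name to its top-level package. It then classifies each call by the root name of its target. Relative imports are labeled `local`, wildcard and cross-file cases are skipped, and no distinction is made between the standard library and third-party packages.

```python
import ast
from typing import Dict, List, Optional, Tuple


def import_bindings(source: str) -> Dict[str, str]:
    """Map each name bound by an import to its top-level package."""
    bindings: Dict[str, str] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                top = alias.name.split(".")[0]
                # "import a.b" binds "a"; "import a.b as c" binds "c".
                bindings[alias.asname or top] = top
        elif isinstance(node, ast.ImportFrom):
            # Relative imports ("from . import x") stay inside the project.
            top = "local" if node.level else (node.module or "").split(".")[0]
            for alias in node.names:
                if alias.name != "*":  # wildcard imports are out of scope here
                    bindings[alias.asname or alias.name] = top
    return bindings


def root_name(func: ast.expr) -> Optional[str]:
    """Leftmost name of a call target, e.g. "np" for np.linalg.norm."""
    while isinstance(func, ast.Attribute):
        func = func.value
    return func.id if isinstance(func, ast.Name) else None


def classify_calls(source: str) -> List[Tuple[str, str]]:
    """Pair each call target with a top-level package, or "unknown"."""
    bindings = import_bindings(source)
    results = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            root = root_name(node.func)
            # Builtins and unbound names are not modeled: they fall to "unknown".
            results.append((ast.unparse(node.func), bindings.get(root, "unknown")))
    return results
```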

Data Propagation

Variable assignment, function return values, parameter binding at call sites, constructor argument → self.attr → method call propagation, container item access (dict subscript, list index).

Containers

Dict / list / tuple / set literal tracking, container iteration (for x in items), merged candidates for ambiguous iteration (reported as alternatives), and argument-source tracking for container-mutating methods (append, extend).
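The merged-candidate rule for container iteration can be illustrated with plain data. The sketch assumes that each element already carries a set of candidate libraries, as an earlier pass might produce. `append` and `extend` add the sources of their arguments, and iterating yields the union of all elements. Only a single-library union becomes a confident primary. The `TrackedList` and `Classification` types and the `"unknown"` fallback are illustrative and follow the conservative behavior described under Known Limitations.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Classification:
    top_library: str
    alternatives: List[str] = field(default_factory=list)


class TrackedList:
    """Hypothetical list model that stores one candidate set per element."""

    def __init__(self, *elements: Set[str]) -> None:
        self.elements: List[Set[str]] = [set(e) for e in elements]

    def append(self, sources: Set[str]) -> None:
        # items.append(x): the argument's sources become a new element.
        self.elements.append(set(sources))

    def extend(self, other: "TrackedList") -> None:
        # items.extend(other): every element of the argument flows in.
        self.elements.extend(set(e) for e in other.elements)

    def iterate(self) -> Classification:
        # for x in items: x may be any element, so merge all candidates.
        merged = set().union(*self.elements)
        if len(merged) == 1:
            return Classification(next(iter(merged)), sorted(merged))
        # Ambiguous (or empty): conservative primary, candidates as alternatives.
        return Classification("unknown", sorted(merged))
```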

Cross-File

Symbol tracing across module boundaries, imported local functions, class method resolution through constructor call sites, wildcard import resolution with candidate merging.
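Cross-file tracing can be pictured as following bindings from module to module until a name resolves outside the project. This sketch uses a hypothetical data layout. Each project module maps the names it imports to a `(module, name)` origin. Resolving a symbol walks that chain, and a visited set guards against import cycles. A name that a project module defines itself resolves to `local`.

```python
from typing import Dict, Optional, Set, Tuple

# module -> {imported name -> (origin module, origin name)}; "" = the module itself
Bindings = Dict[str, Dict[str, Tuple[str, str]]]


def resolve_symbol(bindings: Bindings, module: str, name: str) -> Optional[str]:
    """Follow re-exports across project modules to a top-level library.

    Returns "local" for names defined in a project module, the top-level
    package once the chain leaves the project, or None on an import cycle.
    """
    seen: Set[Tuple[str, str]] = set()
    while (module, name) not in seen:
        seen.add((module, name))
        if module not in bindings:
            # Not a project module, so treat it as an external library.
            return module.split(".")[0]
        origin = bindings[module].get(name)
        if origin is None:
            # Not imported here, so it is defined in this local module.
            return "local"
        module, name = origin
    return None
```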

Decorators

Decorator expressions are counted as API calls, decorated targets keep their local primary identity, decorated_by evidence is recorded independently, and all stacked decorators are preserved.

Reporting

Per-library explain views (--explain-library), per-symbol provenance traces (--explain-symbol), per-call query matching (--explain-call), library usage aggregation with reason counts and confidence distribution.
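Library usage aggregation can be approximated from the `all_api_calls` records of the `--json` output. This sketch counts calls per library, collects files and reason counts, and tracks the confidence range. Unlike the real `library_usage` index, it looks only at `top_library` and ignores symbols, `alternatives` and `decorated_by` evidence. It also assumes that `local`, `python` and `unknown` are the only non-library labels.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, List


def aggregate_calls(calls: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Group call records by top_library, skipping non-library labels."""
    usage: Dict[str, Dict[str, Any]] = defaultdict(lambda: {
        "api_call_count": 0, "files": set(), "reason_counts": Counter(), "confidences": [],
    })
    for call in calls:
        lib = call["top_library"]
        if lib in {"local", "python", "unknown"}:
            continue
        entry = usage[lib]
        entry["api_call_count"] += 1
        entry["files"].add(call["file_path"])
        entry["reason_counts"][call["reason"]] += 1
        entry["confidences"].append(call["confidence"])
    return {
        lib: {
            "library": lib,
            "api_call_count": e["api_call_count"],
            "files": sorted(e["files"]),
            "reason_counts": dict(e["reason_counts"]),
            "min_confidence": min(e["confidences"]),
            "max_confidence": max(e["confidences"]),
        }
        for lib, e in usage.items()
    }
```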

Architecture

```text
scanner.py → module_mapper.py → single_file.py → cross_file.py → cli.py
                                      ↑               ↑
                                symbol_table.py   views.py
                                scope.py          classification.py
                                sources.py        source_resolution.py
```
| Layer | Module | Role |
|---|---|---|
| Scan | scanner.py | Discover .py/.pyi files, filter venvs |
| Map | module_mapper.py | File path ↔ module name |
| Parse | single_file.py | AST visitor, per-file symbol table, scope model |
| Trace | cross_file.py | Cross-module symbol resolution, call classification |
| Classify | classification.py | Priority-ordered classification pipeline |
| Index | library_usage.py | Per-library usage aggregation |
| Output | cli.py, views.py | Human, JSON full, JSON summary, explain views |
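The Scan and Map layers can be approximated with `pathlib`. How PCResolve actually does this is not described here, so the sketch rests on assumptions. A directory counts as a virtual environment if it contains `pyvenv.cfg` (the marker that PEP 405 environments create) or is named `site-packages`. Module names come from relative paths, with `pkg/__init__.py` mapped to `pkg`.

```python
from pathlib import Path
from typing import Dict


def _is_venv(directory: Path) -> bool:
    # Heuristic only: PEP 405 environments contain pyvenv.cfg.
    return (directory / "pyvenv.cfg").is_file() or directory.name == "site-packages"


def scan_modules(root: Path) -> Dict[str, str]:
    """Map relative POSIX file paths of .py/.pyi files to dotted module names."""
    modules: Dict[str, str] = {}
    stack = [root]
    while stack:
        directory = stack.pop()
        for entry in sorted(directory.iterdir()):
            if entry.is_dir():
                if not _is_venv(entry):
                    stack.append(entry)
            elif entry.suffix in {".py", ".pyi"}:
                rel = entry.relative_to(root)
                parts = list(rel.with_suffix("").parts)
                if parts[-1] == "__init__":
                    parts.pop()  # a package's __init__ is the package itself
                modules[rel.as_posix()] = ".".join(parts)
    return modules
```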

Validation

PCResolve is continuously validated against a multi-project regression gate:

```text
Unit tests:        557 passed, 0 failed
Hard baselines:    21 projects, 0 exceeded
Full audit:        42 real-world projects, 0 crashes, 0 illegal keys
v1/v2 differential regressions: 303 (all classified by taxonomy)
```

Key invariants enforced on every change:

  • illegal_keys == 0 — no dataclass repr or structured source display leaks into top_library or library_usage keys.
  • All 21 hard baselines must not exceed their recorded regression counts.
  • Golden JSON output tests lock the 1.0.4 contract.
  • The diff taxonomy sorts regressions into third_party_api_loss, local_to_unknown, and precision changes.
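Two of these invariants are simple enough to sketch, though both helpers and their input shapes are hypothetical. The first assumes that a legitimate library key is a Python identifier, which covers top-level import names and the `local`, `python` and `unknown` labels. Under that assumption, a leaked structured value such as a `SourceSet(...)` repr fails an `isidentifier()` check. The second treats the baseline gate as a comparison of per-project regression counts against recorded limits.

```python
from typing import Dict, Iterable, List


def illegal_keys(library_names: Iterable[str]) -> List[str]:
    """Keys that do not look like a top-level library name or label."""
    return [name for name in library_names if not name.isidentifier()]


def exceeded_baselines(current: Dict[str, int], recorded: Dict[str, int]) -> List[str]:
    """Projects whose regression count is above its recorded baseline."""
    return sorted(p for p, limit in recorded.items() if current.get(p, 0) > limit)
```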

Known Limitations

PCResolve is a static analysis tool. It is conservative by design: when it cannot uniquely determine a library, it reports alternatives rather than guessing.

| Limitation | Behavior |
|---|---|
| Multi-third-party returns | The primary classification is conservative (local or unknown); all candidates are reported in alternatives and library_usage. |
| Dynamic import_module(name) | Only string-literal arguments are resolved. Variable arguments produce unresolved results. |
| @classmethod / @staticmethod | Method resolution is deferred and treated conservatively. |
| Descriptors / properties | Not tracked. Attribute access on descriptor results may be unresolved. |
| Runtime reflection / monkey-patching | Not modeled. Conservative fallback. |
| Multi-threading / async control flow | Not modeled. All reachable branches contribute to SourceSet. |
| Third-party library internals | Not analyzed. Only same-project local functions and classes are traced. |
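The `import_module` row can be illustrated with `ast`. When the first argument is a string literal, the target is known statically. When it is a variable, the target is not. This sketch illustrates the limitation and is not PCResolve's resolver. It matches any call spelled `import_module(...)` or `<something>.import_module(...)`, and it labels relative names (a leading `.`) as `local`.

```python
import ast
from typing import List, Optional


def dynamic_import_targets(source: str) -> List[Optional[str]]:
    """Top-level package for each import_module(...) call, None if unresolvable."""
    targets: List[Optional[str]] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call) or not node.args:
            continue
        func = node.func
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if name != "import_module":
            continue
        arg = node.args[0]
        if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
            # String literal: the target is known statically.
            # A leading "." means a relative, in-project module.
            targets.append("local" if arg.value.startswith(".") else arg.value.split(".")[0])
        else:
            targets.append(None)  # variable or expression: left unresolved
    return targets
```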

These limitations mean that top_library can be conservative (local or unknown) while alternatives still contains the third-party candidates. Downstream tools should check both.

Consuming the Output

Downstream tools should treat top_library as the primary classification and alternatives / decorated_by as additional evidence. A call whose top_library is local or unknown may still reference a third-party library through alternatives (e.g. multi-source returns or container iteration) or decorator evidence.

For library-level aggregation, use library_usage which already incorporates alternatives and decorated_by evidence.
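At the level of a single call, the rule above means collecting third-party evidence from all three fields, not from `top_library` alone. Here is a minimal sketch over one call record from the `--json` output. It assumes that `local`, `python` and `unknown` are the only non-library labels, and the helper name is hypothetical.

```python
from typing import Any, Dict, Set

NON_LIBRARY_LABELS = {"local", "python", "unknown"}


def third_party_evidence(call: Dict[str, Any]) -> Set[str]:
    """Every third-party library a call record points at, from any field."""
    candidates = {call.get("top_library", "unknown")}
    candidates.update(call.get("alternatives", []))
    candidates.update(call.get("decorated_by", []))
    return candidates - NON_LIBRARY_LABELS
```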

Development

```shell
pip install -e .

# Run tests
python -m pytest tests/ -v

# Baseline gate
python scripts/diff_v1_v2.py tests/fixtures/tested_projects/

# Full audit
python scripts/audit_tested_projects.py
```

See docs/architecture.md for the pipeline design, docs/output-contract.md for the stable JSON schema, and docs/source-semantics.md for Source IR and convergence rules.

License

MIT. See LICENSE for details.



Download files


Source Distribution

pcresolve-1.0.4.tar.gz (102.3 kB)

Uploaded Source

Built Distribution


pcresolve-1.0.4-py3-none-any.whl (62.9 kB)

Uploaded Python 3

File details

Details for the file pcresolve-1.0.4.tar.gz.

File metadata

  • Download URL: pcresolve-1.0.4.tar.gz
  • Size: 102.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for pcresolve-1.0.4.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 37fc5d3bd8f51fae01a166e263a2da4c8773cc4bf4090f5671f8884197e80f5b |
| MD5 | 36e5e24bac15dbe81d2046ffdb45232c |
| BLAKE2b-256 | f66481e4f9c7c98978e08550a523a8a846887a2b459a6af1f6669fc23566cdde |


File details

Details for the file pcresolve-1.0.4-py3-none-any.whl.

File metadata

  • Download URL: pcresolve-1.0.4-py3-none-any.whl
  • Size: 62.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for pcresolve-1.0.4-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | be8f7813c132d9814dee4980428ce5946db5977da662e59c18c4b79abad80515 |
| MD5 | ecf20bf21337e52373091cf48feb86ec |
| BLAKE2b-256 | 0a9d74666d503737af1853c9a545e52fead497791ea1954571bd6bce48c3040a |

