A comprehensive Python package for unified performance profiling, visualization, and optimization

PyPerfOptimizer

Make Python faster. Automatically.

PyPerfOptimizer detects performance anti-patterns in your Python code and transforms them into faster equivalents — with verified, reproducible speedups.

pip install pyperfoptimizer
pyperfoptimizer fix myapp.py --verify

Verified Results

All benchmarks run 3× for stability. Python 3.11, Ubuntu 24.04. Reproduce with python benchmarks/run_benchmarks.py.

Optimization                                    Speedup (mean of 3 runs)    Variance
──────────────────────────────────────────────────────────────────────────────────────
Regex precompile (re.match → compiled)          2.04×                       ±0.10
Set membership (in [list] → in {set})           4.22×                       ±0.15
Combined (regex + set on realistic function)    1.69×                       ±0.02
Auto-memoize (recursive → @lru_cache)           9,674×                      stable

Real-World Validation

I scanned 3 major open-source projects to verify these patterns exist in production code:

Project     Files Scanned    Issues Found    Top Pattern
──────────────────────────────────────────────────────────────
Django      47               36              regex_precompile (7)
FastAPI     48               35              auto_memoize (12)
Flask       24               15              defaultdict_opportunity (5)
──────────────────────────────────────────────────────────────
Total       119              86 optimizations

These are well-maintained projects by experienced developers. If they have these issues, most codebases likely do too.

How It Works

# Scan — find anti-patterns, report expected speedups
pyperfoptimizer scan myapp.py

# Fix — apply safe transformations
pyperfoptimizer fix myapp.py

# Fix with proof — benchmark before/after, reject if not faster
pyperfoptimizer fix --verify myapp.py

# Focus on hot paths only
pyperfoptimizer fix myapp.py --profile profile.speedscope

Example

Input:

import re

def process_users(users):
    results = []
    for user in users:
        if user["role"] in ["admin", "editor", "moderator", "reviewer", "manager"]:
            name = user["first"] + " " + user["last"]
            if re.match(r"^[A-Z]", name):
                results.append({"name": name, "role": user["role"]})
    return results

Output (fully automated):

import re

_RE_0 = re.compile(r"^[A-Z]")

def process_users(users):
    results = []
    for user in users:
        if user["role"] in {"admin", "editor", "moderator", "reviewer", "manager"}:
            name = user["first"] + " " + user["last"]
            if _RE_0.match(name):
                results.append({"name": name, "role": user["role"]})
    return results

Measured speedup: 1.69× (2000 users, 300 iterations, mean of 3 runs)
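To check a number like this on your own machine, a minimal timing sketch follows. It is not the project's benchmark harness: the synthetic data and the names make_users, process_users_before, and process_users_after are assumptions made for illustration, and the ratio you see will vary with hardware and Python version.

```python
import re
import timeit

def process_users_before(users):
    # Original form: list literal membership and module-level re.match.
    results = []
    for user in users:
        if user["role"] in ["admin", "editor", "moderator", "reviewer", "manager"]:
            name = user["first"] + " " + user["last"]
            if re.match(r"^[A-Z]", name):
                results.append({"name": name, "role": user["role"]})
    return results

_RE_0 = re.compile(r"^[A-Z]")

def process_users_after(users):
    # Transformed form: set literal membership and precompiled pattern.
    results = []
    for user in users:
        if user["role"] in {"admin", "editor", "moderator", "reviewer", "manager"}:
            name = user["first"] + " " + user["last"]
            if _RE_0.match(name):
                results.append({"name": name, "role": user["role"]})
    return results

def make_users(n):
    # Hypothetical synthetic data: a mix of matching and non-matching rows.
    roles = ["admin", "editor", "guest", "manager", "viewer"]
    return [
        {"role": roles[i % len(roles)],
         "first": "Ada" if i % 2 else "ada",
         "last": "Smith"}
        for i in range(n)
    ]

users = make_users(2000)

# The transformation must not change behavior.
assert process_users_before(users) == process_users_after(users)

t_before = timeit.timeit(lambda: process_users_before(users), number=20)
t_after = timeit.timeit(lambda: process_users_after(users), number=20)
print(f"before: {t_before:.4f}s  after: {t_after:.4f}s  ratio: {t_before / t_after:.2f}x")
```

The equality assertion matters as much as the timing: a speedup only counts if both versions return the same results.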

Why These Optimizations Work

Regex Precompilation (2×)

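re.match(pattern, string) does not recompile the pattern on every call: the re module keeps an internal cache of compiled patterns. What it does pay on every call is a cache lookup plus an extra function call, and if a program uses more patterns than the cache holds, evicted patterns are recompiled. In tight loops this per-call overhead adds up. Precompiling once and calling the compiled pattern's match method removes it.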
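The per-call overhead can be observed with a small micro-benchmark. This is a sketch under assumptions: the sample names and the helper functions count_module_level and count_precompiled are illustrative, and the measured ratio depends on the pattern and the interpreter.

```python
import re
import timeit

PATTERN = r"^[A-Z]"
COMPILED = re.compile(PATTERN)

# Illustrative input: two of every three names start with an uppercase letter.
names = ["Ada Lovelace", "grace hopper", "Alan Turing"] * 1000

def count_module_level():
    # Every call goes through re.match, which looks the pattern up first.
    return sum(1 for n in names if re.match(PATTERN, n))

def count_precompiled():
    # The compiled pattern object is reused directly.
    return sum(1 for n in names if COMPILED.match(n))

assert count_module_level() == count_precompiled()

t_module = timeit.timeit(count_module_level, number=20)
t_compiled = timeit.timeit(count_precompiled, number=20)
print(f"re.match: {t_module:.4f}s  compiled: {t_compiled:.4f}s  "
      f"ratio: {t_module / t_compiled:.2f}x")
```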

Set Membership (4×)

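x in [1, 2, 3] does an O(n) linear scan every time. CPython folds a constant list literal in this position into a tuple constant, so the cost is the scan rather than building the list. x in {1, 2, 3} is folded into a frozenset constant with O(1) average-case hash lookup. The gap grows with collection size: 4× at 10 items, 42× at 100 items.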
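How the gap scales with collection size can be sketched as follows. The helper time_membership and the probe value are assumptions for illustration; the probe is deliberately absent from the collection, which is the worst case for a linear scan, so the ratios will not necessarily match the figures above.

```python
import timeit

def time_membership(size, repeats=20000):
    """Time a failed lookup in a list and in a set of the same items."""
    items_list = [f"role_{i}" for i in range(size)]
    items_set = set(items_list)
    probe = "not_present"  # forces the list scan to visit every element

    # Both containers must agree on the answer.
    assert (probe in items_list) == (probe in items_set)

    t_list = timeit.timeit(lambda: probe in items_list, number=repeats)
    t_set = timeit.timeit(lambda: probe in items_set, number=repeats)
    return t_list, t_set

for size in (10, 100):
    t_list, t_set = time_membership(size)
    print(f"{size:>4} items  list: {t_list:.4f}s  set: {t_set:.4f}s  "
          f"ratio: {t_list / t_set:.1f}x")
```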

Memoization (9,674×)

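Naive recursive functions like fib(n) have O(2ⁿ) call trees because the same subproblems are recomputed many times. @lru_cache stores results, reducing the work to O(n) unique computations. For pure recursive functions with overlapping subproblems, this is typically the highest-impact optimization available. It is only safe when the function is pure and its arguments are hashable.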
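The reduction in work is easy to see by counting calls instead of timing them. A minimal sketch, where the function names and the calls counter are illustrative:

```python
from functools import lru_cache

calls = {"naive": 0, "cached": 0}

def fib_naive(n):
    # Recomputes the same subproblems over and over.
    calls["naive"] += 1
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_cached(n):
    # The body runs once per distinct n; repeats are served from the cache.
    calls["cached"] += 1
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)

assert fib_naive(20) == fib_cached(20) == 6765
print(calls)  # {'naive': 21891, 'cached': 21}
```

The call-count ratio grows exponentially with n, which is why measured speedups for this pattern depend heavily on the input used in the benchmark.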

All 17 Patterns

Auto-Fix Patterns (applied automatically)

Pattern                      What it does                                      Speedup
───────────────────────────────────────────────────────────────────────────────────────
regex_precompile             re.match(str, x) → precompiled at module level    2×
membership_test_set          x in [literals] → x in {literals}                 4×
auto_memoize                 Pure recursive functions → @lru_cache             9,674×
append_to_comprehension      Append-in-loop → list comprehension               1.4×
string_concat_to_join        s += x in loop → ''.join()                        1.2×
dict_get                     try: d[k] except KeyError → d.get(k, default)
multiple_isinstance          Chained isinstance() → tuple form                 1.4×
generator_instead_of_list    sum([x for x]) → sum(x for x)                     1.1×
unnecessary_list             for x in list(gen) → for x in gen                 1.3×
unnecessary_copy             list([1,2,3]) → [1,2,3]                           1.5×
chained_comparison           x > 0 and x < 10 → 0 < x < 10                     1.1×
loop_invariant_hoist         Hoist list.append lookup out of loop              1.1×
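Several of these rewrites are small enough to check for equivalence by hand. The before/after pairs below are a sketch of what four of the patterns look like in practice; the function names are illustrative and this is not the tool's actual output.

```python
# dict_get: try/except KeyError versus dict.get with a default.
def lookup_before(d, k, default=None):
    try:
        return d[k]
    except KeyError:
        return default

def lookup_after(d, k, default=None):
    return d.get(k, default)

# string_concat_to_join: repeated += in a loop versus a single join.
def join_before(parts):
    s = ""
    for p in parts:
        s += p
    return s

def join_after(parts):
    return "".join(parts)

# chained_comparison: two comparisons joined by `and` versus one chain.
def in_range_before(x):
    return x > 0 and x < 10

def in_range_after(x):
    return 0 < x < 10

# multiple_isinstance: chained isinstance() calls versus the tuple form.
def is_number_before(v):
    return isinstance(v, int) or isinstance(v, float)

def is_number_after(v):
    return isinstance(v, (int, float))

# Each pair should agree on every input, including edge cases.
data = {"a": 1}
for key in ("a", "missing"):
    assert lookup_before(data, key, 0) == lookup_after(data, key, 0)
assert join_before(["x", "y", "z"]) == join_after(["x", "y", "z"]) == "xyz"
for x in (-1, 0, 5, 10, 11):
    assert in_range_before(x) == in_range_after(x)
for v in (1, 1.5, "1", None):
    assert is_number_before(v) == is_number_after(v)
```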

Detection-Only Patterns (reported, not auto-fixed)

Pattern                    What it detects                           Why not auto-fix
──────────────────────────────────────────────────────────────────────────────────────────────────
defaultdict_opportunity    if k not in d: d[k] = []                  Requires import + type change
repeated_attr_in_loop      self.config.x accessed 5× in loop         Too many edge cases
exception_control_flow     try/except in loop for type conversion    Intent-dependent
loop_to_any_all            for+if+return True → any()                No speedup (generator overhead)
dataframe_vectorize        df.iterrows() in loop                     Complex transform
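The "Requires import + type change" reason for defaultdict_opportunity can be made concrete. In the sketch below (function names are illustrative), both versions produce the same contents, but the rewritten one returns a different type whose behavior on missing keys differs, which callers may or may not tolerate.

```python
from collections import defaultdict

def group_before(pairs):
    groups = {}
    for key, value in pairs:
        if key not in groups:
            groups[key] = []
        groups[key].append(value)
    return groups

def group_after(pairs):
    # Needs a new import and changes the returned type.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

pairs = [("a", 1), ("b", 2), ("a", 3)]
before = group_before(pairs)
after = group_after(pairs)

# Same contents at this point...
same_contents = before == after

# ...but a plain dict raises on a missing key,
try:
    before["missing"]
    before_raises = False
except KeyError:
    before_raises = True

# while a defaultdict silently creates and stores an empty list.
silently_created = after["missing"]

print(same_contents, before_raises, silently_created, "missing" in after)
```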

Honest Finding: any()/all() Is NOT Faster

My benchmarks showed that any(x < 0 for x in items) is slower than a manual for loop in CPython because of generator creation and iteration overhead. I mark this as readability-only, not a performance improvement. This contradicts common advice: I report what I measure, not what's assumed.
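A comparison of this kind can be run with a short sketch like the one below. The input list and function names are assumptions; the list contains no negative values so that both versions scan every element, and results may differ across interpreters and versions.

```python
import timeit

items = list(range(1000))  # no negatives, so both versions scan everything

def has_negative_loop(values):
    for x in values:
        if x < 0:
            return True
    return False

def has_negative_any(values):
    return any(x < 0 for x in values)

# Both forms must give the same answer before timing means anything.
assert has_negative_loop(items) == has_negative_any(items)
assert has_negative_loop([1, -2, 3]) == has_negative_any([1, -2, 3])

t_loop = timeit.timeit(lambda: has_negative_loop(items), number=200)
t_any = timeit.timeit(lambda: has_negative_any(items), number=200)
print(f"for loop: {t_loop:.4f}s  any(): {t_any:.4f}s")
```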

Profile-Guided Optimization

Don't optimize cold code. Feed profiling data to focus on what matters:

py-spy record -o profile.speedscope -- python myapp.py
pyperfoptimizer fix myapp.py --profile profile.speedscope

Supports: py-spy (speedscope JSON), cProfile (pstats), Scalene (JSON).
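To illustrate what "hot path" data looks like, the sketch below uses only the standard library's cProfile and pstats to rank functions by cumulative time. It is a stand-in for the idea, not a description of how PyPerfOptimizer reads profiles; the functions hot, cold, and main are hypothetical workloads.

```python
import cProfile
import pstats

def hot():
    # Deliberately expensive.
    return sum(i * i for i in range(200_000))

def cold():
    # Deliberately cheap.
    return 1

def main():
    hot()
    cold()

profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

stats = pstats.Stats(profiler)

# stats.stats maps (filename, lineno, funcname) to
# (primitive calls, total calls, total time, cumulative time, callers).
by_cumulative = sorted(
    stats.stats.items(), key=lambda kv: kv[1][3], reverse=True
)
hot_names = [func for (_, _, func), _ in by_cumulative[:5]]
print(hot_names)
```

Restricting transformations to functions that appear near the top of such a ranking keeps changes out of code that contributes little to total runtime.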

How to Verify Our Claims

Every claim in this README is reproducible:

git clone https://github.com/AnnasMazhar/PyPerfOptimizer
cd PyPerfOptimizer
pip install -e .
python benchmarks/run_benchmarks.py        # Reproduce all speedup numbers
python benchmarks/bench_regex.py           # Regex-specific benchmarks
python -c "
from pyperfoptimizer.autofix import scan_file
import glob
files = glob.glob('/path/to/your/project/**/*.py', recursive=True)
for f in files:
    opts = scan_file(f)
    if opts:
        print(f'{f}: {len(opts)} optimizations')
"

What This Tool Is Good At

  • Catching uncompiled regex in functions (an easily overlooked cost in hot paths)
  • Converting list membership to set (scales from 4× to 42×)
  • Finding memoization candidates in recursive functions
  • Providing verified speedups — every auto-fix is benchmarkable

What This Tool Is Not

  • Not a profiler (use py-spy or Scalene for that, then feed output here)
  • Not an algorithmic optimizer (won't change your O(n²) sort to O(n log n))
  • Not an LLM (deterministic AST transforms — same input always gives same output)
  • Not a replacement for understanding your code (it catches patterns, not design issues)
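What "deterministic AST transforms" means in practice can be sketched with the standard library's ast module. PyPerfOptimizer itself depends on libcst, so this is a simplified stand-in rather than its implementation; the class ListMembershipFinder and the function find_list_membership are hypothetical names. The sketch only detects the membership_test_set pattern and does not rewrite anything.

```python
import ast

SOURCE = '''
def is_staff(role):
    return role in ["admin", "editor", "moderator"]

def is_small(n, allowed):
    return n in allowed
'''

class ListMembershipFinder(ast.NodeVisitor):
    """Collect `x in [constants]` comparisons that could use a set literal."""

    def __init__(self):
        self.hits = []

    def visit_Compare(self, node):
        for op, comparator in zip(node.ops, node.comparators):
            if (
                isinstance(op, (ast.In, ast.NotIn))
                and isinstance(comparator, ast.List)
                and comparator.elts
                # Only constant elements: they are hashable, so a set is valid.
                and all(isinstance(e, ast.Constant) for e in comparator.elts)
            ):
                self.hits.append((node.lineno, ast.unparse(comparator)))
        self.generic_visit(node)

def find_list_membership(source):
    finder = ListMembershipFinder()
    finder.visit(ast.parse(source))
    return finder.hits

hits = find_list_membership(SOURCE)
print(hits)

# Same input, same output: no randomness or model inference involved.
assert find_list_membership(SOURCE) == hits
```

The second function in SOURCE is not reported because `allowed` is a variable, not a literal, so its type and contents cannot be known from the syntax tree alone.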

Installation

pip install pyperfoptimizer

Python 3.9+. Core dependency: libcst.

Contributing

See CONTRIBUTING.md. Run tests: python -m pytest tests/ -v (123 tests).

License

MIT — see LICENSE.
