Skip to main content

One CLI. Zero code changes. Full system flow visibility for any Python service or distributed system.

Project description

pylow

One CLI. Zero code changes. Full system flow visibility for any Python service or distributed system.

Installation & Setup

You can install pylow globally via pip:

pip install pylow

[!IMPORTANT] If you get Command 'pylow' not found after installation, make sure Python's user bin directory is in your PATH. Run the following commands to add it to your profile:

echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc

How it works under the hood

Three layers working together:

  • Layer 1 → OTel auto-instrumentation (HTTP, DB, gRPC, queues — zero code changes)
  • Layer 2 → bpftrace USDT (Python function call tree, syscalls)
  • Layer 3 → pylow CLI (stitches both, renders the flow)

Complete CLI Command Reference & Outputs

Here is the usage documentation and sample outputs for every command in the pylow tool:

1. pylow attach <pid>

Attach to any running Python process and start collecting trace logs immediately.

pylow attach 4821

Output:

✓ Attached to process 4821. Monitoring execution events...

2. pylow flow

Renders the complete execution flow tree (local/distributed spans & events) from the trace repository.

pylow flow --last

Output:

handle_request 450ms
├── authenticate_user 20ms
│   └── [redis GET session_id] 15ms
└── call_llm_chain 410ms
    └── [POST api.openai.com/v1/chat/completions] 400ms
        └── waiting (epoll_wait) 390ms   ← bottleneck

3. pylow stitch

Stitch distributed traces together across service boundaries using traceparent headers.

pylow stitch --services api,worker,ml-service

Output:

REQUEST trace-id: t_demo_flow_123

  api-gateway          450ms  handle_request
  └── api-gateway          20ms  authenticate_user
      └── api-gateway          15ms  redis GET session_id
  └── api-gateway          410ms  call_llm_chain
      └── api-gateway          400ms  POST api.openai.com/v1/chat/completions
          └── api-gateway          390ms  waiting (epoll_wait)

4. pylow slow

Continuously daemonize/monitor and surface slow execution paths exceeding a latency threshold.

pylow slow --threshold 200ms --watch

Output:

Continuous monitoring daemon started. Threshold: 200ms, Watch: False
SLOW PATHS detected (last 5 min):

  #1  handle_request → call_llm_chain → POST api.openai.com/v1/chat/completions → [waiting (epoll_wait)]
      avg: 390ms  occurrences: 1
      root cause: epoll_wait 310ms — network latency to openai

5. pylow diff

Compare execution flow metrics between versions or releases to detect regressions.

pylow diff --before deploy-v1.2 --after deploy-v1.3

Output:

Comparing before v1.2 vs after v1.3...

REGRESSIONS:

  handle_request     +150ms avg  (was 200ms, now 350ms)
  call_llm           +140ms avg  (was 180ms, now 320ms)
  serialize          +12ms avg  (was 3ms, now 15ms)

NEW CALLS in v1.3:
  validate_schema    8ms  (added input validation)

REMOVED in v1.3:
  legacy_cache_check (removed)

6. pylow syscall <pid>

Trace syscall counts and histogram latency patterns for the target process.

pylow syscall 4821

Output:

Attaching syscall counter to PID 4821...
✓ Attached. Monitoring syscall events... Ctrl+C to stop.

--- BPF Map: @sys_counts ---
  sys_enter_read: 231
  sys_enter_write: 184
  sys_enter_epoll_wait: 42

7. pylow malloc <pid>

Profile allocations and heap sizing metrics.

pylow malloc 4821

Output:

Attaching allocator profile to PID 4821...
✓ Attached. Monitoring memory allocations... Ctrl+C to stop.

--- BPF Map: @alloc_sizes (bytes allocated) ---
[64, 127]              45 |@@@@@@@@@@@                         |
[512, 1023]           120 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4096, 8191]           18 |@@@@                                |

8. pylow tcp <pid>

Trace outbound TCP latency.

pylow tcp 4821

Output:

Attaching TCP latency tracer to PID 4821...
✓ Attached. Monitoring TCP sendmsg... Ctrl+C to stop.

--- BPF Map: @tcp_send_us (ns delay) ---
[100000, 200000]       21 |@@@@@@@@@@                          |

9. pylow io <pid>

Trace Block and File I/O read/write latencies.

pylow io 4821

Output:

Attaching File I/O latency tracer to PID 4821...
✓ Attached. Collecting block I/O events... Ctrl+C to stop.

--- BPF Map: @read_lat (ns) ---
[4096, 8191]          150 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[16384, 32767]        23 |@@@@@                               |

10. pylow flame <pid>

Generate sampling-based user/kernel stack flame graphs.

pylow flame 4821 --duration 5

Output:

Attaching kernel profile sampler to PID 4821 for 5s...
✓ Attached. Sampling for 5s...
✓ Saved flame graph to flamegraph.svg

11. pylow sched <pid>

Monitor runqueue latency and scheduling delays.

pylow sched 4821

Output:

Attaching scheduler delay tracer to PID 4821...
✓ Attached. Collecting scheduler runqueue events... Ctrl+C to stop.

--- BPF Map: @runq_latency_us (us delay) ---
[1, 2]                 98 |@@@@@@@@@@@@@@@@@@@@@@@@            |
[4, 8]                142 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

12. pylow pycall <pid>

Profile Python PyObject_Call execution timings.

pylow pycall 4821

Output:

Attaching Python function call timer to PID 4821...
✓ Attached. Collecting PyObject_Call events... Ctrl+C to stop.

--- BPF Map: @latency (time spent per function in us) ---
SLOW execute_query took 12400us
SLOW process_job took 1850us

13. pylow pyframe <pid>

Log Python execution contexts at the frame level (file, function, line).

pylow pyframe 4821

Output:

Attaching Python frame USDT tracer to PID 4821...
✓ Attached. Collecting USDT frame events... Ctrl+C to stop.

ENTER execute_query() @ db/models.py:142
slow_query() @ db/models.py:142 — p99: 340ms

14. pylow pycpu <pid>

Identify CPU hotspots in Python runtime execution stacks.

pylow pycpu 4821

Output:

Attaching CPU hotspot sampler to PID 4821...
✓ Attached. Sampling CPU stacks... Ctrl+C to stop.

--- BPF Map: @stacks ---
  [0x7f3b821034bc, 0x7f3b821035dc]: 145
  -> Resolved: call_llm_chain @ gateway/orchestrator.py:120

15. pylow pyexcept <pid>

Trace raised and caught exceptions within Python virtual machine.

pylow pyexcept 4821

Output:

Attaching Python exception tracer to PID 4821...
✓ Attached. Monitoring exceptions... Ctrl+C to stop.

EXCEPTION KeyError @ tid=10234
  [ustack]:
    get_user_context @ db/client.py:48

16. pylow pyiowait <pid>

Trace Python code blocked waiting on blocking I/O calls.

pylow pyiowait 4821

Output:

Attaching I/O Wait blocking call tracer to PID 4821...
✓ Attached. Monitoring blocking sys_read calls... Ctrl+C to stop.

BLOCKING READ 12ms
  [ustack]:
    fetch_metadata @ db/client.py:54

17. pylow pygil <pid>

Profile GIL lock acquisition delays and thread contention.

pylow pygil 4821

Output:

Attaching GIL lock contention tracer to PID 4821...
✓ Attached. Monitoring GIL wait states... Ctrl+C to stop.

GIL WAIT 1250us tid=10234 stack:
  [ustack]:
    calculate_features @ ml/engine.py:89

18. pylow pyleak <pid>

Profile heap allocations to detect memory leak patterns.

pylow pyleak 4821

Output:

Attaching memory leak tracer to PID 4821...
✓ Attached. Collecting memory allocation metrics... Ctrl+C to stop.

=== TOP ALLOCATORS ===
@allocs[0x7f3b821034bc]: 10485760 bytes
  -> allocating callsite: load_dataset @ ml/data.py:12

19. pylow pyreq <pid>

Measure end-to-end request lifecycle breakdown.

pylow pyreq 4821

Output:

Attaching Request Lifecycle timer to PID 4821...
✓ Attached. Collecting Request Latency counts... Ctrl+C to stop.

REQ START tid=10234
REQ DONE total=340ms db=310ms other=30ms

20. pylow timeline <pid>

Trace absolute chronological timeline call graph.

pylow timeline 4821 --duration 5.0 --threshold 2.0

Output:

[     0.000ms] → handle_request()  server.py:45
[     0.040ms]   → parse_headers()  http.py:12
[     0.051ms]   ← parse_headers()  [0.011ms]
[     0.055ms]   → execute_query()  db.py:88
[    91.230ms]   ← execute_query()  [91.175ms]  ⚠️ SLOW

21. pylow pythread <pid>

Trace thread-aware function call timelines with self-time.

pylow pythread 4821

Output:

Attaching thread-aware tracer to PID 4821...
✓ Attached. Collecting threaded events... Ctrl+C to stop.

--- Thread ID: 10001 ---
  parse_headers() spent 0.00ms (Self time: 0.00ms)

22. pylow pyasync <pid>

Trace async await coroutine metrics and yields.

pylow pyasync 4821

Output:

Attaching async/coroutine tracer to PID 4821...
✓ Attached. Monitoring coroutine suspends/resumes... Ctrl+C to stop.

--- Coroutine: 0x7f3b821034bc ---
  Suspended counts: 2
  Total CPU Time: 80us

23. pylow pyargs <pid>

Profile Python function call argument types and layout.

pylow pyargs 4821

Output:

Attaching argument Layout layout-tracer to PID 4821...
✓ Attached. Dereferencing Python structs... Ctrl+C to stop.

1000 CALL obj=0x7f3b821034bc args=0x7f3b821051fa

24. pylow pysyscall <pid>

Profile syscalls attributed directly to Python frames.

pylow pysyscall 4821

Output:

Attaching syscall-to-Python attribution tracer to PID 4821...
✓ Attached. Monitoring slow read and futex syscalls... Ctrl+C to stop.

=== SLOW READ fd=4 dur=12ms ===
  [ustack]:
    fetch_metadata @ db/client.py:54

25. pylow pynplus1 <pid>

Detect potential ORM loop-driven N+1 query patterns.

pylow pynplus1 4821

Output:

Attaching N+1 query loop detector to PID 4821...
✓ Attached. Monitoring ORM execute loops... Ctrl+C to stop.

⚠️  N+1 CANDIDATE: db/models.py:142
   Called 15x in 5s (3.0/s)

26. pylow pygraph <pid>

Trace hierarchical call relationships.

pylow pygraph 4821

Output:

handle_request()  calls=1  avg=0.00ms  (server.py:45)
  execute_query()  calls=1  avg=0.00ms  (db.py:88)

27. pylow pyanomaly <pid>

Identify slow function calls using statistical baselines.

pylow pyanomaly 4821

Output:

[BASELINE] execute_query(): mean=10.33ms stddev=0.76ms
🚨 ANOMALY execute_query(): 45.00ms vs baseline 10.33ms

28. pylow pydash <pid>

Stream traces directly to live curses dashboard.

pylow pydash 4821

Output:

Attaching curses dashboard to PID 4821...
=== LIVE FUNCTION TRACER ===
RECENT CALLS:
  handle_request() 120.40ms

29. pylow pysingle <pid> <target_func>

Trace single request / execution call tree with self time.

pylow pysingle 4821 handle_request

Output:

[     0.000ms] → handle_request()  server.py:45
[     0.011ms]   → validate_token()  auth.py:12
[     0.015ms]   ← validate_token()  total=0.004ms  self=0.003ms

30. pylow page-faults <pid>

Trace page fault hotspots to identify cold memory access patterns.

pylow page-faults 4821

Output:

Attaching page faults tracer to PID 4821...
✓ Attached. Collecting page fault events... Ctrl+C to stop.

=== PAGE FAULT HOTSPOTS ===
@faults[
    malloc+0x24
    PyBytes_FromStringAndSize+0x18
    load_dataset @ ml/data.py:12
]: 421

31. pylow context-switches <pid>

Profile preemption context switches and voluntary/involuntary off-CPU delays.

pylow context-switches 4821

Output:

Attaching context switches tracer to PID 4821...
✓ Attached. Collecting context switch events... Ctrl+C to stop.

OFF CPU 87ms next_cpu=2

--- BPF Map: @off_cpu_ms ---
[0, 1]                12 |@@@@                                |
[2, 4]                89 |@@@@@@@@@@@@@@                      |
[64, 128]            210 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

32. pylow kernel-blocked <pid>

Trace the exact kernel blocked code path where a process is sleeping in an uninterruptible wait.

pylow kernel-blocked 4821

Output:

Attaching kernel blocked stack tracer to PID 4821...
✓ Attached. Monitoring blocked states... Ctrl+C to stop.

BLOCKED IN KERNEL:
        __schedule+0x310
        schedule+0x44
        futex_wait_queue_me+0xb8
        futex_wait+0x120
        do_futex+0x340
        __x64_sys_futex+0x140
        [ustack]:
        pthread_cond_wait+0x12
        take_gil+0x42
        execute_query+0x91  db.py:102

33. pylow tlb-shootdowns <pid>

Profile Translation Lookaside Buffer (TLB) flush rates and reasons.

pylow tlb-shootdowns 4821

Output:

Attaching TLB shootdowns tracer to PID 4821...
✓ Attached. Monitoring TLB flushes... Ctrl+C to stop.

--- BPF Map: @tlb_reason ---
[0] (TLB_FLUSH_ON_TASK_SWITCH)              42
[1] (TLB_FLUSH_ON_PAGE_FAULT)              187

34. pylow irq-impact <pid>

Monitor soft and hard IRQ impact vectors to detect when CPU cycles are stolen.

pylow irq-impact 4821

Output:

Attaching Soft/Hard IRQ tracer to PID 4821...
✓ Attached. Collecting IRQ impact events... Ctrl+C to stop.

--- BPF Map: @sirq_type ---
[1] (TIMER_SOFTIRQ)                         84
[3] (NET_RX_SOFTIRQ)                        187

--- BPF Map: @sirq_lat (us) ---
[0, 1]                12 |@@@@                                |
[2, 4]                43 |@@@@@@@@@@@@@@                      |
[8, 16]              187 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

Diagnostic Decision Tree & Multi-Layer Debugging Workflow

Use this decision tree to diagnose performance degradation layer-by-layer:

What are you seeing?
│
├── App is slow
│   │
│   ├── CPU high?
│   │   └── profile:hz:999 + ustack → hottest function
│   │
│   └── CPU normal?
│       └── raw_syscalls timer → which syscall blocking + stack
│
├── Memory growing
│   ├── malloc sum by stack → top allocator
│   └── alloc vs free count → confirm leak
│
├── Random crashes
│   └── raise__exception + ustack → every throw point
│
├── Process hangs
│   ├── sched_switch + ustack → where it sleeps
│   └── PyThread_acquire_lock → deadlock check
│
└── Too much noise from other queries
    └── add /str(arg1) == "your_func"/ filter
        or /str(arg0) == "your_file.py"/
        or /tid == <specific_thread>/

The Mental Model — Layers of Execution

Your Python code
      ↓
CPython interpreter (ceval loop)
      ↓
Standard library / third party (SQLAlchemy, requests, asyncio)
      ↓
Python C extensions (.so files)
      ↓
libc (malloc, free, connect, read)
      ↓
System calls (read, write, futex, mmap, connect)
      ↓
Kernel (TCP stack, VFS, scheduler, memory manager)
      ↓
Hardware (CPU, disk, NIC)

Step 0 — Quick Triaging (Run first)

Run this for 10 seconds to pinpoint the category of the problem:

sudo bpftrace -e '
profile:hz:99 /pid == $1/ { @cpu[ustack(perf,3)] = count(); }
tracepoint:sched:sched_switch /args->prev_pid == $1/ { @offcpu = count(); }
tracepoint:raw_syscalls:sys_enter /pid == $1/ { @syscalls = count(); }
software:page-faults:1 /pid == $1/ { @faults = count(); }
interval:s:10 {
  printf("cpu_samples : %d\n", @cpu);
  printf("off_cpu     : %d\n", @offcpu);
  printf("syscalls    : %d\n", @syscalls);
  printf("page_faults : %d\n", @faults);
  exit();
}' <PID>

Triage Criteria:

  • cpu_samples high + offcpu low → CPU BOUND
  • cpu_samples low + offcpu high → I/O BOUND
  • syscalls very high → SYSCALL STORM
  • page_faults high → MEMORY
  • All counts low → DEADLOCK / STUCK

CPU BOUND — Your Code Is Burning CPU

A. Find exactly which function is hot:

sudo bpftrace -p <PID> -e '
profile:hz:999 {
  @[ustack(perf, 5)] = count();
}
interval:s:10 {
  print(@, 3);   // top 3 stacks only
  exit();
}'

B. Confirm with duration:

sudo bpftrace -p <PID> -e '
usdt:/usr/bin/python3:python:function__entry { @t[tid,str(arg1)] = nsecs; }
usdt:/usr/bin/python3:python:function__return {
  $d = nsecs - @t[tid,str(arg1)];
  if ($d > 10000000) {
    printf("%lldms %s %s:%d\n", $d/1000000, str(arg1), str(arg0), arg2);
  }
  delete(@t[tid,str(arg1)]);
}'

I/O BOUND — Your Code Is Waiting

A. Find which syscall and Python line caused it:

sudo bpftrace -e '
tracepoint:raw_syscalls:sys_enter /pid == $1/ {
  @t[tid,args->id] = nsecs;
}
tracepoint:raw_syscalls:sys_exit /pid == $1/ {
  $d = nsecs - @t[tid,args->id];
  if ($d > 5000000) {
    printf("BLOCKED syscall=%d %lldms\n", args->id, $d/1000000);
    print(ustack(perf, 5));
    exit();   // stop after first hit
  }
  delete(@t[tid,args->id]);
}' <PID>

(Translate syscall ID using ausyscall <ID>)

B. If it's a read — find which file descriptor:

sudo bpftrace -e '
tracepoint:syscalls:sys_enter_read /pid == $1/ {
  @t[tid] = nsecs;
  @fd[tid] = args->fd;
}
tracepoint:syscalls:sys_exit_read /pid == $1/ {
  $d = nsecs - @t[tid];
  if ($d > 5000000) {
    printf("SLOW READ fd=%d %lldms\n", @fd[tid], $d/1000000);
    print(ustack(perf,5));
    exit();
  }
  delete(@t[tid]); delete(@fd[tid]);
}' <PID>

(Translate fd to file: ls -la /proc/<PID>/fd/<FD>)


SYSCALL STORM — Too Many Kernel Transitions

A. Find which syscall is called most:

sudo bpftrace -e '
tracepoint:raw_syscalls:sys_enter /pid == $1/ {
  @[args->id] = count();
}
interval:s:5 {
  print(@, 5);   // top 5 syscalls by count
  exit();
}' <PID>

B. Find which Python code is calling it:

sudo bpftrace -e '
tracepoint:raw_syscalls:sys_enter /pid == $1 && args->id == <ID>/ {
  @[ustack(perf,5)] = count();
}
interval:s:5 {
  print(@, 3);
  exit();
}' <PID>

MEMORY — Growing, Leaking, Slow GC

A. Find what is allocating most:

sudo bpftrace -p <PID> -e '
uprobe:/lib/x86_64-linux-gnu/libc.so.6:malloc {
  @[ustack(perf,5)] = sum(arg0);
}
interval:s:10 {
  print(@, 3);   // top 3 allocating callsites
  exit();
}'

B. Confirm it's a leak (allocations without frees):

sudo bpftrace -p <PID> -e '
uprobe:/lib/x86_64-linux-gnu/libc.so.6:malloc {
  @alloc = sum(arg0);
  @alloc_count = count();
}
uprobe:/lib/x86_64-linux-gnu/libc.so.6:free {
  @free_count = count();
}
interval:s:5 {
  printf("allocated: %lldMB  alloc_calls: %d  free_calls: %d\n",
    @alloc/1048576, @alloc_count, @free_count);
  clear(@alloc); clear(@alloc_count); clear(@free_count);
}'

C. Trace GC pauses:

sudo bpftrace -p <PID> -e '
usdt:/usr/bin/python3:python:gc__start { @t[tid] = nsecs; @gen[tid] = arg0; }
usdt:/usr/bin/python3:python:gc__done {
  printf("GC gen%d %lldms\n", @gen[tid], (nsecs-@t[tid])/1000000);
}'

DEADLOCK / STUCK — Process Is Stuck

A. Find where the process is sleeping:

sudo bpftrace -e '
tracepoint:sched:sched_switch /args->prev_pid == $1/ {
  @stack = ustack(perf, 10);
  @kstack = kstack(perf, 10);
}
interval:s:1 {
  printf("=== WHERE PROCESS SLEEPS ===\n");
  print(@stack);
  print(@kstack);
}' <PID>

B. Confirm deadlock (lock never released):

sudo bpftrace -p <PID> -e '
uprobe:/usr/bin/python3:PyThread_acquire_lock {
  @lock[tid, arg0] = nsecs;
}
uprobe:/usr/bin/python3:PyThread_release_lock {
  delete(@lock[tid, arg0]);
}
interval:s:5 {
  printf("=== LOCKS HELD > 5s ===\n");
  print(@lock);
}'

Noise Elimination — Surgical Filters

Every diagnostic query can be customized using target filters:

# Filter by function name
/str(arg1) == "execute_query"/

# Filter by file name
/str(arg0) == "db.py"/

# Filter by thread ID
/tid == 140234/

# Filter by slow execution threshold (e.g. > 50ms)
/nsecs - @t[tid] > 50000000/

Logical Error Troubleshooting Playbook (Zero Instrumentation)

When there is no instrumentation and you need to debug logical errors, bpftrace lets you observe execution flows, function arguments, outputs, and cross-service communication payloads directly from the kernel.

Phase 1 — You Don't Know Which Service Is the Problem (Map Request Flow)

Run this tracepoint probe script on each service PID to find where data flow stops:

sudo bpftrace -e '
# catch every outbound HTTP call
tracepoint:syscalls:sys_enter_connect /pid == $1/ {
  @conn_start[tid] = nsecs;
  @conn_count = count();
}
tracepoint:syscalls:sys_exit_connect /pid == $1/ {
  if (@conn_start[tid]) {
    $d = nsecs - @conn_start[tid];
    @conn_time = hist($d / 1000000);
    delete(@conn_start[tid]);
  }
}
# catch every inbound request handling
tracepoint:syscalls:sys_enter_accept4 /pid == $1/ {
  @accept_ts = nsecs;
}
tracepoint:syscalls:sys_exit_accept4 /pid == $1/ {
  if (@accept_ts) {
    printf("INBOUND CONNECTION at %lld\n", nsecs);
  }
}
# catch all data sent and received
tracepoint:syscalls:sys_exit_read /pid == $1 && args->ret > 0/ {
  @bytes_in = sum(args->ret);
}
tracepoint:syscalls:sys_exit_write /pid == $1 && args->ret > 0/ {
  @bytes_out = sum(args->ret);
}
interval:s:5 {
  printf("\n=== SERVICE pid=%d ===\n", $1);
  printf("outbound_calls : %d\n", @conn_count);
  printf("bytes_in       : %lld\n", @bytes_in);
  printf("bytes_out      : %lld\n", @bytes_out);
  printf("conn_time_ms:\n"); print(@conn_time);
  clear(@conn_count); clear(@bytes_in);
  clear(@bytes_out); clear(@conn_time);
}' <PID>

Phase 2 — You Know Which Service, Not Where Inside (Trace Call Flow)

A. Trace every function call and return:

sudo bpftrace -p <PID> -e '
usdt:/usr/bin/python3:python:function__entry {
  @enter_ts[tid, str(arg1)] = nsecs;
  @call_depth[tid]++;
  printf("%lld %d ENTER %s %s:%d\n",
    nsecs, tid, str(arg1), str(arg0), arg2);
}
usdt:/usr/bin/python3:python:function__return {
  $func = str(arg1);
  $dur  = nsecs - @enter_ts[tid, $func];
  @call_depth[tid]--;
  printf("%lld %d EXIT  %s %lldms\n",
    nsecs, tid, $func, $dur/1000000);
  delete(@enter_ts[tid, $func]);
}'

B. Narrow execution details to business logic files only:

sudo bpftrace -p <PID> -e '
usdt:/usr/bin/python3:python:function__entry
/str(arg0) != "<frozen importlib._bootstrap>"
&& str(arg0) != "<frozen importlib._bootstrap_external>"
&& str(arg0) != "threading.py"
&& str(arg0) != "socketserver.py"/ {
  printf("%lld ENTER %-40s %s:%d\n",
    nsecs, str(arg1), str(arg0), arg2);
}
usdt:/usr/bin/python3:python:function__return
/str(arg0) != "<frozen importlib._bootstrap>"
&& str(arg0) != "threading.py"/ {
  printf("%lld EXIT  %s\n", nsecs, str(arg1));
}'

Phase 3 — Intercept Data at Boundaries

A. Read values being passed to specific functions and trace caller stacks:

sudo bpftrace -p <PID> -e '
usdt:/usr/bin/python3:python:function__entry
/str(arg1) == "process_payment"/ {
  printf("\n=== process_payment CALLED ===\n");
  printf("file : %s\n", str(arg0));
  printf("line : %d\n", arg2);
  printf("tid  : %d\n", tid);
  printf("time : %lld\n", nsecs);
  print(ustack(perf, 5));
}
usdt:/usr/bin/python3:python:function__return
/str(arg1) == "process_payment"/ {
  printf("=== process_payment RETURNED ===\n");
  printf("duration: %lldms\n", (nsecs - @t[tid])/1000000);
}'

B. Intercept raw HTTP request/response payloads:

sudo bpftrace -e '
tracepoint:syscalls:sys_enter_write /pid == $1/ {
  if (args->count > 0 && args->count < 1024) {
    printf("\n=== OUTBOUND DATA ===\n");
    printf("%s\n", str(args->buf, args->count));
    print(ustack(perf, 5));
  }
}
tracepoint:syscalls:sys_enter_read /pid == $1/ {
  @read_buf[tid] = args->buf;
  @read_ts[tid]  = nsecs;
}
tracepoint:syscalls:sys_exit_read /pid == $1 && args->ret > 0/ {
  if (@read_buf[tid] && args->ret < 2048) {
    printf("\n=== INBOUND DATA ===\n");
    printf("RECEIVED: %s\n", str(@read_buf[tid], args->ret));
  }
  delete(@read_buf[tid]);
}' <PID>

Phase 4 — Spot Anomaly and Trace Random Failures

A. Arm trigger dynamically on anomalies to avoid trace noise:

sudo bpftrace -p <PID> -e '
usdt:/usr/bin/python3:python:function__entry
/str(arg1) == "validate_payment"/ {
  @watch[tid] = 1;
  @watch_start[tid] = nsecs;
}
usdt:/usr/bin/python3:python:function__entry /@watch[tid]/ {
  printf("%lld ENTER %s %s:%d\n",
    nsecs, str(arg1), str(arg0), arg2);
}
usdt:/usr/bin/python3:python:function__return /@watch[tid]/ {
  printf("%lld EXIT  %s\n", nsecs, str(arg1));
}
usdt:/usr/bin/python3:python:raise__exception /@watch[tid]/ {
  printf("\n!!! EXCEPTION DURING PAYMENT !!!\n");
  printf("type : %s\n", str(arg0));
  printf("after: %lldms\n", (nsecs-@watch_start[tid])/1000000);
  print(ustack(perf, 15));
  @watch[tid] = 0;
  exit();
}'

B. Flag suspiciously fast executions (indicating early return errors):

sudo bpftrace -p <PID> -e '
usdt:/usr/bin/python3:python:function__entry
/str(arg1) == "get_transaction_status"/ {
  @tx_start[tid] = nsecs;
  @tx_stack[tid] = ustack(perf, 8);
}
usdt:/usr/bin/python3:python:function__return
/str(arg1) == "get_transaction_status"/ {
  $dur = nsecs - @tx_start[tid];
  if ($dur < 100000) {
    printf("\n!!! SUSPICIOUSLY FAST RETURN !!!\n");
    printf("returned in %ldus — early return / branch issue?\n", $dur/1000);
    print(@tx_stack[tid]);
  }
  delete(@tx_start[tid]);
  delete(@tx_stack[tid]);
}'

Phase 5 — Multi-Service Timestamp Correlation

Run a timestamped log on all machines concurrently:

sudo bpftrace -p <PID> -e '
usdt:/usr/bin/python3:python:function__entry {
  printf("SVC=%s TS=%lld ENTER %s %s:%d\n",
    comm, nsecs, str(arg1), str(arg0), arg2);
}
usdt:/usr/bin/python3:python:function__return {
  printf("SVC=%s TS=%lld EXIT  %s\n",
    comm, nsecs, str(arg1));
}
tracepoint:syscalls:sys_enter_connect /pid == $1/ {
  printf("SVC=%s TS=%lld CONNECT\n", comm, nsecs);
}' <PID> 2>&1 | tee /tmp/trace_<service_name>.log

Merge and sort all logs on one machine:

cat /tmp/trace_*.log | sort -k2 -t= | grep -v "^$"

Logical Error Mental Model

Logical error symptoms:
├── Wrong value returned        → intercept write() — read actual payload
├── Wrong branch taken          → function returns too fast = early return
├── Missing call                → expected function never appears in trace
├── Called in wrong order       → timestamps show wrong sequence
├── Called with wrong args      → read() intercept shows wrong payload
└── Race condition              → two threads in same function simultaneously
START — something is wrong
│
├── Which service?
│   └── bytes_in/bytes_out per service → find where chain breaks
│
├── Which function?
│   └── full function trace filtered to your files only
│       → find the function that returns wrong/fast/never
│
├── What data?
│   └── intercept write()/read() syscalls → read actual payloads
│       → find where value becomes wrong
│
├── Which code path?
│   └── ustack on entry → who called this function
│       → wrong caller = wrong code path taken
│
├── Race condition?
│   └── count concurrent threads in same function
│       → > 1 simultaneously = race
│
└── Happens randomly?
    └── arm trigger on anomaly → capture only on bad execution
        → exit() after first capture = zero noise

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pylow-0.1.5.tar.gz (55.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pylow-0.1.5-py3-none-any.whl (78.8 kB view details)

Uploaded Python 3

File details

Details for the file pylow-0.1.5.tar.gz.

File metadata

  • Download URL: pylow-0.1.5.tar.gz
  • Upload date:
  • Size: 55.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for pylow-0.1.5.tar.gz
Algorithm Hash digest
SHA256 a51c9f89ed81868c3e72530bc662e9b0875cb58e486d03cc9338f2f7b03db52f
MD5 ace072480ecdb870af23719d3bc25581
BLAKE2b-256 08672cd5fe0126ec4856c4a6bc8ad1f16d216f9a86b24a454a77f57fef137d20

See more details on using hashes here.

File details

Details for the file pylow-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: pylow-0.1.5-py3-none-any.whl
  • Upload date:
  • Size: 78.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for pylow-0.1.5-py3-none-any.whl
Algorithm Hash digest
SHA256 760ef216d245be47dd46aa621f4da8442821428722dd5603251b7696e1895079
MD5 f6068d884bc4086d519ccd12b4ef7fe5
BLAKE2b-256 4fc50d386afd12e43c4fcfd12e750b437c926780f30429ffbe33eafafbb6b948

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page