Provider-agnostic log analysis package with LLM support and LangGraph ReAct agent

pylogtracer

Provider-agnostic log analysis package with LLM support and a dynamic ReAct agent.

What is pylogtracer?

pylogtracer is a Python package that helps you analyze log files using LLMs. It works in two modes:

Library mode - direct function calls; no agent or LLM needed
Agent mode - ask free-form questions and a LangGraph ReAct agent decides which tools to call

It supports Ollama (local), OpenAI, Anthropic, and any OpenAI-compatible custom API.

Features

Provider agnostic — OpenAI, Anthropic, Ollama, or any OpenAI-compatible API (vLLM, llama.cpp, Groq)
Keyword learning - the LLM learns error patterns once, then they are reused for free (and persist across runs when cache_path is set)
ReAct agent — multi-step reasoning, calls multiple tools per question
Grounded answers - counts, durations, and log lines are read from the data, not generated by the model; when nothing matches, the answer says "not found" instead of guessing
Time resolver — understands "10am", "yesterday", "2 hours ago", "between 9am and 11am"
Date-scoped search — find a keyword's value on a specific date (search("MODEL-X", date="2024-03-01"))
Cluster analysis — groups related errors into incidents automatically
Reports — generate Markdown/HTML summaries (generate_report)
Real logs — gzip, JSON-lines, rotated files, custom formats, and huge files (bounded/tail reads)
No .env required — pass config directly to LogTracer
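
For the "huge files" point, one common way to bound memory is to stream the file and keep only the last N lines in a fixed-size buffer. The sketch below shows that technique for plain and gzip files; `tail_lines` is a hypothetical helper, not necessarily how smart_reader.py implements `tail` / `max_lines`:

```python
import gzip
from collections import deque
from typing import List


def tail_lines(path: str, max_lines: int) -> List[str]:
    """Return the last max_lines lines of a plain or .gz log (hypothetical helper).

    Memory stays bounded by max_lines, but the whole file is still scanned,
    because a gzip stream cannot be read backwards.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as f:
        # A deque with maxlen silently drops the oldest line once it is full.
        return list(deque(f, maxlen=max_lines))
```

For uncompressed files, reading only the last N bytes (what `max_bytes` describes) can skip the full scan by seeking near the end of the file first.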

Installation

The core install is light (library mode works offline). Add an extra only for the LLM provider you actually use:

pip install pylogtracer                # core: library mode, reports, custom parsers
pip install "pylogtracer[ollama]"      # + Ollama (local models)
pip install "pylogtracer[openai]"      # + OpenAI
pip install "pylogtracer[anthropic]"   # + Anthropic
pip install "pylogtracer[agent]"       # + LangGraph ReAct agent (ask())
pip install "pylogtracer[all]"         # everything

(The quotes keep shells such as zsh from treating the brackets as a glob pattern.)

Agent mode (ask()) needs the agent extra; LLM classification and root-cause analysis need a provider extra. Library-mode methods (summary, search, error_frequency, generate_report, …) work with just the core install.

Quick Start

from pylogtracer import LogTracer

# Ollama (local, no API key needed)
tracer = LogTracer(
    file_path  = "app.log",
    llm_config = {
        "provider": "ollama",
        "model":    "qwen2.5:7b",
        "base_url": "http://localhost:11434"
    }
)

# Library mode — no LLM needed
print(tracer.summary())
print(tracer.error_frequency())
print(tracer.health_check())

# Agent mode — LLM required
print(tracer.ask("what caused the crash at 10am?"))
print(tracer.ask("show INC1000004 related logs and how long it lasted"))

Supported Providers

Provider	Example Model	API Key
Ollama	`qwen2.5:7b`, `llama3`, `mistral`	No
OpenAI	`gpt-4o-mini`, `gpt-4o`	Yes
Anthropic	`claude-3-5-haiku-20241022`	Yes
Custom	Any OpenAI-compatible API	Optional

# OpenAI
tracer = LogTracer("app.log", llm_config={
    "provider": "openai",
    "model":    "gpt-4o-mini",
    "api_key":  "sk-..."
})

# Anthropic
tracer = LogTracer("app.log", llm_config={
    "provider": "anthropic",
    "model":    "claude-3-5-haiku-20241022",
    "api_key":  "sk-ant-..."
})

# Custom / vLLM / LM Studio
tracer = LogTracer("app.log", llm_config={
    "provider": "custom",
    "model":    "my-model",
    "base_url": "http://my-server:8000/v1",
    "api_key":  "optional"
})
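
The provider table implies a simple rule for llm_config: Ollama needs no key, OpenAI and Anthropic need one, and a custom endpoint may take one. A hypothetical pre-flight check (`check_llm_config` is not part of pylogtracer, and requiring `base_url` for `custom` is an assumption) could catch a missing key before the first LLM call:

```python
from typing import List

# Hypothetical: mirrors the "API Key" column of the Supported Providers table.
KEY_REQUIRED = {"ollama": False, "openai": True, "anthropic": True, "custom": False}


def check_llm_config(llm_config: dict) -> List[str]:
    """Return a list of problems found in an llm_config dict (empty if it looks OK)."""
    problems = []
    provider = llm_config.get("provider")
    if provider not in KEY_REQUIRED:
        problems.append(f"unknown provider: {provider!r}")
        return problems
    if not llm_config.get("model"):
        problems.append("missing 'model'")
    if KEY_REQUIRED[provider] and not llm_config.get("api_key"):
        problems.append(f"provider {provider!r} needs an 'api_key'")
    # Assumption: an OpenAI-compatible custom server must be told where it lives.
    if provider == "custom" and not llm_config.get("base_url"):
        problems.append("custom provider needs a 'base_url'")
    return problems


print(check_llm_config({"provider": "openai", "model": "gpt-4o-mini"}))
# ["provider 'openai' needs an 'api_key'"]
```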

Library Mode — All Methods

tracer = LogTracer("app.log")   # no LLM needed for library mode

# Overview
tracer.summary()
# {'total_entries': 100, 'total_errors': 30, 'total_clusters': 11,
#  'error_types': [...], 'first_error': '...', 'last_error': '...'}

# Error counts
tracer.error_frequency()
tracer.error_frequency(date="2024-03-01")
tracer.error_frequency(from_dt="2024-03-01 09:00:00", to_dt="2024-03-01 11:00:00")

# Filter errors
tracer.errors_by_date("2024-03-01")
tracer.errors_in_range("2024-03-01 09:00:00", "2024-03-01 11:00:00")

# Last incident
tracer.last_incident()

# System health
tracer.health_check()
# {'healthy': False, 'status': 'CRITICAL', 'total_errors': 30, ...}

# Incident duration
tracer.incident_duration()
# {'start': '...', 'end': '...', 'duration_human': '6 minutes 12 seconds', ...}

# Search (any keyword / id / snippet)
tracer.search("INC1000001")              # by incident ID
tracer.search("connection refused")      # by keyword
tracer.search("MODEL-X", date="2024-03-01")  # scope to one date (same key, per-date value)
tracer.get_related_logs("INC1000004")   # all logs in same cluster
tracer.get_entry_details("INC1000004")  # full entry with traceback

# Duration of ANY keyword/id (first -> last occurrence)
tracer.keyword_duration("INC1000001")
tracer.incident_duration()               # the most recent error burst

# Reports (no LLM needed)
tracer.generate_report("markdown")
tracer.generate_report("html", output="report.html")

# Root cause (LLM required)
tracer.root_cause_analysis()
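
keyword_duration measures the span from the first to the last line that mentions a keyword. A minimal sketch of that idea, assuming every line starts with a `YYYY-MM-DD HH:MM:SS` timestamp; the function names and output keys here are illustrative, not pylogtracer's internals:

```python
from datetime import datetime
from typing import List, Optional

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed line prefix, 19 characters long


def humanize(seconds: int) -> str:
    """Format seconds like '6 minutes 12 seconds', skipping zero units."""
    parts = []
    for name, size in (("hour", 3600), ("minute", 60), ("second", 1)):
        value, seconds = divmod(seconds, size)
        if value:
            parts.append(f"{value} {name}{'s' if value != 1 else ''}")
    return " ".join(parts) or "0 seconds"


def keyword_duration(lines: List[str], keyword: str) -> Optional[dict]:
    """Span between the first and last line containing keyword, or None if absent."""
    stamps = [datetime.strptime(line[:19], TS_FORMAT) for line in lines if keyword in line]
    if not stamps:
        return None  # report "not found" rather than guessing
    start, end = min(stamps), max(stamps)
    seconds = int((end - start).total_seconds())
    return {"start": str(start), "end": str(end), "duration_human": humanize(seconds)}


logs = [
    "2024-03-01 10:00:00 ERROR INC1000001 db timeout",
    "2024-03-01 10:03:00 INFO retrying",
    "2024-03-01 10:06:12 ERROR INC1000001 db timeout again",
]
print(keyword_duration(logs, "INC1000001")["duration_human"])  # 6 minutes 12 seconds
```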

Agent Mode — Ask Anything

tracer = LogTracer("app.log", llm_config={...})

# Simple questions
tracer.ask("what is the last error?")
tracer.ask("is the system healthy?")
tracer.ask("how many DB errors happened?")

# Time-based (auto-resolved — no need to specify exact timestamps)
tracer.ask("what errors happened at 10am?")
tracer.ask("show errors from yesterday")
tracer.ask("what happened 2 hours ago?")
tracer.ask("errors between 9am and 11am")

# Identifier search
tracer.ask("show me INC1000004 related logs")
tracer.ask("what happened with REQ-456?")

# Date-scoped value lookup (same key can differ per date)
tracer.ask("what was the prediction for MODEL-X on 2024-03-01?")
tracer.ask("how long did INC1000002 last?")

# Multi-step (agent calls multiple tools automatically)
tracer.ask("what caused the crash and how long did it last?")
tracer.ask("compare errors today vs yesterday")
tracer.ask("show INC1000004 related logs and diagnose the root cause")

How the Agent Works

The agent uses a LangGraph ReAct loop — it thinks, calls a tool, sees the result, and decides whether to call another tool or answer:

User: "what caused the crash and how long did it last?"
        ↓
  [think] → I need last_incident first
        ↓
  [tool]  → last_incident() → sees cluster
        ↓
  [think] → now I need root_cause and duration
        ↓
  [tool]  → root_cause() → LLM analysis
        ↓
  [tool]  → incident_duration() → 6 minutes 12 seconds
        ↓
  [think] → I have everything now
        ↓
  FINAL_ANSWER: "The crash was caused by..."
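
Stripped of LangGraph, the loop above reduces to: ask the model for the next step, run the tool it names, feed the result back, and stop at a final answer. The toy sketch below replaces the LLM with a scripted policy; `scripted_model`, the tool stubs, and their made-up return strings are hypothetical, and LangGraph's real API looks different:

```python
from typing import Callable, Dict, List

# Stub tools standing in for pylogtracer's real ones (return values are made up).
TOOLS: Dict[str, Callable[[], str]] = {
    "last_incident": lambda: "cluster #7: DatabaseConnectionError x12",
    "root_cause": lambda: "connection pool exhausted",
    "incident_duration": lambda: "6 minutes 12 seconds",
}


def scripted_model(question: str, observations: List[str]) -> dict:
    """Stand-in for the LLM: names the next tool, or answers once it has every result."""
    plan = ["last_incident", "root_cause", "incident_duration"]
    if len(observations) < len(plan):
        return {"action": plan[len(observations)]}
    return {"final_answer": f"Cause: {observations[1]}; lasted {observations[2]}."}


def react(question: str, max_steps: int = 5) -> str:
    observations: List[str] = []
    for _ in range(max_steps):
        step = scripted_model(question, observations)  # think
        if "final_answer" in step:
            return step["final_answer"]
        observations.append(TOOLS[step["action"]]())  # act, then observe the result
    return "not found"  # step budget exhausted: refuse rather than guess


print(react("what caused the crash and how long did it last?"))
```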

Time Resolution

The agent automatically understands relative time — no need for exact timestamps:

You say	Resolved to
`"10am"`	today 10:00:00 → 10:59:59
`"yesterday 2pm"`	yesterday 14:00:00 → 14:59:59
`"this morning"`	today 06:00:00 → 12:00:00
`"2 hours ago"`	now - 2h → now
`"last 30 minutes"`	now - 30m → now
`"last night"`	yesterday 20:00:00 → today 06:00:00
`"March 1"`	2024-03-01
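
A minimal sketch of how a few of these phrases could map to (start, end) windows, taking `now` as a parameter so results are reproducible. `resolve` is hypothetical, covers only the "10am" and "N hours/minutes ago" shapes from the table, and is not the code in time_resolver.py:

```python
import re
from datetime import datetime, timedelta
from typing import Optional, Tuple

Window = Tuple[datetime, datetime]


def resolve(phrase: str, now: datetime) -> Optional[Window]:
    """Map '10am', '2 hours ago' or 'last 30 minutes' to a (start, end) window."""
    phrase = phrase.strip().lower()

    # "10am" / "2pm" -> that whole hour today
    m = re.fullmatch(r"(\d{1,2})\s*(am|pm)", phrase)
    if m:
        hour = int(m.group(1)) % 12 + (12 if m.group(2) == "pm" else 0)
        start = now.replace(hour=hour, minute=0, second=0, microsecond=0)
        return start, start.replace(minute=59, second=59)

    # "2 hours ago" / "last 30 minutes" -> from that offset up to now
    m = re.fullmatch(r"(?:last\s+)?(\d+)\s+(hour|minute)s?(?:\s+ago)?", phrase)
    if m:
        unit = "hours" if m.group(2) == "hour" else "minutes"
        return now - timedelta(**{unit: int(m.group(1))}), now

    return None  # unknown phrase: better to ask than to guess
```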

Architecture

from pylogtracer import LogTracer      ← single entry point

LogTracer
    ├── preprocessing/
    │   ├── smart_reader.py            log reading, filtering, search
    │   ├── error_extractor.py         clustering, deduplication
    │   └── error_type_classifier.py   regex + keyword learning + LLM
    │
    ├── agents/
    │   ├── qa_agent.py                LangGraph ReAct agent (ask())
    │   └── root_cause_analyzer.py     LLM root cause analysis
    │
    ├── multiagent/
    │   └── context_bridge.py          agent-to-agent context loop
    │
    ├── llm/
    │   └── llm_factory.py             provider-agnostic LLM factory
    │
    └── utils/
        └── time_resolver.py           relative time resolution

How Keyword Learning Works

The classifier uses a 3-pass system to minimize LLM calls:

Pass 1 - Named exception regex (free):
  "ConnectionError: timed out"  → ConnectionError ✓

Pass 2 - Keyword store (free, keywords learned from earlier LLM calls):
  "database connection refused" → DatabaseConnectionError ✓
  (learned earlier this session, or in a previous run when cache_path is set)

Pass 3 - LLM batch (only truly unknown errors):
  LLM classifies + returns keywords for future use
  Keywords stored → next similar error is FREE
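
The three passes can be sketched as follows. This illustrates the idea only, not the library's error_type_classifier: the exception regex, the substring keyword match, and the `llm_classify(messages) -> {message: (error_type, keywords)}` callback shape are all assumptions, and the LLM is replaced by a stub:

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Pass 1 pattern: a CamelCase name ending in Error or Exception.
EXCEPTION_RE = re.compile(r"\b([A-Z][A-Za-z]*(?:Error|Exception))\b")
LLMFn = Callable[[List[str]], Dict[str, Tuple[str, List[str]]]]


class KeywordClassifier:
    def __init__(self, llm_classify: LLMFn):
        self.llm_classify = llm_classify
        self.keywords: Dict[str, str] = {}  # learned keyword -> error type
        self.llm_calls = 0

    def _free_passes(self, message: str) -> Optional[str]:
        m = EXCEPTION_RE.search(message)  # Pass 1: named exception
        if m:
            return m.group(1)
        lowered = message.lower()
        for keyword, error_type in self.keywords.items():  # Pass 2: keyword store
            if keyword in lowered:
                return error_type
        return None

    def classify(self, messages: List[str]) -> Dict[str, str]:
        result: Dict[str, str] = {}
        unknown: List[str] = []
        for msg in messages:
            found = self._free_passes(msg)
            if found:
                result[msg] = found
            else:
                unknown.append(msg)
        if unknown:  # Pass 3: one batched LLM call for everything still unknown
            self.llm_calls += 1
            for msg, (error_type, keywords) in self.llm_classify(unknown).items():
                result[msg] = error_type
                for kw in keywords:  # remember so the next similar error is free
                    self.keywords[kw.lower()] = error_type
        return result


def fake_llm(messages: List[str]) -> Dict[str, Tuple[str, List[str]]]:
    """Stub LLM: labels everything as a DB error and returns one keyword."""
    return {m: ("DatabaseConnectionError", ["database connection refused"]) for m in messages}


clf = KeywordClassifier(fake_llm)
clf.classify(["ConnectionError: timed out", "database connection refused on db-1"])
clf.classify(["database connection refused on db-2"])  # answered by Pass 2
print(clf.llm_calls)  # 1
```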

Configuration Options

LogTracer(
    file_path   = "app.log",    # path to log file (.log/.txt/.jsonl/.gz)
    llm_config  = {             # LLM provider config (None = library mode)
        "provider":    "ollama",
        "model":       "qwen2.5:7b",
        "base_url":    "http://localhost:11434",
        "api_key":     "optional",
        "temperature": 0.0,
        "max_tokens":  1024,
    },
    gap_seconds = 60,           # entries more than this many seconds apart start a new incident
    max_retries = 2,            # max times LLM can request more context

    # ── 0.2.0 — robustness & cost ──────────────────────────────────
    cache_path  = ".plt_cache.json",  # persist learned keywords across runs
    max_context_tokens = None,        # override model context window for batching
    level_aware = False,        # detect errors from the LEVEL field, not substrings
    include_warnings = False,   # with level_aware, also count WARN/WARNING

    # ── 0.2.0 — large files & formats ─────────────────────────────
    tail        = False,        # read only a recent window (huge logs)
    max_lines   = None,         # read only the last N lines
    max_bytes   = None,         # read only the last N bytes
    log_format  = "auto",       # "auto" | "text" | "json" (JSON-lines)
    json_keys   = None,         # override JSON timestamp/level/message keys
    glob_rotated = False,       # also read app.log.1, app.log.2.gz

    # ── 0.2.0 — trust ─────────────────────────────────────────────
    redact      = None,         # None=auto (on for cloud, off for local Ollama)
    evidence    = True,         # ask() answers carry the supporting log lines

    # ── custom log format (any layout) ────────────────────────────
    log_pattern = None,         # regex w/ named groups (timestamp/level/message)
    timestamp_format = None,    # strptime fmt for the captured timestamp
)
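
gap_seconds controls how entries are grouped into incidents. A common way to implement such a threshold, and a reasonable reading of the option (not confirmed as the library's exact rule), is to start a new incident whenever the gap to the previous error exceeds it. `split_incidents` is a hypothetical name:

```python
from datetime import datetime, timedelta
from typing import List


def split_incidents(timestamps: List[datetime], gap_seconds: int = 60) -> List[List[datetime]]:
    """Group error timestamps; a gap larger than gap_seconds starts a new incident."""
    incidents: List[List[datetime]] = []
    gap = timedelta(seconds=gap_seconds)
    for ts in sorted(timestamps):
        if incidents and ts - incidents[-1][-1] <= gap:
            incidents[-1].append(ts)  # close to the previous error: same incident
        else:
            incidents.append([ts])  # first error, or the gap was too large
    return incidents


t0 = datetime(2024, 3, 1, 10, 0, 0)
errors = [t0, t0 + timedelta(seconds=30), t0 + timedelta(seconds=75), t0 + timedelta(minutes=10)]
print([len(i) for i in split_incidents(errors, gap_seconds=60)])  # [3, 1]
```

Comparing each error with the previous one (rather than with the first error of the incident) lets a steady trickle of errors chain into one long incident.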

Cost note: with cache_path set, error types the LLM classified in a previous run are recognized for free, so tokens are spent only on error types no earlier run has seen. Token usage therefore tracks how many new error types appear, not how large the log grows.
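
The saving rests on a simple mechanism: the learned keyword → error-type map is written to disk and read back on the next run. A minimal sketch with the standard json module, assuming a flat JSON object as the file layout (the real .plt_cache.json format may differ; `load_keywords` and `save_keywords` are hypothetical):

```python
import json
import os
from typing import Dict


def load_keywords(cache_path: str) -> Dict[str, str]:
    """Return the learned keyword map, or an empty one on the first run."""
    if not os.path.exists(cache_path):
        return {}
    with open(cache_path, encoding="utf-8") as f:
        return json.load(f)


def save_keywords(cache_path: str, keywords: Dict[str, str]) -> None:
    """Write to a temp file, then swap it in, so a failed write keeps the old cache."""
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(keywords, f, indent=2, sort_keys=True)
    os.replace(tmp_path, cache_path)  # atomic rename on the same filesystem
```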

Command-Line Interface

Installing the package also installs a pylogtracer command:

pylogtracer app.log --summary
pylogtracer app.log --frequency --health
pylogtracer app.log --search INC5000002
pylogtracer app.log --tail --max-lines 100000 --level-aware --health
pylogtracer app.log --format json --health         # JSON-lines logs
pylogtracer app.log --summary --json                # machine-readable output
pylogtracer app.log --report markdown               # full Markdown report
pylogtracer app.log --report html -o report.html    # HTML report to a file

# Agent mode (LLM):
pylogtracer app.log --ask "what caused the crash?" \
    --provider ollama --model qwen2.5:3b

Reports

Generate a shareable report (no LLM needed):

print(tracer.generate_report("markdown"))
tracer.generate_report("html", output="report.html")
tracer.generate_report("markdown", include_root_cause=True)  # adds LLM root cause
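
A report needs no LLM because it only formats numbers that were already computed. A hypothetical sketch of that step, rendering a summary()-style dict as Markdown; the layout is invented, and treating `error_types` as a list of names is an assumption, since the summary() sample above elides its shape:

```python
def summary_to_markdown(summary: dict) -> str:
    """Render a summary()-style dict as a small Markdown report (hypothetical layout)."""
    lines = [
        "# Log report",
        "",
        f"- Total entries: {summary['total_entries']}",
        f"- Total errors: {summary['total_errors']}",
        f"- Incidents (clusters): {summary['total_clusters']}",
    ]
    if summary.get("error_types"):  # assumed: a list of error-type names
        lines += ["", "## Error types", ""]
        lines += [f"- {name}" for name in summary["error_types"]]
    return "\n".join(lines) + "\n"


print(summary_to_markdown({"total_entries": 100, "total_errors": 30,
                           "total_clusters": 11, "error_types": ["ConnectionError"]}))
```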

Custom log formats

Point pylogtracer at any layout with a regex (named groups timestamp / level / message); matching lines are normalized internally so every feature still works:

tracer = LogTracer(
    "weird.log",
    log_pattern = r"(?P<timestamp>\d{2}/\d{2}/\d{4}-\d{2}:\d{2}:\d{2})\s*\|\s*"
                  r"(?P<level>\w+)\s*\|\s*(?P<message>.*)",
    timestamp_format = "%d/%m/%Y-%H:%M:%S",
    level_aware = True,
)

Built-in formats (YYYY-MM-DD HH:MM:SS, ISO T, DD-MM-YYYY, YYYY/MM/DD), JSON-lines, and gzip are detected automatically — a custom pattern is only for non-standard layouts.
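
Conceptually, a custom pattern is re.match plus datetime.strptime. This sketch reuses the exact pattern and format from the example above to show what a matching line turns into; `normalize` and its output dict are illustrative, not pylogtracer's internal representation, and this sketch simply returns None for lines that do not match:

```python
import re
from datetime import datetime
from typing import Optional

LOG_PATTERN = re.compile(
    r"(?P<timestamp>\d{2}/\d{2}/\d{4}-\d{2}:\d{2}:\d{2})\s*\|\s*"
    r"(?P<level>\w+)\s*\|\s*(?P<message>.*)"
)
TIMESTAMP_FORMAT = "%d/%m/%Y-%H:%M:%S"


def normalize(line: str) -> Optional[dict]:
    """Parse one line into timestamp/level/message, or None if the pattern misses."""
    m = LOG_PATTERN.match(line)
    if not m:
        return None
    return {
        "timestamp": datetime.strptime(m.group("timestamp"), TIMESTAMP_FORMAT),
        "level": m.group("level").upper(),
        "message": m.group("message"),
    }


print(normalize("01/03/2024-10:15:00 | ERROR | db timeout"))
```

Note that `%d/%m` reads `01/03/2024` as 1 March, not 3 January; the format string, not the regex, decides day/month order.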

Requirements

langchain>=0.2.0
langchain-core>=0.2.0
langchain-openai>=0.1.0
langchain-anthropic>=0.1.0
langchain-ollama>=0.1.0
langgraph>=0.1.0
pydantic>=2.0.0
python-dotenv>=1.0.0

Running Tests

# Install dev dependencies
pip install pytest pytest-cov

# Run all tests
pytest tests/ -v

# Run a single test file
pytest tests/test_smart_reader.py -v

CI/CD

Every push to main runs tests on Python 3.10, 3.11, and 3.12. Every GitHub Release automatically publishes to PyPI.

License

MIT

Contributing

Pull requests welcome! Please run tests before submitting.

pip install -e .
pytest tests/ -v

Release history

0.2.2 (this version): Jun 27, 2026
0.2.1: Jun 27, 2026
0.2.0: Jun 20, 2026
0.1.3: Apr 13, 2026
0.1.2: Mar 31, 2026
0.1.1: Mar 22, 2026
0.1.0: Mar 22, 2026
0.0.2: Mar 31, 2026

Download files

Source Distribution

pylogtracer-0.2.2.tar.gz (72.0 kB)

Uploaded Jun 27, 2026 Source

Built Distribution

pylogtracer-0.2.2-py3-none-any.whl (73.5 kB)

Uploaded Jun 27, 2026 Python 3

File details

Details for the file pylogtracer-0.2.2.tar.gz.

File metadata

Download URL: pylogtracer-0.2.2.tar.gz
Upload date: Jun 27, 2026
Size: 72.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pylogtracer-0.2.2.tar.gz
Algorithm	Hash digest
SHA256	`bece6873930cf701d3ec10049db50add53736689fbe01f7b6926995909a98bb2`
MD5	`6492ca9492234b86072b095912df6148`
BLAKE2b-256	`16c443ba301adfbb7f4bc8525fe6c7bf6b359c866ada1ceff7af060ead8157c4`

File details

Details for the file pylogtracer-0.2.2-py3-none-any.whl.

File metadata

Download URL: pylogtracer-0.2.2-py3-none-any.whl
Upload date: Jun 27, 2026
Size: 73.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pylogtracer-0.2.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`dbaaa45c33f55a4a485738be91e809de35a2e84bd570cf2867469bd7f09943e4`
MD5	`b5f9a49d16b0aeabf03a1d2387144008`
BLAKE2b-256	`d88fa25603dd312687982b0d474277b1ec24663a5979e5e5ebda4f19b25d2041`
