Pure Python implementation of parsing PDB debug information files

pdbpy

A pure Python implementation of Program Database (PDB) file parsing

Motivation

I want to be able to parse PDB files using Python. I want to understand their structure.

There are other libraries and implementations (see below). This one differs in that it works out of the box and lazily loads only what is requested. Working with a file larger than 1 GB and only want to parse a single structure? Only want to know the address of a symbol? No problem, only the minimum will be loaded!

The PDB is memory-mapped. The underlying MSF format can leave data non-contiguous, but the memorywrapper library makes that (mostly) a non-issue: it provides a memoryview-like, self-sliceable, non-copying front for the data. Bytes are copied only when the view is accessed as a buffer. The copy is unavoidable at that point, as the buffer protocol does not support wildly discontiguous memory areas.
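To illustrate the idea of a non-copying front over scattered pages, here is a minimal sketch. It is not the memorywrapper API; the class name `PagedView` and its parameters are hypothetical and exist only for this example. It assumes the stream is described by a list of page numbers into one flat buffer.

```python
from typing import List, Optional


class PagedView:
    """Hypothetical memoryview-like front over non-contiguous pages."""

    def __init__(self, data: memoryview, page_numbers: List[int], page_size: int,
                 start: int = 0, stop: Optional[int] = None):
        self._data = data              # the whole (memory-mapped) file
        self._pages = page_numbers     # pages of this stream, in logical order
        self._page_size = page_size
        self._start = start            # logical bounds within the stream
        self._stop = len(page_numbers) * page_size if stop is None else stop

    def __len__(self) -> int:
        return self._stop - self._start

    def __getitem__(self, key: slice) -> "PagedView":
        # Slicing only adjusts the logical bounds; no bytes are copied.
        start, stop, step = key.indices(len(self))
        if step != 1:
            raise ValueError("only contiguous slices are supported")
        return PagedView(self._data, self._pages, self._page_size,
                         self._start + start, self._start + max(start, stop))

    def __bytes__(self) -> bytes:
        # The copy happens only here, one page fragment at a time.
        out = bytearray()
        pos = self._start
        while pos < self._stop:
            page_index, offset = divmod(pos, self._page_size)
            take = min(self._page_size - offset, self._stop - pos)
            base = self._pages[page_index] * self._page_size + offset
            out += self._data[base:base + take]
            pos += take
        return bytes(out)
```

Slicing such a view repeatedly stays cheap; only the final `bytes(view)` call touches the underlying data.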

Features!


Feature pdbpy
Can open PDB
Can find a given type by name
Uses the Hash Stream to accelerate type lookup by name?
Can look up symbols given name? ✅ (from global table)
Can look up symbols given addresses?

Installation

pip install pdbpy

Getting started


From test_symbol_address in test_windows_pdb.py

    pdb = PDB("example_pdbs/addr.pdb")
    addr = pdb.find_symbol_address("global_variable")

Explain the type hash stream


The hash stream consists of two parts: an ordered list of truncated hashes, and a list of {TI, byte offset} pairs to accelerate lookup.

The truncated hashes are hashes of the TI records, taken modulo the number of buckets. The number of buckets can be found in the header of the type stream. The list can be loaded into a mapping of type Dict[TruncatedHash, List[TI]]. Given the hash of a TI record, we can then find a list of potential TIs.
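A minimal sketch of that mapping follows. It assumes the i-th truncated hash belongs to the i-th type record, so its TI is the first TI of the stream plus i. The function names and the `first_ti` parameter are hypothetical and not part of pdbpy.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def build_hash_buckets(truncated_hashes: Sequence[int], first_ti: int) -> Dict[int, List[int]]:
    """Group TIs by truncated hash (assumption: entry i describes TI first_ti + i)."""
    buckets = defaultdict(list)
    for index, truncated in enumerate(truncated_hashes):
        buckets[truncated].append(first_ti + index)
    return dict(buckets)


def candidate_tis(buckets: Dict[int, List[int]], full_hash: int, bucket_count: int) -> List[int]:
    """Return every TI whose truncated hash matches the given full hash."""
    return buckets.get(full_hash % bucket_count, [])
```

The result is only a list of candidates: several records can share a bucket, so each candidate still has to be checked against the full hash.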

The second part of the hash stream accelerates this. It contains a list of monotonically increasing Tuple[TI, ByteOffset] pairs. If we have a TI, we can find the offset of the closest preceding TI and parse the TI records from there until we reach the exact one we want.
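Because the pairs are sorted by TI, the closest preceding entry can be found with a binary search. This is a sketch using the standard `bisect` module; the function name `closest_preceding` is hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple


def closest_preceding(index_offsets: List[Tuple[int, int]], ti: int) -> Tuple[int, int]:
    """Return the (TI, byte offset) pair with the largest TI that is <= ti."""
    tis = [entry_ti for entry_ti, _ in index_offsets]
    position = bisect_right(tis, ti) - 1
    if position < 0:
        raise KeyError(f"TI {ti:#x} precedes the first indexed record")
    return index_offsets[position]
```

Starting at the returned byte offset, `ti - start_ti` records have to be skipped before the wanted record is reached.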

Combining the two functionalities offered by the hash stream, we thus find a list of potential TIs given a hash, and then use the second part to accelerate the lookup of the actual records, which we need to examine in order to determine if we found the TI matching the non-truncated hash.
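The combined lookup can be sketched as below. The callable `full_hash_of` is a hypothetical stand-in for "seek to the record via the offset pairs, parse it, and compute its non-truncated hash"; none of these names come from pdbpy.

```python
from typing import Callable, Dict, List, Optional


def find_type_index(full_hash: int,
                    bucket_count: int,
                    buckets: Dict[int, List[int]],
                    full_hash_of: Callable[[int], int]) -> Optional[int]:
    """Return the TI whose record has exactly the given full hash, if any."""
    # Step 1: the truncated hash narrows the search to one bucket.
    for ti in buckets.get(full_hash % bucket_count, []):
        # Step 2: examine the actual record to rule out bucket collisions.
        if full_hash_of(ti) == full_hash:
            return ti
    return None
```

Only the records in one bucket are ever parsed, which is what keeps the lookup cheap on large files.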

The hash of a TI record is often the hash of its unique name (if there is one). If there is no unique name, it is a hash of the bytes of the entire record. Different hash functions are used for the unique name strings and for the bytes of the records.

Sources and references
