Ghost through binaries: parallel IDA analysis + AI function naming in your terminal

👻 spectrIDA

Ghost through binaries.

A local, AI-powered reverse engineering assistant: parallel IDA Pro analysis, AI function naming, a terminal that doesn't suck, a Neo4j knowledge graph of everything it's ever figured out, and an MCP server so Claude can search and chain through that graph directly.

spectrida analyze GameAssembly.dll --workers 16
◈  spectrIDA  ▸  GameAssembly.dll

  ✓ 00  ✓ 01  ✓ 02  ✓ 03  ▸ 04  · 05  · 06  · 07
  ✓ 08  ✓ 09  ✓ 10  ✓ 11  ✓ 12  ✓ 13  ▸ 14  · 15

  14/16 shards  │  141,203 functions found
  ████████████████████████████░░░░  89%  ~4s remaining

What it is

IDA Pro's auto-analysis is single-threaded. On a 34 MB il2cpp DLL that's minutes. spectrIDA splits the binary into N shards, runs them in parallel via idalib, merges into one .i64, then lets a fine-tuned 8B model name every function, all from one terminal UI with a cyberpunk theme and exactly the right amount of sarcasm.

That's Chapter 1, and it stands on its own: pure speed, no AI required if you don't want it.
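
As a rough illustration of the sharding idea (not spectrIDA's actual code), an address range can be cut into N contiguous, non-overlapping shards like this. The function name split_address_space and the example addresses are hypothetical.

```python
def split_address_space(start: int, end: int, n_shards: int) -> list[tuple[int, int]]:
    """Split [start, end) into n_shards contiguous half-open ranges."""
    if n_shards < 1:
        raise ValueError("n_shards must be >= 1")
    if end <= start:
        raise ValueError("end must be greater than start")
    total = end - start
    n_shards = min(n_shards, total)  # never create empty shards
    base, extra = divmod(total, n_shards)
    shards = []
    cursor = start
    for i in range(n_shards):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first shards
        shards.append((cursor, cursor + size))
        cursor += size
    return shards


# Hypothetical 4 MB code range split across 16 workers.
shards = split_address_space(0x10001000, 0x10401000, 16)
for lo, hi in shards[:3]:
    print(f"{lo:#x}-{hi:#x}")
```

Each shard ends exactly where the next begins, so every address belongs to exactly one worker. A real splitter would probably also snap boundaries to section or function starts.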

Chapter 2 turns the output into something that outlives the session: every name, every piece of pseudocode, every disassembled instruction, every call edge, all written into a Neo4j graph an agent can actually live in. Point Claude (or pi, or anything else that speaks MCP) at it and it can search_functions, walk get_callers/get_callees, follow a call chain N hops deep, read exact instruction boundaries for planning a patch, and even kick off analysis on a brand-new binary itself, instead of you pasting decompiler output into a chat window one function at a time.

Binary  ─▶  Parallel IDA Analysis  ─▶  Demangle  ─▶  AI Naming  ─▶  Neo4j Graph  ─▶  MCP Server  ─▶  Claude
              (N idalib shards)       (free, real)   (stripped     (persists,        (search/chain/
                                                       leftovers     forever,          rename, live)
                                                       only)         across sessions)
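
The "demangle first, AI only for stripped leftovers" split in the diagram can be sketched as a simple routing step. This is an illustration under assumptions, not spectrIDA's implementation: classify_name and the stage labels are hypothetical. The prefixes themselves are standard: Itanium C++ ABI mangled names start with _Z, MSVC decorated C++ names start with ?.

```python
import re

# Auto-generated IDA placeholder names look like sub_<hex address>.
_PLACEHOLDER = re.compile(r"^sub_[0-9A-Fa-f]+$")


def classify_name(name: str) -> str:
    """Return which pipeline stage a symbol name would go to in this sketch."""
    if name.startswith("_Z"):
        return "demangle-itanium"   # Itanium C++ ABI mangled names start with _Z
    if name.startswith("?"):
        return "demangle-msvc"      # MSVC decorated C++ names start with ?
    if _PLACEHOLDER.match(name):
        return "ai-naming"          # stripped: no symbol left to recover
    return "keep"                   # already a readable name


for n in ["_ZN6Player10TakeDamageEi", "?TakeDamage@Player@@QEAAXH@Z",
          "sub_140001234", "memcpy"]:
    print(n, "->", classify_name(n))
```

Demangling is deterministic and free, so only names that land in the ai-naming bucket need to reach the model.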

It is not Ghidra. It does one annoying thing (slow analysis + naming) fast, and it's genuinely fun to use. 199 downloads speak for themselves.

No cloud. No telemetry. Runs entirely on your machine.


Numbers

task                                        | time
Among Us DLL, single-threaded IDA           | ~4 hours
Among Us DLL, spectrIDA (16 workers)        | 67 seconds
153,649-function binary, full naming pass   | overnight
Binary overview (what does this thing do?)  | ~30 seconds

Hardware these were measured on: AMD Ryzen 7 5800X3D (8C/16T), 32 GB RAM, RTX 4070 12 GB. Different hardware moves the parallel-analysis numbers (more cores, more shards, faster); the naming numbers are mostly GPU-bound. The 4-hour/67-second Among Us figures predate Chapter 2 and aren't independently re-verified in every release. Re-run spectrida analyze yourself if you want a number for your own machine and binary; results vary with shard density and binary size.

Numbers actually re-verified during Chapter 2 development, same hardware:

binary | functions | task | time / result
test_small.dll (PE) | 189 | parallel analysis, 4 workers, CLI | 6.4s
test_small.dll (PE) | 164 | full MCP pipeline (analyze + demangle + graph write) | 9.8s
main.nso: Mario Odyssey (NSO), 16 workers | 28,038 seed functions | parallel sharded scan phase | 54.5s
main.nso: Mario Odyssey (NSO), 16 workers | 74,790 total functions | + merge/full-analysis phase | 143.1s
main.nso: Mario Odyssey (NSO), 16 workers | 74,790 total functions | end-to-end wall time | 197.6s
main.nso: Mario Odyssey (NSO) | 74,790 | resolved via demangling alone (Itanium ABI, free, no AI) | 67,300 (90.0%)

That NSO row is the real equivalent of the old Among Us "4 hours → 67 seconds" claim, measured fresh this release on a 74,790-function Switch binary, no AI naming involved (demangling only, populate=False). The parallel phase (16 workers, ~55s) does the initial sharded discovery; the merge phase (~143s) is single-threaded by design (one IDA database, one writer) and is also where most of the function count comes from: once the binary is correctly decompressed and flagged AArch64, IDA's own analyzer expands the parallel pass's 28k seed functions to the full 74,790 in one consolidated pass.

Getting an honest number here surfaced (and fixed) three real bugs in the NSO pipeline: wrong-architecture detection (it was silently scanning AArch64 as x86), locally-scoped entry-point seeding that missed any call crossing a shard boundary, and a missing LZ4-decompression step that meant earlier NSO runs were partly scanning compressed garbage.

We also tried trimming the merge phase's analysis flags for speed. It turned out the flags that mattered for timing (local-variable/stack-frame analysis) are exactly what Hex-Rays needs to decompile anything, so disabling them silently zeroed out pseudocode for plenty of functions. We reverted to only skipping FLIRT signature matching, which is genuinely unneeded here and safe to drop: worth ~3%, not the dramatic win it first looked like.
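
For anyone checking the arithmetic behind the NSO rows, the figures from the table reduce to a few lines. Nothing here is measured anew; the inputs are copied from the table.

```python
parallel_s = 54.5    # sharded scan phase, 16 workers
merge_s = 143.1      # single-threaded merge/full-analysis phase
total_s = parallel_s + merge_s

demangled = 67_300
total_functions = 74_790

print(f"end-to-end: {total_s:.1f}s")
print(f"merge share of wall time: {merge_s / total_s:.1%}")
print(f"resolved by demangling: {demangled / total_functions:.1%}")
```

The merge phase is roughly 72% of the wall time, so on this binary adding workers can only shrink the remaining ~28%: even an instant parallel phase would leave the run at about 143 seconds.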

On naming accuracy: it's not Ghidra-grade ground truth, it's an 8B model guessing from pseudocode. Generic helpers/getters tend to land well; deeply game-specific logic is more of a coin flip. Rename anything it gets wrong; that's why rename_function persists straight back into the graph.


Features

  • Parallel sharded analysis: splits into address-space shards, runs N idalib instances, merges into one .i64. Workers configurable via flag, config, or env var.
  • AI function naming: fine-tuned Qwen3-8B runs locally via Ollama, streams names token-by-token. Press N. Watch it think. Name appears.
  • Batch naming: B to name every sub_* function in the list. Walk away. Come back.
  • Binary overview: press O or run spectrida overview file.i64. Model reads 120 sampled function names and tells you what the binary does, what its subsystems are, and anything security-relevant. Correctly identified a 153k-function IL2CPP runtime in 30 seconds.
  • Call chain explorer: C shows callers and callees. The model uses these as context when naming; a function called by Player$$TakeDamage gets named better than one in isolation.
  • Decompiler view: D toggles Hex-Rays pseudocode.
  • Export: dump everything to JSON, CSV, IDA .idc script, or a symbols file. The .idc applies all AI-generated names back into any IDA install in one click.
  • Programmatic API: from spectrida.api import open_i64. Drive everything from scripts, notebooks, or Claude Code without touching the TUI.
  • MCP server: spectrida install mcp wires it straight into Claude Code and/or pi, no manual JSON editing. Claude can then search/read/chain through a Neo4j-backed function graph (name, pseudocode, disassembly, callers/callees) and kick off a fresh analysis on a new binary itself: analyze_binary runs the whole pipeline (parallel analysis → demangle → AI naming → graph) from one tool call, as a background job it polls. Works on PE and NSO. See Chapter 2 below.
  • Demo mode (spectrida --demo): try the whole thing with zero setup. No IDA, no Ollama.
  • A first-run wizard: helps you install Ollama + the model, detects your IDA install automatically, then never asks again.
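
To make the call-chain-as-context idea from the list above concrete, here is a minimal sketch of how a naming prompt could pair a function body with its neighbors. The prompt layout, build_naming_context, and the sample pseudocode are all hypothetical; the real prompt format is not documented here.

```python
def build_naming_context(pseudocode: str, callers: list[str], callees: list[str],
                         max_neighbors: int = 8) -> str:
    """Assemble a text prompt that pairs a function body with its call-graph neighbors."""
    def fmt(names: list[str]) -> str:
        # Named neighbors carry the most signal, so list them before sub_* placeholders.
        ranked = sorted(names, key=lambda n: n.startswith("sub_"))
        return ", ".join(ranked[:max_neighbors]) or "(none)"

    return (
        f"Called by: {fmt(callers)}\n"
        f"Calls: {fmt(callees)}\n"
        f"Pseudocode:\n{pseudocode}\n"
        "Suggest a descriptive function name."
    )


prompt = build_naming_context(
    "int sub_10353FD0(int a1) { return a1 - armor; }",  # made-up body
    callers=["sub_10350000", "Player$$TakeDamage"],
    callees=[],
)
print(prompt)
```

With Player$$TakeDamage listed as a caller, the model has a reason to prefer a damage-related name over a generic one like subtract_value.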

Install

pip install spectrida

Requirements: IDA Pro 9.x with idalib · Python 3.10+ · Ollama

# install Ollama (Windows)
winget install Ollama.Ollama

# pull the model (8.7 GB, go get coffee)
ollama pull hf.co/gdfhhjk/spectrida-re-gguf:latest

# first run: detects your IDA install and sets everything up
spectrida onboard

# or just try the demo right now
spectrida --demo

Commands

# analyze a binary from scratch
spectrida analyze GameAssembly.dll
spectrida analyze GameAssembly.dll --workers 8    # custom worker count

# open an existing .i64 in the browser
spectrida open file.i64

# ask the AI what this binary is
spectrida overview file.i64
spectrida overview file.i64 --addr 0x10001000 --addr 0x10353fd0  # include specific functions

# export function names
spectrida export file.i64 -f idc           # IDA script: apply names to any install
spectrida export file.i64 -f json          # full dump with addresses + sizes
spectrida export file.i64 -f csv           # spreadsheet
spectrida export file.i64 -f symbols       # addr name pairs
spectrida export file.i64 --named-only     # skip sub_* functions

# check Ollama + model status
spectrida serve

# re-run the setup wizard
spectrida onboard
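
For a sense of what an .idc export amounts to, the sketch below renders address/name pairs into an IDC script using IDC's set_name call. The exact script spectrIDA emits may differ; render_idc and the sample names are hypothetical.

```python
def render_idc(names: dict[int, str]) -> str:
    """Render an IDC script that applies address -> name pairs."""
    lines = ["#include <idc.idc>", "", "static main() {"]
    for ea, name in sorted(names.items()):
        # set_name(ea, name, flags) renames an address in IDC;
        # SN_NOWARN suppresses the warning dialog if a name cannot be applied.
        lines.append(f'    set_name({ea:#x}, "{name}", SN_NOWARN);')
    lines.append("}")
    return "\n".join(lines) + "\n"


script = render_idc({
    0x10353FD0: "player_take_damage",     # illustrative names only
    0x10001000: "init_atexit_handler",
})
print(script)
```

Names containing quotes or backslashes would need escaping before being written into the script; the sketch assumes plain identifiers.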

TUI keys

Key | Action
N   | Name selected function; AI streams the result live
R   | Rename, pre-filled with the AI suggestion
D   | Toggle decompiled pseudocode (Hex-Rays)
C   | Call chain: callers and callees
B   | Batch-name all sub_* functions in the current list
O   | Overview: AI summary of the whole binary
/   | Fuzzy search
?   | Help
Q   | Quit

Programmatic API

No TUI needed. Drive spectrIDA from scripts, Claude Code, notebooks, whatever:

import asyncio
from spectrida.api import open_i64

async def main():
    async with open_i64("GameAssembly.i64") as db:

        # list all 153k functions
        funcs = await db.list_functions()

        # name one function: returns name + reasoning + confidence
        result = await db.name_function(0x10001000)
        print(result["new_name"])     # init_atexit_handler
        print(result["reasoning"])    # allocates array of 3 fn ptrs, calls _atexit...

        # batch name everything (with live progress)
        async def on_progress(done, total, r):
            print(f"  {done}/{total}  {r['old_name']} -> {r['new_name']}")

        await db.batch_name(limit=500, rename=True, progress_cb=on_progress)

        # ask what the binary does
        overview = await db.overview()
        print(overview)

        # export to IDA script
        await db.export("names.idc", fmt="idc", named_only=True)

asyncio.run(main())

The model

hf.co/gdfhhjk/spectrida-re-gguf: Qwen3-8B fine-tuned for reverse engineering.

Trained on:

  • x86/x64 assembly → function name pairs with call-chain context
  • Tool call traces from jtsylve/ida-mcp (headless IDA with idalib)
  • Extended context reasoning traces from a codebase context server

Training approach: neuron-targeted SFT + GRPO. Only the RE-relevant neurons are tuned, so base Qwen3 knowledge stays intact and the fine-tune just adds a very specific skill on top.

Runs locally via Ollama. GGUF: works on CPU, GPU, or both.


Who is this for

You're reversing something. You have a binary with 150,000 functions. Maybe 2,000 have names from metadata. The other 148,000 are sub_XXXXXXXX. You want to find the network code. You can't grep for it because nothing has a name yet.

A human RE can name ~50-100 functions per hour if they're fast. At that rate, 150k functions = 1,500 to 3,000 hours, which is a year or more of full-time work doing nothing else.

spectrIDA names them overnight. Not perfectly: maybe 70% accuracy on generic functions, much higher on patterns the model recognizes. But now instead of 148k sub_ functions you have network_send_packet, serialize_player_state, validate_checksum, and you know where to look.

It doesn't replace a skilled reverse engineer. It does the boring 80% so you can focus on the interesting 20%. It's the orientation layer.

Real use cases:

  • Game modding: find the physics system in a 150k-function binary in minutes, not days
  • Security research: malware triage, understand a binary's architecture quickly
  • CTF: time pressure, need to know what you're looking at immediately
  • Anyone who has stared at sub_140001234 for 20 minutes thinking there has to be a better way

Configuration

~/.spectrida/config.toml:

[ida]
idalib = "C:/Program Files/IDA Professional 9.1"
output_dir = "~/.spectrida/output"

[ollama]
base_url = "http://localhost:11434"
model = "spectrida-re"   # any ollama model name works

[pipeline]
workers = 16

Env var overrides: SPECTRIDA_IDALIB · SPECTRIDA_MODEL · SPECTRIDA_WORKERS · SPECTRIDA_OLLAMA_URL
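
One way to picture how a flag, an env var, and the config file could combine for the worker count is the sketch below. The precedence order (flag, then env var, then config file) is an assumption; the text above only confirms that env vars override the config. resolve_workers and DEFAULT_WORKERS are hypothetical names.

```python
import os
from typing import Mapping, Optional

DEFAULT_WORKERS = 4  # hypothetical fallback, not spectrIDA's documented default


def resolve_workers(cli_value: Optional[int], config: dict,
                    env: Mapping[str, str] = os.environ) -> int:
    """Pick the worker count: CLI flag, then env var, then config file, then default."""
    if cli_value is not None:
        return cli_value
    env_value = env.get("SPECTRIDA_WORKERS")
    if env_value:
        return int(env_value)
    return int(config.get("pipeline", {}).get("workers", DEFAULT_WORKERS))


config = {"pipeline": {"workers": 16}}  # mirrors the [pipeline] table above
print(resolve_workers(None, config, env={}))
print(resolve_workers(None, config, env={"SPECTRIDA_WORKERS": "8"}))
print(resolve_workers(12, config, env={"SPECTRIDA_WORKERS": "8"}))
```

Passing env explicitly keeps the function easy to test without touching the real process environment.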


Chapter 2: the ghost learns to talk back

Chapter 1 was a faster, funnier IDA. Chapter 2 is spectrIDA as a teammate: a persistent, queryable knowledge graph of every function it's ever named, and an MCP server so Claude (or any MCP client; pi works too) can search and reason through it directly, instead of you copy-pasting decompiler output into a chat window.

spectrida install mcp

That's it. It registers the server with Claude Code and pi automatically (pulling in mcp + neo4j if a bare pip install spectrida skipped them), writes their config, and tells you which restart you owe it.

What Claude actually gets, once Neo4j is running (configure it in the [graph] section of the spectrida config, or just point it at a local instance):

  • search_functions / get_function / get_callees / get_callers / trace_chain: fast, cached graph reads. get_function returns pseudocode and disassembly (exact instruction boundaries and operands, the layer pseudocode can't give you, which matters the moment you go from "what does this do" to "where exactly would I patch this") plus inline callers/callees, so Claude decides whether to chain deeper by looking at whether a callee is still sub_* right there in the response, with no extra round trip just to find out there's nothing more to see.
  • get_full_pseudocode / rename_function: live, authoritative reads/writes straight to the .i64 when the cached snippet isn't enough or a name is finally figured out.
  • analyze_binary: hand it a binary it's never seen (PE or NSO, parallel-sharded either way) and it runs the whole pipeline (analyze → demangle (Itanium and MSVC) → AI-name the genuinely stripped leftovers → push it all into the graph) as one background job you poll, so a multi-minute run never blocks the conversation.
  • doctor / start_all: check or boot llama-server + Neo4j without leaving the chat.
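
What "follow a call chain N hops deep" means can be shown on a tiny in-memory graph. This sketch is not the MCP tool itself: the local trace_chain function, its signature, and the sample edges are hypothetical stand-ins for a graph query.

```python
from collections import deque


def trace_chain(callees: dict[str, list[str]], root: str, max_hops: int) -> dict[str, int]:
    """Breadth-first walk of call edges; returns each reachable function's hop distance."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        fn = queue.popleft()
        if depth[fn] == max_hops:
            continue  # hop budget exhausted on this path
        for callee in callees.get(fn, []):
            if callee not in depth:  # also guards against recursion cycles
                depth[callee] = depth[fn] + 1
                queue.append(callee)
    return depth


graph = {
    "network_send_packet": ["serialize_player_state", "sub_140001234"],
    "serialize_player_state": ["validate_checksum"],
    "validate_checksum": ["network_send_packet"],  # cycle back to the root
}
reach = trace_chain(graph, "network_send_packet", max_hops=2)
unnamed = [f for f in reach if f.startswith("sub_")]
print(reach)
print("still unnamed:", unnamed)
```

The unnamed list is the same signal described above: a callee that is still sub_* is a candidate for chaining deeper or for a naming pass.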

It's not magic: a function that's still sub_140001234 because nobody's looked at it yet is still sub_140001234. But the graph remembers everything the model has figured out, forever, across sessions, and Claude can walk it like a colleague who already read the codebase instead of staring at one function at a time.
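
The "background job you poll" pattern behind analyze_binary looks roughly like this in asyncio. Everything here is a stand-in: fake_analysis, the status dict, and the stage names are hypothetical and do not reflect the server's actual job API.

```python
import asyncio


async def fake_analysis(status: dict) -> None:
    """Stand-in for a long pipeline run; updates a shared status dict as it goes."""
    for stage in ("analyze", "demangle", "name", "graph"):
        status["stage"] = stage
        await asyncio.sleep(0.01)  # pretend each stage takes a while
    status["state"] = "done"


async def poll_until_done(status: dict, interval: float = 0.005) -> list[str]:
    """Poll the status dict until the job finishes; return the stages observed."""
    seen: list[str] = []
    while status["state"] != "done":
        stage = status.get("stage")
        if stage and (not seen or seen[-1] != stage):
            seen.append(stage)
        await asyncio.sleep(interval)
    return seen


async def main() -> tuple[list[str], dict]:
    status = {"state": "running", "stage": None}
    job = asyncio.create_task(fake_analysis(status))  # starts the job without waiting for it
    seen = await poll_until_done(status)
    await job
    return seen, status


stages, final_status = asyncio.run(main())
print(stages, final_status["state"])
```

Starting the job returns immediately, so the caller stays free to do other work between polls; that is what keeps a multi-minute run from blocking the conversation.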

Still coming:

  • Deep context naming: follow call trees N levels deep, feed the full chain to the model. A function 3 hops from encrypt_block should know it's in the crypto path.
  • Deobfuscation: TigressVM pattern detection and handler tracing
  • Actual patching: the disassembly is in the graph now so an agent can plan a byte-level patch; turning "here's the exact instruction to change" into "and here's the write" is next.

License

MIT. Do whatever you want with it. If it works, cool. If it doesn't, blame the GGUF quantization.

Built with spite, coffee, and an RTX 4070. The model has 199 downloads with zero marketing. Each one adds 0.01% to development speed. (This is not true. But it's close.) 👻

