
stix2tabular

Convert STIX cyber threat intelligence bundles to Pandas DataFrames.

Installation

pip install stix2tabular

Quick Start

from stix2tabular import stix_to_tables, save_tables

tables = stix_to_tables("enterprise-attack.json")

print(tables.keys())
# → dict_keys(['attack-pattern', 'intrusion-set', 'malware', 'tool', 'relationships', ...])

print(tables["malware"].head())
#                 id     type       name  ...
# 0  malware--abc123  malware  CHOPSTICK  ...
# 1  malware--def456  malware    X-Agent  ...

# Save to Parquet for later use
save_tables(tables, "attack_tables/")

Before / After

Before (without stix2tabular):

import json
import pandas as pd

with open("enterprise-attack.json") as f:
    bundle = json.load(f)

objects_by_type = {}
relationships = []

for obj in bundle["objects"]:
    obj_type = obj.get("type")
    if obj_type == "marking-definition":
        continue
    if obj_type == "relationship":
        relationships.append({
            "id": obj["id"],
            "type": obj["type"],
            "relationship_type": obj["relationship_type"],
            "source_ref": obj["source_ref"],
            "target_ref": obj["target_ref"],
            "created": obj.get("created"),
            "modified": obj.get("modified"),
        })
        continue
    if obj_type not in objects_by_type:
        objects_by_type[obj_type] = []
    row = {}
    for key, value in obj.items():
        row[key] = value
    objects_by_type[obj_type].append(row)

tables = {}
for obj_type, rows in objects_by_type.items():
    tables[obj_type] = pd.DataFrame(rows)
tables["relationships"] = pd.DataFrame(relationships)
# Still missing: sightings, SCO handling, STIX 2.0 embedded observables,
# deduplication, multi-bundle merging, error handling...

After (with stix2tabular):

from stix2tabular import stix_to_tables

tables = stix_to_tables("enterprise-attack.json")

What You Get

tables = stix_to_tables("enterprise-attack.json")

# One DataFrame per STIX type
tables["attack-pattern"]     # 680 rows × 15 columns
tables["intrusion-set"]      # 138 rows × 12 columns
tables["malware"]            # 490 rows × 14 columns
tables["tool"]               # 78 rows × 11 columns
tables["campaign"]           # 23 rows × 10 columns

# Relationships as a lean edge table
tables["relationships"]      # 18,400 rows × 9 columns

# Sightings
tables["sightings"]          # 42 rows × 8 columns

# SCO types (when include_scos=True)
tables["ipv4-addr"]          # 12 rows × 4 columns

API Reference

stix_to_tables(source, include_scos=True)

Convert STIX bundles into a dict of Pandas DataFrames.

  • source: str | list[str] | list[dict]
    • File path (.json): reads and parses a single file
    • Directory path: globs all *.json files, merges into one set of tables
    • list[str]: each string is parsed as a full STIX bundle JSON
    • list[dict]: each dict is treated as a parsed STIX bundle
  • include_scos: bool (default True)
    • When True, STIX Cyber-observable Objects (IP addresses, domain names, file hashes, etc.) get their own DataFrames
    • When False, only SDOs, relationships, and sightings are included
  • Returns: dict[str, pd.DataFrame]
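The list[dict] form is useful when bundles arrive over an API rather than from disk. A minimal sketch of the expected input shape, using hypothetical IDs (the bundle layout follows the STIX 2.1 specification; nothing here is this library's internal API):

```python
# A minimal STIX 2.1 bundle as a plain dict -- the shape accepted
# by the list[dict] form of `source`.
bundle = {
    "type": "bundle",
    "id": "bundle--11111111-1111-4111-8111-111111111111",
    "objects": [
        {
            "type": "malware",
            "spec_version": "2.1",
            "id": "malware--22222222-2222-4222-8222-222222222222",
            "created": "2024-01-01T00:00:00.000Z",
            "modified": "2024-01-01T00:00:00.000Z",
            "name": "CHOPSTICK",
            "is_family": True,
        },
        {
            "type": "relationship",
            "spec_version": "2.1",
            "id": "relationship--33333333-3333-4333-8333-333333333333",
            "relationship_type": "uses",
            "source_ref": "intrusion-set--44444444-4444-4444-8444-444444444444",
            "target_ref": "malware--22222222-2222-4222-8222-222222222222",
        },
    ],
}

# The object types present determine which DataFrames come back
types = {obj["type"] for obj in bundle["objects"]}
print(sorted(types))
# → ['malware', 'relationship']
```

Passing `[bundle]` to stix_to_tables would then produce a DataFrame per type here: a malware table plus the relationships edge table.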

save_tables(tables, directory)

Save all DataFrames to a directory as Parquet files.

  • tables: dict returned by stix_to_tables()
  • directory: path to output directory (created if it doesn't exist)
  • Writes one {type}.parquet file per key (e.g., malware.parquet, relationships.parquet)

load_tables(directory)

Load DataFrames from a directory of Parquet files.

  • directory: path to directory containing .parquet files from save_tables()
  • Returns: dict[str, pd.DataFrame] — dict keys derived from filenames

Working with the Data

# All techniques used by APT28
rels = tables["relationships"]
apt28_id = tables["intrusion-set"].query("name == 'APT28'")["id"].iloc[0]
technique_ids = rels.query(
    "source_ref == @apt28_id and relationship_type == 'uses'"
)["target_ref"]
techniques = tables["attack-pattern"][
    tables["attack-pattern"]["id"].isin(technique_ids)
]["name"]

# Most common relationship types
tables["relationships"]["relationship_type"].value_counts()

# Explode aliases to find all names for threat actors
tables["intrusion-set"].explode("aliases")[["name", "aliases"]]

# Merge bundles from a directory of STIX feeds
tables = stix_to_tables("/path/to/stix_feeds/")

# Join source and target names onto relationships for a denormalized view;
# renaming before each merge keeps the two name columns unambiguous
import pandas as pd

rels = tables["relationships"].copy()
names = pd.concat([df[["id", "name"]] for df in tables.values() if "name" in df.columns])
rels = rels.merge(names.rename(columns={"id": "source_ref", "name": "source_name"}), on="source_ref")
rels = rels.merge(names.rename(columns={"id": "target_ref", "name": "target_name"}), on="target_ref")
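The endpoint-name join can be sanity-checked on toy frames (hypothetical IDs and names, not real ATT&CK data); renaming the lookup columns before each merge keeps the source and target names distinct:

```python
import pandas as pd

# Hypothetical edge table and name lookup, mirroring the shapes above
rels = pd.DataFrame({
    "source_ref": ["intrusion-set--a"],
    "target_ref": ["malware--b"],
    "relationship_type": ["uses"],
})
names = pd.DataFrame({
    "id": ["intrusion-set--a", "malware--b"],
    "name": ["APT28", "CHOPSTICK"],
})

# Rename the lookup to match each endpoint column, then merge on it
out = (
    rels
    .merge(names.rename(columns={"id": "source_ref", "name": "source_name"}), on="source_ref")
    .merge(names.rename(columns={"id": "target_ref", "name": "target_name"}), on="target_ref")
)
print(out[["source_name", "relationship_type", "target_name"]].to_dict("records"))
# → [{'source_name': 'APT28', 'relationship_type': 'uses', 'target_name': 'CHOPSTICK'}]
```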

Saving and Loading

The library includes built-in Parquet persistence for lossless round-tripping:

from stix2tabular import stix_to_tables, save_tables, load_tables

tables = stix_to_tables("enterprise-attack.json")

# Save all DataFrames to a directory (one .parquet file per type)
save_tables(tables, "output/attack_tables/")
# Creates: attack-pattern.parquet, intrusion-set.parquet, malware.parquet,
#          relationships.parquet, sightings.parquet, ...

# Load them back — identical DataFrames, including list/dict columns
tables = load_tables("output/attack_tables/")

Parquet preserves Python lists and dicts natively, so no manual serialization step is needed and round-trips are lossless.

CSV note: If you need CSV, you'll need to serialize list/dict columns yourself before exporting:

import json
df = tables["malware"].copy()
for col in df.columns:
    df[col] = df[col].apply(lambda x: json.dumps(x) if isinstance(x, (list, dict)) else x)
df.to_csv("malware.csv", index=False)
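Reading such a CSV back requires the inverse step. A minimal sketch on toy data (not part of the library's API) that re-parses any JSON-looking cells:

```python
import io
import json
import pandas as pd

# Toy CSV with a JSON-serialized list column, as produced by the
# json.dumps step above
csv_text = 'name,aliases\nAPT28,"[""Fancy Bear"", ""Sofacy""]"\n'
df = pd.read_csv(io.StringIO(csv_text))

def maybe_loads(x):
    # Only parse strings that look like JSON containers
    if isinstance(x, str) and x[:1] in ("[", "{"):
        return json.loads(x)
    return x

df["aliases"] = df["aliases"].apply(maybe_loads)
print(df["aliases"].iloc[0])
# → ['Fancy Bear', 'Sofacy']
```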

Comparison with stix2nx

Need                          Use
Graph traversal, centrality   stix2nx
Filtering, aggregation, ML    stix2tabular
Both                          Install both

Same input API. Same STIX version support. Independent libraries — no cross-dependency.

Running Tests

# Install dev dependencies
pip install -e ".[dev]"

# Run all tests (integration test downloads live ATT&CK data, falls back to curated subset if offline)
pytest

# Run in offline mode (uses curated ~1MB ATT&CK subset only, no network needed)
STIX2TABULAR_OFFLINE=true pytest

# Regenerate the curated subset from latest ATT&CK (requires network)
python tests/data/build_subset.py

STIX Version Support

Supports both STIX 2.0 and STIX 2.1 bundles. STIX 2.0 observed-data objects with embedded observables are automatically extracted into their respective type DataFrames when include_scos=True.
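For reference, the two shapes differ as follows (hypothetical IDs; the object layouts come from the STIX 2.0 and 2.1 specifications, and the extraction line at the end is illustrative, not this library's implementation):

```python
# STIX 2.0: observables embedded inside observed-data under "objects"
observed_20 = {
    "type": "observed-data",
    "id": "observed-data--aaaaaaaa-1111-4111-8111-111111111111",
    "objects": {
        "0": {"type": "ipv4-addr", "value": "198.51.100.7"},
    },
}

# STIX 2.1: observables are top-level SCOs, linked via object_refs
observed_21 = {
    "type": "observed-data",
    "id": "observed-data--bbbbbbbb-2222-4222-8222-222222222222",
    "object_refs": ["ipv4-addr--cccccccc-3333-4333-8333-333333333333"],
}

# Illustrative extraction of the embedded 2.0 observables: each value
# under "objects" becomes a row in its type's DataFrame
extracted = list(observed_20.get("objects", {}).values())
print(extracted[0]["type"])
# → ipv4-addr
```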

License

MIT
