Labeled property graph transformer for followthemoney data.

followthemoney-graph

The followthemoney-graph (ftmg) tool transforms FollowTheMoney entity data and loads it into a Neo4j property graph database. Transformations include filtering by schema and topic, reifying entity properties into graph nodes, and pruning the resulting graph.

Features

  • Load FollowTheMoney entities into Neo4j with configurable schema mappings
  • Reify entity properties (names, addresses, identifiers, etc.) as graph nodes to reveal shared values
  • Automatically create unique constraints and indexes for optimal query performance
  • Prune single-reference reified nodes to optimize graph structure
  • Support custom label mappings and schema filtering
  • Handle out-of-sequence data (nodes defined after edges that reference them)

Installation

Requirements

  • Python 3.10 or higher
  • Neo4j 5.0 or higher (running and accessible)

Install from source

git clone https://github.com/opensanctions/followthemoney-graph.git
cd followthemoney-graph
pip install -e .

Install for development

pip install -e ".[dev]"

This includes additional tools like mypy, pytest, and coverage for development.

Configuration

Create a config.yml file to configure database connection and transformation settings:

# Database connection settings
db:
  url: bolt://localhost:7687
  username: neo4j
  password: your_password

# Node configuration
nodes:
  # Schema-specific settings
  schemata:
    Position:
      ignore: true  # Skip this entity type
    Address:
      ignore: true  # Don't create Address entity nodes
    Person:
      label: "Human"  # Use custom label instead of "Person"

  # Property type reification
  types:
    address:
      reify: true  # Create separate nodes for address values
    identifier:
      reify: true  # Create separate nodes for identifiers
    phone:
      reify: true
    email:
      reify: true
    url:
      reify: true

  # Topic-based labeling
  topics:
    "sanction":
      label: "Sanctioned"
    "role.pep":
      label: "Politician"
    "poi":
      label: "PersonOfInterest"
    "gov.national":
      ignore: true  # Skip entities with this topic

# Edge configuration
edges:
  schemata:
    Occupancy:
      ignore: true  # Skip this relationship type

Configuration Options

Database (db)

  • url: Neo4j connection URL (bolt:// or neo4j://)
  • username: Database username
  • password: Database password

Nodes (nodes)

Schemata Configuration (nodes.schemata)

Configure how FollowTheMoney entity schemas are mapped:

  • ignore: true: Skip entities of this schema type
  • label: "CustomLabel": Use a custom Neo4j label instead of the schema name

Type Reification (nodes.types)

Specify which property types should be reified as separate nodes:

  • reify: true: Create a separate node for this property type
  • When reified, properties like addresses or emails become nodes that can be shared between entities
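
To illustrate why reification reveals shared values, here is a minimal sketch in plain Python. It is not ftmg's implementation: the `reify` function, the `REIFIED_PROPS` table and the sample entities are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical mapping from property name to property type. ftmg takes types
# from the FollowTheMoney model; this table exists only for the example.
REIFIED_PROPS = {"email": "email", "phone": "phone"}


def reify(entities):
    """Return {(type, value): set of entity ids} for reified properties."""
    value_nodes = defaultdict(set)
    for entity in entities:
        for prop, values in entity["properties"].items():
            prop_type = REIFIED_PROPS.get(prop)
            if prop_type is None:
                continue  # not reified: stays a plain node property
            for value in values:
                value_nodes[(prop_type, value)].add(entity["id"])
    return dict(value_nodes)


entities = [
    {"id": "p1", "schema": "Person",
     "properties": {"name": ["Ann"], "email": ["a@example.org"]}},
    {"id": "p2", "schema": "Person",
     "properties": {"name": ["Bob"], "email": ["a@example.org"]}},
]
shared = reify(entities)
print(shared)
```

Both sample entities end up attached to the same email value node, which is the kind of link that stays hidden when the value is stored only as a node property.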

Topic Labels (nodes.topics)

Map FollowTheMoney topics to Neo4j labels:

  • label: "CustomLabel": Apply this label to entities with the topic
  • ignore: true: Skip entities with this topic
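
The schema and topic rules above can be sketched as a small label resolver. This is an illustrative reading of the configuration, not ftmg's code: `resolve_labels` is a hypothetical helper, and the `nodes_config` dict mirrors part of the `nodes` section of the sample config.yml.

```python
# Mirrors part of the `nodes` section of the sample config.yml.
nodes_config = {
    "schemata": {
        "Position": {"ignore": True},
        "Person": {"label": "Human"},
    },
    "topics": {
        "sanction": {"label": "Sanctioned"},
        "role.pep": {"label": "Politician"},
        "gov.national": {"ignore": True},
    },
}


def resolve_labels(schema, topics, config):
    """Return the labels for an entity, or None if it should be skipped."""
    schema_cfg = config.get("schemata", {}).get(schema, {})
    if schema_cfg.get("ignore"):
        return None
    # Base label plus the schema name or its configured replacement.
    labels = ["Entity", schema_cfg.get("label", schema)]
    for topic in topics:
        topic_cfg = config.get("topics", {}).get(topic, {})
        if topic_cfg.get("ignore"):
            return None  # an ignored topic skips the whole entity
        if "label" in topic_cfg:
            labels.append(topic_cfg["label"])
    return labels


print(resolve_labels("Person", ["sanction", "role.pep"], nodes_config))
print(resolve_labels("Company", ["gov.national"], nodes_config))
```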

Edges (edges)

Schemata Configuration (edges.schemata)

Configure which relationship types to include:

  • ignore: true: Skip relationships of this type

Usage

The ftmg command-line tool provides several commands for managing your graph database:

Check Configuration

Validate and display the expanded configuration:

ftmg check-config config.yml

This parses your configuration file and outputs the complete configuration including defaults.

Load Data

Load FollowTheMoney entities from a JSON Lines file into Neo4j:

ftmg load config.yml --source entities.ftm.json

This command:

  1. Creates unique constraints and indexes for all node types
  2. Reads entities from the source file (JSON Lines format)
  3. Transforms and loads them into Neo4j according to your configuration
  4. Handles out-of-sequence data automatically
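
This README does not describe how ftmg handles out-of-sequence data. One common approach is merge (upsert) semantics, where an edge creates placeholder nodes for any missing endpoints and a later node definition fills them in. The sketch below shows that idea with an in-memory structure; the `Graph` class and the `OWNS` relationship name are hypothetical.

```python
class Graph:
    """Tiny in-memory stand-in for a graph database with merge semantics."""

    def __init__(self):
        self.nodes = {}   # node id -> properties
        self.edges = []   # (source id, relationship type, target id)

    def merge_node(self, node_id, **props):
        """Create the node if it is missing, then update its properties."""
        self.nodes.setdefault(node_id, {}).update(props)

    def merge_edge(self, source_id, rel_type, target_id):
        """Ensure both endpoints exist as placeholders before linking them."""
        self.merge_node(source_id)
        self.merge_node(target_id)
        self.edges.append((source_id, rel_type, target_id))


graph = Graph()
graph.merge_edge("p1", "OWNS", "c1")      # the edge arrives first
graph.merge_node("c1", name="Acme Ltd")   # node definitions arrive later
graph.merge_node("p1", name="Ann")
print(graph.nodes, graph.edges)
```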

Source file format: Each line should contain a single FollowTheMoney entity as JSON.
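
As a rough illustration of that format, the sketch below reads a two-line sample with the standard library. The `read_entities` helper and the sample records are made up for the example; the `id`, `schema` and `properties` keys follow the usual FollowTheMoney JSON serialization, where each property holds a list of values.

```python
import io
import json


def read_entities(handle):
    """Yield one entity dict per non-empty line of a JSON Lines stream."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


# In practice this would be open("entities.ftm.json", encoding="utf-8").
sample = io.StringIO(
    '{"id": "p1", "schema": "Person", "properties": {"name": ["Ann Example"]}}\n'
    '{"id": "c1", "schema": "Company", "properties": {"name": ["Acme Ltd"]}}\n'
)
loaded = list(read_entities(sample))
print([entity["schema"] for entity in loaded])
```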

Prune Graph

Remove reified value nodes that are only referenced by a single entity:

ftmg prune config.yml

This optimization command:

  • Identifies reified nodes (addresses, emails, identifiers, etc.)
  • Counts unique entities referencing each reified node
  • Deletes nodes referenced by fewer than 2 entities
  • Reports the number of nodes pruned per type

Why prune? Reified nodes are most valuable when they reveal shared values between multiple entities. Single-reference reified nodes don't add structural value to the graph.
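
The pruning rule can be sketched in plain Python as follows. This is a simplified model of the behaviour described above, not the query ftmg runs against Neo4j; the `prune` function, its `min_refs` parameter and the sample data are assumptions for illustration.

```python
from collections import Counter


def prune(value_nodes, min_refs=2):
    """Return (kept nodes, pruned count per type) for {(type, value): ids}."""
    kept = {}
    pruned_counts = Counter()
    for (value_type, value), entity_ids in value_nodes.items():
        if len(set(entity_ids)) >= min_refs:
            kept[(value_type, value)] = set(entity_ids)
        else:
            pruned_counts[value_type] += 1  # tallied per type for reporting
    return kept, dict(pruned_counts)


value_nodes = {
    ("email", "a@example.org"): {"p1", "p2"},   # shared by two entities
    ("email", "b@example.org"): {"p3"},         # single reference
    ("phone", "+15550100"): {"p1"},             # single reference
}
kept, pruned_counts = prune(value_nodes)
print(kept, pruned_counts)
```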

Delete All Data

Completely wipe the database (use with caution):

ftmg trash config.yml

This command requires confirmation and will delete all nodes and relationships.

Examples

Basic Workflow

# 1. Validate your configuration
ftmg check-config config.yml

# 2. Load your data
ftmg load config.yml --source my-entities.ftm.json

# 3. Optimize the graph by removing single-reference reified nodes
ftmg prune config.yml

Starting Fresh

# Clear the database
ftmg trash config.yml

# Load new data
ftmg load config.yml --source entities.ftm.json

Graph Structure

Entity Nodes

Entities are loaded as nodes with:

  • Base label: Entity
  • Schema label: e.g., Person, Company, Asset
  • Topic labels: e.g., Sanctioned, Politician (if configured)
  • Properties: All entity properties as node properties
  • Special property: id (unique constraint enforced)

Reified Value Nodes

When property types are marked for reification:

  • Each unique value becomes a separate node
  • Relationships connect entities to value nodes
  • Value nodes can be shared between entities
  • Labels: e.g., address, identifier, email
  • Special property: id (unique constraint enforced)
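
Given this structure, entities that share a reified value can be found with a two-hop pattern through the value node. The helper below only builds a Cypher query string and does not talk to Neo4j. `shared_value_query` is a hypothetical function, and the pattern is undirected because this README does not state the relationship names or directions that ftmg uses.

```python
import re


def shared_value_query(value_label, limit=25):
    """Build a Cypher query listing entity pairs that share a value node."""
    # Labels cannot be passed as query parameters, so validate before inlining.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", value_label):
        raise ValueError(f"unsafe label: {value_label!r}")
    return (
        f"MATCH (a:Entity)--(v:`{value_label}`)--(b:Entity) "
        "WHERE a.id < b.id "  # report each pair once
        "RETURN v.id AS value_id, a.id AS left_id, b.id AS right_id "
        f"LIMIT {int(limit)}"
    )


query = shared_value_query("address")
print(query)
```

The resulting string can be pasted into the Neo4j Browser or passed to a driver session.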

Relationships

Entity relationships are preserved as graph edges with:

  • Relationship type based on the FollowTheMoney schema
  • Properties from the relationship entity
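
As a rough sketch of that mapping, the code below expands an Ownership entity into an edge tuple. In the FollowTheMoney model an Ownership links its `owner` to its `asset`. Everything else here is an assumption: the `EDGE_ENDPOINTS` table and `to_edges` helper are hypothetical, and using the schema name as the relationship type is a guess at how ftmg names relationships.

```python
# Hypothetical endpoint table. The FollowTheMoney model defines source and
# target properties for each edge schema; only Ownership is listed here.
EDGE_ENDPOINTS = {"Ownership": ("owner", "asset")}


def to_edges(entity):
    """Expand an edge-like entity into (source, type, target, props) tuples."""
    source_prop, target_prop = EDGE_ENDPOINTS[entity["schema"]]
    props = entity["properties"]
    # Everything except the two endpoint properties travels with the edge.
    extra = {
        key: values
        for key, values in props.items()
        if key not in (source_prop, target_prop)
    }
    return [
        (source, entity["schema"], target, extra)
        for source in props.get(source_prop, [])
        for target in props.get(target_prop, [])
    ]


ownership = {
    "id": "o1",
    "schema": "Ownership",
    "properties": {"owner": ["p1"], "asset": ["c1"], "percentage": ["51"]},
}
edges = to_edges(ownership)
print(edges)
```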

Development

Running Tests

pytest

Type Checking

mypy ftmg

Code Coverage

pytest --cov=ftmg --cov-report=html

Releasing

Releases to PyPI are published automatically by the build GitHub Actions workflow when a version tag is pushed. The workflow uses PyPI Trusted Publishing (OIDC), so no API token is stored in the repository.

To cut a release:

bump2version patch   # or: minor / major — creates a commit and a vX.Y.Z tag
git push --follow-tags

The tag push runs the test/lint/type-check job, builds the wheel + sdist, attaches a build-provenance attestation, and publishes to PyPI.

License

MIT License - see LICENSE file for details.
