The Beast

A flexible, declarative toolkit for transforming machine-readable data into FollowTheMoney (FTM) entities.

The Beast is currently in beta and is battle-tested in production on hundreds of data sources. While the mapping format may evolve for better flexibility, changes are introduced cautiously.

Installation

pip install thebeast

Quick Start

  1. Write a YAML mapping that describes how to read your source data and transform it into FTM entities.
  2. Run the mapping:
beast mapping.yaml
  3. Or sample a small fraction first:
beast-sample mapping.yaml --fraction 0.01
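
To give a feel for what a fraction of 0.01 means in practice, here is a minimal sketch of fraction-based sampling over a stream of records. It is an illustration only: how beast-sample actually selects records is not described here, and the function name sample_records and its seed parameter are hypothetical.

```python
import random


def sample_records(records, fraction, seed=None):
    """Yield each record independently with probability `fraction`.

    Hypothetical helper for illustration; not part of thebeast.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    for record in records:
        # rng.random() is uniform on [0, 1), so this keeps roughly
        # `fraction` of the input without loading it all into memory.
        if rng.random() < fraction:
            yield record


# Roughly 1% of 100,000 stand-in records survive the sampling.
picked = list(sample_records(range(100_000), fraction=0.01, seed=42))
print(len(picked))
```

Sampling record by record keeps memory use flat, which matters when the source is too large to load at once. The exact number of records kept varies from run to run unless a seed is fixed.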

Features

  • Declarative YAML mappings - define data transformations without writing code
  • Multiple input formats - CSV, TSV, JSON, JSONL, with support for compressed and remote files (via smart_open)
  • Rich property pipelines - column extraction, literals, Jinja2 templates, regex operations, transformers, augmentors
  • Nested collections - handle hierarchical data with JMESPath traversal
  • Statement metadata - attach provenance at dataset, collection, or property level
  • Multiprocessing - parallel digest for CPU-bound workloads
  • Built-in transformers - date parsing, phone/email normalization, transliteration, and more
  • FTM schema validation - entities are validated against FollowTheMoney schemas
  • Custom FTM ontologies - extend or replace the standard FTM model with your own schemas

Mapping Example

id: my_dataset

ingest:
  cls: thebeast.ingest.CSVDictReader
  params:
    input_uri: ./people.csv

digest:
  cls: thebeast.digest.SingleProcessDigestor
  meta:
    dataset: { literal: MY_DATASET }
  collections:
    persons:
      path: "[@]"
      entities:
        person:
          schema: Person
          keys:
            - record.id
          properties:
            name:
              template: "{{ record.first }} {{ record.last }}"
            birthDate:
              column: birth
            email:
              column: emails
              regex_split: "[;,]"

dump:
  cls: thebeast.dump.StatementsCSVWriter
  params:
    output_uri: ./output.csv
    error_uri: ./errors.csv
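
To make the mapping above more concrete, the sketch below builds a small input with the columns the mapping refers to (id, first, last, birth, emails) and approximates by hand the values each property would receive. It uses only the Python standard library and does not call thebeast. The sample rows are invented, and treating regex_split like re.split with empty parts dropped is an assumption made for illustration.

```python
import csv
import io
import re

# Hypothetical contents of ./people.csv; only the column names come from the mapping.
PEOPLE_CSV = """id,first,last,birth,emails
1,Ada,Lovelace,1815-12-10,ada@example.org;countess@example.org
2,Alan,Turing,1912-06-23,"alan@example.org,amt@example.org"
"""


def preview(record):
    """Approximate the property values the mapping describes for one CSV row."""
    return {
        # keys: record.id
        "key": record["id"],
        # template: "{{ record.first }} {{ record.last }}"
        "name": f'{record["first"]} {record["last"]}',
        # column: birth
        "birthDate": record["birth"],
        # column: emails, then regex_split: "[;,]" (assumed to behave like re.split)
        "email": [
            part.strip()
            for part in re.split(r"[;,]", record["emails"])
            if part.strip()
        ],
    }


rows = [preview(r) for r in csv.DictReader(io.StringIO(PEOPLE_CSV))]
for row in rows:
    print(row)
```

Each row yields one Person with a single name and birthDate and one email value per address, which is why the emails column is split rather than used as is.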

Documentation

Full documentation is available in docs/README.md, covering:

  • Mapping format and all property operations
  • Ingestors, digestors, and dumpers
  • Statement metadata and provenance
  • Nested collections and entity references
  • Record and property transformers
  • Sampling and testing workflows

Running Tests

pip install "thebeast[dev]"
python -m pytest thebeast/tests/ -v

License

MIT
