A declarative toolkit for transforming machine-readable data into FollowTheMoney entities
The Beast
A flexible, declarative toolkit for transforming machine-readable data into FollowTheMoney (FTM) entities.
The Beast is currently in beta but is already battle-tested in production on hundreds of data sources. The mapping format may still evolve to allow more flexibility, but changes are introduced cautiously.
Installation
pip install thebeast
Quick Start
- Write a YAML mapping that describes how to read your source data and transform it into FTM entities.
- Run the mapping:
beast mapping.yaml
- Or sample a small fraction first:
beast-sample mapping.yaml --fraction 0.01
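The `--fraction 0.01` flag asks for roughly 1% of the input. This README does not describe how `beast-sample` actually selects records, so the sketch below shows one plausible approach: independent (Bernoulli) sampling over a stream of records, which never needs to know the total record count in advance. `sample_fraction` is a hypothetical helper, not part of thebeast's API.

```python
import random
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def sample_fraction(
    records: Iterable[T], fraction: float, seed: Optional[int] = None
) -> Iterator[T]:
    """Hypothetical helper: keep each record independently with probability `fraction`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    # A dedicated RNG so a fixed seed gives a reproducible sample
    rng = random.Random(seed)
    for record in records:
        # rng.random() is in [0, 1), so fraction=1.0 keeps everything and 0.0 keeps nothing
        if rng.random() < fraction:
            yield record
```

With a fixed `seed`, repeated runs pick the same records, which makes it easier to compare mapping changes on a stable sample.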
Features
- Declarative YAML mappings - define data transformations without writing code
- Multiple input formats - CSV, TSV, JSON, JSONL, with support for compressed and remote files (via smart_open)
- Rich property pipelines - column extraction, literals, Jinja2 templates, regex operations, transformers, augmentors
- Nested collections - handle hierarchical data with JMESPath traversal
- Statement metadata - attach provenance at dataset, collection, or property level
- Multiprocessing - parallel digest for CPU-bound workloads
- Built-in transformers - date parsing, phone/email normalization, transliteration, and more
- FTM schema validation - entities are validated against FollowTheMoney schemas
- Custom FTM ontologies - extend or replace the standard FTM model with your own schemas
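To make the statement-metadata feature concrete: one natural model is that metadata declared at a narrower scope overrides the same key declared at a broader one. The sketch below assumes that precedence (property over collection over dataset). The helper `resolve_meta` and the example keys (`origin`, `lang`) are illustrative only; docs/README.md describes the rules thebeast actually applies.

```python
from typing import Dict, Mapping, Optional


def resolve_meta(
    dataset_meta: Mapping[str, str],
    collection_meta: Optional[Mapping[str, str]] = None,
    property_meta: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: merge metadata layers, narrower scopes winning (assumed precedence)."""
    resolved: Dict[str, str] = {}
    # Apply broadest to narrowest so later layers overwrite earlier keys
    for layer in (dataset_meta, collection_meta or {}, property_meta or {}):
        resolved.update(layer)
    return resolved
```

For example, `resolve_meta({"dataset": "MY_DATASET", "origin": "people.csv"}, {"origin": "persons"}, {"lang": "eng"})` keeps the dataset name, takes `origin` from the collection layer, and adds `lang` from the property layer.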
Mapping Example
id: my_dataset

ingest:
  cls: thebeast.ingest.CSVDictReader
  params:
    input_uri: ./people.csv

digest:
  cls: thebeast.digest.SingleProcessDigestor
  meta:
    dataset: { literal: MY_DATASET }
  collections:
    persons:
      path: "[@]"
      entities:
        person:
          schema: Person
          keys:
            - record.id
          properties:
            name:
              template: "{{ record.first }} {{ record.last }}"
            birthDate:
              column: birth
            email:
              column: emails
              regex_split: "[;,]"

dump:
  cls: thebeast.dump.StatementsCSVWriter
  params:
    output_uri: ./output.csv
    error_uri: ./errors.csv
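For a single CSV row, the mapping above roughly amounts to the plain-Python transformation below. This is a hand-written illustration, not thebeast's implementation: `map_person` is hypothetical, `str.format` stands in for the Jinja2 template, and both the SHA-1 entity ID derived from `record.id` and the whitespace stripping after `regex_split` are assumptions.

```python
import hashlib
import re
from typing import Dict


def map_person(record: Dict[str, str]) -> Dict[str, object]:
    """Hypothetical emulation of the `person` entity mapping for one CSV row."""
    # keys: [record.id] -> deterministic ID from the key value (SHA-1 is an assumption)
    entity_id = hashlib.sha1(record["id"].encode("utf-8")).hexdigest()
    # template: "{{ record.first }} {{ record.last }}" (str.format stands in for Jinja2)
    name = "{first} {last}".format(**record)
    # column: birth -> value copied as-is
    birth_date = record["birth"]
    # column: emails + regex_split: "[;,]" -> one value per piece;
    # stripping whitespace and dropping empty pieces is an assumption
    emails = [part.strip() for part in re.split(r"[;,]", record["emails"]) if part.strip()]
    return {
        "id": entity_id,
        "schema": "Person",
        "properties": {"name": [name], "birthDate": [birth_date], "email": emails},
    }
```

For a row with `emails` set to `jane@example.org; j.doe@example.org`, `map_person` yields the single `name` value `Jane Doe` and two separate `email` values, mirroring how one source column can fan out into several property values.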
Documentation
Full documentation is available in docs/README.md, covering:
- Mapping format and all property operations
- Ingestors, digestors, and dumpers
- Statement metadata and provenance
- Nested collections and entity references
- Record and property transformers
- Sampling and testing workflows
Running Tests
pip install thebeast[dev]
python -m pytest thebeast/tests/ -v
License
MIT
Download files
Source Distribution
Built Distribution
File details
Details for the file thebeast-0.5.0.tar.gz.
File metadata
- Download URL: thebeast-0.5.0.tar.gz
- Upload date:
- Size: 27.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b474042cdbd7e0a8a0f94fdbeb915b6bb4ed7ae326290d55aead259f2b3eaf64 |
| MD5 | a7758d026c6de238e4a670f7495204be |
| BLAKE2b-256 | 8b149fc50f15108201cf4cdc01a98ad8e907734cdbeea576c4976cd7188dbf26 |
File details
Details for the file thebeast-0.5.0-py3-none-any.whl.
File metadata
- Download URL: thebeast-0.5.0-py3-none-any.whl
- Upload date:
- Size: 33.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2888e5138568225cb6b890fbed7542fc0e6e0be5d46bb2259bf917c0aa4c07f1 |
| MD5 | b616dcb8c25e81a49cae56340216e62f |
| BLAKE2b-256 | ab611701be65cf44b763b8ca0fc95c8e613e4d938573675e65f6dea39f23614f |
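The published SHA256 digests can be used to check a downloaded file before installing it. The sketch below streams a file through the standard-library `hashlib.sha256` and compares the result with an expected hex digest; `file_sha256` and `verify_sha256` are hypothetical helper names.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Chunked reads keep memory use flat regardless of file size
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected: str) -> bool:
    """Compare a file's digest with a published one (case- and whitespace-insensitive)."""
    return file_sha256(path) == expected.strip().lower()
```

For example, `verify_sha256("thebeast-0.5.0.tar.gz", "b474042cdbd7e0a8a0f94fdbeb915b6bb4ed7ae326290d55aead259f2b3eaf64")` should return `True` for an intact download of the source distribution. pip can also report a file's SHA256 digest with `pip hash <file>`.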