
followthemoney query dsl and io helpers


License: MIT

ftmq

An attempt towards a followthemoney query dsl.

This library provides methods to query and filter entities formatted as followthemoney data, either from a JSON file or stream, or from a SQL backend via followthemoney-store.

It also provides a Query class that other libraries can use to build SQL queries or API queries.

Minimum Python version: 3.10

Installation

pip install ftmq

Usage

ftmq accepts either a line-based input stream or a file URI passed as an argument. (For integration with followthemoney-store, see below.)

Input stream:

cat entities.ftm.json | ftmq <filter expression> > output.ftm.json
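To illustrate what "line-based" means here, the following is a minimal sketch that reads one JSON entity per line and keeps only the entities of one schema. It is not ftmq's implementation; the helper names `iter_entities` and `filter_schema` and the two sample entities are hypothetical.

```python
import io
import json
from typing import Iterable, Iterator, TextIO


def iter_entities(stream: TextIO) -> Iterator[dict]:
    """Yield one entity dict per non-empty line of a line-delimited JSON stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def filter_schema(entities: Iterable[dict], schema: str) -> Iterator[dict]:
    """Keep only entities whose "schema" field equals the given name."""
    return (e for e in entities if e.get("schema") == schema)


# Two hypothetical entities, one per line, standing in for entities.ftm.json
sample = io.StringIO(
    '{"id": "a", "schema": "Person", "properties": {"name": ["Jane Doe"]}}\n'
    '{"id": "b", "schema": "Company", "properties": {"name": ["ACME"]}}\n'
)
persons = list(filter_schema(iter_entities(sample), "Person"))

# Write the result back out in the same one-entity-per-line format
output = "".join(json.dumps(e) + "\n" for e in persons)
```

Because every line is a complete JSON document, the stream can be processed entity by entity without loading the whole file into memory.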

URI argument:

Under the hood, ftmq uses smart_open, so the -i argument accepts arbitrary file URIs:

ftmq <filter expression> -i ~/Data/entities.ftm.json
ftmq <filter expression> -i https://example.org/data.json.gz
ftmq <filter expression> -i s3://data-bucket/entities.ftm.json
ftmq <filter expression> -i webhdfs://host:port/path/file

...and so on

The same works for the output via -o:

cat data.json | ftmq <filter expression> -o s3://data-bucket/output.json

Filter for a dataset:

cat entities.ftm.json | ftmq -d ec_meetings

Filter for a schema:

cat entities.ftm.json | ftmq -s Person

Filter for a schema and all its descendants or ancestors:

cat entities.ftm.json | ftmq -s LegalEntity --schema-include-descendants
cat entities.ftm.json | ftmq -s LegalEntity --schema-include-ancestors
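As a rough sketch of what including descendants or ancestors means, the example below expands a schema name into the set of schema names that would match. The `PARENTS` mapping is a simplified, hypothetical excerpt of a schema hierarchy, and the function names are illustrative rather than part of ftmq.

```python
# Simplified, hypothetical excerpt of a schema hierarchy: child -> direct parents
PARENTS = {
    "Thing": [],
    "LegalEntity": ["Thing"],
    "Person": ["LegalEntity"],
    "Organization": ["LegalEntity"],
    "Company": ["Organization"],
}


def ancestors(schema: str) -> set[str]:
    """Return all direct and indirect parents of a schema."""
    seen: set[str] = set()
    stack = list(PARENTS[schema])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def descendants(schema: str) -> set[str]:
    """Return all schemata that have the given schema among their ancestors."""
    return {name for name in PARENTS if schema in ancestors(name)}


def expand_schema(
    schema: str,
    include_descendants: bool = False,
    include_ancestors: bool = False,
) -> set[str]:
    """Build the set of schema names an entity may have to pass the filter."""
    allowed = {schema}
    if include_descendants:
        allowed |= descendants(schema)
    if include_ancestors:
        allowed |= ancestors(schema)
    return allowed


with_children = expand_schema("LegalEntity", include_descendants=True)
with_parents = expand_schema("LegalEntity", include_ancestors=True)
```

An entity then passes the filter if its schema name is in the expanded set.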

Filter for properties:

Properties are passed as options of the form --<prop>=<value>:

cat entities.ftm.json | ftmq -s Company --country=de

Comparison lookups for properties:

cat entities.ftm.json | ftmq -s Company --incorporationDate__gte=2020 --address__ilike=berlin

Possible lookups:

  • gt - greater than
  • lt - less than
  • gte - greater than or equal to
  • lte - less than or equal to
  • like - SQL-style LIKE (use % placeholders)
  • ilike - SQL-style ILIKE, case-insensitive (use % placeholders)
  • [] - usage: prop[]=foo checks whether foo is a member of the array prop
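The lookups above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not ftmq's code: values are compared as strings, a property matches if any of its values matches, only the % placeholder is translated, and the names `LOOKUPS`, `like_to_regex` and `matches` are hypothetical.

```python
import re


def like_to_regex(pattern: str, ignore_case: bool) -> re.Pattern:
    """Translate a LIKE pattern into a regex; only % is treated as a wildcard."""
    parts = [".*" if ch == "%" else re.escape(ch) for ch in pattern]
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.compile("".join(parts), flags)


# Lookup suffix -> comparison of one property value against the filter value
LOOKUPS = {
    "": lambda v, x: v == x,
    "gt": lambda v, x: v > x,
    "lt": lambda v, x: v < x,
    "gte": lambda v, x: v >= x,
    "lte": lambda v, x: v <= x,
    "like": lambda v, x: like_to_regex(x, False).fullmatch(v) is not None,
    "ilike": lambda v, x: like_to_regex(x, True).fullmatch(v) is not None,
}


def matches(properties: dict[str, list[str]], key: str, value: str) -> bool:
    """Evaluate one filter such as address__ilike=%berlin% or country[]=de."""
    if key.endswith("[]"):
        # Array membership: is the value one of the property's values?
        return value in properties.get(key[:-2], [])
    prop, _, lookup = key.partition("__")
    compare = LOOKUPS[lookup]
    return any(compare(v, value) for v in properties.get(prop, []))


# Hypothetical properties of a Company entity
props = {
    "incorporationDate": ["2021-03-01"],
    "address": ["Unter den Linden 1, Berlin"],
    "country": ["de"],
}
recent = matches(props, "incorporationDate__gte", "2020")
in_berlin = matches(props, "address__ilike", "%berlin%")
```

Comparing ISO-formatted dates as strings works here because their lexicographic order matches their chronological order.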

ftmq apply

"Uplevel" an entity input stream to nomenklatura.entity.CompositeEntity and optionally apply a dataset.

ftmq apply -i ./entities.ftm.json -d <additional_dataset>

Overwrite datasets:

ftmq apply -i ./entities.ftm.json -d <additional_dataset> --replace-dataset

Coverage / Statistics

When scripting with ftm, we often iterate through all the proxies anyway (e.g. during aggregation), so statistics can be collected along the way. There is a context manager for this, which populates the Coverage model.

Print coverage to stdout (and filtered entities to nowhere):

cat entities.ftm.json | ftmq -s Event -o /dev/null --coverage-uri -

Within code:

from ftmq.coverage import Coverage

fragments = [...]
buffer = {}

coverage = Coverage({"frequency": "unknown"})
with coverage as cx:
    for proxy in fragments:
        if proxy.id in buffer:
            buffer[proxy.id].merge(proxy)
        else:
            buffer[proxy.id] = proxy
            # here collect stats:
            cx.collect(proxy)

stats = coverage.dict()
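To make the pattern concrete without depending on ftmq, here is a self-contained sketch of a collector used as a context manager during a merge loop. The `StatsCollector` class, its fields and the sample fragments are hypothetical and only mimic the idea; they do not reflect the actual Coverage model.

```python
from collections import Counter


class StatsCollector:
    """Hypothetical stand-in for a coverage collector; not ftmq's Coverage class."""

    def __init__(self) -> None:
        self.entities = 0
        self.schemata = Counter()
        self.countries = Counter()

    def __enter__(self) -> "StatsCollector":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        return False  # do not swallow exceptions raised inside the block

    def collect(self, entity: dict) -> None:
        """Count the entity, its schema and its country values."""
        self.entities += 1
        self.schemata[entity["schema"]] += 1
        for country in entity.get("properties", {}).get("country", []):
            self.countries[country] += 1

    def to_dict(self) -> dict:
        return {
            "entities": self.entities,
            "schemata": dict(self.schemata),
            "countries": dict(self.countries),
        }


# Hypothetical fragments; "a" appears twice and is merged into one entity
fragments = [
    {"id": "a", "schema": "Event", "properties": {"country": ["de"]}},
    {"id": "b", "schema": "Event", "properties": {"country": ["fr"]}},
    {"id": "a", "schema": "Event", "properties": {"date": ["2019"]}},
    {"id": "c", "schema": "Person", "properties": {"country": ["de"]}},
]
buffer = {}

with StatsCollector() as cx:
    for fragment in fragments:
        if fragment["id"] in buffer:
            # Merge property values into the entity seen earlier
            target = buffer[fragment["id"]]["properties"]
            for prop, values in fragment["properties"].items():
                target.setdefault(prop, []).extend(values)
        else:
            buffer[fragment["id"]] = fragment
            cx.collect(fragment)  # count each entity once, on first sight

stats = cx.to_dict()
```

As in the snippet above, each entity is counted only when it is first seen, so properties that arrive in later fragments are merged but do not change the statistics.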

ftmstore (database read)

NOT IMPLEMENTED YET

The same cli logic applies:

ftmq store iterate -d ec_meetings -s Event --date__gte=2019 --date__lte=2020

Python Library

NOT IMPLEMENTED YET

from ftmq import Query

q = Query(engine="sqlite") \
    .where(dataset="ec_meetings", date__lte=2020) \
    .where(schema="Event") \
    .order_by("date", ascending=False)

# resulting sqlite query:
str(q)
"""
SELECT t.id,
    t.schema,
    t.entity,
    json_extract(t.entity, '$.properties.date') AS date
FROM ec_meetings t
WHERE
    (EXISTS (SELECT 1 FROM json_each(date) WHERE value <= ?)) AND (t.schema = ?)
ORDER BY date DESC
"""

# bound parameters of the query:
list(q.parameters)
# [2020, 'Event']
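Since the Query class is not implemented yet, the following sketch only shows how a parameterized query of this shape behaves when run through Python's sqlite3 module against an in-memory table. It assumes an SQLite build with the JSON functions available; the table layout and sample rows are hypothetical. Two details differ from the query text above: json_each reads the path directly from t.entity, and the date bound is passed as the string "2020", because SQLite orders integers before text and an integer parameter would not compare as intended against the text dates.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ec_meetings (id TEXT, schema TEXT, entity TEXT)")

# Hypothetical sample rows: the entity column holds the entity as JSON text
rows = [
    ("m1", "Event", {"properties": {"date": ["2019-05-01"]}}),
    ("m2", "Event", {"properties": {"date": ["2021-01-15"]}}),
    ("p1", "Person", {"properties": {"name": ["Jane Doe"]}}),
    ("m3", "Event", {"properties": {"date": ["2018-11-20"]}}),
]
conn.executemany(
    "INSERT INTO ec_meetings VALUES (?, ?, ?)",
    [(id_, schema, json.dumps(entity)) for id_, schema, entity in rows],
)

query = """
SELECT t.id,
    json_extract(t.entity, '$.properties.date') AS date
FROM ec_meetings t
WHERE
    (EXISTS (
        SELECT 1 FROM json_each(t.entity, '$.properties.date') WHERE value <= ?
    )) AND (t.schema = ?)
ORDER BY date DESC
"""

# Values are bound separately from the SQL text, never interpolated into it
parameters = ("2020", "Event")
result = [row[0] for row in conn.execute(query, parameters)]
conn.close()
```

With these sample rows, the query selects the two events dated before 2020, newest first.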

Support

This project is part of investigraph.

Media Tech Lab Bayern batch #3
