
followthemoney data search experiments based on ftmq



ftmq-search

Search store logic for FollowTheMoney data.

The aim is to experiment with different full-text search backends for efficient shallow search of entities.

Currently supported backends:

  • SQLite FTS
  • Elasticsearch
  • Tantivy

Install

Requires Python 3.11 or later.

pip install ftmq-search

Generate search documents

ftmqs transform -i entities.ftm.json > entities.transformed.json

Speed it up via GNU Parallel

cat entities.ftm.json | parallel -j8 --pipe --roundrobin ftmqs transform > entities.transformed.json
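
The parallel invocation presumably works because each line of the input is an independent record, so the stream can be cut at line boundaries and handed to several workers. The sketch below illustrates that idea in plain Python, assuming the input is newline-delimited JSON with one entity per line. `round_robin_shards` is a hypothetical helper, not part of ftmqs, and it deals out single lines where GNU Parallel deals out larger blocks of whole lines.

```python
import json


def round_robin_shards(lines, n):
    """Distribute non-empty lines over n shards in round-robin order.

    Hypothetical helper for illustration only; not part of ftmqs.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    shards = [[] for _ in range(n)]
    records = (line for line in lines if line.strip())
    for i, line in enumerate(records):
        shards[i % n].append(line)
    return shards


# Made-up sample records, one JSON object per line.
sample = [json.dumps({"id": f"entity-{i}"}) + "\n" for i in range(5)]
shards = round_robin_shards(sample, 2)
for number, shard in enumerate(shards):
    print(number, [json.loads(line)["id"] for line in shard])
```

Because every record lands in exactly one shard, concatenating the per-worker output gives the same set of documents as a single run, although not necessarily in the same order.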

Index transformed documents

SQLite FTS

ftmqs --uri sqlite:///ftmqs.store index -i entities.transformed.json
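
For readers unfamiliar with SQLite full-text search, the following is a minimal sketch of what an FTS-backed store might look like, using Python's standard `sqlite3` module and an FTS5 virtual table. The table name, column names and sample rows are assumptions made for illustration. This is not the schema ftmqs creates, and it requires an SQLite build with FTS5 enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: only the `names` column is full-text indexed;
# the other columns are stored for filtering and lookup.
conn.execute(
    "CREATE VIRTUAL TABLE docs USING fts5("
    "entity_id UNINDEXED, schema_name UNINDEXED, names)"
)
conn.executemany(
    "INSERT INTO docs (entity_id, schema_name, names) VALUES (?, ?, ?)",
    [
        ("p1", "Person", "Jane Doe"),
        ("p2", "Person", "John Doe"),
        ("c1", "Company", "Doe Holdings Ltd"),
    ],
)


def search(query, schema_name=None):
    """Return ids of matching documents, best match first."""
    sql = "SELECT entity_id FROM docs WHERE docs MATCH ?"
    params = [query]
    if schema_name is not None:
        sql += " AND schema_name = ?"
        params.append(schema_name)
    sql += " ORDER BY rank"
    return [row[0] for row in conn.execute(sql, params)]


print(search("jane doe"))
print(search("doe", schema_name="Person"))
```

FTS5 treats a query of several bare terms as an implicit AND, so `jane doe` only matches documents that contain both tokens.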

Elasticsearch

ftmqs --uri http://localhost:9200 index -i entities.transformed.json

Elasticsearch indexing can be parallelized:

cat entities.transformed.json | parallel -j8 --pipe --roundrobin ftmqs --uri http://localhost:9200 index

Tantivy

ftmqs --uri tantivy://tantivy.db index -i entities.transformed.json

Search

ftmqs search <query>

Autocomplete

ftmqs autocomplete <query>
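
As a rough picture of what autocompletion over entity names involves, here is a minimal prefix-lookup sketch over a sorted list of names. It is an assumption-laden illustration, not how ftmqs or its backends implement autocomplete, and `build_index` and `autocomplete` are hypothetical helpers.

```python
from bisect import bisect_left


def build_index(names):
    """Return a sorted, de-duplicated, case-folded list of names."""
    return sorted({name.casefold() for name in names})


def autocomplete(index, prefix, limit=5):
    """Return up to `limit` indexed names that start with `prefix`."""
    prefix = prefix.casefold()
    start = bisect_left(index, prefix)  # first candidate >= prefix
    matches = []
    for name in index[start:]:
        if not name.startswith(prefix) or len(matches) >= limit:
            break
        matches.append(name)
    return matches


index = build_index(["Jane Doe", "Jane Roe", "John Doe", "jane doe"])
print(autocomplete(index, "jan"))
```

Since the list is sorted, all names sharing a prefix sit next to each other, so one binary search followed by a short scan is enough.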

Python

from ftmq import Query, smart_stream_proxies

from ftmqs import get_store
from ftmqs.logic import index_proxies

# elasticsearch
store = get_store("http://localhost:9200")

# sqlite
store = get_store("sqlite:///ftmqs.db")

# tantivy
store = get_store("tantivy://tantivy.db")

# tantivy in-memory
store = get_store("memory://")

# index entity data
proxies = smart_stream_proxies("./entities.ftm.json")
index_proxies(proxies, store)

# search
store.search("jane doe")

# filter for country and schema
q = Query().where(country="de", schema="Person")
store.search("jane doe", q)
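
In the examples above, the store URI appears to select the backend by its scheme. The sketch below shows one way such a mapping could be written with `urllib.parse`. It is only an illustration of the convention: `backend_for_uri` is a hypothetical helper, the `https` entry is an assumption, and the actual dispatch inside `get_store` may differ.

```python
from urllib.parse import urlparse

# Scheme-to-backend mapping inferred from the example URIs above.
# The "https" entry is an assumption.
BACKENDS = {
    "sqlite": "SQLite FTS",
    "http": "Elasticsearch",
    "https": "Elasticsearch",
    "tantivy": "Tantivy",
    "memory": "Tantivy (in-memory)",
}


def backend_for_uri(uri):
    """Return a backend label for a store URI (hypothetical helper)."""
    scheme = urlparse(uri).scheme
    try:
        return BACKENDS[scheme]
    except KeyError:
        raise ValueError(f"Unsupported store URI: {uri!r}") from None


for uri in (
    "http://localhost:9200",
    "sqlite:///ftmqs.db",
    "tantivy://tantivy.db",
    "memory://",
):
    print(uri, "->", backend_for_uri(uri))
```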
