
followthemoney data search experiments based on ftmq


Released under the MIT License.

ftmq-search

Search store logic for FollowTheMoney data.

The aim is to experiment with different full-text search backends for efficient shallow search of entities.

Currently supported backends: SQLite FTS, Elasticsearch, and Tantivy.

Install

Requires Python 3.11 or later.

pip install ftmq-search

Generate search documents

ftmqs transform -i entities.ftm.json > entities.transformed.json
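
To give a feel for what a shallow search document could look like, here is a minimal standard-library sketch that flattens one FollowTheMoney entity (a JSON object with `id`, `schema`, and multi-valued `properties`) into a flat record. The `to_search_document` helper and the output fields `names`, `countries`, and `text` are hypothetical illustrations, not the format that `ftmqs transform` actually emits.

```python
import json


def to_search_document(entity: dict) -> dict:
    """Flatten an FtM entity dict into a shallow, hypothetical search document."""
    props = entity.get("properties", {})
    # Join every property value into one text blob for full-text matching.
    text = " ".join(value for values in props.values() for value in values)
    return {
        "id": entity["id"],
        "schema": entity["schema"],
        "names": props.get("name", []),
        "countries": props.get("country", []),
        "text": text,
    }


# One entity as it would appear on a single line of a line-based FtM JSON file.
line = '{"id": "p1", "schema": "Person", "properties": {"name": ["Jane Doe"], "country": ["de"]}}'
doc = to_search_document(json.loads(line))
print(json.dumps(doc))
```

Keeping the record flat is what makes the search "shallow": a backend only needs to index a few fields per entity rather than the full property graph.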

Speed it up via GNU Parallel

cat entities.ftm.json | parallel -j8 --pipe --roundrobin ftmqs transform > entities.transformed.json
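
The pipeline above relies on the input holding one JSON entity per line, so that records can be handed to different workers without splitting an entity in half. As a rough mental model only, the sketch below deals lines out to a fixed number of buckets in turn. GNU Parallel itself works on larger blocks of lines and may assign them differently, and the `round_robin` helper is a hypothetical name used for illustration.

```python
from itertools import cycle


def round_robin(lines, workers: int) -> list[list[str]]:
    """Deal lines out to `workers` buckets in turn, like dealing cards."""
    buckets: list[list[str]] = [[] for _ in range(workers)]
    for bucket, line in zip(cycle(buckets), lines):
        bucket.append(line)
    return buckets


# Five single-line records shared between two workers.
lines = [f'{{"id": "e{i}"}}' for i in range(5)]
buckets = round_robin(lines, workers=2)
print([len(b) for b in buckets])
```

Because several workers write to the same output, the order of documents in `entities.transformed.json` may differ from the input order.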

Index transformed documents

Sqlite FTS

ftmqs --uri sqlite:///ftmqs.store index -i entities.transformed.json
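
For readers unfamiliar with SQLite full-text search, the following sketch shows the underlying idea with Python's built-in `sqlite3` module and an FTS5 virtual table. The table name `docs` and its columns are assumptions made for this example and do not reflect the schema ftmqs creates. It also assumes the local SQLite build includes the FTS5 extension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: `id` is stored but not tokenized, `text` is searchable.
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(id UNINDEXED, text)")
conn.executemany(
    "INSERT INTO docs (id, text) VALUES (?, ?)",
    [("p1", "Jane Doe de"), ("c1", "Doe Holdings Ltd gb")],
)
# MATCH runs the full-text query (all terms must occur);
# ordering by `rank` puts the best match first.
rows = conn.execute(
    "SELECT id FROM docs WHERE docs MATCH ? ORDER BY rank", ("jane doe",)
).fetchall()
print(rows)
```

Only the first row contains both query terms, so the second document is not returned.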

Elasticsearch

ftmqs --uri http://localhost:9200 index -i entities.transformed.json

Indexing into Elasticsearch can be parallelized:

cat entities.transformed.json | parallel -j8 --pipe --roundrobin ftmqs --uri http://localhost:9200 index

Tantivy

ftmqs --uri tantivy://tantivy.db index -i entities.transformed.json

Search

ftmqs search <query>

Autocomplete

ftmqs autocomplete <query>
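
Autocomplete is commonly built on prefix matching. The sketch below shows one simple way to do that over a sorted list of lowercased names using binary search. It illustrates the general technique only and is not necessarily how the ftmqs backends implement it. The `autocomplete` function and its `limit` parameter are hypothetical.

```python
from bisect import bisect_left


def autocomplete(sorted_names: list[str], prefix: str, limit: int = 5) -> list[str]:
    """Return up to `limit` names that start with `prefix` (case-insensitive)."""
    prefix = prefix.lower()
    # Binary search for the first name that is >= the prefix.
    start = bisect_left(sorted_names, prefix)
    matches: list[str] = []
    for name in sorted_names[start:]:
        # Names are sorted, so the first non-match ends the run of candidates.
        if not name.startswith(prefix) or len(matches) == limit:
            break
        matches.append(name)
    return matches


names = sorted(n.lower() for n in ["Jane Doe", "Janet Smith", "John Doe", "Doe Holdings"])
suggestions = autocomplete(names, "jan")
print(suggestions)
```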

Python

from ftmq import Query, smart_stream_proxies

from ftmqs import get_store
from ftmqs.logic import index_proxies

# pick one of the following backends

# elasticsearch
store = get_store("http://localhost:9200")

# sqlite
store = get_store("sqlite:///ftmqs.db")

# tantivy
store = get_store("tantivy://tantivy.db")

# tantivy in-memory
store = get_store("memory://")

# index entity data
proxies = smart_stream_proxies("./entities.ftm.json")
index_proxies(proxies, store)

# search
store.search("jane doe")

# filter for country and schema
q = Query().where(country="de", schema="Person")
store.search("jane doe", q)
