
Python utility to reconcile Pandas DataFrames

reconciler


reconciler is a Python package that reconciles tabular data against reconciliation services such as Wikidata. It works much like OpenRefine does, but entirely within Python, using Pandas.

Quickstart

You can install the latest version of reconciler from PyPI with:

pip install reconciler

Then to use it:

from reconciler import reconcile
import pandas as pd

# A DataFrame with a column you want to reconcile.
test_df = pd.DataFrame(
    {
        "City": ["Rio de Janeiro", "São Paulo", "São Paulo", "Natal"],
        "Country": ["Q155", "Q155", "Q155", "Q155"]
    }
)

# Reconcile against type city (Q515), getting the best match for each item.
reconciled = reconcile(test_df["City"], type_id="Q515")

The resulting DataFrame would look like this:

id       match  name            score  type                    type_id   input_value
Q8678    True   Rio de Janeiro  100    city                    Q515      Rio de Janeiro
Q174     True   São Paulo       100    city                    Q515      São Paulo
Q131620  True   Natal           100    municipality of Brazil  Q3184121  Natal
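
Because each result row carries the original text in input_value, the matches can be joined back onto the source DataFrame. The snippet below is a minimal sketch of that join using plain Pandas. To run without network access it uses a hand-built stand-in for the output of reconcile(), copied from the table above. The merged variable and the choice of columns to keep are illustrative assumptions, not part of reconciler.

```python
import pandas as pd

test_df = pd.DataFrame(
    {
        "City": ["Rio de Janeiro", "São Paulo", "São Paulo", "Natal"],
        "Country": ["Q155", "Q155", "Q155", "Q155"],
    }
)

# Stand-in for the DataFrame returned by reconcile(), copied from the
# table above (an assumption made so the sketch runs offline).
reconciled = pd.DataFrame(
    {
        "id": ["Q8678", "Q174", "Q131620"],
        "match": [True, True, True],
        "name": ["Rio de Janeiro", "São Paulo", "Natal"],
        "score": [100, 100, 100],
        "type": ["city", "city", "municipality of Brazil"],
        "type_id": ["Q515", "Q515", "Q3184121"],
        "input_value": ["Rio de Janeiro", "São Paulo", "Natal"],
    }
)

# Left-join on the original text so every input row, including
# duplicates such as "São Paulo", picks up its matched identifier.
merged = test_df.merge(
    reconciled[["input_value", "id", "match", "score"]],
    how="left",
    left_on="City",
    right_on="input_value",
).drop(columns="input_value")

print(merged)
```

A left join keeps every row of test_df, so inputs that received no candidate would show up with missing values in the id column rather than disappearing.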

To ensure the results are cities in Brazil, pass the property_mapping argument with a property-value pair:

# Reconcile against type city (Q515), restricted to items whose country (P17) property equals Brazil (Q155).
reconciled = reconcile(test_df["City"], type_id="Q515", property_mapping={"P17": test_df["Country"]})

Options

The reconcile() function accepts several options.

  • type_id - The type of items to reconcile against per the API specification.
  • top_res - Either the number of results to return per entry or the string 'all' to return all results.
  • property_mapping - A dictionary mapping property IDs to columns of values (as in the example above), used to filter results per the API specification.
  • reconciliation_endpoint - The reconciliation service to connect to. Defaults to https://wikidata.reconci.link/en/api.
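
To make "per the API specification" more concrete, here is a rough sketch of how type_id and property_mapping might translate into a batch of reconciliation queries. The field names (query, type, limit, properties, pid, v) follow the Reconciliation Service API query format as commonly documented. The build_queries helper is hypothetical and is not how reconciler is known to build its requests internally.

```python
import json


def build_queries(values, type_id=None, property_mapping=None, limit=None):
    """Build a batch of reconciliation queries keyed q0, q1, ... (hypothetical helper)."""
    property_mapping = property_mapping or {}
    queries = {}
    for i, value in enumerate(values):
        query = {"query": value}
        if type_id is not None:
            query["type"] = type_id
        if limit is not None:
            query["limit"] = limit
        # One {"pid", "v"} pair per mapped property, taken from the same row.
        props = [
            {"pid": pid, "v": column[i]}
            for pid, column in property_mapping.items()
        ]
        if props:
            query["properties"] = props
        queries[f"q{i}"] = query
    return queries


cities = ["Rio de Janeiro", "Natal"]
countries = ["Q155", "Q155"]

queries = build_queries(cities, type_id="Q515", property_mapping={"P17": countries})

# Serialized form, as it might be sent to a reconciliation endpoint.
payload = json.dumps(queries, ensure_ascii=False, indent=2)
print(payload)
```

Each input value becomes its own query, and the property constraint is read from the same row as the value, which is why property_mapping takes a whole column rather than a single value.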

Other very useful packages

Although my opinion may be biased, I think reconciler is a pretty nice package. That said, it probably won't cover all your Wikidata-related needs. Here are other packages that could help:

  • WikidataIntegrator has a lot of very nice low-level functions for dealing with various Wikidata-related activities, such as item acquisition and programmatic editing.

  • wikidata2df is a very simple utility package for quickly and easily turning Wikidata SPARQL queries into Pandas DataFrames.
