Python utility to reconcile Pandas DataFrames
reconciler
reconciler is a Python package to reconcile tabular data with various reconciliation services, such as Wikidata. It works similarly to OpenRefine, but entirely within Python, using Pandas.
Quickstart
You can install the latest version of reconciler from PyPI with:
pip install reconciler
Then to use it:
from reconciler import reconcile
import pandas as pd
# A DataFrame with a column you want to reconcile.
test_df = pd.DataFrame(
    {
        "City": ["Rio de Janeiro", "São Paulo", "São Paulo", "Natal"],
        "Country": ["Q155", "Q155", "Q155", "Q155"],
    }
)
# Reconcile against type city (Q515), getting the best match for each item.
reconciled = reconcile(test_df["City"], type_id="Q515")
The resulting DataFrame would look like this:
id | match | name | score | type | type_id | input_value |
---|---|---|---|---|---|---|
Q8678 | True | Rio de Janeiro | 100 | city | Q515 | Rio de Janeiro |
Q174 | True | São Paulo | 100 | city | Q515 | São Paulo |
Q131620 | True | Natal | 100 | municipality of Brazil | Q3184121 | Natal |
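The table above has three rows for four inputs, which suggests that repeated values such as "São Paulo" are reported once. A minimal sketch of attaching the matched IDs back to the original rows, assuming the input_value column holds the original strings: the reconciled frame below is copied by hand from the table rather than produced by a live call, and the merge is plain Pandas, not a feature of reconciler.

```python
import pandas as pd

# Input frame from the Quickstart.
test_df = pd.DataFrame(
    {
        "City": ["Rio de Janeiro", "São Paulo", "São Paulo", "Natal"],
        "Country": ["Q155", "Q155", "Q155", "Q155"],
    }
)

# Stand-in for the output of reconcile(), copied by hand from the table above
# (assumption: a real call returns the same columns).
reconciled = pd.DataFrame(
    {
        "id": ["Q8678", "Q174", "Q131620"],
        "match": [True, True, True],
        "name": ["Rio de Janeiro", "São Paulo", "Natal"],
        "score": [100, 100, 100],
        "type": ["city", "city", "municipality of Brazil"],
        "type_id": ["Q515", "Q515", "Q3184121"],
        "input_value": ["Rio de Janeiro", "São Paulo", "Natal"],
    }
)

# Left-join on the original string so every input row is kept,
# including the repeated "São Paulo".
merged = test_df.merge(
    reconciled[["input_value", "id", "match"]],
    how="left",
    left_on="City",
    right_on="input_value",
).drop(columns="input_value")

print(merged)
```

A left join keeps the row order of test_df, so the new id column lines up with the original City values.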
To ensure the results are cities in Brazil, pass the property_mapping argument, which maps a property ID to a column of values:
# Reconcile against type city (Q515), restricted to items whose country (P17) property equals Brazil (Q155).
reconciled = reconcile(test_df["City"], type_id="Q515", property_mapping={"P17": test_df["Country"]})
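To make the roles of type_id and property_mapping more concrete, here is a rough sketch of the query batch that a reconciliation service expects, assuming the "API specification" mentioned in this README is the Reconciliation Service API. The helper build_queries is hypothetical and is not part of reconciler. Whether reconciler builds exactly this payload internally is also an assumption.

```python
import json


def build_queries(values, type_id, property_values=None):
    """Build a batch of reconciliation queries keyed q0, q1, ...

    Hypothetical helper for illustration only; not part of reconciler.
    property_values maps a property ID to a list aligned with values.
    """
    property_values = property_values or {}
    queries = {}
    for i, value in enumerate(values):
        query = {"query": value, "type": type_id}
        # One {"pid": ..., "v": ...} constraint per mapped property.
        props = [{"pid": pid, "v": vals[i]} for pid, vals in property_values.items()]
        if props:
            query["properties"] = props
        queries[f"q{i}"] = query
    return queries


queries = build_queries(
    ["Rio de Janeiro", "Natal"],
    type_id="Q515",
    property_values={"P17": ["Q155", "Q155"]},
)
print(json.dumps(queries, indent=2, ensure_ascii=False))
```

Each input string becomes one query, and each entry in the mapping becomes a property constraint on that query.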
Options
The reconcile() function accepts several options:
- type_id - The type of items to reconcile against, per the API specification.
- top_res - Either the number of results to return per entry or the string 'all' to return all results.
- property_mapping - A list of properties to filter results on, per the API specification.
- reconciliation_endpoint - The reconciliation service to connect to. Defaults to https://wikidata.reconci.link/en/api.
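As a sketch of how these options combine, the wrapper below passes all four explicitly. The function name reconcile_cities and the value top_res=3 are illustrative choices, not part of the package. The parameter names and the endpoint URL are taken from the list above. The wrapper is only defined here, not called, because a real call needs network access to the reconciliation service.

```python
def reconcile_cities(df):
    """Return up to three candidate matches per city, restricted by country.

    Hypothetical wrapper for illustration; expects the "City" and "Country"
    columns used in the Quickstart.
    """
    # Imported inside the function so this sketch can be loaded
    # without the package installed.
    from reconciler import reconcile

    return reconcile(
        df["City"],
        type_id="Q515",  # reconcile against type city
        top_res=3,  # an integer, or the string 'all'
        property_mapping={"P17": df["Country"]},  # country constraint
        reconciliation_endpoint="https://wikidata.reconci.link/en/api",
    )
```

With top_res greater than one, the result presumably contains several rows per input value, so downstream code should not assume one row per entry.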
Other very useful packages
Although my opinion may be biased, I think reconciler is a pretty nice package.
But the thing is, it probably won't fulfill all your Wikidata-related needs.
Here are other packages that could help with that:
- WikidataIntegrator has a lot of very nice low-level functions for dealing with various Wikidata-related activities, such as item acquisition and programmatic editing.
- wikidata2df is a very simple utility package for quickly and easily turning Wikidata SPARQL queries into Pandas DataFrames.