

Project description

kg-enricher


kg-enricher is an open-source Python library for enriching strings, entities and knowledge graphs using Wikibase knowledge graphs. It is adapted for people, organizations and German geographic entities, both modern and historical. By default it connects to Wikidata, but it can be configured for any Wikibase instance.

Context. The BERD@NFDI project maintains multiple knowledge graphs with German company data. We link strings to entities and enrich the strings with data from knowledge graphs. For geographic strings, we also check whether the geographic coordinates of an entity lie inside modern or historical German boundaries, using the CShapes 2.0 Dataset.


Installation

pip install kg-enricher

or

git clone https://github.com/UB-Mannheim/kg-enricher
cd kg-enricher/
pip install .

How to use

Import the enrich function and apply it to strings that name people, organizations or geographic entities.

An example for a person:

from enricher import enrich
enrich("Adolf Daimler")
{'label': 'Adolf Daimler',
 'description': 'German entrepreneur (1871-1913)',
 'aliases': [],
 'id': 'Q361191',
 'url': 'https://www.wikidata.org/wiki/Special:EntityData/Q361191',
 'date_of_birth': {'time': '+1871-09-08T00:00:00Z',
  'timezone': 0,
  'before': 0,
  'after': 0,
  'precision': 11,
  'calendarmodel': 'http://www.wikidata.org/entity/Q1985727'},
 'date_of_death': {'time': '+1913-03-24T00:00:00Z',
  'timezone': 0,
  'before': 0,
  'after': 0,
  'precision': 11,
  'calendarmodel': 'http://www.wikidata.org/entity/Q1985727'},
 'VIAF ID': '77537760',
 'ISNI': '0000 0000 2006 7510',
 'GND ID': '135728673',
 'Google Knowledge Graph ID': '/g/11mvrmlm7'}
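The date fields in the output above use the Wikidata time format, where a precision of 11 means the value is exact to the day. The following is a minimal sketch of turning such a value into a Python date. The helper name parse_wikidata_date is hypothetical and not part of kg-enricher, and the sketch only handles day-precision values with years from 1 to 9999 CE.

```python
from datetime import date


def parse_wikidata_date(value):
    """Convert a Wikidata time value with day precision into a datetime.date.

    Returns None when the precision is coarser than a day (precision < 11)
    or when the year is BCE, since this sketch does not handle those cases.
    """
    time_string = value["time"]  # e.g. '+1871-09-08T00:00:00Z'
    if value.get("precision", 0) < 11 or time_string.startswith("-"):
        return None
    date_part = time_string.lstrip("+").split("T")[0]  # '1871-09-08'
    year, month, day = (int(part) for part in date_part.split("-"))
    return date(year, month, day)


# Excerpt of the 'date_of_birth' value from the example output
birth = {"time": "+1871-09-08T00:00:00Z", "precision": 11}
print(parse_wikidata_date(birth))  # 1871-09-08
```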

An example for a geographic entity:

from enricher import enrich
enrich("Mannheim")
{'label': 'Mannheim',
 'description': 'city in Baden-Württemberg, Germany',
 'aliases': ['Mannem',
  'Monnem',
  'Universitätsstadt Mannheim',
  'Mannheim, Germany',
  'Mannheim (Germany)',
  'Mannheim Germany'],
 'id': 'Q2119',
 'url': 'https://www.wikidata.org/wiki/Special:EntityData/Q2119',
 'GeoNames ID': '2873891',
 'Geographic coordinates': {'latitude': 49.48777777777778,
  'longitude': 8.466111111111111,
  'altitude': None,
  'precision': 0.0002777777777777778,
  'globe': 'http://www.wikidata.org/entity/Q2'},
 'OSM Relation ID': '62691',
 'German district key': '08222',
 'German municipality key': '08222000',
 'German regional key': '082220000000',
 'UN/LOCODE': 'DEMHG',
 'Freebase ID': '/m/0pf5y',
 'OpenStreetMap node ID': '240060919',
 'is_within_current_germany': True,
 'is_within_historical_germany_1886_1919': True,
 'is_within_historical_germany_1919_1920': True,
 'is_within_historical_germany_1920_1938': True,
 'is_within_historical_germany_1938_1945': True,
 'is_within_historical_GFR_1945_1949': True,
 'is_within_historical_GFR_1949_1990': True,
 'is_within_historical_GFR_1990_2019': True,
 'is_within_historical_GDR_1945_1949': False,
 'is_within_historical_GDR_1949_1990': False}

An example for an organization:

from enricher import enrich
enrich("BASF SE")
{'label': 'BASF',
 'description': 'German chemical company with worldwide reach',
 'aliases': ['Badische Anilin- & Soda-Fabrik',
  'Baden Aniline and Soda Factory',
  'BASF SE',
  'Badische Anilin- und Soda-Fabrik'],
 'id': 'Q9401',
 'url': 'https://www.wikidata.org/wiki/Special:EntityData/Q9401',
 'inception': {'time': '+1865-04-06T00:00:00Z',
  'timezone': 0,
  'before': 0,
  'after': 0,
  'precision': 11,
  'calendarmodel': 'http://www.wikidata.org/entity/Q1985727'},
 'LEI code': '529900PM64WH8AF1E917',
 'GRID ID': 'grid.3319.8',
 'ISIN': 'DE000BASF111',
 'EU Transparency Register ID': '7410939793-88',
 'Freebase ID': '/m/01713t',
 'EU Research participant ID': '999829926',
 'German Lobbyregister ID': 'R002326',
 'LinkedIn organization ID': 'basf',
 'PermID': '4295869198',
 'PM20 folder ID': 'co/002589'}
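When many strings need to be enriched, a small wrapper can avoid looking up the same string twice. The sketch below is an assumption-laden illustration: enrich_many is a hypothetical helper, not part of kg-enricher, and because this page does not say how enrich behaves when a lookup fails, the wrapper simply records None if the call raises. The function to call is passed in as a parameter, so the example runs with a stand-in instead of the real enrich.

```python
def enrich_many(strings, enrich_fn):
    """Apply enrich_fn once per distinct string; map each string to its result.

    A string whose lookup raises an exception is mapped to None.
    """
    results = {}
    for text in strings:
        if text in results:
            continue  # already looked up, skip the repeated call
        try:
            results[text] = enrich_fn(text)
        except Exception:
            results[text] = None
    return results


# Stand-in for enrich, used here so the sketch runs without network access
def fake_enrich(text):
    return {"label": text}


print(enrich_many(["BASF SE", "Mannheim", "BASF SE"], fake_enrich))
# {'BASF SE': {'label': 'BASF SE'}, 'Mannheim': {'label': 'Mannheim'}}
```

With the library installed, the same helper would presumably be called as enrich_many(names, enrich) after from enricher import enrich.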

Geographic linking

For geographic linking, we take the geographic coordinates of an entity from Wikidata and check whether that point lies within the boundaries of Germany, using GeoJSON files provided by the CShapes 2.0 Dataset. The historical geographic boundaries of Germany from the CShapes 2.0 Dataset are available at https://demo.ldproxy.net under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (CC BY-NC-SA 4.0).
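The containment check described above can be illustrated with a plain ray-casting test on GeoJSON polygon coordinates. This is only a sketch of the general technique, not the code kg-enricher actually runs: the function names are hypothetical, and the polygon is a toy rectangle rather than a real boundary from the CShapes 2.0 Dataset.

```python
def point_in_ring(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside the closed ring of [lon, lat] pairs?"""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def point_in_polygon(lon, lat, polygon_coords):
    """GeoJSON Polygon: the first ring is the outer boundary, the rest are holes."""
    outer, *holes = polygon_coords
    if not point_in_ring(lon, lat, outer):
        return False
    return not any(point_in_ring(lon, lat, hole) for hole in holes)


# Toy rectangle (NOT a real CShapes boundary), in GeoJSON order [lon, lat]
toy_polygon = [[[5.0, 47.0], [15.0, 47.0], [15.0, 55.0], [5.0, 55.0], [5.0, 47.0]]]

mannheim = (8.466111111111111, 49.48777777777778)  # (longitude, latitude)
print(point_in_polygon(*mannheim, toy_polygon))    # True
print(point_in_polygon(2.35, 48.85, toy_polygon))  # False
```

Note that GeoJSON stores positions as longitude first, then latitude, which is the reverse of the order in which the coordinates appear in the enrich output.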

We use the following maps of Germany:

| State | Unique identifier | Start date (DD/MM/YYYY) | End date (DD/MM/YYYY) | Source identifier | Country capital |
|---|---|---|---|---|---|
| Germany (Prussia) | 84 | 01/01/1886 | 27/06/1919 | 255 | Berlin |
| Germany (Prussia) | 85 | 28/06/1919 | 09/02/1920 | 255 | Berlin |
| Germany (Prussia) | 86 | 10/02/1920 | 29/09/1938 | 255 | Berlin |
| Germany (Prussia) | 87 | 30/09/1938 | 07/05/1945 | 255 | Berlin |
| German Federal Republic | 88 | 08/05/1945 | 20/09/1949 | 260 | Bonn |
| German Federal Republic | 89 | 21/09/1949 | 02/10/1990 | 260 | Bonn |
| German Federal Republic | 90 | 03/10/1990 | 31/12/2019 | 260 | Berlin |
| German Democratic Republic | 91 | 08/05/1945 | 04/10/1949 | 265 | East Berlin |
| German Democratic Republic | 92 | 05/10/1949 | 02/10/1990 | 265 | East Berlin |
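To relate a dated record to these maps, it can help to look up which periods cover a given day. The sketch below copies the start and end dates from the table above and pairs each row with the matching is_within_* key seen in the example outputs. That pairing is inferred from the year ranges in the key names, and keys_for_date is a hypothetical helper, not part of kg-enricher.

```python
from datetime import date

# (result key, start date, end date); the dates are taken from the table above
PERIODS = [
    ("is_within_historical_germany_1886_1919", date(1886, 1, 1), date(1919, 6, 27)),
    ("is_within_historical_germany_1919_1920", date(1919, 6, 28), date(1920, 2, 9)),
    ("is_within_historical_germany_1920_1938", date(1920, 2, 10), date(1938, 9, 29)),
    ("is_within_historical_germany_1938_1945", date(1938, 9, 30), date(1945, 5, 7)),
    ("is_within_historical_GFR_1945_1949", date(1945, 5, 8), date(1949, 9, 20)),
    ("is_within_historical_GFR_1949_1990", date(1949, 9, 21), date(1990, 10, 2)),
    ("is_within_historical_GFR_1990_2019", date(1990, 10, 3), date(2019, 12, 31)),
    ("is_within_historical_GDR_1945_1949", date(1945, 5, 8), date(1949, 10, 4)),
    ("is_within_historical_GDR_1949_1990", date(1949, 10, 5), date(1990, 10, 2)),
]


def keys_for_date(day):
    """Return the result keys whose map period includes the given day."""
    return [key for key, start, end in PERIODS if start <= day <= end]


print(keys_for_date(date(1900, 1, 1)))
# ['is_within_historical_germany_1886_1919']
print(keys_for_date(date(1960, 1, 1)))
# ['is_within_historical_GFR_1949_1990', 'is_within_historical_GDR_1949_1990']
```

Between 1945 and 1990 two maps apply at once, so a date in that range returns one key for each German state.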

If you use kg-enricher on geographic entities, please cite the following paper, in line with the license of the CShapes 2.0 Dataset: Schvitz, G., Girardin, L., Rüegger, S., Weidmann, N. B., Cederman, L.-E., & Gleditsch, K. S. (2022). Mapping the International System, 1886-2019: The CShapes 2.0 Dataset. Journal of Conflict Resolution, 66(1), 144-161. https://doi.org/10.1177/00220027211013563.

An example for "West Berlin":

from enricher import enrich
enrich("West Berlin")
{'label': 'West Berlin',
 'description': 'the Western sectors of Berlin between 1945 and 1990',
 'aliases': ['Berlin (West)', 'Westberlin', 'WB'],
 'id': 'Q56036',
 'url': 'https://www.wikidata.org/wiki/Special:EntityData/Q56036',
 'GeoNames ID': '11612751',
 'Geographic coordinates': {'latitude': 52.5,
  'longitude': 13.28,
  'altitude': None,
  'precision': 0.0002777777777777778,
  'globe': 'http://www.wikidata.org/entity/Q2'},
 'Freebase ID': '/m/082g6',
 'is_within_current_germany': True,
 'is_within_historical_germany_1886_1919': True,
 'is_within_historical_germany_1919_1920': True,
 'is_within_historical_germany_1920_1938': True,
 'is_within_historical_germany_1938_1945': True,
 'is_within_historical_GFR_1945_1949': False,
 'is_within_historical_GFR_1949_1990': False,
 'is_within_historical_GFR_1990_2019': True,
 'is_within_historical_GDR_1945_1949': True,
 'is_within_historical_GDR_1949_1990': True}

An example for "East Berlin":

from enricher import enrich
enrich("East Berlin")
{'label': 'East Berlin',
 'description': 'Soviet sector of Berlin between 1949 and 1990',
 'aliases': ['Soviet zone of Berlin',
  'Berlin-Ost',
  'Ostberlin',
  'Soviet sector of Berlin',
  'Berlin, Hauptstadt der DDR',
  'Berlin Hauptstadt der DDR'],
 'id': 'Q56037',
 'url': 'https://www.wikidata.org/wiki/Special:EntityData/Q56037',
 'Geographic coordinates': {'latitude': 52.518611111111,
  'longitude': 13.404444444444,
  'altitude': None,
  'precision': None,
  'globe': 'http://www.wikidata.org/entity/Q2'},
 'Freebase ID': '/m/02lcc',
 'is_within_current_germany': True,
 'is_within_historical_germany_1886_1919': True,
 'is_within_historical_germany_1919_1920': True,
 'is_within_historical_germany_1920_1938': True,
 'is_within_historical_germany_1938_1945': True,
 'is_within_historical_GFR_1945_1949': False,
 'is_within_historical_GFR_1949_1990': False,
 'is_within_historical_GFR_1990_2019': True,
 'is_within_historical_GDR_1945_1949': True,
 'is_within_historical_GDR_1949_1990': True}
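The boundary flags in such a result can be collected into a short list of the maps that contain the entity. This is a minimal sketch that assumes only the key naming visible in the outputs above. The helper containing_periods is hypothetical, and the input is an excerpt of the East Berlin result, not a live call.

```python
def containing_periods(result):
    """List the map names (key minus the 'is_within_' prefix) flagged True."""
    prefix = "is_within_"
    return [
        key[len(prefix):]
        for key, value in result.items()
        if key.startswith(prefix) and value is True
    ]


# Excerpt of the East Berlin output shown above
east_berlin_excerpt = {
    "label": "East Berlin",
    "is_within_current_germany": True,
    "is_within_historical_GFR_1949_1990": False,
    "is_within_historical_GDR_1949_1990": True,
}

print(containing_periods(east_berlin_excerpt))
# ['current_germany', 'historical_GDR_1949_1990']
```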

Archived code

Shigapov, R. (2023). KG-enricher: An open-source Python library for enriching strings, entities and knowledge graphs using Wikibase knowledge graphs (0.1.0). Zenodo. https://doi.org/10.5281/zenodo.10405073
