Skip to main content

A minimal client for entity-fishing service.

Project description

HIRMEOS Entity Fishing client

http://img.shields.io/:license-apache-blue.svg https://travis-ci.org/hirmeos/entity-fishing-client-python.svg?branch=master

Python client to query the Entity Fishing service API developed in the context of the EU H2020 HIRMEOS project (WP3). For more information about entity-fishing, please check the Entity Fishing Documentation.

Installation

The client can be installed using pip:

pip install entity-fishing-client

Usage

Disambiguation

from nerd import nerd_client
client = nerd_client.NerdClient()

To disambiguate text (> 5 words):

client.disambiguate_text(
    "Linux is a name that broadly denotes a family of free and open-source software operating systems (OS) built around the Linux kernel."
)

To disambiguate a search query

client.disambiguate_query(
    "python method acronym concrete"
)

To process a PDF:

client.disambiguate_pdf(pdfFile, language)

you can supply the language (iso form of two digits, en, fr, etc..) and the entities (only for text) you already know, using the format:

{
    'offsetEnd': 5,
    'offsetStart': 0,
    'rawName': 'Linux',
    'wikidataId': 'Q388',
    'wikipediaExternalRef': 6097297
}

The response is a tuple where the first element is the response body as a dictionary and the second element the error code. Here an example:

 (
     {'entities': [
         {
             'domains': ['Computer_Science'],
             'nerd_score': 0.3753,
             'nerd_selection_score': 0.7268,
             'offsetEnd': 5,
             'offsetStart': 0,
             'rawName': 'Linux',
             'type': 'PERSON',
             'wikidataId': 'Q388',
             'wikipediaExternalRef': 6097297
         },
         {
             'domains': ['Computer_Science'],
             'nerd_score': 0.7442,
             'nerd_selection_score': 0.85,
             'offsetEnd': 78,
             'offsetStart': 49,
             'rawName': 'free and open-source software',
             'wikidataId': 'Q506883',
             'wikipediaExternalRef': 1721496
         },
         {
             'domains': ['Electrotechnology', 'Electronics',
             'Computer_Science'],
             'nerd_score': 0.7442,
             'nerd_selection_score': 0.4487,
             'offsetEnd': 96,
             'offsetStart': 79,
             'rawName': 'operating systems',
             'wikidataId': 'Q9135',
             'wikipediaExternalRef': 22194
         },
         {
             'domains': [
                 'Electrotechnology', 'Electronics', 'Computer_Science'
             ],
             'nerd_score': 0.7442,
             'nerd_selection_score': 0.4487,
             'offsetEnd': 100,
             'offsetStart': 98,
             'rawName': 'operating systems',
             'wikidataId': 'Q9135',
             'wikipediaExternalRef': 22194
         },
         {
             'domains': ['Electronics', 'Computer_Science'],
             'nerd_score': 0.743,
             'nerd_selection_score': 0.8383,
             'offsetEnd': 131,
             'offsetStart': 119,
             'rawName': 'Linux kernel',
             'wikidataId': 'Q14579',
             'wikipediaExternalRef': 21347315
         }
     ],
     'global_categories': [
         {'category': 'Finnish inventions',
         'page_id': 27421536,
         'source': 'wikipedia-en',
         'weight': 0.09684039970133569},
        {'category': 'Free software programmed in C',
         'page_id': 11241711,
         'source': 'wikipedia-en',
         'weight': 0.06433942787438053},
        {'category': 'Unix variants',
         'page_id': 10429397,
         'source': 'wikipedia-en',
         'weight': 0.09684039970133569},
        {'category': 'Operating systems',
         'page_id': 693664,
         'source': 'wikipedia-en',
         'weight': 0.12888888710813473},
        {'category': 'Free software',
         'page_id': 693287,
         'source': 'wikipedia-en',
         'weight': 0.06444444355406737},
        {'category': 'Free system software',
         'page_id': 6721544,
         'source': 'wikipedia-en',
         'weight': 0.06433942787438053},
        {'category': 'Software licenses',
         'page_id': 703100,
         'source': 'wikipedia-en',
         'weight': 0.06444444355406737},
        {'category': 'Linux kernel',
         'page_id': 13215678,
         'source': 'wikipedia-en',
         'weight': 0.06433942787438053},
        {'category': 'Monolithic kernels',
         'page_id': 10730969,
         'source': 'wikipedia-en',
         'weight': 0.06433942787438053},
        {'category': '1991 software',
         'page_id': 11167446,
         'source': 'wikipedia-en',
         'weight': 0.09684039970133569},
        {'category': 'Linus Torvalds',
         'page_id': 53479567,
         'source': 'wikipedia-en',
         'weight': 0.09684039970133569}
     ],
     'language': {'conf': 0.9999973266294648, 'lang': 'en'},
     'nbest': False,
     'onlyNER': False,
     'runtime': 107,
     'sentences': [{'offsetEnd': 132, 'offsetStart': 0}],
     'text': 'Linux is a name that broadly denotes a family of free and open-source software operating systems (OS) built around the Linux kernel.'
     },
     200
)

Batch processing

The batch processing is implemented in the class NerdBatch. The class can be instantiated by defining the entity-fishing url in the constructor, else the default one is used.

To run the processing, the method process requires the input directory, a callback and the number of threads/processes. There is an already ready implementation in script/batchSample.py.

To run it:

  • under this work branch, prepare two folders: input which containing the input Pdf files to be processed and output which collecting the processing result

  • we recommend to create a new virtualenv, activate it and install all the requirements needed in this virtual environment using $ pip install -r /path/of/entity-fishing-client-python/source/requirements.txt

  • (temporarly, until this branch is not merged) install entity-fishing multithread branch in edit mode (pip install -e /path/of/entity-fishing-client-python/source)

  • run it with python runFile.py input output 5

KB access

nerd.get_concept("Q456")

with response

(
   {
     'rawName': 'Lyon',
     'preferredTerm': 'Lyon',
     'nerd_score': 0,
     'nerd_selection_score': 0,
     'wikipediaExternalRef': 8638634,
     'wikidataId': 'Q456',
     'definitions': [
       {
         'definition': "'''Lyon''' ( or ;, locally: ; ), also known as ''Lyons'', is a city in east-central [[France]], in the [[Auvergne-Rhône-Alpes]] [[Regions of France|region]], about from [[Paris]], from [[Marseille]] and from [[Saint-Étienne]]. Inhabitants of the city are called ''Lyonnais''.",
         'source': 'wikipedia-en',
         'lang': 'en'
       }
     ],
     'domains': [
       'Geology',
       'Sociology'
     ],
     'categories': [
       {
         'source': 'wikipedia-en',
         'category': 'World Heritage Sites in France',
         'page_id': 1178961
       },
       [...]
     ],
     'multilingual': [
       {
         'lang': 'de',
         'term': 'Lyon',
         'page_id': 13964
       },
       {
         'lang': 'es',
         'term': 'Lyon',
         'page_id': 46490
       },
       {
         'lang': 'fr',
         'term': 'Lyon',
         'page_id': 802627
       },
       {
         'lang': 'it',
         'term': 'Lione',
         'page_id': 41786
       }
     ],
     'statements': [
       {
         'conceptId': 'Q456',
         'propertyId': 'P1082',
         'propertyName': 'population',
         'valueType': 'quantity',
         'value': {
           'amount': '+500716',
           'unit': '1',
           'upperBound': '+500717',
           'lowerBound': '+500715'
         }
       },
       {
         'conceptId': 'Q456',
         'propertyId': 'P1082',
         'propertyName': 'population',
         'valueType': 'quantity',
         'value': {
           'amount': '+500716',
           'unit': '1',
           'upperBound': '+500717',
           'lowerBound': '+500715'
         }
       },
       {
         'conceptId': 'Q456',
         'propertyId': 'P1464',
         'propertyName': 'category for people born here',
         'valueType': 'wikibase-item',
         'value': 'Q8061504'
       },
       {
         'conceptId': 'Q456',
         'propertyId': 'P190',
         'propertyName': 'sister city',
         'valueType': 'wikibase-item',
         'value': 'Q5687',
         'valueName': 'Jericho'
       },
       {
         'conceptId': 'Q456',
         'propertyId': 'P190',
         'propertyName': 'sister city',
         'valueType': 'wikibase-item',
         'value': 'Q2079',
         'valueName': 'Leipzig'
       },
       {
         'conceptId': 'Q456',
         'propertyId': 'P190',
         'propertyName': 'sister city',
         'valueType': 'wikibase-item',
         'value': 'Q580',
         'valueName': 'Łódź'
       },
       [...]
     ]
   },
   200
)

Utilities

Language detection

nerd.get_language("This is a sentence. This is a second sentence.")

with response:

(
   {
      'sentences':
      [
         {'offsetStart': 0, 'offsetEnd': 19},
         {'offsetStart': 19, 'offsetEnd': 46}
      ]
   },
   200
)

Segmentation

nerd.segment("This is a sentence. This is a second sentence.")

with response:

(
    {
        "lang": "en",
        "conf": 0.9
    },
    200
)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

entity_fishing_client-0.7.6.tar.gz (10.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

entity_fishing_client-0.7.6-py3-none-any.whl (11.3 kB view details)

Uploaded Python 3

File details

Details for the file entity_fishing_client-0.7.6.tar.gz.

File metadata

  • Download URL: entity_fishing_client-0.7.6.tar.gz
  • Upload date:
  • Size: 10.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for entity_fishing_client-0.7.6.tar.gz
Algorithm Hash digest
SHA256 8319b0c7ed9e4fdcd95d1b203ce2f1994facd0b9ba152ede925ebbdb66977fb2
MD5 9590502f93b527acc4c1ad4ad2681614
BLAKE2b-256 9a14aaaadd925897eba26c97808f6a5c5f842f79881e57e672781f4da7b1ac62

See more details on using hashes here.

File details

Details for the file entity_fishing_client-0.7.6-py3-none-any.whl.

File metadata

File hashes

Hashes for entity_fishing_client-0.7.6-py3-none-any.whl
Algorithm Hash digest
SHA256 4766191a2a9e096c5eeae47db0053d98f3debde9972c2adb94e70d48a8d48e20
MD5 832ede0f3c9725c4b281ffc383869b27
BLAKE2b-256 030663b632b0da678299907e2c4f28c51c41f8dfe51bae16dda94bc58e1f291c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page