Link names, titles, organization, and subject indices with taxonomy entries.
RaRa Linker
rara-norm-linker is a Python library for linking personal names, organizations, geographical names, titles and keywords with taxonomy entries.
NB! Requires access to an Elasticsearch>=8.0 instance.
✨ Features
- Link personal names, organizations, geographical names, titles, and keywords with taxonomy entries.
- Use fuzzy matching for linking.
- Use vector search for filtering results.
- Use VIAF queries for enrichment.
⚡ Quick Start
Get started with rara-norm-linker in just a few steps:
- Install the Package
Ensure you're using Python 3.10 or above, then run:
pip install rara-norm-linker
- Import and Use
Example usage to link entries with default configuration:
from rara_linker.linkers.linker import Linker
from pprint import pprint
import logging
# Disables logging, feel free to comment this out
logging.disable(logging.CRITICAL)
# Initialize Linker instance
linker = Linker(add_viaf_info=True, vectorizer_or_dir_path="./vectorizer_data")
entity = "Lennart Mere"
linked_info = linker.link(entity)
pprint(linked_info.to_dict())
⚙️ Installation Guide
Follow the steps below to install the rara-norm-linker package, either via pip or locally.
Installation via pip
- Set Up Your Python Environment
Create or activate a Python environment using Python 3.10 or above.
- Install the Package
Run the following command:
pip install rara-norm-linker
Local Installation
Follow these steps to install the rara-norm-linker package locally:
- Clone the Repository
Clone the repository and navigate into it:
git clone <repository-url>
cd <repository-directory>
- Set Up Python Environment
Create or activate a Python environment using Python 3.10 or above, e.g.:
conda create -n py310 python==3.10
conda activate py310
- Install Build Package
Install the build package to enable local builds:
pip install build
- Build the Package
Run the following command inside the repository:
python -m build
- Install the Package
Install the built package locally:
pip install .
🚀 Testing Guide
Follow these steps to test the rara-norm-linker package.
How to Test
- Clone the Repository
Clone the repository and navigate into it:
git clone <repository-url>
cd <repository-directory>
- Set Up Python Environment
Create or activate a Python environment using Python 3.10 or above.
- Install Build Package
Install the build package:
pip install build
- Build the Package
Build the package inside the repository:
python -m build
- Install with Testing Dependencies
Install the package along with its testing dependencies:
pip install .[testing]
- Run Tests
Run the test suite from the repository root:
python -m pytest -v tests
📝 Documentation
🔍 Linker Class
Overview
The Linker class combines four linker classes (PersonLinker, OrganizationLinker, EMSLinker and LocationLinker) into a single workflow. This makes the input more flexible: the user does not need to know the type of the entity being linked.
Importing
from rara_linker.linkers.linker import Linker
Class Parameters
| Name | Type | Optional | Default | Description |
|---|---|---|---|---|
| add_viaf_info | bool | True | False | If enabled, a query is made to VIAF to enrich the linked information. |
| vectorizer_or_dir_path | str | True | "./vectorizer_data" | Directory where the vectorization model's resources are downloaded. |
| per_config | dict | True | rara_linker.config.PER_CONFIG | Configuration of PersonLinker's fields. |
| org_config | dict | True | rara_linker.config.ORG_CONFIG | Configuration of OrganizationLinker's fields |
| loc_config | dict | True | rara_linker.config.LOC_CONFIG | Configuration of LocationLinker's fields |
| ems_config | dict | True | rara_linker.config.EMS_CONFIG | Configuration of EMSLinker's fields |
| title_config | dict | True | rara_linker.config.TITLE_CONFIG | Configuration of TitleLinker's fields |
NB! per_config, org_config, loc_config, title_config and ems_config all come with pre-configured values, but any of them can be overridden if necessary. For example, if the same data for linking personal names is uploaded into a new index, it is sufficient to pass only the new index name in the configuration:
linker = Linker(
per_config={"es_index": "my_new_index"}
)
All possible configuration parameters are listed in the table below:
Configuration Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| es_host | str | True | Elasticsearch's URL, e.g. "http://localhost:9200" |
| es_index | str | True | Elasticsearch's index containing the norm data used for linking. |
| search_field | str | True | Field in Elasticsearch's index that is used for linking. NB! The value of the field has to be of type List[str], e.g. ["Contra", "Margus Konnula"]. |
| alt_search_field | str | False | Alternative search field to search_field. It is used as a fallback if no results are found via search_field. The information is kept in a separate field because somewhat different linking parameters might apply (e.g. acronyms for organizations, where fuzziness is restricted to 0). |
| key_field | str | True | Field containing the normalized value of the entity. |
| json_field | str | False | Field containing JSON version of MARC21-I (compatible with Sierra). |
| marc_field | str | False | Field containing MARC21-I |
| identifier_field | str | True | Field containing the identifier value. |
| vector_field | str | False | Field containing vectorized data. |
| viaf_field | str | False | VIAF search field, e.g. local.personalNames |
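As a rough illustration of how such an override might be assembled, the sketch below merges user-supplied values over a set of defaults and reports which required parameters from the table are still missing. The helpers `build_config` and `missing_required`, as well as the default values, are hypothetical and not part of rara-norm-linker; only the parameter names come from the table above.

```python
# Parameter names marked as required in the configuration table.
REQUIRED_KEYS = ("es_host", "es_index", "search_field", "key_field", "identifier_field")


def build_config(defaults: dict, overrides: dict) -> dict:
    """Return a new config in which overrides take precedence over defaults."""
    return {**defaults, **overrides}


def missing_required(config: dict) -> list:
    """List required parameters that are absent or empty in the config."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Placeholder defaults for illustration only (not the library's real values).
example_defaults = {
    "es_host": "http://localhost:9200",
    "es_index": "persons_v1",
    "search_field": "link_variations",
    "key_field": "name",
    "identifier_field": "identifier",
}

per_config = build_config(example_defaults, {"es_index": "my_new_index"})
print(per_config["es_index"])        # my_new_index
print(missing_required(per_config))  # []
```

Whether the library itself merges partial configurations in this way is not confirmed here; the sketch only shows one plausible way to reason about required versus optional keys before passing a dictionary to Linker.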
Key Functions
Function: link
Parameters
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| entity | str | True | - | Entity to link. |
| entity_type | str | False | - | Type of the entity. Specifying it will make the linking process a little bit faster and possibly more accurate. Allowed options are: ["PER", "ORG", "LOC", "EMS_KEYWORD"] |
| fuzziness | int | False | 2 | Maximum edit distance (Levenshtein distance) used for linking. NB! Allowed values are: 0, 1, 2. |
| prefix_length | int | False | 1 | Number of prefix symbols that need to match exactly. |
| context | str | False | None | Some contextual information about the entity. This will be vectorized and compared to the vectors stored in a corresponding Elasticsearch index to select the likeliest match in case multiple entities with the same similarity score are returned. |
| query_vector | List[float] | False | [] | Vector that will be compared to the vectors stored in a corresponding Elasticsearch index to select the likeliest match in case multiple entities with the same similarity score are returned. NB! If this is passed, param context will not be used, even if it is not empty / None. |
| min_similarity | float | False | 0.9 | Minimum required Jaro-Winkler similarity. Matches that do not reach it will NOT be returned, even if they pass the fuzziness threshold. |
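To give a feel for what min_similarity measures, here is a minimal, self-contained sketch of the Jaro-Winkler score in its higher-is-better form. It is a textbook formulation written for illustration (the function names `jaro` and `jaro_winkler` are ours), not the code rara-norm-linker runs internally, and comparing lowercased strings is an assumption.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only if they are within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to max_prefix characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


score = jaro_winkler("lennar mere", "lennart meri")
print(round(score, 4))  # 0.9485
print(score >= 0.9)     # True: this pair would pass the default min_similarity
```

With these inputs the sketch yields roughly 0.948, so a misspelled query like this one would clear the default threshold of 0.9, while a lower-scoring candidate would be dropped even if it were within the allowed edit distance.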
Result
link will return a result of type LinkingResult.
LinkingResult Class
Attributes:
| Name | Type | Description |
|---|---|---|
| original_entity | str | The original entity passed to the function. |
| entity_type | str | Type of the linked entity. One of the following types: ["PER", "ORG", "LOC", "EMS_KEYWORD", "UNKNOWN"] |
| linked_info | List[LinkedDoc] | List of the linked entities. See LinkedDoc for more specific information. |
| linking_config | dict | The configuration used for linking. |
| n_linked | int | Number of linked entities. |
| similarity_score | float | Similarity score of the linked entity/entities. If multiple entities are returned, they all have the same similarity score as lower ones are always filtered out. |
Functions:
to_dict() - Converts all information stored in the class into a dictionary.
LinkedDoc Class
Attributes
| Name | Type | Description |
|---|---|---|
| viaf | dict | Information about the entity retrieved from VIAF. Contains two keys: "parsed" (a human-readable, parsed version of the VIAF response) and "original" (the original VIAF record). |
| json | dict | All information about the entity in JSON format (converted directly from MARC21-I). |
| marc | str | All information about the entity in MARC21-I format. |
| linked_entity | str | Normalized entity. |
| elastic | dict | All information about the entity stored in Elasticsearch's index. |
| similarity_score | float | Similarity score of the linked entity. |
Functions:
to_dict() - Converts all information stored in the class into a dictionary.
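The following sketch shows one way to condense a to_dict() result into a short summary. It assumes the dictionary is keyed by the attribute names listed in the tables above; the `summarize` helper and the hand-written `sample` are hypothetical and exist only for illustration.

```python
def summarize(result: dict) -> str:
    """Build a short summary from a to_dict()-style dictionary."""
    lines = [
        f"{result['original_entity']} [{result['entity_type']}]: "
        f"{result['n_linked']} match(es), similarity {result['similarity_score']:.3f}"
    ]
    # Each linked document is expected to carry its normalized form.
    for doc in result.get("linked_info", []):
        lines.append(f"  -> {doc['linked_entity']}")
    return "\n".join(lines)


# Hand-written sample mimicking the documented structure (values are illustrative).
sample = {
    "original_entity": "Paul Keres",
    "entity_type": "PER",
    "n_linked": 2,
    "similarity_score": 1.0,
    "linked_info": [
        {"linked_entity": "Keres, Paul", "similarity_score": 1.0},
        {"linked_entity": "Keres, Paul", "similarity_score": 1.0},
    ],
}
print(summarize(sample))
```

In real use, the dictionary would come from calling to_dict() on the object returned by link, and each entry in linked_info would also carry the larger viaf, json, marc and elastic fields.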
Function: link_keywords
Parameters
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| keywords | List[dict] | True | - | Keywords to link. The keywords should have exactly the same format as rara-subject-indexer apply_indexer output with param flat=True. |
| use_viaf | bool | False | True | If enabled, VIAF queries are used for linking / enriching the output. NB! Overwrites Linker class instance param add_viaf_info. |
| main_taxonomy_lang | str | False | "et" | Main language of the taxonomies indexed into Elasticsearch (e.g. EMS etc). Expects ISO 639-1 compliant language code. |
| fuzziness | int | False | 2 | Maximum edit distance (Levenshtein distance) used for linking. NB! Allowed values are: 0, 1, 2. |
| prefix_length | int | False | 1 | Number of prefix symbols that need to match exactly. |
| context | str | False | None | Some contextual information about the entity. This will be vectorized and compared to the vectors stored in a corresponding Elasticsearch index to select the likeliest match in case multiple entities with the same similarity score are returned. |
| query_vector | List[float] | False | [] | Vector that will be compared to the vectors stored in a corresponding Elasticsearch index to select the likeliest match in case multiple entities with the same similarity score are returned. NB! If this is passed, param context will not be used, even if it is not empty / None. |
| min_similarity | float | False | 0.9 | Minimum required Jaro-Winkler similarity. Matches that do not reach it will NOT be returned, even if they pass the fuzziness threshold. |
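The interplay of fuzziness and prefix_length can be pictured with the small sketch below. It uses plain Levenshtein distance over whole strings, as the table describes; the actual matching is delegated to Elasticsearch, which may apply the limits per term and may count a transposition as a single edit, so treat this as a simplified model. The helper name `is_fuzzy_candidate` is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def is_fuzzy_candidate(query: str, candidate: str,
                       fuzziness: int = 2, prefix_length: int = 1) -> bool:
    """True if the prefix matches exactly and the edit distance is within fuzziness."""
    if query[:prefix_length] != candidate[:prefix_length]:
        return False
    return levenshtein(query, candidate) <= fuzziness


print(levenshtein("lennar mere", "lennart meri"))         # 2
print(is_fuzzy_candidate("lennar mere", "lennart meri"))  # True
print(is_fuzzy_candidate("kennart meri", "lennart meri")) # False: first character differs
```

Raising prefix_length narrows the candidate set, because more leading characters must match exactly, while lowering fuzziness to 0 requires an exact match.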
Result
link_keywords will return a result of type List[dict].
Usage Examples
NB! The information stored in field "viaf" has slightly changed since the examples were generated!
Example 1: Simple Usage
from rara_linker.linkers.linker import Linker
from pprint import pprint
import logging
logging.disable(logging.CRITICAL)
linker = Linker(add_viaf_info=True, vectorizer_or_dir_path="./vectorizer_data")
entity = "Lennar Mere"
linked = linker.link(entity)
pprint(linked.to_dict())
Raw Output
{'entity_type': 'PER',
'linked_info': [{'elastic': {'birth_year': 1929,
'death_year': 2006,
'description': 'Eesti riigitegelane, aja- ja '
'kultuuriloolane ning esseist. '
'A-tel 1992-2001 EV president. '
'Tõlkija ja diplomaadi Georg Meri '
'(1900-1983) poeg, Hindrek Meri '
'vend',
'identifier': 'a11133193',
'identifier_source': 'ErRR',
'life_year': '1929-2006',
'link_variations': ['meri, lennart',
'meri, lennart-georg',
'lennart meri',
'meri, lennarts',
'lennart-georg meri',
'леннарт-георг мери',
'мери, леннарт-георг',
'мери, леннарт',
'леннарт мери',
'lennarts meri'],
'name': 'Meri, Lennart',
'name_in_cyrillic': False,
'name_specification': '',
'name_variations': ['Meri, Lennart-Georg',
'Meri, Lennarts',
'Мери, Леннарт',
'Мери, Леннарт-Георг'],
'source': 'Eesti kirjanike leksikon, 2000 ja EE, '
'14. kd., 2000'},
'json': {'fields': [{'001': 'a11133193'},
{'003': 'ErRR'},
{'008': '990104|n|adnnnaabn || '
'|a| '},
{'040': {'ind1': ' ',
'ind2': ' ',
'subfields': [{'a': 'ErRR'},
{'b': 'est'},
{'c': 'ErRR'},
{'d': 'ErTrtKR'}]}},
{'043': {'ind1': ' ',
'ind2': ' ',
'subfields': [{'c': 'ee'}]}},
{'100': {'ind1': '1',
'ind2': ' ',
'subfields': [{'a': 'Meri, '
'Lennart,'},
{'d': '1929-2006.'}]}},
{'400': {'ind1': '1',
'ind2': ' ',
'subfields': [{'a': 'Meri, '
'Lennart-Georg.'}]}},
{'400': {'ind1': '1',
'ind2': ' ',
'subfields': [{'a': 'Meri, '
'Lennarts.'}]}},
{'400': {'ind1': '1',
'ind2': ' ',
'subfields': [{'a': 'Мери, '
'Леннарт,'},
{'d': '1929-2006.'}]}},
{'400': {'ind1': '1',
'ind2': ' ',
'subfields': [{'a': 'Мери, '
'Леннарт-Георг,'},
{'d': '1929-2006.'}]}},
{'670': {'ind1': ' ',
'ind2': ' ',
'subfields': [{'a': 'Eesti '
'kirjanike '
'leksikon, '
'2000 ja '
'EE, 14. '
'kd., '
'2000.'}]}},
{'680': {'ind1': ' ',
'ind2': ' ',
'subfields': [{'i': 'Eesti '
'riigitegelane, '
'aja- ja '
'kultuuriloolane '
'ning '
'esseist. '
'A-tel '
'1992-2001 '
'EV '
'president. '
'Tõlkija ja '
'diplomaadi '
'Georg Meri '
'(1900-1983) '
'poeg, '
'Hindrek '
'Meri '
'vend.'}]}}],
'leader': '00654nz a2200169n 4500'},
'linked_entity': 'Meri, Lennart',
'marc': '=LDR 00654nz a2200169n 4500\n'
'=001 a11133193\n'
'=003 ErRR\n'
'=008 '
'990104|n|adnnnaabn\\\\\\\\\\\\\\\\\\\\||\\|a|\\\\\\\\\\\\\n'
'=040 \\\\$aErRR$best$cErRR$dErTrtKR\n'
'=043 \\\\$cee\n'
'=100 1\\$aMeri, Lennart,$d1929-2006.\n'
'=400 1\\$aMeri, Lennart-Georg.\n'
'=400 1\\$aMeri, Lennarts.\n'
'=400 1\\$aМери, Леннарт,$d1929-2006.\n'
'=400 1\\$aМери, Леннарт-Георг,$d1929-2006.\n'
'=670 \\\\$aEesti kirjanike leksikon, 2000 ja EE, '
'14. kd., 2000.\n'
'=680 \\\\$iEesti riigitegelane, aja- ja '
'kultuuriloolane ning esseist. A-tel 1992-2001 EV '
'president. Tõlkija ja diplomaadi Georg Meri '
'(1900-1983) poeg, Hindrek Meri vend.\n',
'similarity_score': 0.9484848484848484,
'viaf': {'message': '/api/search Successfully reached!',
'queryResult': {'echoedSearchRetrieveRequest': {'maximumRecords': {'type': 'xsd:nonNegativeInteger',
'value': 50},
'query': {'type': 'xsd:string',
'value': 'local.personalNames '
'all '
'"a11133193"'},
'recordPacking': {'type': 'xsd:string',
'value': 'xml'},
'recordSchema': {'type': 'xsd:string',
'value': 'BriefVIAF'},
'sortKeys': {'type': 'xsd:string',
'value': 'holdingscount'},
'startRecord': {'type': 'xsd:positiveInteger',
'value': 1},
'type': 'ns2:echoedSearchRetrieveRequestType',
'version': {'type': 'xsd:string',
'value': 1.1},
'xQuery': {'searchClause': {'index': {'type': 'xsd:string',
'value': 'local.personalNames'},
'relation': {'type': 'ns3:relationType',
'value': [{'value': 'all'}]},
'term': {'type': 'xsd:string',
'value': 'a11133193'},
'type': 'ns3:searchClauseType'}}},
'extraResponseData': {'extraData': {'databaseTitle': 'VIAF: '
'The '
'Virtual '
'International '
'Authority '
'File'},
'type': 'ns4:extraDataType'},
'numberOfRecords': {'type': 'xsd:nonNegativeInteger',
'value': 1},
'records': {'record': [{'recordData': {'VIAFCluster': {'mainHeadings': {'data': [{'sources': {'s': ['DNB',
'NKC',
'LIH',
'ERRR',
'NUKAT',
'SUDOC',
'LNB',
'PLWABN',
'BNF',
'NTA',
'SZ',
'SELIBR'],
'sid': ['DNB|11930063X',
'NKC|jo2003190730',
'LIH|LNB:_b_D_o_;=B_l_',
'ERRR|a11133193',
'NUKAT|n '
'2004098863',
'SUDOC|031225608',
'LNB|LNC10-000015183',
'PLWABN|9810611048005606',
'BNF|12248233',
'NTA|070005389',
'SZ|11930063X',
'SELIBR|tb4zmtkqrlq5bsk2']},
'text': 'Meri, '
'Lennart, '
'1929-2006.'},
{'sources': {'s': ['BIBSYS',
'LC',
'RERO',
'NYNYRILM',
'NLA',
'NII'],
'sid': ['BIBSYS|90264114',
'LC|n '
'88622494',
'RERO|A005679822',
'NYNYRILM|389279',
'NLA|000049286517',
'NII|DA19756873']},
'text': 'Meri, '
'Lennart'},
{'sources': {'s': ['WKP'],
'sid': ['WKP|Q153149']},
'text': 'Lennart-Georg '
'Meri'},
{'sources': {'s': ['ISNI'],
'sid': ['ISNI|0000000078320195']},
'text': 'Lennart '
'Meri'}]},
'nameType': 'Personal',
'titles': {'work': [{'sources': {'s': ['BNF'],
'sid': ['BNF|12248233']},
'title': '1940-1988 '
'(§ '
'58)'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Ahvide '
'planeet'},
{'sources': {'s': ['DNB'],
'sid': ['DNB|11930063X']},
'title': 'Baltikum '
'- '
'Prüfstein '
'für '
'die '
'Union '
'Europas'},
{'sources': {'s': ['DNB',
'LC'],
'sid': ['DNB|11930063X',
'LC|n '
'88622494']},
'title': 'Botschaften '
'und '
'Zukunftsvisionen '
'Reden '
'des '
'estnischen '
'Präsidenten'},
{'sources': {'s': ['SUDOC'],
'sid': ['SUDOC|031225608']},
'title': 'Dans '
'le '
'silence '
'des '
'glaces'},
{'sources': {'s': ['SUDOC'],
'sid': ['SUDOC|031225608']},
'title': 'Deportation '
'from '
'Estonia '
'to '
'Russia '
': '
'deportation '
'in '
'March '
'1949'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Eesti '
'identiteet '
'ja '
'iseseisvus'},
{'sources': {'s': ['NLA',
'LC'],
'sid': ['NLA|000049286517',
'LC|n '
'88622494']},
'title': 'Eesti '
'kirjanduse '
'biog. '
'leks., '
'1975:'},
{'sources': {'s': ['LC'],
'sid': ['LC|n '
'88622494']},
'title': 'Eesti '
'maailmas '
'21. '
'sajandi '
'künnisel '
': '
'Eesti '
'Vabariigi '
'presidendi '
'Lennart '
'Meri '
'70. '
'sünnipäevale '
'pühendatud '
'konverentsi '
'kogumik '
': '
'Tartus, '
'27. '
'märtsil '
'1999.'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Eesti '
'rahva '
'elulood.'},
{'sources': {'s': ['DNB',
'SUDOC'],
'sid': ['DNB|11930063X',
'SUDOC|031225608']},
'title': 'Es '
'zog '
'uns '
'nach '
'Kamtschatka'},
{'sources': {'s': ['ERRR',
'LC'],
'sid': ['ERRR|a11133193',
'LC|n '
'88622494']},
'title': 'Freedom '
'through '
'democracy, '
'security '
'and '
'unity '
'in '
'diversity '
': '
'memorable '
'words '
'of '
'Lennart '
'Meri, '
'President '
'of '
'the '
'Republic '
'of '
'Estonia, '
'from '
'his '
'speeches '
'1992-2001'},
{'sources': {'s': ['PLWABN',
'NUKAT'],
'sid': ['PLWABN|9810611048005606',
'NUKAT|n '
'2004098863']},
'title': 'Gorące '
'wodospady '
'/ '
'Lennart '
'Meri. '
'- '
'Warszawa, '
'1971.'},
{'id': 'VIAF|5551155832947733490005',
'sources': {'s': ['XR',
'ERRR',
'BIBSYS',
'LC',
'SUDOC',
'LNB',
'WKP',
'BNF',
'NTA',
'NII'],
'sid': ['XR|VIAFWORKLCno2019067182',
'ERRR|a11133193',
'BIBSYS|90264114',
'LC|n '
'88622494',
'SUDOC|031225608',
'LNB|LNC10-000015183',
'WKP|Q153149',
'BNF|12248233',
'NTA|070005389',
'NII|DA19756873']},
'title': 'Hõbevalge.'},
{'sources': {'s': ['SUDOC',
'ERRR'],
'sid': ['SUDOC|031225608',
'ERRR|a11133193']},
'title': 'Hopeanvalkea '
': '
'matka '
'menneeseen '
'oppaina '
'aurinko, '
'fantasia '
'ja '
'folklore'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Ieškant '
'prarastos '
'šypsenos '
': '
'dienoraštis '
'apie '
'kelionę '
'į '
'160 '
'meridianą'},
{'id': 'VIAF|309479565',
'sources': {'s': ['XR'],
'sid': ['XR|VIAFWORKLCn '
'00025583']},
'title': 'Ilmamaa.'},
{'sources': {'s': ['ERRR',
'LNB',
'BNF',
'LC'],
'sid': ['ERRR|a11133193',
'LNB|LNC10-000015183',
'BNF|12248233',
'LC|n '
'88622494']},
'title': 'Kaks '
'ajalugu, '
'seljad '
'vastamisi '
': '
'aulaloeng, '
'14. '
'mail '
'1996'},
{'sources': {'s': ['SUDOC',
'ERRR'],
'sid': ['SUDOC|031225608',
'ERRR|a11133193']},
'title': 'Kamtšatka '
': '
'tulivuorten '
'maa'},
{'sources': {'s': ['ERRR',
'LNB'],
'sid': ['ERRR|a11133193',
'LNB|LNC10-000015183']},
'title': 'Kāvu '
'vārtos'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Keskpäevane '
'praam'},
{'sources': {'s': ['SUDOC',
'ERRR'],
'sid': ['SUDOC|031225608',
'ERRR|a11133193']},
'title': 'Kobrade '
'ja '
'karakurtide '
'jälgedes '
': '
'Kesk-Aasia '
'matkamärkmeid'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Kobrák '
'és '
'karakurtok '
'nyomában.'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Kommunismi '
'must '
'raamat '
': '
'kuriteod, '
'terror, '
'repressioonid'},
{'sources': {'s': ['SUDOC',
'ERRR',
'BNF'],
'sid': ['SUDOC|031225608',
'ERRR|a11133193',
'BNF|12248233']},
'title': 'Küüditamine '
'Eestist '
'Venemaale '
': '
'märtsiküüditamine '
'1949.'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Laevapoisid '
'rohelisel '
'ookeanil'},
{'sources': {'s': ['ERRR',
'NLA',
'LC',
'LNB'],
'sid': ['ERRR|a11133193',
'NLA|000049286517',
'LC|n '
'88622494',
'LNB|LNC10-000015183']},
'title': 'Lähenevad '
'rannad '
': '
'reisid '
'130. '
'ja '
'160. '
'meridiaani '
'vahel'},
{'sources': {'s': ['DNB'],
'sid': ['DNB|11930063X']},
'title': 'Lennart '
'Meri '
'1929 '
'- '
'2006 '
'; '
'spezial'},
{'sources': {'s': ['SUDOC',
'ERRR',
'RERO'],
'sid': ['SUDOC|031225608',
'ERRR|a11133193',
'RERO|A005679822']},
'title': 'Lennart '
'Meri, '
'ein '
'Leben '
'für '
'Estland '
': '
'Dialog '
'mit '
'dem '
'Präsidenten'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Le '
'livre '
'noir '
'du '
'communisme.'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Look '
'back '
'in '
'anger '
': '
'a '
'play '
'in '
'three '
'acts.'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Maailm '
'ja '
'meie'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Mees, '
'kes '
'käis '
'läbi '
'seina '
': '
'[novellid, '
'näidend]'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Meie '
'mees '
'Havannas'},
{'sources': {'s': ['NKC'],
'sid': ['NKC|jo2003190730']},
'title': 'Most '
'v '
'beloje '
'bezmolvije'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Our '
'man '
'in '
'Havanna.'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|12248233']},
'title': 'peinture '
'estonienne '
'au '
'XXème '
'siècle'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Piirideta '
'maailm '
': '
'valmistumine '
'21. '
'sajandi '
'kapitalismiks'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Piirilinn '
'Mõisaküla.'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'La '
'planète '
'des '
'singes.'},
{'sources': {'s': ['NKC',
'ERRR'],
'sid': ['NKC|jo2003190730',
'ERRR|a11133193']},
'title': 'Pod '
'klenbou '
'polární '
'záře'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Põhiseadus '
'ja '
'Põhiseaduse '
'Assamblee '
': '
'koguteos'},
{'id': 'VIAF|9345163464740205680007',
'sources': {'s': ['ERRR',
'SUDOC',
'LC',
'XR',
'NTA',
'LNB'],
'sid': ['ERRR|a11133193',
'SUDOC|031225608',
'LC|n '
'88622494',
'XR|VIAFWORK1269735045',
'NTA|070005389',
'LNB|LNC10-000015183']},
'title': 'Poliitiline '
'testament'},
{'sources': {'s': ['SUDOC',
'ERRR',
'BNF'],
'sid': ['SUDOC|031225608',
'ERRR|a11133193',
'BNF|12248233']},
'title': 'Poliitilised '
'arreteerimised '
'Eestis.'},
{'sources': {'s': ['SUDOC',
'ERRR',
'BNF'],
'sid': ['SUDOC|031225608',
'ERRR|a11133193',
'BNF|12248233']},
'title': 'Political '
'arrests '
'in '
'Estonia'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Population '
'ageing '
'in '
'Estonia.'},
{'id': 'VIAF|316341491',
'sources': {'s': ['ERRR',
'BNF',
'LC',
'SUDOC',
'XR',
'NTA',
'LNB',
'WKP'],
'sid': ['ERRR|a11133193',
'BNF|12248233',
'LC|n '
'88622494',
'SUDOC|031225608',
'XR|VIAFWORK243846008',
'NTA|070005389',
'LNB|LNC10-000015183',
'WKP|Q153149']},
'title': 'Presidendikôned'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Raudne '
'kodu '
': '
'Evald '
'Tammlaane '
'draama '
'lauludega '
'3-es '
'vaatuses '
'proloogi '
'ja '
'epiloogiga'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|12248233']},
'title': 'Reisikiri '
'suurest '
'paugust, '
'tuulest '
'ja '
'muinasluulest'},
{'sources': {'s': ['SUDOC',
'ERRR'],
'sid': ['SUDOC|031225608',
'ERRR|a11133193']},
'title': 'Revontulten '
'porteilla'},
{'sources': {'s': ['LNB'],
'sid': ['LNB|LNC10-000015183']},
'title': 'Rīgas '
'Balss '
'15.marts '
'(Nr.53), '
'2006:'},
{'sources': {'s': ['ERRR',
'NTA'],
'sid': ['ERRR|a11133193',
'NTA|070005389']},
'title': 'Riigimured'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Sajandi '
'sada '
'elulugu '
'kahes '
'osas.'},
{'sources': {'s': ['ERRR',
'NTA'],
'sid': ['ERRR|a11133193',
'NTA|070005389']},
'title': 'Šamaan'},
{'sources': {'s': ['ERRR',
'NTA'],
'sid': ['ERRR|a11133193',
'NTA|070005389']},
'title': 'Soome-ugri '
'rahvaste '
'filmientsüklopeedia '
': '
'viis '
'dokumentaalfilm, '
'1970-1997 '
'= '
'Encyclopaedia '
'cinematographica '
'gentium '
'Fenno-Ugricarum '
': '
'five '
'documentaries, '
'1970-1997'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Soome-ugri '
'rahvaste '
'VI '
'folkloorifestival '
'Eestis '
'17.-21. '
'VII '
'1997 '
'= '
'VI '
'финно-угорский '
'фольклорный '
'фестиваль '
'в '
'Эстонии'},
{'id': 'VIAF|6246156497273917740008',
'sources': {'s': ['LC'],
'sid': ['LC|n '
'2019043521']},
'title': 'Speeches. '
'Selections'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Tallinna '
'saladused'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Teise '
'mehe '
'pea '
': '
'näidend '
'4 '
'vaatuses'},
{'sources': {'s': ['ERRR',
'NTA'],
'sid': ['ERRR|a11133193',
'NTA|070005389']},
'title': 'Toorumi '
'pojad '
'Hantide '
'karupeied '
': '
'[dokumentaalfilm '
'hantide '
'rahvakommetest]'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Tõotan '
'ustavaks '
'jääda... '
': '
'Eesti '
'Vabariigi '
'Valitsus '
'1940-1992'},
{'sources': {'s': ['LC'],
'sid': ['LC|n '
'88622494']},
'title': 'Tri '
'baĭdarki '
'v '
'zelenom '
'okeane.'},
{'id': 'VIAF|309516186',
'sources': {'s': ['ERRR',
'PLWABN',
'LNB',
'XR'],
'sid': ['ERRR|a11133193',
'PLWABN|9810611048005606',
'LNB|LNC10-000015183',
'XR|VIAFWORKLCn '
'2006019349']},
'title': 'Tulemägede '
'maale.'},
{'id': 'VIAF|307144592',
'sources': {'s': ['XR'],
'sid': ['XR|VIAFWORK77318117']},
'title': 'Tulemägede '
'maale; '
'reisipäevik '
'160. '
'meridiaanilt'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Tulen '
'maasta, '
'jonka '
'nimi '
'on '
'Viro'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Tuli '
'ei '
'kustu'},
{'sources': {'s': ['NKC'],
'sid': ['NKC|jo2003190730']},
'title': "Udivitel'nyj "
'čelovek '
': '
'Kniga '
'putešestvij'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Üks '
'päev '
'Ivan '
'Denissovitši '
'elus '
': '
'[jutustus]'},
{'sources': {'s': ['NKC',
'NUKAT'],
'sid': ['NKC|jo2003190730',
'NUKAT|n '
'2004098863']},
'title': 'V '
'poiskah '
'poterânnoj '
'ulybki'},
{'sources': {'s': ['NTA'],
'sid': ['NTA|070005389']},
'title': 'Veelinnurahvas'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Viimne '
'reliikvia'},
{'id': 'VIAF|307013366',
'sources': {'s': ['SUDOC',
'ERRR',
'XR',
'BIBSYS',
'LNB'],
'sid': ['SUDOC|031225608',
'ERRR|a11133193',
'XR|VIAFWORK249662780',
'BIBSYS|90264114',
'LNB|LNC10-000015183']},
'title': 'Virmaliste '
'väraval'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Virolais-suomalaiset '
'laulu- '
'ja '
'soittojuhlat '
'Tallinnassa'},
{'sources': {'s': ['PLWABN'],
'sid': ['PLWABN|9810611048005606']},
'title': 'Z '
'góry '
'więcej '
'widać'},
{'sources': {'s': ['ERRR',
'LNB'],
'sid': ['ERRR|a11133193',
'LNB|LNC10-000015183']},
'title': 'В '
'поисках '
'потерянной '
'улыбки '
': '
'(дневник '
'путешествия '
'к '
'160-му '
'меридиану)'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Исследование '
'природы '
'Дальнего '
'Востока'},
{'sources': {'s': ['ERRR',
'LNB'],
'sid': ['ERRR|a11133193',
'LNB|LNC10-000015183']},
'title': 'Мост '
'в '
'белое '
'безмолвие'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Один '
'день '
'Ивана '
'Денисовича.'},
{'sources': {'s': ['LNB'],
'sid': ['LNB|LNC10-000015183']},
'title': 'Республика '
'(Вильнюс), '
'(28 '
'июня '
'1999):'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Самоопределение '
'и '
'независимость '
'Эстонии'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Серебристо-белый '
'путь '
'Леннарта '
'Мери '
'путевые '
'заметки. '
'Выступления '
'и '
'интервью. '
'Леннарт '
'Мери '
'глазами '
'друзей'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a11133193']},
'title': 'Удивительный '
'человек '
': '
'книга '
'путешествий '
': '
'произведения '
'Вольдемара '
'Пансо '
'и '
'Леннарта '
'Мери '
'в '
'переводе '
'с '
'эстонского '
'В. '
'Рубер'}]},
'viafID': 84153775},
'type': 'ns1:stringOrXmlFragment'},
'recordPacking': {'type': 'xsd:string',
'value': 'xml'},
'recordPosition': {'type': 'xsd:positiveInteger',
'value': 1},
'recordSchema': {'type': 'xsd:string',
'value': 'http://viaf.org/BriefVIAFCluster'},
'type': 'ns1:recordType'}],
'type': 'ns1:recordsType'},
'resultSetIdleTime': {'type': 'xsd:positiveInteger',
'value': 1},
'schemaLocation': 'http://www.loc.gov/zing/srw/ '
'http://www.loc.gov/standards/sru/sru1-1archive/xml-files/srw-types.xsd',
'version': {'type': 'xsd:string',
'value': 1.1}}}}],
'linking_config': {'add_viaf_info': True,
'context': None,
'entity': 'Lennar Mere',
'fuzziness': 2,
'min_similarity': 0.9,
'prefix_length': 1},
'n_linked': 1,
'original_entity': 'Lennar Mere',
'similarity_score': 0.9484848484848484}
Example 2: Linker returns multiple matches
from rara_linker.linkers.linker import Linker
from pprint import pprint
import logging
logging.disable(logging.CRITICAL)
linker = Linker(add_viaf_info=True, vectorizer_or_dir_path="./vectorizer_data")
entity = "Paul Keres"
linked = linker.link(entity)
# Output formatting:
print(f"Original entity: {linked.original_entity}")
print(f"Entity type: {linked.entity_type}")
print(f"Number of matches: {linked.n_linked}")
print(f"Similarity: {linked.similarity_score}")
for entity_info in linked.linked_info:
    print()
    print(f"Linked entity: {entity_info.linked_entity}")
    print(f"Description: {entity_info.elastic['description']}")
    print()
Output:
Original entity: Paul Keres
Entity type: PER
Number of matches: 2
Similarity: 1.0
Linked entity: Keres, Paul
Description: Eesti maletaja ja maleteoreetik
Linked entity: Keres, Paul
Description: Eesti advokaat
Example 3: Add some context for vector-based filtering
from rara_linker.linkers.linker import Linker
from pprint import pprint
import logging
logging.disable(logging.CRITICAL)
linker = Linker(add_viaf_info=True, vectorizer_or_dir_path="./vectorizer_data")
entity = "Paul Keres"
context = "Viimsis selgusid 53. maleturniiri medalistid."
linked = linker.link(entity, context=context)
# Output formatting:
print(f"Original entity: {linked.original_entity}")
print(f"Entity type: {linked.entity_type}")
print(f"Number of matches: {linked.n_linked}")
print(f"Similarity: {linked.similarity_score}")
for entity_info in linked.linked_info:
    print()
    print(f"Linked entity: {entity_info.linked_entity}")
    print(f"Description: {entity_info.elastic['description']}")
    print()
Output:
Original entity: Paul Keres
Entity type: PER
Number of matches: 1
Similarity: 1.0
Linked entity: Keres, Paul
Description: Eesti maletaja ja maleteoreetik
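The tie-breaking seen in Examples 2 and 3 can be sketched as follows: when several candidates share the same similarity score, the one whose stored vector lies closest to the context vector is kept. Cosine similarity, the toy three-dimensional vectors and the `pick_closest` helper are assumptions made for illustration; the metric and embeddings actually used by the library and Elasticsearch may differ.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pick_closest(query_vector, candidates):
    """Return the candidate whose vector is most similar to the query vector."""
    return max(candidates, key=lambda c: cosine_similarity(query_vector, c["vector"]))


# Toy vectors; real embeddings would come from a vectorization model.
candidates = [
    {"description": "Eesti maletaja ja maleteoreetik", "vector": [0.9, 0.1, 0.0]},
    {"description": "Eesti advokaat", "vector": [0.1, 0.9, 0.2]},
]
context_vector = [0.8, 0.2, 0.1]  # stands in for the vectorized context sentence

best = pick_closest(context_vector, candidates)
print(best["description"])  # Eesti maletaja ja maleteoreetik
```

Passing query_vector directly to link corresponds to supplying `context_vector` yourself, whereas passing context leaves the vectorization to the linker.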
Example 4: Link a geographical name
from rara_linker.linkers.linker import Linker
from pprint import pprint
import logging
logging.disable(logging.CRITICAL)
linker = Linker(add_viaf_info=True, vectorizer_or_dir_path="./vectorizer_data")
entity = "Reval"
linked = linker.link(entity)
# Output formatting:
print(f"Original entity: {linked.original_entity}")
print(f"Entity type: {linked.entity_type}")
print(f"Number of matches: {linked.n_linked}")
print(f"Similarity: {linked.similarity_score}")
for entity_info in linked.linked_info:
    print()
    print(f"Linked entity: {entity_info.linked_entity}")
    print()
Output:
Original entity: Reval
Entity type: LOC
Number of matches: 1
Similarity: 1.0
Linked entity: Tallinn
Example 5: Link an organization
from rara_linker.linkers.linker import Linker
from pprint import pprint
import logging
logging.disable(logging.CRITICAL)
linker = Linker(add_viaf_info=True, vectorizer_or_dir_path="./vectorizer_data")
entity = "NASA"
linked = linker.link(entity)
# Output formatting:
print(f"Original entity: {linked.original_entity}")
print(f"Entity type: {linked.entity_type}")
print(f"Number of matches: {linked.n_linked}")
print(f"Similarity: {linked.similarity_score}")
for entity_info in linked.linked_info:
    print()
    print(f"Linked entity: {entity_info.linked_entity}")
    print(f"Description: {entity_info.elastic['description']}")
    print()
Output:
Original entity: NASA
Entity type: ORG
Number of matches: 1
Similarity: 1.0
Linked entity: United States National Aeronautics and Space Administration
Description: USA riiklik kosmose uurimise ja kosmonautika arendamise organisatsioon. Asutatud 1958. a., keskus Washingtonis
Example 6: Link a keyword
from rara_linker.linkers.linker import Linker
from pprint import pprint
import logging
logging.disable(logging.CRITICAL)
linker = Linker(add_viaf_info=True, vectorizer_or_dir_path="./vectorizer_data")
entity = "cinema"
linked = linker.link(entity)
# Output formatting:
print(f"Original entity: {linked.original_entity}")
print(f"Entity type: {linked.entity_type}")
print(f"Number of matches: {linked.n_linked}")
print(f"Similarity: {linked.similarity_score}")
for entity_info in linked.linked_info:
    print()
    print(f"Linked entity: {entity_info.linked_entity}")
print()
Output:
Original entity: cinema
Entity type: EMS_KEYWORD
Number of matches: 1
Similarity: 0.9714285714285714
Linked entity: kinod
Example 7: Link one-word pseudonyms / stage names
from rara_linker.linkers.linker import Linker
from pprint import pprint
import logging
logging.disable(logging.CRITICAL)
linker = Linker(add_viaf_info=True, vectorizer_or_dir_path="./vectorizer_data")
entity = "Shakira"
linked = linker.link(entity)
# Output formatting:
print(f"Original entity: {linked.original_entity}")
print(f"Entity type: {linked.entity_type}")
print(f"Number of matches: {linked.n_linked}")
print(f"Similarity: {linked.similarity_score}")
for entity_info in linked.linked_info:
    print()
    print(f"Linked entity: {entity_info.linked_entity}")
    print(f"Description: {entity_info.elastic['description']}")
print()
Output:
Original entity: Shakira
Entity type: PER
Number of matches: 1
Similarity: 1.0
Linked entity: Shakira
Description: Colombia laulja ja laulukirjutaja. Täisnimi Shakira Isabel Mebarak Ripoll
Example 8: Link a single surname
from rara_linker.linkers.linker import Linker
from pprint import pprint
import logging
logging.disable(logging.CRITICAL)
linker = Linker(add_viaf_info=True, vectorizer_or_dir_path="./vectorizer_data")
# As linking only first names / surnames is not supported,
# no matches should be returned
entity = "Bulgakov"
linked = linker.link(entity)
# Output formatting:
print(f"Original entity: {linked.original_entity}")
print(f"Entity type: {linked.entity_type}")
print(f"Number of matches: {linked.n_linked}")
print(f"Similarity: {linked.similarity_score}")
for entity_info in linked.linked_info:
    print()
    print(f"Linked entity: {entity_info.linked_entity}")
    print(f"Description: {entity_info.elastic['description']}")
print()
Output:
Original entity: Bulgakov
Entity type: UNKNOWN
Number of matches: 0
Similarity: 0
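When nothing is linked, downstream code usually still needs a value to work with. The sketch below falls back to the original string in that case. It assumes that `linked.to_dict()` returns a dictionary with the keys `original_entity`, `n_linked` and `linked_info`, and that `linked_info` is an empty list when there are no matches. The helper `linked_names_or_original` is hypothetical and not part of rara-norm-linker; the sample dictionaries are hand-written so the snippet runs without Elasticsearch.

```python
from typing import Any


def linked_names_or_original(result: dict[str, Any]) -> list[str]:
    """Return the linked entity names, or the original string if nothing matched.

    Hypothetical helper: it only reads keys assumed to be present in the
    output of `linked.to_dict()`.
    """
    if result.get("n_linked", 0) == 0:
        # No taxonomy entry was found, so keep the input as-is.
        return [result["original_entity"]]
    return [info["linked_entity"] for info in result["linked_info"]]


# Hand-written stand-ins for `linked.to_dict()` (assumed shape).
no_match = {
    "original_entity": "Bulgakov",
    "entity_type": "UNKNOWN",
    "n_linked": 0,
    "similarity_score": 0,
    "linked_info": [],
}
one_match = {
    "original_entity": "Shakira",
    "entity_type": "PER",
    "n_linked": 1,
    "similarity_score": 1.0,
    "linked_info": [{"linked_entity": "Shakira"}],
}

print(linked_names_or_original(no_match))   # ['Bulgakov']
print(linked_names_or_original(one_match))  # ['Shakira']
```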
Example 9: Link keywords
from rara_linker.linkers.linker import Linker
from pprint import pprint
import logging
logging.disable(logging.CRITICAL)
linker = Linker(add_viaf_info=True, vectorizer_or_dir_path="./vectorizer_data")
keywords = [
{
"keyword": "demokraatia",
"entity_type": "Teemamärksõnad",
"score": 0.131,
"model_arch": "omikuji"
},
{
"keyword": "sotsiaaluuringud",
"entity_type": "Teemamärksõnad",
"score": 0.113,
"model_arch": "omikuji"
},
{
"keyword": "sotsiaaldemokraat",
"entity_type": "Teemamärksõnad",
"score": 0.235,
"model_arch": "rakun"
},
{
"keyword": "regulatsioon",
"entity_type": "Teemamärksõnad",
"score": 0.156,
"model_arch": "rakun"
},
{
"keyword": "koalitsioon",
"entity_type": "Teemamärksõnad",
"score": 0.126,
"model_arch": "rakun"
},
{
"keyword": "valitsus",
"entity_type": "Teemamärksõnad",
"score": 0.089,
"model_arch": "rakun"
},
{
"keyword": "reformierakondlane",
"entity_type": "Teemamärksõnad",
"score": 0.065,
"model_arch": "rakun"
},
{
"keyword": "20. sajand",
"entity_type": "Ajamärksõnad",
"score": 0.294,
"model_arch": "omikuji"
},
{
"keyword": "e-raamatud",
"entity_type": "Vormimärksõnad",
"score": 0.436,
"model_arch": "omikuji"
},
{
"keyword": "Kristina Kallas",
"entity_type": "Isikunimi",
"score": 1,
"count": 10,
"method": "ner_ensemble",
"model_arch": "ner"
},
{
"keyword": "Tanel Kiik",
"entity_type": "Isikunimi",
"score": 0.6,
"count": 6,
"method": "ner_ensemble",
"model_arch": "ner"
},
{
"keyword": "AJAKIRJANDUS. KOMMUNIKATSIOON. MEEDIA. REKLAAM",
"entity_type": "Valdkonnamärksõnad",
"score": 0.235,
"model_arch": "omikuji"
}
]
linked_keywords = linker.link_keywords(keywords)
pprint(linked_keywords)
Output:
[
{
"count": null,
"dates": "",
"entity_type": "Teemamärksõnad",
"indicator1": " ",
"indicator2": "4",
"is_linked": true,
"keyword": "demokraatia",
"keyword_source": "EMS",
"lang": "et",
"location": "",
"marc_field": 650,
"method": null,
"model_arch": "omikuji",
"numeration": "",
"organisation_sub_unit": "",
"original_keyword": "demokraatia",
"persons_title": "",
"score": 0.131,
"url": "https://ems.elnet.ee/id/EMS001665",
"url_source": "EMS",
"author": ""
},
{
"count": null,
"dates": "",
"entity_type": "Teemamärksõnad",
"indicator1": " ",
"indicator2": "4",
"is_linked": true,
"keyword": "sotsiaaluuringud",
"keyword_source": "EMS",
"lang": "et",
"location": "",
"marc_field": 650,
"method": null,
"model_arch": "omikuji",
"numeration": "",
"organisation_sub_unit": "",
"original_keyword": "sotsiaaluuringud",
"persons_title": "",
"score": 0.113,
"url": "https://ems.elnet.ee/id/EMS006454",
"url_source": "EMS",
"author": ""
},
{
"count": null,
"dates": "",
"entity_type": "Teemamärksõnad",
"indicator1": " ",
"indicator2": "4",
"is_linked": true,
"keyword": "sotsiaaldemokraadid",
"keyword_source": "EMS",
"lang": "et",
"location": "",
"marc_field": 650,
"method": null,
"model_arch": "rakun",
"numeration": "",
"organisation_sub_unit": "",
"original_keyword": "sotsiaaldemokraat",
"persons_title": "",
"score": 0.235,
"url": "https://ems.elnet.ee/id/EMS142984",
"url_source": "EMS",
"author": ""
},
{
"count": null,
"dates": "",
"entity_type": "Teemamärksõnad",
"indicator1": " ",
"indicator2": "4",
"is_linked": true,
"keyword": "regulatsioonid (vormimärksõna)",
"keyword_source": "EMS",
"lang": "et",
"location": "",
"marc_field": 650,
"method": null,
"model_arch": "rakun",
"numeration": "",
"organisation_sub_unit": "",
"original_keyword": "regulatsioon",
"persons_title": "",
"score": 0.156,
"url": "https://ems.elnet.ee/id/EMS171061",
"url_source": "EMS",
"author": ""
},
{
"count": null,
"dates": "",
"entity_type": "Teemamärksõnad",
"indicator1": " ",
"indicator2": "4",
"is_linked": true,
"keyword": "koalitsioonid",
"keyword_source": "EMS",
"lang": "et",
"location": "",
"marc_field": 650,
"method": null,
"model_arch": "rakun",
"numeration": "",
"organisation_sub_unit": "",
"original_keyword": "koalitsioon",
"persons_title": "",
"score": 0.126,
"url": "https://ems.elnet.ee/id/EMS015591",
"url_source": "EMS",
"author": ""
},
{
"count": null,
"dates": "",
"entity_type": "Teemamärksõnad",
"indicator1": " ",
"indicator2": "4",
"is_linked": true,
"keyword": "valitsused",
"keyword_source": "EMS",
"lang": "et",
"location": "",
"marc_field": 650,
"method": null,
"model_arch": "rakun",
"numeration": "",
"organisation_sub_unit": "",
"original_keyword": "valitsus",
"persons_title": "",
"score": 0.089,
"url": "https://ems.elnet.ee/id/EMS002883",
"url_source": "EMS",
"author": ""
},
{
"count": null,
"dates": "",
"entity_type": "Teemamärksõnad",
"indicator1": " ",
"indicator2": "4",
"is_linked": false,
"keyword": "reformierakondlane",
"keyword_source": "AI",
"lang": "",
"location": "",
"marc_field": 693,
"method": null,
"model_arch": "rakun",
"numeration": "",
"organisation_sub_unit": "",
"original_keyword": "reformierakondlane",
"persons_title": "",
"score": 0.065,
"url": "",
"url_source": "",
"author": ""
},
{
"count": null,
"dates": "",
"entity_type": "Ajamärksõnad",
"indicator1": " ",
"indicator2": "4",
"is_linked": true,
"keyword": "20. sajand",
"keyword_source": "EMS",
"lang": "et",
"location": "",
"marc_field": 648,
"method": null,
"model_arch": "omikuji",
"numeration": "",
"organisation_sub_unit": "",
"original_keyword": "20. sajand",
"persons_title": "",
"score": 0.294,
"url": "https://ems.elnet.ee/id/EMS025565",
"url_source": "EMS",
"author": ""
},
{
"count": null,
"dates": "",
"entity_type": "Vormimärksõnad",
"indicator1": " ",
"indicator2": "4",
"is_linked": true,
"keyword": "e-raamatud",
"keyword_source": "EMS",
"lang": "et",
"location": "",
"marc_field": 655,
"method": null,
"model_arch": "omikuji",
"numeration": "",
"organisation_sub_unit": "",
"original_keyword": "e-raamatud",
"persons_title": "",
"score": 0.436,
"url": "https://ems.elnet.ee/id/EMS140805",
"url_source": "EMS",
"author": ""
},
{
"count": 10,
"dates": "1976-01-29-",
"entity_type": "Isikunimi",
"indicator1": "1",
"indicator2": " ",
"is_linked": true,
"keyword": "Kallas, Kristina",
"keyword_source": "SIERRA",
"lang": "et",
"location": "",
"marc_field": 600,
"method": "ner_ensemble",
"model_arch": "ner",
"numeration": "",
"organisation_sub_unit": "",
"original_keyword": "Kristina Kallas",
"persons_title": "",
"score": 1,
"url": "http://viaf.org/viaf/6079149544607000490000/",
"url_source": "VIAF",
"author": ""
},
{
"count": 6,
"dates": "1989-01-23-",
"entity_type": "Isikunimi",
"indicator1": "1",
"indicator2": " ",
"is_linked": true,
"keyword": "Kiik, Tanel",
"keyword_source": "SIERRA",
"lang": "et",
"location": "",
"marc_field": 600,
"method": "ner_ensemble",
"model_arch": "ner",
"numeration": "",
"organisation_sub_unit": "",
"original_keyword": "Tanel Kiik",
"persons_title": "",
"score": 0.6,
"url": "http://viaf.org/viaf/9787159478339927990006/",
"url_source": "VIAF",
"author": ""
}
]
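The records above can be post-processed with plain Python. The following is a minimal sketch that groups keywords by `marc_field` and lists the ones that were not linked. It assumes `link_keywords` returns a list of dictionaries with the keys shown in the output (`keyword`, `original_keyword`, `is_linked`, `marc_field`). The helper names `group_by_marc_field` and `unlinked_keywords` are hypothetical, and the sample list is a hand-trimmed copy of three records from the output.

```python
from collections import defaultdict
from typing import Any


def group_by_marc_field(keywords: list[dict[str, Any]]) -> dict[int, list[str]]:
    """Group the (possibly normalized) keyword strings by their MARC field."""
    grouped: dict[int, list[str]] = defaultdict(list)
    for kw in keywords:
        grouped[kw["marc_field"]].append(kw["keyword"])
    return dict(grouped)


def unlinked_keywords(keywords: list[dict[str, Any]]) -> list[str]:
    """Return the original keywords for which no taxonomy entry was found."""
    return [kw["original_keyword"] for kw in keywords if not kw["is_linked"]]


# Trimmed stand-in for the output of `linker.link_keywords(keywords)`.
sample = [
    {"keyword": "demokraatia", "original_keyword": "demokraatia",
     "is_linked": True, "marc_field": 650},
    {"keyword": "reformierakondlane", "original_keyword": "reformierakondlane",
     "is_linked": False, "marc_field": 693},
    {"keyword": "Kallas, Kristina", "original_keyword": "Kristina Kallas",
     "is_linked": True, "marc_field": 600},
]

print(group_by_marc_field(sample))
print(unlinked_keywords(sample))  # ['reformierakondlane']
```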
🔍 Usage Examples
The following function is used to help format the output:
from rara_linker.linkers.linking_result import LinkingResult
def format_output(linked: LinkingResult) -> None:
    print(f"Original entity: {linked.original_entity}")
    print(f"Entity type: {linked.entity_type}")
    print(f"Number of matches: {linked.n_linked}")
    print(f"Similarity: {linked.similarity_score}")
    for entity_info in linked.linked_info:
        print()
        print(f"Linked entity: {entity_info.linked_entity}")
        # Not every entity type has a description, so fall back to "".
        description = entity_info.elastic.get("description", "")
        if description:
            print(f"Description: {description}")
    print()
Example 1: Simple linking
from rara_linker.linkers.linker import Linker
import logging
logging.disable(logging.CRITICAL)
linker = Linker(add_viaf_info=True, vectorizer_or_dir_path="./vectorizer_data")
entity = "Damon Albarn"
linked = linker.link(entity)
format_output(linked)
Output:
Original entity: Damon Albarn
Entity type: PER
Number of matches: 1
Similarity: 1.0
Linked entity: Albarn, Damon
Description: Inglise muusik ja laulukirjutaja
from pprint import pprint
# Code for displaying the raw output of the same linking result:
pprint(linked.to_dict())
Raw output
{'entity_type': 'PER',
'linked_info': [{'elastic': {'birth_year': 1968,
'death_year': None,
'description': 'Inglise muusik ja laulukirjutaja',
'identifier': 'a12660826',
'identifier_source': 'ErRR',
'life_year': '1968-',
'link_variations': ['albran, damon',
'damon albran',
'damon albarn',
'albarn, damon'],
'name': 'Albarn, Damon',
'name_in_cyrillic': False,
'name_specification': '',
'name_variations': ['Albran, Damon'],
'source': 'Vikipeedia'},
'json': {'fields': [{'001': 'a12660826'},
{'003': 'ErRR'},
{'008': '240418|||aznnnaabn || '
'||| '},
{'040': {'ind1': ' ',
'ind2': ' ',
'subfields': [{'a': 'ErRR'},
{'b': 'est'},
{'c': 'ErRR'},
{'e': 'rda'}]}},
{'043': {'ind1': ' ',
'ind2': ' ',
'subfields': [{'c': 'uk'}]}},
{'046': {'ind1': ' ',
'ind2': ' ',
'subfields': [{'f': '1968'}]}},
{'075': {'ind1': ' ',
'ind2': ' ',
'subfields': [{'a': 'persoon'}]}},
{'100': {'ind1': '1',
'ind2': ' ',
'subfields': [{'a': 'Albarn, '
'Damon,'},
{'d': '1968-'}]}},
{'372': {'ind1': ' ',
'ind2': ' ',
'subfields': [{'a': 'rockmuusika'},
{'a': 'elektronmuusika'},
{'a': 'hip hop'},
{'a': 'Britpop.'}]}},
{'374': {'ind1': ' ',
'ind2': ' ',
'subfields': [{'a': 'laulukirjutaja'},
{'a': 'laulja'},
{'a': 'muusik.'}]}},
{'400': {'ind1': '1',
'ind2': ' ',
'subfields': [{'a': 'Albran, '
'Damon,'},
{'d': '1968-'}]}},
{'670': {'ind1': ' ',
'ind2': ' ',
'subfields': [{'a': 'Vikipeedia'},
{'u': 'https://en.wikipedia.org/wiki/Damon_Albarn.'}]}},
{'680': {'ind1': ' ',
'ind2': ' ',
'subfields': [{'i': 'Inglise '
'muusik ja '
'laulukirjutaja.'}]}}],
'leader': '00529nz a2200181n 4500'},
'linked_entity': 'Albarn, Damon',
'marc': '=LDR 00529nz a2200181n 4500\n'
'=001 a12660826\n'
'=003 ErRR\n'
'=008 '
'240418|||aznnnaabn\\\\\\\\\\\\\\\\\\\\||\\|||\\\\\\\\\\\\\n'
'=040 \\\\$aErRR$best$cErRR$erda\n'
'=043 \\\\$cuk\n'
'=046 \\\\$f1968\n'
'=075 \\\\$apersoon\n'
'=100 1\\$aAlbarn, Damon,$d1968-\n'
'=372 \\\\$arockmuusika$aelektronmuusika$ahip '
'hop$aBritpop.\n'
'=374 \\\\$alaulukirjutaja$alaulja$amuusik.\n'
'=400 1\\$aAlbran, Damon,$d1968-\n'
'=670 '
'\\\\$aVikipeedia$uhttps://en.wikipedia.org/wiki/Damon_Albarn.\n'
'=680 \\\\$iInglise muusik ja laulukirjutaja.\n',
'similarity_score': 1.0,
'viaf': {'message': '/api/search Successfully reached!',
'queryResult': {'echoedSearchRetrieveRequest': {'maximumRecords': {'type': 'xsd:nonNegativeInteger',
'value': 50},
'query': {'type': 'xsd:string',
'value': 'local.personalNames '
'all '
'"a12660826"'},
'recordPacking': {'type': 'xsd:string',
'value': 'xml'},
'recordSchema': {'type': 'xsd:string',
'value': 'BriefVIAF'},
'sortKeys': {'type': 'xsd:string',
'value': 'holdingscount'},
'startRecord': {'type': 'xsd:positiveInteger',
'value': 1},
'type': 'ns2:echoedSearchRetrieveRequestType',
'version': {'type': 'xsd:string',
'value': 1.1},
'xQuery': {'searchClause': {'index': {'type': 'xsd:string',
'value': 'local.personalNames'},
'relation': {'type': 'ns3:relationType',
'value': [{'value': 'all'}]},
'term': {'type': 'xsd:string',
'value': 'a12660826'},
'type': 'ns3:searchClauseType'}}},
'extraResponseData': {'extraData': {'databaseTitle': 'VIAF: '
'The '
'Virtual '
'International '
'Authority '
'File'},
'type': 'ns4:extraDataType'},
'numberOfRecords': {'type': 'xsd:nonNegativeInteger',
'value': 1},
'records': {'record': [{'recordData': {'VIAFCluster': {'mainHeadings': {'data': [{'sources': {'s': ['DNB',
'KRNLK',
'PLWABN',
'LIH',
'BNF',
'BNE',
'NKC',
'BIBSYS',
'NUKAT',
'ERRR',
'SUDOC'],
'sid': ['DNB|135275245',
'KRNLK|KAC2020M4718',
'PLWABN|9810618563005606',
'LIH|LNB:B2HO;=_u_Y',
'BNF|14025704',
'BNE|XX1502205',
'NKC|xx0042289',
'BIBSYS|6096767',
'NUKAT|n '
'2009143303',
'ERRR|a12660826',
'SUDOC|168995603']},
'text': 'Albarn, '
'Damon, '
'1968-....'},
{'sources': {'s': ['NLA',
'ISNI',
'LC',
'SIMACOB',
'RERO',
'NSK',
'DBC',
'J9U'],
'sid': ['NLA|000041317329',
'ISNI|0000000108754251',
'LC|n '
'97085620',
'SIMACOB|213734755',
'RERO|A002915097',
'NSK|000760001',
'DBC|87097969360297',
'J9U|987011827765305171']},
'text': 'Albarn, '
'Damon'},
{'sources': {'s': ['WKP'],
'sid': ['WKP|Q272069']},
'text': 'Damon '
'Albarn '
'English '
'musician, '
'singer-songwriter, '
'and '
'record '
'producer'}]},
'nameType': 'Personal',
'titles': {'work': [{'sources': {'s': ['DNB',
'WKP',
'RERO',
'SUDOC',
'BNF'],
'sid': ['DNB|135275245',
'WKP|Q272069',
'RERO|A002915097',
'SUDOC|168995603',
'BNF|14025704']},
'title': '101 '
'Reykjavík'},
{'sources': {'s': ['NUKAT'],
'sid': ['NUKAT|n '
'2009143303']},
'title': 'Anna '
'and '
'the '
'moods'},
{'sources': {'s': ['BIBSYS'],
'sid': ['BIBSYS|6096767']},
'title': 'Anna '
'går '
'i '
'svart'},
{'sources': {'s': ['NUKAT'],
'sid': ['NUKAT|n '
'2009143303']},
'title': 'Anna '
'i '
'humorki'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'Apple '
'carts '
'(2 '
'min '
'36 '
's)'},
{'sources': {'s': ['DBC'],
'sid': ['DBC|87097969360297']},
'title': 'Arbejdsnarkoman '
'uden '
'en '
'plan'},
{'sources': {'s': ['DNB'],
'sid': ['DNB|135275245']},
'title': 'ballad '
'of '
'Darren'},
{'sources': {'s': ['DBC',
'SUDOC',
'BNF'],
'sid': ['DBC|87097969360297',
'SUDOC|168995603',
'BNF|14025704']},
'title': 'Bananaz'},
{'sources': {'s': ['RERO'],
'sid': ['RERO|A002915097']},
'title': 'Broken'},
{'sources': {'s': ['SUDOC',
'BNF'],
'sid': ['SUDOC|168995603',
'BNF|14025704']},
'title': 'Call '
'me '
'up'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'Cathedrals '
'(3 '
'min)'},
{'sources': {'s': ['WKP'],
'sid': ['WKP|Q272069']},
'title': 'Coffee '
'& '
'TV'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'comédie'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'Coronation '
'(1 '
'min '
'10 '
's)'},
{'sources': {'s': ['SIMACOB'],
'sid': ['SIMACOB|213734755']},
'title': 'Cracker '
'Island'},
{'sources': {'s': ['SUDOC'],
'sid': ['SUDOC|168995603']},
'title': 'Damon '
'Albarn '
': '
'Blur, '
'Gorillaz '
'and '
'other '
'fables'},
{'sources': {'s': ['PLWABN'],
'sid': ['PLWABN|9810618563005606']},
'title': 'Damoniczny '
'świat'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'dancing '
'king '
'(3 '
'min '
'21 '
's)'},
{'sources': {'s': ['WKP'],
'sid': ['WKP|Q272069']},
'title': 'Dare'},
{'sources': {'s': ['SUDOC',
'BNF'],
'sid': ['SUDOC|168995603',
'BNF|14025704']},
'title': 'Democrazy'},
{'sources': {'s': ['LC',
'DNB'],
'sid': ['LC|n '
'97085620',
'DNB|135275245']},
'title': 'Doctor '
'Dee'},
{'id': 'VIAF|7806173669165707660003',
'sources': {'s': ['DNB',
'RERO',
'SUDOC',
'BNF',
'LC',
'NYNYRILM'],
'sid': ['DNB|135275245',
'RERO|A002915097',
'SUDOC|168995603',
'BNF|14025704',
'LC|n '
'97085620',
'NYNYRILM|95370']},
'title': 'Dr '
'Dee: '
'An '
'English '
'opera'},
{'sources': {'s': ['PLWABN'],
'sid': ['PLWABN|9810618563005606']},
'title': 'Drapieżcy'},
{'sources': {'s': ['WKP'],
'sid': ['WKP|Q272069']},
'title': 'Dude, '
'Where’s '
'My '
'Car?'},
{'sources': {'s': ['ERRR'],
'sid': ['ERRR|a12660826']},
'title': 'Euro '
'dance '
'1999'},
{'sources': {'s': ['DNB',
'SIMACOB',
'RERO',
'SUDOC',
'BNF'],
'sid': ['DNB|135275245',
'SIMACOB|213734755',
'RERO|A002915097',
'SUDOC|168995603',
'BNF|14025704']},
'title': 'Everyday '
'robots'},
{'sources': {'s': ['WKP'],
'sid': ['WKP|Q272069']},
'title': 'Fatti, '
'strafatti '
'e '
'strafighe'},
{'sources': {'s': ['LC',
'SUDOC',
'BNF'],
'sid': ['LC|n '
'97085620',
'SUDOC|168995603',
'BNF|14025704']},
'title': 'Film '
'of '
'life'},
{'sources': {'s': ['WKP'],
'sid': ['WKP|Q272069']},
'title': 'Girls '
'& '
'Boys'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'Give '
'it '
'to '
'the '
'people'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'Give '
'me'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'golden '
'dawn'},
{'sources': {'s': ['SUDOC',
'BNF'],
'sid': ['SUDOC|168995603',
'BNF|14025704']},
'title': 'good, '
'the '
'bad '
'&the '
'queen '
'Herculean'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'Heavy '
'seas '
'of '
'love '
'(3 '
'min '
'44 '
's)'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'history '
'of '
'a '
'cheating '
'heart '
'(4 '
'min)'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'Hollow '
'ponds '
'(4 '
'min '
'59 '
's)'},
{'sources': {'s': ['SUDOC',
'BNF'],
'sid': ['SUDOC|168995603',
'BNF|14025704']},
'title': 'Honest '
'Jons '
'sampler'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'Hostiles '
'(4 '
'min '
'09 '
's)'},
{'sources': {'s': ['LC'],
'sid': ['LC|n '
'97085620']},
'title': 'The '
'isle '
'of '
'view'},
{'sources': {'s': ['SIMACOB'],
'sid': ['SIMACOB|213734755']},
'title': 'Live '
'forever '
'the '
'rise '
'and '
'fall '
'of '
'Brit '
'pop'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'Lonely '
'press '
'play '
'(3 '
'min '
'42 '
's)'},
{'sources': {'s': ['DNB',
'RERO',
'SUDOC',
'BNF'],
'sid': ['DNB|135275245',
'RERO|A002915097',
'SUDOC|168995603',
'BNF|14025704']},
'title': 'Mali '
'music'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'man '
'of '
'England '
'(3 '
'min '
'17 '
's)'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'marvelous '
'dream'},
{'sources': {'s': ['BIBSYS'],
'sid': ['BIBSYS|6096767']},
'title': 'Me '
'and '
'the '
'devil'},
{'sources': {'s': ['DNB'],
'sid': ['DNB|135275245']},
'title': 'Merrie '
'land'},
{'id': 'VIAF|210959931',
'sources': {'s': ['LC',
'DNB',
'WKP'],
'sid': ['LC|n '
'97085620',
'DNB|135275245',
'WKP|Q272069',
'DNB|300912528']},
'title': 'Monkey'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'Moon'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'Mr '
'Tembo '
'(3 '
'min '
'43 '
's)'},
{'sources': {'s': ['DBC'],
'sid': ['DBC|87097969360297']},
'title': 'No '
'distance '
'left '
'to '
'run'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'One '
'day'},
{'sources': {'s': ['BNE',
'WKP',
'NUKAT',
'BNF'],
'sid': ['BNE|XX1502205',
'WKP|Q272069',
'NUKAT|n '
'2009143303',
'BNF|14025704']},
'title': 'Ordinary '
'Decent '
'Criminal'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'Parakeet '
'(43 '
's)'},
{'sources': {'s': ['WKP'],
'sid': ['WKP|Q272069']},
'title': 'Parklife '
'(canción)'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'Perdu'},
{'sources': {'s': ['WKP'],
'sid': ['WKP|Q272069']},
'title': 'Un '
'perfetto '
'criminale'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'Photographs'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'Point '
'star'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'Preparation'},
{'sources': {'s': ['WKP',
'NUKAT'],
'sid': ['WKP|Q272069',
'NUKAT|n '
'2009143303']},
'title': 'Przyzwoity '
'przestępca'},
{'sources': {'s': ['PLWABN',
'BNE',
'SUDOC',
'BNF'],
'sid': ['PLWABN|9810618563005606',
'BNE|XX1502205',
'SUDOC|168995603',
'BNF|14025704']},
'title': 'Ravenous'},
{'sources': {'s': ['BNE'],
'sid': ['BNE|XX1502205']},
'title': 'Scott '
'Walker, '
'30 '
'Century '
'Man'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'selfish '
'giant '
'(4 '
'min '
'47 '
's)'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'Seven '
'high '
'(1 '
'min)'},
{'id': 'VIAF|7146522416032391727',
'sources': {'s': ['DNB',
'SUDOC',
'BNF'],
'sid': ['DNB|135275245',
'DNB|1102560960',
'SUDOC|168995603',
'BNF|14025704']},
'title': 'Songs '
'from '
'Wonder.land'},
{'sources': {'s': ['WKP'],
'sid': ['WKP|Q272069']},
'title': 'Stary, '
'gdzie '
'moja '
'bryka?'},
{'sources': {'s': ['WKP'],
'sid': ['WKP|Q272069']},
'title': 'Swim '
'The '
'Channel'},
{'sources': {'s': ['WKP'],
'sid': ['WKP|Q272069']},
'title': 'Tavaline '
'kurjategija'},
{'sources': {'s': ['WKP'],
'sid': ['WKP|Q272069']},
'title': 'Tavallisen '
'rehti '
'rikollinen'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'Temptation '
'comes '
'in '
'the '
'afternoon '
'(2 '
'min '
'05 '
's)'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'To '
'the '
'end'},
{'sources': {'s': ['NLA',
'PLWABN',
'WKP',
'BNE',
'LC',
'BNF',
'RERO',
'SUDOC'],
'sid': ['NLA|000041317329',
'PLWABN|9810618563005606',
'WKP|Q272069',
'BNE|XX1502205',
'LC|n '
'97085620',
'BNF|14025704',
'RERO|A002915097',
'SUDOC|168995603']},
'title': 'Trainspotting'},
{'sources': {'s': ['WKP'],
'sid': ['WKP|Q272069']},
'title': 'Traukinių '
'žymėjimas'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'Tree '
'of '
'beauty '
'(2 '
'min)'},
{'sources': {'s': ['PLWABN'],
'sid': ['PLWABN|9810618563005606']},
'title': 'Twarz'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'Twentieth '
'century '
'blues '
'the '
'songs '
'of '
'Noël '
'Coward'},
{'sources': {'s': ['RERO',
'SUDOC',
'BNF'],
'sid': ['RERO|A002915097',
'SUDOC|168995603',
'BNF|14025704']},
'title': 'Vorace'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'Warrior'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'Watching '
'the '
'fire '
'that '
'waltzed '
'away '
'(2 '
'min '
'37 '
's)'},
{'id': 'VIAF|22146522418032391851',
'sources': {'s': ['DNB'],
'sid': ['DNB|1102560677']},
'title': 'Wonder.land'},
{'sources': {'s': ['BNF'],
'sid': ['BNF|14025704']},
'title': 'You '
'& '
'me '
'(7 '
'min '
'05 '
's)'},
{'sources': {'s': ['NSK'],
'sid': ['NSK|000760001']},
'title': 'Živio '
'album!'},
{'sources': {'s': ['WKP'],
'sid': ['WKP|Q272069']},
'title': 'Де '
'моя '
'тачка'},
{'sources': {'s': ['WKP'],
'sid': ['WKP|Q272069']},
'title': 'Обыкновенный '
'преступник'},
{'sources': {'s': ['WKP'],
'sid': ['WKP|Q272069']},
'title': 'Трейнспотинг'},
{'sources': {'s': ['WKP'],
'sid': ['WKP|Q272069']},
'title': 'הארי '
'המזוהם'},
{'sources': {'s': ['WKP'],
'sid': ['WKP|Q272069']},
'title': 'مجرم '
'نجیب '
'عادی'},
{'sources': {'s': ['WKP'],
'sid': ['WKP|Q272069']},
'title': 'ট্রেনস্পটিং'},
{'sources': {'s': ['WKP'],
'sid': ['WKP|Q272069']},
'title': 'ซองทู'},
{'sources': {'s': ['WKP'],
'sid': ['WKP|Q272069']},
'title': '내 '
'차 '
'봤냐?'},
{'sources': {'s': ['WKP'],
'sid': ['WKP|Q272069']},
'title': '트레인스포팅'},
{'sources': {'s': ['KRNLK'],
'sid': ['KRNLK|KAC2020M4718']},
'title': '페이스'},
{'sources': {'s': ['WKP'],
'sid': ['WKP|Q272069']},
'title': '私が愛したギャングスター'},
{'sources': {'s': ['WKP'],
'sid': ['WKP|Q272069']},
'title': '猜火車'},
{'sources': {'s': ['WKP'],
'sid': ['WKP|Q272069']},
'title': '豬頭,我的車咧?'},
{'sources': {'s': ['WKP'],
'sid': ['WKP|Q272069']},
'title': '王牌罪犯'}]},
'viafID': 17421456},
'type': 'ns1:stringOrXmlFragment'},
'recordPacking': {'type': 'xsd:string',
'value': 'xml'},
'recordPosition': {'type': 'xsd:positiveInteger',
'value': 1},
'recordSchema': {'type': 'xsd:string',
'value': 'http://viaf.org/BriefVIAFCluster'},
'type': 'ns1:recordType'}],
'type': 'ns1:recordsType'},
'resultSetIdleTime': {'type': 'xsd:positiveInteger',
'value': 1},
'schemaLocation': 'http://www.loc.gov/zing/srw/ '
'http://www.loc.gov/standards/sru/sru1-1archive/xml-files/srw-types.xsd',
'version': {'type': 'xsd:string',
'value': 1.1}}}}],
'linking_config': {'add_viaf_info': True,
'context': None,
'entity': 'Damon Albarn',
'fuzziness': 0,
'min_similarity': 0.9,
'prefix_length': 1},
'n_linked': 1,
'original_entity': 'Damon Albarn',
'similarity_score': 1.0}
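The raw output is deeply nested, so a small accessor helps when only a single field is needed. The sketch below collects the `viafID` values by following the path visible in the raw output above (`linked_info` → `viaf` → `queryResult` → `records` → `record` → `recordData` → `VIAFCluster`). The helper `viaf_ids` is hypothetical and not part of rara-norm-linker, and the sample dictionary is a hand-trimmed copy of the output. Missing keys are tolerated, since VIAF info may be absent (for example when `add_viaf_info=False`).

```python
from typing import Any


def viaf_ids(result: dict[str, Any]) -> list[int]:
    """Collect VIAF cluster IDs from a dict shaped like `linked.to_dict()`."""
    ids: list[int] = []
    for info in result.get("linked_info", []):
        # Walk the nested structure defensively; any level may be missing.
        records = (
            info.get("viaf", {})
            .get("queryResult", {})
            .get("records", {})
            .get("record", [])
        )
        for record in records:
            cluster = record.get("recordData", {}).get("VIAFCluster", {})
            if "viafID" in cluster:
                ids.append(cluster["viafID"])
    return ids


# Hand-trimmed stand-in for the raw output shown above.
raw = {
    "original_entity": "Damon Albarn",
    "n_linked": 1,
    "linked_info": [
        {
            "linked_entity": "Albarn, Damon",
            "viaf": {
                "queryResult": {
                    "records": {
                        "record": [
                            {"recordData": {"VIAFCluster": {"viafID": 17421456}}}
                        ]
                    }
                }
            },
        }
    ],
}

print(viaf_ids(raw))  # [17421456]
```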
Example 2: Multiple matches are returned
from rara_linker.linkers.linker import Linker
import logging
logging.disable(logging.CRITICAL)
linker = Linker(add_viaf_info=True, vectorizer_or_dir_path="./vectorizer_data")
# The input contains a typo;
# we are actually trying to find
# Paul Keres, the chess grandmaster
entity = "Paul Keers"
linked = linker.link(entity)
format_output(linked)
Output:
Original entity: Paul Keers
Entity type: PER
Number of matches: 3
Similarity: 0.98
Linked entity: Kees, Paul
Description: Eesti pedagoogikateadlane ja tõlkija
Linked entity: Keres, Paul
Description: Eesti maletaja ja maleteoreetik
Linked entity: Keres, Paul
Description: Eesti advokaat
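To see why a misspelled input can match several near-identical names, here is a toy illustration of fuzzy ranking. It is not the metric rara-norm-linker uses: as a stand-in it relies on `difflib.SequenceMatcher` from the Python standard library, and the function `rank_candidates`, its `min_similarity` default and the candidate list are all made up for this example.

```python
from difflib import SequenceMatcher


def rank_candidates(
    query: str, candidates: list[str], min_similarity: float = 0.8
) -> list[tuple[str, float]]:
    """Score each candidate against the query and keep the close ones.

    Stand-in similarity only: SequenceMatcher.ratio() is used for
    illustration and is NOT the library's own scoring.
    """
    scored = []
    for candidate in candidates:
        score = SequenceMatcher(None, query.lower(), candidate.lower()).ratio()
        if score >= min_similarity:
            scored.append((candidate, score))
    # Highest similarity first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical candidate names, loosely based on the examples above.
candidates = ["Paul Kees", "Paul Keres", "Damon Albarn"]
ranked = rank_candidates("Paul Keers", candidates)

for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

Both "Paul Kees" and "Paul Keres" are only one or two edits away from "Paul Keers", so string similarity alone cannot tell them apart.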
Example 3: Using vector search for additional filtering
from rara_linker.linkers.linker import Linker
import logging
logging.disable(logging.CRITICAL)
linker = Linker(add_viaf_info=True, vectorizer_or_dir_path="./vectorizer_data")
# The input contains a typo;
# we are actually trying to find
# Paul Keres, the chess grandmaster
entity = "Paul Keers"
# The context can be any short text that
# might bear some contextual resemblance to
# the entity. In practice, it will most likely
# be a title or a short paragraph
# where the name was mentioned. Let's try
# something similar:
context = "Viljandis selgusid 56. maleturniiri võitjad"
linked = linker.link(entity, context=context)
format_output(linked)
Output:
Original entity: Paul Keers
Entity type: PER
Number of matches: 1
Similarity: 0.98
Linked entity: Keres, Paul
Description: Eesti maletaja ja maleteoreetik
Example 4: Link a keyword / subject index
from rara_linker.linkers.linker import Linker
import logging
logging.disable(logging.CRITICAL)
linker = Linker(add_viaf_info=True, vectorizer_or_dir_path="./vectorizer_data")
entity = "alajahtumine"
linked = linker.link(entity)
format_output(linked)
Output:
Original entity: alajahtumine
Entity type: EMS_KEYWORD
Number of matches: 1
Similarity: 1.0
Linked entity: hüpotermia
File details
Details for the file rara_norm_linker-1.2.20.tar.gz.
File metadata
- Download URL: rara_norm_linker-1.2.20.tar.gz
- Upload date:
- Size: 115.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | aaa80a3bd735d5f8f24479627a7a837fa0460d70d1eb546f49ee10eaa04de469 |
| MD5 | bd67f0d0572fbdca5d23449cb7bd5e2c |
| BLAKE2b-256 | b1c31714b578bd0216971d216527c7951088fcef9a4a26bb4d4a8490838d8919 |
File details
Details for the file rara_norm_linker-1.2.20-py3-none-any.whl.
File metadata
- Download URL: rara_norm_linker-1.2.20-py3-none-any.whl
- Upload date:
- Size: 73.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 087a2650a2043c90e5d7f294bae6f0095ec64d4733995ba8186695656a22d64c |
| MD5 | 771cc68de39c91749a271ae2a6ca5a8c |
| BLAKE2b-256 | 5cf5db9e7f8cf5351cb1bd3ea1347d5f4a20062279983569f2294e883841468a |