PERDIDO Geoparser python library

Installation

To install the latest stable version, you can use:

pip install --upgrade perdido

Quick start

Geoparsing

Import

from perdido.geoparser import Geoparser

Run geoparser

text = "J'ai rendez-vous proche de la place Bellecour, de la place des Célestins, au sud de la fontaine des Jacobins et près du pont Bonaparte."
geoparser = Geoparser(version='Standard')
doc = geoparser(text)
  • The version parameter accepts two values: 'Standard' (the default) and 'Encyclopedie'.

Get tokens

  • Access token attributes:
for token in doc:
    print(f'{token.text}\tlemma: {token.lemma}\tpos: {token.pos}')
  • Get the IOB format:
for token in doc:
    print(token.iob_format())
  • Get a TSV-IOB format:
for token in doc:
    print(token.tsv_format())
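
For readers unfamiliar with the IOB scheme, the sketch below shows how the tags are conventionally assigned: the first token of an entity gets a B- tag, the following tokens of the same entity get I- tags, and everything else gets O. It does not call Perdido. The function name `to_iob` and the `(start, end, label)` span representation are assumptions made for illustration, and the exact columns returned by `token.iob_format()` may differ.

```python
from typing import List, Tuple


def to_iob(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[Tuple[str, str]]:
    """Tag tokens with IOB labels.

    `spans` holds hypothetical (start, end, label) triples, with `end` exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return list(zip(tokens, tags))


tokens = ["près", "du", "pont", "Bonaparte"]
rows = to_iob(tokens, [(2, 4, "place")])
for token, tag in rows:
    print(f"{token}\t{tag}")
```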

Print the XML-TEI output

print(doc.tei)

Print the XML-TEI output with XML syntax highlighting

from display_xml import XML
XML(doc.tei, style='lovelace')

Print the GeoJSON output

print(doc.geojson)
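
As a rough sketch of how GeoJSON output can be consumed downstream, the snippet below extracts point coordinates from a FeatureCollection. It assumes the output follows the standard GeoJSON layout with Point geometries and a `name` property. That layout, the helper `point_features`, and the sample document (with approximate, illustrative coordinates) are assumptions, not a description of what Perdido actually returns.

```python
import json
from typing import Any, Dict, List, Tuple


def point_features(collection: Dict[str, Any]) -> List[Tuple[str, float, float]]:
    """Return (name, lat, lon) for every Point feature in a FeatureCollection."""
    points = []
    for feature in collection.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Point":
            continue
        # GeoJSON stores coordinates as [longitude, latitude].
        lon, lat = geometry["coordinates"][:2]
        name = (feature.get("properties") or {}).get("name", "")
        points.append((name, lat, lon))
    return points


# Hand-written sample, not real Perdido output.
sample = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [4.8320, 45.7578]},
      "properties": {"name": "place Bellecour"}
    }
  ]
}
""")

points = point_features(sample)
print(points)
```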

Get the list of named entities

for entity in doc.named_entities:
    print(f'entity: {entity.text}\ttag: {entity.tag}')
    if entity.tag == 'place':
        for t in entity.toponym_candidates:
            print(f' latitude: {t.lat}\tlongitude: {t.lng}\tsource: {t.source}')
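
The loop above can also be turned into flat rows, which is convenient for building a table. The sketch below relies only on the attributes used in the loop (`text`, `tag`, `toponym_candidates`, `lat`, `lng`, `source`). The helper `place_rows` is hypothetical, and the `SimpleNamespace` objects are stand-ins with placeholder values so the example runs without Perdido.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List


def place_rows(entities: Iterable[Any]) -> List[Dict[str, Any]]:
    """Flatten place entities into one row per toponym candidate."""
    rows = []
    for entity in entities:
        if entity.tag != "place":
            continue
        for candidate in entity.toponym_candidates:
            rows.append({
                "text": entity.text,
                "lat": candidate.lat,
                "lng": candidate.lng,
                "source": candidate.source,
            })
    return rows


# Stand-in objects mimicking the attributes used above (placeholder values).
entities = [
    SimpleNamespace(
        text="place Bellecour",
        tag="place",
        toponym_candidates=[SimpleNamespace(lat=45.7578, lng=4.8320, source="example_source")],
    ),
    SimpleNamespace(text="rendez-vous", tag="other", toponym_candidates=[]),
]

rows = place_rows(entities)
print(rows)
```

With a real result, `place_rows(doc.named_entities)` should work the same way as long as the attributes match those used in the loop.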

Get the list of nested named entities

for nested_entity in doc.nested_named_entities:
    print(f'entity: {nested_entity.text}\ttag: {nested_entity.tag}')
    if nested_entity.tag == 'place':
        for t in nested_entity.toponym_candidates:
            print(f' latitude: {t.lat}\tlongitude: {t.lng}\tsource: {t.source}')

Get the list of spatial relations

for sp_relation in doc.sp_relations:
    print(f'spatial relation: {sp_relation.text}\ttag: {sp_relation.tag}')

Show named entities and nested named entities with spaCy's displacy

from spacy import displacy

displacy.render(doc.to_spacy_doc(), style="ent", jupyter=True)
displacy.render(doc.to_spacy_doc(), style="span", jupyter=True)

Display the map (using the folium library)

doc.get_folium_map()

Saving results

doc.to_xml('filename.xml')
doc.to_geojson('filename.geojson')
doc.to_iob('filename.tsv')
doc.to_csv('filename.csv')

Geocoding

Import

from perdido.geocoder import Geocoder

Geocode a single place name

geocoder = Geocoder()
doc = geocoder('Lyon')

Geocode a list of place names

geocoder = Geocoder()
doc = geocoder(['Lyon', 'la place des Célestins', 'la fontaine des Jacobins'])

Get the GeoJSON result

print(doc.geojson)

Get the list of toponym candidates

for t in doc.toponyms:
    print(f'lat: {t.lat}\tlng: {t.lng}\tsource: {t.source}\tsourceName: {t.source_name}')
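
When a place name yields several candidates, one simple way to choose among them is to keep the candidate closest to a known reference point. The sketch below does this with the haversine distance. It is an illustrative heuristic, not Perdido's disambiguation method. The helpers `haversine_km` and `closest_candidate` are hypothetical, the candidates are stand-in objects with placeholder values, and `lat`/`lng` are passed through `float()` on the assumption that they may be stored as strings.

```python
import math
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in kilometres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))  # 6371 km: mean Earth radius


def closest_candidate(candidates: Iterable[Any], ref_lat: float, ref_lng: float) -> Optional[Any]:
    """Return the candidate nearest to the reference point, or None if there is none."""
    return min(
        candidates,
        key=lambda t: haversine_km(float(t.lat), float(t.lng), ref_lat, ref_lng),
        default=None,
    )


# Stand-in candidates with placeholder coordinates and source labels.
candidates = [
    SimpleNamespace(lat=43.30, lng=-0.37, source="source_b"),
    SimpleNamespace(lat=45.76, lng=4.84, source="source_a"),
]

best = closest_candidate(candidates, ref_lat=45.75, ref_lng=4.85)
print(best.source)
```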

Get the toponym candidates as a GeoDataFrame

print(doc.to_geodataframe())

Perdido Geoparser REST APIs

http://choucas.univ-pau.fr/docs#

Example: calling the REST API in Python

import requests

url = 'http://choucas.univ-pau.fr/PERDIDO/api/'
service = 'geoparsing'
data = {'content': 'Je visite la ville de Lyon, Annecy et le Mont-Blanc.'}
parameters = {'api_key': 'demo'}

r = requests.post(url+service, params=parameters, json=data)

print(r.text)
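
For environments where `requests` is not available, the same call can be sketched with the standard library alone. The snippet below only builds the request (a JSON body plus the `api_key` query parameter, as in the example above) and leaves sending it as a commented step, so it runs without network access. The helper name `build_request` is hypothetical, and the response format is not assumed.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://choucas.univ-pau.fr/PERDIDO/api/"


def build_request(service: str, content: str, api_key: str = "demo") -> Request:
    """Build a POST request with a JSON body and the API key as a query parameter."""
    url = f"{BASE_URL}{service}?{urlencode({'api_key': api_key})}"
    body = json.dumps({"content": content}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("geoparsing", "Je visite la ville de Lyon, Annecy et le Mont-Blanc.")
print(req.get_method(), req.full_url)

# To actually send it (requires network access):
# from urllib.request import urlopen
# with urlopen(req, timeout=30) as response:
#     print(response.read().decode("utf-8"))
```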

Tutorials

Cite this work

Moncla, L. and Gaio, M. (2023). Perdido: Python library for geoparsing and geocoding French texts. In proceedings of the First International Workshop on Geographic Information Extraction from Texts (GeoExT'23), ECIR Conference, Dublin, Ireland.

Acknowledgements

Perdido is an active project that is still under development.

This work was partially supported by the following projects:
