A fast geo toolkit for academic affiliation strings

PinPoint

PinPoint is a fast geo toolkit for academic affiliation strings. It provides the following base functions:

  • find a location (information about mapped city and country)
  • calculate the apparent location and cooperation distance for a list of weighted affiliation strings

Install

Install and update using pip

pip install pinpoint

Usage

from pinpoint import Locator
loc = Locator()

The first time Locator is initialized, the lookup tables and databases need to be created. To do this, four files are downloaded from the GeoNames dump (~150 MB) and optimized:

  • cities1000.zip
  • admin1CodesASCII.txt
  • countryInfo.txt
  • alternateNames.zip

It is possible to rebuild the database at a later date:

from pinpoint import Locator
loc = Locator(refresh=True)

To avoid unnecessary load on the GeoNames servers, the data will not be downloaded again if the cached files are less than a week old. The databases and cached files are stored in the appropriate folders for your operating system. If necessary, you can empty them by hand. Their locations can be printed as follows:

from pinpoint import Locator
print(Locator.resources_dir)
print(Locator.resources_cache_dir)
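
The one-week rule described above amounts to a file-age check. The following is a minimal sketch of such a check, not PinPoint's actual implementation; the helper name `is_fresh` and its parameters are hypothetical and only illustrate the idea.

```python
import os
import time

ONE_WEEK_SECONDS = 7 * 24 * 60 * 60


def is_fresh(path, max_age_seconds=ONE_WEEK_SECONDS, now=None):
    """Return True if `path` exists and was modified less than
    `max_age_seconds` ago.

    Hypothetical helper: PinPoint's own cache check may work differently.
    """
    if not os.path.exists(path):
        return False
    # Allow the caller to inject a timestamp, which keeps the check testable.
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) < max_age_seconds


# Possible use with the cache folder reported by PinPoint (assumes the
# folder contains the downloaded files under their original names):
# from pinpoint import Locator
# cached = os.path.join(Locator.resources_cache_dir, "cities1000.zip")
# print(is_fresh(cached))
```

A file that fails this check would be a candidate for a new download; a missing file is treated the same way as a stale one.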

Find a location

test_string = "Department of Chemical and Biomolecular Engineering, Rice University, Houston, TX, United States"
country, region, city = loc.find(test_string)

This function returns either a dict or None for each of country, region, and city. The following information is returned, based on the data from GeoNames:

  • country
    • 'a2' ISO 3166-1 alpha-2 country code
    • 'a3' ISO 3166-1 alpha-3 country code
    • 'n3' ISO 3166-1 numeric country code
    • 'name'
    • 'short_name_list' short name variants
    • 'name_list' name in different languages
    • 'capital'
    • 'continent'
    • 'area' in square kilometers
    • 'population'
    • 'geonameid' unique id given by GeoNames
  • region (currently used only for the USA and Canada)
    • 'name'
    • 'short_name_list' short name variants
    • 'name_list' name in different languages
    • 'region_code'
    • 'a2' ISO 3166-1 alpha-2 country code
    • 'geonameid' unique id given by GeoNames
  • city
    • 'name'
    • 'asciiname'
    • 'name_list' name in different languages
    • 'latitude'
    • 'longitude'
    • 'a2' ISO 3166-1 alpha-2 country code
    • 'admin1_code'
    • 'elevation' and 'dem' (digital elevation model) both relate to the elevation in meters
    • 'timezone'
    • 'geonameid' unique id given by GeoNames
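
Because each of the three results may be None, calling code should check them before reading any key. The sketch below shows one way to do that. The helper `summarize` is hypothetical, and the sample dicts are hand-written illustrations that use only a few of the keys listed above; they are not actual PinPoint output, and the real value types may differ.

```python
def summarize(country, region, city):
    """Build a readable label and a coordinate pair from a find() result.

    Each argument is either a dict with the keys listed above or None.
    """
    # Collect the names of whichever levels were matched, most specific first.
    parts = [item["name"] for item in (city, region, country) if item is not None]
    label = ", ".join(parts) if parts else "unknown location"
    # Coordinates are only available when a city was matched.
    coords = (city["latitude"], city["longitude"]) if city is not None else None
    return label, coords


# Illustrative stand-ins for what loc.find(...) might return.
sample_country = {"name": "United States", "a2": "US"}
sample_region = {"name": "Texas", "a2": "US"}
sample_city = {"name": "Houston", "latitude": 29.76, "longitude": -95.37, "a2": "US"}

print(summarize(sample_country, sample_region, sample_city))
print(summarize(None, None, None))
```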

Calculate the apparent location and cooperation distance

Based on a weighted list of affiliations, an apparent location for a scientific document can be calculated.

from pinpoint import Locator
loc = Locator()

weighted_affiliations = {
    "Dresden Center for Computational Material Science, Technische Universität Dresden, Dresden, Germany": 2,
    "Department of Chemical and Biomolecular Engineering, Rice University, Houston, TX, United States": 1,
    "Nanoscience and Nanotechnology Center, Institute of Scientific and Industrial Research (ISIR), Osaka University, 8-1 Mihogaoka, Ibaraki, Osaka, Japan": 0.5,
    "Centro/Departamento de Física da Universidade do Minho, Campus de Gualtar, 4710-057 Braga, Portugal": 0.5,
    }

cooperation_distance, apparent_location = loc.calculate_str(weighted_affiliations)

The cooperation distance is returned in kilometers. If the coordinates are already known, the calculation can be done directly, without the need to initialize the resources.

Locator.calculate_coordinates(weighted_coordinates)
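
This description does not spell out how the apparent location and cooperation distance are defined, nor the expected format of weighted_coordinates. As a rough illustration only, the sketch below assumes that the apparent location is the weighted centroid of the points on the sphere and that the cooperation distance is the weighted mean great-circle distance from that centroid. Both definitions, the input format of ((latitude, longitude), weight) pairs, and all function names are assumptions, not PinPoint's API.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def weighted_centroid(weighted_points):
    """Weighted centroid on the sphere of ((lat, lon), weight) pairs."""
    x = y = z = 0.0
    for (lat, lon), weight in weighted_points:
        phi, lmb = math.radians(lat), math.radians(lon)
        # Sum the weighted unit vectors in 3D Cartesian coordinates.
        x += weight * math.cos(phi) * math.cos(lmb)
        y += weight * math.cos(phi) * math.sin(lmb)
        z += weight * math.sin(phi)
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("centroid is undefined for these points")
    # Project the summed vector back onto the sphere.
    return math.degrees(math.asin(z / norm)), math.degrees(math.atan2(y, x))


def weighted_mean_distance_km(weighted_points):
    """Return (mean distance to centroid in km, centroid as (lat, lon))."""
    points = list(weighted_points)
    centre = weighted_centroid(points)
    total = sum(weight for _, weight in points)
    mean = sum(
        weight * haversine_km(centre[0], centre[1], lat, lon)
        for (lat, lon), weight in points
    ) / total
    return mean, centre


# Two equally weighted points on the equator, 90 degrees of longitude apart.
distance, location = weighted_mean_distance_km([((0.0, 0.0), 1.0), ((0.0, 90.0), 1.0)])
print(distance, location)
```

Averaging unit vectors rather than raw latitude and longitude values avoids artifacts near the poles and the 180° meridian, which is why the sketch takes that route.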

Redis subsystem

The underlying architecture of pinpoint is not well suited to use in a system that spawns many processes or threads. To enable its use under such conditions, the application data can be separated from the search logic.

The lookup tables and location databases are then stored in a Redis database (version > 4.0). After installing pinpoint, two additional Python packages are needed:

pip install redis
pip install hiredis

Using the Redis subsystem does not change the way you interact with pinpoint. When Locator is initialized, server needs to be set to True.

from pinpoint import Locator
loc = Locator(server=True)

If different settings for the Redis server are needed, server can be set to a dictionary containing the settings. The allowed keys are the same as those listed in the redis-py documentation.

from pinpoint import Locator
loc = Locator(server={'host': 'localhost', 'port': 6379, 'db': 0})
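
In a deployment with several worker processes, it can be convenient to read these settings from the environment rather than hard-coding them. The sketch below builds such a dictionary; the helper `redis_settings_from_env` and the `PINPOINT_REDIS_*` variable names are hypothetical and not part of pinpoint. The keys host, port, and db are the ones used in the example above.

```python
import os


def redis_settings_from_env(environ=None):
    """Build a settings dict for Locator(server=...) from environment variables.

    The variable names are made up for this example; the fallback values
    match the settings shown in the example above.
    """
    env = os.environ if environ is None else environ
    return {
        "host": env.get("PINPOINT_REDIS_HOST", "localhost"),
        "port": int(env.get("PINPOINT_REDIS_PORT", "6379")),
        "db": int(env.get("PINPOINT_REDIS_DB", "0")),
    }


print(redis_settings_from_env({"PINPOINT_REDIS_HOST": "cache.internal"}))

# Possible use:
# from pinpoint import Locator
# loc = Locator(server=redis_settings_from_env())
```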

This approach is noticeably slower than the default implementation. It should only be used if multiple instances of pinpoint need to run in parallel.

Examples

Various examples can be found in the extra folder of the source distribution.
