
Tool to normalize ORCIDs

Project description

1. OrcidNormalizer

Maintainer: taeger@dzd-ev.de
Status: RC1

The purpose of this module is to normalize ORCIDs and bring them into a coherent (and therefore the official) ISNI format:

http://orcid.org/0000-0000-0000-000(0/X)

What is ORCID?

2. Table of contents

3. Introduction

3.1. Overview

This small Python project is part of our pipeline to integrate a large number of PUBMED articles (PUBMED is a free database for medical journal articles etc.) into a database. ORCID stands for 'Open Researcher and Contributor ID' and is used to accurately connect authors to their work. This is useful in cases where two or more researchers share the same name, which makes it unclear which author wrote which paper.

3.2. Problems

When registering an article at PUBMED, the ORCID parameter is an optional free-text field, which leads to multiple challenges. Because we are dealing with user input, anything is possible: entries with no numbers, email addresses, even abstracts. A tool that cleans valid entries and skips invalid ones is therefore useful.

3.3. Solution

For the sake of performance, the tests performed on each entry are fairly simple and straightforward.

Remember: The official ORCID consists of 16 digits in groups of 4, or of 15 digits and an 'X' that results from the checksum. If you are interested, refer to the following documentation: why 'X' and how to calculate the checksum

  • If the input is not a string, the input is invalid.
  • If there are more than 16 digits in the input string, the input is invalid.
  • If there are 16 or more digits and an 'x' or 'X' anywhere in the input, the input is invalid.
  • If there is an 'x' or 'X' somewhere in the input, the 'x'/'X' is used as the checksum character (last position) of the input.
  • If there are fewer digits, the input is padded on the left with 0s.
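The rules above can be sketched in a few lines of Python. This is only an illustration written from the list, not the package's actual code: the function name `normalize_orcid_sketch` is hypothetical, and grouping the result into blocks of four separated by hyphens is an assumption based on the official format shown earlier.

```python
import re


def normalize_orcid_sketch(raw):
    """Hypothetical helper that mirrors the listed rules (not the package's code)."""
    # Rule 1: anything that is not a string is invalid.
    if not isinstance(raw, str):
        raise ValueError("input is not a string")

    # Collect every ASCII digit and note whether an 'x'/'X' occurs anywhere.
    digits = "".join(re.findall(r"[0-9]", raw))
    has_x = "x" in raw.lower()

    # Rule 2: more than 16 digits is invalid.
    if len(digits) > 16:
        raise ValueError("more than 16 digits")
    # Rule 3: 16 or more digits together with an 'x'/'X' is invalid.
    if has_x and len(digits) >= 16:
        raise ValueError("16 digits plus an 'X'")

    # Rules 4 and 5: an 'X' becomes the last character; pad on the left with 0s.
    body = digits.zfill(15) + "X" if has_x else digits.zfill(16)

    # Assumption: group into four blocks of four, as in the official format.
    return "-".join(body[i:i + 4] for i in range(0, 16, 4))


print(normalize_orcid_sketch("0001-5000-0074"))
```

Note that the sketch only normalizes the shape of the input; it says nothing about whether the resulting identifier passes the checksum test.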

Input Examples

valid:

  • Orcid("http://orcid.org/0000-0001-5000-0074") --> valid
  • Orcid("0001-5000-0074") --> valid, padded with 0s
  • Orcid("0001-5000-0074 peter123@net") --> valid, 15 digits + padding

invalid (will raise ValueError):


The valid inputs are then verified with the checksum test (https://support.orcid.org/hc/en-us/articles/360006897674-Structure-of-the-ORCID-Identifier). The chance of a false positive is 1 in 11.
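The ORCID documentation linked above specifies the check character as ISO 7064 MOD 11-2 over the first 15 digits. A minimal sketch of that calculation follows; the function names `orcid_check_digit` and `passes_checksum` are hypothetical and are not part of this package's API.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    # A result of 10 is written as 'X'.
    return "X" if result == 10 else str(result)


def passes_checksum(orcid: str) -> bool:
    """Check a 16-character ORCID (hyphens optional) against its last character."""
    compact = orcid.replace("-", "").upper()
    base = compact[:15]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_digit(base) == compact[15]


print(passes_checksum("0000-0001-5000-0074"))  # True
print(passes_checksum("0000-0001-5000-0075"))  # False
```

Because the check character can take 11 values (0 to 9 and X), a random 16-character candidate passes by chance about once in 11 tries, which is where the false-positive rate quoted above comes from.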

4. Usage

Requirements:

  • Python3 with pip installed

4.1 Install

pip3 install OrcidNormalizer

4.2 Apply

Create an instance for every ORCID iD and normalize the input:

from OrcidNormalizer import Orcid

id = Orcid("0000000150000074")
id.uri()

> "https://orcid.org/0000-0001-5000-0074"

4.3 API

Orcid.uri - Uniform Resource Identifier

Return the full ISNI-formatted ORCID

from OrcidNormalizer import Orcid

id = Orcid("0000000150000074")
id.uri()

https://orcid.org/0000-0001-5000-0074

Orcid.urn - Uniform Resource Name

Return the Uniform Resource Name part only

from OrcidNormalizer import Orcid

id = Orcid("0000000150000074")
id.urn()

0000-0001-5000-0074

Orcid.is_valid()

Does a checksum validation according to https://support.orcid.org/hc/en-us/articles/360006897674-Structure-of-the-ORCID-Identifier#checksum

from OrcidNormalizer import Orcid

id = Orcid("https://orcid.org/1-5000-0074")
id.is_valid()

True

Orcid.RAISE_EXCEPTION_ON_UNPARSABLE_ORCID_STRING

If a string is unparsable, OrcidNormalizer.Orcid will raise an exception. In large batch operations it can be convenient to override this behaviour. Set to False to return OrcidNormalizer.Orcid.RETURN_VAL_ON_UNPARSABLE instead of raising an exception.

Orcid.RETURN_VAL_ON_UNPARSABLE

See Orcid.RAISE_EXCEPTION_ON_UNPARSABLE_ORCID_STRING

