Tool to normalize ORCIDs
1. OrcidNormalizer
Maintainer: taeger@dzd-ev.de
Status: RC1
The purpose of this module is to normalize ORCIDs and bring them into a coherent (and therefore the official) ISNI format:
https://orcid.org/0000-0000-0000-000(0/X)
3. Introduction
3.1. Overview
This small Python project is part of our pipeline for integrating a large number of PubMed articles (PubMed is a free database of biomedical journal articles) into a database. ORCID stands for 'Open Researcher and Contributor ID' and is used to reliably connect authors to their work. This is important when two or more researchers share the same name, because the name alone cannot tell which of them wrote a given paper.
3.2. Problems
When an article is registered with PubMed, the ORCID field is an optional free-text field, which leads to several challenges. Because we are dealing with user input, anything is possible: no digits at all, email addresses, even whole abstracts. A tool that cleans up valid entries and skips invalid ones is therefore useful.
3.3. Solution
For the sake of performance, the checks performed on each entry are fairly simple and straightforward.
Remember: an official ORCID consists of 16 digits in groups of 4, or of 15 digits followed by an 'X'. The 'X' comes from the checksum: the check character can take the value 10, which is written as 'X'. If you are interested, refer to the ORCID documentation on why 'X' is used and how the checksum is calculated.
- If the input is not a string, the input is invalid.
- If the input string contains more than 16 digits, the input is invalid.
- If the input contains 16 or more digits and an 'x' or 'X' anywhere, the input is invalid.
- If there is an 'x' or 'X' anywhere in the input, it is used as the check character (last position) of the ORCID.
- If there are fewer than 16 characters (digits plus an optional 'X'), the input is padded on the left with 0s.
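The rules above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the package's actual implementation: the function name normalize_orcid is hypothetical, and it returns the 16-character ORCID without hyphens or URL prefix.

```python
import re


def normalize_orcid(raw):
    """Hypothetical sketch of the parsing rules above (not the package's code).

    Returns the 16-character ORCID (hyphens and URL prefix removed) or
    raises ValueError for input that the rules classify as invalid.
    """
    # Rule 1: only strings are accepted.
    if not isinstance(raw, str):
        raise ValueError("input is not a string")
    digits = re.sub(r"[^0-9]", "", raw)  # keep only the ASCII digits
    has_x = "x" in raw.lower()
    # Rule 2: more than 16 digits is invalid.
    if len(digits) > 16:
        raise ValueError("too many digits")
    # Rule 3: 16 or more digits together with an 'x'/'X' is invalid.
    if has_x and len(digits) >= 16:
        raise ValueError("too many digits and an 'x'/'X'")
    # Rule 4: an 'x'/'X' becomes the check character in the last position.
    body = digits + "X" if has_x else digits
    # Rule 5: shorter input is padded on the left with 0s.
    return body.rjust(16, "0")


print(normalize_orcid("0001-5000-0074"))  # 0000000150000074
```

In this sketch, input without any digits is simply padded to zeros; such entries are then rejected by the checksum test, since sixteen zeros do not form a valid ORCID.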
Input Examples
valid:
- Orcid("http://orcid.org/0000-0001-5000-0074") --> valid
- Orcid("0001-5000-0074") --> valid, padded with 0s
- Orcid("0001-5000-0074 peter123@net") --> valid, 15 digits plus padding
invalid (will raise ValueError):
- Orcid("http://orcid.org/0000-0001-5000-0074-0235") --> invalid, too many digits
- Orcid("http://orcid.org/0000-0001-5000-0074 pete123@mail.net") --> invalid, too many digits
- Orcid("http://orcid.org/0000-0001-5000-0074X") --> invalid, 16 digits plus an 'x'/'X'
- Orcid(1234123412341234) --> invalid, input is not a string
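Inputs that pass these checks are then tested against the ORCID checksum (https://support.orcid.org/hc/en-us/articles/360006897674-Structure-of-the-ORCID-Identifier). Because the check character can take only 11 values (the digits 0-9 and X), the chance that an arbitrary input passes by accident (a false positive) is about 1 in 11.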
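The checksum described on the linked ORCID page is ISO/IEC 7064 MOD 11-2. A minimal Python sketch of it follows; the function names are hypothetical, and this is not necessarily how OrcidNormalizer implements is_valid(). It also shows where the 1-in-11 figure comes from: for any 15 base digits, exactly one of the 11 possible check characters passes.

```python
def orcid_check_char(base_digits):
    """Compute the ISO/IEC 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    # A result of 10 cannot be written as a single digit, so it becomes 'X'.
    return "X" if result == 10 else str(result)


def has_valid_checksum(orcid):
    """Check a 16-character ORCID (hyphens and URL prefix already removed)."""
    base, check = orcid[:15], orcid[15:]
    if len(orcid) != 16 or not all(c in "0123456789" for c in base):
        return False
    return orcid_check_char(base) == check.upper()


# Exactly one of the 11 candidate check characters matches a given base.
base = "000000015000007"
matches = [c for c in "0123456789X" if has_valid_checksum(base + c)]
print(matches)  # ['4']
```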
4. Usage
Requirements:
- Python 3 with pip installed
4.1 Install
pip3 install OrcidNormalizer
4.2 Apply
Create an Orcid instance for every ORCID and retrieve the normalized form:
from OrcidNormalizer import Orcid
id = Orcid("0000000150000074")
id.uri()
> "https://orcid.org/0000-0001-5000-0074"
4.3 API
Orcid.uri - Uniform Resource Identifier
Returns the full ISNI-formatted ORCID.
from OrcidNormalizer import Orcid
id = Orcid("0000000150000074")
id.uri()
Orcid.urn - Uniform Resource Name
Returns only the Uniform Resource Name part, i.e. the ORCID without the URL prefix.
from OrcidNormalizer import Orcid
id = Orcid("0000000150000074")
id.urn()
0000-0001-5000-0074
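Both uri() and urn() present the same 16 characters, split into four hyphen-separated groups of four. A sketch of that formatting step follows; the names are hypothetical, and the https://orcid.org/ prefix is taken from the uri() output shown in 4.2.

```python
ORCID_URI_PREFIX = "https://orcid.org/"  # prefix as printed by uri() in 4.2


def format_urn(orcid):
    """Split a 16-character ORCID into four hyphen-separated groups of four."""
    if len(orcid) != 16:
        raise ValueError(f"expected 16 characters, got {len(orcid)}")
    return "-".join(orcid[i:i + 4] for i in range(0, 16, 4))


def format_uri(orcid):
    """Prefix the hyphenated form with the ORCID base URL."""
    return ORCID_URI_PREFIX + format_urn(orcid)


print(format_urn("0000000150000074"))  # 0000-0001-5000-0074
print(format_uri("0000000150000074"))  # https://orcid.org/0000-0001-5000-0074
```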
Orcid.is_valid()
Performs a checksum validation according to https://support.orcid.org/hc/en-us/articles/360006897674-Structure-of-the-ORCID-Identifier#checksum
from OrcidNormalizer import Orcid
id = Orcid("https://orcid.org/1-5000-0074")
id.is_valid()
True
Orcid.RAISE_EXCEPTION_ON_UNPARSABLE_ORCID_STRING
If a string is unparsable, OrcidNormalizer.Orcid will raise an exception. In large batch operations it can be convenient to override this behaviour: set this attribute to False to return OrcidNormalizer.Orcid.RETURN_VAL_ON_UNPARSABLE instead of raising an exception.
Orcid.RETURN_VAL_ON_UNPARSABLE
The value returned for unparsable strings when Orcid.RAISE_EXCEPTION_ON_UNPARSABLE_ORCID_STRING is set to False.
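If changing the class-wide setting is undesirable (it presumably affects every caller in the same process), a similar skip-instead-of-raise behaviour can be obtained per call by catching ValueError around the parser. The sketch below is generic and hypothetical: parse stands in for any callable that raises ValueError on unparsable input (for example, constructing an Orcid with the default setting, which raises ValueError as described in 3.3), and skip_value plays the role of RETURN_VAL_ON_UNPARSABLE.

```python
def normalize_all(entries, parse, skip_value=None):
    """Apply `parse` to each entry; entries it rejects become `skip_value`.

    `parse` is any callable that raises ValueError on unparsable input.
    """
    results = []
    for entry in entries:
        try:
            results.append(parse(entry))
        except ValueError:
            results.append(skip_value)
    return results


def parse_plain_digits(raw):
    """Hypothetical stand-in parser: accepts only exactly 16 ASCII digits."""
    if not (isinstance(raw, str) and len(raw) == 16 and raw.isascii() and raw.isdigit()):
        raise ValueError(f"unparsable ORCID: {raw!r}")
    return raw


print(normalize_all(["0000000150000074", "peter123@mail.net", 42], parse_plain_digits))
# ['0000000150000074', None, None]
```

Keeping the try/except at the call site leaves the global default (raise) untouched for other code.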
File details
Details for the file OrcidNormalizer-0.0.1.tar.gz.
File metadata
- Download URL: OrcidNormalizer-0.0.1.tar.gz
- Upload date:
- Size: 8.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.10.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 82c280e8cc997ff85ee3652d2f3e6ad3b6c730439eeab786e77c28b71eaf606b
MD5 | 521f3e757eaf3528a557b2578c4aea5b
BLAKE2b-256 | 5884fb390c72b20bc0846038a18d09ca7439a56874c89af3429ad27105360562
File details
Details for the file OrcidNormalizer-0.0.1-py3-none-any.whl.
File metadata
- Download URL: OrcidNormalizer-0.0.1-py3-none-any.whl
- Upload date:
- Size: 5.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.10.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7cd12f6193204f99019f831eb113ea47099a10a4149362b2d34322236892399e
MD5 | aa7035016920521a0bcb6ed4529ef70b
BLAKE2b-256 | b18dcb0eabc437296dd8267420a2ee7125057259dfbff8ec9b35d0a2a8f645b9