A tool to import Retraction Watch data
Retraction Watch Database Importer
This Airflow script imports the Retraction Watch database into the annotations system of the Crossref Labs API.
Input Format
The script expects an S3 folder that contains CSV files with Retraction Watch data.
Each CSV file must have the following column headings, with exactly this capitalization:
DOI
RetractionDOI
Reason
RetractionNature
Notes
URLS
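Because the headings are matched with their exact capitalization, a header check can catch a mislabeled file before any rows are imported. The following is a minimal sketch, not the importer's own code: `REQUIRED_HEADINGS` mirrors the list above, and `missing_headings` is a hypothetical helper that assumes the CSV content is already available as a string.

```python
import csv
import io

# Headings from this README; the comparison is case-sensitive.
REQUIRED_HEADINGS = ["DOI", "RetractionDOI", "Reason", "RetractionNature", "Notes", "URLS"]


def missing_headings(csv_text: str) -> list[str]:
    """Return the required headings that are absent from the first row of csv_text."""
    reader = csv.reader(io.StringIO(csv_text))
    first_row = next(reader, [])  # an empty file yields no header row at all
    present = set(first_row)
    return [h for h in REQUIRED_HEADINGS if h not in present]


sample = (
    "DOI,RetractionDOI,Reason,RetractionNature,Notes,URLS\n"
    "10.1234/abc,10.1234/r1,Plagiarism,Retraction,,\n"
)
print(missing_headings(sample))          # []
print(missing_headings("doi,Reason\n"))  # ['DOI', 'RetractionDOI', 'RetractionNature', 'Notes', 'URLS']
```

Note that `doi` in lowercase is reported as missing, which is the behavior the capitalization requirement implies.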
The first row of each CSV file must contain the headings. A DOI may have several entries of different types (for example, an expression of concern and a retraction), but only one entry of each type per DOI will be imported: you cannot have two retractions or two expressions of concern for the same DOI.
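The one-entry-per-type rule amounts to deduplicating on the pair (DOI, RetractionNature). This sketch assumes that the first row seen for a pair is kept and later duplicates are dropped; the README does not say which entry wins, so treat that choice, the helper name `dedupe_entries`, and the sample values as illustrative.

```python
import csv
import io


def dedupe_entries(csv_text: str) -> list[dict[str, str]]:
    """Keep at most one row per (DOI, RetractionNature) pair.

    Assumption: the first row seen for a pair wins; later duplicates are ignored.
    """
    seen: set[tuple[str, str]] = set()
    kept: list[dict[str, str]] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["DOI"], row["RetractionNature"])
        if key in seen:
            continue  # e.g. a second retraction for the same DOI is skipped
        seen.add(key)
        kept.append(row)
    return kept


sample = (
    "DOI,RetractionDOI,Reason,RetractionNature,Notes,URLS\n"
    "10.1234/abc,10.1234/r1,Plagiarism,Retraction,,\n"
    "10.1234/abc,10.1234/e1,Data concerns,Expression of concern,,\n"
    "10.1234/abc,10.1234/r2,Duplication,Retraction,,\n"
)
rows = dedupe_entries(sample)
print([(r["DOI"], r["RetractionNature"]) for r in rows])
# [('10.1234/abc', 'Retraction'), ('10.1234/abc', 'Expression of concern')]
```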
Idempotency
The script is idempotent: running it multiple times imports only new data, and repeated runs leave the results unchanged.
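One way to get this property is to key each entry by (DOI, RetractionNature) and skip any entry whose stored content is already identical. In the sketch below, `store` is a hypothetical in-memory dictionary standing in for the Crossref Labs annotation system, whose real interface is not described here; overwriting an existing key whose content has changed is likewise an assumption.

```python
Row = dict[str, str]


def import_new(store: dict[tuple[str, str], Row], rows: list[Row]) -> int:
    """Write rows whose content is not already stored; return the number written.

    `store` is a hypothetical in-memory stand-in for the annotation system,
    keyed by (DOI, RetractionNature).
    """
    written = 0
    for row in rows:
        key = (row["DOI"], row["RetractionNature"])
        if store.get(key) == row:
            continue  # identical entry already imported: nothing to do
        store[key] = row  # new entry, or changed content for an existing key
        written += 1
    return written


rows = [
    {"DOI": "10.1234/abc", "RetractionNature": "Retraction", "Reason": "Plagiarism"},
    {"DOI": "10.1234/xyz", "RetractionNature": "Retraction", "Reason": "Data fabrication"},
]
store: dict[tuple[str, str], Row] = {}
print(import_new(store, rows))  # 2 on the first run
print(import_new(store, rows))  # 0 on a repeat run; the store is unchanged
```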
Archiving
After processing a CSV input file, the script moves it to an archive folder in the same S3 bucket.
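S3 has no rename operation, so "moving" an object means copying it to the new key and then deleting the original. The sketch below uses the `copy_object` and `delete_object` call shapes of a boto3 S3 client; the `archive/` prefix and the helper names are assumptions, and `FakeS3` is a hypothetical in-memory double so the example runs without AWS credentials.

```python
import posixpath
from typing import Any


def archive_key(key: str, archive_prefix: str = "archive/") -> str:
    """Map an input key to its archive key, keeping only the file name."""
    return archive_prefix + posixpath.basename(key)


def archive(s3: Any, bucket: str, key: str, archive_prefix: str = "archive/") -> str:
    """Move s3://bucket/key under archive_prefix in the same bucket (copy, then delete)."""
    dest = archive_key(key, archive_prefix)
    s3.copy_object(Bucket=bucket, CopySource={"Bucket": bucket, "Key": key}, Key=dest)
    s3.delete_object(Bucket=bucket, Key=key)  # only after the copy has succeeded
    return dest


class FakeS3:
    """Hypothetical in-memory stand-in for the two boto3 client calls used above."""

    def __init__(self, objects: dict[str, bytes]):
        self.objects = objects

    def copy_object(self, Bucket: str, CopySource: dict, Key: str) -> None:
        self.objects[Key] = self.objects[CopySource["Key"]]

    def delete_object(self, Bucket: str, Key: str) -> None:
        del self.objects[Key]


fake = FakeS3({"incoming/rw.csv": b"DOI,RetractionDOI"})
print(archive(fake, "my-bucket", "incoming/rw.csv"))  # archive/rw.csv
print(sorted(fake.objects))  # ['archive/rw.csv']
```

Deleting only after the copy returns means a failed copy leaves the input file in place, so the next run can retry it.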
Periodic Runs and Missing Input Files
The script is designed to be run periodically. If it does not find any input files, it will raise an exception. This is by design.
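In Airflow, an exception that escapes a task marks that task run as failed, so an empty input folder shows up in monitoring instead of passing as a silent, empty success. A minimal sketch of that check follows; `NoInputFilesError` and `find_input_files` are hypothetical names, and `keys` is assumed to come from listing the input folder (for example, with boto3's `list_objects_v2` paginator).

```python
class NoInputFilesError(RuntimeError):
    """Raised when a scheduled run finds nothing to import."""


def find_input_files(keys: list[str], suffix: str = ".csv") -> list[str]:
    """Return the CSV keys to process, or raise if there are none."""
    csv_keys = sorted(k for k in keys if k.lower().endswith(suffix))
    if not csv_keys:
        # Deliberate: fail loudly so the scheduler records a failed run.
        raise NoInputFilesError("no CSV input files found")
    return csv_keys


print(find_input_files(["in/b.csv", "in/readme.txt", "in/a.csv"]))  # ['in/a.csv', 'in/b.csv']
try:
    find_input_files([])
except NoInputFilesError as exc:
    print("run failed:", exc)  # run failed: no CSV input files found
```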
© Crossref 2023
File details
Details for the file retraction_watch_import-0.0.4.tar.gz.
File metadata
- Download URL: retraction_watch_import-0.0.4.tar.gz
- Upload date:
- Size: 8.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 436878bcf2ae1ed2825538143883278ef0a1136b1ac3e50bfae0df09447dc71f
MD5 | 1891723f89e6a23160fa48ee1c292c1f
BLAKE2b-256 | 0ba91955a4097f36d4a667d2b5b37c8c197e0793f85ff6fb4f07064c1cdde953
File details
Details for the file retraction_watch_import-0.0.4-py3-none-any.whl.
File metadata
- Download URL: retraction_watch_import-0.0.4-py3-none-any.whl
- Upload date:
- Size: 7.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2af73dc486c550ac5e276bbb093b6d6ee7a2382ba9614cb9c003c12d422988c3
MD5 | b4e4e46fbf7d3abd063f2cedd6917f0c
BLAKE2b-256 | 2a3ae5a2ab63e8bc89dbbe5c335d99774c8f4c4fd8382e06af4e5674d0eac06a