
A tool to import Retraction Watch data


Retraction Watch Database Importer

This Airflow script imports the Retraction Watch database into the annotations system of the Crossref Labs API.



Input Format

The script expects an S3 folder (a key prefix in an S3 bucket) containing CSV files of Retraction Watch data.
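Because S3 has no real folders, "finding the input files" means listing objects under a key prefix. The sketch below shows one way to do that. It assumes a boto3 S3 client (or any object exposing the same `list_objects_v2` call), and the function name `list_csv_keys` is hypothetical, not part of this package. S3 returns listings in pages, so the loop follows continuation tokens until the listing is complete.

```python
from typing import Any, Dict, List


def list_csv_keys(s3_client: Any, bucket: str, prefix: str) -> List[str]:
    """Return the keys of all .csv objects under `prefix` in `bucket`.

    Hypothetical helper: `s3_client` is assumed to be a boto3 S3 client.
    """
    keys: List[str] = []
    kwargs: Dict[str, str] = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = s3_client.list_objects_v2(**kwargs)
        # "Contents" is absent when the prefix holds no objects.
        for obj in response.get("Contents", []):
            if obj["Key"].lower().endswith(".csv"):
                keys.append(obj["Key"])
        # S3 sets IsTruncated when more pages remain.
        if not response.get("IsTruncated"):
            return keys
        kwargs["ContinuationToken"] = response["NextContinuationToken"]
```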

Each CSV file must have the following column headings, with exactly this capitalization:

  • DOI
  • RetractionDOI
  • Reason
  • RetractionNature
  • Notes
  • URLS
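Since the headings are matched with exact capitalization, a file with `doi` instead of `DOI` would not be recognized. A minimal sketch of checking the headings before reading rows, using Python's standard `csv.DictReader`, might look like this. The function name `read_rows` is hypothetical; the real script may validate differently.

```python
import csv
import io
from typing import Dict, List

# Column headings listed above; matching is case-sensitive.
REQUIRED_HEADINGS = ["DOI", "RetractionDOI", "Reason", "RetractionNature", "Notes", "URLS"]


def read_rows(csv_text: str) -> List[Dict[str, str]]:
    """Parse CSV text whose first row holds the headings (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # DictReader takes its field names from the first row of the file.
    headings = reader.fieldnames or []
    missing = [h for h in REQUIRED_HEADINGS if h not in headings]
    if missing:
        raise ValueError(f"CSV is missing required headings: {missing}")
    return list(reader)
```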

The first row of each CSV file must contain the headings. A DOI can have multiple entries (for example, an expression of concern and a retraction), but only one entry of each type per DOI will be imported; you cannot import two retractions, or two expressions of concern, for the same DOI.
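The "one entry of each type per DOI" rule amounts to keying rows on the pair (DOI, `RetractionNature`). The sketch below assumes that the first row seen for a given pair wins and later duplicates are dropped; the README does not say which duplicate is kept. Lower-casing the DOI is also a hypothetical choice here (DOI names are case-insensitive), and `one_per_type` is not a function in this package.

```python
from typing import Dict, Iterable, List, Set, Tuple


def one_per_type(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep at most one row per (DOI, RetractionNature) pair (hypothetical)."""
    seen: Set[Tuple[str, str]] = set()
    kept: List[Dict[str, str]] = []
    for row in rows:
        key = (row["DOI"].strip().lower(), row["RetractionNature"].strip())
        if key in seen:
            continue  # assumption: later duplicates of the same type are dropped
        seen.add(key)
        kept.append(row)
    return kept
```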

Idempotency

The script is idempotent: running it multiple times imports only new data, so repeated runs over the same input leave the results unchanged.
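One simple way to get this behaviour is to skip any entry whose key already exists in the target store instead of writing it again. In the sketch below, a plain dictionary is a hypothetical stand-in for the Crossref Labs annotations system, and the key (normalized DOI plus `RetractionNature`) is an assumption; the package's actual storage and keying are not described here.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed identity of one annotation: (normalized DOI, RetractionNature).
AnnotationKey = Tuple[str, str]


def import_new_entries(
    rows: Iterable[Dict[str, str]],
    store: Dict[AnnotationKey, Dict[str, str]],
) -> List[AnnotationKey]:
    """Add rows whose key is not yet in `store`; return the keys added.

    Existing entries are never overwritten, so a second run over the
    same input adds nothing and leaves `store` unchanged.
    """
    added: List[AnnotationKey] = []
    for row in rows:
        key = (row["DOI"].strip().lower(), row["RetractionNature"].strip())
        if key in store:
            continue  # already imported on an earlier run
        store[key] = row
        added.append(key)
    return added
```

Running it twice over the same rows adds everything on the first call and nothing on the second.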

Archiving

After processing a CSV input file, the script moves it to an archive folder in the same S3 bucket.
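S3 has no native "move" operation, so moving an object is usually done as a copy followed by a delete. The sketch below assumes a boto3 S3 client and a hypothetical archive prefix of `archive/`; the actual folder name used by the script is not stated in this README. Note that a single `copy_object` call is limited to objects of 5 GB, which is ample for CSV files of this kind.

```python
import posixpath
from typing import Any


def archive_key(key: str, archive_prefix: str = "archive/") -> str:
    """Destination key inside the (hypothetical) archive prefix."""
    return archive_prefix + posixpath.basename(key)


def archive_object(s3_client: Any, bucket: str, key: str,
                   archive_prefix: str = "archive/") -> str:
    """Move `key` into the archive prefix of the same bucket; return the new key."""
    destination = archive_key(key, archive_prefix)
    # Copy first, so the data is never lost if the delete fails.
    s3_client.copy_object(
        Bucket=bucket,
        Key=destination,
        CopySource={"Bucket": bucket, "Key": key},
    )
    s3_client.delete_object(Bucket=bucket, Key=key)
    return destination
```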

Periodic Runs and Missing Input Files

The script is designed to be run periodically. If it finds no input files, it raises an exception. This is intentional.
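In Airflow, an exception that escapes a task marks that task as failed (subject to any configured retries), so an empty input folder shows up as a failed run rather than a silent success. A minimal sketch of such a guard follows; the exception class `NoInputFilesError` and the function `require_csv_inputs` are hypothetical, since the README only says "an exception" is raised.

```python
from typing import List


class NoInputFilesError(RuntimeError):
    """Hypothetical exception type for an empty input folder."""


def require_csv_inputs(keys: List[str], bucket: str, prefix: str) -> List[str]:
    """Return the CSV keys, or raise if there are none to process."""
    csv_keys = [k for k in keys if k.lower().endswith(".csv")]
    if not csv_keys:
        raise NoInputFilesError(f"no CSV input files found under s3://{bucket}/{prefix}")
    return csv_keys
```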

© Crossref 2023

Download files


Source Distribution

retraction_watch_import-0.0.4.tar.gz (8.3 kB)

Built Distribution

retraction_watch_import-0.0.4-py3-none-any.whl (7.5 kB, Python 3)
