A command-line application and Python library for obtaining the metadata and full-text of published journal articles for text data mining (TDM) purposes.

doiget-tdm

doiget-tdm is a command-line application and Python library for obtaining the metadata and full-text of published journal articles.

Warning: This package is primarily intended for use in text data mining projects where the user has subscriptions to full-text content and has organised data exchange agreements. Acquisition for most publishers will not work without configuration; see Available publishers.

Features

  • Acquire full-text of published articles, with built-in support for multiple publishers and their acquisition methods (e.g., network or local files).
  • Currently supported publishers (given appropriate access and configuration):
    • American Medical Association (AMA)
    • American Psychological Association (APA)
    • Elsevier
    • Frontiers
    • IOP
    • PeerJ
    • PLoS
    • PNAS
    • Royal Society
    • Sage
    • Springer-Nature
    • Taylor & Francis
    • Wiley
  • Customise acquisition and add additional publishers.
  • Retrieve article metadata from Crossref, optionally using a Lightning Memory-Mapped Database (LMDB) key:value (DOI:metadata) store built from a Crossref public data export via crossref-lmdb.
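
To illustrate what retrieving Crossref metadata for a DOI involves, here is a minimal sketch that queries the public Crossref REST API directly with the Python standard library. It is not doiget-tdm's own implementation or API; the function names (`crossref_url`, `extract_summary`, `fetch_metadata`) are hypothetical and chosen only for this example.

```python
import json
import urllib.parse
import urllib.request

# Public Crossref REST API endpoint for a single work.
CROSSREF_WORKS_URL = "https://api.crossref.org/works/"


def crossref_url(doi: str) -> str:
    """Build the Crossref REST API URL for a DOI, percent-encoding the DOI."""
    return CROSSREF_WORKS_URL + urllib.parse.quote(doi, safe="")


def extract_summary(response: dict) -> dict:
    """Pull a few common fields out of a Crossref 'work' response."""
    work = response["message"]
    titles = work.get("title", [])  # Crossref returns titles as a list
    return {
        "doi": work.get("DOI"),
        "title": titles[0] if titles else None,
        "publisher": work.get("publisher"),
    }


def fetch_metadata(doi: str, timeout: float = 30.0) -> dict:
    """Fetch and summarise the metadata for one DOI (requires network access)."""
    with urllib.request.urlopen(crossref_url(doi), timeout=timeout) as resp:
        return extract_summary(json.load(resp))
```

Calling `fetch_metadata("10.1371/journal.pbio.1002611")` would perform one network request per DOI; a local DOI:metadata store such as the one described above avoids that round trip when processing many DOIs.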

Installation

The package can be installed using pip:

pip install doiget-tdm

Quickstart

Show the default configuration settings:

doiget-tdm show-config

Download the full-text (XML) of the journal article with DOI 10.1371/journal.pbio.1002611 to the default directory:

doiget-tdm acquire '10.1371/journal.pbio.1002611'
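
For more than a handful of articles, the same command can be driven from a script. The sketch below runs `doiget-tdm acquire` once per DOI via `subprocess`, using only the command form shown above. The helper names (`build_command`, `acquire_all`) are hypothetical, and treating a non-zero exit code as failure is an assumption about the CLI, not documented behaviour.

```python
import subprocess


def build_command(doi: str) -> list[str]:
    """Return the argument list for acquiring a single DOI."""
    return ["doiget-tdm", "acquire", doi]


def acquire_all(dois: list[str], runner=subprocess.run) -> dict[str, int]:
    """Run one `doiget-tdm acquire` per DOI and collect the exit codes.

    `runner` defaults to subprocess.run; it is a parameter only so the
    function can be exercised without the CLI installed.
    """
    results = {}
    for doi in dois:
        # check=False: record the exit code instead of raising on failure.
        completed = runner(build_command(doi), check=False)
        results[doi] = completed.returncode
    return results
```

Passing arguments as a list avoids shell quoting issues with characters that can appear in DOIs. The package's own Workflow document remains the reference for how to acquire articles at scale.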

Next, you can read through the Workflow document to understand how to use the package in a text data mining project and the Concepts document to learn more about the approach taken by doiget-tdm.

Documentation

See the documentation for detailed information about how to use doiget-tdm.

Download files

Source Distribution

doiget_tdm-0.1.0.tar.gz (137.3 kB)

Built Distribution

doiget_tdm-0.1.0-py3-none-any.whl (45.6 kB)

File details

Details for the file doiget_tdm-0.1.0.tar.gz.

File metadata

  • Download URL: doiget_tdm-0.1.0.tar.gz
  • Size: 137.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.10

File hashes

Hashes for doiget_tdm-0.1.0.tar.gz
Algorithm Hash digest
SHA256 5678d030f7fa7ff07ff809219a33a9b4d1ea545ec77c1668f8552ea47129d6bc
MD5 87dcd0ab231ddd090cb6988b3b20e80b
BLAKE2b-256 7ea472bf518a1e47e221ef925b38724b61e6f56c5d7e014029327a7fe9bf2c3a
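
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's `hashlib`; the helper names (`sha256_of`, `matches`) are hypothetical and the file path is whatever location the archive was saved to.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches("doiget_tdm-0.1.0.tar.gz", "5678d030f7fa7ff07ff809219a33a9b4d1ea545ec77c1668f8552ea47129d6bc")` should return `True` for an intact copy of the source distribution.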

File details

Details for the file doiget_tdm-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: doiget_tdm-0.1.0-py3-none-any.whl
  • Size: 45.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.10

File hashes

Hashes for doiget_tdm-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3af69cf1390672cca71aaa22cf078dec0b0e46df9dc5a81dd7ac1e85a069f3e7
MD5 38c8a3c2bc0d06d139ce9ef09f252ef4
BLAKE2b-256 3f9f5d834a7a3f78b6b2b7c865ae51eb6b51a6a88e3cebbe1c8fbcd20d0a343a
