A command-line application and Python library for obtaining the metadata and full-text of published journal articles for text data mining (TDM) purposes.
doiget-tdm
doiget-tdm is a command-line application and Python library for obtaining the metadata and full-text of published journal articles.
Warning: This package is primarily intended for use in text data mining projects where the user has subscriptions to full-text content and has organised data exchange agreements. Acquisition from most publishers will not work without configuration; see Available publishers.
Features
- Acquire full-text of published articles, with built-in support for multiple publishers and their acquisition methods (e.g., network or local files).
- Currently supported publishers (given appropriate access and configuration):
  - American Medical Association (AMA)
  - American Psychological Association (APA)
  - Elsevier
  - Frontiers
  - IOP
  - PeerJ
  - PLoS
  - PNAS
  - Royal Society
  - Sage
  - Springer-Nature
  - Taylor & Francis
  - Wiley
- Customise acquisition and add additional publishers.
- Retrieve article metadata from Crossref, optionally using a local Lightning Memory-Mapped Database (LMDB) of DOI:metadata key:value pairs, built from a Crossref public data export via crossref-lmdb.
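Retrieving metadata from Crossref can be illustrated with the public Crossref REST API, which returns a single work at `https://api.crossref.org/works/{DOI}`. The following is a minimal sketch that queries that API directly with the standard library. It is not doiget-tdm's own interface, and the function names (`crossref_url`, `parse_work`, `fetch_work`) are hypothetical. It assumes the API's `{"status": "ok", "message": {...}}` response shape.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Public Crossref REST API endpoint for single works.
# Assumption: queried directly here, not through doiget-tdm.
CROSSREF_WORKS_URL = "https://api.crossref.org/works/"


def crossref_url(doi: str) -> str:
    """Build the Crossref REST API URL for a DOI, percent-encoding unsafe characters."""
    # Keep "/" unescaped because it separates the DOI prefix from the suffix.
    return CROSSREF_WORKS_URL + urllib.parse.quote(doi.strip(), safe="/")


def parse_work(payload: str) -> dict:
    """Extract the 'message' object from a Crossref /works/{DOI} JSON response."""
    data = json.loads(payload)
    if data.get("status") != "ok":
        raise ValueError(f"unexpected Crossref status: {data.get('status')!r}")
    return data["message"]


def fetch_work(doi: str, mailto: Optional[str] = None, timeout: float = 30.0) -> dict:
    """Fetch metadata for one DOI from the public Crossref REST API (requires network)."""
    url = crossref_url(doi)
    if mailto:
        # Crossref asks clients to identify themselves with a contact address.
        url += "?" + urllib.parse.urlencode({"mailto": mailto})
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_work(response.read().decode("utf-8"))
```

For example, `fetch_work("10.1371/journal.pbio.1002611", mailto="you@example.org")` returns the metadata record for the Quickstart article. The contact address is a placeholder. A local LMDB database, as described in the Features list, would replace this one-request-per-DOI lookup with a key lookup.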
Installation
The package can be installed using pip:
pip install doiget-tdm
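To check from Python whether the package is installed, and which release is installed, you can query the standard-library `importlib.metadata` module by distribution name. This is a minimal sketch: `installed_version` is a hypothetical helper and is not part of doiget-tdm.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str = "doiget-tdm") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        # Raised when no installed distribution matches the given name.
        return None
```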
Quickstart
Show the default configuration settings:
doiget-tdm show-config
Download the full-text (XML) of the journal article with DOI 10.1371/journal.pbio.1002611 to the default directory:
doiget-tdm acquire '10.1371/journal.pbio.1002611'
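The Quickstart command can be repeated over a list of DOIs from Python. This is a minimal sketch built on several assumptions: it uses only the `doiget-tdm acquire DOI` form shown above, runs one invocation per DOI, and treats a non-zero exit status as a failed acquisition. The helper names and the one-DOI-per-line input are hypothetical. The package's own Python library interface, described in the documentation, may be a better fit for larger projects.

```python
import shutil
import subprocess
from typing import Iterable, List


def acquire_command(doi: str) -> List[str]:
    """Build the argument list for `doiget-tdm acquire DOI`, as in the Quickstart."""
    return ["doiget-tdm", "acquire", doi]


def clean_dois(lines: Iterable[str]) -> List[str]:
    """Strip whitespace, skip blank and '#' lines, and drop exact duplicates (order kept)."""
    seen = set()
    dois = []
    for line in lines:
        doi = line.strip()
        if not doi or doi.startswith("#") or doi in seen:
            continue
        seen.add(doi)
        dois.append(doi)
    return dois


def acquire_all(dois: Iterable[str]) -> List[str]:
    """Run `doiget-tdm acquire` once per DOI and return the DOIs whose run failed."""
    if shutil.which("doiget-tdm") is None:
        raise RuntimeError("doiget-tdm is not on PATH; install it with `pip install doiget-tdm`")
    failed = []
    for doi in dois:
        # Passing a list (no shell) hands characters such as ';' or '(' to the CLI literally.
        result = subprocess.run(acquire_command(doi))
        # Assumption: a non-zero exit status means the acquisition did not succeed.
        if result.returncode != 0:
            failed.append(doi)
    return failed
```

For example, with a hypothetical file `dois.txt`: `failed = acquire_all(clean_dois(open("dois.txt")))`. Passing the arguments as a list, rather than a shell string, avoids the quoting that the Quickstart needs around the DOI.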
Next, read the Workflow document to learn how to use the package in a text data mining project, and the Concepts document to learn more about the approach taken by doiget-tdm.
Documentation
See the documentation for detailed information about how to use doiget-tdm.
File details
Details for the file doiget_tdm-0.1.0.tar.gz.
File metadata
- Download URL: doiget_tdm-0.1.0.tar.gz
- Upload date:
- Size: 137.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.11.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5678d030f7fa7ff07ff809219a33a9b4d1ea545ec77c1668f8552ea47129d6bc |
| MD5 | 87dcd0ab231ddd090cb6988b3b20e80b |
| BLAKE2b-256 | 7ea472bf518a1e47e221ef925b38724b61e6f56c5d7e014029327a7fe9bf2c3a |
File details
Details for the file doiget_tdm-0.1.0-py3-none-any.whl.
File metadata
- Download URL: doiget_tdm-0.1.0-py3-none-any.whl
- Upload date:
- Size: 45.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.11.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3af69cf1390672cca71aaa22cf078dec0b0e46df9dc5a81dd7ac1e85a069f3e7 |
| MD5 | 38c8a3c2bc0d06d139ce9ef09f252ef4 |
| BLAKE2b-256 | 3f9f5d834a7a3f78b6b2b7c865ae51eb6b51a6a88e3cebbe1c8fbcd20d0a343a |
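You can use the published SHA256 digests to check a manually downloaded 0.1.0 file before installing it. This is a minimal sketch that uses `hashlib` from the standard library. The helper names are hypothetical, and the digests are copied from the file details for `doiget_tdm-0.1.0.tar.gz` and `doiget_tdm-0.1.0-py3-none-any.whl`.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the 0.1.0 file details.
EXPECTED_SHA256 = {
    "doiget_tdm-0.1.0.tar.gz": "5678d030f7fa7ff07ff809219a33a9b4d1ea545ec77c1668f8552ea47129d6bc",
    "doiget_tdm-0.1.0-py3-none-any.whl": "3af69cf1390672cca71aaa22cf078dec0b0e46df9dc5a81dd7ac1e85a069f3e7",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Compute the SHA256 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops at end of file, when read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    """Return True if the file's SHA256 matches the published digest for its file name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected
```

For example, `verify_download(Path("doiget_tdm-0.1.0-py3-none-any.whl"))` returns `True` only if the wheel in the current directory is byte-for-byte the published file.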