Documentation
doiget-tdm is a command-line application and Python library for obtaining the metadata and full-text of published journal articles.
Warning
This package is primarily intended for use in text data mining projects where the user has subscriptions to full-text content and has organised data exchange agreements. Acquisition for most publishers will not work without configuration; see Available publishers for configuration details.
Features
- Acquire full-text of published articles, with built-in support for multiple publishers and their acquisition methods (e.g., network or local files).
  - Currently supported publishers (given appropriate access and configuration):
    - American Medical Association (AMA)
    - American Psychological Association (APA)
    - Elsevier
    - Frontiers
    - IOP
    - PeerJ
    - PLoS
    - PNAS
    - Royal Society
    - Springer-Nature
    - Taylor & Francis
    - Wiley
- Customise acquisition and add additional publishers.
- Retrieve article metadata from Crossref, optionally using a Lightning key:value (DOI:metadata) database formed from a Crossref public data export via crossref-lmdb.
Installation
The package can be installed using pip:
pip install doiget-tdm
Quickstart
Show the default configuration settings:
doiget-tdm show-config
Download the full-text (XML) of the journal article with DOI 10.1371/journal.pbio.1002611 to the default directory:
doiget-tdm acquire '10.1371/journal.pbio.1002611'
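To acquire several articles, one option is to call the same command once per DOI from a short Python script. The following is a minimal sketch, not part of doiget-tdm itself: the helper names (build_acquire_command, read_dois, acquire_all) and the input file format (one DOI per line, with # for comments) are hypothetical, and it assumes only the acquire invocation shown above. Whether the command reports failures through its exit code is also an assumption, so the recorded codes should be treated as indicative.

```python
import subprocess
import sys


def build_acquire_command(doi: str) -> list[str]:
    """Build the argument list for `doiget-tdm acquire <doi>`."""
    return ["doiget-tdm", "acquire", doi.strip()]


def read_dois(lines) -> list[str]:
    """Return the DOIs from an iterable of lines.

    Blank lines and lines starting with '#' are skipped
    (a hypothetical file format chosen for this sketch).
    """
    dois = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            dois.append(line)
    return dois


def acquire_all(dois, runner=subprocess.run) -> dict[str, int]:
    """Run the acquire command once per DOI and record each exit code.

    `runner` defaults to subprocess.run; it is a parameter so that the
    function can be exercised without doiget-tdm being installed.
    """
    results = {}
    for doi in dois:
        completed = runner(build_acquire_command(doi), check=False)
        results[doi] = completed.returncode
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python acquire_many.py dois.txt
    with open(sys.argv[1], encoding="utf-8") as handle:
        codes = acquire_all(read_dois(handle))
    for doi, code in codes.items():
        print(f"{doi}: exit code {code}")
```

Passing the arguments as a list avoids shell quoting issues with the characters that can appear in DOIs.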
Next, you can read through the Workflow document to understand how to use the package in a text data mining project and the Concepts document to learn more about the approach taken by doiget-tdm.
Documentation guide
- Workflow
Describes a typical workflow for using doiget-tdm.
- Configuration
Describes the available configuration options and how they can be specified.
- Publishers
Lists the details of publishers with built-in support and describes the process of adding additional publishers.
- Concepts
Outlines the conceptual approach taken by doiget-tdm.
- Command-line reference
Provides a reference to the doiget-tdm command-line application and its options.
- API reference
Provides a reference to the doiget-tdm Python code.
Contact
Issues can be raised via the GitHub repository.