Documentation

doiget-tdm is a command-line application and Python library for obtaining the metadata and full-text of published journal articles.

Warning

This package is primarily intended for use in text data mining projects where the user has subscriptions to full-text content and has organised data exchange agreements. Acquisition for most publishers will not work without configuration; see Available publishers for configuration details.

Features

  • Acquire full-text of published articles, with built-in support for multiple publishers and their acquisition methods (e.g., network or local files).

  • Currently supported publishers (given appropriate access and configuration):
    • American Medical Association (AMA)

    • American Psychological Association (APA)

    • Elsevier

    • Frontiers

    • IOP

    • PeerJ

    • PLoS

    • PNAS

    • Royal Society

    • Springer-Nature

    • Taylor & Francis

    • Wiley

  • Customise acquisition and add additional publishers.

  • Retrieve article metadata from Crossref, optionally using a Lightning Memory-Mapped Database (LMDB) key-value store (DOI as the key, metadata as the value) built from a Crossref public data export via crossref-lmdb.
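
To give a sense of what a Crossref metadata lookup involves, the following is a minimal sketch that queries the public Crossref REST API for a single DOI using only the Python standard library. It is an illustration under stated assumptions and not necessarily how doiget-tdm performs the lookup; the names crossref_url and fetch_metadata are hypothetical and are not part of the package.

```python
import json
import urllib.parse
import urllib.request

# Public Crossref REST API endpoint for individual works.
CROSSREF_WORKS_URL = "https://api.crossref.org/works/"


def crossref_url(doi: str) -> str:
    """Return the Crossref REST API URL for a single DOI (hypothetical helper)."""
    # Percent-encode characters that are unsafe in a URL path; keep "/" as-is.
    return CROSSREF_WORKS_URL + urllib.parse.quote(doi.strip(), safe="/")


def fetch_metadata(doi: str, timeout: float = 30.0) -> dict:
    """Fetch the metadata record for a DOI (hypothetical helper; needs network access)."""
    with urllib.request.urlopen(crossref_url(doi), timeout=timeout) as response:
        payload = json.load(response)
    # Crossref wraps the record itself in a "message" field.
    return payload["message"]
```

A local DOI:metadata database avoids one network request per DOI, which is likely to matter when a project covers a large number of articles.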

Installation

The package can be installed using pip:

pip install doiget-tdm

Quickstart

Show the default configuration settings:

doiget-tdm show-config

Download the full-text (XML) of the journal article with DOI 10.1371/journal.pbio.1002611 to the default directory:

doiget-tdm acquire '10.1371/journal.pbio.1002611'
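
In a text data mining project, the same command would typically be run over many DOIs. The following is a minimal sketch of doing so from Python, assuming only the acquire subcommand shown above and that the doiget-tdm executable is on the PATH. The helper names build_acquire_command and acquire_all are hypothetical and are not part of the package, and how the exit codes should be interpreted is an assumption to check against the command-line reference.

```python
import shutil
import subprocess


def build_acquire_command(doi: str, executable: str = "doiget-tdm") -> list[str]:
    """Build the argument list for acquiring one DOI (hypothetical helper)."""
    doi = doi.strip()
    if not doi:
        raise ValueError("DOI must be a non-empty string")
    # Passing a list rather than a shell string avoids shell quoting issues.
    return [executable, "acquire", doi]


def acquire_all(dois: list[str]) -> dict[str, int]:
    """Run the acquire command once per DOI and collect the exit codes."""
    if shutil.which("doiget-tdm") is None:
        raise RuntimeError("doiget-tdm not found on PATH; install it with pip first")
    results = {}
    for doi in dois:
        # check=False: record the exit code instead of raising on failure.
        completed = subprocess.run(build_acquire_command(doi), check=False)
        results[doi] = completed.returncode
    return results
```

Calling acquire_all(["10.1371/journal.pbio.1002611"]) would then return a mapping from each DOI to the exit code of its run.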

Next, read the Workflow document to see how to use the package in a text data mining project, and the Concepts document to learn more about the approach taken by doiget-tdm.

Documentation guide

Workflow

Describes a typical workflow for using doiget-tdm.

Configuration

Describes the available configuration options and how they can be specified.

Publishers

Lists the details of publishers with built-in support and describes the process of adding additional publishers.

Concepts

Outlines the conceptual approach taken by doiget-tdm.

Command-line reference

Provides a reference to the doiget-tdm command-line application and its options.

API reference

Provides a reference to the doiget-tdm Python code.

Contact

Issues can be raised via the GitHub repository.

Authors

Please feel free to email if you find this package useful or have any suggestions or feedback.