Configuration

In addition to the runtime options given when executing doiget-tdm (documented in Command-line reference), doiget-tdm has the configuration options described here.

Note

There are also per-publisher configuration options that are described in Available publishers.

The active configuration settings can be shown by running doiget-tdm show-config.

Options

data_dir

The directory to store the acquired metadata and full-text content.

The default is platform-dependent and is determined using the platformdirs package. Typical directories are:

Linux

~/.local/share/doiget_tdm/

Mac

~/Library/Application Support/doiget_tdm/data/

Windows

~\AppData\Local\doiget_tdm\data

cache_dir

The directory to store temporary but persistent files, such as full-text data archives that span multiple items.

The default is platform-dependent and is determined using the platformdirs package. Typical directories are:

Linux

~/.cache/doiget_tdm/

Mac

~/Library/Caches/doiget_tdm/

Windows

~\AppData\Local\doiget-tdm\doiget-tdm\Cache

data_dir_n_groups

The acquired data is stored on the filesystem (within data_dir) with one directory per DOI. This results in a large number of directories, which can become prohibitively large for particular collections or storage systems. This option inserts an additional layer of directories, with each DOI pseudorandomly allocated to one of data_dir_n_groups subdirectories in data_dir.

The default is to not have this intermediate grouping layer.
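As an illustration of how such a grouping could work, the sketch below hashes a DOI and maps it onto one of n_groups subdirectories. This is not the doiget-tdm implementation: the function name group_dir_for_doi, the use of SHA-256, and the zero-padded directory names are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def group_dir_for_doi(doi: str, data_dir: Path, n_groups: int) -> Path:
    """Return a (hypothetical) group subdirectory of data_dir for a DOI."""
    # DOIs are case-insensitive, so normalise before hashing (an assumption
    # about how a real implementation might treat them).
    digest = hashlib.sha256(doi.lower().encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and fold it into the group range.
    group = int.from_bytes(digest[:8], "big") % n_groups
    # Zero-padded names are purely illustrative.
    return data_dir / f"{group:04d}"


example_dir = group_dir_for_doi("10.1000/example", Path("data"), 16)
```

Because the allocation depends only on the DOI and the number of groups, the same DOI always lands in the same subdirectory, while changing the number of groups would move most DOIs.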

email_address

The DOI metadata is typically acquired from the Crossref web API, which asks that users provide their email address as part of the request header.

The default is to not specify an email address.
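For context, Crossref's API etiquette asks clients to identify themselves with a mailto: contact in the User-Agent header. The sketch below builds (but does not send) such a request using only the standard library. The function name build_crossref_request and the User-Agent text are hypothetical, and doiget-tdm's actual header may differ.

```python
import urllib.parse
import urllib.request
from typing import Optional


def build_crossref_request(
    doi: str, email_address: Optional[str] = None
) -> urllib.request.Request:
    """Build a Crossref works request, optionally identifying the user."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi, safe="/")
    headers = {}
    if email_address is not None:
        # The application name here is a placeholder, not doiget-tdm's real header.
        headers["User-Agent"] = f"example-client/0.1 (mailto:{email_address})"
    return urllib.request.Request(url, headers=headers)


request = build_crossref_request("10.1000/example", "user@example.org")
```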

encryption_passphrase

The full-text content acquired from publishers can optionally be encrypted in data_dir, using this passphrase.

The default is to store the full-text data unencrypted.

log_level

This sets the minimum level at which messages are printed to the console. The accepted values are, in decreasing order of verbosity, DEBUG, INFO, WARNING, ERROR, and CRITICAL (see logging levels).

The default is WARNING.
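The level names match those of Python's standard logging module, where each name has a numeric value and a message is emitted only if its level is at or above the configured threshold. The sketch below shows that comparison; the helper is_shown is hypothetical, and that doiget-tdm filters messages in exactly this way is an assumption.

```python
import logging

LEVEL_NAMES = ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]

# Map each accepted name to its numeric value in the standard library.
LEVEL_VALUES = {name: getattr(logging, name) for name in LEVEL_NAMES}


def is_shown(message_level: str, configured_level: str) -> bool:
    """Return True if a message at message_level passes the threshold."""
    return LEVEL_VALUES[message_level] >= LEVEL_VALUES[configured_level]
```

With the default of WARNING, for example, an ERROR message passes the threshold and an INFO message does not.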

file_log_level

As per the log_level option, but for the messages written to the log file.

The default is INFO.

crossref_lmdb_path

The path to a directory containing a local LMDB version of the Crossref public data. If present, it will be used to obtain the DOI metadata in preference to a web API call.

The default is to not have an LMDB available.

format_preference_order

Full-text content can be provided in multiple formats, and this option allows the search order for formats to be set. Additionally, formats can be excluded from acquisition by not including them in this list.

The default is ["xml", "pdf", "html", "txt", "tiff"].

skip_remaining_formats

This option sets what happens once a format for a DOI has been successfully acquired. If False, acquisition of the remaining formats in format_preference_order is still attempted; if True, the remaining formats are skipped.

The default is True.
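The interaction between format_preference_order and skip_remaining_formats can be sketched as a simple loop. This illustrates the documented behaviour rather than the doiget-tdm source: the function acquire_formats and the try_acquire callback are hypothetical stand-ins for the real acquisition step.

```python
from typing import Callable, List


def acquire_formats(
    format_preference_order: List[str],
    skip_remaining_formats: bool,
    try_acquire: Callable[[str], bool],
) -> List[str]:
    """Try each format in order and return the formats that were acquired."""
    acquired = []
    for fmt in format_preference_order:
        # try_acquire stands in for the real download step; True means success.
        if try_acquire(fmt):
            acquired.append(fmt)
            if skip_remaining_formats:
                # Stop at the first success when skipping is enabled.
                break
    return acquired


order = ["xml", "pdf", "html", "txt", "tiff"]
available = {"pdf", "html"}  # pretend only these formats exist for a DOI
first_only = acquire_formats(order, True, lambda fmt: fmt in available)
everything = acquire_formats(order, False, lambda fmt: fmt in available)
```

Here first_only contains just "pdf", while everything contains both "pdf" and "html". A format left out of the order list is never attempted.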

extra_handlers_path

A directory from which to import additional publisher handlers. This directory needs to contain one or more .py files, which are imported after the built-in publisher handlers have been imported.

The default is to not have any additional publisher handlers.

metadata_compression_level

The compression level used when storing the JSON metadata. Set it to -1 to use the default compression level, 0 for no compression, or a number from 1 (least compression) to 9 (most compression).

The default is -1.
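These values follow the same convention as the level argument of Python's zlib module. The sketch below shows the effect of the level on a JSON payload, assuming zlib-style compression; the documentation does not state which library doiget-tdm uses, and the function names are hypothetical.

```python
import json
import zlib


def compress_metadata(metadata: dict, level: int = -1) -> bytes:
    """Serialise metadata to JSON and compress it at the given level."""
    raw = json.dumps(metadata).encode("utf-8")
    return zlib.compress(raw, level)


def decompress_metadata(blob: bytes) -> dict:
    """Reverse compress_metadata."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


sample = {"DOI": "10.1000/example", "title": "x" * 1000}
stored = compress_metadata(sample, level=0)   # no compression
smallest = compress_metadata(sample, level=9)  # most compression
```

Any level can be decompressed in the same way, so changing the setting affects only the storage size and the time spent compressing.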

Setting the configuration

The configuration for doiget-tdm can be set in three ways:

Files in a configuration directory

The directory in which doiget-tdm will search for configuration settings varies by platform. Typical directories are:

Linux

~/.config/doiget_tdm/

Mac

~/Library/Application Support/doiget_tdm/config/

Windows

~\AppData\Local\doiget-tdm\doiget_tdm\config

A configuration option can be set by creating a file inside the config directory whose name has the form doiget_tdm_${OPTION} and whose contents are the option's value. For example, the log_level option can be set to WARNING by creating a file called doiget_tdm_log_level that contains the text WARNING.
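A minimal sketch of creating such a file from Python is shown below. It writes into a temporary directory so that no real configuration is touched; in practice config_dir would be the platform-specific directory listed above. The helper write_option_file is hypothetical, and whether a trailing newline in the file is tolerated is not stated here, so none is written.

```python
import tempfile
from pathlib import Path


def write_option_file(config_dir: Path, option: str, value: str) -> Path:
    """Create the doiget_tdm_<option> file holding the given value."""
    config_dir.mkdir(parents=True, exist_ok=True)
    option_path = config_dir / f"doiget_tdm_{option}"
    option_path.write_text(value, encoding="utf-8")
    return option_path


# Demonstrate against a throwaway directory rather than a real config directory.
with tempfile.TemporaryDirectory() as tmp:
    written = write_option_file(Path(tmp) / "config", "log_level", "WARNING")
    written_name = written.name
    written_text = written.read_text(encoding="utf-8")
```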

Within a .env file

Configuration settings can be read from a file named .env in the directory from which doiget-tdm is executed. This file contains one option per line, in the form DOIGET_TDM_${OPTION}=${VALUE}. For example, the log_level option can be set to WARNING by having a line in .env that reads DOIGET_TDM_LOG_LEVEL=WARNING.
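To make the line format concrete, the sketch below parses text in the DOIGET_TDM_${OPTION}=${VALUE} form into a dictionary of option names and values. It is a simplified illustration, not the parser that doiget-tdm uses: parse_env_lines is a hypothetical name, and the handling of comments, quoting, and unrelated variables is assumed.

```python
from typing import Dict


def parse_env_lines(text: str, prefix: str = "DOIGET_TDM_") -> Dict[str, str]:
    """Extract prefixed options from .env-style text as lower-case names."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith(prefix):
            options[key[len(prefix):].lower()] = value.strip()
    return options


env_text = "# example .env\nDOIGET_TDM_LOG_LEVEL=WARNING\nUNRELATED=1\n"
parsed = parse_env_lines(env_text)
```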

Using environment variables

Configuration options can be set by using system environment variables. These follow the same convention as options set using the .env file approach.
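When doiget-tdm is launched from another Python program, the variables can be prepared as a dictionary first. The sketch below builds such an environment using the DOIGET_TDM_${OPTION} naming described above; the helper env_with_options is hypothetical.

```python
import os
from typing import Dict, Mapping, Optional


def env_with_options(
    options: Mapping[str, str], base: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of the environment with doiget-tdm options added."""
    env = dict(os.environ if base is None else base)
    for option, value in options.items():
        env[f"DOIGET_TDM_{option.upper()}"] = value
    return env


child_env = env_with_options({"log_level": "DEBUG"}, base={})
```

The resulting dictionary could be passed as the env argument of subprocess.run when invoking doiget-tdm show-config to check that the setting has been picked up.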