Configuration¶
In addition to runtime options when executing doiget-tdm, as documented in Command-line reference, there are configuration options for doiget-tdm.
Note
There are also per-publisher configuration options that are described in Available publishers.
The active configuration settings can be shown by running doiget-tdm show-config.
Options¶
data_dirThe directory to store the acquired metadata and full-text content.
The default is platform-dependent and is determined using the
platformdirspackage. Typical directories are:- Linux
~/.local/share/doiget_tdm/- Mac
~/Library/Application Support/doiget_tdm/data/- Windows
~\AppData\Local\doiget_tdm\data
cache_dirThe directory to store temporary but persistent files, such as full-text data archives that span multiple items.
The default is platform-dependent and is determined using the
platformdirspackage. Typical directories are:- Linux
~/.cache/doiget_tdm/- Mac
~/Library/Caches/doiget_tdm/- Windows
~\AppData\Local\doiget-tdm\doiget-tdm\Cache
data_dir_n_groupsThe acquired data is stored on the filesystem (within
data_dir) with one directory per DOI. This results in a large number of directories, which can become prohibitively large for particular collections or storage systems. This option inserts an additional layer of directories, with each DOI pseudorandomly allocated to one ofdata_dir_n_groupssubdirectories indata_dir.The default is to not have this intermediate grouping layer.
email_addressThe DOI metadata is typically acquired from the Crossref web API, which asks that users provide their email address as part of the request header.
The default is to not specify an email address.
encryption_passphraseThe full-text content acquired from publishers can optionally be encrypted in
data_dir, using this passphrase.The default is to store the full-text data unencrypted.
log_levelThis sets the level at which messages are printed to the console. The accepted values are, in increasing degrees of verbosity,
DEBUG,INFO,WARNING,ERROR,CRITICAL(see logging levels).The default is
WARNING.file_log_levelAs per the
log_leveloption, but for the messages written to the log file.The default is
INFO.crossref_lmdb_pathThe path to a directory containing a local LMDB version of the Crossref public data. If present, it will be used to obtain the DOI metadata in preference to a web API call.
The default is to not have a LMDB available.
format_preference_orderFull-text content can be provided in multiple formats, and this option allows the search order for formats to be set. Additionally, formats can be excluded from acquisition by not including them in this list.
The default is
["xml", "pdf", "html", "txt", "tiff"].skip_remaining_formatsThis option sets the approach when a particular format for a DOI has been successfully acquired. If
False, the remaining formats informat_preference_orderare attempted to be acquired; ifTrue, the attempted acquisition of the remaining formats is skipped.The default is
True.extra_handlers_pathA directory from which to import additional publisher handlers. This directory needs to contain one or more
.pyfiles, which are imported after the built-in publisher handlers have been imported.The default is to not have any additional publisher handlers.
metadata_compression_levelLevel to compress the JSON metadata when storing. Set it to
-1to use the default compression level,0for no compression, or a number between1(least compression) to9(most compression).The default is
-1.
Setting the configuration¶
The configuration for doiget-tdm can be set via three ways:
Files in a configuration directory¶
The directory in which doiget-tdm will search for configuration settings varies by platform.
Typical directories are:
- Linux
~/.config/doiget_tdm/- Mac
~/Library/Application Support/doiget_tdm/config/- Windows
~\AppData\Local\doiget-tdm\doiget_tdm\config
A configuration option can be set by creating a file inside the config directory with a name that has the form doiget_tdm_${OPTION} and the contents are the option setting.
For example, the log_level option can be set to WARNING by creating a file called doiget_tdm_log_level that contains the text WARNING.
Within a .env file¶
Configuration settings can be read from a file named .env that is contained in the directory in which doiget-tdm is executed.
This file contains one option per line, in the form DOIGET_TDM_${OPTION}=${VALUE}.
For example, the log_level option can be set to WARNING by having a line in .env that is DOIGET_TDM_LOG_LEVEL=INFO.
Using environment variables¶
Configuration options can be set by using system environment variables.
These follow the same convention as options set using the .env file approach.