Configuration¶
In addition to runtime options when executing doiget-tdm
, as documented in Command-line reference, there are configuration options for doiget-tdm
.
Note
There are also per-publisher configuration options that are described in Available publishers.
The active configuration settings can be shown by running doiget-tdm show-config
.
Options¶
data_dir
The directory to store the acquired metadata and full-text content.
The default is platform-dependent and is determined using the
platformdirs
package. Typical directories are:- Linux
~/.local/share/doiget_tdm/
- Mac
~/Library/Application Support/doiget_tdm/data/
- Windows
~\AppData\Local\doiget_tdm\data
cache_dir
The directory to store temporary but persistent files, such as full-text data archives that span multiple items.
The default is platform-dependent and is determined using the
platformdirs
package. Typical directories are:- Linux
~/.cache/doiget_tdm/
- Mac
~/Library/Caches/doiget_tdm/
- Windows
~\AppData\Local\doiget-tdm\doiget-tdm\Cache
data_dir_n_groups
The acquired data is stored on the filesystem (within
data_dir
) with one directory per DOI. This results in a large number of directories, which can become prohibitively large for particular collections or storage systems. This option inserts an additional layer of directories, with each DOI pseudorandomly allocated to one ofdata_dir_n_groups
subdirectories indata_dir
.The default is to not have this intermediate grouping layer.
email_address
The DOI metadata is typically acquired from the Crossref web API, which asks that users provide their email address as part of the request header.
The default is to not specify an email address.
encryption_passphrase
The full-text content acquired from publishers can optionally be encrypted in
data_dir
, using this passphrase.The default is to store the full-text data unencrypted.
log_level
This sets the level at which messages are printed to the console. The accepted values are, in increasing degrees of verbosity,
DEBUG
,INFO
,WARNING
,ERROR
,CRITICAL
(see logging levels).The default is
WARNING
.file_log_level
As per the
log_level
option, but for the messages written to the log file.The default is
INFO
.crossref_lmdb_path
The path to a directory containing a local LMDB version of the Crossref public data. If present, it will be used to obtain the DOI metadata in preference to a web API call.
The default is to not have a LMDB available.
format_preference_order
Full-text content can be provided in multiple formats, and this option allows the search order for formats to be set. Additionally, formats can be excluded from acquisition by not including them in this list.
The default is
["xml", "pdf", "html", "txt", "tiff"]
.skip_remaining_formats
This option sets the approach when a particular format for a DOI has been successfully acquired. If
False
, the remaining formats informat_preference_order
are attempted to be acquired; ifTrue
, the attempted acquisition of the remaining formats is skipped.The default is
True
.extra_handlers_path
A directory from which to import additional publisher handlers. This directory needs to contain one or more
.py
files, which are imported after the built-in publisher handlers have been imported.The default is to not have any additional publisher handlers.
metadata_compression_level
Level to compress the JSON metadata when storing. Set it to
-1
to use the default compression level,0
for no compression, or a number between1
(least compression) to9
(most compression).The default is
-1
.
Setting the configuration¶
The configuration for doiget-tdm
can be set via three ways:
Files in a configuration directory¶
The directory in which doiget-tdm
will search for configuration settings varies by platform.
Typical directories are:
- Linux
~/.config/doiget_tdm/
- Mac
~/Library/Application Support/doiget_tdm/config/
- Windows
~\AppData\Local\doiget-tdm\doiget_tdm\config
A configuration option can be set by creating a file inside the config directory with a name that has the form doiget_tdm_${OPTION}
and the contents are the option setting.
For example, the log_level
option can be set to WARNING
by creating a file called doiget_tdm_log_level
that contains the text WARNING
.
Within a .env
file¶
Configuration settings can be read from a file named .env
that is contained in the directory in which doiget-tdm
is executed.
This file contains one option per line, in the form DOIGET_TDM_${OPTION}=${VALUE}
.
For example, the log_level
option can be set to WARNING
by having a line in .env
that is DOIGET_TDM_LOG_LEVEL=INFO
.
Using environment variables¶
Configuration options can be set by using system environment variables.
These follow the same convention as options set using the .env
file approach.