Command-line reference

Interact with Crossref data via a Lightning database.

usage: crossref-lmdb [-h] [--debug] {create,update} ...

Positional Arguments

command

Possible choices: create, update

Named Arguments

--debug

Print error tracebacks.

Default: False

Sub-commands

create

Create a Lightning database from Crossref public data.

crossref-lmdb create [-h] --public-data-dir PUBLIC_DATA_DIR --db-dir DB_DIR
                     [--start-from-file-num START_FROM_FILE_NUM]
                     [--commit-frequency COMMIT_FREQUENCY]
                     [--compression-level {-1,0,1,2,3,4,5,6,7,8,9}]
                     [--filter-path FILTER_PATH]
                     [--show-progress | --no-show-progress]
                     [--max-db-size-gb MAX_DB_SIZE_GB]

Named Arguments

--public-data-dir

Path to the Crossref public data directory.

--db-dir

Path to the directory in which to write the database files.

--start-from-file-num

Begin processing from this file number in the public data archive.

Default: 0

--commit-frequency

Number of items to process between commits to the database.

Default: 20000

--compression-level

Possible choices: -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

Compression level to use for metadata: 0 disables compression, 1 to 9 select increasing levels of compression (1 is the least, 9 is the most), and -1 selects the default level (6).

Default: -1
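The reference does not name the compression library, but the level numbering above matches the convention of Python's zlib module. As a rough illustration only, and assuming zlib-style behaviour, the following sketch compares output sizes at several levels for a made-up metadata record:

```python
import json
import zlib

# Hypothetical, repetitive metadata payload used only for illustration.
record = {"DOI": "10.5555/12345678", "title": ["An example title"] * 20}
raw = json.dumps(record).encode("utf-8")

# Compressed size at a few of the documented levels (assumes zlib semantics).
sizes = {level: len(zlib.compress(raw, level)) for level in (-1, 0, 1, 6, 9)}

for level, size in sizes.items():
    print(f"level {level:>2}: {size} bytes (uncompressed: {len(raw)} bytes)")
```

With zlib, level 0 stores the data uncompressed plus a small header, so its output is slightly larger than the input; higher levels trade processing time for smaller output.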

--filter-path

Path to a Python module file containing a function for filtering DOIs. This function must be called filter_func and accept one parameter, which provides a dict-like interface to the item metadata. The function must return True to keep the item and False to filter it out.
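A minimal sketch of such a module follows. The file name (my_filter.py) is hypothetical, and the sketch assumes that items expose the Crossref "type" field and that a missing field raises KeyError, as a plain dict would; neither is confirmed by this reference.

```python
# Contents of a hypothetical filter module, e.g. my_filter.py.


def filter_func(item) -> bool:
    """Return True to keep an item, False to filter it out."""
    try:
        item_type = item["type"]
    except KeyError:
        # Assumption: a missing field raises KeyError, as with a dict.
        return False
    # Keep only journal articles; everything else is filtered out.
    return item_type == "journal-article"
```

The module path would then be passed as the value of --filter-path.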

--show-progress, --no-show-progress

Enable or disable the progress bar.

Default: True

--max-db-size-gb

Maximum size that the database can grow to, in GB. Note that the default is smaller on Windows (2 GB), because space is pre-allocated there.

Default: 2000
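As an illustration of how the create arguments fit together, the sketch below assembles an invocation as an argument list in Python. The paths and the chosen values are hypothetical; only the flags themselves come from this reference.

```python
import shlex

# Hypothetical paths; substitute your own.
public_data_dir = "/data/crossref-public-data"
db_dir = "/data/crossref-db"

create_cmd = [
    "crossref-lmdb", "create",
    "--public-data-dir", public_data_dir,
    "--db-dir", db_dir,
    "--commit-frequency", "50000",
    "--compression-level", "9",
    "--no-show-progress",
]

# Show the equivalent shell command line.
print(shlex.join(create_cmd))
```

The list can be passed to subprocess.run(create_cmd, check=True) to run the command, provided crossref-lmdb is installed and on the PATH.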

update

Update a Lightning database with new data from the web API.

crossref-lmdb update [-h] [--commit-frequency COMMIT_FREQUENCY]
                     [--compression-level {-1,0,1,2,3,4,5,6,7,8,9}]
                     [--filter-path FILTER_PATH]
                     [--show-progress | --no-show-progress]
                     [--max-db-size-gb MAX_DB_SIZE_GB] --db-dir DB_DIR
                     --email-address EMAIL_ADDRESS [--from-date FROM_DATE]
                     [--filter-arg FILTER_ARG]

Named Arguments

--commit-frequency

Number of items to process between commits to the database.

Default: 20000

--compression-level

Possible choices: -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

Compression level to use for metadata: 0 disables compression, 1 to 9 select increasing levels of compression (1 is the least, 9 is the most), and -1 selects the default level (6).

Default: -1

--filter-path

Path to a Python module file containing a function for filtering DOIs. This function must be called filter_func and accept one parameter, which provides a dict-like interface to the item metadata. The function must return True to keep the item and False to filter it out.

--show-progress, --no-show-progress

Enable or disable the progress bar.

Default: True

--max-db-size-gb

Maximum size that the database can grow to, in GB. Note that the default is smaller on Windows (2 GB), because space is pre-allocated there.

Default: 2000

--db-dir

Path to the directory containing the LMDB database files.

--email-address

Email address to provide to the Crossref web API, which is required in order to use the polite pool.

--from-date

A date from which to search for updated records, specified in YYYY[-MM[-DD]] format (i.e., month and day are optional).
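To check a value against the YYYY[-MM[-DD]] format before passing it on, something like the following sketch could be used. The regular expression and the choice to fill a missing month or day with 1 are assumptions made for illustration; the tool's own parsing may differ.

```python
import re
from datetime import date

# Assumed pattern for YYYY[-MM[-DD]]; the tool's own parsing may differ.
FROM_DATE_RE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def parse_from_date(text: str) -> date:
    """Parse YYYY[-MM[-DD]], filling a missing month or day with 1."""
    match = FROM_DATE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not in YYYY[-MM[-DD]] format: {text!r}")
    year, month, day = (int(part) if part else 1 for part in match.groups())
    # date() itself rejects impossible values such as month 13.
    return date(year, month, day)
```

For example, parse_from_date("2024-03") accepts the value, whereas parse_from_date("2024-3") raises ValueError because the month is not zero-padded.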

--filter-arg

A Crossref web API filter string for restricting DOIs.
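The update arguments can be combined in the same way. In the sketch below, the database path and email address are placeholders, and the filter string "type:journal-article" is assumed to be in the form the Crossref web API accepts for its filter parameter; the exact form expected by --filter-arg is not specified in this reference.

```python
import shlex

update_cmd = [
    "crossref-lmdb", "update",
    "--db-dir", "/data/crossref-db",         # hypothetical path
    "--email-address", "name@example.org",   # placeholder address
    "--from-date", "2024-03",                # day omitted, as the format allows
    "--filter-arg", "type:journal-article",  # assumed filter string form
]

# Show the equivalent shell command line.
print(shlex.join(update_cmd))
```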