API reference

This section documents the key functionality that is available after running import doiget_tdm. For details of the broader functionality in the package, see the source code.

class doiget_tdm.DOI(doi, unquote=False)

A representation of a Digital Object Identifier (DOI).

Parameters:
  • doi (object) – The DOI, as anything that can be converted to a string (via str).

  • unquote (bool) – If True, convert special characters in doi from ‘quoted’ form (e.g., where the ‘/’ character is represented by ‘%2F’) into ‘unquoted’ form.

property parts

Decompose a DOI into its prefix and suffix.

Returns:

A namedtuple with prefix and suffix attributes.

Return type:

doiget_tdm.doi.DOIParts
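
As an illustration of this decomposition, the following is a minimal sketch that assumes the prefix and suffix are separated at the first ‘/’ character. The split_doi function and the local DOIParts definition are hypothetical stand-ins and are not taken from the package.

```python
from collections import namedtuple

# Hypothetical stand-in for doiget_tdm.doi.DOIParts.
DOIParts = namedtuple("DOIParts", ["prefix", "suffix"])


def split_doi(doi):
    """Split a DOI string at its first '/' (an assumed rule)."""
    prefix, suffix = str(doi).split("/", 1)
    return DOIParts(prefix=prefix, suffix=suffix)


parts = split_doi("10.1000/abc/def")
print(parts.prefix)  # 10.1000
print(parts.suffix)  # abc/def
```

Splitting only once keeps any further ‘/’ characters inside the suffix.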

property quoted

A ‘quoted’ version of the DOI in which special characters are replaced.

Return type:

str
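
The difference between the ‘quoted’ and ‘unquoted’ forms can be sketched with the standard library's percent-encoding functions. This assumes that the package's quoting behaves like urllib.parse.quote with no characters marked as safe; the set of characters that the package actually replaces may differ.

```python
from urllib.parse import quote, unquote

doi = "10.1000/abc"

# safe="" means that '/' is also percent-encoded (an assumption about
# the form that the package produces).
quoted = quote(doi, safe="")
print(quoted)  # 10.1000%2Fabc

# Unquoting reverses the encoding.
print(unquote(quoted))  # 10.1000/abc
```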

static from_url(url, unquote=True)

Create a DOI object from a URL containing a DOI.

Parameters:
  • url (str) – The URL containing the DOI.

  • unquote (bool) – If True, convert special characters in the DOI from ‘quoted’ form (e.g., where the ‘/’ character is represented by ‘%2F’) into ‘unquoted’ form.

Returns:

A DOI object from the extracted DOI.

Return type:

DOI

Notes

  • This uses a regular expression to find the DOI in the URL. The expression matches most, but not all, DOIs.
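
The regular expression approach can be sketched as follows. The pattern is an assumption modelled on a commonly used form for modern DOIs and is not necessarily the expression that the package uses; find_doi_in_url is a hypothetical helper that returns a plain string rather than a DOI object.

```python
import re
from urllib.parse import unquote as url_unquote

# Assumed pattern: '10.', a 4 to 9 digit registrant code, '/', then a
# run of characters that commonly appear in DOI suffixes.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_doi_in_url(url, unquote=True):
    """Return the first DOI-like substring of url, or None."""
    if unquote:
        url = url_unquote(url)
    match = DOI_PATTERN.search(url)
    return match.group(0) if match else None


print(find_doi_in_url("https://doi.org/10.1000%2Fabc123"))  # 10.1000/abc123
print(find_doi_in_url("https://example.org/about"))  # None
```

A DOI whose suffix contains characters outside the assumed set would be truncated or missed by this pattern, which is the kind of failure that the note above describes.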

get_group(n_groups)

Determine the ‘group’ to which a DOI belongs.

This assigns a given DOI to a group between 0 and n_groups - 1 based on a hash of its characters.

Parameters:

n_groups (int | None) – The total number of groups that can be assigned.

Returns:

The group as a string of digits, or an empty string if n_groups is None.

Return type:

str
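
One way that a hash-based assignment of this kind could work is sketched below. The choice of SHA-256, the modulo step, and the zero-padding are all assumptions made for illustration; assign_group is a hypothetical function and the package's own hash and output format may differ.

```python
import hashlib


def assign_group(doi, n_groups):
    """Map a DOI string to a group between 0 and n_groups - 1."""
    if n_groups is None:
        return ""
    # A cryptographic hash gives the same result on every run, unlike
    # the built-in hash(), which is salted per process for strings.
    digest = hashlib.sha256(doi.encode("utf-8")).hexdigest()
    group = int(digest, 16) % n_groups
    # Zero-pad so that all group names have the same width (assumed).
    width = len(str(n_groups - 1))
    return str(group).zfill(width)


print(assign_group("10.1000/abc", 100))
print(repr(assign_group("10.1000/abc", None)))  # ''
```

Because the group depends only on the DOI and n_groups, the same DOI is always assigned to the same group.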

class doiget_tdm.Work(doi)

Representation of a single item of work.

Parameters:

doi (doiget_tdm.doi.DOI) – Item DOI.

doi

Type: DOI

Item DOI

metadata

Type: Metadata

Item metadata

fulltext

Type: FullText

Item full-text

path

Item path in data directory

doiget_tdm.iter_unsorted_works(test_if_valid_work=None)

Iterate over the works in the data directory, in unsorted order.

Parameters:

test_if_valid_work (Callable[[Work], bool] | None) – Function that is called on each work; a work is skipped if the function returns False.

Returns:

An iterable that yields works within the data directory.

Return type:

typing.Iterable[doiget_tdm.work.Work]
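
The role of test_if_valid_work can be sketched with a stand-in iterator. FakeWork and iter_filtered are hypothetical and only imitate the filtering behaviour described above; in particular, yielding every work when the test function is None is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass
class FakeWork:
    """Hypothetical stand-in for a work; not the package's Work class."""

    doi: str
    has_fulltext: bool


def iter_filtered(
    works: Iterable[FakeWork],
    test_if_valid_work: Optional[Callable[[FakeWork], bool]] = None,
) -> Iterator[FakeWork]:
    """Yield works, skipping any for which the test returns False."""
    for work in works:
        if test_if_valid_work is not None and not test_if_valid_work(work):
            continue
        yield work


works = [FakeWork("10.1000/a", True), FakeWork("10.1000/b", False)]
kept = list(iter_filtered(works, lambda work: work.has_fulltext))
print([work.doi for work in kept])  # ['10.1000/a']
```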