    Derived dataset

    About derived datasets

    Derived datasets are citable records of GBIF-mediated occurrence data derived from one of the following:

    • a GBIF.org download that has been filtered/reduced significantly, or
    • data accessed through a cloud service, e.g. Microsoft AI for Earth (Azure), or
    • data obtained by any means for which no DOI was assigned, but one is required (e.g. third-party tools accessing the GBIF search API)

    When created, a derived dataset is assigned a unique DOI that can be used to cite the data. To create a derived dataset, you will need to authenticate using a GBIF.org account and provide:

    • a title for the dataset,
    • a list of the GBIF datasets (by DOI or datasetKey) from which the data originated, ideally with counts of how many records each dataset contributed,
    • a persistent URL of where the extracted dataset can be accessed,
    • a description of how the dataset was prepared,
    • (optional) the GBIF download DOI, if the dataset is derived from an existing download, and
    • (optional) a date on which the derived dataset should be registered, if not immediately.
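
    The per-dataset record counts mentioned in the list above can be tallied directly from the extracted data. The following is a minimal sketch, assuming the extract is a tab-delimited table with a datasetKey column; the function name, the column name default, and the sample keys are illustrative assumptions, not values from any real download.

```python
import csv
import io
from collections import Counter


def count_records_per_dataset(rows, key_column="datasetKey"):
    """Count how many records each source dataset contributed.

    `rows` is any iterable of dict-like records (e.g. a csv.DictReader).
    `key_column` is assumed to hold the GBIF dataset key for each record.
    Rows with a missing or empty key are skipped.
    """
    counts = Counter()
    for row in rows:
        key = (row.get(key_column) or "").strip()
        if key:
            counts[key] += 1
    return dict(counts)


# Tiny in-memory sample standing in for a filtered extract (made-up keys).
SAMPLE_TSV = (
    "gbifID\tdatasetKey\n"
    "1\tdataset-key-a\n"
    "2\tdataset-key-a\n"
    "3\tdataset-key-b\n"
)

reader = csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t")
sample_counts = count_records_per_dataset(reader)
print(sample_counts)
```

    For a real extract, the same function can be applied to a csv.DictReader opened on the file, provided the column holding the dataset key is named accordingly.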

    You can also use the GBIF API or the rgbif R package to create derived datasets.
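
    As a rough illustration of the API route, the sketch below assembles the information listed above into a JSON request body and prepares (but does not send) an authenticated request. The endpoint URL and the field names (title, description, sourceUrl, relatedDatasets, originalDownloadDOI, registrationDate) are assumptions based on the items this page asks for and should be checked against the GBIF API documentation; all example values are placeholders.

```python
import base64
import json
import urllib.request

# Assumed endpoint; verify against the GBIF API documentation before use.
API_URL = "https://api.gbif.org/v1/derivedDataset"


def build_payload(title, description, source_url, related_datasets,
                  original_download_doi=None, registration_date=None):
    """Assemble the request body for a derived dataset (assumed field names).

    `related_datasets` maps each source dataset (DOI or datasetKey)
    to the number of records it contributed.
    """
    if not (title and description and source_url):
        raise ValueError("title, description and source_url are required")
    if not related_datasets:
        raise ValueError("at least one source dataset is required")
    payload = {
        "title": title,
        "description": description,
        "sourceUrl": source_url,
        "relatedDatasets": dict(related_datasets),
    }
    # Optional items are included only when supplied.
    if original_download_doi:
        payload["originalDownloadDOI"] = original_download_doi
    if registration_date:
        payload["registrationDate"] = registration_date
    return payload


def build_request(payload, username, password):
    """Prepare a POST request with HTTP Basic auth; nothing is sent here."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Placeholder values only; none of these identify real datasets.
example_payload = build_payload(
    title="Example filtered occurrence extract",
    description="Records filtered to a study region and deduplicated.",
    source_url="https://example.org/my-extract",
    related_datasets={"dataset-key-a": 2, "dataset-key-b": 1},
)
example_request = build_request(example_payload, "my_user", "my_password")
```

    Passing the prepared request to urllib.request.urlopen would submit it. Keeping that step separate makes it possible to inspect the body first, which matters because a registered DOI is meant to be a persistent citation.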