
Persistent Identifiers (PIDs): Making Research Findable and Connected

Topic, definition and scope

  • Persistent identifiers (PIDs) are globally unique, actionable and machine-resolvable strings that act as a long-lasting reference to a digital object (e.g. a dataset).
  • Six examples of PIDs are included in the table below (more are available within FAIRsharing’s identifier schema sub-registry and within the Bioregistry).
  • Find out more about standards, including PIDs, in FAIRsharing’s standards factsheet.
  • Bioregistry catalogues identifier resources (e.g., DOI, ORCID, ROR) that assign PIDs. It stores metadata such as their base URL, local unique identifier regular expression pattern, preferred prefix for semantic web contexts, mappings to other registries (e.g., FAIRsharing, BARTOC), and more. Bioregistry has the benefit of being fully open source and having fully open CC0 data to promote community curation and maintenance.
Full name | Acronym | PID for | Registration | Resolver / base URL | FAIRsharing record
Digital Object Identifier | DOI | Digital objects (e.g. research data, text publication) | Through a DOI Registration Agency | https://doi.org/ | https://doi.org/10.25504/FAIRsharing.hFLKCn
Open Researcher and Contributor ID | ORCID | Scientists (independent of name, institutional and country changes) | Self-registration | https://orcid.org/ | https://doi.org/10.25504/FAIRsharing.nx58jg
Research Organization Registry | ROR ID | Research institutions | On request via form: https://docs.google.com/forms/d/e/1FAIpQLSdJYaMTCwS7muuTa-B_CnAtCSkKzt19lkirAKG4u7umH9Nosg/viewform | https://ror.org/ | https://doi.org/10.25504/FAIRsharing.f73143
Data Management Plan ID | DMP-ID | Data Management Plans (DMPs) | As a DOI (resourceTypeGeneral = “OutputsManagementPlan”) | https://doi.org/ ? | /
International Generic Sample Number | IGSN ID | Physical objects | Through DataCite | https://www.igsn.org/ | https://doi.org/10.25504/FAIRsharing.c7f365
Research Activity Identifier | RAiD | Research projects | API or manual minting? | https://raid.org/ | https://doi.org/10.25504/FAIRsharing.dc702a
  • The benefits of assigning PIDs are numerous:
    • Disambiguation (e.g. telling apart two researchers who share the same first and last names by their ORCID iDs)
    • Increased citation and reach of research outputs
    • Contribution to making research data FAIR (see section “FAIR element(s)” for more details)
    • Permanent identifiability, referenceability and linkage of scientific outputs, people, institutions and funders
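
The syntax of some of the PIDs listed above can be checked programmatically before a resolvable URL is built. The following is a minimal sketch, not an official validator: the regular expressions are simplified assumptions (they are not the patterns stored in Bioregistry), and the names `PID_SCHEMES`, `orcid_check_digit`, `is_valid` and `to_url` are hypothetical. The ORCID check uses the ISO 7064 MOD 11-2 checksum that ORCID documents for the last character of an iD.

```python
import re

# Simplified, illustrative patterns (assumptions, not the official ones).
PID_SCHEMES = {
    "doi": {
        "base_url": "https://doi.org/",
        "pattern": re.compile(r"^10\.\d{4,9}/\S+$"),
    },
    "orcid": {
        "base_url": "https://orcid.org/",
        "pattern": re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$"),
    },
    "ror": {
        "base_url": "https://ror.org/",
        "pattern": re.compile(r"^0[0-9a-z]{6}\d{2}$"),
    },
}


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid(scheme: str, local_id: str) -> bool:
    """Check the local identifier against the pattern (and checksum for ORCID)."""
    entry = PID_SCHEMES[scheme]
    if not entry["pattern"].match(local_id):
        return False
    if scheme == "orcid":
        digits = local_id.replace("-", "")
        return orcid_check_digit(digits[:-1]) == digits[-1]
    return True


def to_url(scheme: str, local_id: str) -> str:
    """Build the resolvable URL from the scheme's base URL and a valid identifier."""
    if not is_valid(scheme, local_id):
        raise ValueError(f"{local_id!r} does not look like a valid {scheme} identifier")
    return PID_SCHEMES[scheme]["base_url"] + local_id


if __name__ == "__main__":
    print(to_url("doi", "10.5281/zenodo.3333025"))
    print(to_url("ror", "03vek6s52"))
    # 0000-0002-1825-0097 is the fictitious example record used in ORCID documentation.
    print(to_url("orcid", "0000-0002-1825-0097"))
```

A pattern match only shows that a string is well-formed; it does not show that the identifier has been registered or that it resolves.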

FAIR element(s)

(from the FAIR data maturity model: https://doi.org/10.5334/dsj-2020-041)

  • Findable
    • F1 RDA-F1-01M Metadata is identified by a persistent identifier (essential)
    • F1 RDA-F1-01D Data is identified by a persistent identifier (essential)
    • F3 RDA-F3-01M Metadata includes the identifier of the data (essential)
  • Accessible
    • A1 RDA-A1-03M Metadata identifier resolves to a metadata record (essential)
    • A1 RDA-A1-03D Data identifier resolves to a digital object (essential)
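
The A1 indicators above can be illustrated with DOI content negotiation: the same DOI URL that sends a browser to a landing page can return a machine-readable metadata record when the client asks for one. The sketch below is hedged: it assumes the DOI's registration agency supports the CSL JSON media type (Crossref and DataCite document this, but support can vary), and the function names are hypothetical. `metadata_includes_identifier` is a rough stand-in for the F3 indicator, assuming the returned record carries a `DOI` field.

```python
import json
import urllib.request

DOI_RESOLVER = "https://doi.org/"
# Media type for CSL JSON metadata in DOI content negotiation.
CSL_JSON = "application/vnd.citationstyles.csl+json"


def build_metadata_request(doi: str) -> urllib.request.Request:
    """Prepare a request that asks the resolver for metadata instead of a landing page."""
    return urllib.request.Request(DOI_RESOLVER + doi, headers={"Accept": CSL_JSON})


def fetch_metadata(doi: str, timeout: float = 10.0) -> dict:
    """Send the request (needs an internet connection) and parse the JSON response."""
    with urllib.request.urlopen(build_metadata_request(doi), timeout=timeout) as resp:
        return json.load(resp)


def metadata_includes_identifier(metadata: dict, doi: str) -> bool:
    """F3-style check: does the metadata record contain the DOI of the object?"""
    return str(metadata.get("DOI", "")).lower() == doi.lower()


if __name__ == "__main__":
    doi = "10.5281/zenodo.3333025"  # dataset DOI used as an example in this lesson
    record = fetch_metadata(doi)
    print(record.get("title"))
    print(metadata_includes_identifier(record, doi))
```

DOIs are case-insensitive, which is why the comparison lowercases both sides.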

Summary of Tasks / Actions

  1. Present an example where disambiguation is needed (e.g. two authors with the same name). Identify additional entities that might benefit from being assigned a PID (e.g. research data, text publication, institutions). Finally, define PIDs together.
  2. Present widely-used PIDs and what their syntax looks like:
     • DOI
     • ORCID
     • ROR
     • DMP-ID
     • IGSN ID
     • RAiD
     • More are available within FAIRsharing’s identifier schema sub-registry
  3. Examples of use cases:
     • Data repositories
  4. Show the importance of PIDs for FAIR data by referring to the FAIR elements mentioned in the section “FAIR element(s)”.
  5. Show how to use PIDs to access research data and other resources:
     • Dataset (e.g. DOI: 10.5281/zenodo.3333025)
     • Text publication (e.g. DOI: 10.5281/zenodo.6674301)
     • Data management plan (e.g. DOI: 10.5281/zenodo.5995707)
     • Physical sample (e.g. IGSN ID: AU1243)
     • Resource descriptions (e.g. databases, standards, policies) through FAIRsharing DOIs, e.g. Dryad https://doi.org/10.25504/FAIRsharing.wkggtx
     • Organisations (https://ror.org/): ROR IDs and associated metadata, including parent-child relationships, e.g. Harvard https://ror.org/03vek6s52
     • Research project
  6. Explain how to receive a PID for research outputs:
     • Repositories (e.g. Zenodo)
     • PID minting
  7. Show that proper use of PIDs supports collaboration across facilities, disciplines, institutions and countries. Examples:
     • Pesant, S. et al. Open science resources for the discovery and analysis of Tara Oceans data. Sci. Data 2:150023 doi: 10.1038/sdata.2015.23 (2015)
     • Where PIDs are used for terminologies (e.g. ontologies), they allow unambiguous naming and labelling of things, which in turn allows for practical data sharing and integration and, for instance, knowledge graphs.
  8. Provenance and versioning (see R1.2):
     • Define resource and metadata provenance.
     • Explain why provenance information is an important aspect of FAIR data.
     • Find out together how PIDs can contribute to provenance.
     • Define dataset versioning and dynamic datasets.
     • Explain how PIDs are used in relation to different versions of a dataset or dynamic datasets.
     • Versioning exercise.
  9. Introduce PID graphs and explain their importance with a use case (examples of use cases can be found here: https://github.com/datacite/freya/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aopen+label%3A%22PID+Graph%22++label%3A%22user+story%22+)
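
Tasks 8 and 9 can be made concrete with a toy PID graph, in which PIDs are nodes and typed relations are edges. This is only an illustrative sketch: the `PIDGraph` class is hypothetical, the DOIs under the `10.1234` prefix are made-up placeholders, and the links between them are invented for the example. `IsPreviousVersionOf` and `IsCitedBy` are relation types from the DataCite Metadata Schema, whereas `createdBy` is an assumed label used here for a creator link.

```python
from collections import defaultdict, deque


class PIDGraph:
    """A tiny directed graph: source PID -> list of (relation, target PID)."""

    def __init__(self):
        self._edges = defaultdict(list)

    def add_relation(self, source: str, relation: str, target: str) -> None:
        self._edges[source].append((relation, target))

    def related(self, source: str, relation: str) -> list:
        """Return the targets linked from `source` by the given relation."""
        return [t for r, t in self._edges.get(source, []) if r == relation]

    def reachable(self, start: str) -> set:
        """Breadth-first search: every PID connected downstream of `start`."""
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for _, target in self._edges.get(node, []):
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
        seen.discard(start)
        return seen

    def latest_version(self, pid: str) -> str:
        """Follow IsPreviousVersionOf links until no newer version is recorded."""
        seen = {pid}
        while True:
            newer = self.related(pid, "IsPreviousVersionOf")
            if not newer or newer[0] in seen:
                return pid
            pid = newer[0]
            seen.add(pid)


# Placeholder identifiers and invented links, for illustration only.
graph = PIDGraph()
graph.add_relation("doi:10.1234/data.v1", "IsPreviousVersionOf", "doi:10.1234/data.v2")
graph.add_relation("doi:10.1234/data.v2", "IsCitedBy", "doi:10.1234/article")
graph.add_relation("doi:10.1234/data.v1", "createdBy", "orcid:0000-0002-1825-0097")

if __name__ == "__main__":
    print(graph.latest_version("doi:10.1234/data.v1"))
    print(sorted(graph.reachable("doi:10.1234/data.v1")))
```

Starting from the first dataset version, the traversal reaches the newer version, the citing article and the creator. This kind of connected view is what a PID graph is meant to provide at scale.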

Materials / Equipment

  • Personal computer
  • Internet connection
  • Browser

References


Take home tasks/preparation


Lesson content

LO | Activity | Time | Type | Level
Before the lesson
LO: 1

Activity: As an instructor, ask participants to read the FAIR Principles (FAIR Principles - GO FAIR). Ideally, this happens before the lesson takes place.

Time: 15
Type: Individual Activity
During the lesson
LO: 3

Activity: Each participating group receives different material for the exercise:

  • Group 1 gets a dataset without a DOI;
  • Group 2 gets an author without an ORCID iD;
  • Group 3 gets a dataset with author, ORCID iD, and DOI.

Participants experience what can go wrong without PIDs (without necessarily knowing which PID was needed).

Time: 15
Type: Group Activity
LO: 1

Activity: The participants will learn about:

  • The importance of PIDs
  • Why PIDs are required for data and metadata to be Findable (tying in with the FAIR principles)
  • Examples of PIDs within different settings (datasets, publications, people, organizations)
  • How a PID can be created (use an example commonly used within your institute, e.g., Zenodo, DataverseNL, 4TU.ResearchData)

Additionally, this activity helps participants reach the following learning outcomes:

#1 Define and recognise PIDs (basic level);

#2 Explain the syntax of widely-used PIDs (basic level);

#3 Explain the different use cases for PIDs (basic level);

#4 Explain the importance of PIDs for FAIR data (basic level);

#5 Use PIDs to access research data or other resources (basic level).

Time: 20
Type: Lecture
LO: 6

Activity: The participants will discuss how PIDs are connected to their research, open science (OS) policies, funder requirements, long-term preservation, collaborations with partners, etc.

Identify research outputs

  • Ask participants to think about the types of outputs or entities involved in their research, for example:
    • datasets
    • publications
    • software or code
    • protocols or workflows
    • researchers (themselves or collaborators)
    • institutions or projects
  • Discuss:
    • Which of these outputs already have persistent identifiers?
    • Which ones could or should have one but currently do not?

Reflect on collaboration and sharing

  • How could PIDs help when sharing data with collaborators?
  • How could they help others find, cite, or reuse their work?
  • Are there any funder or institutional requirements/guidelines related to PIDs?

Additionally, participants will work towards the following learning outcomes:

#6 Apply PIDs to their own research outputs (intermediate level);

#7 Use PIDs to collaborate with others (intermediate level).

Time: 20
Type: Group exercise
After the lesson