Persistent Identifiers (PIDs): Making Research Findable and Connected

Status

FAIR elements

Findability

For this lesson plan, participants should have a foundational understanding of:

Basic knowledge of the FAIR principles and of what can be done to improve the FAIRness of data.

After completing this lesson plan, participants will be able to:

Level	Learning outcome
Beginner	Define what a Persistent Identifier (PID) is.
Beginner	Recognize common PID systems used in research (e.g., DOI, ORCID, ROR).
Beginner	Explain the role of PIDs in making research outputs Findable within the FAIR principles.
Beginner	Describe how PIDs help address issues such as broken links, ambiguous authorship, and inaccessible datasets.
Intermediate	Explain the basic structure and syntax of commonly used identifiers (e.g., DOI or ORCID).
Intermediate	Use a PID to locate a research output, dataset, or researcher.
Beginner	Identify which type of PID should be assigned in different research scenarios (e.g., dataset, author, institution).
Expert	Analyze research scenarios to determine how missing or incorrect identifiers affect research visibility, reproducibility, and reuse.
Expert	Apply knowledge of PIDs to identify where they can be used in their own research workflows.

Topic, definition and scope

Persistent identifiers (PIDs) are globally unique, actionable and machine-resolvable strings that act as a long-lasting reference to a resource, such as a digital object (e.g. a dataset), a person, an organisation or a physical sample.
Six examples of PIDs are included in the table below (more are available within FAIRsharing’s identifier schema sub-registry and within the Bioregistry).
Find out more about standards, including PIDs, in FAIRsharing’s standards factsheet.
The Bioregistry catalogues identifier resources (e.g., DOI, ORCID, ROR) that assign PIDs. For each resource it stores metadata such as the base URL, the regular expression pattern for local unique identifiers, the preferred prefix for semantic web contexts, mappings to other registries (e.g., FAIRsharing, BARTOC), and more. The Bioregistry is fully open source and releases its data under CC0, which promotes community curation and maintenance.
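To illustrate what a registry entry with a base URL and an identifier pattern makes possible, the sketch below validates a local identifier against a pattern and expands it into an actionable URL. The `REGISTRY` dictionary and its regular expressions are simplified assumptions written for this lesson, not values copied from the Bioregistry; in practice you would look these values up in the registry itself.

```python
import re

# Hypothetical, simplified registry entries written for this lesson.
# Real registries such as the Bioregistry publish their own curated values.
REGISTRY = {
    "doi": {
        "base_url": "https://doi.org/",
        # "10." + numeric registrant code + "/" + any non-space suffix
        "pattern": r"10\.\d{4,9}/\S+",
    },
    "orcid": {
        "base_url": "https://orcid.org/",
        # Four hyphenated blocks; the final character may be "X"
        "pattern": r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]",
    },
    "ror": {
        "base_url": "https://ror.org/",
        # A leading "0", six lowercase letters or digits, then two digits
        "pattern": r"0[a-z0-9]{6}\d{2}",
    },
}


def to_actionable_url(identifier_type: str, local_id: str) -> str:
    """Check a local identifier against its pattern and prepend the base URL."""
    entry = REGISTRY.get(identifier_type.lower())
    if entry is None:
        raise KeyError(f"Unknown identifier type: {identifier_type}")
    if re.fullmatch(entry["pattern"], local_id) is None:
        raise ValueError(f"{local_id!r} does not look like a valid {identifier_type}")
    return entry["base_url"] + local_id


print(to_actionable_url("ror", "03vek6s52"))  # https://ror.org/03vek6s52
print(to_actionable_url("doi", "10.5281/zenodo.3333025"))  # https://doi.org/10.5281/zenodo.3333025
```

A pattern match only shows that an identifier is well formed; whether it has actually been registered can only be confirmed by resolving it.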

Full name	Acronym	PID for	Registration	Resolver / base URL	FAIRsharing record
Digital Object Identifier	DOI	Digital objects (e.g. research data, text publications)	Through a DOI Registration Agency	https://doi.org/	https://doi.org/10.25504/FAIRsharing.hFLKCn
Open Researcher and Contributor ID	ORCID	Scientists (independent of name, institutional and country changes)	Self-registration	https://orcid.org/	https://doi.org/10.25504/FAIRsharing.nx58jg
Research Organization Registry	ROR ID	Research institutions	On request via form: https://docs.google.com/forms/d/e/1FAIpQLSdJYaMTCwS7muuTa-B_CnAtCSkKzt19lkirAKG4u7umH9Nosg/viewform	https://ror.org/	https://doi.org/10.25504/FAIRsharing.f73143
Data Management Plan ID	DMP-ID	Data Management Plans (DMPs)	As a DOI (resourceTypeGeneral = “OutputManagementPlan”)	https://doi.org/	(none)
International Generic Sample Number	IGSN ID	Physical objects	Through DataCite	https://www.igsn.org/	https://doi.org/10.25504/FAIRsharing.c7f365
Research Activity Identifier	RAiD	Research projects	Via API or manual minting	https://raid.org/	https://doi.org/10.25504/FAIRsharing.dc702a

The benefits of assigning PIDs include:
- Disambiguation (e.g. between two researchers who have the same first and last names, using their ORCID iDs)
- Increased citation and reach of research outputs
- A contribution to making research data FAIR (see section “FAIR element(s)” for more details)
- Permanent identification, referencing and linking of scientific outputs, people, institutions and funders

FAIR element(s)

(from the FAIR data maturity model: https://doi.org/10.5334/dsj-2020-041)

Findable
- F1 RDA-F1-01M Metadata is identified by a persistent identifier (essential)
- F1 RDA-F1-01D Data is identified by a persistent identifier (essential)
- F3 RDA-F3-01M Metadata includes the identifier of the data (essential)
Accessible
- A1 RDA-A1-03M Metadata identifier resolves to a metadata record (essential)
- A1 RDA-A1-03D Data identifier resolves to a digital object (essential)

Summary of Tasks / Actions

Present an example where disambiguation is needed (e.g. two authors with the same name). Identify additional entities that might benefit from being assigned a PID (e.g. research data, text publication, institutions). Finally, define PIDs together.
Present widely used PIDs and what their syntax looks like:
- DOI
- ORCID
- ROR
- DMP-ID
- IGSN ID
- RAiD
- More are available within FAIRsharing’s identifier schema sub-registry
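Identifier syntax often includes built-in error detection. An ORCID iD, for example, has 16 characters in four hyphenated blocks, and the last character is a check value computed with the ISO 7064 MOD 11-2 algorithm (a value of 10 is written as "X"). The sketch below follows the algorithm ORCID documents publicly and checks both the format and the check character. It cannot tell you whether the iD is actually registered; only the ORCID registry can. The sample iD 0000-0002-1825-0097 belongs to ORCID's fictitious test researcher Josiah Carberry.

```python
import re

# Four hyphenated blocks of digits; the very last character may be "X".
ORCID_FORMAT = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    remainder = total % 11
    result = (12 - remainder) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the format and check character of a bare or URL-form ORCID iD."""
    orcid = orcid.strip().removeprefix("https://orcid.org/")
    if not ORCID_FORMAT.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False: a single mistyped digit is caught
```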
Examples of use cases:
- Data repositories
  - Example of an actionable PID (= resolver + PID):
    - Resolver: https://doi.org/
    - DOI: 10.5281/zenodo.3333025
    - The actionable PID https://doi.org/10.5281/zenodo.3333025 resolves to the repository's landing page showing the metadata: https://zenodo.org/record/3333025. The “real” data can be downloaded from this page.
- Enabling compute workflows (e.g. https://doi.org/10.12688/f1000research.12168.1)
- Identifying code (including chunks of code)
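An actionable PID is the resolver's base URL followed by the identifier. For DOIs, the identifier itself has two parts separated by the first "/": a prefix that starts with "10." and identifies the registrant (10.5281 is Zenodo's), and a suffix chosen by the registrant. The sketch below assumes a DOI written bare, with a `doi:` label, or as a doi.org / dx.doi.org link, splits it into prefix and suffix, and rebuilds the canonical https://doi.org/ link. The function names are illustrative, not part of any library.

```python
RESOLVER = "https://doi.org/"

# Common ways a DOI is written; compared case-insensitively below.
KNOWN_FORMS = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def split_doi(text: str) -> tuple[str, str]:
    """Return the (prefix, suffix) of a DOI written bare or in one of KNOWN_FORMS."""
    doi = text.strip()
    for form in KNOWN_FORMS:
        if doi.lower().startswith(form):
            doi = doi[len(form):].strip()
            break
    # The prefix ends at the first "/"; everything after it is the suffix.
    prefix, slash, suffix = doi.partition("/")
    if not prefix.startswith("10.") or not slash or not suffix:
        raise ValueError(f"Not a DOI: {text!r}")
    return prefix, suffix


def actionable_doi(text: str) -> str:
    """Build the canonical actionable link: resolver + prefix + "/" + suffix."""
    # Suffixes containing characters such as "#" or "?" would need
    # percent-encoding before use in a URL; this sketch does not do that.
    prefix, suffix = split_doi(text)
    return f"{RESOLVER}{prefix}/{suffix}"


print(split_doi("doi:10.5281/zenodo.3333025"))  # ('10.5281', 'zenodo.3333025')
print(actionable_doi("https://dx.doi.org/10.5281/zenodo.3333025"))  # https://doi.org/10.5281/zenodo.3333025
```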
Show the importance of PIDs for FAIR data by referring to the FAIR elements mentioned in the section “FAIR element(s)”.
How can PIDs be used to access research data and other resources? Examples:
- Dataset (e.g. DOI: 10.5281/zenodo.3333025)
- Text publication (e.g. DOI: 10.5281/zenodo.6674301)
- Data management plan (e.g. DOI: 10.5281/zenodo.5995707)
- Physical sample (e.g. IGSN ID: AU1243)
- Resource descriptions (e.g. databases, standards, policies) through FAIRsharing DOIs, e.g. Dryad https://doi.org/10.25504/FAIRsharing.wkggtx
- Organisations: ROR IDs (https://ror.org/) and their associated metadata, including parent-child relationships, e.g. Harvard University https://ror.org/03vek6s52
- Research project
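The same actionable DOI that opens a landing page in a browser can also return machine-readable metadata. DOI resolvers support content negotiation for DOIs registered with agencies such as DataCite and Crossref: the client states the format it wants in the HTTP `Accept` header. The sketch below uses only the Python standard library and requests CSL JSON (`application/vnd.citationstyles.csl+json`); the helper names are illustrative, and the fields returned depend on the registration agency.

```python
import json
import urllib.request

# Media type for CSL JSON, one of the formats offered through DOI content negotiation.
CSL_JSON = "application/vnd.citationstyles.csl+json"


def metadata_request(doi: str) -> urllib.request.Request:
    """Build a request asking the DOI resolver for metadata instead of a web page."""
    return urllib.request.Request("https://doi.org/" + doi, headers={"Accept": CSL_JSON})


def fetch_metadata(doi: str, timeout: float = 10.0) -> dict:
    """Resolve a DOI and parse the returned CSL JSON (requires network access)."""
    # urlopen follows the resolver's redirects; the Accept header is kept on each hop.
    with urllib.request.urlopen(metadata_request(doi), timeout=timeout) as response:
        return json.load(response)
```

With network access, `fetch_metadata("10.5281/zenodo.3333025").get("title")` should return the title of the Zenodo dataset used as an example above, since Zenodo registers its DOIs with DataCite.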
Explain how to obtain a PID for research outputs:
- Repositories (e.g. Zenodo)
- PID minting
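Most researchers receive a DOI through a repository such as Zenodo, which mints it on their behalf. Organisations that hold an account with a registration agency can also mint DOIs directly. The sketch below builds, but does not send, a request for a draft DOI on DataCite's test REST API, using the JSON:API body the API expects; when only a prefix is supplied, DataCite generates the suffix. The prefix 10.5072, the repository ID and the password are placeholders, and the example uses the resourceTypeGeneral value `OutputManagementPlan` from recent versions of the DataCite Metadata Schema, which is how a DMP can be registered as a DOI. Publishing a findable DOI requires further mandatory metadata not shown here.

```python
import base64
import json
import urllib.request

# DataCite's test system; production uses https://api.datacite.org/dois.
TEST_API = "https://api.test.datacite.org/dois"


def draft_doi_payload(prefix: str, title: str, resource_type_general: str = "Dataset") -> dict:
    """JSON:API body for a draft DOI; DataCite generates a suffix when only a prefix is given."""
    return {
        "data": {
            "type": "dois",
            "attributes": {
                "prefix": prefix,
                "titles": [{"title": title}],
                "types": {"resourceTypeGeneral": resource_type_general},
            },
        }
    }


def draft_doi_request(payload: dict, repository_id: str, password: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST request for the payload."""
    credentials = base64.b64encode(f"{repository_id}:{password}".encode()).decode()
    return urllib.request.Request(
        TEST_API,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/vnd.api+json",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )


# Placeholder values: 10.5072 stands in for the prefix assigned to your repository.
payload = draft_doi_payload("10.5072", "Example data management plan", "OutputManagementPlan")
request = draft_doi_request(payload, "EXAMPLE.REPOSITORY", "change-me")
```

Passing `request` to `urllib.request.urlopen` would only succeed with real test-account credentials and a prefix assigned to that account.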
Show that proper use of PIDs supports collaboration across facilities, disciplines, institutions and countries. Examples:
- Pesant, S. et al. Open science resources for the discovery and analysis of Tara Oceans data. Sci. Data 2:150023 doi: 10.1038/sdata.2015.23 (2015)
- Where PIDs are used for terminologies (e.g. ontologies), they allow things to be named and labelled unambiguously, which in turn enables practical data sharing and integration and, for instance, knowledge graphs.
Provenance and versioning (see R1.2):
- Define resource and metadata provenance.
- Explain why provenance information is an important aspect of FAIR data.
- Find out together how PIDs can contribute to provenance.
- Define dataset versioning and dynamic datasets.
- Explain how PIDs are used in relation to different versions of a dataset or dynamic datasets.
- Versioning exercise.
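A common versioning pattern, used for example by Zenodo, is to assign a new DOI to every version of a dataset plus one concept DOI that represents all versions and resolves to the latest one. Citing a version DOI pins the exact data that was used, which matters for provenance and reproducibility, while the concept DOI keeps working as new versions appear. The sketch below models this in memory for the versioning exercise; the `VersionedDataset` class and the 10.5072 DOIs are hypothetical placeholders, not any repository's API.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedDataset:
    """One concept DOI for the dataset as a whole, plus one DOI per version."""
    concept_doi: str
    versions: dict[str, str] = field(default_factory=dict)  # label -> version DOI

    def add_version(self, label: str, doi: str) -> None:
        # A new version always gets a new DOI; labels and DOIs are never reused.
        if label in self.versions:
            raise ValueError(f"Version {label} already exists")
        if doi == self.concept_doi or doi in self.versions.values():
            raise ValueError(f"{doi} is already in use")
        self.versions[label] = doi

    def resolve(self, doi: str) -> str:
        """Return the version label a DOI refers to; the concept DOI means 'latest'."""
        if doi == self.concept_doi:
            if not self.versions:
                raise LookupError("No versions published yet")
            return list(self.versions)[-1]  # dicts keep insertion order
        for label, version_doi in self.versions.items():
            if version_doi == doi:
                return label
        raise KeyError(doi)


dataset = VersionedDataset("10.5072/example-concept")
dataset.add_version("v1", "10.5072/example-v1")
dataset.add_version("v2", "10.5072/example-v2")
print(dataset.resolve("10.5072/example-v1"))       # v1: citing it pins that exact version
print(dataset.resolve("10.5072/example-concept"))  # v2: the concept DOI follows the latest
```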
Introduce PID graphs and explain their importance with a use case (examples of use cases can be found here: https://github.com/datacite/freya/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aopen+label%3A%22PID+Graph%22++label%3A%22user+story%22+)
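A PID graph connects PIDs through the relations recorded in their metadata: a dataset DOI points to its creators' ORCID iDs, those point to the ROR IDs of their organisations, a publication points to its project, and so on. Following these links answers questions such as "which people, organisations and projects are connected to this dataset?". The sketch below stores a tiny graph as (subject, relation, object) triples and walks it breadth-first. All identifiers and most relation names are hypothetical; real PID graphs are assembled from registry metadata such as that held by DataCite.

```python
from collections import defaultdict, deque

# Hypothetical triples: (subject PID, relation, object PID). Placeholders only.
EDGES = [
    ("doi:10.5072/dataset-1", "isSupplementTo", "doi:10.5072/article-1"),
    ("doi:10.5072/dataset-1", "hasCreator", "orcid:example-author"),
    ("orcid:example-author", "hasAffiliation", "ror:example-university"),
    ("doi:10.5072/article-1", "isOutputOf", "raid:example-project"),
    ("doi:10.5072/other-dataset", "hasCreator", "orcid:someone-else"),
]


def build_graph(edges):
    """Index each triple in both directions so links can be followed either way."""
    graph = defaultdict(list)
    for subject, relation, obj in edges:
        graph[subject].append((relation, obj))
        graph[obj].append((f"inverse of {relation}", subject))
    return graph


def connected(graph, start, max_hops=3):
    """Return every PID reachable from start within max_hops links (breadth-first)."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue
        for _relation, neighbour in graph.get(node, []):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return {pid for pid in hops if pid != start}


graph = build_graph(EDGES)
print(sorted(connected(graph, "doi:10.5072/dataset-1")))
```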

Materials / Equipment

Personal computer
Internet connection
Browser

References

Bobrov E. et al. 2021-10-07. Workshop on Research Data. Berlin University Alliance and ZB MED - Information Centre for Life Sciences. Google Slides.
Cozatl R. et al. 2021-11. Workshop on Research Data Management. Martin Luther University of Halle-Wittenberg and ZB MED - Information Centre for Life Sciences. Google Slides.
https://support.orcid.org/hc/en-us/articles/360006971013-What-are-persistent-identifiers-PIDs-
https://pidforum.org/t/persistent-identifier-pid-definition/1502
https://doi.org/10.5438/7z70-1155
https://doi.org/10.5438/j22a-5d79
https://www.tib.eu/en/publishing-archiving/pid-service
https://doi.org/10.5334/dsj-2020-041
https://doi.org/10.5281/zenodo.6674301
Staiger C. 2019-11-04. Introduction to persistent identifiers. DTL. PowerPoint Slides (https://doi.org/10.5281/zenodo.3539188)
https://pidforum.org/t/why-use-persistent-identifiers/714
https://www.raid.org.au/
23 identifier schemas registered within FAIRsharing - let us know if any are missing!
FAIRsharing’s educational factsheet about standards

Take home tasks/preparation

Lesson content

Activity

Time

Type

Level

Before the lesson

As an instructor, ask participants to read the FAIR Principles (see the GO FAIR page “FAIR Principles”) before the lesson.

Individual Activity

During the lesson

Each group receives a different example to work on:

Group 1 gets a dataset without a DOI;
Group 2 gets an author without an ORCID iD;
Group 3 gets a dataset with an author, ORCID iD and DOI.

Participants experience what can go wrong without PIDs (without necessarily knowing which PIDs were needed).
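Instructors who want a computational counterpart to this exercise could use a small checker like the sketch below, which lists the PIDs missing from a simplified dataset description. The record layout (`doi`, `creators`, `name`, `orcid`), the format patterns and the sample records for the three groups are hypothetical and invented for this lesson; they do not follow a real metadata schema such as DataCite's, and a format check does not prove an identifier is registered.

```python
import re

# Simplified format checks (assumptions for this exercise, not official patterns).
DOI_FORMAT = re.compile(r"10\.\d{4,9}/\S+")
ORCID_FORMAT = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def missing_pids(record: dict) -> list[str]:
    """Describe which persistent identifiers are missing from a dataset record."""
    problems = []
    # The dataset itself should carry a DOI (cf. indicator RDA-F1-01D).
    if not DOI_FORMAT.fullmatch(record.get("doi") or ""):
        problems.append("dataset has no DOI")
    creators = record.get("creators", [])
    if not creators:
        problems.append("no creators listed")
    # Each creator should be identified by an ORCID iD, not only by name.
    for person in creators:
        if not ORCID_FORMAT.fullmatch(person.get("orcid") or ""):
            problems.append(f"creator {person.get('name', 'unknown')} has no ORCID iD")
    return problems


# Made-up records mirroring the three groups of the exercise.
group_1 = {"title": "Soil samples", "creators": [{"name": "Josiah Carberry", "orcid": "0000-0002-1825-0097"}]}
group_2 = {"doi": "10.5072/example", "creators": [{"name": "J. Smith"}]}
group_3 = {"doi": "10.5072/example", "creators": [{"name": "Josiah Carberry", "orcid": "0000-0002-1825-0097"}]}
for label, record in [("Group 1", group_1), ("Group 2", group_2), ("Group 3", group_3)]:
    print(label, missing_pids(record) or "all PIDs present")
```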

Group Activity

In this part of the lesson, the instructor will:

Demonstrate the importance of PIDs
Emphasize that PIDs are required for data and metadata to be Findable (tie in with FAIR principles)
Show examples of PIDs within different settings (datasets, publications, people, organizations)
Demonstrate how a PID can be created (use an example commonly used within your institute, e.g., Zenodo, DataverseNL, 4TU.ResearchData)

Additionally, this activity helps participants reach the following learning outcomes:

#1 Define and recognise PIDs (basic level)

#2 Explain the syntax of widely-used PIDs (basic level)

#3 Explain the different use cases for PIDs (basic level)

#4 Explain the importance of PIDs for FAIR data (basic level)

#5 Use PIDs to access research data or other resources (basic level)

Lecture

Participants engage in a discussion highlighting how PIDs are connected to their research, open science (OS) policies, funder requirements, long-term preservation, collaborations with partners, etc.

Identify research outputs

Ask participants to think about the types of outputs or entities involved in their research, for example:
- datasets
- publications
- software or code
- protocols or workflows
- researchers (themselves or collaborators)
- institutions or projects
Discuss:
- Which of these outputs already have persistent identifiers?
- Which ones could or should have one but currently do not?

Reflect on collaboration and sharing

How could PIDs help when sharing data with collaborators?
How could they help others find, cite, or reuse their work?
Are there any funder or institutional requirements/guidelines related to PIDs?

Additionally, this activity helps participants reach the following learning outcomes:

#6 Apply PIDs to their own research outputs (intermediate level)

#7 Use PIDs to collaborate with others (intermediate level)

Group exercise

After the lesson

Additional resources

NWO PID strategy (Maria Cruz and Tatum Clifford)
Get a persistent identifier for your training material (Elixir-Europe Training FAIR)
ORCID home page
Research Organization Registry (ROR): a global, community-led registry of open persistent identifiers for research and funding organizations
DOI Foundation
F-UJI Automated FAIR Data Assessment tool

Allyson Lister

Anne-Françoise Adam-Blondon

Justine Vandendorpe

Mijke Jetten

The terms4FAIRskills project has created a formalised terminology that describes the competencies, skills and knowledge associated with making and keeping data FAIR.

Data steward, data curator, data librarian, data manager, researcher	wants competency in	understanding persistent identifiers, data discovery, data citation
Online documentation	confers competency about	understanding persistent identifiers, data discovery, data citation
Online documentation	confers knowledge about	persistent identifier, citable data
Online documentation	supports implementation of	F1. (meta)data are assigned a globally unique and persistent identifier; F3. metadata clearly and explicitly include the identifier of the data they describe; A1. (meta)data are retrievable by their identifier using a standardised communications protocol