Topic, definition and scope
- Persistent identifiers (PIDs) are globally unique, actionable and machine-resolvable strings that act as a long-lasting reference to a digital object (e.g. a dataset).
- Six examples of PIDs are included in the table below (more are available within FAIRsharing’s identifier schema sub-registry and within the Bioregistry).
- Find out more about standards including PIDs with FAIRsharing’s standards factsheet
- Bioregistry catalogues identifier resources (e.g., DOI, ORCID, ROR) that assign PIDs. It stores metadata such as their base URL, local unique identifier regular expression pattern, preferred prefix for semantic web contexts, mappings to other registries (e.g., FAIRsharing, BARTOC), and more. Bioregistry has the benefit of being fully open source and having fully open CC0 data to promote community curation and maintenance.
| Full name | Acronym | PID for | Registration | Resolver / base URL | FAIRsharing record |
| --- | --- | --- | --- | --- | --- |
| Digital Object Identifier | DOI | Digital objects (e.g. research data, text publication) | Through a DOI Registration Agency | https://doi.org/ | https://doi.org/10.25504/FAIRsharing.hFLKCn |
| Open Researcher and Contributor ID | ORCID | Scientists (independent of name, institutional and country changes) | Self-registration | https://orcid.org/ | https://doi.org/10.25504/FAIRsharing.nx58jg |
| Research Organization Registry | ROR ID | Research institutions | On request via form: https://docs.google.com/forms/d/e/1FAIpQLSdJYaMTCwS7muuTa-B_CnAtCSkKzt19lkirAKG4u7umH9Nosg/viewform | https://ror.org/ | https://doi.org/10.25504/FAIRsharing.f73143 |
| Data Management Plan ID | DMP-ID | Data Management Plans (DMPs) | As a DOI (resourceTypeGeneral = “OutputsManagementPlan”) | https://dx.doi.org/ ? | / |
| International Generic Sample Number | IGSN ID | Physical objects | Through DataCite | https://www.igsn.org/ | https://doi.org/10.25504/FAIRsharing.c7f365 |
| Research Activity Identifier | RAiD | Research projects | API or manual minting? | https://raid.org/ | https://doi.org/10.25504/FAIRsharing.dc702a |
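
The syntax of several identifiers in the table can be checked mechanically before they are resolved. The following is a minimal sketch, not an authoritative validator: the regular expressions are simplified assumptions (registries such as Bioregistry publish their own patterns), the function names are hypothetical, and only the ORCID check character (ISO 7064 MOD 11-2) is verified beyond the pattern.

```python
import re

# Simplified, illustrative patterns (assumptions, not the registries' official ones).
PATTERNS = {
    "doi": re.compile(r"^10\.\d{4,9}/\S+$"),
    "orcid": re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$"),
    "ror": re.compile(r"^0[a-hj-km-np-tv-z0-9]{6}[0-9]{2}$"),
}


def orcid_checksum_ok(orcid: str) -> bool:
    """Verify the last character of an ORCID iD with ISO 7064 MOD 11-2."""
    digits = orcid.replace("-", "")
    total = 0
    for char in digits[:-1]:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


def looks_like(scheme: str, identifier: str) -> bool:
    """Return True if the identifier matches the sketch pattern for the scheme."""
    pattern = PATTERNS.get(scheme.lower())
    if pattern is None or not pattern.match(identifier):
        return False
    if scheme.lower() == "orcid":
        return orcid_checksum_ok(identifier)
    return True


if __name__ == "__main__":
    for scheme, value in [("doi", "10.5281/zenodo.3333025"), ("ror", "03vek6s52")]:
        print(scheme, value, looks_like(scheme, value))
```

A passing check only says that a string has the right shape. It does not show that the identifier has been registered or that it resolves.
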
- The benefits of assigning PIDs are numerous:
  - Disambiguation (e.g. using ORCID iDs to distinguish between two researchers who have the same first and last names)
  - Increased citation and reach of research outputs
  - Contribution to making research data FAIR (see section “FAIR element(s)” for more details)
  - Permanent identifiability, referenceability and linkage of scientific outputs, people, institutions and funders
FAIR element(s)
(from the FAIR data maturity model: https://doi.org/10.5334/dsj-2020-041)
- Findable
  - F1 RDA-F1-01M Metadata is identified by a persistent identifier (essential)
  - F1 RDA-F1-01D Data is identified by a persistent identifier (essential)
  - F3 RDA-F3-01M Metadata includes the identifier of the data (essential)
- Accessible
  - A1 RDA-A1-03M Metadata identifier resolves to a metadata record (essential)
  - A1 RDA-A1-03D Data identifier resolves to a digital object (essential)
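
The Accessible indicators above ask that an identifier resolves to a metadata record. For DOIs, one way to illustrate this is content negotiation: the same DOI URL returns machine-readable metadata when the request carries an appropriate `Accept` header. The sketch below rests on assumptions: it assumes the DOI's registration agency supports the CSL JSON media type via https://doi.org/ (Crossref and DataCite do), and the function names are hypothetical. Only `fetch_metadata` touches the network.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

CSL_JSON = "application/vnd.citationstyles.csl+json"


def metadata_request(doi: str, resolver: str = "https://doi.org/") -> Request:
    """Build (but do not send) a request asking the resolver for metadata."""
    url = resolver + quote(doi, safe="/")
    return Request(url, headers={"Accept": CSL_JSON})


def fetch_metadata(doi: str) -> dict:
    """Send the request and parse the JSON metadata record (needs internet)."""
    with urlopen(metadata_request(doi), timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    request = metadata_request("10.5281/zenodo.3333025")
    print(request.full_url, request.get_header("Accept"))
```

Without the `Accept` header, the same URL redirects a browser to the human-readable landing page, which is a useful contrast to show participants.
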
Summary of Tasks / Actions
- Present an example where disambiguation is needed (e.g. two authors with the same name). Identify additional entities that might benefit from being assigned a PID (e.g. research data, text publication, institutions). Finally, define PIDs together.
- Present widely used PIDs and what their syntax looks like:
  * DOI
  * ORCID
  * ROR
  * DMP-ID
  * IGSN ID
  * RAiD
  * More are available within FAIRsharing’s identifier schema sub-registry
- Examples of use cases
  * Data repositories
    - Example of actionable PID (= resolver + PID):
      - Resolver: https://doi.org/
      - DOI: 10.5281/zenodo.3333025
      - Resolves to the landing page of the repository showing metadata: https://zenodo.org/record/3333025. “Real” data can be downloaded from this page.
  * Enabling compute workflows (e.g. https://doi.org/10.12688/f1000research.12168.1)
  * Identifying (chunks of) code
- Show the importance of PIDs for FAIR data by referring to the FAIR elements mentioned in the section “FAIR element(s)”.
- How to use PIDs to access research data and other resources?
  * Dataset (e.g. DOI: 10.5281/zenodo.3333025)
  * Text publication (e.g. DOI: 10.5281/zenodo.6674301)
  * Data management plan (e.g. DOI: 10.5281/zenodo.5995707)
  * Physical sample (e.g. IGSN ID: AU1243)
  * Resource descriptions (e.g. databases, standards, policies) through FAIRsharing DOIs, e.g. Dryad: https://doi.org/10.25504/FAIRsharing.wkggtx
  * Organisations (https://ror.org/) through ROR IDs and their associated metadata, including parent-child relationships, e.g. Harvard: https://ror.org/03vek6s52
  * Research project
- Explain how to obtain a PID for research outputs
  * Repositories (e.g. Zenodo)
  * PID minting
- Show that proper use of PIDs supports collaboration across facilities, disciplines, institutions and countries. Examples:
  * Pesant, S. et al. Open science resources for the discovery and analysis of Tara Oceans data. Sci. Data 2:150023, doi: 10.1038/sdata.2015.23 (2015)
  * Where PIDs are used for terminologies (e.g. ontologies), they allow unambiguous naming and labelling of things, which in turn allows for practical data sharing and integration and, for instance, knowledge graphs.
- Provenance and versioning (see R1.2)
  * Define resource and metadata provenance.
  * Explain why provenance information is an important aspect of FAIR data.
  * Find out together how PIDs can contribute to provenance.
  * Define dataset versioning and dynamic datasets.
  * Explain how PIDs are used in relation to different versions of a dataset or dynamic datasets.
  * Versioning exercise.
- Introduce PID graphs and explain their importance with a use case (examples of use cases can be found here: https://github.com/datacite/freya/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aopen+label%3A%22PID+Graph%22++label%3A%22user+story%22+)
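
To make the idea of a PID graph concrete, it can help to show it as nodes (PIDs) connected by typed relations and then answer a question by following the links. The sketch below is a toy model under stated assumptions: every identifier is a hypothetical placeholder, the relation names are invented for illustration, and it does not reproduce any particular PID graph service or its API.

```python
from collections import defaultdict

# Hypothetical placeholder PIDs and relation names, invented for illustration only.
EDGES = [
    ("orcid:RESEARCHER-A", "affiliated_with", "ror:ORG-1"),
    ("orcid:RESEARCHER-B", "affiliated_with", "ror:ORG-1"),
    ("orcid:RESEARCHER-C", "affiliated_with", "ror:ORG-2"),
    ("doi:DATASET-1", "created_by", "orcid:RESEARCHER-A"),
    ("doi:DATASET-2", "created_by", "orcid:RESEARCHER-B"),
    ("doi:DATASET-3", "created_by", "orcid:RESEARCHER-C"),
    ("doi:ARTICLE-1", "cites", "doi:DATASET-1"),
]


def index_incoming(edges):
    """Map (target PID, relation) to the set of PIDs pointing at it."""
    incoming = defaultdict(set)
    for source, relation, target in edges:
        incoming[(target, relation)].add(source)
    return incoming


def outputs_of_organisation(org_pid, edges=EDGES):
    """Follow organisation -> researchers -> outputs through the graph."""
    incoming = index_incoming(edges)
    outputs = set()
    for researcher in incoming[(org_pid, "affiliated_with")]:
        outputs |= incoming[(researcher, "created_by")]
    return sorted(outputs)


if __name__ == "__main__":
    print(outputs_of_organisation("ror:ORG-1"))
```

The two-hop query works only because researchers, organisations and outputs each carry their own PID. With free-text names, the joins would be ambiguous.
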
Materials / Equipment
- Personal computer
- Internet connection
- Browser
References
- Bobrov E. et al. 2021-10-07. Workshop on Research Data. Berlin University Alliance and ZB MED - Information Centre for Life Sciences. Google Slides.
- Cozatl R. et al. 2021-11. Workshop on Research Data Management. Martin Luther University of Halle-Wittenberg and ZB MED - Information Centre for Life Sciences. Google Slides.
- https://support.orcid.org/hc/en-us/articles/360006971013-What-are-persistent-identifiers-PIDs-
- https://pidforum.org/t/persistent-identifier-pid-definition/1502
- https://doi.org/10.5438/7z70-1155
- https://doi.org/10.5438/j22a-5d79
- https://www.tib.eu/en/publishing-archiving/pid-service
- https://doi.org/10.5334/dsj-2020-041
- https://doi.org/10.5281/zenodo.6674301
- Staiger C. 2019-11-04. Introduction to persistent identifiers. DTL. PowerPoint Slides (https://doi.org/10.5281/zenodo.3539188)
- https://pidforum.org/t/why-use-persistent-identifiers/714
- https://www.raid.org.au/
- 23 identifier schemas registered within FAIRsharing - let us know if any are missing!
- FAIRsharing’s educational factsheet about standards
Take home tasks/preparation
- …
- …
Lesson content
As an instructor, ask participants to read the FAIR Principles (FAIR Principles - GO FAIR), ideally before the lesson takes place.
The participating groups will each get various data sets to perform the exercise:
- Group 1 gets a dataset without a DOI;
- Group 2 gets an author without ORCID;
- Group 3 gets a dataset with author, ORCID, and DOI.
Participants experience what can go wrong without PIDs (without necessarily knowing which PID was needed).
To help the participants learn from this exercise, the instructor should:
- Demonstrate the importance of PIDs
- Emphasize that PIDs are required for data and metadata to be Findable (tie in with FAIR principles)
- Show examples of PIDs within different settings (datasets, publications, people, organizations)
- Demonstrate how a PID can be created (use an example commonly used within your institute, e.g., Zenodo, DataverseNL, 4TU.ResearchData)
Additionally, this activity helps participants reach the following learning outcomes:
#1 Define and recognise PIDs (basic level);
#2 Explain the syntax of widely used PIDs (basic level);
#3 Explain the different use cases for PIDs (basic level);
#4 Explain the importance of PIDs for FAIR data (basic level);
#5 Use PIDs to access research data or other resources (basic level).
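
For the last of these outcomes, participants can see that a bare identifier becomes an actionable link once it is prefixed with its resolver. The following is a minimal sketch under assumptions: the `prefix:identifier` input form and the function name are hypothetical conventions chosen for this example, and the resolver map covers only three of the schemes discussed in this lesson.

```python
from urllib.parse import quote

# Resolver base URLs for three schemes covered in this lesson.
RESOLVERS = {
    "doi": "https://doi.org/",
    "orcid": "https://orcid.org/",
    "ror": "https://ror.org/",
}


def to_url(compact_id: str) -> str:
    """Turn 'prefix:identifier' into an actionable URL (resolver + PID)."""
    prefix, _, local_id = compact_id.partition(":")
    prefix = prefix.strip().lower()
    local_id = local_id.strip()
    if not local_id or prefix not in RESOLVERS:
        raise ValueError(f"Cannot resolve {compact_id!r}")
    # Percent-encode unusual characters but keep the '/' inside DOIs.
    return RESOLVERS[prefix] + quote(local_id, safe="/")


if __name__ == "__main__":
    print(to_url("doi:10.5281/zenodo.3333025"))
    print(to_url("ror:03vek6s52"))
```

Pasting the printed URLs into a browser lets each group confirm where a dataset DOI and an organisation ROR ID lead.
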
The participants will engage in a discussion highlighting how PIDs are connected to their research, open science (OS) policies, funder requirements, long-term preservation, collaborations with partners, etc.
This activity involves identifying research outputs:
- Ask participants to think about the types of outputs or entities involved in their research, for example:
  - datasets
  - publications
  - software or code
  - protocols or workflows
  - researchers (themselves or collaborators)
  - institutions or projects
- Discuss:
  - Which of these outputs already have persistent identifiers?
  - Which ones could or should have one but currently do not?
Reflect on collaboration and sharing:
- How could PIDs help when sharing data with collaborators?
- How could they help others find, cite, or reuse their work?
- Are there any funder or institutional requirements/guidelines related to PIDs?
Additionally, participants will work towards the following intermediate-level learning outcomes:
#6 Apply PIDs to their own research outputs (intermediate level);
#7 Use PIDs to collaborate with others (intermediate level).
Additional resources
- NWO PID strategy
- Get a persistent identifier for your training material
- ORCID home page
- Research Organization Registry (ROR): a global, community-led registry of open persistent identifiers for research and funding organizations
- DOI Foundation
- F-UJI Automated FAIR Data Assessment tool