Findable data

Placeholder

Lesson plans

A repository (often shortened to "repo") is a dedicated folder or directory in which all the files, folders, and history related to a specific project are stored. It is at the heart of a version control system such as Git and serves two main functions:

* **Storage:** it holds the latest working copy of the project code.
* **History:** it records every change made to those files over time, acting like a time machine for the project. This allows teams to rewind to any previous state, see who changed what, and collaborate safely.

Repositories can also offer interfaces for external services. [OAI-PMH](https://www.openarchives.org/pmh/), for example, allows the metadata of stored records to be harvested.

**Background:** A number of previous projects and working groups have discussed which common set of attributes would enable FAIR data and allow repository stakeholders to decide for themselves which repository is best for them. Details of these efforts are summarised in the [case statement](https://www.rd-alliance.org/group/data-repository-attributes-wg/case-statement/data-repository-attributes-wg-case-statement) of one existing cross-domain, worldwide effort under the auspices of the Research Data Alliance (RDA): the [RDA Data Repository Attributes Working Group](https://www.rd-alliance.org/groups/data-repository-attributes-wg). These efforts can therefore be consulted to see how FAIR is implemented in a repository and how each FAIR principle aligns with a particular repository attribute.
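OAI-PMH harvesting works over plain HTTP: a harvester sends a request carrying a `verb` parameter and receives an XML response. The sketch below uses only the Python standard library and is a minimal illustration, not a complete harvester. The endpoint URL and the sample response are made up, the helper names `build_list_records_url` and `parse_records` are hypothetical, and records are assumed to be requested in unqualified Dublin Core (`oai_dc`).

```python
# Minimal OAI-PMH harvesting sketch (standard library only).
# Assumptions: a hypothetical endpoint URL and a made-up sample response;
# records are requested in unqualified Dublin Core (oai_dc).
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses and Dublin Core elements.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def build_list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Return a ListRecords request URL for an OAI-PMH endpoint."""
    if resumption_token:
        # resumptionToken is an exclusive argument: sent without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Extract (identifier, title) pairs and the resumption token from a response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        records.append((identifier, title))
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    # An empty token signals that the list is complete.
    return records, (token or None)


# Made-up response containing a single record, for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken></resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(build_list_records_url("https://repo.example.org/oai"))
print(parse_records(SAMPLE_RESPONSE))
```

A real harvester would fetch each URL, parse the response, and repeat with the returned resumption token until none is left; the network step is omitted here so the example runs offline.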

Persistent Identifiers (PIDs): Making Research Findable and Connected

This lesson introduces **Persistent Identifiers (PIDs)** and their role in making research outputs more **findable, accessible, and reliably linked** within the research ecosystem. PIDs are globally unique, long-lasting references assigned to digital research objects and entities such as datasets, publications, software, researchers, and organizations. The session situates PIDs within the **FAIR principles** and broader **Open Science practices**, focusing especially on their importance for the *Findable* principle. Participants are introduced to widely used PID systems and explore how these identifiers support discovery, citation, attribution, and interoperability in research workflows.

The lesson follows an **interactive, problem-based learning approach**. It starts with a short scenario illustrating common issues researchers encounter when persistent identifiers are missing, for example:

* A dataset cited in a publication cannot be found.
* The same dataset appears under multiple names or locations.
* A dataset link leads to a “404 Page Not Found” error.
* Authors cannot be uniquely identified.

Participants discuss what may have gone wrong, who is affected, and how these issues affect trust, transparency, reproducibility, and reuse of research outputs. Building on this discussion, the instructor introduces the concept of PIDs, their role in the FAIR framework, and commonly used PID systems, demonstrating how these systems enable reliable identification and linking of research outputs and contributors.
Participants then work in small groups on fictional research cases to identify missing identifiers and determine which PIDs should have been assigned. The session concludes with a short reflection linking PIDs to participants’ own research practices, institutional Open Science policies, and funder requirements.
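The scenario in which "the same dataset appears under multiple names or locations" can be made concrete with a small sketch. It assumes DOIs as the example PID system (the lesson itself does not name specific systems) and shows how differently written references reduce to one identifier that is linked through a resolver rather than a location-specific URL. The helpers `normalize_doi` and `resolver_url` are hypothetical, the pattern is a simplified check rather than the full DOI syntax, and `10.1234/abc.5678` is a made-up DOI.

```python
# Hypothetical helpers: reduce common written forms of a DOI to one canonical
# value so that citations of the same dataset can be recognised as identical.
import re

# Simplified check: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes commonly written in front of a DOI in citations and links.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw):
    """Return the bare DOI in lower case, or None if it does not look like a DOI."""
    value = raw.strip()
    lowered = value.lower()
    for prefix in KNOWN_PREFIXES:
        if lowered.startswith(prefix):
            value = value[len(prefix):]
            break
    # DOI names are case-insensitive for ASCII characters, so lower-casing is safe.
    value = value.strip().lower()
    return value if DOI_PATTERN.match(value) else None


def resolver_url(doi):
    """Link via the doi.org resolver instead of a location that may later break."""
    return f"https://doi.org/{doi}"


variants = [
    "doi:10.1234/ABC.5678",
    "https://doi.org/10.1234/abc.5678",
    "http://dx.doi.org/10.1234/Abc.5678",
]
for v in variants:
    print(v, "->", normalize_doi(v))
print(resolver_url(normalize_doi(variants[0])))
```

All three variants reduce to `10.1234/abc.5678`, whereas a plain file URL returns `None`: a location is not an identifier. In the group exercise, this is one way to show why a cited dataset should carry a PID rather than a direct download link.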

Metadata standards for findability

## **Topic, definition and scope**

This lesson addresses the critical need for rich metadata in the sciences, where complex datasets require detailed context to be truly useful. It centers on the "Findable" aspect of the FAIR principles, specifically principle F2, which requires that data be described with rich metadata. By exploring the significance of metadata standards, the lesson bridges the gap between broad accessibility and the highly specific needs of domain researchers. The core theme is that effective dataset discovery is not accidental: it results from intentional, standardized description that allows both humans and machines to locate relevant biological and biomedical data within vast repositories.

In this context, metadata is defined as structured information that describes, explains, locates, or facilitates the retrieval and use of an information resource. The lesson covers the practical application of metadata from two perspectives: generic standards for broad interoperability and domain-specific standards for granular precision. Participants will learn to assess metadata richness, use semantic annotations, and navigate the tools required to create "generous" descriptions. The more comprehensively datasets are described, the more specifically findable they become, enabling refined searches that go beyond simple keywords and supporting sophisticated data brokering and machine-actionable validation.

### **Impact for research**

The adoption of high-quality metadata standards significantly enhances the visibility and longevity of research outputs. By mastering these concepts, researchers help ensure that their datasets are not only archived but also actively discoverable by search engines and aggregators, rather than remaining isolated. Rich, semantically annotated metadata enables precise query retrieval and supports machine-to-machine communication, allowing software agents to validate and process data with little or no human intervention. By streamlining data brokering and validation, this ultimately accelerates scientific discovery, making it easier for the global community to find, cite, and build upon existing research.

## **FAIR element(s)**

* **Findable:** Data should be available in a discoverable resource (i.e. a repository), have an appropriate description (i.e. metadata), and have a persistent identifier (PID).
  * F2: Data are described with rich metadata.
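The claim that richer description makes a dataset more specifically findable can be illustrated with a toy model. This is a hypothetical sketch, not an implementation of any metadata standard: the field list in `GENERIC_FIELDS`, the `DatasetRecord` class, the `richness` score, and the `search` function are illustrative assumptions, and the ontology IRI is a placeholder rather than a real term. It combines generic fields (broad interoperability) with a domain-specific semantic annotation (granular precision).

```python
# Hypothetical sketch: generic metadata fields plus domain-specific semantic
# annotations, a naive richness score, and a search that can use both.
from dataclasses import dataclass, field

# Assumed set of generic descriptive fields; illustrative, not from a specific schema.
GENERIC_FIELDS = ("identifier", "title", "creators", "publication_year",
                  "description", "keywords", "license")


@dataclass
class DatasetRecord:
    generic: dict
    # Domain-specific annotations: property name -> ontology term IRI.
    annotations: dict = field(default_factory=dict)


def richness(record):
    """Fraction of the assumed generic fields that have a non-empty value."""
    filled = sum(1 for name in GENERIC_FIELDS if record.generic.get(name))
    return filled / len(GENERIC_FIELDS)


def search(records, keyword=None, annotation=None):
    """Return identifiers matching a keyword and/or an exact (property, IRI) annotation."""
    hits = []
    for rec in records:
        keywords = [k.lower() for k in rec.generic.get("keywords", [])]
        if keyword and keyword.lower() not in keywords:
            continue
        if annotation and rec.annotations.get(annotation[0]) != annotation[1]:
            continue
        hits.append(rec.generic.get("identifier"))
    return hits


TERM = "http://example.org/ontology/TERM_0001"  # placeholder IRI, not a real ontology term

sparse = DatasetRecord({"identifier": "ds-1", "title": "Sample data", "keywords": ["cells"]})
rich = DatasetRecord(
    {"identifier": "ds-2", "title": "Example liver tissue dataset",
     "creators": ["A. Researcher"], "publication_year": 2024,
     "description": "Fictional example record.", "keywords": ["cells", "liver"],
     "license": "CC-BY-4.0"},
    annotations={"tissue": TERM},
)

print(richness(sparse), richness(rich))
print(search([sparse, rich], keyword="cells"))
print(search([sparse, rich], keyword="cells", annotation=("tissue", TERM)))
```

A keyword alone returns both records; adding the semantic annotation narrows the result to the richly described dataset. A real assessment would use the fields and controlled vocabularies of the chosen generic and domain-specific standards instead of this fixed list.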