Findable data

Lesson plans

Scientific repositories are databases established to collect, disseminate, and preserve research outputs such as scientific articles, datasets, software, and documentation. By depositing research outputs in repositories, these materials become more easily findable and accessible to others. Depending on policies and regulations, authors can make their work available through Open Access or restricted access. Repositories are quite diverse in scope. They can be general, meaning domain‑agnostic, or focused on specific types of data and research domains, known as subject repositories. They may also be associated with international organisations, institutions, or specific departments. Repositories offer different levels of FAIRness and trustworthiness. It is therefore important to promote awareness of best practices and to guide the scientific community in selecting the most appropriate repository for each type of research output.

Persistent Identifiers: Making Research Findable and Connected

This lesson introduces **Persistent Identifiers (PIDs)** and their role in making research outputs more **findable, accessible, and reliably linked** within the research ecosystem. The session situates PIDs within the **FAIR principles**, focusing especially on their importance for the *Findable* principle. Participants are introduced to widely used PID systems and explore how these identifiers support discovery, citation, attribution, and interoperability in research workflows.

Persistent identifiers (PIDs) are globally unique and long-lasting references assigned to digital research objects and entities, such as datasets, publications, software, researchers, and organizations. This lesson introduces the concept and practical use of PIDs within the context of **FAIR data and Open Science practices**. Using a problem-based scenario, participants examine typical issues that arise when persistent identifiers are missing, for example when datasets cannot be found, links no longer work, or authors cannot be uniquely identified. Participants reflect on how such situations affect research transparency, reproducibility, and reuse. The session then introduces commonly used PID systems and demonstrates how they enable reliable identification and linking of research outputs and contributors.

The lesson follows an **interactive, problem-based learning approach**. It starts with a short scenario illustrating common issues researchers encounter when persistent identifiers are missing, for example:

* When a dataset cited in a publication cannot be found
* When the same dataset appears under multiple names or locations
* When a dataset link leads to a “404 Page Not Found” error

Participants discuss what may have gone wrong, who is affected by these issues, and how they impact trust, reproducibility, and reuse of research outputs. Building on this discussion, the instructor introduces the concept of PIDs and their role in the FAIR framework. Participants then work in small groups on fictional research cases to identify missing identifiers and determine which PIDs should have been assigned. The session concludes with a short reflection linking PIDs to participants’ own research practices, institutional Open Science policies, and funder requirements.
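
For the group exercise, a short script can make the idea of a PID concrete. The sketch below is illustrative only: the lesson does not name specific PID systems, so the choice of DOI (for research outputs) and ORCID iD (for researchers) is an assumption, and the function names `doi_to_url` and `orcid_checksum_ok` as well as the DOI `10.1234/example.dataset` are hypothetical. It shows two things: how a DOI written in different ways can be normalised to a single resolvable link, and how the check character of an ORCID iD (ISO 7064 MOD 11-2) helps catch typing errors.

```python
import re

# Loose syntax check for DOIs: "10.", a registrant code, "/", then a suffix.
# This is a common heuristic pattern, not a full specification of DOI syntax.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes people commonly write in front of a DOI.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Normalise a DOI string and return its https://doi.org/ resolver link."""
    cleaned = doi.strip()
    for prefix in DOI_PREFIXES:
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    if not DOI_PATTERN.match(cleaned):
        raise ValueError(f"Not a plausible DOI: {doi!r}")
    return "https://doi.org/" + cleaned


def orcid_checksum_ok(orcid: str) -> bool:
    """Verify the final check character of an ORCID iD (ISO 7064 MOD 11-2)."""
    chars = orcid.replace("-", "")
    if len(chars) != 16 or not chars[:15].isdigit():
        return False
    total = 0
    for ch in chars[:15]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[15].upper() == expected


if __name__ == "__main__":
    # Hypothetical DOI, written three different ways; all map to one link.
    for raw in ("10.1234/example.dataset",
                "doi:10.1234/example.dataset",
                "https://doi.org/10.1234/example.dataset"):
        print(doi_to_url(raw))

    # 0000-0002-1825-0097 is the iD ORCID uses for its fictitious test researcher.
    print(orcid_checksum_ok("0000-0002-1825-0097"))
    print(orcid_checksum_ok("0000-0002-1825-0098"))  # last character mistyped
```

A passing syntax or checksum test only shows that an identifier is well formed. Whether it is actually registered and resolves to the right object still has to be confirmed with the PID provider.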

Metadata standards for findability

## **Topic, definition and scope**

This lesson addresses the critical need for rich metadata in the sciences, where complex datasets require detailed context to be truly useful. It centers on the "Findable" aspect of the FAIR principles, specifically principle F2, which mandates that data be described with rich metadata. By exploring the significance of these standards, the lesson plan bridges the gap between broad accessibility and the highly specific needs of domain researchers. The core theme is that effective dataset discovery is not accidental; it is the result of intentional, standardized description that allows both humans and machines to locate relevant biological and biomedical data within vast repositories.

In this context, metadata is defined as structured information that describes, explains, locates, or facilitates the retrieval and use of an information resource. The scope of the lesson covers the practical application of metadata from two distinct perspectives: generic standards for broad interoperability and domain-specific standards for granular precision. Participants will learn to assess metadata richness, utilize semantic annotations, and navigate the tools required to create "generous" descriptions. This scope emphasizes that the more comprehensively datasets are described, the more specifically findable they become, allowing for refined searches that go beyond simple keywords to facilitate sophisticated data brokering and machine-actionable validation.

### **Impact for research**

The adoption of high-quality metadata standards significantly enhances the visibility and longevity of research outputs. By mastering these concepts, researchers ensure that their datasets are not only archived but are actively discoverable by search engines and aggregators, preventing data isolation. Rich, semantically annotated metadata enables sophisticated query retrieval and facilitates machine-to-machine communication, allowing software agents to validate and process data without human intervention. Ultimately, this streamlining of data brokering and validation accelerates scientific discovery by making it easier for the global community to find, cite, and build upon existing research.

## **FAIR element(s)**

* Findable: Data should be available in a discoverable resource (i.e. repository), have appropriate description (i.e. metadata) and have a persistent identifier (PID)
* Data are described with rich metadata
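
To make "assessing metadata richness" tangible, the following minimal sketch scores a dataset description by how many recommended fields are filled in. Everything in it is an assumption made for illustration: the record is invented, the property names are borrowed from the generic schema.org `Dataset` vocabulary, and the list `RECOMMENDED_FIELDS` and the function `richness` are hypothetical rather than part of any standard or of this lesson. A real assessment would use the field list of the metadata standard required by the target repository.

```python
# Hypothetical dataset description using schema.org "Dataset" property names.
record = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example gene expression measurements",
    "description": "Illustrative record used to demonstrate a richness check.",
    "identifier": "https://doi.org/10.1234/example.dataset",  # hypothetical DOI
    "keywords": ["gene expression", "example"],
    "creator": [{"@type": "Person", "name": "Example Researcher"}],
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "datePublished": "2024-01-15",
}

# Assumed checklist for this sketch; not an official or complete list.
RECOMMENDED_FIELDS = (
    "name",
    "description",
    "identifier",
    "keywords",
    "creator",
    "license",
    "datePublished",
    "variableMeasured",
    "temporalCoverage",
)


def richness(metadata: dict, fields=RECOMMENDED_FIELDS):
    """Return (share of recommended fields that are filled, list of missing fields)."""
    empty_values = (None, "", [], {})
    missing = [f for f in fields if metadata.get(f) in empty_values]
    score = (len(fields) - len(missing)) / len(fields)
    return score, missing


if __name__ == "__main__":
    score, missing = richness(record)
    print(f"Richness score: {score:.2f}")
    print("Missing fields:", ", ".join(missing) or "none")
```

A count of filled fields is only a rough proxy. It says nothing about whether the values are accurate, use controlled vocabularies, or carry the semantic annotations that domain-specific standards call for.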