
Data Repositories and FAIR

Topic, definition and scope

A repository (often shortened to “repo”) is essentially a dedicated folder or directory where all the files, folders, and history related to a specific project are stored.

It is the heart of a version control system like Git, serving two main functions:

Storage: It holds the latest, working copy of the project code.

History: It records every single change made to those files over time, acting like a time machine for your project. This allows teams to rewind to any previous state, see who changed what, and collaborate safely.

  • Interfaces for external services, such as OAI-PMH, allow the metadata of stored records to be harvested.
  • Background:
    • A number of earlier projects and working groups have discussed which common set of attributes is needed to enable FAIR data and to let repository stakeholders decide for themselves which repository suits them best. These earlier efforts are summarised in the case statement of one existing cross-domain, worldwide effort under the auspices of the RDA: the RDA Data Repository Attributes Working Group. Together, these efforts show how FAIR is implemented in a repository and how each FAIR principle aligns with a particular data attribute.
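
The OAI-PMH harvesting mentioned above can be sketched in a few lines. This is a minimal illustration, not a full harvester: the base URL `https://repository.example.org/oai`, the sample response, and the function names are assumptions made for the example. Only the `ListRecords` verb, the `metadataPrefix=oai_dc` parameter, and the XML namespaces come from the OAI-PMH and Dublin Core specifications. The sketch parses an embedded sample response so that it runs without network access.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Extract the OAI identifier, titles and dc:identifier values per record."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "oai_identifier": record.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "titles": [e.text for e in record.iterfind(".//dc:title", NS)],
            "identifiers": [e.text for e in record.iterfind(".//dc:identifier", NS)],
        })
    return records


# Hypothetical response, shortened to one record for illustration.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:12345</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
          <dc:identifier>https://doi.org/10.1234/example.5678</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

if __name__ == "__main__":
    print(list_records_url("https://repository.example.org/oai"))
    for rec in parse_records(SAMPLE_RESPONSE):
        print(rec)
```

A real repository usually returns large result sets in pages; the protocol signals this with a `resumptionToken` element, which this sketch does not handle.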

Summary of Tasks / Actions

  • Checklists
  • Analyse and explore one public repository, for example Zenodo or DataverseNL.
  • Is any of the repositories mentioned above a good option for data FAIRification?
  • Do you know of any challenges and how to remedy them?
  • Assess data FAIRness using F-UJI in different repositories and data resources, and explain the differences among them.
  • Go through the FAIR_principles_translation_SNSF_logo (snf.ch) sheet to get familiar with the FAIR requirements that can be fulfilled by the repository (keep in mind that not all requirements are manageable by repositories; some are the researcher’s responsibility!).
  • Repositories and their implemented data standards (as well as the data policies that recommend their use) can all be discovered in FAIRsharing (documentation on searching within FAIRsharing).
  • Use Case:
    • Here is a link to the FAIRsharing documentation page that was created specifically for this lesson plan. It covers navigating FAIRsharing to discover suitable resources for a particular researcher’s use case (a user story of a multi-omics RA who is using the library services to help them figure out how to implement FAIR according to a particular funder’s data policy).

    https://fairsharing.gitbook.io/fairsharing/how-to/unsure-where-to-start

  • Match the following requirements to their corresponding FAIR principle/sub-principles:
    • A “form” needs to be filled in (metadata by default).
    • A persistent identifier for the data is automatically generated.
    • References to other data or metadata can be included.
    • Access can be regulated from closed to open.
    • The use of standards and controlled vocabularies is enforced.
    • A DOI is issued to every published record.
    • The form complies with a specific metadata standard (DataCite).
    • Metadata contains the PID.
    • Create a user account in a repository.

FAIR Principle | FAIR Sub-Principle | FAIR implementation in a Repository
Findable | F1: (meta)data are assigned a globally unique and persistent identifier |
Findable | F2: data are described with rich metadata (defined by R1 below) |
Findable | F3: metadata clearly and explicitly include the identifier of the data they describe |
Findable | F4: (meta)data are registered or indexed in a searchable resource |
Accessible | A1: (meta)data are retrievable by their identifier using a standardised communications protocol |
Accessible | A1.1: the protocol is open, free, and universally implementable |
Accessible | A1.2: the protocol allows for an authentication and authorization procedure, where necessary |
Accessible | A2: metadata are accessible, even when the data are no longer available |
Interoperable | I1: (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation |
Interoperable | I2: (meta)data use vocabularies that follow FAIR principles |
Interoperable | I3: (meta)data include qualified references to other (meta)data |
Reusable | R1: (meta)data are richly described with a plurality of accurate and relevant attributes |
Reusable | R1.1: (meta)data are released with a clear and accessible data usage licence |
Reusable | R1.2: (meta)data are associated with detailed provenance |
Reusable | R1.3: (meta)data meet domain-relevant community standards |
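
A few of the sub-principles listed above can be checked mechanically on a single metadata record, which may help when comparing repositories. The following is a rough sketch under stated assumptions: the record layout (`pid`, `metadata`, `related_identifiers`, `relation_type`, `license`) is hypothetical and does not correspond to any particular repository's export format, and the DOI pattern is a simple heuristic. Other persistent identifier schemes (for example Handle or ARK) would also satisfy F1 but are not recognised here.

```python
import re

# Heuristic DOI pattern (assumption): "10.", a registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def check_record(record):
    """Return a {sub-principle: bool} map for a hypothetical record layout."""
    pid = record.get("pid", "")
    metadata = record.get("metadata", {})
    related = metadata.get("related_identifiers", [])
    return {
        # F1: the record carries a persistent identifier (DOI only in this sketch).
        "F1": bool(DOI_PATTERN.match(pid)),
        # F3: the metadata explicitly repeats the identifier of the data.
        "F3": bool(pid) and metadata.get("identifier") == pid,
        # I3: references exist and each one states how it relates to the record.
        "I3": bool(related) and all(r.get("relation_type") for r in related),
        # R1.1: a usage licence is stated.
        "R1.1": bool(metadata.get("license")),
    }


# Hypothetical example record, invented for illustration only.
EXAMPLE_RECORD = {
    "pid": "10.1234/example.5678",
    "metadata": {
        "identifier": "10.1234/example.5678",
        "license": "CC-BY-4.0",
        "related_identifiers": [
            {"identifier": "10.1234/example.0001", "relation_type": "IsSupplementTo"},
        ],
    },
}

if __name__ == "__main__":
    for principle, passed in check_record(EXAMPLE_RECORD).items():
        print(f"{principle}: {'ok' if passed else 'missing'}")
```

Sub-principles such as A1.2 or R1.3 depend on repository behaviour and community context, so they are left to manual assessment or to a dedicated tool such as F-UJI.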

Materials / Equipment


References


Take home tasks/preparation

  • Test a repository by FAIRifying one dataset using the handout above.
  • Think about an example, similar to the use case explained above, of how to find what a particular role (e.g. Data Steward) needs in FAIRsharing.

    For example, start with a requirement they have, e.g. a funder data policy, and move them step-by-step from that data policy to a shortlist of standards and/or databases that they will need to align with and/or submit to. This example has now been written here: https://fairsharing.gitbook.io/fairsharing/how-to/unsure-where-to-start