
Data/Repository discovery

Topic, definition and scope

  • “Everyone has the right to share in scientific advancement and its benefits”
    Article 27, Universal Declaration of Human Rights
  • Data discovery is a process of understanding data and extracting valuable insight from multiple data streams according to data uses and purposes.

Image: https://phaidra.univie.ac.at/download/o:1201054


FAIR element(s)

  • Findable: Data should be available in a discoverable resource (e.g. a repository), have an appropriate description (i.e. metadata) and have a persistent identifier (PID)
  • Accessible: Data should be retrievable and understandable for both humans and machines
  • Interoperable: Machines and humans should be able to interpret and use the data in different settings and to distinguish the metadata from the data file
  • Reusable: The ultimate goal of FAIR is to advance the reuse of data in future research and allow integration with other compatible data sources.
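
To make the "persistent identifier" point under Findable concrete, here is a minimal sketch. It assumes the PID is a DOI (many repositories use them, though other PID types exist) and that DOIs resolve through https://doi.org/. The pattern is a commonly used approximation of modern DOI syntax, not a full validator, and the function names and sample identifiers are hypothetical.

```python
import re

# Approximate shape of a modern DOI: "10.", a 4-9 digit registrant code,
# a slash, then a non-empty suffix without whitespace. This is an assumption
# for illustration and will not match every valid DOI.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def looks_like_doi(identifier):
    """Return True if the string has the approximate shape of a DOI."""
    return bool(DOI_PATTERN.match(identifier.strip()))


def doi_to_url(identifier):
    """Turn a DOI-like string into a resolver link, or raise ValueError."""
    doi = identifier.strip()
    if not looks_like_doi(doi):
        raise ValueError(f"not a DOI-like identifier: {identifier!r}")
    return "https://doi.org/" + doi
```

A check like this only tells you that an identifier is well formed; whether it actually resolves to a dataset still has to be verified by following the link.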

Summary of Tasks / Actions

  • Discussing reproducibility: why are FAIR principles important for data discovery?
  • How do you search for data? See also the FAIRsharing educational factsheet for databases

Research data cycle

  • Present a researcher’s story in any life science field and set up a search strategy. The story can be something like:

“A biochemistry researcher needs enzymology data to address a research question: how do enzymes act as key factors in increasing the rate of metabolism in the human body?”

  • How did the researcher discover and access such data?
  • Did the researcher list the characteristics of the data they wanted to discover?
  • Evaluate the quality of data
  • Check the terms and conditions of access and use

  • Let’s take the scenario above and look for any type of data you are interested in (e.g. “mitochondrial beta-oxidation”) in different data sources:
  • Of these resources:
    • Which one provided the most relevant data for your search terms?
    • Which one provides facilities to refine your search (i.e. filters)?
    • Try searching with more detailed search terms. How did the search results improve?
    • Is there citation guidance for your selected data? Are there any differences in citation guidance between these data sources?
    • Can you find a licence for the selected data? Is it clear how the data can be reused?

  • How can data resources make data more discoverable by linking data to publications?
  • Identifying innovative search tools for data discovery: demo on how to find the data behind a publication using Europe PMC, a literature database.
  • Citation, licences and copyrights help to clarify the “R” in the FAIR principles.
    • How to understand database conditions and attributes when choosing a repository (FAIRsharing documentation)
    • How to license data (openaire.eu)
    • [How to Cite Datasets and Link to Publications DCC](https://www.dcc.ac.uk/guidance/how-guides/cite-datasets)
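
The Europe PMC search described above can also be approached programmatically. The sketch below is an illustration only and not part of the original lesson. It assumes the public Europe PMC REST search endpoint at https://www.ebi.ac.uk/europepmc/webservices/rest/search with `query`, `format` and `pageSize` parameters, and that the JSON reply nests records under `resultList` and then `result`. The helper names and the sample record are hypothetical; check the Europe PMC documentation for current field names before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the Europe PMC REST search service.
EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query, page_size=5):
    """Return a search URL asking for JSON results for the given query."""
    params = {"query": query, "format": "json", "pageSize": page_size}
    return EUROPE_PMC_SEARCH + "?" + urlencode(params)


def summarise_results(response):
    """Extract (title, doi) pairs from a parsed JSON response.

    Records without a DOI are kept, with None in place of the DOI.
    """
    records = response.get("resultList", {}).get("result", [])
    return [(r.get("title"), r.get("doi")) for r in records]


def search(query, page_size=5):
    """Run the search over the network (never called on import)."""
    with urlopen(build_search_url(query, page_size)) as reply:
        return summarise_results(json.load(reply))


# Hypothetical response fragment, for illustration only.
sample = {"resultList": {"result": [
    {"title": "Example article", "doi": "10.1234/example"},
    {"title": "Article without DOI"},
]}}
print(build_search_url('"mitochondrial beta-oxidation"'))
print(summarise_results(sample))
```

Keeping the URL construction and the response parsing separate from the network call means the parsing can be tried on a saved response first, which is useful in a classroom with unreliable connectivity.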

Materials / Equipment


Take home tasks/preparation

  • Hands-on exercise: Find the data behind a publication of interest to you using Europe PMC and answer the following questions:
    • Could you find the data citation on the publication?
    • Is the data linked to the data repository?
    • Could you access the data? Is the data format machine-readable?
    • Could you easily find the licensing for the data of interest?
    • How do you believe the use of FAIR principles contributed to your data discovery?