
Data/Repository discovery

Topic, definition and scope

  • “Everyone has the right to share in scientific advancement and its benefits”
    Article 27, Universal Declaration of Human Rights
  • Data discovery is the process of understanding data and extracting valuable insights from multiple data streams according to the data's uses and purposes.

Image: https://phaidra.univie.ac.at/download/o:1201054


FAIR element(s)

  • Findable: Data should be available in a discoverable resource (i.e. repository), have appropriate description (i.e. metadata) and have a persistent identifier (PID)
  • Accessible: Data should be retrievable and understandable for both humans and machines
  • Interoperable: Machines and humans can interpret and use the data in different settings and will be able to distinguish the metadata from the data file
  • Reusable: The ultimate goal of FAIR is to advance the reuse of data in future research and to allow integration with other compatible data sources.

Summary of Tasks / Actions

  • Discussing reproducibility: why are FAIR principles important for data discovery?
  • How do you search for data? See also the FAIRsharing educational factsheet for databases

Research data cycle

  • Present a researcher’s story in any life science field and set up a search strategy. The story can be something like:

“A biochemistry researcher needs some enzymology data for a research question: how are enzymes key factors in increasing the rate of metabolism in the human body?”

  • How did the researcher discover and access such data?
  • Did the researcher list the characteristics of the data they wanted to discover?
  • Evaluate the quality of data
  • Check the terms and conditions of access and use

  • Let’s take the scenario above and look for any type of data you are interested in (e.g. “mitochondrial beta-oxidation”) in different data sources:
  • Of these resources:
    • Which one provided the most relevant data for your search terms? Which one provides facilities to refine your search (i.e. filters)?
    • Try more detailed search terms. How did the search results improve?
    • Is there citation guidance for your selected data? Are there any differences in citation guidance between these data sources?
    • Can you find a licence for the selected data? Is there any clarification of how the data can be reused?

  • How can data resources make data more discoverable by linking data to publications?
  • Identifying innovative search tools for data discovery: demo on how to find the data behind a publication using Europe PMC, a literature database.
  • Citation, licences and copyrights help to clarify the “R” in the FAIR principles.
    • How to understand database conditions and attributes when choosing a repository (FAIRsharing documentation)
    • How to licence data (openaire.eu)
    • [How to Cite Datasets and Link to Publications DCC](https://www.dcc.ac.uk/guidance/how-guides/cite-datasets)
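As a sketch of the Europe PMC demo above: Europe PMC exposes a public REST API, including a search endpoint and a data-links endpoint that lists data resources cross-referenced from an article. The code below only builds the request URLs (no network call); the endpoint paths and the `OPEN_ACCESS:y` search filter follow Europe PMC's public documentation, but verify them before relying on this.

```python
from urllib.parse import urlencode

EUROPE_PMC = "https://www.ebi.ac.uk/europepmc/webservices/rest"

def search_url(query: str, page_size: int = 10) -> str:
    """Build a Europe PMC literature search URL (JSON output)."""
    params = urlencode({"query": query, "format": "json", "pageSize": page_size})
    return f"{EUROPE_PMC}/search?{params}"

def datalinks_url(source: str, ext_id: str) -> str:
    """Build a URL listing the data links of one article,
    e.g. source='MED' (PubMed) and ext_id a PMID."""
    return f"{EUROPE_PMC}/{source}/{ext_id}/datalinks?format=json"

# Example: search open-access articles on the scenario topic,
# then inspect one article's cross-referenced data (PMID is illustrative).
print(search_url('"mitochondrial beta-oxidation" AND OPEN_ACCESS:y'))
print(datalinks_url("MED", "25883711"))
```

Fetching either URL returns JSON that participants can inspect in a browser, which keeps the demo dependency-free.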

Materials / Equipment


Take home tasks/preparation

  • Hands-on exercise: Find the data behind a publication of your interest using Europe PMC and answer the questions:
    • Could you find the data citation on the publication?
    • Is the data linked to the data repository?
    • Could you access the data? Is the data format machine-readable?
    • Could you easily find the licensing for the data of interest?
    • How do you believe the use of FAIR principles contributed to your data discovery?

Lesson content

Level: Before the lesson · LO: 5

The Data Hunting Exercise

  1. Set up the challenge: Choose a topic the participants could explore while looking for data in a particular repository. Examples can include:
  • Environmental Science: Ocean acidification rates in the North Sea
  • Health: Genomic Sequencing and antibiotic-resistant bacteria
  • Education: Statistics on numbers of International Students in Medical Schools in Europe
  2. Looking for places to search: Provide the participants with different locations to look for this data:
  • Generic repositories: Zenodo, Figshare, DataverseNL, Dryad, Dataverse, DANS Data Station
  • Domain-specific: PANGAEA (Earth science), NCBI (life sciences)
  • Search Engines: Google Dataset Search, DataCite Commons
  3. The Scavenger Hunt Checklist: Participants must find a dataset in at least two of the categories provided and fill out this evidence checklist:
  • Persistent identifier (PID): Can you find a DOI?
  • Metadata richness: On a scale of 1-5, how well is the data described? (Are there column definitions, readme files, methods?)
  • Interoperability: What file formats are used? (Proprietary like .xlsx, or open like .csv?)
  4. The Debriefing: “Comparison Gallery”
  • Which repository felt most trustworthy, and why?
  • Did you find the same dataset in two different places? (This introduces the concept of data harvesting and mirroring.)
  • Which metadata record made you feel like you could actually reuse the data right away?
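The PID item on the checklist above can be partially automated. The sketch below uses a deliberately loose DOI pattern (an approximation, not the full DOI grammar) and builds the doi.org resolver URL; a syntactic match does not prove the DOI actually resolves.

```python
import re

# Loose DOI shape: "10." + numeric registrant prefix + "/" + suffix.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")

def looks_like_doi(pid: str) -> bool:
    """Cheap syntactic check only; does not verify that the DOI resolves."""
    return bool(DOI_RE.match(pid.strip()))

def resolver_url(doi: str) -> str:
    """The doi.org resolver redirects to the dataset's landing page."""
    return f"https://doi.org/{doi.strip()}"

print(looks_like_doi("10.5281/zenodo.1234567"))  # True
print(looks_like_doi("hdl:1839/00-0000"))        # False: a Handle, not a DOI
```

Participants can paste any identifier from their chosen repository into `looks_like_doi` and then open the resolver URL to confirm it lands on the dataset.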

Annotations: Have participants make annotations in a common document that they can take home. This could also work as a feedback and improvement mechanism for this particular activity.

Time: 30 minutes · Type: Group Exercise

Level: During the lesson · LO: 1

The “In Silico” Shortcut Exercise

Scenario: A new variant of a rare respiratory virus has emerged. Your team needs to find an existing drug that can be “repurposed” to treat it.

1. The Challenge: Wet Lab vs. Data Mining

Divide participants into two groups, each taking the role of a different research team, or have them compare the two strategies:

  • Strategy A: The Traditional Bench Scientist (Primary Data)
    • Task: Synthesize 1,000 new chemical compounds and test them against live virus cultures.
    • Cost: $2 million, plus 3 years of clinical trials.
    • Risk: High. Most compounds will be toxic or ineffective in humans.
    • Data Produced: Very specific, high-resolution data for a small set of molecules.
  • Strategy B: The Bioinformatician (Data Discovery & Reuse)
    • Task: Use data discovery to mine the Protein Data Bank (PDB) for the virus’s structure and ChEMBL for existing FDA-approved drugs.
    • Cost: $100k (mostly computing power and researcher time).
    • Timeline: 3 months.
    • Action: Run a “virtual screening” (docking) to see which already-approved drugs might “stick” to the virus protein.
    • Data Reused: Structural biology data from a lab in Japan, chemical properties from a database in the UK, and clinical safety data from the 1990s.

2. The “Discovery” Checklist (Life Science Specific)

Have students identify which specific life-science repositories they would need to “discover” data from to succeed in Strategy B:

  1. Genomics: Where is the virus’s RNA sequence? (e.g., NCBI GenBank).
  2. Proteomics: Where is the 3D shape of the virus’s “spike” protein? (e.g., UniProt or PDB).
  3. Pharmacology: Where are the records of drugs already safe for humans? (e.g., DrugBank or PubChem).
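The genomics lookup in step 1 above could be scripted against NCBI's E-utilities `esearch` endpoint. The base URL and parameters below follow NCBI's public documentation; the search term is purely illustrative, and the code only builds the URL rather than calling the service.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def esearch_url(db: str, term: str, retmax: int = 5) -> str:
    """Build an NCBI E-utilities search URL returning matching record IDs as JSON."""
    params = urlencode({"db": db, "term": term, "retmax": retmax, "retmode": "json"})
    return f"{EUTILS}/esearch.fcgi?{params}"

# Illustrative: look for nucleotide records of a respiratory virus genome.
print(esearch_url("nucleotide",
                  "respiratory syncytial virus[Organism] AND complete genome"))
```

Students can open the printed URL in a browser to see the ID list, then discuss which metadata fields would let a machine (rather than a human) decide if a record is relevant.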

3. Discussion: Why Reuse is Vital in Life Sciences

After the comparison, lead a discussion on these “Bio-Specific” benefits:

  • The 3Rs (Ethics): How does reusing data reduce the need for animal testing? (If the data already exists, it is ethically questionable to repeat a painful animal experiment).
  • The “N of 1” Problem: In rare disease research, there might only be 10 patients in the whole world. No single hospital can do a study. Discovery and Aggregation of data from 10 different countries is the only way to get a statistically significant result.
  • Long-tail Data: A lot of life science data is hidden in “Supplemental Materials” of old papers. How does semantic annotation (Goal 2) help us find a gene mention hidden in a PDF from 2005?

Take away: Have participants note down their discussion points. These points may provide valuable insight for reproducing the exercise.

Time: 30 minutes · Type: Working session