Metadata standards for findability

Status

Ready for review

FAIR elements

For this lesson plan, participants should have a foundational understanding of:

This lesson is aimed at researchers and data stewards. Participants do not need prior knowledge of metadata, but should be familiar with the research process and the FAIR principles.

After completing this lesson plan, participants will be able to:

Understand

Understand what metadata is and how it relates to FAIR

Understand

Understand that good, discipline-specific metadata enhances the findability and FAIRness of data

Evaluate

Evaluate how different standards of metadata influence findability and FAIRness of data

This lesson plan was created to teach PhD students and researchers about metadata standards using tangible examples and practical activities. It assumes little prior knowledge of metadata, but does assume experience in research and familiarity with the FAIR principles. Resources can be provided asynchronously to bring everyone up to the same level.

We recommend first establishing a shared theoretical baseline for all participants, and then giving them the opportunity to practise and work directly with metadata, metadata standards and the general concepts.

Note that many activities can be done individually or in groups, depending on the type of session you are giving. Working in pairs or small groups can increase peer learning and lower the threshold for asking questions, since participants can discuss with their peers rather than having to ask the instructor. Working individually can be advantageous for asynchronous work, hybrid or online sessions, and self-paced study.

Lesson content

Activity

Time

Type

Level

Before the lesson

Activity: Watch or read a summary of what metadata is.

Resources

Purpose: General background information on what metadata is

individual

During the lesson

Activity: Ask participants what they already know about metadata and whether they can form a definition. This can be a group discussion or a think-pair-share exercise.

Purpose: Get participants to talk to each other (ice-breaker) and gain insight on prior knowledge

group

Activity: Short lecture covering the basics of metadata, including its definition, how it influences findability and how it relates to FAIR data. Show how metadata can differ depending on whether it needs to be interpreted by computers or by humans. (Slides to be created.)

Show examples, e.g., a manuscript and the fields to fill in on submission, as well as documentation and descriptions of data.

Resources: Slides. Interactive poll for formative assessment

Purpose: Get everyone to the same level of knowledge on metadata

Individual
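To illustrate the difference between metadata written for humans and metadata written for computers, the lecture could show the same description twice: once as a free-text sentence and once as a structured record. The sketch below is only an illustration. The dataset, its title and its keywords are invented, and the field names follow the schema.org Dataset vocabulary as one possible choice, not as the required one.

```python
import json

# Human-readable description: easy to read, hard for software to interpret.
free_text = "RNA-seq of Arabidopsis leaves under drought, collected by our lab."

# Machine-readable version of the same (hypothetical) dataset description.
# Field names are borrowed from schema.org's Dataset type; values are invented.
structured = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Drought stress RNA-seq in Arabidopsis thaliana leaves",
    "description": free_text,
    "keywords": ["RNA-seq", "Arabidopsis thaliana", "drought stress"],
}


def has_keyword(record, keyword):
    """Return True if the record lists the keyword (case-insensitive)."""
    return keyword.lower() in (k.lower() for k in record.get("keywords", []))


# Serialise to JSON, as a repository or search engine might receive it.
serialised = json.dumps(structured, indent=2)
print(serialised)
print(has_keyword(json.loads(serialised), "drought stress"))
```

A search tool can answer "is this about drought stress?" from the structured record with a simple lookup, whereas the free-text sentence would need interpretation.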

Activity: Short lecture providing examples of domain-specific metadata standards used by researchers in three fields: biomedical sciences, social sciences and humanities, and science and technology. If no domain-specific metadata standard exists, provide examples of the development and use of custom metadata.

Resources: Examples of published data with good (and bad) metadata, examples of custom metadata fields

Purpose: Give tangible examples of metadata and how it is used in research in different fields

Individual

Activity: Personal Data “audit” (Richness & Standards)

Participants have 10 minutes to write a description of some of their data (without looking at it). After 10 minutes they self-assess, or peer review with a partner, whether the description is sufficient to understand what the data is and how to use it.

Resources: Participants should bring a dataset that they last worked with more than 6 months ago. Participants should have their own laptop

Purpose: This highlights the need for “generous” description and semantic annotation.

Individual or group

Activity: Free Text vs. Controlled Vocabulary. Ask participants how they would describe the sex of a female mouse in a data spreadsheet. Have them write their answer in a shared document (e.g., a Google Doc) or share it with the group. Introduce the concept of ontology terms and semantic annotation maps.

Resources: Shared doc. Ontology terms. Participants should have their own laptop

Purpose: You will likely get variations like “female,” “F,” “Female,” “fem,” and “doe,” which shows why a controlled vocabulary is needed.

group
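The idea of mapping free-text answers to a single ontology term can be sketched in a few lines of code. This is a minimal illustration, not a complete annotation tool: the synonym table is hypothetical and built from the variations listed above, and the identifier PATO:0000383 (the term "female" in the PATO ontology) should be checked against the ontology before reuse.

```python
# Hypothetical synonym table: each free-text variant points to one ontology term.
# The label/identifier pair is assumed to be PATO's "female"; verify before reuse.
FEMALE = ("female", "PATO:0000383")

SEX_TERMS = {
    "female": FEMALE,
    "f": FEMALE,
    "fem": FEMALE,
    "doe": FEMALE,
}


def normalise_sex(raw):
    """Map a free-text value to (label, ontology ID), or None if unrecognised."""
    key = raw.strip().lower()
    return SEX_TERMS.get(key)


# The variations participants typically write all collapse to one term.
for answer in ["female", "F", "Female", "fem", "doe", "girl mouse"]:
    print(repr(answer), "->", normalise_sex(answer))
```

Values that are not in the table return None rather than a guess, so they can be reviewed by a person instead of being silently mis-annotated.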

Activity: The “Mystery Data” Challenge (Motivation & Concept). Send participants a messy dataset and ask them to spend 5 minutes guessing what it represents.

Resources: Messy dataset, e.g., an Excel sheet with ambiguous column headers (such as “Temp,” “T,” “Val1”) and no units or context. Participants should have their own laptop

Purpose: They will likely fail or guess incorrectly. This sets the stage for defining metadata as the “missing context” required for understanding.

individual or group
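One way to make the "missing context" concrete after the challenge is to show a small data dictionary alongside the messy headers. The sketch below is illustrative only: the column meanings and units are invented for the example headers, and the `ColumnInfo` and `undocumented_columns` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnInfo:
    """Minimal context a reader needs for one column."""
    description: str
    unit: str


# Hypothetical data dictionary; the meanings and units are invented here.
DATA_DICTIONARY = {
    "Temp": ColumnInfo("Air temperature at the sampling site", "degC"),
    "T": ColumnInfo("Time since the start of the experiment", "min"),
}


def undocumented_columns(headers, dictionary):
    """Return the headers that have no entry in the data dictionary."""
    return [h for h in headers if h not in dictionary]


headers = ["Temp", "T", "Val1"]
for name in headers:
    info = DATA_DICTIONARY.get(name)
    print(name, "->", info if info else "no description or unit available")

print("Still a mystery:", undocumented_columns(headers, DATA_DICTIONARY))
```

Even this small amount of structured context settles whether "T" means time or temperature, while "Val1" stays uninterpretable until someone documents it.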

Activity: Findability scavenger hunt. Assign a specific, niche life-sciences query (e.g., “RNA-seq data for Arabidopsis thaliana under drought stress”). Have participants search for it using either Google or a domain-specific repository.

Resources: Specialised repository e.g. EBI ArrayExpress or NCBI GEO. Participants should have their own laptop

Purpose: Participants should reflect on which method was faster and why (i.e., the repository offered filters/facets based on metadata fields like “organism” or “study type”)

individual or group

Activity: Registry Exploration (Standards Identification). Give a short demo on how to use FAIRsharing and which kinds of keywords work best (starting from specific, moving towards generic). Remind participants that keywords are not only scientific fields but also methods and species. Have participants find a metadata standard relevant to their field.

Resources: Participants should have their own laptop

Purpose: Focus on finding minimum requirement/minimum information standards.

individual or group

Activity: “Facet Filtering Challenge” (Retrieval). Go to a Life Sciences portal (e.g., EBI Search or NCBI Datasets). Challenge: “Find a dataset about Breast Cancer.” (Too many results). Refinement: “Now, use metadata filters to narrow it down to: RNA-Seq data, published within the last 2 years, involving Homo sapiens.” Discuss how structured metadata contributed to easy filtering and searching.

Resources: Participants should have their own laptop. Life sciences portal

group
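For the discussion, it can help to show why structured metadata makes this kind of filtering easy. The sketch below mimics faceted search over a handful of records. Everything in it is hypothetical: the records, the field names (`organism`, `assay`, `year`) and the helper functions are invented for illustration and do not reflect how any particular portal is implemented.

```python
# Hypothetical metadata records, as a portal might store them internally.
RECORDS = [
    {"title": "Breast cancer tumour expression profiles",
     "organism": "Homo sapiens", "assay": "RNA-Seq", "year": 2024},
    {"title": "Breast cancer mouse model transcriptomes",
     "organism": "Mus musculus", "assay": "RNA-Seq", "year": 2024},
    {"title": "Breast cancer methylation arrays",
     "organism": "Homo sapiens", "assay": "Methylation array", "year": 2023},
    {"title": "Breast cancer cell line RNA-Seq",
     "organism": "Homo sapiens", "assay": "RNA-Seq", "year": 2015},
]


def filter_by_facets(records, **facets):
    """Keep records whose metadata matches every requested facet exactly."""
    return [r for r in records
            if all(r.get(field) == value for field, value in facets.items())]


def published_since(records, first_year):
    """Keep records published in first_year or later."""
    return [r for r in records if r["year"] >= first_year]


# "RNA-Seq data, Homo sapiens, recent": three filters on structured fields.
hits = filter_by_facets(RECORDS, organism="Homo sapiens", assay="RNA-Seq")
hits = published_since(hits, first_year=2023)
for record in hits:
    print(record["title"])
```

Each filter is a simple comparison on a named field. The same narrowing would be unreliable if organism, assay and year were only mentioned somewhere in a free-text description.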

Additional resources

Metadata standards in FAIRsharing
FAIRsharing’s educational factsheet on standards
Dataedo Data Cartoon: cartoons on e.g. metadata
FAIR4Software
RDM 1-day workshop, life science early career
DMP course ELIXIR Norway
FAIR course ELIXIR Norway
The role of metadata in reproducible computational research
FAIRsharing
FAIR Cookbook
RDMkit
GOFAIR M4M
Bloom's taxonomy
Self-assessment for FAIR research software
Software metadata - RSQKit
DDI Metadata standards presentation

Sara El-Gebali

Federico Bianchini

Pascal de Boer

Niek van Ulzen

Naeem Muhammad

Fieke Schoots

Anne-Françoise Adam-Blondon

Naeem Muhammad

The terms4FAIRskills project has created a formalised terminology that describes the competencies, skills and knowledge associated with making and keeping data FAIR.

Data steward; Data curator; Data librarian; Data manager
researcher	wants competency in	knowledge of theories underlying FAIR implementation; data sharing
Online documentation	confers competency about	knowledge of theories underlying FAIR implementation; data sharing; choosing the appropriate model or format for your data; choosing the appropriate reporting guideline for your data; choosing the appropriate terminology for your data
Online documentation	confers knowledge about	metadata standard; record standardisation; semantic interoperability
Online documentation	supports implementation of	the FAIR Principles