
Metadata standards for findability

This lesson plan aims to educate PhD students and researchers on metadata standards using tangible examples and practical activities. It assumes little prior knowledge of metadata, but assumes research experience and familiarity with the FAIR principles. Resources can be provided asynchronously to bring everyone up to the same level.

We recommend starting by establishing a shared theoretical baseline for all participants before giving them the opportunity to practice and work directly with metadata, metadata standards and general concepts.

Note that many activities can be done individually or in groups, depending on the type of session you are giving. Working in pairs or small groups can increase peer learning and lower the threshold for asking questions, as confused participants can discuss with their peers rather than having to ask the instructor. Working individually can be advantageous for asynchronous work, hybrid or online sessions, or self-paced study.

Lesson content

LO
Activity
Time
Type
Level
Before the lesson
1

Activity: Watch or read summary of what metadata is.

Resources

Purpose: General background information on what metadata is

60
individual
During the lesson
1

Activity: Ask participants what they already know about metadata and whether they can form a definition. This can be a group discussion or think-pair-share.

Purpose: Get participants to talk to each other (ice-breaker) and gain insight into their prior knowledge

10
group
1

Activity: Short lecture covering the basics of metadata, including its definition, how it influences findability and how it relates to FAIR data. Show how metadata may differ depending on whether it needs to be interpreted by computers or by humans (slides to be created).

Show examples, e.g., a manuscript and the fields to fill in, as well as documentation and descriptions of data

Resources: Slides; interactive poll for formative assessment

Purpose: Get everyone to the same level of knowledge on metadata

25
individual
2

Activity: Short lecture providing examples of domain-specific metadata standards used by researchers in three domains: biomedical, social sciences and humanities, and science and technology. If domain-specific metadata standards don’t exist, provide some examples of the development and use of custom metadata.

Resources: Examples of published data with good (and bad) metadata; examples of custom metadata fields

Purpose: Give tangible examples of metadata and how it is used in research in different fields

15
individual
2

Activity: Personal Data “audit” (Richness & Standards)

Participants have 10 minutes to write a description of some of their data (without looking at it). After 10 minutes they self-assess, or peer review with a partner, whether their description was sufficient to understand what the data is and how to use it.

Resources: Participants should bring a dataset that they worked with more than 6 months ago. Participants should have their own laptop

Purpose: This highlights the need for “generous” description and semantic annotation.

30
individual or group
3

Activity: Free Text vs. Controlled Vocabulary. Ask participants how they would describe the sex of a female mouse in a data spreadsheet. Have them write their answer in a shared document (e.g., a Google Doc) or share it with the group. Introduce the concept of ontology terms and semantic annotation maps.

Resources: Shared doc. Ontology terms. Participants should have their own laptop

Purpose: Show how free text leads to inconsistent values. You will likely get variations like: “female,” “F,” “Female,” “fem,” “doe.”

20
group
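For the Free Text vs. Controlled Vocabulary activity above, a short script can make the point concrete. The following is a minimal sketch in Python, not part of the original lesson plan: the synonym table and the function name `normalise_sex` are hypothetical, and the PATO identifier is included for illustration only and should be checked in the ontology before use.

```python
# Hypothetical synonym table: free-text spellings mapped to one controlled term.
# The identifier "PATO:0000383" is an illustrative assumption; verify it in PATO.
FEMALE = ("female", "PATO:0000383")

SEX_SYNONYMS = {
    "female": FEMALE,
    "f": FEMALE,
    "fem": FEMALE,
    "doe": FEMALE,
}


def normalise_sex(value):
    """Return (label, term_id) for a free-text value, or None if unrecognised."""
    key = value.strip().lower()
    return SEX_SYNONYMS.get(key)


if __name__ == "__main__":
    # The variations listed in the activity all collapse to a single term.
    for raw in ["female", "F", "Female", "fem", "doe", "unknown"]:
        print(repr(raw), "->", normalise_sex(raw))
```

Unrecognised values return `None` rather than a guess, which mirrors the discussion point that free text needs curation before it can be searched reliably.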
2

Activity: The “Mystery Data” Challenge (Motivation & Concept). Send participants a messy dataset and ask them to spend 5 minutes guessing what it represents.

Resources: Messy dataset, e.g., an Excel sheet with ambiguous column headers (e.g., “Temp,” “T,” “Val1”) and no units or context. Participants should have their own laptop

Purpose: Participants will likely fail or guess incorrectly. This sets the stage for defining metadata as the “missing context” required for understanding

15
individual or group
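As a follow-up to the “Mystery Data” Challenge above, the “missing context” can be shown as a small data dictionary. This is only a sketch: the descriptions and units assigned to “Temp” and “T” are invented for illustration, and `undocumented_columns` is a hypothetical helper, not an existing tool.

```python
# Hypothetical data dictionary for the messy sheet. The meanings and units
# below are invented examples, not the real content of any dataset.
DATA_DICTIONARY = {
    "Temp": {"description": "Air temperature at sampling", "unit": "degree Celsius"},
    "T": {"description": "Time since start of experiment", "unit": "minute"},
}


def undocumented_columns(headers, dictionary):
    """List column headers that lack a description or a unit in the dictionary."""
    missing = []
    for header in headers:
        entry = dictionary.get(header, {})
        if not entry.get("description") or not entry.get("unit"):
            missing.append(header)
    return missing


if __name__ == "__main__":
    headers = ["Temp", "T", "Val1"]
    # "Val1" has no entry, so it is reported as undocumented.
    print(undocumented_columns(headers, DATA_DICTIONARY))
```

Participants could compare their guesses with such a dictionary to see how little documentation is needed to remove the ambiguity.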
3

Activity: Findability scavenger hunt. Assign a specific, niche life-sciences query (e.g., “RNA-seq data for Arabidopsis thaliana under drought stress”). Get participants to search for this using either Google or a domain-specific repository.

Resources: Specialised repository, e.g., EBI ArrayExpress or NCBI GEO. Participants should have their own laptop

Purpose: Participants should reflect on which method was faster and why (i.e., the repository offered filters/facets based on metadata fields like “organism” or “study type”)

30
individual or group
2

Activity: Registry Exploration (Standards Identification). Give a short demo on how to use FAIRsharing and which kinds of keywords work best (starting specific, then moving towards generic). Remind participants that keywords cover not only scientific fields but also methods and species. Get participants to find a metadata standard relevant to their field.

Resources: Participants should have their own laptop

Purpose: Focus on finding minimum requirement/minimum information standards.

20
individual or group
2

Activity: “Facet Filtering Challenge” (Retrieval). Go to a Life Sciences portal (e.g., EBI Search or NCBI Datasets). Challenge: “Find a dataset about Breast Cancer.” (Too many results). Refinement: “Now, use metadata filters to narrow it down to: RNA-Seq data, published within the last 2 years, involving Homo sapiens.” Discuss how structured metadata contributed to easy filtering and searching.

Resources: Participants should have their own laptop. Life sciences portal

20
group
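The idea behind the “Facet Filtering Challenge” above can also be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular portal is implemented: the records are invented toy entries, and the field names (`organism`, `study_type`, `year`) and the function `filter_by_facets` are hypothetical.

```python
# Invented toy records standing in for search results; not real datasets.
RECORDS = [
    {"id": "A", "organism": "Homo sapiens", "study_type": "RNA-Seq", "year": 2024},
    {"id": "B", "organism": "Mus musculus", "study_type": "RNA-Seq", "year": 2024},
    {"id": "C", "organism": "Homo sapiens", "study_type": "microarray", "year": 2023},
    {"id": "D", "organism": "Homo sapiens", "study_type": "RNA-Seq", "year": 2015},
]


def filter_by_facets(records, min_year=None, **facets):
    """Keep records whose metadata fields match every requested facet exactly."""
    hits = []
    for record in records:
        if min_year is not None and record["year"] < min_year:
            continue
        if all(record.get(field) == value for field, value in facets.items()):
            hits.append(record)
    return hits


if __name__ == "__main__":
    hits = filter_by_facets(
        RECORDS, min_year=2023, organism="Homo sapiens", study_type="RNA-Seq"
    )
    print([record["id"] for record in hits])
```

Exact matching only works here because every record uses the same field names and the same spelling for each value, which is the point of structured metadata that the discussion should bring out.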