This lesson plan aims to educate PhD students and researchers on metadata standards using tangible examples and practical activities. It assumes little prior knowledge of metadata, but experience in research and familiarity with the FAIR principles. Resources can be provided asynchronously to bring everyone up to the same level.
We recommend first establishing a shared theoretical baseline among all participants, then giving them the opportunity to practise working directly with metadata, metadata standards and general concepts.
Note that many activities can be done individually or in a group, depending on the type of session you are giving. Working in pairs or small groups can increase peer learning and lower the threshold for asking questions when confused (as participants can discuss with their peers rather than having to ask the instructor). Working individually can be advantageous for asynchronous work, hybrid or online sessions, or self-paced study.
Lesson content
Activity: Watch or read a summary of what metadata is.
Resources
Purpose: General background information on what metadata is
Activity: Ask participants what they already know about metadata and whether they can form a definition. This can be a group discussion or a think-pair-share exercise.
Purpose: Get participants to talk to each other (ice-breaker) and gain insight on prior knowledge
Activity: Short lecture covering basics of metadata, including definition and how it influences findability and relates to FAIR data. Show how metadata may be different when it needs to be interpreted by computers or humans. (slides to be created).
Show examples, e.g., a manuscript and the fields to fill in, as well as documentation and descriptions of data.
Resources: Slides. Interactive poll for formative assessment.
Purpose: Get everyone to the same level of knowledge on metadata
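For the contrast between metadata read by humans and metadata read by computers, an instructor could show something like the following minimal sketch. All values are invented for illustration, and the keys only loosely follow Dublin Core element names (title, creator, date, subject); they are not a prescribed schema.

```python
import json

# Human-oriented description: easy to read, hard for software to parse.
human_readable = "Soil temperature readings collected by A. Researcher in 2023."

# Machine-oriented record carrying the same information as explicit fields.
# All values are invented for illustration; the keys loosely follow
# Dublin Core element names (title, creator, date, subject).
machine_readable = {
    "title": "Soil temperature readings",
    "creator": "A. Researcher",
    "date": "2023",
    "subject": ["soil", "temperature"],
}

# Serialise the record to JSON, a format most repositories can exchange.
serialised = json.dumps(machine_readable, indent=2)
print(human_readable)
print(serialised)

# A program can pick out a single field without parsing a sentence.
creator = json.loads(serialised)["creator"]
```

The point for participants is that both versions say the same thing, but only the structured one lets software retrieve the creator or date reliably.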
Activity: Short lecture providing examples of domain-specific metadata standards used by researchers in three fields: biomedical sciences, social sciences and humanities, and science and technology. If domain-specific metadata standards don’t exist, provide some examples of the development and use of custom metadata.
Resources: Examples of published data with good (and bad) metadata, custom metadata fields examples
Purpose: Give tangible examples of metadata and how it is used in research in different fields
Activity: Personal Data “audit” (Richness & Standards)
Participants have 10 minutes to write a description of some of their data (without looking at it). After 10 minutes they can self-assess, or peer review with a partner, whether their description was sufficient to understand what the data is and how to use it.
Resources: Participants should bring a dataset that they worked with more than 6 months ago. Participants should have their own laptop
Purpose: This highlights the need for “generous” description and semantic annotation.
Activity: Free Text vs. Controlled Vocabulary. Ask participants how they would describe the sex of a female mouse in a data spreadsheet. Have them write in a shared document (e.g., a Google Doc) or share with the group. Introduce the concept of ontology terms and semantic annotation maps.
Resources: Shared doc. Ontology terms. Participants should have their own laptop
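Purpose: Show how free text leads to inconsistent values. You will likely get variations like: “female,” “F,” “Female,” “fem,” “doe.” This motivates the use of controlled vocabularies.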
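To make the effect of a controlled vocabulary concrete, the variations collected in this activity could be run through a small normalisation step. The sketch below is illustrative only: the synonym table and the identifier `EX:0000001` are hypothetical placeholders, and a real session would look up the term in an actual ontology.

```python
# Free-text variants participants might enter (taken from the activity above).
FREE_TEXT_ENTRIES = ["female", "F", "Female", "fem", "doe"]

# Hypothetical controlled term: one preferred label plus a placeholder
# identifier. A real session would look the identifier up in an ontology.
CONTROLLED_TERM = {"label": "female", "id": "EX:0000001"}

# Hypothetical synonym table, compared case-insensitively.
SYNONYMS = {"female", "f", "fem", "doe"}


def normalise(entry):
    """Return the controlled term for a known variant, else None."""
    if entry.strip().lower() in SYNONYMS:
        return CONTROLLED_TERM
    return None


normalised = [normalise(entry) for entry in FREE_TEXT_ENTRIES]

# Count distinct values before and after mapping to the controlled term.
distinct_before = len(set(FREE_TEXT_ENTRIES))
distinct_after = len({term["id"] for term in normalised if term is not None})
print(distinct_before, "free-text spellings ->", distinct_after, "controlled term")
```

Unknown entries return `None` rather than being guessed, which mirrors how a curator would flag a value for review instead of silently mapping it.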
Activity: The “Mystery Data” Challenge (Motivation & Concept). Send participants a messy dataset and ask them to spend 5 minutes guessing what it represents.
Resources: Messy dataset, e.g., an Excel sheet with ambiguous column headers (such as “Temp,” “T,” “Val1”) and no units or context. Participants should have their own laptop
Purpose: They will likely fail or guess incorrectly. This sets the stage for defining metadata as the “missing context” required for understanding
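After the reveal, the “missing context” could be shown as a simple data dictionary. This is a minimal sketch under stated assumptions: the column names come from the example sheet, but the descriptions and units are invented for illustration, and `undocumented` is a hypothetical helper.

```python
# Column headers from the messy example sheet.
columns = ["Temp", "T", "Val1"]

# Hypothetical data dictionary: the meanings and units are invented
# for illustration and would come from the data owner in practice.
data_dictionary = {
    "Temp": {"description": "Incubator temperature", "unit": "degC"},
    "T": {"description": "Time since start of experiment", "unit": "min"},
}


def undocumented(column_names, dictionary):
    """List columns that lack an entry or a unit in the data dictionary."""
    return [
        name
        for name in column_names
        if name not in dictionary or not dictionary[name].get("unit")
    ]


missing = undocumented(columns, data_dictionary)
print("Columns still lacking context:", missing)
```

Participants can extend the dictionary until the list of undocumented columns is empty, which gives a concrete stopping point for the exercise.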
Activity: Findability scavenger hunt. Assign a specific, niche life-sciences query (e.g., “RNA-seq data for Arabidopsis thaliana under drought stress”). Have participants search for this using either Google or a domain-specific repository.
Resources: Specialised repository e.g. EBI ArrayExpress or NCBI GEO. Participants should have their own laptop
Purpose: Participants should reflect on which method was faster and why (i.e., the repository offered filters/facets based on metadata fields like “organism” or “study type”)
Activity: Registry Exploration (Standards Identification). Do a short demo on how to use FAIRsharing and what kind of keywords might work best (starting from specific, moving towards generic). Remind participants that keywords can be not only scientific fields but also methods and species. Get participants to find a metadata standard relevant to their field.
Resources: Participants should have their own laptop
Purpose: Focus on finding minimum requirement/minimum information standards.
Activity: “Facet Filtering Challenge” (Retrieval). Go to a Life Sciences portal (e.g., EBI Search or NCBI Datasets). Challenge: “Find a dataset about Breast Cancer.” (Too many results). Refinement: “Now, use metadata filters to narrow it down to: RNA-Seq data, published within the last 2 years, involving Homo sapiens.” Discuss how structured metadata contributed to easy filtering and searching.
Resources: Participants should have their own laptop. Life sciences portal
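For the discussion, the filtering step can be mimicked on a handful of records to show why structured fields make retrieval easy. The records, field names and the `facet_filter` helper below are hypothetical and invented for illustration; they do not reflect how any particular portal stores or queries its data.

```python
# Hypothetical dataset records with structured metadata fields.
RECORDS = [
    {"id": "DS1", "topic": "breast cancer", "assay": "RNA-Seq",
     "organism": "Homo sapiens", "year": 2024},
    {"id": "DS2", "topic": "breast cancer", "assay": "RNA-Seq",
     "organism": "Mus musculus", "year": 2024},
    {"id": "DS3", "topic": "breast cancer", "assay": "microarray",
     "organism": "Homo sapiens", "year": 2023},
    {"id": "DS4", "topic": "breast cancer", "assay": "RNA-Seq",
     "organism": "Homo sapiens", "year": 2015},
]


def facet_filter(records, assay, organism, min_year):
    """Keep records matching every facet; each facet is an exact field test."""
    return [
        record
        for record in records
        if record["assay"] == assay
        and record["organism"] == organism
        and record["year"] >= min_year
    ]


# The broad query matches everything; the facets narrow it down.
broad = [r for r in RECORDS if r["topic"] == "breast cancer"]
hits = facet_filter(broad, "RNA-Seq", "Homo sapiens", min_year=2023)
print(len(broad), "results before filtering,", len(hits), "after")
print([record["id"] for record in hits])
```

Each filter only works because the corresponding value sits in its own consistently named field, which is the link back to metadata standards.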
Additional resources
- Metadata standards in FAIRsharing
- FAIRsharing’s educational factsheet on standards
- Dataedo Data Cartoon cartoons on e.g. metadata
- FAIR4Software
- RDM 1-day workshop, life science early career
- DMP course ELIXIR Norway
- FAIR course ELIXIR Norway
- The role of metadata in reproducible computational research
- FAIRsharing
- FAIR Cookbook
- RDMkit
- GOFAIR M4M
- Bloom’s taxonomy
- Self-assessment for FAIR research software
- Software metadata - RSQKit
- DDI Metadata standards presentation