
Overview

Topic

This lesson plan helps participants understand what ontologies and vocabularies are, how they relate to the FAIR principles, and why they are important for research. At the end of this lesson, participants will be able to define and distinguish ontologies and vocabularies, understand how they are used in repositories, explain their role in metadata organisation, use ontology lookup services, and evaluate the process of creating an ontology or vocabulary.


Added value

Ontologies and vocabularies enhance standardisation and foster a shared understanding between people and machines, thereby increasing the semantic interoperability of digital resources. By providing a common framework for organising knowledge, they promote the discovery of relevant scientific information, improve data mining and analysis, and support the development of intelligent systems.


Content - Details & information for the activities of the lesson plan

Introductory presentation

Provide a clear overview of what ontologies and vocabularies are, how they differ, and why they matter for structuring and enriching metadata, using examples tailored to the audience’s domain (e.g., biology, social sciences, cultural heritage). Explain how ontologies support consistency, interoperability, and meaningful metadata. The lecture should explain semantic heterogeneity and show examples of how vocabularies and ontologies solve it.

Information to include in the presentation:

1. Why metadata needs semantic structure - short explanation of why metadata alone is not enough

  • Metadata can be ambiguous without standardised terms (e.g., “mouse” the animal vs. “mouse” the device).
  • Different disciplines may use different labels for the same concept (semantic heterogeneity)
  • Machines cannot make assumptions - semantic structure makes data understandable, linkable, and interoperable.
  • Controlled vocabularies and ontologies enable FAIR-aligned metadata (Interoperable & Reusable)
  • Additional resources: FAIR Principles

2. Semantic heterogeneity

  • Semantic heterogeneity arises when independently developed or independently used data systems represent the same real-world concept with different meanings, interpretations, or terminology.
  • Ontologies help solve this problem by (i) defining relations between terms, such as is-a, part-of, causes, and measured-by; (ii) enabling semantic search (e.g., find all participants who were born in the Netherlands); (iii) supporting data integration across disciplines and repositories; and (iv) allowing machines to infer new knowledge through automated reasoning.
  • Possible examples: synonyms (“physician” vs. “doctor”); different levels of detail (“disease” vs. “infectious disease”); domain-specific variation (“culture” in biology vs. “culture” in anthropology); different coding systems (ICD-11 vs. SNOMED CT).

3. From vocabulary to ontology - increasing structure increases machine interpretability of data

  • Figure 1 – From Vocabulary to Ontology. Diagram illustrating: flat list → hierarchical taxonomy → ontology with relationships.
  • Key concepts:
      • A controlled vocabulary is a predetermined list of terms used consistently to describe data elements.
      • A taxonomy organises terms hierarchically (e.g., broader and narrower concepts), supporting classification and navigation.
      • An ontology formally describes concepts and their interrelationships. Ontologies facilitate machine-readable representations of knowledge and support automated reasoning, semantic querying, and data integration.
      • Together, these tools tackle semantic heterogeneity, the issue where different systems or communities use different terms to describe similar concepts.
  • Figure 2 – Semantic Interoperability Challenge. Illustration showing: Dataset A (uses free text), Dataset B (uses standard codes), and a mapping layer using an ontology.
  • Additional resources: Ontology pipeline
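
The progression from flat list to taxonomy to ontology can be made concrete with a small sketch. The terms and relations below are illustrative assumptions chosen for this example, not taken from any published ontology, and the `ancestors` helper is a hypothetical name. It shows how typed is-a relations let a program infer broader concepts that are never stated directly.

```python
# Three levels of structure for a small set of terms (illustrative only).

# 1. Controlled vocabulary: a flat list of approved terms.
vocabulary = ["disease", "infectious disease", "diabetes mellitus"]

# 2. Taxonomy: each term points to its broader term (None marks the root).
broader = {
    "disease": None,
    "infectious disease": "disease",
    "diabetes mellitus": "disease",
}

# 3. Ontology: typed relations stored as (subject, relation, object) triples.
triples = [
    ("infectious disease", "is-a", "disease"),
    ("diabetes mellitus", "is-a", "disease"),
    ("influenza", "is-a", "infectious disease"),
]


def ancestors(term, triples):
    """Follow is-a links transitively to collect every broader concept."""
    found = []
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for subject, relation, obj in triples:
            if subject == current and relation == "is-a" and obj not in found:
                found.append(obj)
                frontier.append(obj)
    return found


# "influenza is-a disease" is never stated, but it follows from the two links.
print(ancestors("influenza", triples))
```

A flat list cannot answer the question "is influenza a disease?", whereas the triples can, which is the sense in which added structure increases machine interpretability.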

4. How to find and explore vocabularies/ontologies - prepare the learners for the practical activities

  • Lookup tools include BioPortal, the Ontology Lookup Service (OLS), FAIRsharing, Linked Open Vocabularies, OBO Foundry, and Wikidata.
  • A good vocabulary or ontology follows established best practices.

Additional resources: W3C Best practices for publishing vocabularies, OBO Best practices

Workshop: Searching for Ontology Terms Using the BioPortal API

Implementation Routes

The workshop can be adapted to different audience levels. The following three routes vary in depth, technical complexity, and learning focus. Educators may select the route that best matches the participants’ experience level and available time.

Route 1 – Beginner

Target audience: Early career researchers, PhD students, research support staff

Focus: Understanding how ontologies can be accessed programmatically and why persistent identifiers are important for semantic interoperability.

Outcome emphasis: Primarily Understanding ★, light Application ★★

Suggested duration: 60–90 minutes

Structure

Concept introduction (15 min, lecture)

Ontology lookup services allow researchers to find standardised concepts and identifiers that describe data consistently across systems.

Instead of manually browsing ontology portals, these services can also be accessed programmatically using an API (Application Programming Interface).

An API enables software systems to communicate with each other in a structured way. The BioPortal API allows users to send search queries and retrieve ontology concepts, their definitions, and their persistent identifiers.

A typical API call involves three components:

Request URI

Example search query: http://data.bioontology.org/search?q=diabetes

The parameter q=diabetes instructs the API to search for ontology concepts related to the term diabetes.

Headers

The BioPortal API requires an authentication header containing the user’s API key:

Authorization: apikey token=YOUR_API_KEY

The API key identifies the user and allows controlled access to the service.

Response format

The API returns results in JSON (JavaScript Object Notation), a structured data format commonly used for exchanging data between systems.
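
The request URI and the authentication header can be assembled in a few lines of Python. This is a minimal sketch using only the standard library. The function name `build_search_request` is a hypothetical helper, the key is a placeholder, and the header value follows the `apikey token=...` form given in the BioPortal API documentation. The request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_KEY = "YOUR_API_KEY"  # placeholder, replace with a personal BioPortal key


def build_search_request(term, api_key):
    """Combine the request URI and the authentication header into one object."""
    query = urlencode({"q": term})  # encodes spaces and special characters
    url = f"http://data.bioontology.org/search?{query}"
    headers = {"Authorization": f"apikey token={api_key}"}
    return Request(url, headers=headers)


request = build_search_request("diabetes", API_KEY)
print(request.full_url)
print(request.get_header("Authorization"))
```

Passing the object to `urllib.request.urlopen` would send it, and the JSON body of the reply could then be decoded with `json.load`.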

Example BioPortal API Response (simplified)

Below is a simplified example of a result returned for the query “diabetes”:

{
  "collection": [
    {
      "prefLabel": "Diabetes mellitus",
      "@id": "http://purl.bioontology.org/ontology/SNOMEDCT/73211009",
      "definition": [
        "A metabolic disease characterised by hyperglycaemia resulting from defects in insulin secretion, insulin action, or both."
      ],
      "ontology": "SNOMEDCT"
    }
  ]
}

Interpreting the Response

Key elements in the JSON response include:

| Field | Meaning |
| --- | --- |
| prefLabel | Preferred human-readable name of the concept |
| @id | Persistent identifier (URI) for the ontology concept |
| definition | Formal description of the concept |
| ontology | Source ontology providing the concept |

 The URI is the most important element for semantic interoperability. 

Instead of storing free-text descriptions such as “diabetes”, datasets can store the URI:

http://purl.bioontology.org/ontology/SNOMEDCT/73211009

This ensures that the concept’s meaning remains consistent, unambiguous, and machine-interpretable across systems.
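
Reading these fields programmatically might look like the sketch below. It parses the simplified example response from this lesson rather than a live reply, so the field layout (in particular the plain `ontology` field) is an assumption carried over from that simplification, and `summarise` is a hypothetical helper name.

```python
import json

# The simplified example response from this lesson, embedded as a string.
raw = """
{
  "collection": [
    {
      "prefLabel": "Diabetes mellitus",
      "@id": "http://purl.bioontology.org/ontology/SNOMEDCT/73211009",
      "definition": [
        "A metabolic disease characterised by hyperglycaemia resulting from defects in insulin secretion, insulin action, or both."
      ],
      "ontology": "SNOMEDCT"
    }
  ]
}
"""


def summarise(result):
    """Pick out the fields needed for annotation from one search result."""
    return {
        "label": result.get("prefLabel"),
        "uri": result.get("@id"),
        "ontology": result.get("ontology"),
        # Definitions arrive as a list, so join them into one string.
        "definition": " ".join(result.get("definition", [])),
    }


data = json.loads(raw)
summaries = [summarise(result) for result in data["collection"]]

for summary in summaries:
    print(summary["label"], "->", summary["uri"])
```

Using `.get` keeps the sketch tolerant of results that lack a definition or label, which can happen in real responses.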

Figure X – BioPortal API Workflow

Caption: Programmatic access to ontology terms enables scalable semantic annotation.

API exploration (20 min, instructor demonstration)

The instructor demonstrates a simple API request.

Example:

curl -X GET "http://data.bioontology.org/search?q=diabetes" -H "Authorization: apikey token=YOUR_API_KEY"

Participants observe the returned JSON response and identify key elements such as:

  • preferred term label
  • ontology source
  • concept URI

Discussion questions:

  • What information does the API return?
  • Which ontology defines the concept?
  • Why is the URI important for interoperability?

Guided exploration (25 min, hands-on activity)

Participants perform ontology searches using one of the following tools:

  • curl (command line)
  • Postman
  • Hoppscotch or another browser-based API client

Participants search for several terms such as:

  • diabetes
  • asthma
  • hypertension

For each concept they identify:

  • preferred label
  • ontology source
  • concept URI

Participants record results in a table:

| Term | Preferred Label | Ontology | URI |
| ---- | --------------- | -------- | --- |

Practical exercise – dataset annotation (20 min)

Participants annotate a simple dataset using ontology identifiers.

Example dataset:

| ID | Diagnosis |
| -- | --------- |
| 1 | diabetes |
| 2 | asthma |
| 3 | hypertension |

Task:

  1. Use the BioPortal API to search for each diagnosis.
  2. Retrieve the corresponding ontology URI.
  3. Add a new column Ontology_URI.

Result:

| ID | Diagnosis | Ontology_URI |
| -- | --------- | ------------ |
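
For participants who want to see the task as code, the following sketch adds the Ontology_URI column to the example dataset. To keep it runnable without an API key, the lookup is a stub (`stub_lookup`, a hypothetical name) that only knows the diabetes URI shown earlier in this lesson. In a real session it would be replaced by a function that queries the BioPortal API.

```python
import csv
import io

# Stand-in for an API call: only the URI shown earlier in this lesson is known.
KNOWN_URIS = {
    "diabetes": "http://purl.bioontology.org/ontology/SNOMEDCT/73211009",
}


def stub_lookup(term):
    """Return a URI for a term, or None when no match is available."""
    return KNOWN_URIS.get(term.strip().lower())


def annotate(rows, lookup):
    """Add an Ontology_URI column, leaving it empty when no URI was found."""
    return [{**row, "Ontology_URI": lookup(row["Diagnosis"]) or ""} for row in rows]


rows = [
    {"ID": "1", "Diagnosis": "diabetes"},
    {"ID": "2", "Diagnosis": "asthma"},
    {"ID": "3", "Diagnosis": "hypertension"},
]
annotated = annotate(rows, stub_lookup)

# Write the annotated table as CSV text.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["ID", "Diagnosis", "Ontology_URI"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(annotated)
print(buffer.getvalue())
```

Rows without a match keep an empty Ontology_URI rather than a guessed value, so they stay visible for manual review.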

Discussion questions:

  • Why is storing URIs better than free-text labels?
  • How does this improve FAIR interoperability?
  • What risks arise when automatically selecting the first search result?

Reflection discussion (10 min)

Participants discuss:

  • challenges encountered when selecting ontology terms
  • how incorrect annotations may affect data integration
  • when new ontology terms may need to be requested

The discussion should emphasise validation, documentation, and governance of ontology mappings.

Route 2 – Intermediate

Target audience: Data stewards, research software engineers, and experienced researchers

Focus: Applying ontology lookup workflows and analysing how semantic identifiers can be integrated into research data pipelines.

Outcome emphasis: Application ★★, Analysis ★★, some Evaluation ★★★

Suggested duration: 2–2.5 hours

Structure

Concept recap (10 min)

Brief overview of semantic interoperability and the role of shared vocabularies.

Explain how ontology APIs support:

  • automated metadata annotation
  • dataset harmonisation
  • semantic data integration

API exploration exercise (30 min)

Participants experiment with ontology searches using the BioPortal API.

Example query:

curl -X GET "http://data.bioontology.org/search?q=diabetes" -H "Authorization: apikey token=YOUR_API_KEY"

Participants explore:

  • different search terms
  • restricting results to specific ontologies

Example: http://data.bioontology.org/search?q=diabetes&ontologies=SNOMEDCT

Participants analyse differences between ontologies and discuss concept ambiguity.
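
Restricted queries can also be generated rather than typed by hand, which avoids mistakes with the `&` separator. The sketch below assumes that the `ontologies` parameter accepts a comma-separated list of ontology acronyms. `search_url` is a hypothetical helper, and `MESH` is used only as a second example acronym.

```python
from urllib.parse import urlencode

BASE_URL = "http://data.bioontology.org/search"


def search_url(term, ontologies=()):
    """Build a search URL, optionally restricted to the given ontologies."""
    params = {"q": term}
    if ontologies:
        # Several acronyms are joined with commas into one parameter value.
        params["ontologies"] = ",".join(ontologies)
    # safe="," keeps the commas readable instead of encoding them as %2C.
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


print(search_url("diabetes"))
print(search_url("diabetes", ["SNOMEDCT"]))
print(search_url("diabetes", ["SNOMEDCT", "MESH"]))
```

Running the same term against one ontology at a time makes it easier to compare how each ontology labels and defines the concept.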

Python implementation exercise (45 min)

Participants run a Python script to retrieve ontology identifiers.

Example:

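import requests

API_KEY = "YOUR_API_KEY"  # replace with your personal BioPortal API key

url = "http://data.bioontology.org/search"
params = {"q": "diabetes", "ontologies": "SNOMEDCT"}
headers = {"Authorization": f"apikey token={API_KEY}"}

# Send the search request and stop early on HTTP errors (e.g. an invalid key)
response = requests.get(url, params=params, headers=headers, timeout=30)
response.raise_for_status()
data = response.json()

# Print the preferred label and persistent URI of each matching concept
for result in data["collection"]:
    print("Label:", result.get("prefLabel"))
    print("URI:", result["@id"])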

Participants modify the script to query different concepts. 

Dataset annotation exercise (40 min)

Participants annotate a dataset using ontology URIs retrieved via the API.

They compare results and discuss differences in ontology selection.

 Challenge discussion (20 min)

Discussion topics:

  • ambiguous concepts
  • missing ontology terms
  • ontology selection criteria
  • documentation of mapping decisions
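
One way to make these topics tangible is a sketch that accepts a candidate only under an explicit rule and records every decision. The rule used here (accept only a single exact match on the preferred label) and the record fields are assumptions for illustration, not a recommended policy. The candidate is the simplified diabetes result from this lesson.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class MappingDecision:
    source_term: str
    uri: Optional[str]
    ontology: Optional[str]
    status: str  # "accepted" or "needs review"
    note: str    # why the decision was taken


def decide(term, candidates):
    """Accept a candidate only when exactly one preferred label matches the term."""
    wanted = term.strip().lower()
    exact = [c for c in candidates if c["prefLabel"].strip().lower() == wanted]
    if len(exact) == 1:
        match = exact[0]
        return MappingDecision(
            term, match["@id"], match["ontology"], "accepted", "single exact label match"
        )
    note = "no exact label match" if not exact else "several exact label matches"
    return MappingDecision(term, None, None, "needs review", note)


candidates = [
    {
        "prefLabel": "Diabetes mellitus",
        "@id": "http://purl.bioontology.org/ontology/SNOMEDCT/73211009",
        "ontology": "SNOMEDCT",
    }
]

# Keep one record per term so mapping decisions can be audited later.
log = [asdict(decide(term, candidates)) for term in ("diabetes", "Diabetes mellitus")]
for entry in log:
    print(entry)
```

The free-text term "diabetes" is flagged for review instead of being mapped to the first result, which gives the group a concrete case for discussing selection criteria.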

Route 3 – Advanced

Target audience: Senior data stewards, infrastructure architects, interoperability specialists

Focus: Evaluating ontology lookup strategies and designing scalable semantic annotation workflows.

Outcome emphasis: Evaluation and strategic reasoning ★★★

Suggested duration: Half-day workshop

Structure

Rapid framing (10 min)

Discuss challenges of implementing semantic interoperability at scale.

Topics include:

  • automated annotation
  • ontology versioning
  • governance of semantic mappings

 Advanced API exploration (45 min)

Participants explore advanced BioPortal API usage:

  • comparing ontology coverage
  • retrieving additional metadata
  • analysing concept hierarchies

Pipeline design exercise (60 min)

Participants design a conceptual workflow integrating ontology lookup into a data pipeline.

Example workflow:

Dataset → Concept extraction → BioPortal API query → URI retrieval → Semantic annotation → Data integration

Groups propose validation and documentation steps.
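
As a starting point for the design discussion, the example workflow can be sketched as a chain of small functions. Everything here is an assumption for illustration: the function names, the two toy datasets, and the stub lookup that maps two labels to the diabetes URI used earlier in this lesson. A real pipeline would call the BioPortal API in place of the stub and add the validation steps the groups propose.

```python
DIABETES_URI = "http://purl.bioontology.org/ontology/SNOMEDCT/73211009"

# Stub for the API query step; the synonym mapping is assumed for illustration.
STUB_INDEX = {"diabetes": DIABETES_URI, "diabetes mellitus": DIABETES_URI}


def stub_lookup(concept):
    return STUB_INDEX.get(concept)


def extract_concepts(records, field):
    """Concept extraction: distinct, normalised values of one field."""
    return sorted({record[field].strip().lower() for record in records})


def retrieve_uris(concepts, lookup):
    """API query and URI retrieval, with the lookup passed in so it can be swapped."""
    return {concept: lookup(concept) for concept in concepts}


def annotate(records, field, uris):
    """Semantic annotation: attach the URI (or None) to every record."""
    return [
        {**record, "concept_uri": uris.get(record[field].strip().lower())}
        for record in records
    ]


def run_pipeline(records, field, lookup):
    uris = retrieve_uris(extract_concepts(records, field), lookup)
    return annotate(records, field, uris)


def integrate(*datasets):
    """Data integration: group records from all datasets by shared URI."""
    merged = {}
    for dataset in datasets:
        for record in dataset:
            merged.setdefault(record["concept_uri"], []).append(record)
    return merged


dataset_a = [{"id": "A1", "diagnosis": "Diabetes"}]
dataset_b = [
    {"id": "B7", "condition": "diabetes mellitus"},
    {"id": "B8", "condition": "asthma"},
]

merged = integrate(
    run_pipeline(dataset_a, "diagnosis", stub_lookup),
    run_pipeline(dataset_b, "condition", stub_lookup),
)
for uri, records in merged.items():
    print(uri, [record["id"] for record in records])
```

Records that end up under the `None` key are the ones a validation step would need to catch before integration.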

Governance simulation (45 min)

Participants discuss governance questions such as:

  • who approves ontology mappings
  • how mapping decisions are documented
  • how ontology updates are managed

Strategic reflection (30 min)

Participants reflect on trade-offs between:

  • automation and manual validation
  • scalability and precision
  • ontology reuse and extension

Lesson content

| LO | Activity | Time | Type | Level |
| --- | --- | --- | --- | --- |
| Before the lesson | | | | |
| 1 | Have participants read the FAIR Cookbook’s Introducing the FAIR Principles to get an idea of what the FAIR principles entail. | 20 min | Individual exercise | |
| 1 | There should also be an activity to ensure learners are familiar with metadata, since it is not addressed in this lesson plan. | 20 min | Individual exercise | |
| During the lesson | | | | |
| 1 | Introductory presentation: Present what ontologies and vocabularies are, how they differ, and why they matter for structuring and enriching metadata. Show how semantic heterogeneity arises and demonstrate how vocabularies and ontologies address it. | 15-30 min | Lecture | |
| 4 | Demo of ontology search: Demonstrate how to look up ontology terms using an ontology lookup platform (e.g., OLS, BioPortal), showing preferred labels, URIs, and ontology sources. | 15 min | Demo | |
| 4 | Look up relevant ontologies: In pairs or trios, have participants search for relevant ontology terms for their extracted concepts across multiple ontology platforms and compare the results. | 30-45 min | Group exercise | |
| 3 | Build vocabulary: Using selected concepts, groups identify essential terms in their domain, resolve duplicates, clarify definitions, and group terms into a structured vocabulary. | 30 min | Group exercise | |
| 4 | Link the vocabulary to existing standards: Ask participants to link their vocabulary terms to existing ontology concepts using URIs or identifiers where available. | 60 min | Group exercise | |
| 4 | Harmonise terminologies: Provide small datasets with different terminologies and have groups harmonise them by mapping terms to shared ontology identifiers. | 30-40 min | Group exercise | |
| After the lesson | | | | |
| 5 | Reflection on applying ontologies in practice: Invite participants to reflect on how ontology use could influence their own research workflows or metadata practices. | 15-20 min | Group discussion | |
| 5 | Closure and Q&A | 10 min | Discussion | |
| 5 | Reflection on ontology challenges: Facilitate a discussion about challenges in selecting ontology terms, semantic ambiguity, and gaps in existing vocabularies. | 15-20 min | Group discussion | |