Anatomy of a Semantic DataFragment

Part of the WebMEM Protocol
Location: /specification/sdt/yaml-in-html/structure/
Last Updated: 2025-07-28

Introduction to the SDT DataFragment

A DataFragment is the smallest unit of structured memory in WebMEM. Think of it as a semantic memory tile: a machine-visible statement of fact that is scoped to a real-world entity (such as a healthcare plan or record) and paired with trust-layer metadata.

DataFragments are atomic. They can be retrieved, scored, validated, and reasoned over independently by AI agents and indexing systems.

How Do You Write a DataFragment in HTML?

Start with an inert <template> element that contains the YAML. The opening tag should carry the following attributes:

<template
  data-visibility-fragment
  data-type="text/yaml"
  data-sdt-class="DataFragment"
  data-entity="plan:H5521-290-0"
  data-digest="2025-cms-ma-mapd-plan"
  data-glossary-scope="cms_landscape"
  data-fragment-scope="semantic-digest">

Inside the element, between the opening tag above and the closing </template>, add the YAML block that declares fragment metadata, trust provenance, and facts.
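
To illustrate how an indexing system might locate such a fragment, here is a minimal Python sketch that uses only the standard library's html.parser. The class name FragmentExtractor and the shortened YAML body are hypothetical; WebMEM does not prescribe a particular extraction method, and nested <template> elements are not handled.

```python
from html.parser import HTMLParser


class FragmentExtractor(HTMLParser):
    """Collects <template data-visibility-fragment> elements and their text.

    Hypothetical helper for illustration; not part of WebMEM.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.fragments = []   # list of {"attrs": dict, "yaml": str}
        self._current = None  # fragment currently being read, if any

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # A valueless attribute such as data-visibility-fragment maps to None.
        if tag == "template" and "data-visibility-fragment" in attr_map:
            self._current = {"attrs": attr_map, "yaml": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["yaml"] += data

    def handle_endtag(self, tag):
        if tag == "template" and self._current is not None:
            self._current["yaml"] = self._current["yaml"].strip()
            self.fragments.append(self._current)
            self._current = None


# Opening tag copied from the example above; the YAML body is shortened.
page = """
<template
  data-visibility-fragment
  data-type="text/yaml"
  data-sdt-class="DataFragment"
  data-entity="plan:H5521-290-0"
  data-digest="2025-cms-ma-mapd-plan"
  data-glossary-scope="cms_landscape"
  data-fragment-scope="semantic-digest">
entity: plan:H5521-290-0
digest: 2025-cms-ma-mapd-plan
</template>
"""

extractor = FragmentExtractor()
extractor.feed(page)
extractor.close()
fragment = extractor.fragments[0]
print(fragment["attrs"]["data-entity"])
print(fragment["yaml"])
```

The extractor returns the raw YAML text rather than parsed data, so the choice of YAML parser is left to the consumer.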

YAML Header: Declaring Context

The YAML block should begin with a header that mirrors the outer attributes for trust-layer indexing:

data-sdt-class: DataFragment
entity: plan:H5521-290-0
digest: 2025-cms-ma-mapd-plan
glossary_scope: cms_landscape
fragment_scope: semantic-digest
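
One way to check that the header really mirrors the outer attributes is sketched below in Python. The attribute-to-key mapping is inferred from the two examples above and is an assumption, as are the helper names parse_flat_header and header_mismatches. The header reader handles only top-level `key: value` lines and is not a general YAML parser.

```python
# Assumed mapping from <template> attributes to YAML header keys,
# inferred from the examples in this section.
ATTR_TO_KEY = {
    "data-sdt-class": "data-sdt-class",
    "data-entity": "entity",
    "data-digest": "digest",
    "data-glossary-scope": "glossary_scope",
    "data-fragment-scope": "fragment_scope",
}


def parse_flat_header(yaml_text):
    """Reads top-level `key: value` lines only; not a general YAML parser."""
    header = {}
    for line in yaml_text.splitlines():
        # Skip blank lines, indented (nested) lines, comments, and list items.
        if not line or line[0].isspace() or line[0] in "#-":
            continue
        key, sep, value = line.partition(":")
        if sep and value.strip():
            header[key.strip()] = value.strip()
    return header


def header_mismatches(attrs, header):
    """Returns (attribute, attribute value, header value) for each difference."""
    problems = []
    for attr, key in ATTR_TO_KEY.items():
        if attrs.get(attr) != header.get(key):
            problems.append((attr, attrs.get(attr), header.get(key)))
    return problems


attrs = {
    "data-sdt-class": "DataFragment",
    "data-entity": "plan:H5521-290-0",
    "data-digest": "2025-cms-ma-mapd-plan",
    "data-glossary-scope": "cms_landscape",
    "data-fragment-scope": "semantic-digest",
}
yaml_text = """\
data-sdt-class: DataFragment
entity: plan:H5521-290-0
digest: 2025-cms-ma-mapd-plan
glossary_scope: cms_landscape
fragment_scope: semantic-digest
"""

problems = header_mismatches(attrs, parse_flat_header(yaml_text))
print(problems)  # an empty list means the header mirrors the attributes
```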

ProvenanceMeta Block: Declaring Source & Trust

This block is the fragment's trust anchor. It is used for scoring, citation, and retrieval explanation:

ProvenanceMeta:
  ID: 2025-cms-ma-landscape
  Title: CMS MA Landscape File, 2025
  Description: CMS-published dataset listing Medicare Advantage plans.
  Creator: Centers for Medicare & Medicaid Services (CMS)
  Home: https://www.cms.gov/medicare-health-drug-plans-data
  License: Public Domain
  Published: 2025-06-01
  Retrieved: 2025-06-28
  Digest: 2025-cms-ma-mapd-plan
  Entity: plan:H5521-290-0
  FragmentScope: semantic-digest
  GlossaryScope: cms_landscape
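
The example repeats the fragment's digest, entity, and scopes inside ProvenanceMeta. A consumer might want to confirm that these copies agree with the header and that the dates are plausible. The Python sketch below shows one possible check; the rules it applies (matching values, Retrieved not earlier than Published) are assumptions rather than stated WebMEM requirements, and provenance_issues is a hypothetical helper. It expects already-parsed dictionaries with dates as ISO strings (some YAML parsers return date objects for unquoted dates).

```python
from datetime import date

# Assumed pairing of header keys with ProvenanceMeta keys, based on the example.
HEADER_TO_PROVENANCE = {
    "entity": "Entity",
    "digest": "Digest",
    "glossary_scope": "GlossaryScope",
    "fragment_scope": "FragmentScope",
}


def provenance_issues(header, provenance):
    """Returns a list of human-readable inconsistencies (empty if none found)."""
    issues = []
    for header_key, prov_key in HEADER_TO_PROVENANCE.items():
        if header.get(header_key) != provenance.get(prov_key):
            issues.append(f"{prov_key} does not match header {header_key}")
    published = date.fromisoformat(provenance["Published"])
    retrieved = date.fromisoformat(provenance["Retrieved"])
    if retrieved < published:
        issues.append("Retrieved is earlier than Published")
    return issues


header = {
    "entity": "plan:H5521-290-0",
    "digest": "2025-cms-ma-mapd-plan",
    "glossary_scope": "cms_landscape",
    "fragment_scope": "semantic-digest",
}
provenance = {
    "ID": "2025-cms-ma-landscape",
    "Published": "2025-06-01",
    "Retrieved": "2025-06-28",
    "Digest": "2025-cms-ma-mapd-plan",
    "Entity": "plan:H5521-290-0",
    "FragmentScope": "semantic-digest",
    "GlossaryScope": "cms_landscape",
}

issues = provenance_issues(header, provenance)
print(issues)
```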

Fields Block: Expressing Facts

Each fact in a DataFragment is a Semantic Data Atom, a field-level claim that is trust-scored, glossary-aligned, and independently citable:

Fields:
  - id: in_primary
    defined_term: Primary Care Visit
    description: Out-of-pocket cost for a PCP visit
    value: "$0"
    unit: usd
    confidence: high
    derived: false
    glossary: term-in_primary
    source: 2025-cms-pbp
    provenance_ref: "#provenance-meta"
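
Because the example stores the value as a display string ("$0") alongside a separate unit, a consumer that needs to compare or aggregate amounts has to normalize it first. The Python sketch below shows one way to do that. The helper usd_amount is hypothetical, and the assumption that USD values are written with a leading "$" and optional thousands separators comes only from the examples in this document.

```python
from decimal import Decimal


def usd_amount(field):
    """Converts a field's display value (e.g. "$2,500") to a Decimal.

    Returns None when the field's unit is not "usd".
    """
    if field.get("unit") != "usd":
        return None
    text = str(field["value"]).replace("$", "").replace(",", "").strip()
    return Decimal(text)


# The Data Atom from the example above, as a parsed dictionary.
in_primary = {
    "id": "in_primary",
    "defined_term": "Primary Care Visit",
    "value": "$0",
    "unit": "usd",
    "confidence": "high",
    "derived": False,
}

amount = usd_amount(in_primary)
print(amount)
```

Decimal is used instead of float so that currency amounts are not subject to binary rounding.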

Complex Values in Data Atoms

Most Data Atoms use simple values (strings, numbers, booleans), but more complex values are supported when semantically justified.

Supported Complex Types

Format Type	YAML Example	Use Case
Array	`["H5521-290-0", "H2406-129-0"]`	List of plan IDs (IndexFragment)
Map / Object	`{ "min": "$0", "max": "$50" }`	Derived value ranges
Named Entities	`[{"name": "...", "enrollment": 1234}]`	Top N plans, rich metadata sets

Examples

Plan ID List (IndexFragment):

- id: plans_indexed
  value:
    - H5521-290-0
    - H2406-129-0
    - H3931-129-0

Value Range (DerivedStatsFragment):

- id: moop_range
  value:
    min: "$2,500"
    max: "$8,300"

Top 3 Plans (IndexFragment):

- id: top_enrolled_plans
  value:
    - name: AARP Medicare Advantage
      enrollment: 6823
      type: PPO
    - name: Aetna Medicare Platinum
      enrollment: 3386
      type: HMO

Constraints

Use complex value: structures only when they are meaningful and reproducible.
Avoid deeply nested objects. Keep fragments explainable.
Prefer named keys over ambiguous JSON blobs or positional lists.
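
The nesting constraint can be made concrete with a small depth check. The Python sketch below measures how deeply a parsed value: is nested; the function name nesting_depth and the threshold MAX_DEPTH = 2 are assumptions for illustration, since this document does not define a numeric limit.

```python
MAX_DEPTH = 2  # assumed threshold; the document does not define a numeric limit


def nesting_depth(value):
    """Returns 0 for scalars, 1 for a flat list or map, 2 for a list of maps, etc."""
    if isinstance(value, dict):
        return 1 + max((nesting_depth(v) for v in value.values()), default=0)
    if isinstance(value, (list, tuple)):
        return 1 + max((nesting_depth(v) for v in value), default=0)
    return 0


# The three complex values from the examples above, as parsed data.
plans_indexed = ["H5521-290-0", "H2406-129-0", "H3931-129-0"]
moop_range = {"min": "$2,500", "max": "$8,300"}
top_enrolled_plans = [
    {"name": "AARP Medicare Advantage", "enrollment": 6823, "type": "PPO"},
    {"name": "Aetna Medicare Platinum", "enrollment": 3386, "type": "HMO"},
]

for name, value in [
    ("plans_indexed", plans_indexed),
    ("moop_range", moop_range),
    ("top_enrolled_plans", top_enrolled_plans),
]:
    depth = nesting_depth(value)
    print(name, depth, "ok" if depth <= MAX_DEPTH else "too deep")
```

Under this assumed limit, all three examples in this section pass: the two flat values have depth 1 and the list of named entities has depth 2.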

Best Practice

Include a description: and/or glossary reference when using structured values to improve interpretability.

Why It Matters

Retrievability: Each fact can be independently indexed and cited.
Trust: All values are source-backed and traceable.
Transparency: Glossary and provenance clarify meaning and origin.
Modularity: Fragments can be composed into digests or reused across systems.