Part of the WebMEM Protocol
Location: /specification/sdt/yaml-in-html/classes/datafragment/
Last Updated: 2025-07-28
Overview
A DataFragment is the most fundamental unit in the Semantic Data Template system. It represents a single structured fact or data point, typically extracted directly from a trusted dataset without transformation.
Each DataFragment is scoped to a specific entity and includes metadata such as source, description, glossary term alignment, and provenance. Unlike a DerivedStatsFragment, which reflects calculated or summarized values, a DataFragment is atomic, raw, and truth-preserving. That makes it well suited to fragment-level retrieval, citation, and trust scoring.
It serves as the building block for more complex memory structures within SDP and GTP.
Required Provenance Fields
| Field | Description |
|---|---|
| ID | Unique identifier for the provenance block. Used by provenance_ref. |
| Title | Human-readable name of the dataset or source. |
| Description | Explanation of the dataset’s content and purpose. |
| Creator | Entity or organization that produced the dataset (e.g., CMS). |
| Home | Canonical homepage or documentation URL for the dataset. |
| License | Usage rights declaration (e.g., Public Domain, CC0). |
| Published | ISO 8601 date the dataset was first released. |
| Retrieved | ISO 8601 date the data was accessed or imported. |
| Digest | ID for the fragment set this belongs to. |
| Entity | Subject the data applies to (e.g., plan ID). |
| FragmentScope | Declares the publishing or semantic scope of the fragment. |
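As an illustration of how a consumer might enforce the table above, here is a minimal Python sketch. It assumes the ProvenanceMeta block has already been loaded into a plain dictionary; the helper name `check_provenance` and the wording of its messages are hypothetical and not part of the specification.

```python
from datetime import date

# Required ProvenanceMeta keys, copied from the table above.
REQUIRED_PROVENANCE_FIELDS = (
    "ID", "Title", "Description", "Creator", "Home", "License",
    "Published", "Retrieved", "Digest", "Entity", "FragmentScope",
)

# Fields the table describes as ISO 8601 dates.
DATE_FIELDS = ("Published", "Retrieved")


def check_provenance(meta: dict) -> list:
    """Return a list of problems found in a ProvenanceMeta mapping.

    Hypothetical helper for illustration only.
    """
    problems = []
    for name in REQUIRED_PROVENANCE_FIELDS:
        value = meta.get(name)
        if value is None or str(value).strip() == "":
            problems.append(f"missing required field: {name}")
    for name in DATE_FIELDS:
        value = meta.get(name)
        # Skip absent/empty values (already reported) and values a YAML
        # loader has already turned into date objects.
        if value is None or isinstance(value, date) or str(value).strip() == "":
            continue
        try:
            date.fromisoformat(str(value))
        except ValueError:
            problems.append(f"{name} is not an ISO 8601 date: {value!r}")
    return problems


# Usage: an incomplete block with one deliberately malformed date.
meta = {
    "ID": "2025-cms-ma-landscape",
    "Title": "CMS MA Landscape File, 2025",
    "Creator": "Centers for Medicare & Medicaid Services (CMS)",
    "Published": "2025-06-01",
    "Retrieved": "06/28/2025",  # not ISO 8601
}
for problem in check_provenance(meta):
    print(problem)
```

The sketch only checks presence and date syntax. Whether a consumer should reject a fragment outright or merely lower its trust score when a field is missing is left open here.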
Optional Provenance Fields
| Field | Description |
|---|---|
| Format | File format of the dataset (e.g., CSV, JSON, ZIP). |
| Scope | Dataset classification: single-source, multi-source, derived. |
| Archive | Direct link to a downloadable copy of the dataset. |
| Guidelines | URL pointing to regulatory or interpretive guidance. |
| GlossaryScope | Glossary namespace for term alignment. |
| Year | Calendar year of dataset release. |
| Version | Release tag (e.g., v2025.1). |
| Checksum | Cryptographic hash for source integrity. |
| Tags | List of freeform tags for classification and filtering. |
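The Checksum field could be verified along the following lines. This is a sketch under stated assumptions: this section does not name the hash algorithm, so the `algorithm` parameter and its `sha256` default are choices made for illustration, and `matches_checksum` is a hypothetical helper name.

```python
import hashlib


def matches_checksum(data: bytes, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Compare the hex digest of `data` with a declared Checksum value.

    The algorithm is an assumption; the provenance table does not specify one.
    """
    actual = hashlib.new(algorithm, data).hexdigest()
    return actual == expected_hex.strip().lower()


# Usage with in-memory bytes standing in for a downloaded Archive file.
payload = b"example dataset bytes"
declared = hashlib.sha256(payload).hexdigest()
print(matches_checksum(payload, declared))
print(matches_checksum(payload + b"tampered", declared))
```

A publisher and consumer would need to agree on the algorithm out of band, or the spec would need to encode it alongside the hash value.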
Default fragment_scope
This fragment class defaults to a fragment_scope of semantic-digest unless another scope is declared. The scope value groups fragments within the Semantic Digest Protocol (SDP) and governs export routing, AI retrievability, and memory-layer assignment.
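The defaulting rule can be sketched in a few lines of Python. The function name `resolve_fragment_scope` is hypothetical, and treating an empty or whitespace-only value as "not declared" is an assumption rather than something this section states.

```python
from typing import Optional

# Default stated in this section for the DataFragment class.
DEFAULT_FRAGMENT_SCOPE = "semantic-digest"


def resolve_fragment_scope(declared: Optional[str]) -> str:
    """Return the declared scope, or the class default when none is given."""
    if declared is None or not declared.strip():
        return DEFAULT_FRAGMENT_SCOPE
    return declared.strip()


# Usage: "example-scope" is a made-up value, not a scope defined by the spec.
print(resolve_fragment_scope(None))
print(resolve_fragment_scope("example-scope"))
```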
Full Example: DataFragment Template
<template
  id="fragment-h5521-290-0-primary-care"
  data-visibility-fragment
  data-sdt-class="DataFragment"
  data-type="text/yaml"
  data-entity="plan:H5521-290-0"
  data-digest="2025-cms-ma-mapd-plan"
  data-glossary-scope="cms_landscape"
  data-fragment-scope="semantic-digest">
# YAML Header
data-sdt-class: DataFragment
entity: plan:H5521-290-0
digest: 2025-cms-ma-mapd-plan
glossary_scope: cms_landscape
fragment_scope: semantic-digest
# Semantic Data Atom
Fields:
  - id: in_primary
    defined_term: Primary Care Visit
    glossary: term-in_primary
    description: Out-of-pocket cost for a PCP visit.
    value: "$0"
    unit: usd
    confidence: high
    derived: false
    provenance_ref: "#provenance-meta"
# Provenance Block
ProvenanceMeta:
  ID: 2025-cms-ma-landscape
  Title: CMS MA Landscape File, 2025
  Description: CMS-published dataset listing all approved Medicare Advantage plans and their service areas by contract ID and plan ID.
  Creator: Centers for Medicare & Medicaid Services (CMS)
  Home: https://www.cms.gov/medicare-health-drug-plans-data
  Archive: https://www.cms.gov/files/zip/2025-ma-landscape.zip
  License: Public Domain
  Published: 2025-06-01
  Retrieved: 2025-06-28
  Digest: 2025-cms-ma-mapd-plan
  Entity: plan:H5521-290-0
  FragmentScope: semantic-digest
  GlossaryScope: cms_landscape
  Format: ZIP (XLSX)
  Scope: single-dataset
  Version: v2025.1
  Year: 2025
  Checksum: b3f6f5d98b32a3c68a73397f3495c9ec
  Tags:
    - cms
    - medicare-advantage
    - landscape
  Guidelines: https://www.cms.gov/medicare/health-drug-plans/managed-care-marketing/medicare-guidelines
</template>
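The example repeats the class, entity, digest, glossary scope, and fragment scope in both the template attributes and the YAML header. A consumer might want to confirm the two agree. The Python sketch below does that using only the standard library. Several things here are assumptions: that the values are meant to match, the attribute-to-key mapping in `ATTR_TO_HEADER`, and the simplified line-based header reader (a real consumer would likely use a full YAML parser). All class and function names are hypothetical.

```python
import textwrap
from html.parser import HTMLParser

# Assumed pairing of template attributes with YAML header keys.
ATTR_TO_HEADER = {
    "data-sdt-class": "data-sdt-class",
    "data-entity": "entity",
    "data-digest": "digest",
    "data-glossary-scope": "glossary_scope",
    "data-fragment-scope": "fragment_scope",
}


class TemplateExtractor(HTMLParser):
    """Collect the attributes and raw text of the first <template> element."""

    def __init__(self):
        super().__init__()
        self.attrs = {}
        self.body = []
        self._inside = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        if tag == "template" and not self._done:
            self.attrs = dict(attrs)
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "template" and self._inside:
            self._inside = False
            self._done = True

    def handle_data(self, data):
        if self._inside:
            self.body.append(data)


def read_header(yaml_text: str) -> dict:
    """Read top-level 'key: value' scalars that precede the first block key.

    Simplified on purpose: stops at the first key with no inline value
    (for example 'Fields:') and ignores comments and nested lines.
    """
    header = {}
    for line in yaml_text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        if line[0].isspace():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue
        if not value.strip():
            break
        header[key.strip()] = value.strip()
    return header


def find_mismatches(html: str) -> list:
    """List attribute/header pairs whose values differ."""
    parser = TemplateExtractor()
    parser.feed(html)
    parser.close()
    header = read_header(textwrap.dedent("".join(parser.body)))
    return [
        f"{attr}={parser.attrs.get(attr)!r} but {key}={header.get(key)!r}"
        for attr, key in ATTR_TO_HEADER.items()
        if parser.attrs.get(attr) != header.get(key)
    ]


# Usage with a shortened copy of the example template.
SAMPLE = """<template
  id="fragment-h5521-290-0-primary-care"
  data-visibility-fragment
  data-sdt-class="DataFragment"
  data-entity="plan:H5521-290-0"
  data-digest="2025-cms-ma-mapd-plan"
  data-glossary-scope="cms_landscape"
  data-fragment-scope="semantic-digest">
# YAML Header
data-sdt-class: DataFragment
entity: plan:H5521-290-0
digest: 2025-cms-ma-mapd-plan
glossary_scope: cms_landscape
fragment_scope: semantic-digest
Fields:
  - id: in_primary
</template>"""
print(find_mismatches(SAMPLE))
```

Reading the header line by line keeps the sketch dependency-free, at the cost of not handling quoted values or multi-line scalars.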